# SafeLLamaContextHandle
Namespace: LLama.Native
A safe wrapper around a llama_context
```csharp
public sealed class SafeLLamaContextHandle : SafeLLamaHandleBase, System.IDisposable
```
Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [CriticalFinalizerObject](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.constrainedexecution.criticalfinalizerobject) → [SafeHandle](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.interopservices.safehandle) → [SafeLLamaHandleBase](./llama.native.safellamahandlebase.md) → [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)<br>
Implements [IDisposable](https://docs.microsoft.com/en-us/dotnet/api/system.idisposable)
## Properties
### **VocabCount**
Total number of tokens in the vocabulary of this model
```csharp
public int VocabCount { get; }
```
#### Property Value
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
### **ContextSize**
Total number of tokens the context can hold (the context window size)
```csharp
public uint ContextSize { get; }
```
#### Property Value
[UInt32](https://docs.microsoft.com/en-us/dotnet/api/system.uint32)<br>
### **EmbeddingSize**
Dimension of embedding vectors
```csharp
public int EmbeddingSize { get; }
```
#### Property Value
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
### **BatchSize**
Get the maximum batch size for this context
```csharp
public uint BatchSize { get; }
```
#### Property Value
[UInt32](https://docs.microsoft.com/en-us/dotnet/api/system.uint32)<br>
### **ModelHandle**
Get the model which this context is using
```csharp
public SafeLlamaModelHandle ModelHandle { get; }
```
#### Property Value
[SafeLlamaModelHandle](./llama.native.safellamamodelhandle.md)<br>
### **IsInvalid**
```csharp
public bool IsInvalid { get; }
```
#### Property Value
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)<br>
### **IsClosed**
```csharp
public bool IsClosed { get; }
```
#### Property Value
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)<br>
## Constructors
### **SafeLLamaContextHandle()**
```csharp
public SafeLLamaContextHandle()
```
## Methods
### **ReleaseHandle()**
```csharp
protected bool ReleaseHandle()
```
#### Returns
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)<br>
### **Create(SafeLlamaModelHandle, LLamaContextParams)**
Create a new llama_context for the given model
```csharp
public static SafeLLamaContextHandle Create(SafeLlamaModelHandle model, LLamaContextParams lparams)
```
#### Parameters
`model` [SafeLlamaModelHandle](./llama.native.safellamamodelhandle.md)<br>
`lparams` [LLamaContextParams](./llama.native.llamacontextparams.md)<br>
#### Returns
[SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)<br>
#### Exceptions
[RuntimeError](./llama.exceptions.runtimeerror.md)<br>
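#### Example
A minimal sketch of creating a context from a loaded model. `SafeLlamaModelHandle.LoadFromFile`, `LLamaModelParams.Default()`, `LLamaContextParams.Default()` and the `n_ctx` field are assumptions about the surrounding API and may differ between versions.
```csharp
using LLama.Native;

// Assumed factories; check the current LLamaSharp API for the exact names.
var modelParams = LLamaModelParams.Default();
var contextParams = LLamaContextParams.Default();
contextParams.n_ctx = 2048; // requested context window size (assumed field name)

using var model = SafeLlamaModelHandle.LoadFromFile("model.gguf", modelParams);
using var ctx = SafeLLamaContextHandle.Create(model, contextParams);

Console.WriteLine($"ctx={ctx.ContextSize} vocab={ctx.VocabCount} embd={ctx.EmbeddingSize}");
```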
### **GetLogits()**
Token logits obtained from the last call to llama_decode.<br>
The logits for the last token are stored in the last row.<br>
Can be mutated in order to change the probabilities of the next token.<br>
Rows: n_tokens<br>
Cols: n_vocab
```csharp
public Span<float> GetLogits()
```
#### Returns
[Span&lt;Single&gt;](https://docs.microsoft.com/en-us/dotnet/api/system.span-1)<br>
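#### Example
A sketch of mutating logits before sampling, assuming a batch has already been decoded. The slice arithmetic follows the row layout described above; the banned token id is purely illustrative.
```csharp
Span<float> logits = ctx.GetLogits();

// The last row holds the logits for the most recently decoded token.
Span<float> lastRow = logits.Slice(logits.Length - ctx.VocabCount, ctx.VocabCount);

// Forbid one token id (illustrative value) by pushing its logit to -infinity.
lastRow[13] = float.NegativeInfinity;
```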
### **GetLogitsIth(Int32)**
Logits for the ith token. Equivalent to: llama_get_logits(ctx) + i*n_vocab
```csharp
public Span<float> GetLogitsIth(int i)
```
#### Parameters
`i` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
#### Returns
[Span&lt;Single&gt;](https://docs.microsoft.com/en-us/dotnet/api/system.span-1)<br>
### **Tokenize(String, Boolean, Boolean, Encoding)**
Convert the given text into tokens
```csharp
public LLamaToken[] Tokenize(string text, bool add_bos, bool special, Encoding encoding)
```
#### Parameters
`text` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
The text to tokenize
`add_bos` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)<br>
Whether the "BOS" token should be added
`special` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)<br>
Allow tokenizing special and/or control tokens which otherwise are not exposed and are treated as plain text.
`encoding` [Encoding](https://docs.microsoft.com/en-us/dotnet/api/system.text.encoding)<br>
Encoding to use for the text
#### Returns
[LLamaToken[]](./llama.native.llamatoken.md)<br>
#### Exceptions
[RuntimeError](./llama.exceptions.runtimeerror.md)<br>
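#### Example
A sketch of tokenizing a UTF-8 prompt with a leading BOS token.
```csharp
using System.Text;

LLamaToken[] tokens = ctx.Tokenize("Hello, world!", add_bos: true, special: false, Encoding.UTF8);
Console.WriteLine($"Prompt is {tokens.Length} tokens long");
```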
### **TokenToSpan(LLamaToken, Span&lt;Byte&gt;)**
Convert a single llama token into bytes
```csharp
public uint TokenToSpan(LLamaToken token, Span<byte> dest)
```
#### Parameters
`token` [LLamaToken](./llama.native.llamatoken.md)<br>
Token to decode
`dest` [Span&lt;Byte&gt;](https://docs.microsoft.com/en-us/dotnet/api/system.span-1)<br>
A span to attempt to write into. If this is too small nothing will be written
#### Returns
[UInt32](https://docs.microsoft.com/en-us/dotnet/api/system.uint32)<br>
The size of this token. **nothing will be written** if this is larger than `dest`
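#### Example
A sketch of the retry pattern implied by the return value: if the reported size exceeds the buffer, grow the buffer and call again. Here `token` is assumed to come from a previous Tokenize or sampling call, and decoding the bytes as UTF-8 is an assumption; a single token can end part-way through a multi-byte character, so real code should accumulate bytes across tokens before decoding.
```csharp
Span<byte> buffer = stackalloc byte[16];
uint size = ctx.TokenToSpan(token, buffer);
if (size > buffer.Length)
{
    // First attempt was too small and nothing was written; retry with enough room.
    buffer = new byte[size];
    size = ctx.TokenToSpan(token, buffer);
}
string text = Encoding.UTF8.GetString(buffer.Slice(0, (int)size).ToArray());
```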
### **Decode(LLamaBatch)**
```csharp
public DecodeResult Decode(LLamaBatch batch)
```
#### Parameters
`batch` [LLamaBatch](./llama.native.llamabatch.md)<br>
#### Returns
[DecodeResult](./llama.native.decoderesult.md)<br>
A positive return value does not indicate a fatal error, but rather a warning:<br>
- 0: success<br>
- 1: could not find a KV slot for the batch (try reducing the size of the batch or increase the context)<br>
- &lt; 0: error<br>
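#### Example
A sketch of decoding a tokenized prompt as a single batch, continuing from the Tokenize example above. The `LLamaBatch` construction, its `Add` signature and the `DecodeResult.Ok` value are assumptions about the surrounding API (documented separately) and may differ between versions.
```csharp
var batch = new LLamaBatch();
for (var i = 0; i < tokens.Length; i++)
{
    // Request logits only for the final token of the prompt (assumed Add signature).
    batch.Add(tokens[i], i, new LLamaSeqId(0), logits: i == tokens.Length - 1);
}

DecodeResult result = ctx.Decode(batch);
if (result != DecodeResult.Ok)
    throw new InvalidOperationException($"llama_decode failed: {result}");
```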
### **Decode(List&lt;LLamaToken&gt;, LLamaSeqId, LLamaBatch, Int32&)**
Decode a set of tokens in batch-size chunks.
```csharp
internal ValueTuple<DecodeResult, int> Decode(List<LLamaToken> tokens, LLamaSeqId id, LLamaBatch batch, Int32& n_past)
```
#### Parameters
`tokens` [List&lt;LLamaToken&gt;](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.list-1)<br>
`id` [LLamaSeqId](./llama.native.llamaseqid.md)<br>
`batch` [LLamaBatch](./llama.native.llamabatch.md)<br>
`n_past` [Int32&](https://docs.microsoft.com/en-us/dotnet/api/system.int32&)<br>
#### Returns
[ValueTuple&lt;DecodeResult, Int32&gt;](https://docs.microsoft.com/en-us/dotnet/api/system.valuetuple-2)<br>
A tuple containing the decode result and the number of tokens that have not yet been decoded.
### **GetStateSize()**
Get the size of the state, when saved as bytes
```csharp
public ulong GetStateSize()
```
#### Returns
[UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)<br>
### **GetState(Byte*, UInt64)**
Get the raw state of this context, encoded as bytes. Data is written into the `dest` pointer.
```csharp
public ulong GetState(Byte* dest, ulong size)
```
#### Parameters
`dest` [Byte*](https://docs.microsoft.com/en-us/dotnet/api/system.byte*)<br>
Destination to write to
`size` [UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)<br>
Number of bytes available to write to in dest (check required size with `GetStateSize()`)
#### Returns
[UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)<br>
The number of bytes written to dest
#### Exceptions
[ArgumentOutOfRangeException](https://docs.microsoft.com/en-us/dotnet/api/system.argumentoutofrangeexception)<br>
Thrown if dest is too small
### **GetState(IntPtr, UInt64)**
Get the raw state of this context, encoded as bytes. Data is written into the `dest` pointer.
```csharp
public ulong GetState(IntPtr dest, ulong size)
```
#### Parameters
`dest` [IntPtr](https://docs.microsoft.com/en-us/dotnet/api/system.intptr)<br>
Destination to write to
`size` [UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)<br>
Number of bytes available to write to in dest (check required size with `GetStateSize()`)
#### Returns
[UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)<br>
The number of bytes written to dest
#### Exceptions
[ArgumentOutOfRangeException](https://docs.microsoft.com/en-us/dotnet/api/system.argumentoutofrangeexception)<br>
Thrown if dest is too small
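#### Example
A sketch of snapshotting the context into a managed byte array via the `IntPtr` overload and a pinned buffer. `GetStateSize()` gives the buffer size required; the return value reports how many bytes were actually written.
```csharp
using System.Runtime.InteropServices;

ulong stateSize = ctx.GetStateSize();
var stateBytes = new byte[stateSize];

var pin = GCHandle.Alloc(stateBytes, GCHandleType.Pinned);
try
{
    ulong written = ctx.GetState(pin.AddrOfPinnedObject(), stateSize);
    Console.WriteLine($"Saved {written} of at most {stateSize} bytes");
}
finally
{
    pin.Free();
}
```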
### **SetState(Byte*)**
Set the raw state of this context
```csharp
public ulong SetState(Byte* src)
```
#### Parameters
`src` [Byte*](https://docs.microsoft.com/en-us/dotnet/api/system.byte*)<br>
The pointer to read the state from
#### Returns
[UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)<br>
Number of bytes read from the src pointer
### **SetState(IntPtr)**
Set the raw state of this context
```csharp
public ulong SetState(IntPtr src)
```
#### Parameters
`src` [IntPtr](https://docs.microsoft.com/en-us/dotnet/api/system.intptr)<br>
The pointer to read the state from
#### Returns
[UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)<br>
Number of bytes read from the src pointer
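#### Example
A sketch of restoring the snapshot captured in the `GetState` example above, again pinning the managed buffer to obtain a stable pointer.
```csharp
var pin = GCHandle.Alloc(stateBytes, GCHandleType.Pinned);
try
{
    ulong read = ctx.SetState(pin.AddrOfPinnedObject());
    Console.WriteLine($"Restored context from {read} bytes");
}
finally
{
    pin.Free();
}
```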
### **SetSeed(UInt32)**
Set the RNG seed
```csharp
public void SetSeed(uint seed)
```
#### Parameters
`seed` [UInt32](https://docs.microsoft.com/en-us/dotnet/api/system.uint32)<br>
### **SetThreads(UInt32, UInt32)**
Set the number of threads used for decoding
```csharp
public void SetThreads(uint threads, uint threadsBatch)
```
#### Parameters
`threads` [UInt32](https://docs.microsoft.com/en-us/dotnet/api/system.uint32)<br>
n_threads is the number of threads used for generation (single token)
`threadsBatch` [UInt32](https://docs.microsoft.com/en-us/dotnet/api/system.uint32)<br>
n_threads_batch is the number of threads used for prompt and batch processing (multiple tokens)
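#### Example
A sketch of a common split: use every core for prompt/batch processing and fewer threads for single-token generation, which tends to be memory-bandwidth bound. The exact split is a heuristic, not a recommendation from the library.
```csharp
ctx.SetThreads(
    threads: (uint)Math.Max(1, Environment.ProcessorCount / 2),
    threadsBatch: (uint)Environment.ProcessorCount);
```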
### **KvCacheGetDebugView(Int32)**
Get a new KV cache view that can be used to debug the KV cache
```csharp
public LLamaKvCacheViewSafeHandle KvCacheGetDebugView(int maxSequences)
```
#### Parameters
`maxSequences` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
#### Returns
[LLamaKvCacheViewSafeHandle](./llama.native.llamakvcacheviewsafehandle.md)<br>
### **KvCacheCountCells()**
Count the number of used cells in the KV cache (i.e. cells that have at least one sequence assigned to them)
```csharp
public int KvCacheCountCells()
```
#### Returns
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
### **KvCacheCountTokens()**
Returns the number of tokens in the KV cache (slow, use only for debug).<br>
If a KV cell has multiple sequences assigned to it, it will be counted multiple times
```csharp
public int KvCacheCountTokens()
```
#### Returns
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
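#### Example
A sketch that inspects cache occupancy while debugging; the view is a safe handle, so it is wrapped in a `using`.
```csharp
using (LLamaKvCacheViewSafeHandle view = ctx.KvCacheGetDebugView(maxSequences: 4))
{
    // `view` exposes per-cell details; here only the aggregate counts are printed.
    Console.WriteLine($"Used cells: {ctx.KvCacheCountCells()}");
    Console.WriteLine($"Cached tokens (slow): {ctx.KvCacheCountTokens()}");
}
```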
### **KvCacheClear()**
Clear the KV cache
```csharp
public void KvCacheClear()
```
### **KvCacheRemove(LLamaSeqId, LLamaPos, LLamaPos)**
Removes all tokens that belong to the specified sequence and have positions in [p0, p1)
```csharp
public void KvCacheRemove(LLamaSeqId seq, LLamaPos p0, LLamaPos p1)
```
#### Parameters
`seq` [LLamaSeqId](./llama.native.llamaseqid.md)<br>
`p0` [LLamaPos](./llama.native.llamapos.md)<br>
`p1` [LLamaPos](./llama.native.llamapos.md)<br>
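#### Example
A sketch of evicting the oldest tokens of a sequence; the `LLamaSeqId`/`LLamaPos` constructors and the counts are illustrative assumptions.
```csharp
// Drop the first 256 tokens of sequence 0 from the cache.
ctx.KvCacheRemove(new LLamaSeqId(0), new LLamaPos(0), new LLamaPos(256));
```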
### **KvCacheSequenceCopy(LLamaSeqId, LLamaSeqId, LLamaPos, LLamaPos)**
Copy all tokens that belong to the specified sequence to another sequence. Note that
this does not allocate extra KV cache memory - it simply assigns the tokens to the
new sequence
```csharp
public void KvCacheSequenceCopy(LLamaSeqId src, LLamaSeqId dest, LLamaPos p0, LLamaPos p1)
```
#### Parameters
`src` [LLamaSeqId](./llama.native.llamaseqid.md)<br>
`dest` [LLamaSeqId](./llama.native.llamaseqid.md)<br>
`p0` [LLamaPos](./llama.native.llamapos.md)<br>
`p1` [LLamaPos](./llama.native.llamapos.md)<br>
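#### Example
A sketch of forking a shared prompt prefix so two generations can continue from it without re-decoding; `promptLength` and the struct constructors are illustrative assumptions.
```csharp
int promptLength = 128;

// Sequence 1 now shares the cached prefix [0, promptLength) with sequence 0.
ctx.KvCacheSequenceCopy(new LLamaSeqId(0), new LLamaSeqId(1), new LLamaPos(0), new LLamaPos(promptLength));
```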
### **KvCacheSequenceKeep(LLamaSeqId)**
Removes all tokens that do not belong to the specified sequence
```csharp
public void KvCacheSequenceKeep(LLamaSeqId seq)
```
#### Parameters
`seq` [LLamaSeqId](./llama.native.llamaseqid.md)<br>
### **KvCacheSequenceAdd(LLamaSeqId, LLamaPos, LLamaPos, Int32)**
Adds relative position "delta" to all tokens that belong to the specified sequence
and have positions in [p0, p1. If the KV cache is RoPEd, the KV data is updated
accordingly
```csharp
public void KvCacheSequenceAdd(LLamaSeqId seq, LLamaPos p0, LLamaPos p1, int delta)
```
#### Parameters
`seq` [LLamaSeqId](./llama.native.llamaseqid.md)<br>
`p0` [LLamaPos](./llama.native.llamapos.md)<br>
`p1` [LLamaPos](./llama.native.llamapos.md)<br>
`delta` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
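#### Example
A sketch of the common "context shift": evict the oldest tokens with `KvCacheRemove`, then slide the remaining tokens back so generation can continue. The counts and struct constructors are illustrative assumptions.
```csharp
var seq = new LLamaSeqId(0);
int discard = 256;   // oldest tokens to evict
int nPast = 2048;    // tokens currently in the sequence

ctx.KvCacheRemove(seq, new LLamaPos(0), new LLamaPos(discard));
ctx.KvCacheSequenceAdd(seq, new LLamaPos(discard), new LLamaPos(nPast), -discard);

// The remaining tokens now occupy positions [0, nPast - discard).
```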
### **KvCacheSequenceDivide(LLamaSeqId, LLamaPos, LLamaPos, Int32)**
Integer division of the positions by a factor `d > 1`.
If the KV cache is RoPEd, the KV data is updated accordingly.<br>
p0 &lt; 0 : [0, p1]<br>
p1 &lt; 0 : [p0, inf)
```csharp
public void KvCacheSequenceDivide(LLamaSeqId seq, LLamaPos p0, LLamaPos p1, int divisor)
```
#### Parameters
`seq` [LLamaSeqId](./llama.native.llamaseqid.md)<br>
`p0` [LLamaPos](./llama.native.llamapos.md)<br>
`p1` [LLamaPos](./llama.native.llamapos.md)<br>
`divisor` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>