Merge pull request #141 from SciSharp/rinne-dev

docs: update the docs to follow new version.
This commit is contained in:
Rinne 2023-09-02 17:57:23 +08:00 committed by GitHub
commit b82e9f8fb0
77 changed files with 4970 additions and 596 deletions


@ -7,8 +7,8 @@
<Platforms>AnyCPU;x64;Arm64</Platforms>
<AllowUnsafeBlocks>True</AllowUnsafeBlocks>
<Version>0.4.2</Version>
<Authors>Yaohui Liu, Haiping Chen</Authors>
<Version>0.5.0</Version>
<Authors>Yaohui Liu, Martin Devans, Haiping Chen</Authors>
<Company>SciSharp STACK</Company>
<GeneratePackageOnBuild>true</GeneratePackageOnBuild>
<Copyright>MIT, SciSharp STACK $([System.DateTime]::UtcNow.ToString(yyyy))</Copyright>
@ -21,7 +21,7 @@
weights to run, please go to https://github.com/SciSharp/LLamaSharp for more information.
</Description>
<PackageReleaseNotes>
LLamaSharp 0.4.1 followed up the master branch of llama.cpp. (commit id: aacdbd4)
LLamaSharp 0.5.0 adds support for GGUF, grammar and integration with semantic-kernel.
</PackageReleaseNotes>
<PackageLicenseExpression>MIT</PackageLicenseExpression>
<PackageOutputPath>packages</PackageOutputPath>


@ -4,9 +4,9 @@
The figure below shows the core framework structure, which is separated into four levels.
- **LLamaModel**: The holder of a model, which directly interacts with the native library and provides some basic APIs such as tokenization and embedding. Currently it includes three classes: `LLamaModel`, `LLamaEmbedder` and `LLamaQuantizer`.
- **LLamaContext**: The holder of a model, which directly interacts with the native library and provides some basic APIs such as tokenization and embedding. Currently it includes three classes: `LLamaContext`, `LLamaEmbedder` and `LLamaQuantizer`.
- **LLamaExecutors**: Executors which define the way to run the LLama model. They provide text-to-text APIs that are easy to use. Currently we provide three kinds of executors: `InteractiveExecutor`, `InstructExecutor` and `StatelessExecutor`.
- **ChatSession**: A wrapper around `InteractiveExecutor` and `LLamaModel`, which supports interactive tasks and saving/re-loading sessions. It also provides a flexible way to customize text processing via `IHistoryTransform`, `ITextTransform` and `ITextStreamTransform`.
- **ChatSession**: A wrapper around `InteractiveExecutor` and `LLamaContext`, which supports interactive tasks and saving/re-loading sessions. It also provides a flexible way to customize text processing via `IHistoryTransform`, `ITextTransform` and `ITextStreamTransform`.
- **High-level Applications**: Applications that provide higher-level integration. For example, [BotSharp](https://github.com/SciSharp/BotSharp) provides integration for vector search, Chatbot UI and Web APIs. [semantic-kernel](https://github.com/microsoft/semantic-kernel) provides various APIs for working with LLMs. If you've made an integration, please tell us and add it to the doc!
@ -14,7 +14,7 @@ The figure below shows the core framework structure, which is separated to four
## Recommended Use
Since `LLamaModel` interacts with the native library, it's not recommended to use its methods directly unless you know what you are doing. The same applies to `NativeApi`, which is not included in the architecture figure above.
Since `LLamaContext` interacts with the native library, it's not recommended to use its methods directly unless you know what you are doing. The same applies to `NativeApi`, which is not included in the architecture figure above.
`ChatSession` is recommended when you want to build an application similar to ChatGPT or a chatbot, because it works best with `InteractiveExecutor`. Though other executors may also be passed as a parameter to initialize a `ChatSession`, this is not encouraged if you are new to LLamaSharp and LLMs.
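As a sketch of this recommended path (the model path and sampling values below are illustrative assumptions, and factory method names may differ slightly between versions), a minimal chat loop looks roughly like:

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using LLama;
using LLama.Common;

// Hypothetical GGUF model path -- replace with your own file.
var parameters = new ModelParams("model.gguf") { ContextSize = 1024 };

using var model = LLamaWeights.LoadFromFile(parameters);
using var context = model.CreateContext(parameters);
var session = new ChatSession(new InteractiveExecutor(context));

var inferenceParams = new InferenceParams
{
    Temperature = 0.6f,
    AntiPrompts = new List<string> { "User:" }  // stop generating when the model emits "User:"
};

// The response is streamed piece by piece as it is generated.
foreach (var text in session.Chat("User: Hello!\nBot: ", inferenceParams, CancellationToken.None))
    Console.Write(text);
```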


@ -0,0 +1,3 @@
# The Usage of semantic-kernel Integration
Please see [this doc](../../LLama.SemanticKernel/README.md)


@ -9,6 +9,7 @@ LLamaSharp is the C#/.NET binding of [llama.cpp](https://github.com/ggerganov/ll
- Model inference
- Model quantization
- Generating embeddings
- Grammar parsing
- Interactive/Instruct/Stateless executor mode
- Chat session APIs
- Save/load the state

(Binary files changed; contents not shown. An image shrank from 161 KiB to 158 KiB.)


@ -8,26 +8,32 @@
[InteractiveExecutor](./llama.interactiveexecutor.md)
[LLamaEmbedder](./llama.llamaembedder.md)
[LLamaContext](./llama.llamacontext.md)
[LLamaModel](./llama.llamamodel.md)
[LLamaEmbedder](./llama.llamaembedder.md)
[LLamaQuantizer](./llama.llamaquantizer.md)
[LLamaTransforms](./llama.llamatransforms.md)
[ResettableLLamaModel](./llama.resettablellamamodel.md)
[LLamaWeights](./llama.llamaweights.md)
[StatefulExecutorBase](./llama.statefulexecutorbase.md)
[StatelessExecutor](./llama.statelessexecutor.md)
[Utils](./llama.utils.md)
## LLama.Abstractions
[IHistoryTransform](./llama.abstractions.ihistorytransform.md)
[IInferenceParams](./llama.abstractions.iinferenceparams.md)
[ILLamaExecutor](./llama.abstractions.illamaexecutor.md)
[IModelParams](./llama.abstractions.imodelparams.md)
[ITextStreamTransform](./llama.abstractions.itextstreamtransform.md)
[ITextTransform](./llama.abstractions.itexttransform.md)
@ -46,17 +52,45 @@
[LLamaDefaultLogger](./llama.common.llamadefaultlogger.md)
[MiroStateType](./llama.common.mirostatetype.md)
[MirostatType](./llama.common.mirostattype.md)
[ModelParams](./llama.common.modelparams.md)
## LLama.Exceptions
[GrammarExpectedName](./llama.exceptions.grammarexpectedname.md)
[GrammarExpectedNext](./llama.exceptions.grammarexpectednext.md)
[GrammarExpectedPrevious](./llama.exceptions.grammarexpectedprevious.md)
[GrammarFormatException](./llama.exceptions.grammarformatexception.md)
[GrammarUnexpectedCharAltElement](./llama.exceptions.grammarunexpectedcharaltelement.md)
[GrammarUnexpectedCharRngElement](./llama.exceptions.grammarunexpectedcharrngelement.md)
[GrammarUnexpectedEndElement](./llama.exceptions.grammarunexpectedendelement.md)
[GrammarUnexpectedEndOfInput](./llama.exceptions.grammarunexpectedendofinput.md)
[GrammarUnexpectedHexCharsCount](./llama.exceptions.grammarunexpectedhexcharscount.md)
[GrammarUnknownEscapeCharacter](./llama.exceptions.grammarunknownescapecharacter.md)
[RuntimeError](./llama.exceptions.runtimeerror.md)
## LLama.Extensions
[DictionaryExtension](./llama.extensions.dictionaryextension.md)
[IModelParamsExtensions](./llama.extensions.imodelparamsextensions.md)
[KeyValuePairExtensions](./llama.extensions.keyvaluepairextensions.md)
## LLama.Grammars
[Grammar](./llama.grammars.grammar.md)
[GrammarRule](./llama.grammars.grammarrule.md)
## LLama.Native
@ -64,6 +98,12 @@
[LLamaFtype](./llama.native.llamaftype.md)
[LLamaGrammarElement](./llama.native.llamagrammarelement.md)
[LLamaGrammarElementType](./llama.native.llamagrammarelementtype.md)
[LLamaModelQuantizeParams](./llama.native.llamamodelquantizeparams.md)
[LLamaTokenData](./llama.native.llamatokendata.md)
[LLamaTokenDataArray](./llama.native.llamatokendataarray.md)
@ -74,8 +114,14 @@
[SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
[SafeLLamaGrammarHandle](./llama.native.safellamagrammarhandle.md)
[SafeLLamaHandleBase](./llama.native.safellamahandlebase.md)
[SafeLlamaModelHandle](./llama.native.safellamamodelhandle.md)
[SamplingApi](./llama.native.samplingapi.md)
## LLama.OldVersion
[ChatCompletion](./llama.oldversion.chatcompletion.md)


@ -0,0 +1,268 @@
# IInferenceParams
Namespace: LLama.Abstractions
The parameters used for inference.
```csharp
public interface IInferenceParams
```
## Properties
### **TokensKeep**
number of tokens to keep from initial prompt
```csharp
public abstract int TokensKeep { get; set; }
```
#### Property Value
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
### **MaxTokens**
how many new tokens to predict (n_predict); set to -1 to generate responses infinitely
until completion.
```csharp
public abstract int MaxTokens { get; set; }
```
#### Property Value
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
### **LogitBias**
logit bias for specific tokens
```csharp
public abstract Dictionary<int, float> LogitBias { get; set; }
```
#### Property Value
[Dictionary&lt;Int32, Single&gt;](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.dictionary-2)<br>
### **AntiPrompts**
Sequences where the model will stop generating further tokens.
```csharp
public abstract IEnumerable<string> AntiPrompts { get; set; }
```
#### Property Value
[IEnumerable&lt;String&gt;](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.ienumerable-1)<br>
### **PathSession**
path to file for saving/loading model eval state
```csharp
public abstract string PathSession { get; set; }
```
#### Property Value
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
### **InputSuffix**
string to suffix user inputs with
```csharp
public abstract string InputSuffix { get; set; }
```
#### Property Value
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
### **InputPrefix**
string to prefix user inputs with
```csharp
public abstract string InputPrefix { get; set; }
```
#### Property Value
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
### **TopK**
0 or lower to use vocab size
```csharp
public abstract int TopK { get; set; }
```
#### Property Value
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
### **TopP**
1.0 = disabled
```csharp
public abstract float TopP { get; set; }
```
#### Property Value
[Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)<br>
### **TfsZ**
1.0 = disabled
```csharp
public abstract float TfsZ { get; set; }
```
#### Property Value
[Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)<br>
### **TypicalP**
1.0 = disabled
```csharp
public abstract float TypicalP { get; set; }
```
#### Property Value
[Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)<br>
### **Temperature**
1.0 = disabled
```csharp
public abstract float Temperature { get; set; }
```
#### Property Value
[Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)<br>
### **RepeatPenalty**
1.0 = disabled
```csharp
public abstract float RepeatPenalty { get; set; }
```
#### Property Value
[Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)<br>
### **RepeatLastTokensCount**
last n tokens to penalize (0 = disable penalty, -1 = context size) (repeat_last_n)
```csharp
public abstract int RepeatLastTokensCount { get; set; }
```
#### Property Value
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
### **FrequencyPenalty**
frequency penalty coefficient
0.0 = disabled
```csharp
public abstract float FrequencyPenalty { get; set; }
```
#### Property Value
[Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)<br>
### **PresencePenalty**
presence penalty coefficient
0.0 = disabled
```csharp
public abstract float PresencePenalty { get; set; }
```
#### Property Value
[Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)<br>
### **Mirostat**
Whether to use Mirostat sampling, the algorithm described in the paper https://arxiv.org/abs/2007.14966. Mirostat uses tokens instead of words.
0 = disabled, 1 = Mirostat, 2 = Mirostat 2.0
```csharp
public abstract MirostatType Mirostat { get; set; }
```
#### Property Value
[MirostatType](./llama.common.mirostattype.md)<br>
### **MirostatTau**
target entropy
```csharp
public abstract float MirostatTau { get; set; }
```
#### Property Value
[Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)<br>
### **MirostatEta**
learning rate
```csharp
public abstract float MirostatEta { get; set; }
```
#### Property Value
[Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)<br>
### **PenalizeNL**
consider newlines as a repeatable token (penalize_nl)
```csharp
public abstract bool PenalizeNL { get; set; }
```
#### Property Value
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)<br>
### **Grammar**
Grammar to constrain possible tokens
```csharp
public abstract SafeLLamaGrammarHandle Grammar { get; set; }
```
#### Property Value
[SafeLLamaGrammarHandle](./llama.native.safellamagrammarhandle.md)<br>
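Taken together, these properties are usually set through the concrete `InferenceParams` class rather than implemented by hand; a sketch (the values are illustrative, not recommendations):

```csharp
using System.Collections.Generic;
using LLama.Common;

var inferenceParams = new InferenceParams
{
    MaxTokens = 256,        // stop after 256 new tokens (-1 = generate until completion)
    Temperature = 0.8f,
    TopK = 40,
    TopP = 0.95f,
    RepeatPenalty = 1.1f,
    AntiPrompts = new List<string> { "User:" }  // stop sequences
};
```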


@ -10,26 +10,26 @@ public interface ILLamaExecutor
## Properties
### **Model**
### **Context**
The loaded model for this executor.
The loaded context for this executor.
```csharp
public abstract LLamaModel Model { get; }
public abstract LLamaContext Context { get; }
```
#### Property Value
[LLamaModel](./llama.llamamodel.md)<br>
[LLamaContext](./llama.llamacontext.md)<br>
## Methods
### **Infer(String, InferenceParams, CancellationToken)**
### **Infer(String, IInferenceParams, CancellationToken)**
Infers a response from the model.
```csharp
IEnumerable<string> Infer(string text, InferenceParams inferenceParams, CancellationToken token)
IEnumerable<string> Infer(string text, IInferenceParams inferenceParams, CancellationToken token)
```
#### Parameters
@ -37,7 +37,7 @@ IEnumerable<string> Infer(string text, InferenceParams inferenceParams, Cancella
`text` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
Your prompt
`inferenceParams` [InferenceParams](./llama.common.inferenceparams.md)<br>
`inferenceParams` [IInferenceParams](./llama.abstractions.iinferenceparams.md)<br>
Any additional parameters
`token` [CancellationToken](https://docs.microsoft.com/en-us/dotnet/api/system.threading.cancellationtoken)<br>
@ -47,19 +47,24 @@ A cancellation token.
[IEnumerable&lt;String&gt;](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.ienumerable-1)<br>
### **InferAsync(String, InferenceParams, CancellationToken)**
### **InferAsync(String, IInferenceParams, CancellationToken)**
Asynchronously infers a response from the model.
```csharp
IAsyncEnumerable<string> InferAsync(string text, InferenceParams inferenceParams, CancellationToken token)
IAsyncEnumerable<string> InferAsync(string text, IInferenceParams inferenceParams, CancellationToken token)
```
#### Parameters
`text` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
Your prompt
`inferenceParams` [InferenceParams](./llama.common.inferenceparams.md)<br>
`inferenceParams` [IInferenceParams](./llama.abstractions.iinferenceparams.md)<br>
Any additional parameters
`token` [CancellationToken](https://docs.microsoft.com/en-us/dotnet/api/system.threading.cancellationtoken)<br>
A cancellation token.
#### Returns

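Because all executors (`InteractiveExecutor`, `InstructExecutor`, `StatelessExecutor`) implement this interface, code can be written against it directly; a hedged sketch, assuming an executor has already been constructed elsewhere:

```csharp
using System.Collections.Generic;
using System.Threading;
using LLama.Abstractions;
using LLama.Common;

static IEnumerable<string> Ask(ILLamaExecutor executor, string prompt)
{
    // The response is streamed as it is generated, per the Infer signature above.
    return executor.Infer(prompt, new InferenceParams { MaxTokens = 64 }, CancellationToken.None);
}
```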

@ -0,0 +1,276 @@
# IModelParams
Namespace: LLama.Abstractions
The parameters for initializing a LLama model.
```csharp
public interface IModelParams
```
## Properties
### **ContextSize**
Model context size (n_ctx)
```csharp
public abstract int ContextSize { get; set; }
```
#### Property Value
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
### **MainGpu**
the GPU that is used for scratch and small tensors
```csharp
public abstract int MainGpu { get; set; }
```
#### Property Value
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
### **LowVram**
if true, reduce VRAM usage at the cost of performance
```csharp
public abstract bool LowVram { get; set; }
```
#### Property Value
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)<br>
### **GpuLayerCount**
Number of layers to run in VRAM / GPU memory (n_gpu_layers)
```csharp
public abstract int GpuLayerCount { get; set; }
```
#### Property Value
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
### **Seed**
Seed for the random number generator (seed)
```csharp
public abstract int Seed { get; set; }
```
#### Property Value
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
### **UseFp16Memory**
Use f16 instead of f32 for memory kv (memory_f16)
```csharp
public abstract bool UseFp16Memory { get; set; }
```
#### Property Value
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)<br>
### **UseMemorymap**
Use mmap for faster loads (use_mmap)
```csharp
public abstract bool UseMemorymap { get; set; }
```
#### Property Value
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)<br>
### **UseMemoryLock**
Use mlock to keep model in memory (use_mlock)
```csharp
public abstract bool UseMemoryLock { get; set; }
```
#### Property Value
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)<br>
### **Perplexity**
Compute perplexity over the prompt (perplexity)
```csharp
public abstract bool Perplexity { get; set; }
```
#### Property Value
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)<br>
### **ModelPath**
Model path (model)
```csharp
public abstract string ModelPath { get; set; }
```
#### Property Value
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
### **ModelAlias**
model alias
```csharp
public abstract string ModelAlias { get; set; }
```
#### Property Value
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
### **LoraAdapter**
lora adapter path (lora_adapter)
```csharp
public abstract string LoraAdapter { get; set; }
```
#### Property Value
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
### **LoraBase**
base model path for the lora adapter (lora_base)
```csharp
public abstract string LoraBase { get; set; }
```
#### Property Value
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
### **Threads**
Number of threads (-1 = autodetect) (n_threads)
```csharp
public abstract int Threads { get; set; }
```
#### Property Value
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
### **BatchSize**
batch size for prompt processing (must be &gt;=32 to use BLAS) (n_batch)
```csharp
public abstract int BatchSize { get; set; }
```
#### Property Value
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
### **ConvertEosToNewLine**
Whether to convert eos to newline during the inference.
```csharp
public abstract bool ConvertEosToNewLine { get; set; }
```
#### Property Value
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)<br>
### **EmbeddingMode**
Whether to use embedding mode (embedding). Note that if this is set to true,
the LLamaModel won't produce text responses anymore.
```csharp
public abstract bool EmbeddingMode { get; set; }
```
#### Property Value
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)<br>
### **TensorSplits**
how split tensors should be distributed across GPUs
```csharp
public abstract Single[] TensorSplits { get; set; }
```
#### Property Value
[Single[]](https://docs.microsoft.com/en-us/dotnet/api/system.single)<br>
### **RopeFrequencyBase**
RoPE base frequency
```csharp
public abstract float RopeFrequencyBase { get; set; }
```
#### Property Value
[Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)<br>
### **RopeFrequencyScale**
RoPE frequency scaling factor
```csharp
public abstract float RopeFrequencyScale { get; set; }
```
#### Property Value
[Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)<br>
### **MulMatQ**
Use experimental mul_mat_q kernels
```csharp
public abstract bool MulMatQ { get; set; }
```
#### Property Value
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)<br>
### **Encoding**
The encoding to use for models
```csharp
public abstract Encoding Encoding { get; set; }
```
#### Property Value
[Encoding](https://docs.microsoft.com/en-us/dotnet/api/system.text.encoding)<br>
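The concrete `ModelParams` class in `LLama.Common` implements this interface; a minimal configuration might look like this (the path and values are illustrative):

```csharp
using LLama.Common;

var modelParams = new ModelParams("model.gguf")  // hypothetical path to a GGUF file
{
    ContextSize = 2048,   // n_ctx
    GpuLayerCount = 20,   // n_gpu_layers; set to 0 for CPU-only
    Seed = 1337,
    UseMemorymap = true   // use mmap for faster loads
};
```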


@ -161,19 +161,19 @@ public void LoadSession(string path)
`path` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
The directory name to load the session.
### **Chat(ChatHistory, InferenceParams, CancellationToken)**
### **Chat(ChatHistory, IInferenceParams, CancellationToken)**
Get the response from the LLama model with chat histories.
```csharp
public IEnumerable<string> Chat(ChatHistory history, InferenceParams inferenceParams, CancellationToken cancellationToken)
public IEnumerable<string> Chat(ChatHistory history, IInferenceParams inferenceParams, CancellationToken cancellationToken)
```
#### Parameters
`history` [ChatHistory](./llama.common.chathistory.md)<br>
`inferenceParams` [InferenceParams](./llama.common.inferenceparams.md)<br>
`inferenceParams` [IInferenceParams](./llama.abstractions.iinferenceparams.md)<br>
`cancellationToken` [CancellationToken](https://docs.microsoft.com/en-us/dotnet/api/system.threading.cancellationtoken)<br>
@ -181,20 +181,20 @@ public IEnumerable<string> Chat(ChatHistory history, InferenceParams inferencePa
[IEnumerable&lt;String&gt;](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.ienumerable-1)<br>
### **Chat(String, InferenceParams, CancellationToken)**
### **Chat(String, IInferenceParams, CancellationToken)**
Get the response from the LLama model. Note that the prompt is not limited to the preset words;
it can also be the question you want to ask.
```csharp
public IEnumerable<string> Chat(string prompt, InferenceParams inferenceParams, CancellationToken cancellationToken)
public IEnumerable<string> Chat(string prompt, IInferenceParams inferenceParams, CancellationToken cancellationToken)
```
#### Parameters
`prompt` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
`inferenceParams` [InferenceParams](./llama.common.inferenceparams.md)<br>
`inferenceParams` [IInferenceParams](./llama.abstractions.iinferenceparams.md)<br>
`cancellationToken` [CancellationToken](https://docs.microsoft.com/en-us/dotnet/api/system.threading.cancellationtoken)<br>
@ -202,19 +202,19 @@ public IEnumerable<string> Chat(string prompt, InferenceParams inferenceParams,
[IEnumerable&lt;String&gt;](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.ienumerable-1)<br>
### **ChatAsync(ChatHistory, InferenceParams, CancellationToken)**
### **ChatAsync(ChatHistory, IInferenceParams, CancellationToken)**
Get the response from the LLama model with chat histories.
```csharp
public IAsyncEnumerable<string> ChatAsync(ChatHistory history, InferenceParams inferenceParams, CancellationToken cancellationToken)
public IAsyncEnumerable<string> ChatAsync(ChatHistory history, IInferenceParams inferenceParams, CancellationToken cancellationToken)
```
#### Parameters
`history` [ChatHistory](./llama.common.chathistory.md)<br>
`inferenceParams` [InferenceParams](./llama.common.inferenceparams.md)<br>
`inferenceParams` [IInferenceParams](./llama.abstractions.iinferenceparams.md)<br>
`cancellationToken` [CancellationToken](https://docs.microsoft.com/en-us/dotnet/api/system.threading.cancellationtoken)<br>
@ -222,19 +222,19 @@ public IAsyncEnumerable<string> ChatAsync(ChatHistory history, InferenceParams i
[IAsyncEnumerable&lt;String&gt;](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.iasyncenumerable-1)<br>
### **ChatAsync(String, InferenceParams, CancellationToken)**
### **ChatAsync(String, IInferenceParams, CancellationToken)**
Get the response from the LLama model with chat histories asynchronously.
```csharp
public IAsyncEnumerable<string> ChatAsync(string prompt, InferenceParams inferenceParams, CancellationToken cancellationToken)
public IAsyncEnumerable<string> ChatAsync(string prompt, IInferenceParams inferenceParams, CancellationToken cancellationToken)
```
#### Parameters
`prompt` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
`inferenceParams` [InferenceParams](./llama.common.inferenceparams.md)<br>
`inferenceParams` [IInferenceParams](./llama.abstractions.iinferenceparams.md)<br>
`cancellationToken` [CancellationToken](https://docs.microsoft.com/en-us/dotnet/api/system.threading.cancellationtoken)<br>


@ -2,6 +2,8 @@
Namespace: LLama.Common
Role of the message author, e.g. user/assistant/system
```csharp
public enum AuthorRole
```
@ -13,3 +15,7 @@ Implements [IComparable](https://docs.microsoft.com/en-us/dotnet/api/system.icom
| Name | Value | Description |
| --- | --: | --- |
| Unknown | -1 | Role is unknown |
| System | 0 | Message comes from a "system" prompt, not written by a user or language model |
| User | 1 | Message comes from the user |
| Assistant | 2 | Message was generated by the language model |
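These roles are used when assembling a `ChatHistory` to pass to the `ChatSession` APIs; for example (a sketch):

```csharp
using LLama.Common;

var history = new ChatHistory();
history.AddMessage(AuthorRole.System, "You are a concise assistant.");
history.AddMessage(AuthorRole.User, "What is LLamaSharp?");
// history can now be passed to ChatSession.Chat / ChatAsync.
```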


@ -20,6 +20,8 @@ Implements IEnumerable&lt;T&gt;, [IEnumerable](https://docs.microsoft.com/en-us/
### **Count**
Number of items in this queue
```csharp
public int Count { get; }
```
@ -30,6 +32,8 @@ public int Count { get; }
### **Capacity**
Maximum number of items allowed in this queue
```csharp
public int Capacity { get; }
```
@ -42,6 +46,8 @@ public int Capacity { get; }
### **FixedSizeQueue(Int32)**
Create a new queue
```csharp
public FixedSizeQueue(int size)
```
@ -49,9 +55,12 @@ public FixedSizeQueue(int size)
#### Parameters
`size` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
the maximum number of items to store in this queue
### **FixedSizeQueue(Int32, IEnumerable&lt;T&gt;)**
Fill the queue with the data. Please ensure that data.Count &lt;= size
```csharp
public FixedSizeQueue(int size, IEnumerable<T> data)
```
@ -66,6 +75,8 @@ public FixedSizeQueue(int size, IEnumerable<T> data)
### **FillWith(T)**
Replace every item in the queue with the given value
```csharp
public FixedSizeQueue<T> FillWith(T value)
```
@ -73,10 +84,12 @@ public FixedSizeQueue<T> FillWith(T value)
#### Parameters
`value` T<br>
The value to replace all items with
#### Returns
[FixedSizeQueue&lt;T&gt;](./llama.common.fixedsizequeue-1.md)<br>
returns this
### **Enqueue(T)**
@ -90,16 +103,6 @@ public void Enqueue(T item)
`item` T<br>
### **ToArray()**
```csharp
public T[] ToArray()
```
#### Returns
T[]<br>
### **GetEnumerator()**
```csharp


@ -2,6 +2,8 @@
Namespace: LLama.Common
receives log messages from LLamaSharp
```csharp
public interface ILLamaLogger
```
@ -10,7 +12,7 @@ public interface ILLamaLogger
### **Log(String, String, LogLevel)**
Write the log in cosutomized way
Write the log in customized way
```csharp
void Log(string source, string message, LogLevel level)


@ -2,11 +2,14 @@
Namespace: LLama.Common
The parameters used for inference.
```csharp
public class InferenceParams
public class InferenceParams : LLama.Abstractions.IInferenceParams
```
Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [InferenceParams](./llama.common.inferenceparams.md)
Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [InferenceParams](./llama.common.inferenceparams.md)<br>
Implements [IInferenceParams](./llama.abstractions.iinferenceparams.md)
## Properties
@ -212,12 +215,12 @@ Mirostat uses tokens instead of words.
0 = disabled, 1 = mirostat, 2 = mirostat 2.0
```csharp
public MiroStateType Mirostat { get; set; }
public MirostatType Mirostat { get; set; }
```
#### Property Value
[MiroStateType](./llama.common.mirostatetype.md)<br>
[MirostatType](./llama.common.mirostattype.md)<br>
### **MirostatTau**
@ -255,6 +258,18 @@ public bool PenalizeNL { get; set; }
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)<br>
### **Grammar**
A grammar to constrain the possible tokens
```csharp
public SafeLLamaGrammarHandle Grammar { get; set; }
```
#### Property Value
[SafeLLamaGrammarHandle](./llama.native.safellamagrammarhandle.md)<br>
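A `SafeLLamaGrammarHandle` is typically obtained by parsing a GBNF grammar via the `Grammar` class in `LLama.Grammars` (a sketch under that assumption; the grammar below constrains output to "yes" or "no"):

```csharp
using LLama.Common;
using LLama.Grammars;

const string gbnf = "root ::= (\"yes\" | \"no\")";
var grammar = Grammar.Parse(gbnf, "root");

using var handle = grammar.CreateInstance();
var inferenceParams = new InferenceParams
{
    Grammar = handle  // sampling is now constrained to the grammar
};
```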
## Constructors
### **InferenceParams()**


@ -2,8 +2,8 @@
Namespace: LLama.Common
The default logger of LLamaSharp. On default it write to console. User methods of `LLamaLogger.Default` to change the behavior.
It's more recommended to inherit `ILLamaLogger` to cosutomize the behavior.
The default logger of LLamaSharp. By default it writes to the console. Use the methods of `LLamaLogger.Default` to change the behavior.
It's recommended to inherit `ILLamaLogger` to customize the behavior.
```csharp
public sealed class LLamaDefaultLogger : ILLamaLogger
@ -16,6 +16,8 @@ Implements [ILLamaLogger](./llama.common.illamalogger.md)
### **Default**
Get the default logger instance
```csharp
public static LLamaDefaultLogger Default { get; }
```
@ -26,8 +28,22 @@ public static LLamaDefaultLogger Default { get; }
## Methods
### **EnableNative()**
Enable logging output from llama.cpp
```csharp
public LLamaDefaultLogger EnableNative()
```
#### Returns
[LLamaDefaultLogger](./llama.common.llamadefaultlogger.md)<br>
### **EnableConsole()**
Enable writing log messages to console
```csharp
public LLamaDefaultLogger EnableConsole()
```
@ -38,6 +54,8 @@ public LLamaDefaultLogger EnableConsole()
### **DisableConsole()**
Disable writing messages to console
```csharp
public LLamaDefaultLogger DisableConsole()
```
@ -48,6 +66,8 @@ public LLamaDefaultLogger DisableConsole()
### **EnableFile(String, FileMode)**
Enable writing log messages to file
```csharp
public LLamaDefaultLogger EnableFile(string filename, FileMode mode)
```
@ -64,6 +84,14 @@ public LLamaDefaultLogger EnableFile(string filename, FileMode mode)
### **DisableFile(String)**
#### Caution
Use DisableFile method without 'filename' parameter
---
Disable writing log messages to file
```csharp
public LLamaDefaultLogger DisableFile(string filename)
```
@ -71,6 +99,19 @@ public LLamaDefaultLogger DisableFile(string filename)
#### Parameters
`filename` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
unused!
#### Returns
[LLamaDefaultLogger](./llama.common.llamadefaultlogger.md)<br>
### **DisableFile()**
Disable writing log messages to file
```csharp
public LLamaDefaultLogger DisableFile()
```
#### Returns
@ -78,6 +119,8 @@ public LLamaDefaultLogger DisableFile(string filename)
### **Log(String, String, LogLevel)**
Log a message
```csharp
public void Log(string source, string message, LogLevel level)
```
@ -85,13 +128,18 @@ public void Log(string source, string message, LogLevel level)
#### Parameters
`source` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
The source of this message (e.g. class name)
`message` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
The message to log
`level` [LogLevel](./llama.common.illamalogger.loglevel.md)<br>
Severity level of this message
### **Info(String)**
Write a log message with "Info" severity
```csharp
public void Info(string message)
```
@ -102,6 +150,8 @@ public void Info(string message)
### **Warn(String)**
Write a log message with "Warn" severity
```csharp
public void Warn(string message)
```
@ -112,6 +162,8 @@ public void Warn(string message)
### **Error(String)**
Write a log message with "Error" severity
```csharp
public void Error(string message)
```
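When the built-in behavior is not enough, the docs above recommend implementing `ILLamaLogger` instead; a minimal sketch (the `LogLevel` member names are assumed from the enum referenced earlier):

```csharp
using System;
using LLama.Common;

// A hypothetical logger that only surfaces warnings and errors.
public sealed class FilteringLogger : ILLamaLogger
{
    public void Log(string source, string message, ILLamaLogger.LogLevel level)
    {
        if (level == ILLamaLogger.LogLevel.Warning || level == ILLamaLogger.LogLevel.Error)
            Console.WriteLine($"[{level}] {source}: {message}");
    }
}
```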


@ -1,15 +1,21 @@
# MiroStateType
# MirostatType
Namespace: LLama.Common
Type of "mirostat" sampling to use.
https://github.com/basusourya/mirostat
```csharp
public enum MiroStateType
public enum MirostatType
```
Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [ValueType](https://docs.microsoft.com/en-us/dotnet/api/system.valuetype) → [Enum](https://docs.microsoft.com/en-us/dotnet/api/system.enum) → [MiroStateType](./llama.common.mirostatetype.md)<br>
Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [ValueType](https://docs.microsoft.com/en-us/dotnet/api/system.valuetype) → [Enum](https://docs.microsoft.com/en-us/dotnet/api/system.enum) → [MirostatType](./llama.common.mirostattype.md)<br>
Implements [IComparable](https://docs.microsoft.com/en-us/dotnet/api/system.icomparable), [IFormattable](https://docs.microsoft.com/en-us/dotnet/api/system.iformattable), [IConvertible](https://docs.microsoft.com/en-us/dotnet/api/system.iconvertible)
## Fields
| Name | Value | Description |
| --- | --: | --- |
| Disable | 0 | Disable Mirostat sampling |
| Mirostat | 1 | Original mirostat algorithm |
| Mirostat2 | 2 | Mirostat 2.0 algorithm |


@ -2,11 +2,14 @@
Namespace: LLama.Common
The parameters for initializing a LLama model.
```csharp
public class ModelParams
public class ModelParams : LLama.Abstractions.IModelParams, System.IEquatable`1[[LLama.Common.ModelParams, LLamaSharp, Version=0.5.0.0, Culture=neutral, PublicKeyToken=null]]
```
Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [ModelParams](./llama.common.modelparams.md)
Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [ModelParams](./llama.common.modelparams.md)<br>
Implements [IModelParams](./llama.abstractions.imodelparams.md), [IEquatable&lt;ModelParams&gt;](https://docs.microsoft.com/en-us/dotnet/api/system.iequatable-1)
## Properties
@@ -22,6 +25,30 @@ public int ContextSize { get; set; }
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
### **MainGpu**
the GPU that is used for scratch and small tensors
```csharp
public int MainGpu { get; set; }
```
#### Property Value
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
### **LowVram**
if true, reduce VRAM usage at the cost of performance
```csharp
public bool LowVram { get; set; }
```
#### Property Value
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)<br>
### **GpuLayerCount**
Number of layers to run in VRAM / GPU memory (n_gpu_layers)
@@ -106,6 +133,18 @@ public string ModelPath { get; set; }
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
### **ModelAlias**
model alias
```csharp
public string ModelAlias { get; set; }
```
#### Property Value
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
### **LoraAdapter**
lora adapter path (lora_adapter)
@@ -179,14 +218,93 @@ public bool EmbeddingMode { get; set; }
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)<br>
### **TensorSplits**
how split tensors should be distributed across GPUs
```csharp
public Single[] TensorSplits { get; set; }
```
#### Property Value
[Single[]](https://docs.microsoft.com/en-us/dotnet/api/system.single)<br>
### **RopeFrequencyBase**
RoPE base frequency
```csharp
public float RopeFrequencyBase { get; set; }
```
#### Property Value
[Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)<br>
### **RopeFrequencyScale**
RoPE frequency scaling factor
```csharp
public float RopeFrequencyScale { get; set; }
```
#### Property Value
[Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)<br>
### **MulMatQ**
Use experimental mul_mat_q kernels
```csharp
public bool MulMatQ { get; set; }
```
#### Property Value
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)<br>
### **Encoding**
The encoding to use to convert text for the model
```csharp
public Encoding Encoding { get; set; }
```
#### Property Value
[Encoding](https://docs.microsoft.com/en-us/dotnet/api/system.text.encoding)<br>
## Constructors
### **ModelParams(String, Int32, Int32, Int32, Boolean, Boolean, Boolean, Boolean, String, String, Int32, Int32, Boolean, Boolean)**
### **ModelParams(String)**
```csharp
public ModelParams(string modelPath, int contextSize, int gpuLayerCount, int seed, bool useFp16Memory, bool useMemorymap, bool useMemoryLock, bool perplexity, string loraAdapter, string loraBase, int threads, int batchSize, bool convertEosToNewLine, bool embeddingMode)
public ModelParams(string modelPath)
```
#### Parameters
`modelPath` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
The model path.
### **ModelParams(String, Int32, Int32, Int32, Boolean, Boolean, Boolean, Boolean, String, String, Int32, Int32, Boolean, Boolean, Single, Single, Boolean, String)**
#### Caution
Use object initializer to set all optional parameters
---
```csharp
public ModelParams(string modelPath, int contextSize, int gpuLayerCount, int seed, bool useFp16Memory, bool useMemorymap, bool useMemoryLock, bool perplexity, string loraAdapter, string loraBase, int threads, int batchSize, bool convertEosToNewLine, bool embeddingMode, float ropeFrequencyBase, float ropeFrequencyScale, bool mulMatQ, string encoding)
```
#### Parameters
@@ -232,3 +350,89 @@ Whether to convert eos to newline during the inference.
`embeddingMode` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)<br>
Whether to use embedding mode. (embedding) Note that if this is set to true, the LLamaModel won't produce text responses anymore.
`ropeFrequencyBase` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)<br>
RoPE base frequency.
`ropeFrequencyScale` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)<br>
RoPE frequency scaling factor
`mulMatQ` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)<br>
Use experimental mul_mat_q kernels
`encoding` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
The encoding to use to convert text for the model
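Since the long constructor carries a deprecation caution in favour of object initializers, a minimal sketch of the recommended style, using properties documented on this page (the model path and values are illustrative):

```csharp
using System.Text;
using LLama.Common;

// Hedged sketch: use the single-argument constructor and set optional
// values via an object initializer. The path and numbers are placeholders.
var parameters = new ModelParams("models/model.gguf")
{
    ContextSize = 2048,
    GpuLayerCount = 20,
    RopeFrequencyBase = 10000f,
    RopeFrequencyScale = 1.0f,
    MulMatQ = true,
    Encoding = Encoding.UTF8
};
```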
## Methods
### **ToString()**
```csharp
public string ToString()
```
#### Returns
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
### **PrintMembers(StringBuilder)**
```csharp
protected bool PrintMembers(StringBuilder builder)
```
#### Parameters
`builder` [StringBuilder](https://docs.microsoft.com/en-us/dotnet/api/system.text.stringbuilder)<br>
#### Returns
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)<br>
### **GetHashCode()**
```csharp
public int GetHashCode()
```
#### Returns
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
### **Equals(Object)**
```csharp
public bool Equals(object obj)
```
#### Parameters
`obj` [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object)<br>
#### Returns
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)<br>
### **Equals(ModelParams)**
```csharp
public bool Equals(ModelParams other)
```
#### Parameters
`other` [ModelParams](./llama.common.modelparams.md)<br>
#### Returns
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)<br>
### **&lt;Clone&gt;$()**
```csharp
public ModelParams <Clone>$()
```
#### Returns
[ModelParams](./llama.common.modelparams.md)<br>

@@ -0,0 +1,94 @@
# GrammarExpectedName
Namespace: LLama.Exceptions
Failed to parse a "name" element when one was expected
```csharp
public class GrammarExpectedName : GrammarFormatException, System.Runtime.Serialization.ISerializable
```
Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [Exception](https://docs.microsoft.com/en-us/dotnet/api/system.exception) → [GrammarFormatException](./llama.exceptions.grammarformatexception.md) → [GrammarExpectedName](./llama.exceptions.grammarexpectedname.md)<br>
Implements [ISerializable](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.serialization.iserializable)
## Properties
### **TargetSite**
```csharp
public MethodBase TargetSite { get; }
```
#### Property Value
[MethodBase](https://docs.microsoft.com/en-us/dotnet/api/system.reflection.methodbase)<br>
### **Message**
```csharp
public string Message { get; }
```
#### Property Value
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
### **Data**
```csharp
public IDictionary Data { get; }
```
#### Property Value
[IDictionary](https://docs.microsoft.com/en-us/dotnet/api/system.collections.idictionary)<br>
### **InnerException**
```csharp
public Exception InnerException { get; }
```
#### Property Value
[Exception](https://docs.microsoft.com/en-us/dotnet/api/system.exception)<br>
### **HelpLink**
```csharp
public string HelpLink { get; set; }
```
#### Property Value
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
### **Source**
```csharp
public string Source { get; set; }
```
#### Property Value
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
### **HResult**
```csharp
public int HResult { get; set; }
```
#### Property Value
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
### **StackTrace**
```csharp
public string StackTrace { get; }
```
#### Property Value
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>

@@ -0,0 +1,94 @@
# GrammarExpectedNext
Namespace: LLama.Exceptions
A specified string was expected when parsing
```csharp
public class GrammarExpectedNext : GrammarFormatException, System.Runtime.Serialization.ISerializable
```
Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [Exception](https://docs.microsoft.com/en-us/dotnet/api/system.exception) → [GrammarFormatException](./llama.exceptions.grammarformatexception.md) → [GrammarExpectedNext](./llama.exceptions.grammarexpectednext.md)<br>
Implements [ISerializable](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.serialization.iserializable)
## Properties
### **TargetSite**
```csharp
public MethodBase TargetSite { get; }
```
#### Property Value
[MethodBase](https://docs.microsoft.com/en-us/dotnet/api/system.reflection.methodbase)<br>
### **Message**
```csharp
public string Message { get; }
```
#### Property Value
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
### **Data**
```csharp
public IDictionary Data { get; }
```
#### Property Value
[IDictionary](https://docs.microsoft.com/en-us/dotnet/api/system.collections.idictionary)<br>
### **InnerException**
```csharp
public Exception InnerException { get; }
```
#### Property Value
[Exception](https://docs.microsoft.com/en-us/dotnet/api/system.exception)<br>
### **HelpLink**
```csharp
public string HelpLink { get; set; }
```
#### Property Value
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
### **Source**
```csharp
public string Source { get; set; }
```
#### Property Value
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
### **HResult**
```csharp
public int HResult { get; set; }
```
#### Property Value
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
### **StackTrace**
```csharp
public string StackTrace { get; }
```
#### Property Value
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>

@@ -0,0 +1,94 @@
# GrammarExpectedPrevious
Namespace: LLama.Exceptions
A specified character was expected to precede another when parsing
```csharp
public class GrammarExpectedPrevious : GrammarFormatException, System.Runtime.Serialization.ISerializable
```
Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [Exception](https://docs.microsoft.com/en-us/dotnet/api/system.exception) → [GrammarFormatException](./llama.exceptions.grammarformatexception.md) → [GrammarExpectedPrevious](./llama.exceptions.grammarexpectedprevious.md)<br>
Implements [ISerializable](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.serialization.iserializable)
## Properties
### **TargetSite**
```csharp
public MethodBase TargetSite { get; }
```
#### Property Value
[MethodBase](https://docs.microsoft.com/en-us/dotnet/api/system.reflection.methodbase)<br>
### **Message**
```csharp
public string Message { get; }
```
#### Property Value
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
### **Data**
```csharp
public IDictionary Data { get; }
```
#### Property Value
[IDictionary](https://docs.microsoft.com/en-us/dotnet/api/system.collections.idictionary)<br>
### **InnerException**
```csharp
public Exception InnerException { get; }
```
#### Property Value
[Exception](https://docs.microsoft.com/en-us/dotnet/api/system.exception)<br>
### **HelpLink**
```csharp
public string HelpLink { get; set; }
```
#### Property Value
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
### **Source**
```csharp
public string Source { get; set; }
```
#### Property Value
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
### **HResult**
```csharp
public int HResult { get; set; }
```
#### Property Value
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
### **StackTrace**
```csharp
public string StackTrace { get; }
```
#### Property Value
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>

@@ -0,0 +1,94 @@
# GrammarFormatException
Namespace: LLama.Exceptions
Base class for all grammar exceptions
```csharp
public abstract class GrammarFormatException : System.Exception, System.Runtime.Serialization.ISerializable
```
Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [Exception](https://docs.microsoft.com/en-us/dotnet/api/system.exception) → [GrammarFormatException](./llama.exceptions.grammarformatexception.md)<br>
Implements [ISerializable](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.serialization.iserializable)
## Properties
### **TargetSite**
```csharp
public MethodBase TargetSite { get; }
```
#### Property Value
[MethodBase](https://docs.microsoft.com/en-us/dotnet/api/system.reflection.methodbase)<br>
### **Message**
```csharp
public string Message { get; }
```
#### Property Value
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
### **Data**
```csharp
public IDictionary Data { get; }
```
#### Property Value
[IDictionary](https://docs.microsoft.com/en-us/dotnet/api/system.collections.idictionary)<br>
### **InnerException**
```csharp
public Exception InnerException { get; }
```
#### Property Value
[Exception](https://docs.microsoft.com/en-us/dotnet/api/system.exception)<br>
### **HelpLink**
```csharp
public string HelpLink { get; set; }
```
#### Property Value
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
### **Source**
```csharp
public string Source { get; set; }
```
#### Property Value
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
### **HResult**
```csharp
public int HResult { get; set; }
```
#### Property Value
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
### **StackTrace**
```csharp
public string StackTrace { get; }
```
#### Property Value
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
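Because every grammar parsing error derives from this base class, callers can catch it in one place around `Grammar.Parse`. A hedged sketch (the deliberately malformed GBNF text is illustrative):

```csharp
using System;
using LLama.Exceptions;
using LLama.Grammars;

try
{
    // Malformed on purpose: the rule body is missing.
    var grammar = Grammar.Parse("root ::=", "root");
}
catch (GrammarFormatException ex)
{
    // Catches any of the derived exceptions documented in this namespace.
    Console.WriteLine($"Invalid grammar: {ex.Message}");
}
```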

@@ -0,0 +1,94 @@
# GrammarUnexpectedCharAltElement
Namespace: LLama.Exceptions
A CHAR_ALT was created without a preceding CHAR element
```csharp
public class GrammarUnexpectedCharAltElement : GrammarFormatException, System.Runtime.Serialization.ISerializable
```
Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [Exception](https://docs.microsoft.com/en-us/dotnet/api/system.exception) → [GrammarFormatException](./llama.exceptions.grammarformatexception.md) → [GrammarUnexpectedCharAltElement](./llama.exceptions.grammarunexpectedcharaltelement.md)<br>
Implements [ISerializable](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.serialization.iserializable)
## Properties
### **TargetSite**
```csharp
public MethodBase TargetSite { get; }
```
#### Property Value
[MethodBase](https://docs.microsoft.com/en-us/dotnet/api/system.reflection.methodbase)<br>
### **Message**
```csharp
public string Message { get; }
```
#### Property Value
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
### **Data**
```csharp
public IDictionary Data { get; }
```
#### Property Value
[IDictionary](https://docs.microsoft.com/en-us/dotnet/api/system.collections.idictionary)<br>
### **InnerException**
```csharp
public Exception InnerException { get; }
```
#### Property Value
[Exception](https://docs.microsoft.com/en-us/dotnet/api/system.exception)<br>
### **HelpLink**
```csharp
public string HelpLink { get; set; }
```
#### Property Value
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
### **Source**
```csharp
public string Source { get; set; }
```
#### Property Value
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
### **HResult**
```csharp
public int HResult { get; set; }
```
#### Property Value
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
### **StackTrace**
```csharp
public string StackTrace { get; }
```
#### Property Value
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>

@@ -0,0 +1,94 @@
# GrammarUnexpectedCharRngElement
Namespace: LLama.Exceptions
A CHAR_RNG was created without a preceding CHAR element
```csharp
public class GrammarUnexpectedCharRngElement : GrammarFormatException, System.Runtime.Serialization.ISerializable
```
Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [Exception](https://docs.microsoft.com/en-us/dotnet/api/system.exception) → [GrammarFormatException](./llama.exceptions.grammarformatexception.md) → [GrammarUnexpectedCharRngElement](./llama.exceptions.grammarunexpectedcharrngelement.md)<br>
Implements [ISerializable](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.serialization.iserializable)
## Properties
### **TargetSite**
```csharp
public MethodBase TargetSite { get; }
```
#### Property Value
[MethodBase](https://docs.microsoft.com/en-us/dotnet/api/system.reflection.methodbase)<br>
### **Message**
```csharp
public string Message { get; }
```
#### Property Value
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
### **Data**
```csharp
public IDictionary Data { get; }
```
#### Property Value
[IDictionary](https://docs.microsoft.com/en-us/dotnet/api/system.collections.idictionary)<br>
### **InnerException**
```csharp
public Exception InnerException { get; }
```
#### Property Value
[Exception](https://docs.microsoft.com/en-us/dotnet/api/system.exception)<br>
### **HelpLink**
```csharp
public string HelpLink { get; set; }
```
#### Property Value
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
### **Source**
```csharp
public string Source { get; set; }
```
#### Property Value
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
### **HResult**
```csharp
public int HResult { get; set; }
```
#### Property Value
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
### **StackTrace**
```csharp
public string StackTrace { get; }
```
#### Property Value
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>

@@ -0,0 +1,94 @@
# GrammarUnexpectedEndElement
Namespace: LLama.Exceptions
An END was encountered before the last element
```csharp
public class GrammarUnexpectedEndElement : GrammarFormatException, System.Runtime.Serialization.ISerializable
```
Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [Exception](https://docs.microsoft.com/en-us/dotnet/api/system.exception) → [GrammarFormatException](./llama.exceptions.grammarformatexception.md) → [GrammarUnexpectedEndElement](./llama.exceptions.grammarunexpectedendelement.md)<br>
Implements [ISerializable](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.serialization.iserializable)
## Properties
### **TargetSite**
```csharp
public MethodBase TargetSite { get; }
```
#### Property Value
[MethodBase](https://docs.microsoft.com/en-us/dotnet/api/system.reflection.methodbase)<br>
### **Message**
```csharp
public string Message { get; }
```
#### Property Value
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
### **Data**
```csharp
public IDictionary Data { get; }
```
#### Property Value
[IDictionary](https://docs.microsoft.com/en-us/dotnet/api/system.collections.idictionary)<br>
### **InnerException**
```csharp
public Exception InnerException { get; }
```
#### Property Value
[Exception](https://docs.microsoft.com/en-us/dotnet/api/system.exception)<br>
### **HelpLink**
```csharp
public string HelpLink { get; set; }
```
#### Property Value
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
### **Source**
```csharp
public string Source { get; set; }
```
#### Property Value
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
### **HResult**
```csharp
public int HResult { get; set; }
```
#### Property Value
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
### **StackTrace**
```csharp
public string StackTrace { get; }
```
#### Property Value
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>

@@ -0,0 +1,94 @@
# GrammarUnexpectedEndOfInput
Namespace: LLama.Exceptions
End-of-file was encountered while parsing
```csharp
public class GrammarUnexpectedEndOfInput : GrammarFormatException, System.Runtime.Serialization.ISerializable
```
Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [Exception](https://docs.microsoft.com/en-us/dotnet/api/system.exception) → [GrammarFormatException](./llama.exceptions.grammarformatexception.md) → [GrammarUnexpectedEndOfInput](./llama.exceptions.grammarunexpectedendofinput.md)<br>
Implements [ISerializable](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.serialization.iserializable)
## Properties
### **TargetSite**
```csharp
public MethodBase TargetSite { get; }
```
#### Property Value
[MethodBase](https://docs.microsoft.com/en-us/dotnet/api/system.reflection.methodbase)<br>
### **Message**
```csharp
public string Message { get; }
```
#### Property Value
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
### **Data**
```csharp
public IDictionary Data { get; }
```
#### Property Value
[IDictionary](https://docs.microsoft.com/en-us/dotnet/api/system.collections.idictionary)<br>
### **InnerException**
```csharp
public Exception InnerException { get; }
```
#### Property Value
[Exception](https://docs.microsoft.com/en-us/dotnet/api/system.exception)<br>
### **HelpLink**
```csharp
public string HelpLink { get; set; }
```
#### Property Value
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
### **Source**
```csharp
public string Source { get; set; }
```
#### Property Value
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
### **HResult**
```csharp
public int HResult { get; set; }
```
#### Property Value
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
### **StackTrace**
```csharp
public string StackTrace { get; }
```
#### Property Value
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>

@@ -0,0 +1,94 @@
# GrammarUnexpectedHexCharsCount
Namespace: LLama.Exceptions
An incorrect number of characters were encountered while parsing a hex literal
```csharp
public class GrammarUnexpectedHexCharsCount : GrammarFormatException, System.Runtime.Serialization.ISerializable
```
Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [Exception](https://docs.microsoft.com/en-us/dotnet/api/system.exception) → [GrammarFormatException](./llama.exceptions.grammarformatexception.md) → [GrammarUnexpectedHexCharsCount](./llama.exceptions.grammarunexpectedhexcharscount.md)<br>
Implements [ISerializable](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.serialization.iserializable)
## Properties
### **TargetSite**
```csharp
public MethodBase TargetSite { get; }
```
#### Property Value
[MethodBase](https://docs.microsoft.com/en-us/dotnet/api/system.reflection.methodbase)<br>
### **Message**
```csharp
public string Message { get; }
```
#### Property Value
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
### **Data**
```csharp
public IDictionary Data { get; }
```
#### Property Value
[IDictionary](https://docs.microsoft.com/en-us/dotnet/api/system.collections.idictionary)<br>
### **InnerException**
```csharp
public Exception InnerException { get; }
```
#### Property Value
[Exception](https://docs.microsoft.com/en-us/dotnet/api/system.exception)<br>
### **HelpLink**
```csharp
public string HelpLink { get; set; }
```
#### Property Value
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
### **Source**
```csharp
public string Source { get; set; }
```
#### Property Value
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
### **HResult**
```csharp
public int HResult { get; set; }
```
#### Property Value
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
### **StackTrace**
```csharp
public string StackTrace { get; }
```
#### Property Value
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>

@@ -0,0 +1,94 @@
# GrammarUnknownEscapeCharacter
Namespace: LLama.Exceptions
An unexpected character was encountered after an escape sequence
```csharp
public class GrammarUnknownEscapeCharacter : GrammarFormatException, System.Runtime.Serialization.ISerializable
```
Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [Exception](https://docs.microsoft.com/en-us/dotnet/api/system.exception) → [GrammarFormatException](./llama.exceptions.grammarformatexception.md) → [GrammarUnknownEscapeCharacter](./llama.exceptions.grammarunknownescapecharacter.md)<br>
Implements [ISerializable](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.serialization.iserializable)
## Properties
### **TargetSite**
```csharp
public MethodBase TargetSite { get; }
```
#### Property Value
[MethodBase](https://docs.microsoft.com/en-us/dotnet/api/system.reflection.methodbase)<br>
### **Message**
```csharp
public string Message { get; }
```
#### Property Value
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
### **Data**
```csharp
public IDictionary Data { get; }
```
#### Property Value
[IDictionary](https://docs.microsoft.com/en-us/dotnet/api/system.collections.idictionary)<br>
### **InnerException**
```csharp
public Exception InnerException { get; }
```
#### Property Value
[Exception](https://docs.microsoft.com/en-us/dotnet/api/system.exception)<br>
### **HelpLink**
```csharp
public string HelpLink { get; set; }
```
#### Property Value
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
### **Source**
```csharp
public string Source { get; set; }
```
#### Property Value
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
### **HResult**
```csharp
public int HResult { get; set; }
```
#### Property Value
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
### **StackTrace**
```csharp
public string StackTrace { get; }
```
#### Property Value
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>

@@ -1,73 +0,0 @@
# DictionaryExtension
Namespace: LLama.Extensions
```csharp
public static class DictionaryExtension
```
Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [DictionaryExtension](./llama.extensions.dictionaryextension.md)
## Methods
### **Deconstruct&lt;T1, T2&gt;(KeyValuePair&lt;T1, T2&gt;, T1&, T2&)**
```csharp
public static void Deconstruct<T1, T2>(KeyValuePair<T1, T2> pair, T1& first, T2& second)
```
#### Type Parameters
`T1`<br>
`T2`<br>
#### Parameters
`pair` KeyValuePair&lt;T1, T2&gt;<br>
`first` T1&<br>
`second` T2&<br>
### **Update&lt;T1, T2&gt;(Dictionary&lt;T1, T2&gt;, IDictionary&lt;T1, T2&gt;)**
```csharp
public static void Update<T1, T2>(Dictionary<T1, T2> dic, IDictionary<T1, T2> other)
```
#### Type Parameters
`T1`<br>
`T2`<br>
#### Parameters
`dic` Dictionary&lt;T1, T2&gt;<br>
`other` IDictionary&lt;T1, T2&gt;<br>
### **GetOrDefault&lt;T1, T2&gt;(Dictionary&lt;T1, T2&gt;, T1, T2)**
```csharp
public static T2 GetOrDefault<T1, T2>(Dictionary<T1, T2> dic, T1 key, T2 defaultValue)
```
#### Type Parameters
`T1`<br>
`T2`<br>
#### Parameters
`dic` Dictionary&lt;T1, T2&gt;<br>
`key` T1<br>
`defaultValue` T2<br>
#### Returns
T2<br>

@@ -0,0 +1,37 @@
# IModelParamsExtensions
Namespace: LLama.Extensions
Extension methods for the IModelParams interface
```csharp
public static class IModelParamsExtensions
```
Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [IModelParamsExtensions](./llama.extensions.imodelparamsextensions.md)
## Methods
### **ToLlamaContextParams(IModelParams, LLamaContextParams&)**
Convert the given `IModelParams` into a `LLamaContextParams`
```csharp
public static MemoryHandle ToLlamaContextParams(IModelParams params, LLamaContextParams& result)
```
#### Parameters
`params` [IModelParams](./llama.abstractions.imodelparams.md)<br>
`result` [LLamaContextParams&](./llama.native.llamacontextparams&.md)<br>
#### Returns
[MemoryHandle](https://docs.microsoft.com/en-us/dotnet/api/system.buffers.memoryhandle)<br>
#### Exceptions
[FileNotFoundException](https://docs.microsoft.com/en-us/dotnet/api/system.io.filenotfoundexception)<br>
[ArgumentException](https://docs.microsoft.com/en-us/dotnet/api/system.argumentexception)<br>
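The `MemoryHandle` return value suggests the native struct pins managed memory, so it should be kept alive while `result` is in use. A hedged usage sketch (the `out` calling convention is an assumption based on the `LLamaContextParams&` signature above; `modelParams` is any `IModelParams` instance):

```csharp
// Hedged sketch: keep the MemoryHandle alive for as long as the native
// parameter struct is used. Names besides the method are illustrative.
using (modelParams.ToLlamaContextParams(out var contextParams))
{
    // Pass contextParams to native llama.cpp calls here.
}
```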

@@ -0,0 +1,40 @@
# KeyValuePairExtensions
Namespace: LLama.Extensions
Extensions to the KeyValuePair struct
```csharp
public static class KeyValuePairExtensions
```
Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [KeyValuePairExtensions](./llama.extensions.keyvaluepairextensions.md)
## Methods
### **Deconstruct&lt;TKey, TValue&gt;(KeyValuePair&lt;TKey, TValue&gt;, TKey&, TValue&)**
Deconstruct a KeyValuePair into its constituent parts.
```csharp
public static void Deconstruct<TKey, TValue>(KeyValuePair<TKey, TValue> pair, TKey& first, TValue& second)
```
#### Type Parameters
`TKey`<br>
Type of the Key
`TValue`<br>
Type of the Value
#### Parameters
`pair` KeyValuePair&lt;TKey, TValue&gt;<br>
The KeyValuePair to deconstruct
`first` TKey&<br>
First element, the Key
`second` TValue&<br>
Second element, the Value
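This enables tuple-style deconstruction when iterating a dictionary; a minimal sketch:

```csharp
using System;
using System.Collections.Generic;

var ages = new Dictionary<string, int> { ["alice"] = 30, ["bob"] = 25 };

// Deconstruct each KeyValuePair into (key, value) via the extension above.
foreach (var (name, age) in ages)
    Console.WriteLine($"{name} is {age}");
```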

@@ -0,0 +1,110 @@
# Grammar
Namespace: LLama.Grammars
A grammar is a set of [GrammarRule](./llama.grammars.grammarrule.md)s for deciding which characters are valid next. It can be used to constrain
output to certain formats, e.g. forcing the model to output JSON
```csharp
public sealed class Grammar
```
Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [Grammar](./llama.grammars.grammar.md)
## Properties
### **StartRuleIndex**
Index of the initial rule to start from
```csharp
public ulong StartRuleIndex { get; set; }
```
#### Property Value
[UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)<br>
### **Rules**
The rules which make up this grammar
```csharp
public IReadOnlyList<GrammarRule> Rules { get; }
```
#### Property Value
[IReadOnlyList&lt;GrammarRule&gt;](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.ireadonlylist-1)<br>
## Constructors
### **Grammar(IReadOnlyList&lt;GrammarRule&gt;, UInt64)**
Create a new grammar from a set of rules
```csharp
public Grammar(IReadOnlyList<GrammarRule> rules, ulong startRuleIndex)
```
#### Parameters
`rules` [IReadOnlyList&lt;GrammarRule&gt;](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.ireadonlylist-1)<br>
The rules which make up this grammar
`startRuleIndex` [UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)<br>
Index of the initial rule to start from
#### Exceptions
[ArgumentOutOfRangeException](https://docs.microsoft.com/en-us/dotnet/api/system.argumentoutofrangeexception)<br>
## Methods
### **CreateInstance()**
Create a `SafeLLamaGrammarHandle` instance to use for parsing
```csharp
public SafeLLamaGrammarHandle CreateInstance()
```
#### Returns
[SafeLLamaGrammarHandle](./llama.native.safellamagrammarhandle.md)<br>
### **Parse(String, String)**
Parse a string of GGML BNF into a Grammar
```csharp
public static Grammar Parse(string gbnf, string startRule)
```
#### Parameters
`gbnf` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
The string to parse
`startRule` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
Name of the start rule of this grammar
#### Returns
[Grammar](./llama.grammars.grammar.md)<br>
A Grammar which can be converted into a SafeLLamaGrammarHandle for sampling
#### Exceptions
[GrammarFormatException](./llama.exceptions.grammarformatexception.md)<br>
Thrown if input is malformed
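A hedged end-to-end sketch of parsing GBNF and creating a native handle for sampling (only `Parse` and `CreateInstance` come from this page; the grammar text is illustrative):

```csharp
using LLama.Grammars;

// Constrain output to "yes" or "no".
var grammar = Grammar.Parse("root ::= \"yes\" | \"no\"", "root");
using var handle = grammar.CreateInstance();
// handle can now be supplied to the sampling pipeline.
```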
### **ToString()**
```csharp
public string ToString()
```
#### Returns
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>

@@ -0,0 +1,118 @@
# GrammarRule
Namespace: LLama.Grammars
A single rule in a [Grammar](./llama.grammars.grammar.md)
```csharp
public sealed class GrammarRule : System.IEquatable<LLama.Grammars.GrammarRule>
```
Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [GrammarRule](./llama.grammars.grammarrule.md)<br>
Implements [IEquatable&lt;GrammarRule&gt;](https://docs.microsoft.com/en-us/dotnet/api/system.iequatable-1)
## Properties
### **Name**
Name of this rule
```csharp
public string Name { get; }
```
#### Property Value
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
### **Elements**
The elements of this grammar rule
```csharp
public IReadOnlyList<LLamaGrammarElement> Elements { get; }
```
#### Property Value
[IReadOnlyList&lt;LLamaGrammarElement&gt;](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.ireadonlylist-1)<br>
## Constructors
### **GrammarRule(String, IReadOnlyList&lt;LLamaGrammarElement&gt;)**
Create a new GrammarRule containing the given elements
```csharp
public GrammarRule(string name, IReadOnlyList<LLamaGrammarElement> elements)
```
#### Parameters
`name` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
`elements` [IReadOnlyList&lt;LLamaGrammarElement&gt;](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.ireadonlylist-1)<br>
#### Exceptions
[ArgumentException](https://docs.microsoft.com/en-us/dotnet/api/system.argumentexception)<br>
## Methods
### **ToString()**
```csharp
public string ToString()
```
#### Returns
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
### **GetHashCode()**
```csharp
public int GetHashCode()
```
#### Returns
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
### **Equals(Object)**
```csharp
public bool Equals(object obj)
```
#### Parameters
`obj` [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object)<br>
#### Returns
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)<br>
### **Equals(GrammarRule)**
```csharp
public bool Equals(GrammarRule other)
```
#### Parameters
`other` [GrammarRule](./llama.grammars.grammarrule.md)<br>
#### Returns
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)<br>
### **&lt;Clone&gt;$()**
```csharp
public GrammarRule <Clone>$()
```
#### Returns
[GrammarRule](./llama.grammars.grammarrule.md)<br>

@ -13,31 +13,31 @@ Implements [ILLamaExecutor](./llama.abstractions.illamaexecutor.md)
## Properties
### **Model**
### **Context**
The mode used by the executor.
The context used by the executor.
```csharp
public LLamaModel Model { get; }
public LLamaContext Context { get; }
```
#### Property Value
[LLamaModel](./llama.llamamodel.md)<br>
[LLamaContext](./llama.llamacontext.md)<br>
## Constructors
### **InstructExecutor(LLamaModel, String, String)**
### **InstructExecutor(LLamaContext, String, String)**
```csharp
public InstructExecutor(LLamaModel model, string instructionPrefix, string instructionSuffix)
public InstructExecutor(LLamaContext context, string instructionPrefix, string instructionSuffix)
```
#### Parameters
`model` [LLamaModel](./llama.llamamodel.md)<br>
`context` [LLamaContext](./llama.llamacontext.md)<br>
`instructionPrefix` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
@ -111,15 +111,15 @@ protected void PreprocessInputs(string text, InferStateArgs args)
`args` [InferStateArgs](./llama.statefulexecutorbase.inferstateargs.md)<br>
### **PostProcess(InferenceParams, InferStateArgs, IEnumerable`1&)**
### **PostProcess(IInferenceParams, InferStateArgs, IEnumerable`1&)**
```csharp
protected bool PostProcess(InferenceParams inferenceParams, InferStateArgs args, IEnumerable`1& extraOutputs)
protected bool PostProcess(IInferenceParams inferenceParams, InferStateArgs args, IEnumerable`1& extraOutputs)
```
#### Parameters
`inferenceParams` [InferenceParams](./llama.common.inferenceparams.md)<br>
`inferenceParams` [IInferenceParams](./llama.abstractions.iinferenceparams.md)<br>
`args` [InferStateArgs](./llama.statefulexecutorbase.inferstateargs.md)<br>
@ -129,14 +129,14 @@ protected bool PostProcess(InferenceParams inferenceParams, InferStateArgs args,
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)<br>
### **InferInternal(InferenceParams, InferStateArgs)**
### **InferInternal(IInferenceParams, InferStateArgs)**
```csharp
protected void InferInternal(InferenceParams inferenceParams, InferStateArgs args)
protected void InferInternal(IInferenceParams inferenceParams, InferStateArgs args)
```
#### Parameters
`inferenceParams` [InferenceParams](./llama.common.inferenceparams.md)<br>
`inferenceParams` [IInferenceParams](./llama.abstractions.iinferenceparams.md)<br>
`args` [InferStateArgs](./llama.statefulexecutorbase.inferstateargs.md)<br>
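With the `Model` property renamed to `Context`, constructing the executor now goes through a `LLamaContext`. A hedged sketch of the new flow (the model path and the instruction prefix/suffix strings are placeholders):

```csharp
using LLama;
using LLama.Common;

var parameters = new ModelParams("path/to/model.gguf"); // hypothetical path
using var weights = LLamaWeights.LoadFromFile(parameters);
using var context = weights.CreateContext(parameters);

// The prefix/suffix wrap each instruction, Alpaca-style (illustrative values).
var executor = new InstructExecutor(context, "### Instruction:\n", "\n### Response:\n");
```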

@ -13,31 +13,31 @@ Implements [ILLamaExecutor](./llama.abstractions.illamaexecutor.md)
## Properties
### **Model**
### **Context**
The mode used by the executor.
The context used by the executor.
```csharp
public LLamaModel Model { get; }
public LLamaContext Context { get; }
```
#### Property Value
[LLamaModel](./llama.llamamodel.md)<br>
[LLamaContext](./llama.llamacontext.md)<br>
## Constructors
### **InteractiveExecutor(LLamaModel)**
### **InteractiveExecutor(LLamaContext)**
```csharp
public InteractiveExecutor(LLamaModel model)
public InteractiveExecutor(LLamaContext context)
```
#### Parameters
`model` [LLamaModel](./llama.llamamodel.md)<br>
`context` [LLamaContext](./llama.llamacontext.md)<br>
## Methods
@ -109,17 +109,17 @@ protected void PreprocessInputs(string text, InferStateArgs args)
`args` [InferStateArgs](./llama.statefulexecutorbase.inferstateargs.md)<br>
### **PostProcess(InferenceParams, InferStateArgs, IEnumerable`1&)**
### **PostProcess(IInferenceParams, InferStateArgs, IEnumerable`1&)**
Return whether to break the generation.
```csharp
protected bool PostProcess(InferenceParams inferenceParams, InferStateArgs args, IEnumerable`1& extraOutputs)
protected bool PostProcess(IInferenceParams inferenceParams, InferStateArgs args, IEnumerable`1& extraOutputs)
```
#### Parameters
`inferenceParams` [InferenceParams](./llama.common.inferenceparams.md)<br>
`inferenceParams` [IInferenceParams](./llama.abstractions.iinferenceparams.md)<br>
`args` [InferStateArgs](./llama.statefulexecutorbase.inferstateargs.md)<br>
@ -129,14 +129,14 @@ protected bool PostProcess(InferenceParams inferenceParams, InferStateArgs args,
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)<br>
### **InferInternal(InferenceParams, InferStateArgs)**
### **InferInternal(IInferenceParams, InferStateArgs)**
```csharp
protected void InferInternal(InferenceParams inferenceParams, InferStateArgs args)
protected void InferInternal(IInferenceParams inferenceParams, InferStateArgs args)
```
#### Parameters
`inferenceParams` [InferenceParams](./llama.common.inferenceparams.md)<br>
`inferenceParams` [IInferenceParams](./llama.abstractions.iinferenceparams.md)<br>
`args` [InferStateArgs](./llama.statefulexecutorbase.inferstateargs.md)<br>

@ -1,21 +1,33 @@
# LLamaModel
# LLamaContext
Namespace: LLama
The abstraction of a LLama model, which holds the context in the native library.
A llama_context, which holds all the context required to interact with a model
```csharp
public class LLamaModel : System.IDisposable
public sealed class LLamaContext : System.IDisposable
```
Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [LLamaModel](./llama.llamamodel.md)<br>
Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [LLamaContext](./llama.llamacontext.md)<br>
Implements [IDisposable](https://docs.microsoft.com/en-us/dotnet/api/system.idisposable)
## Properties
### **VocabCount**
Total number of tokens in vocabulary of this model
```csharp
public int VocabCount { get; }
```
#### Property Value
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
### **ContextSize**
The context size.
Total number of tokens in the context
```csharp
public int ContextSize { get; }
@ -25,22 +37,33 @@ public int ContextSize { get; }
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
### **EmbeddingSize**
Dimension of embedding vectors
```csharp
public int EmbeddingSize { get; }
```
#### Property Value
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
### **Params**
The model params set for this model.
```csharp
public ModelParams Params { get; set; }
public IModelParams Params { get; set; }
```
#### Property Value
[ModelParams](./llama.common.modelparams.md)<br>
[IModelParams](./llama.abstractions.imodelparams.md)<br>
### **NativeHandle**
The native handle, which is used to be passed to the native APIs. Please avoid using it
unless you know what is the usage of the Native API.
The native handle, which is passed to the native APIs
```csharp
public SafeLLamaContextHandle NativeHandle { get; }
@ -50,6 +73,10 @@ public SafeLLamaContextHandle NativeHandle { get; }
[SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)<br>
**Remarks:**
Be careful how you use this!
### **Encoding**
The encoding set for this model to deal with text input.
@ -62,35 +89,82 @@ public Encoding Encoding { get; }
[Encoding](https://docs.microsoft.com/en-us/dotnet/api/system.text.encoding)<br>
### **EmbeddingLength**
The embedding length of the model, also known as `n_embed`
```csharp
public int EmbeddingLength { get; }
```
#### Property Value
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
## Constructors
### **LLamaModel(ModelParams, String, ILLamaLogger)**
### **LLamaContext(IModelParams, ILLamaLogger)**
#### Caution
Use the LLamaWeights.CreateContext instead
---
```csharp
public LLamaModel(ModelParams Params, string encoding, ILLamaLogger logger)
public LLamaContext(IModelParams params, ILLamaLogger logger)
```
#### Parameters
`Params` [ModelParams](./llama.common.modelparams.md)<br>
`params` [IModelParams](./llama.abstractions.imodelparams.md)<br>
Model params.
`encoding` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
Encoding to deal with text input.
`logger` [ILLamaLogger](./llama.common.illamalogger.md)<br>
The logger.
### **LLamaContext(LLamaWeights, IModelParams, ILLamaLogger)**
Create a new LLamaContext for the given LLamaWeights
```csharp
public LLamaContext(LLamaWeights model, IModelParams params, ILLamaLogger logger)
```
#### Parameters
`model` [LLamaWeights](./llama.llamaweights.md)<br>
`params` [IModelParams](./llama.abstractions.imodelparams.md)<br>
`logger` [ILLamaLogger](./llama.common.illamalogger.md)<br>
#### Exceptions
[ObjectDisposedException](https://docs.microsoft.com/en-us/dotnet/api/system.objectdisposedexception)<br>
## Methods
### **Clone()**
Create a copy of the current state of this context
```csharp
public LLamaContext Clone()
```
#### Returns
[LLamaContext](./llama.llamacontext.md)<br>
### **Tokenize(String, Boolean)**
Tokenize a string.
```csharp
public IEnumerable<int> Tokenize(string text, bool addBos)
public Int32[] Tokenize(string text, bool addBos)
```
#### Parameters
@ -102,7 +176,7 @@ Whether to add a bos to the text.
#### Returns
[IEnumerable&lt;Int32&gt;](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.ienumerable-1)<br>
[Int32[]](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
### **DeTokenize(IEnumerable&lt;Int32&gt;)**
@ -134,6 +208,12 @@ public void SaveState(string filename)
### **GetStateData()**
#### Caution
Use `GetState` instead, this supports larger states (over 2GB)
---
Get the state data as a byte array.
```csharp
@ -144,6 +224,18 @@ public Byte[] GetStateData()
[Byte[]](https://docs.microsoft.com/en-us/dotnet/api/system.byte)<br>
### **GetState()**
Get the state data as an opaque handle
```csharp
public State GetState()
```
#### Returns
[State](./llama.llamacontext.state.md)<br>
### **LoadState(String)**
Load the state from specified path.
@ -176,21 +268,39 @@ public void LoadState(Byte[] stateData)
[RuntimeError](./llama.exceptions.runtimeerror.md)<br>
### **Sample(LLamaTokenDataArray, Single, MiroStateType, Single, Single, Int32, Single, Single, Single)**
### **LoadState(State)**
Load the state from memory.
```csharp
public void LoadState(State state)
```
#### Parameters
`state` [State](./llama.llamacontext.state.md)<br>
#### Exceptions
[RuntimeError](./llama.exceptions.runtimeerror.md)<br>
### **Sample(LLamaTokenDataArray, Nullable`1&, Single, MirostatType, Single, Single, Int32, Single, Single, Single, SafeLLamaGrammarHandle)**
Perform the sampling. Please don't use it unless you fully know what it does.
```csharp
public int Sample(LLamaTokenDataArray candidates, float temperature, MiroStateType mirostat, float mirostatTau, float mirostatEta, int topK, float topP, float tfsZ, float typicalP)
public int Sample(LLamaTokenDataArray candidates, Nullable`1& mirostat_mu, float temperature, MirostatType mirostat, float mirostatTau, float mirostatEta, int topK, float topP, float tfsZ, float typicalP, SafeLLamaGrammarHandle grammar)
```
#### Parameters
`candidates` [LLamaTokenDataArray](./llama.native.llamatokendataarray.md)<br>
`mirostat_mu` [Nullable`1&](https://docs.microsoft.com/en-us/dotnet/api/system.nullable-1)<br>
`temperature` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)<br>
`mirostat` [MiroStateType](./llama.common.mirostatetype.md)<br>
`mirostat` [MirostatType](./llama.common.mirostattype.md)<br>
`mirostatTau` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)<br>
@ -204,6 +314,8 @@ public int Sample(LLamaTokenDataArray candidates, float temperature, MiroStateTy
`typicalP` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)<br>
`grammar` [SafeLLamaGrammarHandle](./llama.native.safellamagrammarhandle.md)<br>
#### Returns
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
@ -259,6 +371,75 @@ The updated `pastTokensCount`.
[RuntimeError](./llama.exceptions.runtimeerror.md)<br>
### **Eval(List&lt;Int32&gt;, Int32)**
```csharp
public int Eval(List<int> tokens, int pastTokensCount)
```
#### Parameters
`tokens` [List&lt;Int32&gt;](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.list-1)<br>
`pastTokensCount` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
#### Returns
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
The updated `pastTokensCount`.
#### Exceptions
[RuntimeError](./llama.exceptions.runtimeerror.md)<br>
### **Eval(ReadOnlyMemory&lt;Int32&gt;, Int32)**
```csharp
public int Eval(ReadOnlyMemory<int> tokens, int pastTokensCount)
```
#### Parameters
`tokens` [ReadOnlyMemory&lt;Int32&gt;](https://docs.microsoft.com/en-us/dotnet/api/system.readonlymemory-1)<br>
`pastTokensCount` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
#### Returns
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
The updated `pastTokensCount`.
#### Exceptions
[RuntimeError](./llama.exceptions.runtimeerror.md)<br>
### **Eval(ReadOnlySpan&lt;Int32&gt;, Int32)**
```csharp
public int Eval(ReadOnlySpan<int> tokens, int pastTokensCount)
```
#### Parameters
`tokens` [ReadOnlySpan&lt;Int32&gt;](https://docs.microsoft.com/en-us/dotnet/api/system.readonlyspan-1)<br>
`pastTokensCount` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
#### Returns
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
The updated `pastTokensCount`.
#### Exceptions
[RuntimeError](./llama.exceptions.runtimeerror.md)<br>
### **GenerateResult(IEnumerable&lt;Int32&gt;)**
```csharp
@ -273,10 +454,24 @@ internal IEnumerable<string> GenerateResult(IEnumerable<int> ids)
[IEnumerable&lt;String&gt;](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.ienumerable-1)<br>
### **TokenToString(Int32)**
Convert a token into a string
```csharp
public string TokenToString(int token)
```
#### Parameters
`token` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
#### Returns
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
### **Dispose()**
```csharp
public void Dispose()
```
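A rough usage sketch of the `LLamaContext` APIs documented above (the model path is a placeholder; note that `Tokenize` now returns `Int32[]` rather than `IEnumerable<int>`):

```csharp
using LLama;
using LLama.Common;

var parameters = new ModelParams("path/to/model.gguf"); // hypothetical path
using var weights = LLamaWeights.LoadFromFile(parameters);
using var context = weights.CreateContext(parameters);

// Tokenize a prompt, then round-trip it back to text with DeTokenize.
int[] tokens = context.Tokenize("Hello, world!", addBos: true);
string roundTrip = context.DeTokenize(tokens);
```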

@ -5,30 +5,62 @@ Namespace: LLama
The embedder for LLama, which supports getting embeddings from text.
```csharp
public class LLamaEmbedder : System.IDisposable
public sealed class LLamaEmbedder : System.IDisposable
```
Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [LLamaEmbedder](./llama.llamaembedder.md)<br>
Implements [IDisposable](https://docs.microsoft.com/en-us/dotnet/api/system.idisposable)
## Properties
### **EmbeddingSize**
Dimension of embedding vectors
```csharp
public int EmbeddingSize { get; }
```
#### Property Value
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
## Constructors
### **LLamaEmbedder(ModelParams)**
### **LLamaEmbedder(IModelParams)**
```csharp
public LLamaEmbedder(ModelParams params)
public LLamaEmbedder(IModelParams params)
```
#### Parameters
`params` [ModelParams](./llama.common.modelparams.md)<br>
`params` [IModelParams](./llama.abstractions.imodelparams.md)<br>
### **LLamaEmbedder(LLamaWeights, IModelParams)**
```csharp
public LLamaEmbedder(LLamaWeights weights, IModelParams params)
```
#### Parameters
`weights` [LLamaWeights](./llama.llamaweights.md)<br>
`params` [IModelParams](./llama.abstractions.imodelparams.md)<br>
## Methods
### **GetEmbeddings(String, Int32, Boolean, String)**
#### Caution
'threads' and 'encoding' parameters are no longer used
---
Get the embeddings of the text.
```csharp
@ -40,12 +72,56 @@ public Single[] GetEmbeddings(string text, int threads, bool addBos, string enco
`text` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
`threads` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
Threads used for inference.
unused
`addBos` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)<br>
Add bos to the text.
`encoding` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
unused
#### Returns
[Single[]](https://docs.microsoft.com/en-us/dotnet/api/system.single)<br>
#### Exceptions
[RuntimeError](./llama.exceptions.runtimeerror.md)<br>
### **GetEmbeddings(String)**
Get the embeddings of the text.
```csharp
public Single[] GetEmbeddings(string text)
```
#### Parameters
`text` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
#### Returns
[Single[]](https://docs.microsoft.com/en-us/dotnet/api/system.single)<br>
#### Exceptions
[RuntimeError](./llama.exceptions.runtimeerror.md)<br>
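The simplified `GetEmbeddings(String)` overload is usually all that is needed. A sketch under stated assumptions: the path is a placeholder, and setting `EmbeddingMode = true` on the params is assumed to be required to enable the native embedding mode.

```csharp
using LLama;
using LLama.Common;

// EmbeddingMode maps to the native `embedding` flag (assumption: required here).
var parameters = new ModelParams("path/to/model.gguf") { EmbeddingMode = true };
using var embedder = new LLamaEmbedder(parameters);

float[] embedding = embedder.GetEmbeddings("Hello LLamaSharp");
```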
### **GetEmbeddings(String, Boolean)**
Get the embeddings of the text.
```csharp
public Single[] GetEmbeddings(string text, bool addBos)
```
#### Parameters
`text` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
`addBos` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)<br>
Add bos to the text.
#### Returns

@ -12,12 +12,12 @@ Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object)
## Methods
### **Quantize(String, String, LLamaFtype, Int32)**
### **Quantize(String, String, LLamaFtype, Int32, Boolean, Boolean)**
Quantize the model.
```csharp
public static bool Quantize(string srcFileName, string dstFilename, LLamaFtype ftype, int nthread)
public static bool Quantize(string srcFileName, string dstFilename, LLamaFtype ftype, int nthread, bool allowRequantize, bool quantizeOutputTensor)
```
#### Parameters
@ -34,6 +34,10 @@ The type of quantization.
`nthread` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
Thread to be used during the quantization. By default it's the physical core number.
`allowRequantize` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)<br>
`quantizeOutputTensor` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)<br>
#### Returns
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)<br>
@ -43,12 +47,12 @@ Whether the quantization is successful.
[ArgumentException](https://docs.microsoft.com/en-us/dotnet/api/system.argumentexception)<br>
### **Quantize(String, String, String, Int32)**
### **Quantize(String, String, String, Int32, Boolean, Boolean)**
Quantize the model.
```csharp
public static bool Quantize(string srcFileName, string dstFilename, string ftype, int nthread)
public static bool Quantize(string srcFileName, string dstFilename, string ftype, int nthread, bool allowRequantize, bool quantizeOutputTensor)
```
#### Parameters
@ -65,6 +69,10 @@ The type of quantization.
`nthread` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
Thread to be used during the quantization. By default it's the physical core number.
`allowRequantize` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)<br>
`quantizeOutputTensor` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)<br>
#### Returns
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)<br>
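A sketch of the extended overload (both file names are hypothetical; per the quantizer parameter docs, a non-positive `nthread` falls back to the hardware concurrency):

```csharp
using LLama;
using LLama.Native;

bool ok = LLamaQuantizer.Quantize(
    "model-f16.gguf",          // hypothetical source file
    "model-q4_k_m.gguf",       // hypothetical destination file
    LLamaFtype.LLAMA_FTYPE_MOSTLY_Q4_K_M,
    nthread: 0,                // <= 0: use the physical core count
    allowRequantize: false,
    quantizeOutputTensor: false);
```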

@ -0,0 +1,118 @@
# LLamaWeights
Namespace: LLama
A set of model weights, loaded into memory.
```csharp
public sealed class LLamaWeights : System.IDisposable
```
Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [LLamaWeights](./llama.llamaweights.md)<br>
Implements [IDisposable](https://docs.microsoft.com/en-us/dotnet/api/system.idisposable)
## Properties
### **NativeHandle**
The native handle, which is used in the native APIs
```csharp
public SafeLlamaModelHandle NativeHandle { get; }
```
#### Property Value
[SafeLlamaModelHandle](./llama.native.safellamamodelhandle.md)<br>
**Remarks:**
Be careful how you use this!
### **Encoding**
Encoding to use to convert text into bytes for the model
```csharp
public Encoding Encoding { get; }
```
#### Property Value
[Encoding](https://docs.microsoft.com/en-us/dotnet/api/system.text.encoding)<br>
### **VocabCount**
Total number of tokens in vocabulary of this model
```csharp
public int VocabCount { get; }
```
#### Property Value
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
### **ContextSize**
Total number of tokens in the context
```csharp
public int ContextSize { get; }
```
#### Property Value
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
### **EmbeddingSize**
Dimension of embedding vectors
```csharp
public int EmbeddingSize { get; }
```
#### Property Value
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
## Methods
### **LoadFromFile(IModelParams)**
Load weights into memory
```csharp
public static LLamaWeights LoadFromFile(IModelParams params)
```
#### Parameters
`params` [IModelParams](./llama.abstractions.imodelparams.md)<br>
#### Returns
[LLamaWeights](./llama.llamaweights.md)<br>
### **Dispose()**
```csharp
public void Dispose()
```
### **CreateContext(IModelParams)**
Create a llama_context using this model
```csharp
public LLamaContext CreateContext(IModelParams params)
```
#### Parameters
`params` [IModelParams](./llama.abstractions.imodelparams.md)<br>
#### Returns
[LLamaContext](./llama.llamacontext.md)<br>
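The intended flow is to load the weights once with `LoadFromFile` and then create one or more contexts from them. A minimal sketch (the model path is a placeholder):

```csharp
using LLama;
using LLama.Common;

var parameters = new ModelParams("path/to/model.gguf"); // hypothetical path
using var weights = LLamaWeights.LoadFromFile(parameters);

// Inspect the model-level properties documented above.
Console.WriteLine($"vocab={weights.VocabCount}, n_ctx={weights.ContextSize}, n_embd={weights.EmbeddingSize}");

using var context = weights.CreateContext(parameters);
```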

@ -2,6 +2,8 @@
Namespace: LLama.Native
A C# representation of the llama.cpp `llama_context_params` struct
```csharp
public struct LLamaContextParams
```
@ -10,6 +12,14 @@ Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object)
## Fields
### **seed**
RNG seed, -1 for random
```csharp
public int seed;
```
### **n_ctx**
text context
@ -18,6 +28,14 @@ text context
public int n_ctx;
```
### **n_batch**
prompt processing batch size
```csharp
public int n_batch;
```
### **n_gpu_layers**
number of layers to store in VRAM
@ -26,60 +44,38 @@ number of layers to store in VRAM
public int n_gpu_layers;
```
### **seed**
### **main_gpu**
RNG seed, -1 for random
the GPU that is used for scratch and small tensors
```csharp
public int seed;
public int main_gpu;
```
### **f16_kv**
### **tensor_split**
use fp16 for KV cache
how to split layers across multiple GPUs
```csharp
public bool f16_kv;
public IntPtr tensor_split;
```
### **logits_all**
### **rope_freq_base**
the llama_eval() call computes all logits, not just the last one
ref: https://github.com/ggerganov/llama.cpp/pull/2054
RoPE base frequency
```csharp
public bool logits_all;
public float rope_freq_base;
```
### **vocab_only**
### **rope_freq_scale**
only load the vocabulary, no weights
ref: https://github.com/ggerganov/llama.cpp/pull/2054
RoPE frequency scaling factor
```csharp
public bool vocab_only;
```
### **use_mmap**
use mmap if possible
```csharp
public bool use_mmap;
```
### **use_mlock**
force system to keep model in RAM
```csharp
public bool use_mlock;
```
### **embedding**
embedding mode only
```csharp
public bool embedding;
public float rope_freq_scale;
```
### **progress_callback**
@ -97,3 +93,101 @@ context pointer passed to the progress callback
```csharp
public IntPtr progress_callback_user_data;
```
## Properties
### **low_vram**
if true, reduce VRAM usage at the cost of performance
```csharp
public bool low_vram { get; set; }
```
#### Property Value
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)<br>
### **mul_mat_q**
if true, use experimental mul_mat_q kernels
```csharp
public bool mul_mat_q { get; set; }
```
#### Property Value
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)<br>
### **f16_kv**
use fp16 for KV cache
```csharp
public bool f16_kv { get; set; }
```
#### Property Value
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)<br>
### **logits_all**
the llama_eval() call computes all logits, not just the last one
```csharp
public bool logits_all { get; set; }
```
#### Property Value
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)<br>
### **vocab_only**
only load the vocabulary, no weights
```csharp
public bool vocab_only { get; set; }
```
#### Property Value
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)<br>
### **use_mmap**
use mmap if possible
```csharp
public bool use_mmap { get; set; }
```
#### Property Value
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)<br>
### **use_mlock**
force system to keep model in RAM
```csharp
public bool use_mlock { get; set; }
```
#### Property Value
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)<br>
### **embedding**
embedding mode only
```csharp
public bool embedding { get; set; }
```
#### Property Value
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)<br>

@ -2,6 +2,8 @@
Namespace: LLama.Native
Supported model file types
```csharp
public enum LLamaFtype
```
@ -13,3 +15,21 @@ Implements [IComparable](https://docs.microsoft.com/en-us/dotnet/api/system.icom
| Name | Value | Description |
| --- | --: | --- |
| LLAMA_FTYPE_ALL_F32 | 0 | All f32 |
| LLAMA_FTYPE_MOSTLY_F16 | 1 | Mostly f16 |
| LLAMA_FTYPE_MOSTLY_Q8_0 | 7 | Mostly 8 bit |
| LLAMA_FTYPE_MOSTLY_Q4_0 | 2 | Mostly 4 bit |
| LLAMA_FTYPE_MOSTLY_Q4_1 | 3 | Mostly 4 bit |
| LLAMA_FTYPE_MOSTLY_Q4_1_SOME_F16 | 4 | Mostly 4 bit, tok_embeddings.weight and output.weight are f16 |
| LLAMA_FTYPE_MOSTLY_Q5_0 | 8 | Mostly 5 bit |
| LLAMA_FTYPE_MOSTLY_Q5_1 | 9 | Mostly 5 bit |
| LLAMA_FTYPE_MOSTLY_Q2_K | 10 | K-Quant 2 bit |
| LLAMA_FTYPE_MOSTLY_Q3_K_S | 11 | K-Quant 3 bit (Small) |
| LLAMA_FTYPE_MOSTLY_Q3_K_M | 12 | K-Quant 3 bit (Medium) |
| LLAMA_FTYPE_MOSTLY_Q3_K_L | 13 | K-Quant 3 bit (Large) |
| LLAMA_FTYPE_MOSTLY_Q4_K_S | 14 | K-Quant 4 bit (Small) |
| LLAMA_FTYPE_MOSTLY_Q4_K_M | 15 | K-Quant 4 bit (Medium) |
| LLAMA_FTYPE_MOSTLY_Q5_K_S | 16 | K-Quant 5 bit (Small) |
| LLAMA_FTYPE_MOSTLY_Q5_K_M | 17 | K-Quant 5 bit (Medium) |
| LLAMA_FTYPE_MOSTLY_Q6_K | 18 | K-Quant 6 bit |
| LLAMA_FTYPE_GUESSED | 1024 | File type was not specified |

@ -0,0 +1,96 @@
# LLamaGrammarElement
Namespace: LLama.Native
An element of a grammar
```csharp
public struct LLamaGrammarElement
```
Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [ValueType](https://docs.microsoft.com/en-us/dotnet/api/system.valuetype) → [LLamaGrammarElement](./llama.native.llamagrammarelement.md)<br>
Implements [IEquatable&lt;LLamaGrammarElement&gt;](https://docs.microsoft.com/en-us/dotnet/api/system.iequatable-1)
## Fields
### **Type**
The type of this element
```csharp
public LLamaGrammarElementType Type;
```
### **Value**
Unicode code point or rule ID
```csharp
public uint Value;
```
## Constructors
### **LLamaGrammarElement(LLamaGrammarElementType, UInt32)**
Construct a new LLamaGrammarElement
```csharp
LLamaGrammarElement(LLamaGrammarElementType type, uint value)
```
#### Parameters
`type` [LLamaGrammarElementType](./llama.native.llamagrammarelementtype.md)<br>
`value` [UInt32](https://docs.microsoft.com/en-us/dotnet/api/system.uint32)<br>
## Methods
### **Equals(LLamaGrammarElement)**
```csharp
bool Equals(LLamaGrammarElement other)
```
#### Parameters
`other` [LLamaGrammarElement](./llama.native.llamagrammarelement.md)<br>
#### Returns
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)<br>
### **Equals(Object)**
```csharp
bool Equals(object obj)
```
#### Parameters
`obj` [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object)<br>
#### Returns
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)<br>
### **GetHashCode()**
```csharp
int GetHashCode()
```
#### Returns
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
### **IsCharElement()**
```csharp
bool IsCharElement()
```
#### Returns
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)<br>

@ -0,0 +1,24 @@
# LLamaGrammarElementType
Namespace: LLama.Native
grammar element type
```csharp
public enum LLamaGrammarElementType
```
Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [ValueType](https://docs.microsoft.com/en-us/dotnet/api/system.valuetype) → [Enum](https://docs.microsoft.com/en-us/dotnet/api/system.enum) → [LLamaGrammarElementType](./llama.native.llamagrammarelementtype.md)<br>
Implements [IComparable](https://docs.microsoft.com/en-us/dotnet/api/system.icomparable), [IFormattable](https://docs.microsoft.com/en-us/dotnet/api/system.iformattable), [IConvertible](https://docs.microsoft.com/en-us/dotnet/api/system.iconvertible)
## Fields
| Name | Value | Description |
| --- | --: | --- |
| END | 0 | end of rule definition |
| ALT | 1 | start of alternate definition for rule |
| RULE_REF | 2 | non-terminal element: reference to rule |
| CHAR | 3 | terminal element: character (code point) |
| CHAR_NOT | 4 | inverse char(s) ([^a], [^a-b] [^abc]) |
| CHAR_RNG_UPPER | 5 | modifies a preceding CHAR or CHAR_ALT to be an inclusive range ([a-z]) |
| CHAR_ALT | 6 | modifies a preceding CHAR or CHAR_RNG_UPPER to add an alternate char to match ([ab], [a-zA]) |

@ -0,0 +1,55 @@
# LLamaModelQuantizeParams
Namespace: LLama.Native
Quantizer parameters used in the native API
```csharp
public struct LLamaModelQuantizeParams
```
Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [ValueType](https://docs.microsoft.com/en-us/dotnet/api/system.valuetype) → [LLamaModelQuantizeParams](./llama.native.llamamodelquantizeparams.md)
## Fields
### **nthread**
number of threads to use for quantizing, if &lt;=0 will use std::thread::hardware_concurrency()
```csharp
public int nthread;
```
### **ftype**
quantize to this llama_ftype
```csharp
public LLamaFtype ftype;
```
## Properties
### **allow_requantize**
allow quantizing non-f32/f16 tensors
```csharp
public bool allow_requantize { get; set; }
```
#### Property Value
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)<br>
### **quantize_output_tensor**
quantize output.weight
```csharp
public bool quantize_output_tensor { get; set; }
```
#### Property Value
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)<br>


@@ -2,6 +2,8 @@
Namespace: LLama.Native
Contains an array of LLamaTokenData, potentially sorted.
```csharp
public struct LLamaTokenDataArray
```
@@ -12,34 +14,50 @@ Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object)
### **data**
The LLamaTokenData
```csharp
public Memory<LLamaTokenData> data;
```
### **size**
```csharp
public ulong size;
```
### **sorted**
Indicates if `data` is sorted by logits in descending order. If this is false the token data is in _no particular order_.
```csharp
public bool sorted;
```
## Constructors
### **LLamaTokenDataArray(Memory&lt;LLamaTokenData&gt;, Boolean)**
Create a new LLamaTokenDataArray
```csharp
LLamaTokenDataArray(Memory<LLamaTokenData> tokens, bool isSorted)
```
#### Parameters
`tokens` [Memory&lt;LLamaTokenData&gt;](https://docs.microsoft.com/en-us/dotnet/api/system.memory-1)<br>
`isSorted` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)<br>
## Methods
### **Create(ReadOnlySpan&lt;Single&gt;)**
Create a new LLamaTokenDataArray, copying the data from the given logits
```csharp
LLamaTokenDataArray Create(ReadOnlySpan<float> logits)
```
#### Parameters
`logits` [ReadOnlySpan&lt;Single&gt;](https://docs.microsoft.com/en-us/dotnet/api/system.readonlyspan-1)<br>
#### Returns
[LLamaTokenDataArray](./llama.native.llamatokendataarray.md)<br>
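`Create` is typically fed the logits from the last evaluation — a sketch, assuming `ctx` is an existing `SafeLLamaContextHandle`:
```csharp
// Build a candidate array from the current logits.
Span<float> logits = ctx.GetLogits();
LLamaTokenDataArray candidates = LLamaTokenDataArray.Create(logits);
// candidates.data holds one LLamaTokenData per vocabulary entry;
// candidates.sorted is false until a sampling call sorts it.
```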


@@ -2,6 +2,8 @@
Namespace: LLama.Native
Contains a pointer to an array of LLamaTokenData which is pinned in memory.
```csharp
public struct LLamaTokenDataArrayNative
```
@@ -12,18 +14,57 @@ Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object)
### **data**
A pointer to an array of LlamaTokenData
```csharp
public IntPtr data;
```
**Remarks:**
Memory must be pinned in place for all the time this LLamaTokenDataArrayNative is in use
### **size**
Number of LLamaTokenData in the array
```csharp
public ulong size;
```
## Properties
### **sorted**
Indicates if the items in the array are sorted
```csharp
public bool sorted { get; set; }
```
#### Property Value
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)<br>
## Methods
### **Create(LLamaTokenDataArray, LLamaTokenDataArrayNative&)**
Create a new LLamaTokenDataArrayNative around the data in the LLamaTokenDataArray
```csharp
MemoryHandle Create(LLamaTokenDataArray array, LLamaTokenDataArrayNative& native)
```
#### Parameters
`array` [LLamaTokenDataArray](./llama.native.llamatokendataarray.md)<br>
Data source
`native` [LLamaTokenDataArrayNative&](./llama.native.llamatokendataarraynative&.md)<br>
Created native array
#### Returns
[MemoryHandle](https://docs.microsoft.com/en-us/dotnet/api/system.buffers.memoryhandle)<br>
A memory handle, pinning the data in place until disposed
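A sketch of how the pinning handle is used; the `&` parameter is assumed to surface in C# as an `out` argument, and `candidates` is an existing `LLamaTokenDataArray`:
```csharp
// Pin the managed token data so a native sampling call can read it.
using (MemoryHandle pin = LLamaTokenDataArrayNative.Create(candidates, out LLamaTokenDataArrayNative native))
{
    // native.data points at the pinned array and native.size gives its length;
    // the memory stays pinned until the MemoryHandle is disposed.
}
```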

File diff suppressed because it is too large


@@ -2,8 +2,10 @@
Namespace: LLama.Native
A safe wrapper around a llama_context
```csharp
public sealed class SafeLLamaContextHandle : SafeLLamaHandleBase, System.IDisposable
```
Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [CriticalFinalizerObject](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.constrainedexecution.criticalfinalizerobject) → [SafeHandle](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.interopservices.safehandle) → [SafeLLamaHandleBase](./llama.native.safellamahandlebase.md) → [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)<br>
@@ -11,6 +13,54 @@ Implements [IDisposable](https://docs.microsoft.com/en-us/dotnet/api/system.idis
## Properties
### **VocabCount**
Total number of tokens in vocabulary of this model
```csharp
public int VocabCount { get; }
```
#### Property Value
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
### **ContextSize**
Total number of tokens in the context
```csharp
public int ContextSize { get; }
```
#### Property Value
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
### **EmbeddingSize**
Dimension of embedding vectors
```csharp
public int EmbeddingSize { get; }
```
#### Property Value
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
### **ModelHandle**
Get the model which this context is using
```csharp
public SafeLlamaModelHandle ModelHandle { get; }
```
#### Property Value
[SafeLlamaModelHandle](./llama.native.safellamamodelhandle.md)<br>
### **IsInvalid**
```csharp
@@ -33,15 +83,21 @@ public bool IsClosed { get; }
## Constructors
### **SafeLLamaContextHandle(IntPtr, SafeLlamaModelHandle)**
Create a new SafeLLamaContextHandle
```csharp
public SafeLLamaContextHandle(IntPtr handle, SafeLlamaModelHandle model)
```
```
#### Parameters
`handle` [IntPtr](https://docs.microsoft.com/en-us/dotnet/api/system.intptr)<br>
pointer to an allocated llama_context
`model` [SafeLlamaModelHandle](./llama.native.safellamamodelhandle.md)<br>
the model which this context was created from
## Methods
@@ -54,3 +110,265 @@ protected bool ReleaseHandle()
#### Returns
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)<br>
### **Create(SafeLlamaModelHandle, LLamaContextParams)**
Create a new llama_state for the given model
```csharp
public static SafeLLamaContextHandle Create(SafeLlamaModelHandle model, LLamaContextParams lparams)
```
#### Parameters
`model` [SafeLlamaModelHandle](./llama.native.safellamamodelhandle.md)<br>
`lparams` [LLamaContextParams](./llama.native.llamacontextparams.md)<br>
#### Returns
[SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)<br>
#### Exceptions
[RuntimeError](./llama.exceptions.runtimeerror.md)<br>
### **Clone(LLamaContextParams)**
Create a new llama context with a clone of the current llama context state
```csharp
public SafeLLamaContextHandle Clone(LLamaContextParams lparams)
```
#### Parameters
`lparams` [LLamaContextParams](./llama.native.llamacontextparams.md)<br>
#### Returns
[SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)<br>
### **Tokenize(String, Boolean, Encoding)**
Convert the given text into tokens
```csharp
public Int32[] Tokenize(string text, bool add_bos, Encoding encoding)
```
#### Parameters
`text` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
The text to tokenize
`add_bos` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)<br>
Whether the "BOS" token should be added
`encoding` [Encoding](https://docs.microsoft.com/en-us/dotnet/api/system.text.encoding)<br>
Encoding to use for the text
#### Returns
[Int32[]](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
#### Exceptions
[RuntimeError](./llama.exceptions.runtimeerror.md)<br>
### **GetLogits()**
Token logits obtained from the last call to llama_eval()
The logits for the last token are stored in the last row
Can be mutated in order to change the probabilities of the next token.<br>
Rows: n_tokens<br>
Cols: n_vocab
```csharp
public Span<float> GetLogits()
```
#### Returns
[Span&lt;Single&gt;](https://docs.microsoft.com/en-us/dotnet/api/system.span-1)<br>
### **TokenToString(Int32, Encoding)**
Convert a token into a string
```csharp
public string TokenToString(int token, Encoding encoding)
```
#### Parameters
`token` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
Token to decode into a string
`encoding` [Encoding](https://docs.microsoft.com/en-us/dotnet/api/system.text.encoding)<br>
#### Returns
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
### **TokenToString(Int32, Encoding, StringBuilder)**
Append a single llama token to a string builder
```csharp
public void TokenToString(int token, Encoding encoding, StringBuilder dest)
```
#### Parameters
`token` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
Token to decode
`encoding` [Encoding](https://docs.microsoft.com/en-us/dotnet/api/system.text.encoding)<br>
`dest` [StringBuilder](https://docs.microsoft.com/en-us/dotnet/api/system.text.stringbuilder)<br>
string builder to append the result to
### **TokenToSpan(Int32, Span&lt;Byte&gt;)**
Convert a single llama token into bytes
```csharp
public int TokenToSpan(int token, Span<byte> dest)
```
#### Parameters
`token` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
Token to decode
`dest` [Span&lt;Byte&gt;](https://docs.microsoft.com/en-us/dotnet/api/system.span-1)<br>
A span to attempt to write into. If this is too small nothing will be written
#### Returns
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
The size of this token. **nothing will be written** if this is larger than `dest`
### **Eval(ReadOnlySpan&lt;Int32&gt;, Int32, Int32)**
Run the llama inference to obtain the logits and probabilities for the next token.
```csharp
public bool Eval(ReadOnlySpan<int> tokens, int n_past, int n_threads)
```
#### Parameters
`tokens` [ReadOnlySpan&lt;Int32&gt;](https://docs.microsoft.com/en-us/dotnet/api/system.readonlyspan-1)<br>
The provided batch of new tokens to process
`n_past` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
the number of tokens to use from previous eval calls
`n_threads` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
#### Returns
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)<br>
Returns true on success
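`Tokenize`, `Eval` and `GetLogits` combine into a basic evaluation step — a sketch, assuming `ctx` is an existing `SafeLLamaContextHandle` and the thread count is illustrative:
```csharp
// Tokenize a prompt and feed it to the model.
int[] tokens = ctx.Tokenize("Hello, world!", add_bos: true, Encoding.UTF8);
int n_past = 0;
bool ok = ctx.Eval(tokens, n_past, n_threads: 4);
if (ok)
{
    n_past += tokens.Length;              // later calls reuse these tokens as context
    Span<float> logits = ctx.GetLogits(); // logits for the next-token prediction
}
```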
### **GetStateSize()**
Get the size of the state, when saved as bytes
```csharp
public ulong GetStateSize()
```
#### Returns
[UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)<br>
### **GetState(Byte*, UInt64)**
Get the raw state of this context, encoded as bytes. Data is written into the `dest` pointer.
```csharp
public ulong GetState(Byte* dest, ulong size)
```
#### Parameters
`dest` [Byte*](https://docs.microsoft.com/en-us/dotnet/api/system.byte*)<br>
Destination to write to
`size` [UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)<br>
Number of bytes available to write to in dest (check required size with `GetStateSize()`)
#### Returns
[UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)<br>
The number of bytes written to dest
#### Exceptions
[ArgumentOutOfRangeException](https://docs.microsoft.com/en-us/dotnet/api/system.argumentoutofrangeexception)<br>
Thrown if dest is too small
### **GetState(IntPtr, UInt64)**
Get the raw state of this context, encoded as bytes. Data is written into the `dest` pointer.
```csharp
public ulong GetState(IntPtr dest, ulong size)
```
#### Parameters
`dest` [IntPtr](https://docs.microsoft.com/en-us/dotnet/api/system.intptr)<br>
Destination to write to
`size` [UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)<br>
Number of bytes available to write to in dest (check required size with `GetStateSize()`)
#### Returns
[UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)<br>
The number of bytes written to dest
#### Exceptions
[ArgumentOutOfRangeException](https://docs.microsoft.com/en-us/dotnet/api/system.argumentoutofrangeexception)<br>
Thrown if dest is too small
### **SetState(Byte*)**
Set the raw state of this context
```csharp
public ulong SetState(Byte* src)
```
#### Parameters
`src` [Byte*](https://docs.microsoft.com/en-us/dotnet/api/system.byte*)<br>
The pointer to read the state from
#### Returns
[UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)<br>
Number of bytes read from the src pointer
### **SetState(IntPtr)**
Set the raw state of this context
```csharp
public ulong SetState(IntPtr src)
```
#### Parameters
`src` [IntPtr](https://docs.microsoft.com/en-us/dotnet/api/system.intptr)<br>
The pointer to read the state from
#### Returns
[UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)<br>
Number of bytes read from the src pointer
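The state methods above can round-trip a context through a managed buffer — a sketch (requires an `unsafe` context; `ctx` is assumed to exist):
```csharp
// Save the raw context state and later restore it.
ulong stateSize = ctx.GetStateSize();
byte[] buffer = new byte[stateSize];
unsafe
{
    fixed (byte* ptr = buffer)
    {
        ulong written = ctx.GetState(ptr, stateSize); // save into the buffer
        // ... later, restore the saved state:
        ulong read = ctx.SetState(ptr);
    }
}
```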


@@ -0,0 +1,97 @@
# SafeLLamaGrammarHandle
Namespace: LLama.Native
A safe reference to a `llama_grammar`
```csharp
public class SafeLLamaGrammarHandle : SafeLLamaHandleBase, System.IDisposable
```
Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [CriticalFinalizerObject](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.constrainedexecution.criticalfinalizerobject) → [SafeHandle](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.interopservices.safehandle) → [SafeLLamaHandleBase](./llama.native.safellamahandlebase.md) → [SafeLLamaGrammarHandle](./llama.native.safellamagrammarhandle.md)<br>
Implements [IDisposable](https://docs.microsoft.com/en-us/dotnet/api/system.idisposable)
## Properties
### **IsInvalid**
```csharp
public bool IsInvalid { get; }
```
#### Property Value
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)<br>
### **IsClosed**
```csharp
public bool IsClosed { get; }
```
#### Property Value
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)<br>
## Methods
### **ReleaseHandle()**
```csharp
protected bool ReleaseHandle()
```
#### Returns
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)<br>
### **Create(IReadOnlyList&lt;GrammarRule&gt;, UInt64)**
Create a new llama_grammar
```csharp
public static SafeLLamaGrammarHandle Create(IReadOnlyList<GrammarRule> rules, ulong start_rule_index)
```
#### Parameters
`rules` [IReadOnlyList&lt;GrammarRule&gt;](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.ireadonlylist-1)<br>
A list of lists of elements; each inner list makes up one grammar rule
`start_rule_index` [UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)<br>
The index (in the outer list) of the start rule
#### Returns
[SafeLLamaGrammarHandle](./llama.native.safellamagrammarhandle.md)<br>
#### Exceptions
[RuntimeError](./llama.exceptions.runtimeerror.md)<br>
### **Create(LLamaGrammarElement**, UInt64, UInt64)**
Create a new llama_grammar
```csharp
public static SafeLLamaGrammarHandle Create(LLamaGrammarElement** rules, ulong nrules, ulong start_rule_index)
```
#### Parameters
`rules` [LLamaGrammarElement**](./llama.native.llamagrammarelement**.md)<br>
rules list, each rule is a list of rule elements (terminated by a LLamaGrammarElementType.END element)
`nrules` [UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)<br>
total number of rules
`start_rule_index` [UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)<br>
index of the start rule of the grammar
#### Returns
[SafeLLamaGrammarHandle](./llama.native.safellamagrammarhandle.md)<br>
#### Exceptions
[RuntimeError](./llama.exceptions.runtimeerror.md)<br>
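A sketch of the managed overload: a one-rule grammar whose start rule matches the single character `a`. The `GrammarRule` and `LLamaGrammarElement` constructors are not documented on this page, so the shapes used below are assumptions:
```csharp
var rules = new List<GrammarRule>
{
    // rule 0 ("root"): CHAR 'a' followed by the END terminator
    new GrammarRule("root", new[]
    {
        new LLamaGrammarElement(LLamaGrammarElementType.CHAR, 'a'),
        new LLamaGrammarElement(LLamaGrammarElementType.END, 0),
    }),
};
using SafeLLamaGrammarHandle grammar = SafeLLamaGrammarHandle.Create(rules, start_rule_index: 0);
```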


@@ -2,6 +2,8 @@
Namespace: LLama.Native
Base class for all llama handles to native resources
```csharp
public abstract class SafeLLamaHandleBase : System.Runtime.InteropServices.SafeHandle, System.IDisposable
```


@@ -0,0 +1,220 @@
# SafeLlamaModelHandle
Namespace: LLama.Native
A reference to a set of llama model weights
```csharp
public sealed class SafeLlamaModelHandle : SafeLLamaHandleBase, System.IDisposable
```
Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [CriticalFinalizerObject](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.constrainedexecution.criticalfinalizerobject) → [SafeHandle](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.interopservices.safehandle) → [SafeLLamaHandleBase](./llama.native.safellamahandlebase.md) → [SafeLlamaModelHandle](./llama.native.safellamamodelhandle.md)<br>
Implements [IDisposable](https://docs.microsoft.com/en-us/dotnet/api/system.idisposable)
## Properties
### **VocabCount**
Total number of tokens in vocabulary of this model
```csharp
public int VocabCount { get; }
```
#### Property Value
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
### **ContextSize**
Total number of tokens in the context
```csharp
public int ContextSize { get; }
```
#### Property Value
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
### **EmbeddingSize**
Dimension of embedding vectors
```csharp
public int EmbeddingSize { get; }
```
#### Property Value
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
### **IsInvalid**
```csharp
public bool IsInvalid { get; }
```
#### Property Value
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)<br>
### **IsClosed**
```csharp
public bool IsClosed { get; }
```
#### Property Value
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)<br>
## Methods
### **ReleaseHandle()**
```csharp
protected bool ReleaseHandle()
```
#### Returns
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)<br>
### **LoadFromFile(String, LLamaContextParams)**
Load a model from the given file path into memory
```csharp
public static SafeLlamaModelHandle LoadFromFile(string modelPath, LLamaContextParams lparams)
```
#### Parameters
`modelPath` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
`lparams` [LLamaContextParams](./llama.native.llamacontextparams.md)<br>
#### Returns
[SafeLlamaModelHandle](./llama.native.safellamamodelhandle.md)<br>
#### Exceptions
[RuntimeError](./llama.exceptions.runtimeerror.md)<br>
### **ApplyLoraFromFile(String, String, Int32)**
Apply a LoRA adapter to a loaded model
```csharp
public void ApplyLoraFromFile(string lora, string modelBase, int threads)
```
#### Parameters
`lora` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
`modelBase` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
A path to a higher quality model to use as a base for the layers modified by the
adapter. Can be NULL to use the current loaded model.
`threads` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
#### Exceptions
[RuntimeError](./llama.exceptions.runtimeerror.md)<br>
### **TokenToSpan(Int32, Span&lt;Byte&gt;)**
Convert a single llama token into bytes
```csharp
public int TokenToSpan(int llama_token, Span<byte> dest)
```
#### Parameters
`llama_token` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
Token to decode
`dest` [Span&lt;Byte&gt;](https://docs.microsoft.com/en-us/dotnet/api/system.span-1)<br>
A span to attempt to write into. If this is too small nothing will be written
#### Returns
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
The size of this token. **nothing will be written** if this is larger than `dest`
### **TokenToString(Int32, Encoding)**
Convert a single llama token into a string
```csharp
public string TokenToString(int llama_token, Encoding encoding)
```
#### Parameters
`llama_token` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
`encoding` [Encoding](https://docs.microsoft.com/en-us/dotnet/api/system.text.encoding)<br>
Encoding to use to decode the bytes into a string
#### Returns
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
### **TokenToString(Int32, Encoding, StringBuilder)**
Append a single llama token to a string builder
```csharp
public void TokenToString(int llama_token, Encoding encoding, StringBuilder dest)
```
#### Parameters
`llama_token` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
Token to decode
`encoding` [Encoding](https://docs.microsoft.com/en-us/dotnet/api/system.text.encoding)<br>
`dest` [StringBuilder](https://docs.microsoft.com/en-us/dotnet/api/system.text.stringbuilder)<br>
string builder to append the result to
### **Tokenize(String, Boolean, Encoding)**
Convert a string of text into tokens
```csharp
public Int32[] Tokenize(string text, bool add_bos, Encoding encoding)
```
#### Parameters
`text` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
`add_bos` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)<br>
`encoding` [Encoding](https://docs.microsoft.com/en-us/dotnet/api/system.text.encoding)<br>
#### Returns
[Int32[]](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
### **CreateContext(LLamaContextParams)**
Create a new context for this model
```csharp
public SafeLLamaContextHandle CreateContext(LLamaContextParams params)
```
#### Parameters
`params` [LLamaContextParams](./llama.native.llamacontextparams.md)<br>
#### Returns
[SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)<br>
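Putting the model-handle methods together — a sketch, assuming `llama_context_default_params()` is exposed on `NativeApi` and `"./model.gguf"` is a placeholder path:
```csharp
// Load weights once, then derive a context and tokenize some text.
LLamaContextParams lparams = NativeApi.llama_context_default_params();
using SafeLlamaModelHandle model = SafeLlamaModelHandle.LoadFromFile("./model.gguf", lparams);
using SafeLLamaContextHandle ctx = model.CreateContext(lparams);
int[] tokens = model.Tokenize("Hello", add_bos: true, Encoding.UTF8);
```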


@@ -0,0 +1,338 @@
# SamplingApi
Namespace: LLama.Native
Direct translation of the llama.cpp sampling API
```csharp
public class SamplingApi
```
Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [SamplingApi](./llama.native.samplingapi.md)
## Constructors
### **SamplingApi()**
```csharp
public SamplingApi()
```
## Methods
### **llama_sample_grammar(SafeLLamaContextHandle, LLamaTokenDataArray, SafeLLamaGrammarHandle)**
Apply grammar rules to candidate tokens
```csharp
public static void llama_sample_grammar(SafeLLamaContextHandle ctx, LLamaTokenDataArray candidates, SafeLLamaGrammarHandle grammar)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)<br>
`candidates` [LLamaTokenDataArray](./llama.native.llamatokendataarray.md)<br>
`grammar` [SafeLLamaGrammarHandle](./llama.native.safellamagrammarhandle.md)<br>
### **llama_sample_repetition_penalty(SafeLLamaContextHandle, LLamaTokenDataArray, Memory&lt;Int32&gt;, UInt64, Single)**
#### Caution
last_tokens_size parameter is no longer needed
---
Repetition penalty described in CTRL academic paper https://arxiv.org/abs/1909.05858, with negative logit fix.
```csharp
public static void llama_sample_repetition_penalty(SafeLLamaContextHandle ctx, LLamaTokenDataArray candidates, Memory<int> last_tokens, ulong last_tokens_size, float penalty)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)<br>
`candidates` [LLamaTokenDataArray](./llama.native.llamatokendataarray.md)<br>
Pointer to LLamaTokenDataArray
`last_tokens` [Memory&lt;Int32&gt;](https://docs.microsoft.com/en-us/dotnet/api/system.memory-1)<br>
`last_tokens_size` [UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)<br>
`penalty` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)<br>
### **llama_sample_repetition_penalty(SafeLLamaContextHandle, LLamaTokenDataArray, Memory&lt;Int32&gt;, Single)**
Repetition penalty described in CTRL academic paper https://arxiv.org/abs/1909.05858, with negative logit fix.
```csharp
public static void llama_sample_repetition_penalty(SafeLLamaContextHandle ctx, LLamaTokenDataArray candidates, Memory<int> last_tokens, float penalty)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)<br>
`candidates` [LLamaTokenDataArray](./llama.native.llamatokendataarray.md)<br>
Pointer to LLamaTokenDataArray
`last_tokens` [Memory&lt;Int32&gt;](https://docs.microsoft.com/en-us/dotnet/api/system.memory-1)<br>
`penalty` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)<br>
### **llama_sample_frequency_and_presence_penalties(SafeLLamaContextHandle, LLamaTokenDataArray, Memory&lt;Int32&gt;, UInt64, Single, Single)**
#### Caution
last_tokens_size parameter is no longer needed
---
Frequency and presence penalties described in OpenAI API https://platform.openai.com/docs/api-reference/parameter-details.
```csharp
public static void llama_sample_frequency_and_presence_penalties(SafeLLamaContextHandle ctx, LLamaTokenDataArray candidates, Memory<int> last_tokens, ulong last_tokens_size, float alpha_frequency, float alpha_presence)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)<br>
`candidates` [LLamaTokenDataArray](./llama.native.llamatokendataarray.md)<br>
Pointer to LLamaTokenDataArray
`last_tokens` [Memory&lt;Int32&gt;](https://docs.microsoft.com/en-us/dotnet/api/system.memory-1)<br>
`last_tokens_size` [UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)<br>
`alpha_frequency` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)<br>
`alpha_presence` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)<br>
### **llama_sample_frequency_and_presence_penalties(SafeLLamaContextHandle, LLamaTokenDataArray, Memory&lt;Int32&gt;, Single, Single)**
Frequency and presence penalties described in OpenAI API https://platform.openai.com/docs/api-reference/parameter-details.
```csharp
public static void llama_sample_frequency_and_presence_penalties(SafeLLamaContextHandle ctx, LLamaTokenDataArray candidates, Memory<int> last_tokens, float alpha_frequency, float alpha_presence)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)<br>
`candidates` [LLamaTokenDataArray](./llama.native.llamatokendataarray.md)<br>
Pointer to LLamaTokenDataArray
`last_tokens` [Memory&lt;Int32&gt;](https://docs.microsoft.com/en-us/dotnet/api/system.memory-1)<br>
`alpha_frequency` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)<br>
`alpha_presence` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)<br>
### **llama_sample_softmax(SafeLLamaContextHandle, LLamaTokenDataArray)**
Sorts candidate tokens by their logits in descending order and calculates probabilities based on the logits.
```csharp
public static void llama_sample_softmax(SafeLLamaContextHandle ctx, LLamaTokenDataArray candidates)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)<br>
`candidates` [LLamaTokenDataArray](./llama.native.llamatokendataarray.md)<br>
Pointer to LLamaTokenDataArray
### **llama_sample_top_k(SafeLLamaContextHandle, LLamaTokenDataArray, Int32, UInt64)**
Top-K sampling described in academic paper "The Curious Case of Neural Text Degeneration" https://arxiv.org/abs/1904.09751
```csharp
public static void llama_sample_top_k(SafeLLamaContextHandle ctx, LLamaTokenDataArray candidates, int k, ulong min_keep)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)<br>
`candidates` [LLamaTokenDataArray](./llama.native.llamatokendataarray.md)<br>
Pointer to LLamaTokenDataArray
`k` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
`min_keep` [UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)<br>
### **llama_sample_top_p(SafeLLamaContextHandle, LLamaTokenDataArray, Single, UInt64)**
Nucleus sampling described in academic paper "The Curious Case of Neural Text Degeneration" https://arxiv.org/abs/1904.09751
```csharp
public static void llama_sample_top_p(SafeLLamaContextHandle ctx, LLamaTokenDataArray candidates, float p, ulong min_keep)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)<br>
`candidates` [LLamaTokenDataArray](./llama.native.llamatokendataarray.md)<br>
Pointer to LLamaTokenDataArray
`p` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)<br>
`min_keep` [UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)<br>
### **llama_sample_tail_free(SafeLLamaContextHandle, LLamaTokenDataArray, Single, UInt64)**
Tail Free Sampling described in https://www.trentonbricken.com/Tail-Free-Sampling/.
```csharp
public static void llama_sample_tail_free(SafeLLamaContextHandle ctx, LLamaTokenDataArray candidates, float z, ulong min_keep)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)<br>
`candidates` [LLamaTokenDataArray](./llama.native.llamatokendataarray.md)<br>
Pointer to LLamaTokenDataArray
`z` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)<br>
`min_keep` [UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)<br>
### **llama_sample_typical(SafeLLamaContextHandle, LLamaTokenDataArray, Single, UInt64)**
Locally Typical Sampling implementation described in the paper https://arxiv.org/abs/2202.00666.
```csharp
public static void llama_sample_typical(SafeLLamaContextHandle ctx, LLamaTokenDataArray candidates, float p, ulong min_keep)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)<br>
`candidates` [LLamaTokenDataArray](./llama.native.llamatokendataarray.md)<br>
Pointer to LLamaTokenDataArray
`p` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)<br>
`min_keep` [UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)<br>
### **llama_sample_temperature(SafeLLamaContextHandle, LLamaTokenDataArray, Single)**
Sample with temperature.
As temperature increases, the prediction becomes diverse but also vulnerable to hallucinations -- generating tokens that are sensible but not factual
```csharp
public static void llama_sample_temperature(SafeLLamaContextHandle ctx, LLamaTokenDataArray candidates, float temp)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)<br>
`candidates` [LLamaTokenDataArray](./llama.native.llamatokendataarray.md)<br>
`temp` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)<br>
### **llama_sample_token_mirostat(SafeLLamaContextHandle, LLamaTokenDataArray, Single, Single, Int32, Single&)**
Mirostat 1.0 algorithm described in the paper https://arxiv.org/abs/2007.14966. Uses tokens instead of words.
```csharp
public static int llama_sample_token_mirostat(SafeLLamaContextHandle ctx, LLamaTokenDataArray candidates, float tau, float eta, int m, Single& mu)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)<br>
`candidates` [LLamaTokenDataArray](./llama.native.llamatokendataarray.md)<br>
A vector of `LLamaTokenData` containing the candidate tokens, their probabilities (p), and log-odds (logit) for the current position in the generated text.
`tau` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)<br>
The target cross-entropy (or surprise) value you want to achieve for the generated text. A higher value corresponds to more surprising or less predictable text, while a lower value corresponds to less surprising or more predictable text.
`eta` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)<br>
The learning rate used to update `mu` based on the error between the target and observed surprisal of the sampled word. A larger learning rate will cause `mu` to be updated more quickly, while a smaller learning rate will result in slower updates.
`m` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
The number of tokens considered in the estimation of `s_hat`. This is an arbitrary value that is used to calculate `s_hat`, which in turn helps to calculate the value of `k`. In the paper, they use `m = 100`, but you can experiment with different values to see how it affects the performance of the algorithm.
`mu` [Single&](https://docs.microsoft.com/en-us/dotnet/api/system.single&)<br>
Maximum cross-entropy. This value is initialized to be twice the target cross-entropy (`2 * tau`) and is updated in the algorithm based on the error between the target and observed surprisal.
#### Returns
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
### **llama_sample_token_mirostat_v2(SafeLLamaContextHandle, LLamaTokenDataArray, Single, Single, Single&)**
Mirostat 2.0 algorithm described in the paper https://arxiv.org/abs/2007.14966. Uses tokens instead of words.
```csharp
public static int llama_sample_token_mirostat_v2(SafeLLamaContextHandle ctx, LLamaTokenDataArray candidates, float tau, float eta, Single& mu)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)<br>
`candidates` [LLamaTokenDataArray](./llama.native.llamatokendataarray.md)<br>
A vector of `LLamaTokenData` containing the candidate tokens, their probabilities (p), and log-odds (logit) for the current position in the generated text.
`tau` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)<br>
The target cross-entropy (or surprise) value you want to achieve for the generated text. A higher value corresponds to more surprising or less predictable text, while a lower value corresponds to less surprising or more predictable text.
`eta` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)<br>
The learning rate used to update `mu` based on the error between the target and observed surprisal of the sampled word. A larger learning rate will cause `mu` to be updated more quickly, while a smaller learning rate will result in slower updates.
`mu` [Single&](https://docs.microsoft.com/en-us/dotnet/api/system.single&)<br>
Maximum cross-entropy. This value is initialized to be twice the target cross-entropy (`2 * tau`) and is updated in the algorithm based on the error between the target and observed surprisal.
#### Returns
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
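A minimal sketch of how these parameters fit together. It assumes `ctx` is a valid `SafeLLamaContextHandle`, `candidates` has already been filled from the current logits, and that this static lives on `NativeApi` like the other functions on this page; everything outside those signatures is illustrative only.

```csharp
// Hypothetical usage sketch for Mirostat 2.0 sampling.
float tau = 5.0f;        // target cross-entropy (surprise)
float eta = 0.1f;        // learning rate for the mu update
float mu  = 2.0f * tau;  // maximum cross-entropy, initialised to 2 * tau

// mu is passed by reference and updated on every call based on the
// error between the target and observed surprisal.
int token = NativeApi.llama_sample_token_mirostat_v2(ctx, candidates, tau, eta, ref mu);
```

Because `mu` carries state between calls, the same variable should be reused across the whole generation loop rather than re-initialised per token.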
### **llama_sample_token_greedy(SafeLLamaContextHandle, LLamaTokenDataArray)**
Selects the token with the highest probability.
```csharp
public static int llama_sample_token_greedy(SafeLLamaContextHandle ctx, LLamaTokenDataArray candidates)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)<br>
`candidates` [LLamaTokenDataArray](./llama.native.llamatokendataarray.md)<br>
Pointer to LLamaTokenDataArray
#### Returns
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
### **llama_sample_token(SafeLLamaContextHandle, LLamaTokenDataArray)**
Randomly selects a token from the candidates based on their probabilities.
```csharp
public static int llama_sample_token(SafeLLamaContextHandle ctx, LLamaTokenDataArray candidates)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)<br>
`candidates` [LLamaTokenDataArray](./llama.native.llamatokendataarray.md)<br>
Pointer to LLamaTokenDataArray
#### Returns
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
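The two selection functions above are alternatives for the final step of a sampling pipeline. A hedged sketch, assuming `ctx` and `candidates` are prepared as elsewhere on this page (any earlier filtering such as top-k/top-p is not shown):

```csharp
// Hypothetical sketch: final token selection.
bool greedy = false;
int token = greedy
    ? NativeApi.llama_sample_token_greedy(ctx, candidates)  // argmax over probabilities
    : NativeApi.llama_sample_token(ctx, candidates);        // draw proportionally to p
```

Greedy selection is deterministic for a given context, while `llama_sample_token` draws randomly according to the candidates' probabilities.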

View File

@ -2,8 +2,14 @@
Namespace: LLama.OldVersion
#### Caution
The entire LLama.OldVersion namespace will be removed
---
```csharp
public class ChatCompletion : System.IEquatable`1[[LLama.OldVersion.ChatCompletion, LLamaSharp, Version=0.4.0.0, Culture=neutral, PublicKeyToken=null]]
public class ChatCompletion : System.IEquatable`1[[LLama.OldVersion.ChatCompletion, LLamaSharp, Version=0.5.0.0, Culture=neutral, PublicKeyToken=null]]
```
Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [ChatCompletion](./llama.oldversion.chatcompletion.md)<br>

View File

@ -2,8 +2,14 @@
Namespace: LLama.OldVersion
#### Caution
The entire LLama.OldVersion namespace will be removed
---
```csharp
public class ChatCompletionChoice : System.IEquatable`1[[LLama.OldVersion.ChatCompletionChoice, LLamaSharp, Version=0.4.0.0, Culture=neutral, PublicKeyToken=null]]
public class ChatCompletionChoice : System.IEquatable`1[[LLama.OldVersion.ChatCompletionChoice, LLamaSharp, Version=0.5.0.0, Culture=neutral, PublicKeyToken=null]]
```
Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [ChatCompletionChoice](./llama.oldversion.chatcompletionchoice.md)<br>

View File

@ -2,8 +2,14 @@
Namespace: LLama.OldVersion
#### Caution
The entire LLama.OldVersion namespace will be removed
---
```csharp
public class ChatCompletionChunk : System.IEquatable`1[[LLama.OldVersion.ChatCompletionChunk, LLamaSharp, Version=0.4.0.0, Culture=neutral, PublicKeyToken=null]]
public class ChatCompletionChunk : System.IEquatable`1[[LLama.OldVersion.ChatCompletionChunk, LLamaSharp, Version=0.5.0.0, Culture=neutral, PublicKeyToken=null]]
```
Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [ChatCompletionChunk](./llama.oldversion.chatcompletionchunk.md)<br>

View File

@ -2,8 +2,14 @@
Namespace: LLama.OldVersion
#### Caution
The entire LLama.OldVersion namespace will be removed
---
```csharp
public class ChatCompletionChunkChoice : System.IEquatable`1[[LLama.OldVersion.ChatCompletionChunkChoice, LLamaSharp, Version=0.4.0.0, Culture=neutral, PublicKeyToken=null]]
public class ChatCompletionChunkChoice : System.IEquatable`1[[LLama.OldVersion.ChatCompletionChunkChoice, LLamaSharp, Version=0.5.0.0, Culture=neutral, PublicKeyToken=null]]
```
Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [ChatCompletionChunkChoice](./llama.oldversion.chatcompletionchunkchoice.md)<br>

View File

@ -2,8 +2,14 @@
Namespace: LLama.OldVersion
#### Caution
The entire LLama.OldVersion namespace will be removed
---
```csharp
public class ChatCompletionChunkDelta : System.IEquatable`1[[LLama.OldVersion.ChatCompletionChunkDelta, LLamaSharp, Version=0.4.0.0, Culture=neutral, PublicKeyToken=null]]
public class ChatCompletionChunkDelta : System.IEquatable`1[[LLama.OldVersion.ChatCompletionChunkDelta, LLamaSharp, Version=0.5.0.0, Culture=neutral, PublicKeyToken=null]]
```
Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [ChatCompletionChunkDelta](./llama.oldversion.chatcompletionchunkdelta.md)<br>

View File

@ -2,8 +2,14 @@
Namespace: LLama.OldVersion
#### Caution
The entire LLama.OldVersion namespace will be removed
---
```csharp
public class ChatCompletionMessage : System.IEquatable`1[[LLama.OldVersion.ChatCompletionMessage, LLamaSharp, Version=0.4.0.0, Culture=neutral, PublicKeyToken=null]]
public class ChatCompletionMessage : System.IEquatable`1[[LLama.OldVersion.ChatCompletionMessage, LLamaSharp, Version=0.5.0.0, Culture=neutral, PublicKeyToken=null]]
```
Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [ChatCompletionMessage](./llama.oldversion.chatcompletionmessage.md)<br>

View File

@ -2,8 +2,14 @@
Namespace: LLama.OldVersion
#### Caution
The entire LLama.OldVersion namespace will be removed
---
```csharp
public class ChatMessageRecord : System.IEquatable`1[[LLama.OldVersion.ChatMessageRecord, LLamaSharp, Version=0.4.0.0, Culture=neutral, PublicKeyToken=null]]
public class ChatMessageRecord : System.IEquatable`1[[LLama.OldVersion.ChatMessageRecord, LLamaSharp, Version=0.5.0.0, Culture=neutral, PublicKeyToken=null]]
```
Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [ChatMessageRecord](./llama.oldversion.chatmessagerecord.md)<br>

View File

@ -2,6 +2,12 @@
Namespace: LLama.OldVersion
#### Caution
The entire LLama.OldVersion namespace will be removed
---
```csharp
public class ChatSession<T>
```
@ -78,7 +84,7 @@ public ChatSession<T> WithPromptFile(string promptFilename, string encoding)
### **WithAntiprompt(String[])**
Set the keyword to split the return value of chat AI.
Set the keywords used to split the return value of the chat AI.
```csharp
public ChatSession<T> WithAntiprompt(String[] antiprompt)

View File

@ -2,8 +2,14 @@
Namespace: LLama.OldVersion
#### Caution
The entire LLama.OldVersion namespace will be removed
---
```csharp
public class Completion : System.IEquatable`1[[LLama.OldVersion.Completion, LLamaSharp, Version=0.4.0.0, Culture=neutral, PublicKeyToken=null]]
public class Completion : System.IEquatable`1[[LLama.OldVersion.Completion, LLamaSharp, Version=0.5.0.0, Culture=neutral, PublicKeyToken=null]]
```
Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [Completion](./llama.oldversion.completion.md)<br>

View File

@ -2,8 +2,14 @@
Namespace: LLama.OldVersion
#### Caution
The entire LLama.OldVersion namespace will be removed
---
```csharp
public class CompletionChoice : System.IEquatable`1[[LLama.OldVersion.CompletionChoice, LLamaSharp, Version=0.4.0.0, Culture=neutral, PublicKeyToken=null]]
public class CompletionChoice : System.IEquatable`1[[LLama.OldVersion.CompletionChoice, LLamaSharp, Version=0.5.0.0, Culture=neutral, PublicKeyToken=null]]
```
Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [CompletionChoice](./llama.oldversion.completionchoice.md)<br>

View File

@ -2,8 +2,14 @@
Namespace: LLama.OldVersion
#### Caution
The entire LLama.OldVersion namespace will be removed
---
```csharp
public class CompletionChunk : System.IEquatable`1[[LLama.OldVersion.CompletionChunk, LLamaSharp, Version=0.4.0.0, Culture=neutral, PublicKeyToken=null]]
public class CompletionChunk : System.IEquatable`1[[LLama.OldVersion.CompletionChunk, LLamaSharp, Version=0.5.0.0, Culture=neutral, PublicKeyToken=null]]
```
Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [CompletionChunk](./llama.oldversion.completionchunk.md)<br>

View File

@ -2,8 +2,14 @@
Namespace: LLama.OldVersion
#### Caution
The entire LLama.OldVersion namespace will be removed
---
```csharp
public class CompletionLogprobs : System.IEquatable`1[[LLama.OldVersion.CompletionLogprobs, LLamaSharp, Version=0.4.0.0, Culture=neutral, PublicKeyToken=null]]
public class CompletionLogprobs : System.IEquatable`1[[LLama.OldVersion.CompletionLogprobs, LLamaSharp, Version=0.5.0.0, Culture=neutral, PublicKeyToken=null]]
```
Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [CompletionLogprobs](./llama.oldversion.completionlogprobs.md)<br>

View File

@ -2,8 +2,14 @@
Namespace: LLama.OldVersion
#### Caution
The entire LLama.OldVersion namespace will be removed
---
```csharp
public class CompletionUsage : System.IEquatable`1[[LLama.OldVersion.CompletionUsage, LLamaSharp, Version=0.4.0.0, Culture=neutral, PublicKeyToken=null]]
public class CompletionUsage : System.IEquatable`1[[LLama.OldVersion.CompletionUsage, LLamaSharp, Version=0.5.0.0, Culture=neutral, PublicKeyToken=null]]
```
Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [CompletionUsage](./llama.oldversion.completionusage.md)<br>

View File

@ -2,8 +2,14 @@
Namespace: LLama.OldVersion
#### Caution
The entire LLama.OldVersion namespace will be removed
---
```csharp
public class Embedding : System.IEquatable`1[[LLama.OldVersion.Embedding, LLamaSharp, Version=0.4.0.0, Culture=neutral, PublicKeyToken=null]]
public class Embedding : System.IEquatable`1[[LLama.OldVersion.Embedding, LLamaSharp, Version=0.5.0.0, Culture=neutral, PublicKeyToken=null]]
```
Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [Embedding](./llama.oldversion.embedding.md)<br>

View File

@ -2,8 +2,14 @@
Namespace: LLama.OldVersion
#### Caution
The entire LLama.OldVersion namespace will be removed
---
```csharp
public class EmbeddingData : System.IEquatable`1[[LLama.OldVersion.EmbeddingData, LLamaSharp, Version=0.4.0.0, Culture=neutral, PublicKeyToken=null]]
public class EmbeddingData : System.IEquatable`1[[LLama.OldVersion.EmbeddingData, LLamaSharp, Version=0.5.0.0, Culture=neutral, PublicKeyToken=null]]
```
Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [EmbeddingData](./llama.oldversion.embeddingdata.md)<br>

View File

@ -2,8 +2,14 @@
Namespace: LLama.OldVersion
#### Caution
The entire LLama.OldVersion namespace will be removed
---
```csharp
public class EmbeddingUsage : System.IEquatable`1[[LLama.OldVersion.EmbeddingUsage, LLamaSharp, Version=0.4.0.0, Culture=neutral, PublicKeyToken=null]]
public class EmbeddingUsage : System.IEquatable`1[[LLama.OldVersion.EmbeddingUsage, LLamaSharp, Version=0.5.0.0, Culture=neutral, PublicKeyToken=null]]
```
Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [EmbeddingUsage](./llama.oldversion.embeddingusage.md)<br>

View File

@ -2,6 +2,12 @@
Namespace: LLama.OldVersion
#### Caution
The entire LLama.OldVersion namespace will be removed
---
```csharp
public interface IChatModel
```

View File

@ -2,6 +2,12 @@
Namespace: LLama.OldVersion
#### Caution
The entire LLama.OldVersion namespace will be removed
---
```csharp
public class LLamaEmbedder : System.IDisposable
```

View File

@ -2,6 +2,12 @@
Namespace: LLama.OldVersion
#### Caution
The entire LLama.OldVersion namespace will be removed
---
```csharp
public class LLamaModel : IChatModel, System.IDisposable
```

View File

@ -2,6 +2,12 @@
Namespace: LLama.OldVersion
#### Caution
The entire LLama.OldVersion namespace will be removed
---
```csharp
public struct LLamaParams
```

View File

@ -1,101 +0,0 @@
# ResettableLLamaModel
Namespace: LLama
A LLamaModel that can be reset. Note that using this class consumes about 10% more memory.
```csharp
public class ResettableLLamaModel : LLamaModel, System.IDisposable
```
Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [LLamaModel](./llama.llamamodel.md) → [ResettableLLamaModel](./llama.resettablellamamodel.md)<br>
Implements [IDisposable](https://docs.microsoft.com/en-us/dotnet/api/system.idisposable)
## Properties
### **OriginalState**
The initial state of the model
```csharp
public Byte[] OriginalState { get; set; }
```
#### Property Value
[Byte[]](https://docs.microsoft.com/en-us/dotnet/api/system.byte)<br>
### **ContextSize**
The context size.
```csharp
public int ContextSize { get; }
```
#### Property Value
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
### **Params**
The model params set for this model.
```csharp
public ModelParams Params { get; set; }
```
#### Property Value
[ModelParams](./llama.common.modelparams.md)<br>
### **NativeHandle**
The native handle, which is passed to the native APIs. Avoid using it
unless you understand how the native APIs are used.
```csharp
public SafeLLamaContextHandle NativeHandle { get; }
```
#### Property Value
[SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)<br>
### **Encoding**
The encoding set for this model to deal with text input.
```csharp
public Encoding Encoding { get; }
```
#### Property Value
[Encoding](https://docs.microsoft.com/en-us/dotnet/api/system.text.encoding)<br>
## Constructors
### **ResettableLLamaModel(ModelParams, String)**
```csharp
public ResettableLLamaModel(ModelParams Params, string encoding)
```
#### Parameters
`Params` [ModelParams](./llama.common.modelparams.md)<br>
`encoding` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
## Methods
### **Reset()**
Reset the state to the initial state.
```csharp
public void Reset()
```

View File

@ -13,17 +13,17 @@ Implements [ILLamaExecutor](./llama.abstractions.illamaexecutor.md)
## Properties
### **Model**
### **Context**
The mode used by the executor.
The context used by the executor.
```csharp
public LLamaModel Model { get; }
public LLamaContext Context { get; }
```
#### Property Value
[LLamaModel](./llama.llamamodel.md)<br>
[LLamaContext](./llama.llamacontext.md)<br>
## Methods
@ -111,17 +111,17 @@ protected abstract void PreprocessInputs(string text, InferStateArgs args)
`args` [InferStateArgs](./llama.statefulexecutorbase.inferstateargs.md)<br>
### **PostProcess(InferenceParams, InferStateArgs, IEnumerable`1&)**
### **PostProcess(IInferenceParams, InferStateArgs, IEnumerable`1&)**
Do some post processing after the inference.
```csharp
protected abstract bool PostProcess(InferenceParams inferenceParams, InferStateArgs args, IEnumerable`1& extraOutputs)
protected abstract bool PostProcess(IInferenceParams inferenceParams, InferStateArgs args, IEnumerable`1& extraOutputs)
```
#### Parameters
`inferenceParams` [InferenceParams](./llama.common.inferenceparams.md)<br>
`inferenceParams` [IInferenceParams](./llama.abstractions.iinferenceparams.md)<br>
`args` [InferStateArgs](./llama.statefulexecutorbase.inferstateargs.md)<br>
@ -131,17 +131,17 @@ protected abstract bool PostProcess(InferenceParams inferenceParams, InferStateA
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)<br>
### **InferInternal(InferenceParams, InferStateArgs)**
### **InferInternal(IInferenceParams, InferStateArgs)**
The core inference logic.
```csharp
protected abstract void InferInternal(InferenceParams inferenceParams, InferStateArgs args)
protected abstract void InferInternal(IInferenceParams inferenceParams, InferStateArgs args)
```
#### Parameters
`inferenceParams` [InferenceParams](./llama.common.inferenceparams.md)<br>
`inferenceParams` [IInferenceParams](./llama.abstractions.iinferenceparams.md)<br>
`args` [InferStateArgs](./llama.statefulexecutorbase.inferstateargs.md)<br>
@ -193,19 +193,19 @@ public abstract void LoadState(string filename)
`filename` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
### **Infer(String, InferenceParams, CancellationToken)**
### **Infer(String, IInferenceParams, CancellationToken)**
Execute the inference.
```csharp
public IEnumerable<string> Infer(string text, InferenceParams inferenceParams, CancellationToken cancellationToken)
public IEnumerable<string> Infer(string text, IInferenceParams inferenceParams, CancellationToken cancellationToken)
```
#### Parameters
`text` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
`inferenceParams` [InferenceParams](./llama.common.inferenceparams.md)<br>
`inferenceParams` [IInferenceParams](./llama.abstractions.iinferenceparams.md)<br>
`cancellationToken` [CancellationToken](https://docs.microsoft.com/en-us/dotnet/api/system.threading.cancellationtoken)<br>
@ -213,19 +213,19 @@ public IEnumerable<string> Infer(string text, InferenceParams inferenceParams, C
[IEnumerable&lt;String&gt;](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.ienumerable-1)<br>
### **InferAsync(String, InferenceParams, CancellationToken)**
### **InferAsync(String, IInferenceParams, CancellationToken)**
Execute the inference asynchronously.
```csharp
public IAsyncEnumerable<string> InferAsync(string text, InferenceParams inferenceParams, CancellationToken cancellationToken)
public IAsyncEnumerable<string> InferAsync(string text, IInferenceParams inferenceParams, CancellationToken cancellationToken)
```
#### Parameters
`text` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
`inferenceParams` [InferenceParams](./llama.common.inferenceparams.md)<br>
`inferenceParams` [IInferenceParams](./llama.abstractions.iinferenceparams.md)<br>
`cancellationToken` [CancellationToken](https://docs.microsoft.com/en-us/dotnet/api/system.threading.cancellationtoken)<br>
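A sketch of driving a `StatefulExecutorBase` subclass through the public `Infer` API. It assumes an `InteractiveExecutor` can be constructed from a `LLamaContext` (its constructor is not shown on this page) and that `context` and `inferenceParams` were created elsewhere:

```csharp
// Hypothetical usage sketch for a stateful executor.
var executor = new InteractiveExecutor(context);

// Infer yields the generated text piece by piece.
foreach (var piece in executor.Infer("Hello, ", inferenceParams, CancellationToken.None))
{
    Console.Write(piece);
}
```

Because the executor is stateful, subsequent `Infer` calls continue the same conversation rather than starting from scratch.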

View File

@ -14,46 +14,65 @@ Implements [ILLamaExecutor](./llama.abstractions.illamaexecutor.md)
## Properties
### **Model**
### **Context**
The mode used by the executor when running the inference.
The context used by the executor when running the inference.
```csharp
public LLamaModel Model { get; }
public LLamaContext Context { get; private set; }
```
#### Property Value
[LLamaModel](./llama.llamamodel.md)<br>
[LLamaContext](./llama.llamacontext.md)<br>
## Constructors
### **StatelessExecutor(LLamaModel)**
### **StatelessExecutor(LLamaWeights, IModelParams)**
Create a new stateless executor which will use the given model
```csharp
public StatelessExecutor(LLamaModel model)
public StatelessExecutor(LLamaWeights weights, IModelParams params)
```
#### Parameters
`model` [LLamaModel](./llama.llamamodel.md)<br>
The LLama model.
`weights` [LLamaWeights](./llama.llamaweights.md)<br>
`params` [IModelParams](./llama.abstractions.imodelparams.md)<br>
### **StatelessExecutor(LLamaContext)**
#### Caution
Use the constructor which automatically creates contexts using the LLamaWeights
---
Create a new stateless executor which will use the model used to create the given context
```csharp
public StatelessExecutor(LLamaContext context)
```
#### Parameters
`context` [LLamaContext](./llama.llamacontext.md)<br>
## Methods
### **Infer(String, InferenceParams, CancellationToken)**
### **Infer(String, IInferenceParams, CancellationToken)**
```csharp
public IEnumerable<string> Infer(string text, InferenceParams inferenceParams, CancellationToken cancellationToken)
public IEnumerable<string> Infer(string text, IInferenceParams inferenceParams, CancellationToken cancellationToken)
```
#### Parameters
`text` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
`inferenceParams` [InferenceParams](./llama.common.inferenceparams.md)<br>
`inferenceParams` [IInferenceParams](./llama.abstractions.iinferenceparams.md)<br>
`cancellationToken` [CancellationToken](https://docs.microsoft.com/en-us/dotnet/api/system.threading.cancellationtoken)<br>
@ -61,19 +80,19 @@ public IEnumerable<string> Infer(string text, InferenceParams inferenceParams, C
[IEnumerable&lt;String&gt;](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.ienumerable-1)<br>
### **InferAsync(String, InferenceParams, CancellationToken)**
### **InferAsync(String, IInferenceParams, CancellationToken)**
```csharp
public IAsyncEnumerable<string> InferAsync(string text, InferenceParams inferenceParams, CancellationToken token)
public IAsyncEnumerable<string> InferAsync(string text, IInferenceParams inferenceParams, CancellationToken cancellationToken)
```
#### Parameters
`text` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
`inferenceParams` [InferenceParams](./llama.common.inferenceparams.md)<br>
`inferenceParams` [IInferenceParams](./llama.abstractions.iinferenceparams.md)<br>
`token` [CancellationToken](https://docs.microsoft.com/en-us/dotnet/api/system.threading.cancellationtoken)<br>
`cancellationToken` [CancellationToken](https://docs.microsoft.com/en-us/dotnet/api/system.threading.cancellationtoken)<br>
#### Returns
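A sketch of the new (0.5.0) construction path, following the `LLamaWeights` constructor above. The model path is a placeholder, and the `ModelParams`/`InferenceParams` defaults are assumed to be usable as-is:

```csharp
// Hypothetical usage sketch for StatelessExecutor (requires an async context).
var modelParams = new ModelParams("<path-to-model>.gguf");
using var weights = LLamaWeights.LoadFromFile(modelParams);
var executor = new StatelessExecutor(weights, modelParams);

await foreach (var piece in executor.InferAsync(
    "Question: what is a llama?\nAnswer: ",
    new InferenceParams(),
    CancellationToken.None))
{
    Console.Write(piece);
}
```

Unlike the stateful executors, each `Infer`/`InferAsync` call here starts from a fresh context, so no conversation state is carried between calls.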


docs/xmldocs/llama.utils.md Normal file
View File

@ -0,0 +1,157 @@
# Utils
Namespace: LLama
Assorted llama utilities
```csharp
public static class Utils
```
Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [Utils](./llama.utils.md)
## Methods
### **InitLLamaContextFromModelParams(IModelParams)**
#### Caution
Use LLamaWeights.LoadFromFile and LLamaWeights.CreateContext instead
---
```csharp
public static SafeLLamaContextHandle InitLLamaContextFromModelParams(IModelParams params)
```
#### Parameters
`params` [IModelParams](./llama.abstractions.imodelparams.md)<br>
#### Returns
[SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)<br>
### **Tokenize(SafeLLamaContextHandle, String, Boolean, Encoding)**
#### Caution
Use SafeLLamaContextHandle Tokenize method instead
---
```csharp
public static IEnumerable<int> Tokenize(SafeLLamaContextHandle ctx, string text, bool add_bos, Encoding encoding)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)<br>
`text` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
`add_bos` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)<br>
`encoding` [Encoding](https://docs.microsoft.com/en-us/dotnet/api/system.text.encoding)<br>
#### Returns
[IEnumerable&lt;Int32&gt;](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.ienumerable-1)<br>
### **GetLogits(SafeLLamaContextHandle, Int32)**
#### Caution
Use SafeLLamaContextHandle GetLogits method instead
---
```csharp
public static Span<float> GetLogits(SafeLLamaContextHandle ctx, int length)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)<br>
`length` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
#### Returns
[Span&lt;Single&gt;](https://docs.microsoft.com/en-us/dotnet/api/system.span-1)<br>
### **Eval(SafeLLamaContextHandle, Int32[], Int32, Int32, Int32, Int32)**
#### Caution
Use SafeLLamaContextHandle Eval method instead
---
```csharp
public static int Eval(SafeLLamaContextHandle ctx, Int32[] tokens, int startIndex, int n_tokens, int n_past, int n_threads)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)<br>
`tokens` [Int32[]](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
`startIndex` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
`n_tokens` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
`n_past` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
`n_threads` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
#### Returns
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
### **TokenToString(Int32, SafeLLamaContextHandle, Encoding)**
#### Caution
Use SafeLLamaContextHandle TokenToString method instead
---
```csharp
public static string TokenToString(int token, SafeLLamaContextHandle ctx, Encoding encoding)
```
#### Parameters
`token` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)<br>
`encoding` [Encoding](https://docs.microsoft.com/en-us/dotnet/api/system.text.encoding)<br>
#### Returns
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
### **PtrToString(IntPtr, Encoding)**
#### Caution
No longer used internally by LlamaSharp
---
```csharp
public static string PtrToString(IntPtr ptr, Encoding encoding)
```
#### Parameters
`ptr` [IntPtr](https://docs.microsoft.com/en-us/dotnet/api/system.intptr)<br>
`encoding` [Encoding](https://docs.microsoft.com/en-us/dotnet/api/system.text.encoding)<br>
#### Returns
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
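All of the `Utils` members above are deprecated, each Caution note naming its replacement. A migration sketch for `Tokenize`; the exact signature of the replacement instance method on `SafeLLamaContextHandle` is not shown on this page, so the new call shape is an assumption:

```csharp
// Old (deprecated) static helper:
IEnumerable<int> tokens = Utils.Tokenize(ctx, "Hello world", add_bos: true, Encoding.UTF8);

// New: the equivalent instance method on SafeLLamaContextHandle
// (hypothetical call shape, per the Caution note above).
var tokens2 = ctx.Tokenize("Hello world", true, Encoding.UTF8);
```

The other members follow the same pattern: `GetLogits` and `Eval` move onto `SafeLLamaContextHandle`, and context creation moves to `LLamaWeights.LoadFromFile` plus `LLamaWeights.CreateContext`.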

View File

@ -5,12 +5,12 @@ nav:
- Architecture: Architecture.md
- Tricks for FAQ: Tricks.md
- Contributing Guide: ContributingGuide.md
- LLamaModel:
- Model Parameters: LLamaModel/parameters.md
- Tokenization: LLamaModel/tokenization.md
- Get Embeddings: LLamaModel/embeddings.md
- Quantization: LLamaModel/quantization.md
- Save/Load State: LLamaModel/save-load-state.md
- LLamaContext:
- Context Parameters: LLamaContext/parameters.md
- Tokenization: LLamaContext/tokenization.md
- Get Embeddings: LLamaContext/embeddings.md
- Quantization: LLamaContext/quantization.md
- Save/Load State: LLamaContext/save-load-state.md
- LLamaExecutors:
- Inference Parameters: LLamaExecutors/parameters.md
- Text-to-Text APIs: LLamaExecutors/text-to-text-apis.md
@ -24,6 +24,7 @@ nav:
- Chinese: NonEnglishUsage/Chinese.md
- High-level Applications:
- BotSharp: HighLevelApps/bot-sharp.md
- semantic-kernel: HighLevelApps/semantic-kernel.md
- More:
- Logger: More/log.md
- Examples:
@ -39,7 +40,9 @@ nav:
- API Reference:
- index: ./xmldocs/index.md
- llama.abstractions.ihistorytransform: ./xmldocs/llama.abstractions.ihistorytransform.md
- llama.abstractions.iinferenceparams: ./xmldocs/llama.abstractions.iinferenceparams.md
- llama.abstractions.illamaexecutor: ./xmldocs/llama.abstractions.illamaexecutor.md
- llama.abstractions.imodelparams: ./xmldocs/llama.abstractions.imodelparams.md
- llama.abstractions.itextstreamtransform: ./xmldocs/llama.abstractions.itextstreamtransform.md
- llama.abstractions.itexttransform: ./xmldocs/llama.abstractions.itexttransform.md
- llama.chatsession: ./xmldocs/llama.chatsession.md
@ -49,24 +52,44 @@ nav:
- llama.common.illamalogger: ./xmldocs/llama.common.illamalogger.md
- llama.common.inferenceparams: ./xmldocs/llama.common.inferenceparams.md
- llama.common.llamadefaultlogger: ./xmldocs/llama.common.llamadefaultlogger.md
- llama.common.mirostatetype: ./xmldocs/llama.common.mirostatetype.md
- llama.common.mirostattype: ./xmldocs/llama.common.mirostattype.md
- llama.common.modelparams: ./xmldocs/llama.common.modelparams.md
- llama.exceptions.grammarexpectedname: ./xmldocs/llama.exceptions.grammarexpectedname.md
- llama.exceptions.grammarexpectednext: ./xmldocs/llama.exceptions.grammarexpectednext.md
- llama.exceptions.grammarexpectedprevious: ./xmldocs/llama.exceptions.grammarexpectedprevious.md
- llama.exceptions.grammarformatexception: ./xmldocs/llama.exceptions.grammarformatexception.md
- llama.exceptions.grammarunexpectedcharaltelement: ./xmldocs/llama.exceptions.grammarunexpectedcharaltelement.md
- llama.exceptions.grammarunexpectedcharrngelement: ./xmldocs/llama.exceptions.grammarunexpectedcharrngelement.md
- llama.exceptions.grammarunexpectedendelement: ./xmldocs/llama.exceptions.grammarunexpectedendelement.md
- llama.exceptions.grammarunexpectedendofinput: ./xmldocs/llama.exceptions.grammarunexpectedendofinput.md
- llama.exceptions.grammarunexpectedhexcharscount: ./xmldocs/llama.exceptions.grammarunexpectedhexcharscount.md
- llama.exceptions.grammarunknownescapecharacter: ./xmldocs/llama.exceptions.grammarunknownescapecharacter.md
- llama.exceptions.runtimeerror: ./xmldocs/llama.exceptions.runtimeerror.md
- llama.extensions.dictionaryextension: ./xmldocs/llama.extensions.dictionaryextension.md
- llama.extensions.imodelparamsextensions: ./xmldocs/llama.extensions.imodelparamsextensions.md
- llama.extensions.keyvaluepairextensions: ./xmldocs/llama.extensions.keyvaluepairextensions.md
- llama.grammars.grammar: ./xmldocs/llama.grammars.grammar.md
- llama.grammars.grammarrule: ./xmldocs/llama.grammars.grammarrule.md
- llama.instructexecutor: ./xmldocs/llama.instructexecutor.md
- llama.interactiveexecutor: ./xmldocs/llama.interactiveexecutor.md
- llama.llamacontext: ./xmldocs/llama.llamacontext.md
- llama.llamaembedder: ./xmldocs/llama.llamaembedder.md
- llama.llamamodel: ./xmldocs/llama.llamamodel.md
- llama.llamaquantizer: ./xmldocs/llama.llamaquantizer.md
- llama.llamatransforms: ./xmldocs/llama.llamatransforms.md
- llama.llamaweights: ./xmldocs/llama.llamaweights.md
- llama.native.llamacontextparams: ./xmldocs/llama.native.llamacontextparams.md
- llama.native.llamaftype: ./xmldocs/llama.native.llamaftype.md
- llama.native.llamagrammarelement: ./xmldocs/llama.native.llamagrammarelement.md
- llama.native.llamagrammarelementtype: ./xmldocs/llama.native.llamagrammarelementtype.md
- llama.native.llamamodelquantizeparams: ./xmldocs/llama.native.llamamodelquantizeparams.md
- llama.native.llamatokendata: ./xmldocs/llama.native.llamatokendata.md
- llama.native.llamatokendataarray: ./xmldocs/llama.native.llamatokendataarray.md
- llama.native.llamatokendataarraynative: ./xmldocs/llama.native.llamatokendataarraynative.md
- llama.native.nativeapi: ./xmldocs/llama.native.nativeapi.md
- llama.native.safellamacontexthandle: ./xmldocs/llama.native.safellamacontexthandle.md
- llama.native.safellamagrammarhandle: ./xmldocs/llama.native.safellamagrammarhandle.md
- llama.native.safellamahandlebase: ./xmldocs/llama.native.safellamahandlebase.md
- llama.native.safellamamodelhandle: ./xmldocs/llama.native.safellamamodelhandle.md
- llama.native.samplingapi: ./xmldocs/llama.native.samplingapi.md
- llama.oldversion.chatcompletion: ./xmldocs/llama.oldversion.chatcompletion.md
- llama.oldversion.chatcompletionchoice: ./xmldocs/llama.oldversion.chatcompletionchoice.md
- llama.oldversion.chatcompletionchunk: ./xmldocs/llama.oldversion.chatcompletionchunk.md
@ -88,9 +111,9 @@ nav:
- llama.oldversion.llamaembedder: ./xmldocs/llama.oldversion.llamaembedder.md
- llama.oldversion.llamamodel: ./xmldocs/llama.oldversion.llamamodel.md
- llama.oldversion.llamaparams: ./xmldocs/llama.oldversion.llamaparams.md
- llama.resettablellamamodel: ./xmldocs/llama.resettablellamamodel.md
- llama.statefulexecutorbase: ./xmldocs/llama.statefulexecutorbase.md
- llama.statelessexecutor: ./xmldocs/llama.statelessexecutor.md
- llama.utils: ./xmldocs/llama.utils.md
theme:
name: material