docs: refactor the documentation.

This commit is contained in:
Rinne 2024-03-31 20:44:29 +08:00
parent 7e53bac1f0
commit b9444452eb
No known key found for this signature in database
GPG Key ID: E86D01E1809BD23E
193 changed files with 11935 additions and 5971 deletions

Binary file not shown.

View File

@ -3,6 +3,7 @@ using LLama.Common;
namespace LLama.Examples.Examples;
// This example shows how to deal with Chinese input with gb2312 encoding.
public class ChatChineseGB2312
{
private static string ConvertEncoding(string input, Encoding original, Encoding target)

View File

@ -2,6 +2,8 @@
namespace LLama.Examples.Examples;
// When using chatsession, it's a common case that you want to strip the role names
// rather than display them. This example shows how to use transforms to strip them.
public class ChatSessionStripRoleName
{
public static async Task Run()

View File

@ -2,6 +2,7 @@
namespace LLama.Examples.Examples
{
// This example shows how to use InstructExecutor to generate the response.
public class InstructModeExecute
{
public static async Task Run()

View File

@ -2,6 +2,7 @@
namespace LLama.Examples.Examples
{
// This is an example which shows how to chat with LLM with InteractiveExecutor.
public class InteractiveModeExecute
{
public static async Task Run()

View File

@ -5,6 +5,8 @@ using Spectre.Console;
namespace LLama.Examples.Examples
{
// This example shows how to chat with LLaVA model with both image and text as input.
// It uses the interactive executor to inference.
public class LlavaInteractiveModeExecute
{
public static async Task Run()

View File

@ -2,6 +2,7 @@
namespace LLama.Examples.Examples
{
// This example shows how to save/load state of the executor.
public class LoadAndSaveState
{
public static async Task Run()

View File

@ -82,7 +82,7 @@ The following examples show how to build APPs with LLamaSharp.
To achieve high performance, LLamaSharp interacts with a native library compiled from C++, which is called the `backend`. We provide backend packages for Windows, Linux and macOS with CPU, CUDA, Metal and OpenCL support. You **don't** need to deal with any C++ code yourself; just install the backend packages.
If no published backend match your device, please open an issue to let us know. If compiling c++ code is not difficult for you, you could also follow [this guide]() to compile a backend and run LLamaSharp with it.
If no published backend matches your device, please open an issue to let us know. If compiling C++ code is not difficult for you, you could also follow [this guide](./docs/ContributingGuide.md) to compile a backend and run LLamaSharp with it.
1. Install [LLamaSharp](https://www.nuget.org/packages/LLamaSharp) package on NuGet:
@ -102,7 +102,7 @@ PM> Install-Package LLamaSharp
### Model preparation
There are two popular format of model file of LLM now, which are PyTorch format (.bin) and Huggingface format (.safetensors). LLamaSharp uses `GGUF` format file, which could be converted from these two formats. To get `GGUF` file, there are two options:
There are two popular formats of model files for LLMs now: the PyTorch format (.pth) and the Huggingface format (.bin). LLamaSharp uses a `GGUF` format file, which can be converted from these two formats. To get a `GGUF` file, there are two options:
1. Search the model name + 'gguf' on [Huggingface](https://huggingface.co); you will find lots of model files that have already been converted to GGUF format. Please pay attention to their publishing time, because some old ones may only work with old versions of LLamaSharp.

View File

@ -2,22 +2,14 @@
## Architecture of main functions
The figure below shows the core framework structure, which is separated to four levels.
The figure below shows the core framework structure of LLamaSharp.
- **LLamaContext**: The holder of a model which directly interact with native library and provide some basic APIs such as tokenization and embedding. Currently it includes three classes: `LLamaContext`, `LLamaEmbedder` and `LLamaQuantizer`.
- **LLamaExecutors**: Executors which define the way to run the LLama model. It provides text-to-text APIs to make it easy to use. Currently we provide three kinds of executors: `InteractiveExecutor`, `InstructExecutor` and `StatelessExecutor`.
- **Native APIs**: LLamaSharp calls the exported C APIs to load and run the model. The APIs defined in LLamaSharp specially for calling C APIs are named `Native APIs`. We have made all the native APIs public under namespace `LLama.Native`. However, it's strongly recommended not to use them unless you know what you are doing.
- **LLamaWeights**: The holder of the model weight.
- **LLamaContext**: A context which directly interacts with the native library and provides some basic APIs such as tokenization and embedding. It makes use of `LLamaWeights`.
- **LLamaExecutors**: Executors which define the way to run the LLama model. It provides text-to-text and image-to-text APIs to make it easy to use. Currently we provide four kinds of executors: `InteractiveExecutor`, `InstructExecutor`, `StatelessExecutor` and `BatchedExecutor`.
- **ChatSession**: A wrapper around `InteractiveExecutor` and `LLamaContext`, which supports interactive tasks and saving/re-loading sessions. It also provides a flexible way to customize text processing with `IHistoryTransform`, `ITextTransform` and `ITextStreamTransform`.
- **High-level Applications**: Some applications that provides higher-level integration. For example, [BotSharp](https://github.com/SciSharp/BotSharp) provides integration for vector search, Chatbot UI and Web APIs. [semantic-kernel](https://github.com/microsoft/semantic-kernel) provides various APIs for manipulations related with LLM. If you've made an integration, please tell us and add it to the doc!
- **Integrations**: Integrations with other libraries to expand the applications of LLamaSharp. For example, if you want to do RAG ([Retrieval Augmented Generation](https://en.wikipedia.org/wiki/Prompt_engineering#Retrieval-augmented_generation)), the kernel-memory integration is a good option for you. A short sketch of how the core pieces above fit together follows the figure below.
![structure_image](media/structure.jpg)
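The snippet below is a minimal sketch of how these pieces typically fit together, based on the example code elsewhere in these docs; the model path and parameter values are placeholders.

```cs
using LLama;
using LLama.Common;

// Load the weights once, then create a context from them.
var parameters = new ModelParams("<your model path>") { ContextSize = 1024 };
using var model = LLamaWeights.LoadFromFile(parameters);
using var context = model.CreateContext(parameters);

// Wrap the context in an executor, and the executor in a ChatSession.
var executor = new InteractiveExecutor(context);
var session = new ChatSession(executor);

// Chat with the model through the session.
await foreach (var text in session.ChatAsync(
    new ChatHistory.Message(AuthorRole.User, "Hello, who are you?"),
    new InferenceParams { Temperature = 0.6f, AntiPrompts = new List<string> { "User:" } }))
{
    Console.Write(text);
}
```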
## Recommended Use
Since `LLamaContext` interacts with the native library, it's not recommended to use its methods directly unless you know what you are doing. The same applies to `NativeApi`, which is not included in the architecture figure above.
`ChatSession` is recommended when you want to build an application similar to ChatGPT, or a chatbot, because it works best with `InteractiveExecutor`. Though other executors can also be passed as a parameter to initialize a `ChatSession`, it's not encouraged if you are new to LLamaSharp and LLMs.
High-level applications, such as BotSharp, are best used when you want to concentrate on the parts not related to the LLM itself. For example, if you want to deploy a chat bot to help you remember your schedule, using BotSharp may be a good choice.
Note that the APIs of the high-level applications may not be stable yet. Please take this into account when using them.

View File

@ -1,36 +0,0 @@
# Basic usages of ChatSession
`ChatSession` is a higher-level abstraction than the executors. In the context of a chat application like ChatGPT, a "chat session" refers to an interactive conversation or exchange of messages between the user and the chatbot. It represents a continuous flow of communication where the user enters input or asks questions, and the chatbot responds accordingly. A chat session typically starts when the user initiates a conversation with the chatbot and continues until the interaction comes to a natural end or is explicitly terminated by either the user or the system. During a chat session, the chatbot maintains the context of the conversation, remembers previous messages, and generates appropriate responses based on the user's inputs and the ongoing dialogue.
## Initialize a session
Currently, the only parameter that is accepted is an `ILLamaExecutor`, because this is the only parameter that we're sure will exist in all future versions. Since it's a high-level abstraction, we're conservative about the API design. In the future, more kinds of constructors may be added.
```cs
InteractiveExecutor ex = new(new LLamaModel(new ModelParams(modelPath)));
ChatSession session = new ChatSession(ex);
```
## Chat with the bot
There are two kinds of input accepted by the `Chat` API: `ChatHistory` and `String`. The API with a string is quite similar to that of the executors. Meanwhile, the API with `ChatHistory` is aimed at providing more flexible usage. For example, suppose you had a chat with the bot in session A before you opened session B. Session B has no memory of what you said before, so you can feed the history of A to B; a sketch of this is shown after the example below.
```cs
string prompt = "What is C#?";
await foreach (var text in session.ChatAsync(prompt, new InferenceParams() { Temperature = 0.6f, AntiPrompts = new List<string> { "User:" } })) // the inference params should be changed depending on your statement
{
Console.Write(text);
}
```
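Below is a minimal sketch of the `ChatHistory` variant described above, assuming a `ChatAsync` overload that accepts a `ChatHistory`; the history contents here are hypothetical placeholders.

```cs
// Build (or load) the history from a previous session A. The messages here are placeholders.
var historyFromSessionA = new ChatHistory();
historyFromSessionA.AddMessage(AuthorRole.User, "What is C#?");
historyFromSessionA.AddMessage(AuthorRole.Assistant, "C# is a programming language developed by Microsoft.");

// Feed the whole history to session B instead of a plain string.
await foreach (var text in session.ChatAsync(historyFromSessionA,
    new InferenceParams() { Temperature = 0.6f, AntiPrompts = new List<string> { "User:" } }))
{
    Console.Write(text);
}
```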
## Get the history
Currently `History` is a property of `ChatSession`.
```cs
foreach(var rec in session.History.Messages)
{
Console.WriteLine($"{rec.AuthorRole}: {rec.Content}");
}
```

View File

@ -1,14 +0,0 @@
# Save/Load Chat Session
Generally, the chat session may need to be switched, which requires the ability to load and save sessions.
When building a chat bot app, it's **NOT encouraged** to initialize many chat sessions and keep them in memory waiting to be switched to, because the memory consumption of both CPU and GPU is expensive. It's recommended to save the current session before switching to a new one, and to load the file when switching back.
The API is also quite simple: the files will be saved into a directory you specify. If the path does not exist, a new directory will be created.
```cs
string savePath = "<save dir>";
session.SaveSession(savePath);
session.LoadSession(savePath);
```

View File

@ -2,21 +2,65 @@
Hi, welcome to developing LLamaSharp together with us! We are always open to every contributor and any form of contribution! If you want to actively help maintain this library, please contact us to get write access after some PRs. (Email: AsakusaRinne@gmail.com)
In this page, we'd like to introduce how to make contributions here easily. 😊
On this page, we introduce how to make contributions easily. 😊
## Compile the native library from source
## The goal of LLamaSharp
Firstly, please clone the [llama.cpp](https://github.com/ggerganov/llama.cpp) repository and following the instructions in [llama.cpp readme](https://github.com/ggerganov/llama.cpp#build) to configure your local environment.
At the beginning, LLamaSharp was a C# binding of [llama.cpp](https://github.com/ggerganov/llama.cpp). It provided only some wrappers for llama.cpp so that C#/.NET users could run LLM models on their local devices efficiently, even without any experience with C++. After around a year of development, more tools and integrations have been added to LLamaSharp, significantly expanding its applications. Though llama.cpp is still the only backend of LLamaSharp, the goal of this repository is to be an efficient and easy-to-use library for LLM inference, rather than just a binding of llama.cpp.
If you want to support cublas in the compilation, please make sure that you've installed the cuda.
Accordingly, our development of LLamaSharp is divided into two main directions:
When building from source, please add `-DBUILD_SHARED_LIBS=ON` to the cmake instruction. For example, when building with cublas but without openblas, use the following instruction:
1. To make LLamaSharp more efficient. For example, `BatchedExecutor` can accept multiple queries and generate the responses for them at the same time, which significantly improves throughput. This part is closely related to the native APIs and executors in LLamaSharp.
2. To make LLamaSharp easier to use. We believe the best library lets users build powerful functionality with simple code. Higher-level APIs and integrations with other libraries are the key points here.
```bash
cmake .. -DLLAMA_CUBLAS=ON -DBUILD_SHARED_LIBS=ON
## How to compile the native library from source
If you want to contribute to the first direction of our goal, you may need to compile the native library yourself.
Firstly, please follow the instructions in the [llama.cpp readme](https://github.com/ggerganov/llama.cpp#build) to configure your local environment. Most importantly, a CMake version higher than 3.14 should be installed on your device.
Secondly, clone the llama.cpp repository. You could manually clone it and check out the right commit according to the [Map of LLamaSharp and llama.cpp versions](https://github.com/SciSharp/LLamaSharp?tab=readme-ov-file#map-of-llamasharp-and-llama.cpp-versions), or clone it as a submodule of LLamaSharp when cloning LLamaSharp.
```shell
git clone --recursive https://github.com/SciSharp/LLamaSharp.git
```
After running `cmake --build . --config Release`, you could find the `llama.dll`, `llama.so` or `llama.dylib` in your build directory. After pasting it to `LLamaSharp/LLama/runtimes` , you can use it as the native library in LLamaSharp.
If you want to enable cuBLAS support in the compilation, please make sure that you've installed CUDA. If you are using an Intel CPU, please check the highest AVX ([Advanced Vector Extensions](https://en.wikipedia.org/wiki/Advanced_Vector_Extensions)) level that is supported by your device.
As shown in [llama.cpp cmake file](https://github.com/ggerganov/llama.cpp/blob/master/CMakeLists.txt), there are many options that could be enabled or disabled when building the library. The following ones are commonly used when using it as a native library of LLamaSharp.
```cpp
option(BUILD_SHARED_LIBS "build shared libraries") // Please always enable it
option(LLAMA_NATIVE "llama: enable -march=native flag") // Could be disabled
option(LLAMA_AVX "llama: enable AVX") // Enable it if the highest supported avx level is AVX
option(LLAMA_AVX2 "llama: enable AVX2") // Enable it if the highest supported avx level is AVX2
option(LLAMA_AVX512 "llama: enable AVX512") // Enable it if the highest supported avx level is AVX512
option(LLAMA_BLAS "llama: use BLAS") // Enable it if you want to use a BLAS library to accelerate the computation on CPU
option(LLAMA_CUDA "llama: use CUDA") // Enable it if you have a CUDA device
option(LLAMA_CLBLAST "llama: use CLBlast") // Enable it if you have a device with CLBlast or OpenCL support, for example, some AMD GPUs.
option(LLAMA_VULKAN "llama: use Vulkan") // Enable it if you have a device with Vulkan support
option(LLAMA_METAL "llama: use Metal") // Enable it if you are using a Mac with a Metal device.
option(LLAMA_BUILD_TESTS "llama: build tests") // Please disable it.
option(LLAMA_BUILD_EXAMPLES "llama: build examples") // Please disable it.
option(LLAMA_BUILD_SERVER "llama: build server example")// Please disable it.
```
Most importantly, `-DBUILD_SHARED_LIBS=ON` must be added to the cmake command, while the other options depend on your needs. For example, when building with cuBLAS but without OpenBLAS, use the following command:
```bash
mkdir build && cd build
cmake .. -DLLAMA_CUBLAS=ON -DBUILD_SHARED_LIBS=ON
cmake --build . --config Release
```
Now you could find the `llama.dll`, `libllama.so` or `llama.dylib` in your build directory (or `build/bin`).
To load the compiled native library, please add the following code to the very beginning of your code.
```cs
NativeLibraryConfig.Instance.WithLibrary("<Your native library path>");
```
## Add a new feature to LLamaSharp
@ -24,7 +68,7 @@ After running `cmake --build . --config Release`, you could find the `llama.dll`
After refactoring the framework in `v0.4.0`, LLamaSharp will try to maintain the backward compatibility. However, in the following cases a breaking change will be required:
1. Due to some breaking changes in [llama.cpp](https://github.com/ggerganov/llama.cpp), making a breaking change will help to maintain a good abstraction and user-friendly APIs.
2. A very important feature cannot be implemented unless refactoring some parts.
2. An important feature cannot be implemented unless refactoring some parts.
3. After some discussions, an agreement was reached that making the break change is reasonable.
If a new feature can be added without introducing any breaking change, please **open a PR** rather than opening an issue first. We will never refuse a PR, but will help to improve it, unless it's malicious.
@ -39,19 +83,19 @@ You could use exactly the same prompt, the same model and the same parameters to
If the experiment shows that it works well in llama.cpp but not in LLamaSharp, a search for the problem can be started. While the reasons for the problem can vary, the best way, I think, is to add log prints in the code of llama.cpp and use it in LLamaSharp after compilation. Thus, when running LLamaSharp, you can see what happened in the native library.
After finding out the reason, a painful but rewarding process comes. When working on the bug fix, there's only one rule to follow: keep the examples working well. If the modification fixes the bug but impacts other functions, it is not a good fix.
During the BUG fix process, please don't hesitate to discuss together when you stuck on something.
During the bug fix process, please don't hesitate to start a discussion when you are blocked.
## Add integrations
All kinds of integration are welcomed here! Currently the following integrations are under work or on our schedule:
All kinds of integrations are welcome here! Currently, the following integrations have been added but still need improvement:
1. BotSharp
2. semantic-kernel
3. Unity
1. semantic-kernel
2. kernel-memory
3. BotSharp (maintained in SciSharp/BotSharp repo)
4. Langchain (maintained in tryAGI/LangChain repo)
If you find another library that is good to be integrated, please open an issue to let us know!
Besides, for some other integrations, like `ASP.NET Core`, `SQL`, `Blazor` and so on, we'd appreciate your help. If your time is limited, providing an example also means a lot!
## Add examples
@ -62,4 +106,4 @@ There're mainly two ways to add an example:
## Add documents
LLamaSharp uses [mkdocs](https://github.com/mkdocs/mkdocs) to build the documentation, please follow the tutorial of mkdocs to add or modify documents in LLamaSharp.
LLamaSharp uses [mkdocs](https://github.com/mkdocs/mkdocs) to build the documentation. Please follow the mkdocs tutorial to add or modify documents in LLamaSharp.

View File

@ -1,170 +0,0 @@
# Batch decoding
```cs
using System.Diagnostics;
using System.Text;
using LLama.Common;
using LLama.Native;
using LLama.Sampling;
public class BatchedDecoding
{
private const int n_parallel = 8;
private const int n_len = 32;
public static async Task Run()
{
Console.Write("Please input your model path: ");
var modelPath = Console.ReadLine();
Console.WriteLine("Prompt (leave blank to select automatically):");
var prompt = Console.ReadLine();
if (string.IsNullOrWhiteSpace(prompt))
prompt = "Not many people know that";
// Load model
var parameters = new ModelParams(modelPath);
using var model = LLamaWeights.LoadFromFile(parameters);
// Tokenize prompt
var prompt_tokens = model.Tokenize(prompt, true, false, Encoding.UTF8);
var n_kv_req = prompt_tokens.Length + (n_len - prompt_tokens.Length) * n_parallel;
// Create a context
parameters.ContextSize = (uint)model.ContextSize;
parameters.BatchSize = (uint)Math.Max(n_len, n_parallel);
using var context = model.CreateContext(parameters);
var n_ctx = context.ContextSize;
// make sure the KV cache is big enough to hold all the prompt and generated tokens
if (n_kv_req > n_ctx)
{
await Console.Error.WriteLineAsync($"error: n_kv_req ({n_kv_req}) > n_ctx, the required KV cache size is not big enough\n");
await Console.Error.WriteLineAsync(" either reduce n_parallel or increase n_ctx\n");
return;
}
var batch = new LLamaBatch();
// evaluate the initial prompt
batch.AddRange(prompt_tokens, 0, LLamaSeqId.Zero, true);
if (await context.DecodeAsync(batch) != DecodeResult.Ok)
{
await Console.Error.WriteLineAsync("llama_decode failed");
return;
}
// assign the system KV cache to all parallel sequences
// this way, the parallel sequences will "reuse" the prompt tokens without having to copy them
for (var i = 1; i < n_parallel; ++i)
{
context.NativeHandle.KvCacheSequenceCopy((LLamaSeqId)0, (LLamaSeqId)i, 0, batch.TokenCount);
}
if (n_parallel > 1)
{
Console.WriteLine();
Console.WriteLine($"generating {n_parallel} sequences...");
}
// remember the batch index of the last token for each parallel sequence
// we need this to determine which logits to sample from
List<int> i_batch = new();
for (var i = 0; i < n_parallel; i++)
i_batch.Add(batch.TokenCount - 1);
// Create per-stream decoder and sampler
var decoders = new StreamingTokenDecoder[n_parallel];
var samplers = new ISamplingPipeline[n_parallel];
for (var i = 0; i < n_parallel; i++)
{
decoders[i] = new StreamingTokenDecoder(context);
samplers[i] = new DefaultSamplingPipeline
{
Temperature = 0.1f + (float)i / n_parallel,
MinP = 0.25f,
};
}
var n_cur = batch.TokenCount;
var n_decode = 0;
var timer = new Stopwatch();
timer.Start();
while (n_cur <= n_len)
{
batch.Clear();
for (var i = 0; i < n_parallel; i++)
{
// Skip completed streams
if (i_batch[i] < 0)
continue;
// Use the sampling pipeline to select a token
var new_token_id = samplers[i].Sample(
context.NativeHandle,
context.NativeHandle.GetLogitsIth(i_batch[i]),
Array.Empty<LLamaToken>()
);
// Finish this stream early if necessary
if (new_token_id == model.EndOfSentenceToken || new_token_id == model.NewlineToken)
{
i_batch[i] = -1;
Console.WriteLine($"Completed Stream {i} early");
continue;
}
// Add this token to the decoder, so it will be turned into text
decoders[i].Add(new_token_id);
i_batch[i] = batch.TokenCount;
// push this new token for next evaluation
batch.Add(new_token_id, n_cur, (LLamaSeqId)i, true);
n_decode++;
}
// Check if all streams are finished
if (batch.TokenCount == 0)
{
break;
}
n_cur++;
// evaluate the current batch with the transformer model
if (await context.DecodeAsync(batch) != 0)
{
await Console.Error.WriteLineAsync("failed to eval");
return;
}
}
timer.Stop();
Console.ForegroundColor = ConsoleColor.Yellow;
Console.WriteLine();
Console.WriteLine($"Decoded {n_decode} tokens in {timer.ElapsedMilliseconds}ms");
Console.WriteLine($"Rate: {n_decode / timer.Elapsed.TotalSeconds:##.000} tokens/second");
var index = 0;
foreach (var stream in decoders)
{
var text = stream.Read();
Console.ForegroundColor = ConsoleColor.Green;
Console.Write($"{index++}. {prompt}");
Console.ForegroundColor = ConsoleColor.Red;
Console.WriteLine(text);
}
Console.WriteLine("Press any key to exit demo");
Console.ReadKey(true);
}
}
```

View File

@ -0,0 +1,148 @@
# Batched executor - multi-output to one input
```cs
using LLama.Batched;
using LLama.Common;
using LLama.Native;
using LLama.Sampling;
using Spectre.Console;
namespace LLama.Examples.Examples;
/// <summary>
/// This demonstrates generating multiple replies to the same prompt, with a shared cache
/// </summary>
public class BatchedExecutorFork
{
private const int n_split = 16;
private const int n_len = 72;
public static async Task Run()
{
string modelPath = UserSettings.GetModelPath();
var parameters = new ModelParams(modelPath);
using var model = LLamaWeights.LoadFromFile(parameters);
var prompt = AnsiConsole.Ask("Prompt (or ENTER for default):", "Not many people know that");
// Create an executor that can evaluate a batch of conversations together
using var executor = new BatchedExecutor(model, parameters);
// Print some info
var name = executor.Model.Metadata.GetValueOrDefault("general.name", "unknown model name");
Console.WriteLine($"Created executor with model: {name}");
// Evaluate the initial prompt to create one conversation
using var start = executor.Create();
start.Prompt(prompt);
await executor.Infer();
// Create the root node of the tree
var root = new Node(start);
await AnsiConsole
.Progress()
.StartAsync(async progress =>
{
var reporter = progress.AddTask("Running Inference (1)", maxValue: n_len);
// Run inference loop
for (var i = 0; i < n_len; i++)
{
if (i != 0)
await executor.Infer();
// Occasionally fork all the active conversations
if (i != 0 && i % n_split == 0)
root.Split();
// Sample all active conversations
root.Sample();
// Update progress bar
reporter.Increment(1);
reporter.Description($"Running Inference ({root.ActiveConversationCount})");
}
// Display results
var display = new Tree(prompt);
root.Display(display);
AnsiConsole.Write(display);
});
}
private class Node
{
private readonly StreamingTokenDecoder _decoder;
private readonly DefaultSamplingPipeline _sampler;
private Conversation? _conversation;
private Node? _left;
private Node? _right;
public int ActiveConversationCount => _conversation != null ? 1 : _left!.ActiveConversationCount + _right!.ActiveConversationCount;
public Node(Conversation conversation)
{
_sampler = new DefaultSamplingPipeline();
_conversation = conversation;
_decoder = new StreamingTokenDecoder(conversation.Executor.Context);
}
public void Sample()
{
if (_conversation == null)
{
_left?.Sample();
_right?.Sample();
return;
}
if (_conversation.RequiresInference)
return;
// Sample one token
var ctx = _conversation.Executor.Context.NativeHandle;
var token = _sampler.Sample(ctx, _conversation.Sample(), Array.Empty<LLamaToken>());
_sampler.Accept(ctx, token);
_decoder.Add(token);
// Prompt the conversation with this token, to continue generating from there
_conversation.Prompt(token);
}
public void Split()
{
if (_conversation != null)
{
_left = new Node(_conversation.Fork());
_right = new Node(_conversation.Fork());
_conversation.Dispose();
_conversation = null;
}
else
{
_left?.Split();
_right?.Split();
}
}
public void Display<T>(T tree, int depth = 0)
where T : IHasTreeNodes
{
var colors = new[] { "red", "green", "blue", "yellow", "white" };
var color = colors[depth % colors.Length];
var message = Markup.Escape(_decoder.Read().ReplaceLineEndings(""));
var n = tree.AddNode($"[{color}]{message}[/]");
_left?.Display(n, depth + 1);
_right?.Display(n, depth + 1);
}
}
}
```

View File

@ -0,0 +1,130 @@
# Batched executor - basic guidance
```cs
using LLama.Batched;
using LLama.Common;
using LLama.Native;
using LLama.Sampling;
using Spectre.Console;
namespace LLama.Examples.Examples;
/// <summary>
/// This demonstrates using a batch to generate two sequences and then using one
/// sequence as the negative guidance ("classifier free guidance") for the other.
/// </summary>
public class BatchedExecutorGuidance
{
private const int n_len = 32;
public static async Task Run()
{
string modelPath = UserSettings.GetModelPath();
var parameters = new ModelParams(modelPath);
using var model = LLamaWeights.LoadFromFile(parameters);
var positivePrompt = AnsiConsole.Ask("Positive Prompt (or ENTER for default):", "My favourite colour is").Trim();
var negativePrompt = AnsiConsole.Ask("Negative Prompt (or ENTER for default):", "I hate the colour red. My favourite colour is").Trim();
var weight = AnsiConsole.Ask("Guidance Weight (or ENTER for default):", 2.0f);
// Create an executor that can evaluate a batch of conversations together
using var executor = new BatchedExecutor(model, parameters);
// Print some info
var name = executor.Model.Metadata.GetValueOrDefault("general.name", "unknown model name");
Console.WriteLine($"Created executor with model: {name}");
// Load the two prompts into two conversations
using var guided = executor.Create();
guided.Prompt(positivePrompt);
using var guidance = executor.Create();
guidance.Prompt(negativePrompt);
// Run inference to evaluate prompts
await AnsiConsole
.Status()
.Spinner(Spinner.Known.Line)
.StartAsync("Evaluating Prompts...", _ => executor.Infer());
// Fork the "guided" conversation. We'll run this one without guidance for comparison
using var unguided = guided.Fork();
// Run inference loop
var unguidedSampler = new GuidedSampler(null, weight);
var unguidedDecoder = new StreamingTokenDecoder(executor.Context);
var guidedSampler = new GuidedSampler(guidance, weight);
var guidedDecoder = new StreamingTokenDecoder(executor.Context);
await AnsiConsole
.Progress()
.StartAsync(async progress =>
{
var reporter = progress.AddTask("Running Inference", maxValue: n_len);
for (var i = 0; i < n_len; i++)
{
if (i != 0)
await executor.Infer();
// Sample from the "unguided" conversation. This is just a conversation using the same prompt, without any
// guidance. This serves as a comparison to show the effect of guidance.
var u = unguidedSampler.Sample(executor.Context.NativeHandle, unguided.Sample(), Array.Empty<LLamaToken>());
unguidedDecoder.Add(u);
unguided.Prompt(u);
// Sample from the "guided" conversation. This sampler will internally use the "guidance" conversation
// to steer the conversation. See how this is done in GuidedSampler.ProcessLogits (bottom of this file).
var g = guidedSampler.Sample(executor.Context.NativeHandle, guided.Sample(), Array.Empty<LLamaToken>());
guidedDecoder.Add(g);
// Use this token to advance both guided _and_ guidance. Keeping them in sync (except for the initial prompt).
guided.Prompt(g);
guidance.Prompt(g);
// Early exit if we reach the natural end of the guided sentence
if (g == model.EndOfSentenceToken)
break;
// Update progress bar
reporter.Increment(1);
}
});
AnsiConsole.MarkupLine($"[green]Unguided:[/][white]{unguidedDecoder.Read().ReplaceLineEndings(" ")}[/]");
AnsiConsole.MarkupLine($"[green]Guided:[/][white]{guidedDecoder.Read().ReplaceLineEndings(" ")}[/]");
}
private class GuidedSampler(Conversation? guidance, float weight)
: BaseSamplingPipeline
{
public override void Accept(SafeLLamaContextHandle ctx, LLamaToken token)
{
}
public override ISamplingPipeline Clone()
{
throw new NotSupportedException();
}
protected override void ProcessLogits(SafeLLamaContextHandle ctx, Span<float> logits, ReadOnlySpan<LLamaToken> lastTokens)
{
if (guidance == null)
return;
// Get the logits generated by the guidance sequences
var guidanceLogits = guidance.Sample();
// Use those logits to guide this sequence
NativeApi.llama_sample_apply_guidance(ctx, logits, guidanceLogits, weight);
}
protected override LLamaToken ProcessTokenDataArray(SafeLLamaContextHandle ctx, LLamaTokenDataArray candidates, ReadOnlySpan<LLamaToken> lastTokens)
{
candidates.Temperature(ctx, 0.8f);
candidates.TopK(ctx, 25);
return candidates.SampleToken(ctx);
}
}
}
```

View File

@ -0,0 +1,121 @@
# Batched executor - rewinding to an earlier state
```cs
using LLama.Batched;
using LLama.Common;
using LLama.Native;
using LLama.Sampling;
using Spectre.Console;
namespace LLama.Examples.Examples;
/// <summary>
/// This demonstrates generating tokens and then rewinding to an earlier state
/// </summary>
public class BatchedExecutorRewind
{
private const int n_generate = 24;
private const int n_rewind = 12;
private const int n_repeats = 6;
public static async Task Run()
{
string modelPath = UserSettings.GetModelPath();
var parameters = new ModelParams(modelPath);
using var model = LLamaWeights.LoadFromFile(parameters);
var prompt = AnsiConsole.Ask("Prompt (or ENTER for default):", "Not many people know that");
// Create an executor that can evaluate a batch of conversations together
using var executor = new BatchedExecutor(model, parameters);
// Print some info
var name = executor.Model.Metadata.GetValueOrDefault("general.name", "unknown model name");
Console.WriteLine($"Created executor with model: {name}");
// Evaluate the initial prompt to create one conversation
using var conversation = executor.Create();
conversation.Prompt(prompt);
// Create the start node wrapping the conversation
var node = new Node(executor.Context);
// Print the prompt
Console.ForegroundColor = ConsoleColor.Green;
Console.WriteLine(prompt);
for (var i = 0; i < n_repeats; i++)
{
for (var j = 0; j < n_generate; j++)
{
// Run inference
await executor.Infer();
// Sample a token
var token = node.Sample(conversation);
// Continue conversation with this token
if (j != n_generate - 1)
conversation.Prompt(token);
}
// Write out what we generated
node.Write(n_rewind, i + 1);
// Rewind back a few tokens
conversation.Rewind(n_rewind + 1);
// Prompt with a token
conversation.Prompt(node.GetToken(n_generate - n_rewind - 1));
// Create a new node around the rewound conversation
node = new Node(executor.Context);
}
Console.WriteLine("Press any key to exit demo");
Console.ReadKey(true);
}
private class Node
{
private readonly LLamaContext _context;
private readonly List<LLamaToken> _tokens = new List<LLamaToken>();
private readonly DefaultSamplingPipeline Sampler;
public Node(LLamaContext context)
{
_context = context;
Sampler = new DefaultSamplingPipeline();
}
public LLamaToken Sample(Conversation conversation)
{
var token = Sampler.Sample(_context.NativeHandle, conversation.Sample(), Array.Empty<LLamaToken>());
_tokens.Add(token);
return token;
}
public void Write(int n_rewind, int depth)
{
var decoder = new StreamingTokenDecoder(_context);
for (var i = 0; i < _tokens.Count - n_rewind; i++)
decoder.Add(_tokens[i]);
AnsiConsole.MarkupLine($"[green]{new string(' ', depth * 3) + decoder.Read().ReplaceLineEndings(" ")}[/]");
for (var i = _tokens.Count - n_rewind; i < _tokens.Count; i++)
decoder.Add(_tokens[i]);
AnsiConsole.MarkupLine($"[maroon]{decoder.Read().ReplaceLineEndings(" ")}[/]");
}
public LLamaToken GetToken(int index)
{
return _tokens[index];
}
}
}
```

View File

@ -1,9 +1,12 @@
# Chat Chinese
# Chinese LLM - with GB2312 encoding
```cs
using System.Text;
using LLama.Common;
namespace LLama.Examples.Examples;
// This example shows how to deal with Chinese input with gb2312 encoding.
public class ChatChineseGB2312
{
private static string ConvertEncoding(string input, Encoding original, Encoding target)
@ -23,8 +26,7 @@ public class ChatChineseGB2312
" to use https://huggingface.co/hfl/chinese-alpaca-2-7b-gguf/blob/main/ggml-model-q5_0.gguf, which has been verified by LLamaSharp developers.");
Console.ForegroundColor = ConsoleColor.White;
Console.Write("Please input your model path: ");
var modelPath = Console.ReadLine();
string modelPath = UserSettings.GetModelPath();
var parameters = new ModelParams(modelPath)
{
@ -121,5 +123,4 @@ public class ChatChineseGB2312
}
}
}
```

View File

@ -1,19 +1,17 @@
# Use chat session and strip role names
# ChatSession - stripping role names
```cs
using LLama.Common;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
namespace LLama.Examples.Examples;
// When using chatsession, it's a common case that you want to strip the role names
// rather than display them. This example shows how to use transforms to strip them.
public class ChatSessionStripRoleName
{
public static void Run()
public static async Task Run()
{
Console.Write("Please input your model path: ");
var modelPath = Console.ReadLine();
string modelPath = UserSettings.GetModelPath();
var parameters = new ModelParams(modelPath)
{
@ -65,4 +63,5 @@ public class ChatSessionStripRoleName
}
}
}
```

View File

@ -1,16 +1,16 @@
# Chat session with history
# ChatSession - with history
```cs
using LLama.Common;
namespace LLama.Examples.Examples;
// This example shows how to save the state and history of chat session and load it again.
public class ChatSessionWithHistory
{
public static async Task Run()
{
Console.Write("Please input your model path: ");
var modelPath = Console.ReadLine();
string modelPath = UserSettings.GetModelPath();
var parameters = new ModelParams(modelPath)
{
@ -52,6 +52,10 @@ public class ChatSessionWithHistory
Console.ForegroundColor = ConsoleColor.Yellow;
Console.WriteLine("The chat session has started.");
Console.WriteLine("Type 'exit' to end the chat session.");
Console.WriteLine("Type 'save' to save the chat session to disk.");
Console.WriteLine("Type 'load' to load the chat session from disk.");
Console.WriteLine("Type 'regenerate' to regenerate the last response.");
// show the prompt
Console.ForegroundColor = ConsoleColor.Green;
@ -59,12 +63,20 @@ public class ChatSessionWithHistory
while (userInput != "exit")
{
// Save the chat state to disk
if (userInput == "save")
{
session.SaveSession("Assets/chat-with-bob");
Console.ForegroundColor = ConsoleColor.Yellow;
Console.WriteLine("Session saved.");
}
// Load the chat state from disk
else if (userInput == "load")
{
session.LoadSession("Assets/chat-with-bob");
Console.ForegroundColor = ConsoleColor.Yellow;
Console.WriteLine("Session loaded.");
}
else if (userInput == "regenerate")
{
Console.ForegroundColor = ConsoleColor.Yellow;
@ -99,6 +111,4 @@ public class ChatSessionWithHistory
}
}
}
```

View File

@ -0,0 +1,112 @@
# ChatSession - restarting
```cs
using LLama.Common;
namespace LLama.Examples.Examples;
// This example shows how to restart the chat session
public class ChatSessionWithRestart
{
public static async Task Run()
{
string modelPath = UserSettings.GetModelPath();
var parameters = new ModelParams(modelPath)
{
ContextSize = 1024,
Seed = 1337,
GpuLayerCount = 5
};
using var model = LLamaWeights.LoadFromFile(parameters);
using var context = model.CreateContext(parameters);
var executor = new InteractiveExecutor(context);
var chatHistoryJson = File.ReadAllText("Assets/chat-with-bob.json");
ChatHistory chatHistory = ChatHistory.FromJson(chatHistoryJson) ?? new ChatHistory();
ChatSession prototypeSession =
await ChatSession.InitializeSessionFromHistoryAsync(executor, chatHistory);
prototypeSession.WithOutputTransform(new LLamaTransforms.KeywordTextOutputStreamTransform(
new string[] { "User:", "Assistant:" },
redundancyLength: 8));
var resetState = prototypeSession.GetSessionState();
ChatSession session = new ChatSession(executor);
session.LoadSession(resetState);
InferenceParams inferenceParams = new InferenceParams()
{
Temperature = 0.9f,
AntiPrompts = new List<string> { "User:" }
};
Console.ForegroundColor = ConsoleColor.Yellow;
Console.WriteLine("The chat session has started. Starting point saved.");
Console.WriteLine("Type 'exit' to end the chat session.");
Console.WriteLine("Type 'save' to save chat session state in memory.");
Console.WriteLine("Type 'reset' to reset the chat session to its saved state.");
Console.WriteLine("Type 'answer for assistant' to add and process provided user and assistant messages.");
// show the prompt
Console.ForegroundColor = ConsoleColor.Green;
string userInput = Console.ReadLine() ?? "";
while (userInput != "exit")
{
// Load the session state from the reset state
if(userInput == "reset")
{
session.LoadSession(resetState);
Console.WriteLine($"Reset to history:\n{session.HistoryTransform.HistoryToText(session.History)}");
Console.ForegroundColor = ConsoleColor.Yellow;
Console.WriteLine("Session reset.");
}
// Assign new reset state.
else if (userInput == "save")
{
resetState = session.GetSessionState();
Console.ForegroundColor = ConsoleColor.Yellow;
Console.WriteLine("Session saved.");
}
// Provide user and override assistant answer with your own.
else if (userInput == "answer for assistant")
{
Console.ForegroundColor = ConsoleColor.Yellow;
Console.WriteLine("Provide user input: ");
Console.ForegroundColor = ConsoleColor.Green;
string userInputOverride = Console.ReadLine() ?? "";
Console.ForegroundColor = ConsoleColor.Yellow;
Console.WriteLine("Provide assistant input: ");
Console.ForegroundColor = ConsoleColor.Green;
string assistantInputOverride = Console.ReadLine() ?? "";
await session.AddAndProcessUserMessage(userInputOverride);
await session.AddAndProcessAssistantMessage(assistantInputOverride);
Console.ForegroundColor = ConsoleColor.Yellow;
Console.WriteLine("User and assistant messages processed. Provide next user message:");
}
else
{
await foreach (
var text
in session.ChatAsync(
new ChatHistory.Message(AuthorRole.User, userInput),
inferenceParams))
{
Console.ForegroundColor = ConsoleColor.White;
Console.Write(text);
}
}
Console.ForegroundColor = ConsoleColor.Green;
userInput = Console.ReadLine() ?? "";
Console.ForegroundColor = ConsoleColor.White;
}
}
}
```

View File

@ -1,19 +1,16 @@
# Use chat session without removing role names
# ChatSession - Basic
```cs
using LLama.Common;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
namespace LLama.Examples.Examples;
// The basic example for using ChatSession
public class ChatSessionWithRoleName
{
public static void Run()
public static async Task Run()
{
Console.Write("Please input your model path: ");
var modelPath = Console.ReadLine();
string modelPath = UserSettings.GetModelPath();
var parameters = new ModelParams(modelPath)
{
@ -62,4 +59,5 @@ public class ChatSessionWithRoleName
}
}
}
```

View File

@ -1,97 +1,73 @@
# Coding Assistant
# Coding assistant
```cs
using LLama.Common;
using System;
using System.Reflection;
internal class CodingAssistant
namespace LLama.Examples.Examples
{
const string DefaultModelUri = "https://huggingface.co/TheBloke/CodeLlama-7B-Instruct-GGUF/resolve/main/codellama-7b-instruct.Q4_K_S.gguf";
using LLama.Common;
using System;
// Source paper with example prompts:
// https://doi.org/10.48550/arXiv.2308.12950
const string InstructionPrefix = "[INST]";
const string InstructionSuffix = "[/INST]";
const string SystemInstruction = "You're an intelligent, concise coding assistant. Wrap code in ``` for readability. Don't repeat yourself. Use best practice and good coding standards.";
private static string ModelsDirectory = Path.Combine(Directory.GetParent(Assembly.GetExecutingAssembly().Location)!.FullName, "Models");
public static async Task Run()
// This example shows how to apply code completion as a coding assistant
internal class CodingAssistant
{
Console.Write("Please input your model path (if left empty, a default model will be downloaded for you): ");
var modelPath = Console.ReadLine();
// Source paper with example prompts:
// https://doi.org/10.48550/arXiv.2308.12950
const string InstructionPrefix = "[INST]";
const string InstructionSuffix = "[/INST]";
const string SystemInstruction = "You're an intelligent, concise coding assistant. " +
"Wrap code in ``` for readability. Don't repeat yourself. " +
"Use best practice and good coding standards.";
if(string.IsNullOrWhiteSpace(modelPath) )
public static async Task Run()
{
modelPath = await GetDefaultModel();
}
var parameters = new ModelParams(modelPath)
{
ContextSize = 4096
};
using var model = LLamaWeights.LoadFromFile(parameters);
using var context = model.CreateContext(parameters);
var executor = new InstructExecutor(context, InstructionPrefix, InstructionSuffix, null);
Console.ForegroundColor = ConsoleColor.Yellow;
Console.WriteLine("The executor has been enabled. In this example, the LLM will follow your instructions." +
"\nIt's a 7B Code Llama, so it's trained for programming tasks like \"Write a C# function reading a file name from a given URI\" or \"Write some programming interview questions\"." +
"\nWrite 'exit' to exit");
Console.ForegroundColor = ConsoleColor.White;
var inferenceParams = new InferenceParams() {
Temperature = 0.8f,
MaxTokens = -1,
};
string instruction = $"{SystemInstruction}\n\n";
await Console.Out.WriteAsync("Instruction: ");
instruction += Console.ReadLine() ?? "Ask me for instructions.";
while (instruction != "exit")
{
Console.ForegroundColor = ConsoleColor.Green;
await foreach (var text in executor.InferAsync(instruction + System.Environment.NewLine, inferenceParams))
string modelPath = UserSettings.GetModelPath();
if (!modelPath.Contains("codellama", StringComparison.InvariantCultureIgnoreCase))
{
Console.Write(text);
Console.ForegroundColor = ConsoleColor.Yellow;
Console.WriteLine("WARNING: the model you selected is not a Code LLama model!");
Console.WriteLine("For this example we specifically recommend 'codellama-7b-instruct.Q4_K_S.gguf'");
Console.WriteLine("Press ENTER to continue...");
Console.ReadLine();
}
var parameters = new ModelParams(modelPath)
{
ContextSize = 4096
};
using var model = LLamaWeights.LoadFromFile(parameters);
using var context = model.CreateContext(parameters);
var executor = new InstructExecutor(context, InstructionPrefix, InstructionSuffix, null);
Console.ForegroundColor = ConsoleColor.Yellow;
Console.WriteLine("The executor has been enabled. In this example, the LLM will follow your instructions." +
"\nIt's a 7B Code Llama, so it's trained for programming tasks like \"Write a C# function reading " +
"a file name from a given URI\" or \"Write some programming interview questions\"." +
"\nWrite 'exit' to exit");
Console.ForegroundColor = ConsoleColor.White;
var inferenceParams = new InferenceParams()
{
Temperature = 0.8f,
MaxTokens = -1,
};
string instruction = $"{SystemInstruction}\n\n";
await Console.Out.WriteAsync("Instruction: ");
instruction = Console.ReadLine() ?? "Ask me for instructions.";
}
}
instruction += Console.ReadLine() ?? "Ask me for instructions.";
while (instruction != "exit")
{
private static async Task<string> GetDefaultModel()
{
var uri = new Uri(DefaultModelUri);
var modelName = uri.Segments[^1];
await Console.Out.WriteLineAsync($"The following model will be used: {modelName}");
var modelPath = Path.Combine(ModelsDirectory, modelName);
if(!Directory.Exists(ModelsDirectory))
{
Directory.CreateDirectory(ModelsDirectory);
}
Console.ForegroundColor = ConsoleColor.Green;
await foreach (var text in executor.InferAsync(instruction + Environment.NewLine, inferenceParams))
{
Console.Write(text);
}
Console.ForegroundColor = ConsoleColor.White;
if (File.Exists(modelPath))
{
await Console.Out.WriteLineAsync($"Existing model found, using {modelPath}");
await Console.Out.WriteAsync("Instruction: ");
instruction = Console.ReadLine() ?? "Ask me for instructions.";
}
}
else
{
await Console.Out.WriteLineAsync($"Model not found locally, downloading {DefaultModelUri}...");
using var http = new HttpClient();
await using var downloadStream = await http.GetStreamAsync(uri);
await using var fileStream = new FileStream(modelPath, FileMode.Create, FileAccess.Write);
await downloadStream.CopyToAsync(fileStream);
await Console.Out.WriteLineAsync($"Model downloaded and saved to {modelPath}");
}
return modelPath;
}
}
```

View File

@ -1,32 +1,49 @@
# Get embeddings
# Get embeddings
```cs
using LLama.Common;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
public class GetEmbeddings
namespace LLama.Examples.Examples
{
public static void Run()
// This example shows how to get embeddings from a text prompt.
public class GetEmbeddings
{
Console.Write("Please input your model path: ");
string modelPath = Console.ReadLine();
var modelParams = new ModelParams(modelPath) { EmbeddingMode = true };
var embedder = new LLamaEmbedder(modelParams);
while (true)
public static void Run()
{
Console.Write("Please input your text: ");
Console.ForegroundColor = ConsoleColor.Green;
var text = Console.ReadLine();
Console.ForegroundColor = ConsoleColor.White;
string modelPath = UserSettings.GetModelPath();
Console.WriteLine(string.Join(", ", embedder.GetEmbeddings(text)));
Console.WriteLine();
Console.ForegroundColor = ConsoleColor.DarkGray;
var @params = new ModelParams(modelPath) { EmbeddingMode = true };
using var weights = LLamaWeights.LoadFromFile(@params);
var embedder = new LLamaEmbedder(weights, @params);
Console.ForegroundColor = ConsoleColor.Yellow;
Console.WriteLine(
"""
This example displays embeddings from a text prompt.
Embeddings are numerical codes that represent information like words, images, or concepts.
These codes capture important relationships between those objects,
like how similar words are in meaning or how close images are visually.
This allows machine learning models to efficiently understand and process complex data.
Embeddings of a text in LLM is sometimes useful, for example, to train other MLP models.
"""); // NOTE: this description was AI generated
while (true)
{
Console.ForegroundColor = ConsoleColor.White;
Console.Write("Please input your text: ");
Console.ForegroundColor = ConsoleColor.Green;
var text = Console.ReadLine();
Console.ForegroundColor = ConsoleColor.White;
float[] embeddings = embedder.GetEmbeddings(text).Result;
Console.WriteLine($"Embeddings contain {embeddings.Length:N0} floating point values:");
Console.ForegroundColor = ConsoleColor.DarkGray;
Console.WriteLine(string.Join(", ", embeddings.Take(20)) + ", ...");
Console.WriteLine();
}
}
}
}
```

View File

@ -0,0 +1,58 @@
# Grammar - json response
```cs
using LLama.Common;
using LLama.Grammars;
namespace LLama.Examples.Examples
{
// This example shows how to get response in json format using grammar.
public class GrammarJsonResponse
{
public static async Task Run()
{
string modelPath = UserSettings.GetModelPath();
var gbnf = File.ReadAllText("Assets/json.gbnf").Trim();
var grammar = Grammar.Parse(gbnf, "root");
var parameters = new ModelParams(modelPath)
{
ContextSize = 1024,
Seed = 1337,
GpuLayerCount = 5
};
using var model = LLamaWeights.LoadFromFile(parameters);
var ex = new StatelessExecutor(model, parameters);
Console.ForegroundColor = ConsoleColor.Yellow;
Console.WriteLine("The executor has been enabled. In this example, the LLM will follow your instructions and always respond in a JSON format. For example, you can input \"Tell me the attributes of a good dish\"");
Console.ForegroundColor = ConsoleColor.White;
using var grammarInstance = grammar.CreateInstance();
var inferenceParams = new InferenceParams()
{
Temperature = 0.6f,
AntiPrompts = new List<string> { "Question:", "#", "Question: ", ".\n" },
MaxTokens = 50,
Grammar = grammarInstance
};
while (true)
{
Console.Write("\nQuestion: ");
Console.ForegroundColor = ConsoleColor.Green;
var prompt = Console.ReadLine();
Console.ForegroundColor = ConsoleColor.White;
Console.Write("Answer: ");
prompt = $"Question: {prompt?.Trim()} Answer: ";
await foreach (var text in ex.InferAsync(prompt, inferenceParams))
{
Console.Write(text);
}
}
}
}
}
```

View File

@ -1,55 +0,0 @@
# Grammer json response
```cs
using LLama.Common;
using LLama.Grammars;
public class GrammarJsonResponse
{
public static async Task Run()
{
var gbnf = (await File.ReadAllTextAsync("Assets/json.gbnf")).Trim();
var grammar = Grammar.Parse(gbnf, "root");
Console.Write("Please input your model path: ");
var modelPath = Console.ReadLine();
var parameters = new ModelParams(modelPath)
{
ContextSize = 1024,
Seed = 1337,
GpuLayerCount = 5
};
using var model = LLamaWeights.LoadFromFile(parameters);
var ex = new StatelessExecutor(model, parameters);
Console.ForegroundColor = ConsoleColor.Yellow;
Console.WriteLine("The executor has been enabled. In this example, the LLM will follow your instructions and always respond in a JSON format. For example, you can input \"Tell me the attributes of a good dish\"");
Console.ForegroundColor = ConsoleColor.White;
using var grammarInstance = grammar.CreateInstance();
var inferenceParams = new InferenceParams()
{
Temperature = 0.6f,
AntiPrompts = new List<string> { "Question:", "#", "Question: ", ".\n" },
MaxTokens = 50,
Grammar = grammarInstance
};
while (true)
{
Console.Write("\nQuestion: ");
Console.ForegroundColor = ConsoleColor.Green;
var prompt = Console.ReadLine();
Console.ForegroundColor = ConsoleColor.White;
Console.Write("Answer: ");
prompt = $"Question: {prompt?.Trim()} Answer: ";
await foreach (var text in ex.InferAsync(prompt, inferenceParams))
{
Console.Write(text);
}
}
}
}
```

View File

@ -1,40 +1,48 @@
# Use instruct executor
# Instruct executor - basic
```cs
using LLama.Common;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
public class InstructModeExecute
namespace LLama.Examples.Examples
{
public static void Run()
// This example shows how to use InstructExecutor to generate the response.
public class InstructModeExecute
{
Console.Write("Please input your model path: ");
string modelPath = Console.ReadLine();
var prompt = File.ReadAllText("Assets/dan.txt").Trim();
InstructExecutor ex = new(new LLamaModel(new ModelParams(modelPath, contextSize: 1024)));
Console.ForegroundColor = ConsoleColor.Yellow;
Console.WriteLine("The executor has been enabled. In this example, the LLM will follow your instructions. For example, you can input \"Write a story about a fox who want to " +
"make friend with human, no less than 200 words.\"");
Console.ForegroundColor = ConsoleColor.White;
var inferenceParams = new InferenceParams() { Temperature = 0.8f, MaxTokens = 300 };
while (true)
public static async Task Run()
{
foreach (var text in ex.Infer(prompt, inferenceParams))
string modelPath = UserSettings.GetModelPath();
var prompt = File.ReadAllText("Assets/dan.txt").Trim();
var parameters = new ModelParams(modelPath)
{
Console.Write(text);
}
Console.ForegroundColor = ConsoleColor.Green;
prompt = Console.ReadLine();
ContextSize = 1024,
Seed = 1337,
GpuLayerCount = 5
};
using var model = LLamaWeights.LoadFromFile(parameters);
using var context = model.CreateContext(parameters);
var executor = new InstructExecutor(context);
Console.ForegroundColor = ConsoleColor.Yellow;
Console.WriteLine("The executor has been enabled. In this example, the LLM will follow your instructions. For example, you can input \"Write a story about a fox who want to " +
"make friend with human, no less than 200 words.\"");
Console.ForegroundColor = ConsoleColor.White;
var inferenceParams = new InferenceParams() { Temperature = 0.8f, MaxTokens = 600 };
while (true)
{
await foreach (var text in executor.InferAsync(prompt, inferenceParams))
{
Console.Write(text);
}
Console.ForegroundColor = ConsoleColor.Green;
prompt = Console.ReadLine();
Console.ForegroundColor = ConsoleColor.White;
}
}
}
}
```

View File

@ -1,41 +1,49 @@
# Use interactive executor
# Interactive executor - basic
```cs
using LLama.Common;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
public class InteractiveModeExecute
namespace LLama.Examples.Examples
{
public async static Task Run()
// This is an example which shows how to chat with LLM with InteractiveExecutor.
public class InteractiveModeExecute
{
Console.Write("Please input your model path: ");
string modelPath = Console.ReadLine();
var prompt = File.ReadAllText("Assets/chat-with-bob.txt").Trim();
InteractiveExecutor ex = new(new LLamaModel(new ModelParams(modelPath, contextSize: 256)));
Console.ForegroundColor = ConsoleColor.Yellow;
Console.WriteLine("The executor has been enabled. In this example, the prompt is printed, the maximum tokens is set to 64 and the context size is 256. (an example for small scale usage)");
Console.ForegroundColor = ConsoleColor.White;
Console.Write(prompt);
var inferenceParams = new InferenceParams() { Temperature = 0.6f, AntiPrompts = new List<string> { "User:" }, MaxTokens = 64 };
while (true)
public static async Task Run()
{
await foreach (var text in ex.InferAsync(prompt, inferenceParams))
string modelPath = UserSettings.GetModelPath();
var prompt = (await File.ReadAllTextAsync("Assets/chat-with-bob.txt")).Trim();
var parameters = new ModelParams(modelPath)
{
Console.Write(text);
}
Console.ForegroundColor = ConsoleColor.Green;
prompt = Console.ReadLine();
ContextSize = 1024,
Seed = 1337,
GpuLayerCount = 5
};
using var model = LLamaWeights.LoadFromFile(parameters);
using var context = model.CreateContext(parameters);
var ex = new InteractiveExecutor(context);
Console.ForegroundColor = ConsoleColor.Yellow;
Console.WriteLine("The executor has been enabled. In this example, the prompt is printed, the maximum tokens is set to 128 and the context size is 256. (an example for small scale usage)");
Console.ForegroundColor = ConsoleColor.White;
Console.Write(prompt);
var inferenceParams = new InferenceParams() { Temperature = 0.6f, AntiPrompts = new List<string> { "User:" }, MaxTokens = 128 };
while (true)
{
await foreach (var text in ex.InferAsync(prompt, inferenceParams))
{
Console.Write(text);
}
Console.ForegroundColor = ConsoleColor.Green;
prompt = Console.ReadLine();
Console.ForegroundColor = ConsoleColor.White;
}
}
}
}
```

View File

@ -1,62 +1,114 @@
# Kernel memory integration - basic
```cs
using LLamaSharp.KernelMemory;
using Microsoft.KernelMemory;
using Microsoft.KernelMemory.Configuration;
using Microsoft.KernelMemory.Handlers;
using System.Diagnostics;

namespace LLama.Examples.Examples
{
    // This example is from Microsoft's official kernel memory "custom prompts" example:
    // https://github.com/microsoft/kernel-memory/blob/6d516d70a23d50c6cb982e822e6a3a9b2e899cfa/examples/101-dotnet-custom-Prompts/Program.cs#L1-L86
    // Microsoft.KernelMemory has more features than Microsoft.SemanticKernel.
    // See https://microsoft.github.io/kernel-memory/ for details.
    public class KernelMemory
    {
        public static async Task Run()
        {
            Console.ForegroundColor = ConsoleColor.Yellow;
            Console.WriteLine(
                """
                This program uses the Microsoft.KernelMemory package to ingest documents
                and answer questions about them in an interactive chat prompt.
                """);
            // Setup the kernel memory with the LLM model
            string modelPath = UserSettings.GetModelPath();
            IKernelMemory memory = CreateMemory(modelPath);
            // Ingest documents (format is automatically detected from the filename)
            string[] filesToIngest = [
                Path.GetFullPath(@"./Assets/sample-SK-Readme.pdf"),
                Path.GetFullPath(@"./Assets/sample-KM-Readme.pdf"),
            ];
            for (int i = 0; i < filesToIngest.Length; i++)
            {
                string path = filesToIngest[i];
                Stopwatch sw = Stopwatch.StartNew();
                Console.ForegroundColor = ConsoleColor.Blue;
                Console.WriteLine($"Importing {i + 1} of {filesToIngest.Length}: {path}");
                await memory.ImportDocumentAsync(path, steps: Constants.PipelineWithoutSummary);
                Console.WriteLine($"Completed in {sw.Elapsed}\n");
            }
            // Ask a predefined question
            Console.ForegroundColor = ConsoleColor.Green;
            string question1 = "What formats does KM support";
            Console.WriteLine($"Question: {question1}");
            await AnswerQuestion(memory, question1);
            // Let the user ask additional questions
            while (true)
            {
                Console.ForegroundColor = ConsoleColor.Green;
                Console.Write("Question: ");
                string question = Console.ReadLine()!;
                if (string.IsNullOrEmpty(question))
                    return;
                await AnswerQuestion(memory, question);
            }
        }
        private static IKernelMemory CreateMemory(string modelPath)
        {
            Common.InferenceParams infParams = new() { AntiPrompts = ["\n\n"] };
            LLamaSharpConfig lsConfig = new(modelPath) { DefaultInferenceParams = infParams };
            SearchClientConfig searchClientConfig = new()
            {
                MaxMatchesCount = 1,
                AnswerTokens = 100,
            };
            TextPartitioningOptions parseOptions = new()
            {
                MaxTokensPerParagraph = 300,
                MaxTokensPerLine = 100,
                OverlappingTokens = 30
            };
            return new KernelMemoryBuilder()
                .WithLLamaSharpDefaults(lsConfig)
                .WithSearchClientConfig(searchClientConfig)
                .With(parseOptions)
                .Build();
        }
        private static async Task AnswerQuestion(IKernelMemory memory, string question)
        {
            Stopwatch sw = Stopwatch.StartNew();
            Console.ForegroundColor = ConsoleColor.DarkGray;
            Console.WriteLine($"Generating answer...");
            MemoryAnswer answer = await memory.AskAsync(question);
            Console.WriteLine($"Answer generated in {sw.Elapsed}");
            Console.ForegroundColor = ConsoleColor.Gray;
            Console.WriteLine($"Answer: {answer.Result}");
            foreach (var source in answer.RelevantSources)
            {
                Console.WriteLine($"Source: {source.SourceName}");
            }
            Console.WriteLine();
        }
    }
}
```

View File

@ -0,0 +1,166 @@
# Kernel-memory - save & load
```cs
using LLamaSharp.KernelMemory;
using Microsoft.KernelMemory;
using Microsoft.KernelMemory.Configuration;
using Microsoft.KernelMemory.ContentStorage.DevTools;
using Microsoft.KernelMemory.FileSystem.DevTools;
using Microsoft.KernelMemory.MemoryStorage.DevTools;
using System.Diagnostics;
namespace LLama.Examples.Examples;
// This example shows how to use kernel-memory integration with pre-saved embeddings.
public class KernelMemorySaveAndLoad
{
static string StorageFolder => Path.GetFullPath($"./storage-{nameof(KernelMemorySaveAndLoad)}");
static bool StorageExists => Directory.Exists(StorageFolder) && Directory.GetDirectories(StorageFolder).Length > 0;
public static async Task Run()
{
Console.ForegroundColor = ConsoleColor.Yellow;
Console.WriteLine(
"""
This program uses the Microsoft.KernelMemory package to ingest documents
and store the embeddings as local files so they can be quickly recalled
when this application is launched again.
""");
string modelPath = UserSettings.GetModelPath();
IKernelMemory memory = CreateMemoryWithLocalStorage(modelPath);
Console.ForegroundColor = ConsoleColor.Yellow;
if (StorageExists)
{
Console.WriteLine(
"""
Kernel memory files have been located!
Information about previously analyzed documents has been loaded.
""");
}
else
{
Console.WriteLine(
$"""
Existing kernel memory was not found.
Documents will be analyzed (slow) and information saved to disk.
Analysis will not be required the next time this program is run.
Press ENTER to proceed...
""");
Console.ReadLine();
await IngestDocuments(memory);
}
await AskSingleQuestion(memory, "What formats does KM support?");
await StartUserChatSession(memory);
}
private static IKernelMemory CreateMemoryWithLocalStorage(string modelPath)
{
Common.InferenceParams infParams = new() { AntiPrompts = ["\n\n"] };
LLamaSharpConfig lsConfig = new(modelPath) { DefaultInferenceParams = infParams };
SearchClientConfig searchClientConfig = new()
{
MaxMatchesCount = 1,
AnswerTokens = 100,
};
TextPartitioningOptions parseOptions = new()
{
MaxTokensPerParagraph = 300,
MaxTokensPerLine = 100,
OverlappingTokens = 30
};
SimpleFileStorageConfig storageConfig = new()
{
Directory = StorageFolder,
StorageType = FileSystemTypes.Disk,
};
SimpleVectorDbConfig vectorDbConfig = new()
{
Directory = StorageFolder,
StorageType = FileSystemTypes.Disk,
};
Console.ForegroundColor = ConsoleColor.Blue;
Console.WriteLine($"Kernel memory folder: {StorageFolder}");
Console.ForegroundColor = ConsoleColor.DarkGray;
return new KernelMemoryBuilder()
.WithSimpleFileStorage(storageConfig)
.WithSimpleVectorDb(vectorDbConfig)
.WithLLamaSharpDefaults(lsConfig)
.WithSearchClientConfig(searchClientConfig)
.With(parseOptions)
.Build();
}
private static async Task AskSingleQuestion(IKernelMemory memory, string question)
{
Console.ForegroundColor = ConsoleColor.Green;
Console.WriteLine($"Question: {question}");
await ShowAnswer(memory, question);
}
private static async Task StartUserChatSession(IKernelMemory memory)
{
while (true)
{
Console.ForegroundColor = ConsoleColor.Green;
Console.Write("Question: ");
string question = Console.ReadLine()!;
if (string.IsNullOrEmpty(question))
return;
await ShowAnswer(memory, question);
}
}
private static async Task IngestDocuments(IKernelMemory memory)
{
string[] filesToIngest = [
Path.GetFullPath(@"./Assets/sample-SK-Readme.pdf"),
Path.GetFullPath(@"./Assets/sample-KM-Readme.pdf"),
];
for (int i = 0; i < filesToIngest.Length; i++)
{
string path = filesToIngest[i];
Stopwatch sw = Stopwatch.StartNew();
Console.ForegroundColor = ConsoleColor.Blue;
Console.WriteLine($"Importing {i + 1} of {filesToIngest.Length}: {path}");
await memory.ImportDocumentAsync(path, steps: Constants.PipelineWithoutSummary);
Console.WriteLine($"Completed in {sw.Elapsed}\n");
}
}
private static async Task ShowAnswer(IKernelMemory memory, string question)
{
Stopwatch sw = Stopwatch.StartNew();
Console.ForegroundColor = ConsoleColor.DarkGray;
Console.WriteLine($"Generating answer...");
MemoryAnswer answer = await memory.AskAsync(question);
Console.WriteLine($"Answer generated in {sw.Elapsed}");
Console.ForegroundColor = ConsoleColor.Gray;
Console.WriteLine($"Answer: {answer.Result}");
foreach (var source in answer.RelevantSources)
{
Console.WriteLine($"Source: {source.SourceName}");
}
Console.WriteLine();
}
}
```

View File

@ -0,0 +1,127 @@
# LLaVA - basic
```cs
using System.Text.RegularExpressions;
using LLama.Batched;
using LLama.Common;
using Spectre.Console;
namespace LLama.Examples.Examples
{
// This example shows how to chat with LLaVA model with both image and text as input.
// It uses the interactive executor to inference.
public class LlavaInteractiveModeExecute
{
public static async Task Run()
{
string multiModalProj = UserSettings.GetMMProjPath();
string modelPath = UserSettings.GetModelPath();
string modelImage = UserSettings.GetImagePath();
const int maxTokens = 1024;
var prompt = $"{{{modelImage}}}\nUSER:\nProvide a full description of the image.\nASSISTANT:\n";
var parameters = new ModelParams(modelPath)
{
ContextSize = 4096,
Seed = 1337,
};
using var model = LLamaWeights.LoadFromFile(parameters);
using var context = model.CreateContext(parameters);
// Llava Init
using var clipModel = LLavaWeights.LoadFromFile(multiModalProj);
var ex = new InteractiveExecutor(context, clipModel );
Console.ForegroundColor = ConsoleColor.Yellow;
Console.WriteLine("The executor has been enabled. In this example, the prompt is printed, the maximum tokens is set to {0} and the context size is {1}.", maxTokens, parameters.ContextSize );
Console.WriteLine("To send an image, enter its filename in curly braces, like this {c:/image.jpg}.");
var inferenceParams = new InferenceParams() { Temperature = 0.1f, AntiPrompts = new List<string> { "\nUSER:" }, MaxTokens = maxTokens };
do
{
// Evaluate if we have images
//
var imageMatches = Regex.Matches(prompt, "{([^}]*)}").Select(m => m.Value);
var imageCount = imageMatches.Count();
var hasImages = imageCount > 0;
byte[][] imageBytes = null;
if (hasImages)
{
var imagePathsWithCurlyBraces = Regex.Matches(prompt, "{([^}]*)}").Select(m => m.Value);
var imagePaths = Regex.Matches(prompt, "{([^}]*)}").Select(m => m.Groups[1].Value);
try
{
imageBytes = imagePaths.Select(File.ReadAllBytes).ToArray();
}
catch (IOException exception)
{
Console.ForegroundColor = ConsoleColor.Red;
Console.Write(
$"Could not load your {(imageCount == 1 ? "image" : "images")}:");
Console.Write($"{exception.Message}");
Console.ForegroundColor = ConsoleColor.Yellow;
Console.WriteLine("Please try again.");
break;
}
int index = 0;
foreach (var path in imagePathsWithCurlyBraces)
{
// First image replace to tag <image, the rest of the images delete the tag
if (index++ == 0)
prompt = prompt.Replace(path, "<image>");
else
prompt = prompt.Replace(path, "");
}
Console.ForegroundColor = ConsoleColor.Yellow;
Console.WriteLine($"Here are the images, that are sent to the chat model in addition to your message.");
Console.WriteLine();
foreach (var consoleImage in imageBytes?.Select(bytes => new CanvasImage(bytes)))
{
consoleImage.MaxWidth = 50;
AnsiConsole.Write(consoleImage);
}
Console.WriteLine();
Console.ForegroundColor = ConsoleColor.Yellow;
Console.WriteLine($"The images were scaled down for the console only, the model gets full versions.");
Console.WriteLine($"Write /exit or press Ctrl+c to return to main menu.");
Console.WriteLine();
// Initialize Images in executor
//
ex.ImagePaths = imagePaths.ToList();
}
Console.ForegroundColor = Color.White;
await foreach (var text in ex.InferAsync(prompt, inferenceParams))
{
Console.Write(text);
}
Console.Write(" ");
Console.ForegroundColor = ConsoleColor.Green;
prompt = Console.ReadLine();
Console.WriteLine();
// let the user finish with exit
//
if (prompt.Equals("/exit", StringComparison.OrdinalIgnoreCase))
break;
}
while(true);
}
}
}
```

View File

@ -1,66 +1,84 @@
# ChatSession - load & save
Warning: this example is outdated for the latest version of LLamaSharp. Please refer to [this example](./ChatSessionWithRestart.md) to see how to save and load state for `ChatSession`. If you are using an old version of LLamaSharp, this example may still help you.
```cs
using LLama.Common;

namespace LLama.Examples.Examples
{
    public class SaveAndLoadSession
    {
        public static async Task Run()
        {
            string modelPath = UserSettings.GetModelPath();
            var prompt = (await File.ReadAllTextAsync("Assets/chat-with-bob.txt")).Trim();
            var parameters = new ModelParams(modelPath)
            {
                ContextSize = 1024,
                Seed = 1337,
                GpuLayerCount = 5
            };
            using var model = LLamaWeights.LoadFromFile(parameters);
            using var context = model.CreateContext(parameters);
            var ex = new InteractiveExecutor(context);
            var session = new ChatSession(ex);
            Console.ForegroundColor = ConsoleColor.Yellow;
            Console.WriteLine("The chat session has started. In this example, the prompt is printed for better visual result. Input \"save\" to save and reload the session.");
            Console.ForegroundColor = ConsoleColor.White;
            // show the prompt
            Console.Write(prompt);
            while (true)
            {
                await foreach (
                    var text
                    in session.ChatAsync(
                        new ChatHistory.Message(AuthorRole.User, prompt),
                        new InferenceParams()
                        {
                            Temperature = 0.6f,
                            AntiPrompts = new List<string> { "User:" }
                        }))
                {
                    Console.Write(text);
                }
                Console.ForegroundColor = ConsoleColor.Green;
                prompt = Console.ReadLine();
                Console.ForegroundColor = ConsoleColor.White;
                if (prompt == "save")
                {
                    Console.Write("Preparing to save the state, please input the path you want to save it: ");
                    Console.ForegroundColor = ConsoleColor.Green;
                    var statePath = Console.ReadLine();
                    session.SaveSession(statePath);
                    Console.ForegroundColor = ConsoleColor.White;
                    Console.ForegroundColor = ConsoleColor.Yellow;
                    Console.WriteLine("Saved session!");
                    Console.ForegroundColor = ConsoleColor.White;
                    ex.Context.Dispose();
                    ex = new(new LLamaContext(model, parameters));
                    session = new ChatSession(ex);
                    session.LoadSession(statePath);
                    Console.ForegroundColor = ConsoleColor.Yellow;
                    Console.WriteLine("Loaded session!");
                    Console.ForegroundColor = ConsoleColor.White;
                    Console.Write("Now you can continue your session: ");
                    Console.ForegroundColor = ConsoleColor.Green;
                    prompt = Console.ReadLine();
                    Console.ForegroundColor = ConsoleColor.White;
                }
            }
        }
    }
}
```

View File

@ -1,67 +1,76 @@
# Executor - save/load state
```cs
using LLama.Common;

namespace LLama.Examples.Examples
{
    // This example shows how to save/load state of the executor.
    public class LoadAndSaveState
    {
        public static async Task Run()
        {
            string modelPath = UserSettings.GetModelPath();
            var prompt = (await File.ReadAllTextAsync("Assets/chat-with-bob.txt")).Trim();
            var parameters = new ModelParams(modelPath)
            {
                ContextSize = 1024,
                Seed = 1337,
                GpuLayerCount = 5
            };
            using var model = LLamaWeights.LoadFromFile(parameters);
            using var context = model.CreateContext(parameters);
            var ex = new InteractiveExecutor(context);
            Console.ForegroundColor = ConsoleColor.Yellow;
            Console.WriteLine("The executor has been enabled. In this example, the prompt is printed, " +
                "the maximum tokens is set to 64 and the context size is 256. (an example for small scale usage)");
            Console.ForegroundColor = ConsoleColor.White;
            Console.Write(prompt);
            var inferenceParams = new InferenceParams() { Temperature = 0.6f, AntiPrompts = new List<string> { "User:" } };
            while (true)
            {
                await foreach (var text in ex.InferAsync(prompt, inferenceParams))
                {
                    Console.Write(text);
                }
                prompt = Console.ReadLine();
                if (prompt == "save")
                {
                    Console.Write("Your path to save model state: ");
                    var modelStatePath = Console.ReadLine();
                    ex.Context.SaveState(modelStatePath);
                    Console.Write("Your path to save executor state: ");
                    var executorStatePath = Console.ReadLine();
                    await ex.SaveState(executorStatePath);
                    Console.ForegroundColor = ConsoleColor.Yellow;
                    Console.WriteLine("All states saved!");
                    Console.ForegroundColor = ConsoleColor.White;
                    var ctx = ex.Context;
                    ctx.LoadState(modelStatePath);
                    ex = new InteractiveExecutor(ctx);
                    await ex.LoadState(executorStatePath);
                    Console.ForegroundColor = ConsoleColor.Yellow;
                    Console.WriteLine("Loaded state!");
                    Console.ForegroundColor = ConsoleColor.White;
                    Console.Write("Now you can continue your session: ");
                    Console.ForegroundColor = ConsoleColor.Green;
                    prompt = Console.ReadLine();
                    Console.ForegroundColor = ConsoleColor.White;
                }
            }
        }
    }
}
```

View File

@ -1,30 +1,28 @@
# Quantization
```cs
namespace LLama.Examples.Examples
{
    public class QuantizeModel
    {
        public static void Run()
        {
            string inputPath = UserSettings.GetModelPath();
            Console.Write("Please input your output model path: ");
            var outputPath = Console.ReadLine();
            Console.Write("Please input the quantize type (one of q4_0, q4_1, q5_0, q5_1, q8_0): ");
            var quantizeType = Console.ReadLine();
            if (LLamaQuantizer.Quantize(inputPath, outputPath, quantizeType))
            {
                Console.WriteLine("Quantization succeeded!");
            }
            else
            {
                Console.WriteLine("Quantization failed!");
            }
        }
    }
}
```

View File

@ -0,0 +1,67 @@
# Semantic-kernel - chat
```cs
using LLama.Common;
using LLamaSharp.SemanticKernel.ChatCompletion;
using Microsoft.SemanticKernel.ChatCompletion;
namespace LLama.Examples.Examples
{
public class SemanticKernelChat
{
public static async Task Run()
{
string modelPath = UserSettings.GetModelPath();
Console.ForegroundColor = ConsoleColor.Yellow;
Console.WriteLine("This example is from: \n" +
"https://github.com/microsoft/semantic-kernel/blob/main/dotnet/samples/KernelSyntaxExamples/Example17_ChatGPT.cs");
// Load weights into memory
var parameters = new ModelParams(modelPath);
using var model = LLamaWeights.LoadFromFile(parameters);
var ex = new StatelessExecutor(model, parameters);
var chatGPT = new LLamaSharpChatCompletion(ex);
var chatHistory = chatGPT.CreateNewChat("This is a conversation between the " +
"assistant and the user. \n\n You are a librarian, expert about books. ");
Console.WriteLine("Chat content:");
Console.WriteLine("------------------------");
chatHistory.AddUserMessage("Hi, I'm looking for book suggestions");
await MessageOutputAsync(chatHistory);
// First bot assistant message
var reply = await chatGPT.GetChatMessageContentAsync(chatHistory);
chatHistory.AddAssistantMessage(reply.Content);
await MessageOutputAsync(chatHistory);
// Second user message
chatHistory.AddUserMessage("I love history and philosophy, I'd like to learn " +
"something new about Greece, any suggestion");
await MessageOutputAsync(chatHistory);
// Second bot assistant message
reply = await chatGPT.GetChatMessageContentAsync(chatHistory);
chatHistory.AddAssistantMessage(reply.Content);
await MessageOutputAsync(chatHistory);
}
/// <summary>
/// Outputs the last message of the chat history
/// </summary>
private static Task MessageOutputAsync(Microsoft.SemanticKernel.ChatCompletion.ChatHistory chatHistory)
{
var message = chatHistory.Last();
Console.WriteLine($"{message.Role}: {message.Content}");
Console.WriteLine("------------------------");
return Task.CompletedTask;
}
}
}
```

View File

@ -1,169 +1,170 @@
# Semantic-kernel - with kernel-memory
Semantic Memory allows you to store your data like a traditional database, and adds the ability to query it using natural language.
```cs
using LLama.Common;
using Microsoft.SemanticKernel.Memory;
using LLamaSharp.SemanticKernel.TextEmbedding;
using Microsoft.SemanticKernel.AI.Embeddings;
using Microsoft.SemanticKernel.Plugins.Memory;

namespace LLama.Examples.Examples
{
    public class SemanticKernelMemory
    {
        private const string MemoryCollectionName = "SKGitHub";
        public static async Task Run()
        {
            string modelPath = UserSettings.GetModelPath();
            Console.WriteLine("This example is from: \n" +
                "https://github.com/microsoft/semantic-kernel/blob/main/dotnet/samples/KernelSyntaxExamples/Example14_SemanticMemory.cs");
            var seed = 1337u;
            // Load weights into memory
            var parameters = new ModelParams(modelPath)
            {
                Seed = seed,
                EmbeddingMode = true
            };
            using var model = LLamaWeights.LoadFromFile(parameters);
            var embedding = new LLamaEmbedder(model, parameters);
            Console.WriteLine("====================================================");
            Console.WriteLine("======== Semantic Memory (volatile, in RAM) ========");
            Console.WriteLine("====================================================");
            /* You can build your own semantic memory combining an Embedding Generator
             * with a Memory storage that supports search by similarity (ie semantic search).
             *
             * In this example we use a volatile memory, a local simulation of a vector DB.
             *
             * You can replace VolatileMemoryStore with Qdrant (see QdrantMemoryStore connector)
             * or implement your connectors for Pinecone, Vespa, Postgres + pgvector, SQLite VSS, etc.
             */
            var memory = new MemoryBuilder()
                .WithTextEmbeddingGeneration(new LLamaSharpEmbeddingGeneration(embedding))
                .WithMemoryStore(new VolatileMemoryStore())
                .Build();
            await RunExampleAsync(memory);
        }
        private static async Task RunExampleAsync(ISemanticTextMemory memory)
        {
            await StoreMemoryAsync(memory);
            await SearchMemoryAsync(memory, "How do I get started?");
            /*
            Output:
            Query: How do I get started?
            Result 1:
              URL:   : https://github.com/microsoft/semantic-kernel/blob/main/README.md
              Title  : README: Installation, getting started, and how to contribute
            Result 2:
              URL:   : https://github.com/microsoft/semantic-kernel/blob/main/samples/dotnet-jupyter-notebooks/00-getting-started.ipynb
              Title  : Jupyter notebook describing how to get started with the Semantic Kernel
            */
            await SearchMemoryAsync(memory, "Can I build a chat with SK?");
            /*
            Output:
            Query: Can I build a chat with SK?
            Result 1:
              URL:   : https://github.com/microsoft/semantic-kernel/tree/main/samples/skills/ChatSkill/ChatGPT
              Title  : Sample demonstrating how to create a chat skill interfacing with ChatGPT
            Result 2:
              URL:   : https://github.com/microsoft/semantic-kernel/blob/main/samples/apps/chat-summary-webapp-react/README.md
              Title  : README: README associated with a sample chat summary react-based webapp
            */
            await SearchMemoryAsync(memory, "Jupyter notebook");
            await SearchMemoryAsync(memory, "README: README associated with a sample chat summary react-based webapp");
            await SearchMemoryAsync(memory, "Jupyter notebook describing how to pass prompts from a file to a semantic skill or function");
        }
        private static async Task SearchMemoryAsync(ISemanticTextMemory memory, string query)
        {
            Console.WriteLine("\nQuery: " + query + "\n");
            var memories = memory.SearchAsync(MemoryCollectionName, query, limit: 10, minRelevanceScore: 0.5);
            int i = 0;
            await foreach (MemoryQueryResult result in memories)
            {
                Console.WriteLine($"Result {++i}:");
                Console.WriteLine("  URL:     : " + result.Metadata.Id);
                Console.WriteLine("  Title    : " + result.Metadata.Description);
                Console.WriteLine("  Relevance: " + result.Relevance);
                Console.WriteLine();
            }
            Console.WriteLine("----------------------");
        }
        private static async Task StoreMemoryAsync(ISemanticTextMemory memory)
        {
            /* Store some data in the semantic memory.
             *
             * When using Azure Cognitive Search the data is automatically indexed on write.
             *
             * When using the combination of VolatileStore and Embedding generation, SK takes
             * care of creating and storing the index
             */
            Console.WriteLine("\nAdding some GitHub file URLs and their descriptions to the semantic memory.");
            var githubFiles = SampleData();
            var i = 0;
            foreach (var entry in githubFiles)
            {
                var result = await memory.SaveReferenceAsync(
                    collection: MemoryCollectionName,
                    externalSourceName: "GitHub",
                    externalId: entry.Key,
                    description: entry.Value,
                    text: entry.Value);
                Console.WriteLine($"#{++i} saved.");
                Console.WriteLine(result);
            }
            Console.WriteLine("\n----------------------");
        }
        private static Dictionary<string, string> SampleData()
        {
            return new Dictionary<string, string>
            {
                ["https://github.com/microsoft/semantic-kernel/blob/main/README.md"]
                    = "README: Installation, getting started, and how to contribute",
                ["https://github.com/microsoft/semantic-kernel/blob/main/dotnet/notebooks/02-running-prompts-from-file.ipynb"]
                    = "Jupyter notebook describing how to pass prompts from a file to a semantic skill or function",
                ["https://github.com/microsoft/semantic-kernel/blob/main/dotnet/notebooks//00-getting-started.ipynb"]
                    = "Jupyter notebook describing how to get started with the Semantic Kernel",
                ["https://github.com/microsoft/semantic-kernel/tree/main/samples/skills/ChatSkill/ChatGPT"]
                    = "Sample demonstrating how to create a chat skill interfacing with ChatGPT",
                ["https://github.com/microsoft/semantic-kernel/blob/main/dotnet/src/SemanticKernel/Memory/VolatileMemoryStore.cs"]
                    = "C# class that defines a volatile embedding store",
                ["https://github.com/microsoft/semantic-kernel/blob/main/samples/dotnet/KernelHttpServer/README.md"]
                    = "README: How to set up a Semantic Kernel Service API using Azure Function Runtime v4",
                ["https://github.com/microsoft/semantic-kernel/blob/main/samples/apps/chat-summary-webapp-react/README.md"]
                    = "README: README associated with a sample chat summary react-based webapp",
            };
        }
    }
}
```

View File

@ -1,7 +1,6 @@
# Semantic-kernel - basic
```cs
using LLama.Common;
using LLamaSharp.SemanticKernel.ChatCompletion;
using Microsoft.SemanticKernel;
using LLamaSharp.SemanticKernel.TextCompletion;
using Microsoft.SemanticKernel.TextGeneration;
using Microsoft.Extensions.DependencyInjection;

namespace LLama.Examples.Examples
{
    // The basic example for using the semantic-kernel integration
    public class SemanticKernelPrompt
    {
        public static async Task Run()
        {
            string modelPath = UserSettings.GetModelPath();
            Console.ForegroundColor = ConsoleColor.Yellow;
            Console.WriteLine("This example is from: " +
                "https://github.com/microsoft/semantic-kernel/blob/main/dotnet/README.md");
            // Load weights into memory
            var parameters = new ModelParams(modelPath);
            using var model = LLamaWeights.LoadFromFile(parameters);
            var ex = new StatelessExecutor(model, parameters);
            var builder = Kernel.CreateBuilder();
            builder.Services.AddKeyedSingleton<ITextGenerationService>("local-llama", new LLamaSharpTextCompletion(ex));
            var kernel = builder.Build();
            var prompt = @"{{$input}}

One line TLDR with the fewest words.";
            ChatRequestSettings settings = new() { MaxTokens = 100 };
            var summarize = kernel.CreateFunctionFromPrompt(prompt, settings);
            string text1 = @"
1st Law of Thermodynamics - Energy cannot be created or destroyed.
2nd Law of Thermodynamics - For a spontaneous process, the entropy of the universe increases.
3rd Law of Thermodynamics - A perfect crystal at zero Kelvin has zero entropy.";
            string text2 = @"
1. An object at rest remains at rest, and an object in motion remains in motion at constant speed and in a straight line unless acted on by an unbalanced force.
2. The acceleration of an object depends on the mass of the object and the amount of force applied.
3. Whenever one object exerts a force on another object, the second object exerts an equal and opposite on the first.";
            Console.WriteLine((await kernel.InvokeAsync(summarize, new() { ["input"] = text1 })).GetValue<string>());
            Console.WriteLine((await kernel.InvokeAsync(summarize, new() { ["input"] = text2 })).GetValue<string>());
        }
    }
}
```

View File

@ -1,44 +1,51 @@
# Stateless executor
```cs
using LLama.Common;
using LLama.Examples.Extensions;

namespace LLama.Examples.Examples
{
    // Basic usage of the stateless executor.
    public class StatelessModeExecute
    {
        public static async Task Run()
        {
            string modelPath = UserSettings.GetModelPath();
            var parameters = new ModelParams(modelPath)
            {
                ContextSize = 1024,
                Seed = 1337,
                GpuLayerCount = 5
            };
            using var model = LLamaWeights.LoadFromFile(parameters);
            var ex = new StatelessExecutor(model, parameters);
            Console.ForegroundColor = ConsoleColor.Yellow;
            Console.WriteLine("The executor has been enabled. In this example, the inference is an one-time job. That says, the previous input and response has " +
                "no impact on the current response. Now you can ask it questions. Note that in this example, no prompt was set for LLM and the maximum response tokens is 50. " +
                "It may not perform well because of lack of prompt. This is also an example that could indicate the importance of prompt in LLM. To improve it, you can add " +
                "a prompt for it yourself!");
            Console.ForegroundColor = ConsoleColor.White;
            var inferenceParams = new InferenceParams() { Temperature = 0.6f, AntiPrompts = new List<string> { "Question:", "#", "Question: ", ".\n" }, MaxTokens = 50 };
            while (true)
            {
                Console.Write("\nQuestion: ");
                Console.ForegroundColor = ConsoleColor.Green;
                var prompt = Console.ReadLine();
                Console.ForegroundColor = ConsoleColor.White;
                Console.Write("Answer: ");
                prompt = $"Question: {prompt?.Trim()} Answer: ";
                await foreach (var text in ex.InferAsync(prompt, inferenceParams).Spinner())
                {
                    Console.Write(text);
                }
            }
        }
    }
}
```

View File

@ -1,72 +1,74 @@
# Talk to yourself
```cs
using System.Text;
using LLama.Abstractions;
using LLama.Common;

namespace LLama.Examples.Examples
{
    // Let two bots chat with each other.
    public class TalkToYourself
    {
        public static async Task Run()
        {
            string modelPath = UserSettings.GetModelPath();
            // Load weights into memory
            var @params = new ModelParams(modelPath);
            using var weights = LLamaWeights.LoadFromFile(@params);
            // Create 2 contexts sharing the same weights
            using var aliceCtx = weights.CreateContext(@params);
            var alice = new InteractiveExecutor(aliceCtx);
            using var bobCtx = weights.CreateContext(@params);
            var bob = new InteractiveExecutor(bobCtx);
            // Initial alice prompt
            var alicePrompt = "Transcript of a dialog, where the Alice interacts a person named Bob. Alice is friendly, kind, honest and good at writing.\nAlice: Hello";
            var aliceResponse = await Prompt(alice, ConsoleColor.Green, alicePrompt, false, false);
            // Initial bob prompt
            var bobPrompt = $"Transcript of a dialog, where the Bob interacts a person named Alice. Bob is smart, intellectual and good at writing.\nAlice: Hello{aliceResponse}";
            var bobResponse = await Prompt(bob, ConsoleColor.Red, bobPrompt, true, true);
            // swap back and forth from Alice to Bob
            while (true)
            {
                aliceResponse = await Prompt(alice, ConsoleColor.Green, bobResponse, false, true);
                bobResponse = await Prompt(bob, ConsoleColor.Red, aliceResponse, false, true);
                if (Console.KeyAvailable)
                    break;
            }
        }
        private static async Task<string> Prompt(ILLamaExecutor executor, ConsoleColor color, string prompt, bool showPrompt, bool showResponse)
        {
            var inferenceParams = new InferenceParams
            {
                Temperature = 0.9f,
                AntiPrompts = new List<string> { "Alice:", "Bob:", "User:" },
                MaxTokens = 128,
                Mirostat = MirostatType.Mirostat2,
                MirostatTau = 10,
            };
            Console.ForegroundColor = ConsoleColor.White;
            if (showPrompt)
                Console.Write(prompt);
            Console.ForegroundColor = color;
            var builder = new StringBuilder();
            await foreach (var text in executor.InferAsync(prompt, inferenceParams))
            {
                builder.Append(text);
                if (showResponse)
                    Console.Write(text);
            }
            return builder.ToString();
        }
    }
}
```

docs/FAQ.md Normal file
View File

@ -0,0 +1,64 @@
# Frequently asked questions
Sometimes, your application using an LLM with LLamaSharp may behave unexpectedly. Here are some frequently asked questions which may help you to deal with such problems.
## Why is the GPU not used when I have installed CUDA
1. If you are using backend packages, please make sure you have installed the CUDA backend package which matches the CUDA version of your device. Please note that before LLamaSharp v0.10.0, only one backend package should be installed.
2. Add `NativeLibraryConfig.Instance.WithLogs(LLamaLogLevel.Info)` to the very beginning of your code. The log will show which native library file is loaded. If the CPU library is loaded, please try to compile the native library yourself and open an issue for that. If the CUDA library is loaded, please check if `GpuLayerCount > 0` when loading the model weights, as shown in the sketch below.
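Here is a minimal sketch combining both checks. It assumes the `LLama`, `LLama.Common` and `LLama.Native` namespaces and uses a placeholder model path; adjust them to your project and LLamaSharp version.
```cs
using LLama;
using LLama.Common;
using LLama.Native;

// Log which native library (CPU or CUDA) is picked up.
// This must run before any model is loaded; the log level is only an illustration.
NativeLibraryConfig.Instance.WithLogs(LLamaLogLevel.Info);

// If GpuLayerCount stays at 0, the GPU is not used even when the CUDA backend
// is installed. "your-model.gguf" is a placeholder path.
var parameters = new ModelParams("your-model.gguf")
{
    GpuLayerCount = 20
};
using var model = LLamaWeights.LoadFromFile(parameters);
```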
## Why is the inference slow
Firstly, due to the large size of LLM models, it requires more time to generate outputs than other models, especially when you are using models larger than 30B.
To see if that's a LLamaSharp performance issue, please follow the two tips below.
1. If you are using CUDA, Metal or OpenCL, please set `GpuLayerCount` as large as possible.
2. If it's still slower than you expect, please try to run the same model with the same settings in the [llama.cpp examples](https://github.com/ggerganov/llama.cpp/tree/master/examples). If llama.cpp outperforms LLamaSharp significantly, it's likely a LLamaSharp BUG; please report it to us.
## Why does the program crash before any output is generated
Generally, there are two possible cases for this problem:
1. The native library (backend) you are using is not compatible with the LLamaSharp version. If you compiled the native library yourself, please make sure you have checked out llama.cpp at the commit corresponding to your LLamaSharp version, which could be found at the bottom of the README.
2. The model file you are using is not compatible with the backend. If you are using a GGUF file downloaded from huggingface, please check its publishing time.
## Why is my model generating output infinitely
Please set an anti-prompt or a max-length when executing the inference.
The anti-prompt is also known as the "stop keyword"; it decides when to stop the response generation. Under interactive mode, the maximum token count is usually not set, which lets the LLM generate responses infinitely. Therefore, setting the anti-prompt correctly helps a lot to avoid strange behaviours. For example, the prompt file `chat-with-bob.txt` has the following content:
```
Transcript of a dialog, where the User interacts with an Assistant named Bob. Bob is helpful, kind, honest, good at writing, and never fails to answer the User's requests immediately and with precision.
User: Hello, Bob.
Bob: Hello. How may I help you today?
User: Please tell me the largest city in Europe.
Bob: Sure. The largest city in Europe is Moscow, the capital of Russia.
User:
```
Therefore, the anti-prompt should be set as "User:". If the last line of the prompt is removed, the LLM will automatically generate one question (User) and one response (Bob) when running the chat session. Therefore, it is suggested to append the anti-prompt to the prompt when starting a chat session.
What if an extra line is appended? The string "User:" in the prompt will then be followed by the character "\n". Thus when running the model, an automatic generation of a pair of question and response may appear, because the anti-prompt is "User:" but the last token is "User:\n". Whether it appears is undefined behaviour, which depends on the implementation inside the `LLamaExecutor`. Since it may lead to unexpected behaviours, it's recommended to trim your prompt or carefully keep it consistent with your anti-prompt.
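As a rough sketch (assuming an existing `ChatSession` named `session`, as in the ChatSession examples above), both the anti-prompt and the maximum token count can be set through `InferenceParams`:
```cs
using LLama.Common;

// Stop generation as soon as the model starts a new "User:" turn, and cap the
// response length so a missed stop keyword cannot make it run forever.
var inferenceParams = new InferenceParams
{
    AntiPrompts = new List<string> { "User:" },
    MaxTokens = 256
};

await foreach (var text in session.ChatAsync(
    new ChatHistory.Message(AuthorRole.User, "Please tell me the largest city in Asia."),
    inferenceParams))
{
    Console.Write(text);
}
```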
## How to run LLM with non-English languages
English is the most popular language in the world, and also in the LLM field. If you want to accept inputs and generate outputs in other languages, please follow the tips below.
1. Ensure the model you selected is well-trained with data of your language. For example, [LLaMA](https://github.com/meta-llama/llama) (original) used few Chinese text during the pretrain, while [Chinese-LLaMA-Alpaca](https://github.com/ymcui/Chinese-LLaMA-Alpaca) finetuned LLaMA with a large amount of Chinese text data. Therefore, the quality of the output of Chinese-LLaMA-Alpaca is much better than that of LLaMA.
## Pay attention to the length of prompt
Sometimes we want to input a long prompt to execute a task. However, the context size may limit the inference of the LLaMA model. Please ensure the inequality below holds.
$$ len(prompt) + len(response) < len(context) $$
In this inequality, `len(response)` refers to the expected tokens for LLM to generate.
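A small sketch of checking this before inference is shown below. It assumes the context exposes a `Tokenize` method (as the executors use internally), so treat it as illustrative rather than a fixed API, and replace the placeholder model path and prompt with your own.
```cs
using LLama;
using LLama.Common;

const int contextSize = 1024;
const int expectedResponseTokens = 256;

var parameters = new ModelParams("your-model.gguf") { ContextSize = contextSize };
using var model = LLamaWeights.LoadFromFile(parameters);
using var context = model.CreateContext(parameters);

string prompt = "..."; // your long prompt here
int promptTokens = context.Tokenize(prompt).Length;

// len(prompt) + len(response) must stay below len(context)
if (promptTokens + expectedResponseTokens >= contextSize)
{
    Console.WriteLine("The prompt is too long for this context size; shorten it or increase ContextSize.");
}
```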
## Choose model weights depending on your task
The differences between models may lead to very different behaviours under the same task. For example, if you're building a chat bot for a non-English language, a model fine-tuned specifically for that language will have a huge effect on the performance.

View File

@ -1,118 +0,0 @@
# Get Started
## Install packages
Firstly, search `LLamaSharp` in nuget package manager and install it.
```
PM> Install-Package LLamaSharp
```
Then, search and install one of the following backends:
```
LLamaSharp.Backend.Cpu
LLamaSharp.Backend.Cuda11
LLamaSharp.Backend.Cuda12
```
Here's the mapping of them and corresponding model samples provided by `LLamaSharp`. If you're not sure which model is available for a version, please try our sample model.
| LLamaSharp.Backend | LLamaSharp | Verified Model Resources | llama.cpp commit id |
| - | - | -- | - |
| - | v0.2.0 | This version is not recommended to use. | - |
| - | v0.2.1 | [WizardLM](https://huggingface.co/TheBloke/wizardLM-7B-GGML/tree/previous_llama), [Vicuna (filenames with "old")](https://huggingface.co/eachadea/ggml-vicuna-13b-1.1/tree/main) | - |
| v0.2.2 | v0.2.2, v0.2.3 | [WizardLM](https://huggingface.co/TheBloke/wizardLM-7B-GGML/tree/previous_llama_ggmlv2), [Vicuna (filenames without "old")](https://huggingface.co/eachadea/ggml-vicuna-13b-1.1/tree/main) | 63d2046 |
| v0.3.0 | v0.3.0 | [LLamaSharpSamples v0.3.0](https://huggingface.co/AsakusaRinne/LLamaSharpSamples/tree/v0.3.0), [WizardLM](https://huggingface.co/TheBloke/wizardLM-7B-GGML/tree/main) | 7e4ea5b |
## Download a model
One of the following models could be okay:
- LLaMA 🦙
- [Alpaca](https://github.com/ggerganov/llama.cpp#instruction-mode-with-alpaca)
- [GPT4All](https://github.com/ggerganov/llama.cpp#using-gpt4all)
- [Chinese LLaMA / Alpaca](https://github.com/ymcui/Chinese-LLaMA-Alpaca)
- [Vigogne (French)](https://github.com/bofenghuang/vigogne)
- [Vicuna](https://github.com/ggerganov/llama.cpp/discussions/643#discussioncomment-5533894)
- [Koala](https://bair.berkeley.edu/blog/2023/04/03/koala/)
- [OpenBuddy 🐶 (Multilingual)](https://github.com/OpenBuddy/OpenBuddy)
- [Pygmalion 7B / Metharme 7B](#using-pygmalion-7b--metharme-7b)
- [WizardLM](https://github.com/nlpxucan/WizardLM)
**Note that because `llama.cpp` is under fast development now and often introduce break changes, some model weights on huggingface which works under a version may be invalid with another version. If it's your first time to configure LLamaSharp, we'd like to suggest for using verified model weights in the table above.**
## Run the program
Please create a console program with dotnet runtime >= netstandard 2.0 (>= net6.0 is more recommended). Then, paste the following code to `program.cs`;
```cs
using LLama.Common;
using LLama;
string modelPath = "<Your model path>" // change it to your own model path
var prompt = "Transcript of a dialog, where the User interacts with an Assistant named Bob. Bob is helpful, kind, honest, good at writing, and never fails to answer the User's requests immediately and with precision.\r\n\r\nUser: Hello, Bob.\r\nBob: Hello. How may I help you today?\r\nUser: Please tell me the largest city in Europe.\r\nBob: Sure. The largest city in Europe is Moscow, the capital of Russia.\r\nUser:"; // use the "chat-with-bob" prompt here.
// Load model
var parameters = new ModelParams(modelPath)
{
ContextSize = 1024
};
using var model = LLamaWeights.LoadFromFile(parameters);
// Initialize a chat session
using var context = model.CreateContext(parameters);
var ex = new InteractiveExecutor(context);
ChatSession session = new ChatSession(ex);
// show the prompt
Console.WriteLine();
Console.Write(prompt);
// run the inference in a loop to chat with LLM
while (true)
{
await foreach (var text in session.ChatAsync(prompt, new InferenceParams() { Temperature = 0.6f, AntiPrompts = new List<string> { "User:" } }))
{
Console.Write(text);
}
Console.ForegroundColor = ConsoleColor.Green;
prompt = Console.ReadLine();
Console.ForegroundColor = ConsoleColor.White;
}
```
After starting it, you'll see the following outputs.
```
Please input your model path: D:\development\llama\weights\wizard-vicuna-13B.ggmlv3.q4_1.bin
llama.cpp: loading model from D:\development\llama\weights\wizard-vicuna-13B.ggmlv3.q4_1.bin
llama_model_load_internal: format = ggjt v3 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 1024
llama_model_load_internal: n_embd = 5120
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 40
llama_model_load_internal: n_layer = 40
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 3 (mostly Q4_1)
llama_model_load_internal: n_ff = 13824
llama_model_load_internal: n_parts = 1
llama_model_load_internal: model size = 13B
llama_model_load_internal: ggml ctx size = 7759.48 MB
llama_model_load_internal: mem required = 9807.48 MB (+ 1608.00 MB per state)
....................................................................................................
llama_init_from_file: kv self size = 800.00 MB
Transcript of a dialog, where the User interacts with an Assistant named Bob. Bob is helpful, kind, honest, good at writing, and never fails to answer the User's requests immediately and with precision.
User: Hello, Bob.
Bob: Hello. How may I help you today?
User: Please tell me the largest city in Europe.
Bob: Sure. The largest city in Europe is Moscow, the capital of Russia.
User:
```
Now, enjoy chatting with LLM!

View File

@ -1,3 +0,0 @@
# The Usage of semantic-kernel Integration
Please see [this doc](../../LLama.SemanticKernel/README.md)

View File

@ -1,3 +1,3 @@
# The Usage of BotSharp Integration
# BotSharp integration
This document is still a work in progress, please check back later. Thank you for your support! :)

View File

@ -0,0 +1,3 @@
# LLamaSharp.kernel-memory
This document is still a work in progress, please check back later. Thank you for your support! :)

View File

@ -0,0 +1,3 @@
# Langchain integration
This document is still a work in progress, please check back later. Thank you for your support! :)

View File

@ -0,0 +1,38 @@
# LLamaSharp.SemanticKernel
LLamaSharp.SemanticKernel provides connections for [SemanticKernel](https://github.com/microsoft/semantic-kernel): an SDK for integrating various LLM interfaces into a single implementation. With this, you can add local LLaMA queries as another connection point alongside your existing connections.
For reference on how to implement it, view the following examples:
- [SemanticKernelChat](../LLama.Examples/Examples/SemanticKernelChat.cs)
- [SemanticKernelPrompt](../LLama.Examples/Examples/SemanticKernelPrompt.cs)
- [SemanticKernelMemory](../LLama.Examples/Examples/SemanticKernelMemory.cs)
## ITextCompletion
```csharp
using var model = LLamaWeights.LoadFromFile(parameters);
// LLamaSharpTextCompletion can accept ILLamaExecutor.
var ex = new StatelessExecutor(model, parameters);
var builder = new KernelBuilder();
builder.WithAIService<ITextCompletion>("local-llama", new LLamaSharpTextCompletion(ex), true);
```
## IChatCompletion
```csharp
using var model = LLamaWeights.LoadFromFile(parameters);
using var context = model.CreateContext(parameters);
// LLamaSharpChatCompletion requires InteractiveExecutor, as it's the best fit for the given command.
var ex = new InteractiveExecutor(context);
var chatGPT = new LLamaSharpChatCompletion(ex);
```
## ITextEmbeddingGeneration
```csharp
using var model = LLamaWeights.LoadFromFile(parameters);
var embedding = new LLamaEmbedder(model, parameters);
var kernelWithCustomDb = Kernel.Builder
.WithLoggerFactory(ConsoleLogger.LoggerFactory)
.WithAIService<ITextEmbeddingGeneration>("local-llama-embed", new LLamaSharpEmbeddingGeneration(embedding), true)
.WithMemoryStorage(new VolatileMemoryStore())
.Build();
```

View File

@ -1,208 +0,0 @@
# LLamaModel Parameters
When initializing a `LLamaModel` object, there are three parameters: `ModelParams Params, string encoding = "UTF-8", ILLamaLogger? logger = null`.
The usage of `logger` will be further introduced in [logger doc](../More/log.md). The `encoding` is the encoding you want to use when dealing with text via this model.
The most important of all is `ModelParams`, which is defined as below. We'll explain the parameters step by step in this document.
```cs
public class ModelParams
{
public int ContextSize { get; set; } = 512;
public int GpuLayerCount { get; set; } = 20;
public int Seed { get; set; } = 1686349486;
public bool UseFp16Memory { get; set; } = true;
public bool UseMemorymap { get; set; } = true;
public bool UseMemoryLock { get; set; } = false;
public bool Perplexity { get; set; } = false;
public string ModelPath { get; set; }
public string LoraAdapter { get; set; } = string.Empty;
public string LoraBase { get; set; } = string.Empty;
public int Threads { get; set; } = Math.Max(Environment.ProcessorCount / 2, 1);
public int BatchSize { get; set; } = 512;
public bool ConvertEosToNewLine { get; set; } = false;
}
```
# ModelParams
Namespace: LLama.Common
```csharp
public class ModelParams
```
Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [ModelParams]()
## Properties
### **ContextSize**
Model context size (n_ctx)
```csharp
public int ContextSize { get; set; }
```
#### Property Value
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
### **GpuLayerCount**
Number of layers to run in VRAM / GPU memory (n_gpu_layers)
```csharp
public int GpuLayerCount { get; set; }
```
#### Property Value
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
### **Seed**
Seed for the random number generator (seed)
```csharp
public int Seed { get; set; }
```
#### Property Value
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
### **UseFp16Memory**
Use f16 instead of f32 for memory kv (memory_f16)
```csharp
public bool UseFp16Memory { get; set; }
```
#### Property Value
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)<br>
### **UseMemorymap**
Use mmap for faster loads (use_mmap)
```csharp
public bool UseMemorymap { get; set; }
```
#### Property Value
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)<br>
### **UseMemoryLock**
Use mlock to keep model in memory (use_mlock)
```csharp
public bool UseMemoryLock { get; set; }
```
#### Property Value
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)<br>
### **Perplexity**
Compute perplexity over the prompt (perplexity)
```csharp
public bool Perplexity { get; set; }
```
#### Property Value
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)<br>
### **ModelPath**
Model path (model)
```csharp
public string ModelPath { get; set; }
```
#### Property Value
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
### **LoraAdapter**
lora adapter path (lora_adapter)
```csharp
public string LoraAdapter { get; set; }
```
#### Property Value
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
### **LoraBase**
base model path for the lora adapter (lora_base)
```csharp
public string LoraBase { get; set; }
```
#### Property Value
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
### **Threads**
Number of threads (-1 = autodetect) (n_threads)
```csharp
public int Threads { get; set; }
```
#### Property Value
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
### **BatchSize**
batch size for prompt processing (must be &gt;=32 to use BLAS) (n_batch)
```csharp
public int BatchSize { get; set; }
```
#### Property Value
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
### **ConvertEosToNewLine**
Whether to convert eos to newline during the inference.
```csharp
public bool ConvertEosToNewLine { get; set; }
```
#### Property Value
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)<br>
### **EmbeddingMode**
Whether to use embedding mode. (embedding) Note that if this is set to true,
The LLamaModel won't produce text response anymore.
```csharp
public bool EmbeddingMode { get; set; }
```
#### Property Value
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)<br>

View File

@ -1,23 +0,0 @@
# Quantization
Quantization significantly accelerates model inference. Since there's little accuracy (performance) loss when quantizing the model, don't hesitate to quantize it!
To quantize the model, please call `Quantize` from `LLamaQuantizer`, which is a static method.
```cs
string srcPath = "<model.bin>";
string dstPath = "<model_q4_0.bin>";
LLamaQuantizer.Quantize(srcPath, dstPath, "q4_0");
// The following overload is also okay.
// LLamaQuantizer.Quantize(srcPath, dstPath, LLamaFtype.LLAMA_FTYPE_MOSTLY_Q4_0);
```
After calling it, a quantized model file will be saved.
There are currently 5 types of quantization supported:
- q4_0
- q4_1
- q5_0
- q5_1
- q8_0

View File

@ -1,19 +0,0 @@
# Save/Load State
There are two ways to load state: loading from a file path and loading from a byte array. Correspondingly, state data can be extracted as a byte array or saved to a file.
```cs
LLamaModel model = new LLamaModel(new ModelParams("<modelPath>"));
// do some things...
model.SaveState("model.st");
var stateData = model.GetStateData();
model.Dispose();
LLamaModel model2 = new LLamaModel(new ModelParams("<modelPath>"));
model2.LoadState(stateData);
// do some things...
LLamaModel model3 = new LLamaModel(new ModelParams("<modelPath>"));
model3.LoadState("model.st");
// do some things...
```

View File

@ -1,25 +0,0 @@
# Tokenization/Detokenization
A pair of APIs to make conversion between text and tokens.
## Tokenization
The basic usage is to call `Tokenize` after initializing the model.
```cs
LLamaModel model = new LLamaModel(new ModelParams("<modelPath>"));
string text = "hello";
int[] tokens = model.Tokenize(text).ToArray();
```
The output will vary depending on the model (or vocab).
## Detokenization
Similar to tokenization, just pass an `IEnumerable<int>` to `Detokenize` method.
```cs
LLamaModel model = new LLamaModel(new ModelParams("<modelPath>"));
int[] tokens = new int[] {125, 2568, 13245};
string text = model.Detokenize(tokens);
```

View File

@ -1,69 +0,0 @@
## Differences between the executors
There're currently three kinds of executors provided, which are `InteractiveExecutor`, `InstructExecutor` and `StatelessExecutor`.
In short, `InteractiveExecutor` is suitable for continuously getting answers to your questions from the LLM. `InstructExecutor` lets the LLM execute your instructions, such as "continue writing". `StatelessExecutor` is best for one-time jobs because the previous inference has no impact on the current one.
## Interactive mode & Instruct mode
Both of them take "completing the prompt" as the goal when generating the response. For example, if you input `Long long ago, there was a fox who wanted to make friends with humans. One day`, then the LLM will continue to write the story.
Under interactive mode, you serve a role of user and the LLM serves the role of assistant. Then it will help you with your question or request.
Under instruct mode, you give LLM some instructions and it follows.
Though their behaviors sound similar, the choice could introduce many differences depending on your prompt. For example, "chat-with-bob" has good performance under interactive mode and `alpaca` does well with instruct mode.
```
// chat-with-bob
Transcript of a dialog, where the User interacts with an Assistant named Bob. Bob is helpful, kind, honest, good at writing, and never fails to answer the User's requests immediately and with precision.
User: Hello, Bob.
Bob: Hello. How may I help you today?
User: Please tell me the largest city in Europe.
Bob: Sure. The largest city in Europe is Moscow, the capital of Russia.
User:
```
```
// alpaca
Below is an instruction that describes a task. Write a response that appropriately completes the request.
```
Therefore, please modify the prompt correspondingly when switching from one mode to the other.
## Stateful mode and Stateless mode.
Despite the differences between interactive mode and instruct mode, both of them are stateful modes. That is, your previous question/instruction will impact the current response from the LLM. On the contrary, the stateless executor does not have such a "memory". No matter how many times you talk to it, it will only concentrate on what you say this time.
Since the stateless executor has no memory of previous conversations, you need to input your question together with the whole prompt to get a better answer.
For example, if you feed `Q: Who is Trump? A: ` to the stateless executor, it may give the following answer with the antiprompt `Q: `.
```
Donald J. Trump, born June 14, 1946, is an American businessman, television personality, politician and the 45th President of the United States (2017-2021). # Anexo:Torneo de Hamburgo 2022 (individual masculino)
## Presentación previa
* Defensor del título: Daniil Medvédev
```
It seems that things went well at first. However, after answering the question itself, the LLM began to talk about other things until the answer reached the token count limit. The reason for this strange behavior is that the anti-prompt cannot be matched. With this input, the LLM cannot decide whether to append the string "A: " at the end of the response.
As an improvement, let's take the following text as the input:
```
Q: What is the capital of the USA? A: Washington. Q: What is the sum of 1 and 2? A: 3. Q: Who is Trump? A:
```
Then, I got the following answer with the anti-prompt `Q: `.
```
45th president of the United States.
```
This time, by repeating the same pattern of `Q: xxx? A: xxx.`, the LLM outputs the anti-prompt we want, which helps it decide where to stop the generation.

View File

@ -1,261 +0,0 @@
# Inference Parameters
Different from `LLamaModel`, when using an executor, `InferenceParams` is passed to the `Infer` method instead of the constructor. This is because executors only define the ways to run the model, so you can change the settings for each inference run.
# InferenceParams
Namespace: LLama.Common
```csharp
public class InferenceParams
```
Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [InferenceParams]()
## Properties
### **TokensKeep**
number of tokens to keep from initial prompt
```csharp
public int TokensKeep { get; set; }
```
#### Property Value
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
### **MaxTokens**
how many new tokens to predict (n_predict), set to -1 to infinitely generate response
until it complete.
```csharp
public int MaxTokens { get; set; }
```
#### Property Value
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
### **LogitBias**
logit bias for specific tokens
```csharp
public Dictionary<int, float> LogitBias { get; set; }
```
#### Property Value
[Dictionary&lt;Int32, Single&gt;](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.dictionary-2)<br>
### **AntiPrompts**
Sequences where the model will stop generating further tokens.
```csharp
public IEnumerable<string> AntiPrompts { get; set; }
```
#### Property Value
[IEnumerable&lt;String&gt;](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.ienumerable-1)<br>
### **PathSession**
path to file for saving/loading model eval state
```csharp
public string PathSession { get; set; }
```
#### Property Value
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
### **InputSuffix**
string to suffix user inputs with
```csharp
public string InputSuffix { get; set; }
```
#### Property Value
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
### **InputPrefix**
string to prefix user inputs with
```csharp
public string InputPrefix { get; set; }
```
#### Property Value
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
### **TopK**
0 or lower to use vocab size
```csharp
public int TopK { get; set; }
```
#### Property Value
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
### **TopP**
1.0 = disabled
```csharp
public float TopP { get; set; }
```
#### Property Value
[Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)<br>
### **TfsZ**
1.0 = disabled
```csharp
public float TfsZ { get; set; }
```
#### Property Value
[Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)<br>
### **TypicalP**
1.0 = disabled
```csharp
public float TypicalP { get; set; }
```
#### Property Value
[Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)<br>
### **Temperature**
1.0 = disabled
```csharp
public float Temperature { get; set; }
```
#### Property Value
[Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)<br>
### **RepeatPenalty**
1.0 = disabled
```csharp
public float RepeatPenalty { get; set; }
```
#### Property Value
[Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)<br>
### **RepeatLastTokensCount**
last n tokens to penalize (0 = disable penalty, -1 = context size) (repeat_last_n)
```csharp
public int RepeatLastTokensCount { get; set; }
```
#### Property Value
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
### **FrequencyPenalty**
frequency penalty coefficient
0.0 = disabled
```csharp
public float FrequencyPenalty { get; set; }
```
#### Property Value
[Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)<br>
### **PresencePenalty**
presence penalty coefficient
0.0 = disabled
```csharp
public float PresencePenalty { get; set; }
```
#### Property Value
[Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)<br>
### **Mirostat**
Mirostat uses tokens instead of words.
algorithm described in the paper https://arxiv.org/abs/2007.14966.
0 = disabled, 1 = mirostat, 2 = mirostat 2.0
```csharp
public MiroStateType Mirostat { get; set; }
```
#### Property Value
[MiroStateType]()<br>
### **MirostatTau**
target entropy
```csharp
public float MirostatTau { get; set; }
```
#### Property Value
[Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)<br>
### **MirostatEta**
learning rate
```csharp
public float MirostatEta { get; set; }
```
#### Property Value
[Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)<br>
### **PenalizeNL**
consider newlines as a repeatable token (penalize_nl)
```csharp
public bool PenalizeNL { get; set; }
```
#### Property Value
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)<br>

View File

@ -1,27 +0,0 @@
# Save/Load State of Executor
Similar to `LLamaModel`, an executor also has its state, which can be saved and loaded. **Note that in most cases, the state of the executor and the state of the model should be loaded and saved at the same time.**
To decouple the model and the executor, we provide APIs to save/load the state for the model and the executor respectively. However, during inference, the processed information leaves a footprint in the `LLamaModel`'s native context. Therefore, if you just load a state from another executor but keep the model unmodified, some strange things may happen. The same is true when loading the model state only.
Is there a situation that requires loading only one of them? The answer is YES. For example, after resetting the model state, if you don't want the inference to start from the new position, leaving the executor unmodified is okay. But, anyway, this flexible usage may cause some unexpected behaviors, so please ensure you know what you're doing before using it in this way.
In a future version, we'll open access to some variables inside the executor to support more flexible usages.
The APIs to load/save the state of the executors are similar to those of `LLamaModel`. However, note that `StatelessExecutor` doesn't have such APIs because it's stateless itself. Besides, the output of `GetStateData` is an object of type `ExecutorBaseState`.
```cs
LLamaModel model = new LLamaModel(new ModelParams("<modelPath>"));
InteractiveExecutor executor = new InteractiveExecutor(model);
// do some things...
executor.SaveState("executor.st");
var stateData = model.GetStateData();
InteractiveExecutor executor2 = new InteractiveExecutor(model);
executor2.LoadState(stateData);
// do some things...
InteractiveExecutor executor3 = new InteractiveExecutor(model);
executor3.LoadState("executor.st");
// do some things...
```

View File

@ -1,18 +0,0 @@
# Text-to-Text APIs of the executors
All the executors implement the interface `ILLamaExecutor`, which provides two APIs to execute text-to-text tasks.
```cs
public interface ILLamaExecutor
{
public LLamaModel Model { get; }
IEnumerable<string> Infer(string text, InferenceParams? inferenceParams = null, CancellationToken token = default);
IAsyncEnumerable<string> InferAsync(string text, InferenceParams? inferenceParams = null, CancellationToken token = default);
}
```
Just pass the text to the executor with the inference parameters. For the inference parameters, please refer to [executor inference parameters doc](./parameters.md).
The output of both APIs is a **yield enumerable**. Therefore, when receiving the output, you can directly use `foreach` to take actions on each word you get in order, instead of waiting for the whole process to complete.

View File

@ -1,3 +0,0 @@
# Use LLamaSharp with Chinese
It's supported now but the document is still a work in progress. Please wait for a while. Thank you for your support! :)

197
docs/QuickStart.md Normal file
View File

@ -0,0 +1,197 @@
# Quick start
## Installation
To gain high performance, LLamaSharp interacts with a native library compiled from c++, which is called `backend`. We provide backend packages for Windows, Linux and MAC with CPU, Cuda, Metal and OpenCL. You **don't** need to handle anything about c++ but just install the backend packages.
If no published backend matches your device, please open an issue to let us know. If compiling c++ code is not difficult for you, you could also follow [this guide](./ContributingGuide.md) to compile a backend and run LLamaSharp with it.
1. Install [LLamaSharp](https://www.nuget.org/packages/LLamaSharp) package on NuGet:
```
PM> Install-Package LLamaSharp
```
2. Install one or more of these backends, or use self-compiled backend.
- [`LLamaSharp.Backend.Cpu`](https://www.nuget.org/packages/LLamaSharp.Backend.Cpu): Pure CPU for Windows & Linux & MAC. Metal (GPU) support for MAC.
- [`LLamaSharp.Backend.Cuda11`](https://www.nuget.org/packages/LLamaSharp.Backend.Cuda11): CUDA11 for Windows & Linux.
- [`LLamaSharp.Backend.Cuda12`](https://www.nuget.org/packages/LLamaSharp.Backend.Cuda12): CUDA 12 for Windows & Linux.
- [`LLamaSharp.Backend.OpenCL`](https://www.nuget.org/packages/LLamaSharp.Backend.OpenCL): OpenCL for Windows & Linux.
3. (optional) For [Microsoft semantic-kernel](https://github.com/microsoft/semantic-kernel) integration, install the [LLamaSharp.semantic-kernel](https://www.nuget.org/packages/LLamaSharp.semantic-kernel) package.
4. (optional) To enable RAG support, install the [LLamaSharp.kernel-memory](https://www.nuget.org/packages/LLamaSharp.kernel-memory) package (this package currently only supports `net6.0` or higher), which is based on [Microsoft kernel-memory](https://github.com/microsoft/kernel-memory) integration.
## Model preparation
There are two popular formats of model files for LLMs now, which are the PyTorch format (.pth) and the Huggingface format (.bin). LLamaSharp uses the `GGUF` format file, which could be converted from these two formats. To get a `GGUF` file, there are two options:
1. Search model name + 'gguf' in [Huggingface](https://huggingface.co), you will find lots of model files that have already been converted to GGUF format. Please take care of the publishing time of them because some old ones could only work with old version of LLamaSharp.
2. Convert PyTorch or Huggingface format to GGUF format yourself. Please follow the instructions of [this part of llama.cpp readme](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#prepare-and-quantize) to convert them with the python scripts.
Generally, we recommend downloading models with quantization rather than fp16, because it significantly reduces the required memory size while only slightly impacting generation quality.
## Example of LLaMA chat session
Here is a simple example of chatting with a bot based on an LLM in LLamaSharp. Please replace the model path with your own.
![llama_demo](./media/console_demo.gif)
```cs
using LLama.Common;
using LLama;
string modelPath = @"<Your Model Path>"; // change it to your own model path.
var parameters = new ModelParams(modelPath)
{
ContextSize = 1024, // The longest length of chat as memory.
GpuLayerCount = 5 // How many layers to offload to GPU. Please adjust it according to your GPU memory.
};
using var model = LLamaWeights.LoadFromFile(parameters);
using var context = model.CreateContext(parameters);
var executor = new InteractiveExecutor(context);
// Add chat histories as prompt to tell AI how to act.
var chatHistory = new ChatHistory();
chatHistory.AddMessage(AuthorRole.System, "Transcript of a dialog, where the User interacts with an Assistant named Bob. Bob is helpful, kind, honest, good at writing, and never fails to answer the User's requests immediately and with precision.");
chatHistory.AddMessage(AuthorRole.User, "Hello, Bob.");
chatHistory.AddMessage(AuthorRole.Assistant, "Hello. How may I help you today?");
ChatSession session = new(executor, chatHistory);
InferenceParams inferenceParams = new InferenceParams()
{
MaxTokens = 256, // No more than 256 tokens should appear in answer. Remove it if antiprompt is enough for control.
AntiPrompts = new List<string> { "User:" } // Stop generation once antiprompts appear.
};
Console.ForegroundColor = ConsoleColor.Yellow;
Console.Write("The chat session has started.\nUser: ");
Console.ForegroundColor = ConsoleColor.Green;
string userInput = Console.ReadLine() ?? "";
while (userInput != "exit")
{
await foreach ( // Generate the response streamingly.
var text
in session.ChatAsync(
new ChatHistory.Message(AuthorRole.User, userInput),
inferenceParams))
{
Console.ForegroundColor = ConsoleColor.White;
Console.Write(text);
}
Console.ForegroundColor = ConsoleColor.Green;
userInput = Console.ReadLine() ?? "";
}
```
## Examples of chatting with LLaVA
This example shows how to chat with LLaVA and ask it to describe a picture.
![llava_demo](./media/llava_demo.gif)
```cs
using System.Text.RegularExpressions;
using LLama;
using LLama.Common;
string multiModalProj = @"<Your multi-modal proj file path>";
string modelPath = @"<Your LLaVA model file path>";
string modelImage = @"<Your image path>";
const int maxTokens = 1024; // The max tokens that could be generated.
var prompt = $"{{{modelImage}}}\nUSER:\nProvide a full description of the image.\nASSISTANT:\n";
var parameters = new ModelParams(modelPath)
{
ContextSize = 4096,
Seed = 1337,
};
using var model = LLamaWeights.LoadFromFile(parameters);
using var context = model.CreateContext(parameters);
// Llava Init
using var clipModel = LLavaWeights.LoadFromFile(multiModalProj);
var ex = new InteractiveExecutor(context, clipModel);
Console.ForegroundColor = ConsoleColor.Yellow;
Console.WriteLine("The executor has been enabled. In this example, the prompt is printed, the maximum tokens is set to {0} and the context size is {1}.", maxTokens, parameters.ContextSize);
Console.WriteLine("To send an image, enter its filename in curly braces, like this {c:/image.jpg}.");
var inferenceParams = new InferenceParams() { Temperature = 0.1f, AntiPrompts = new List<string> { "\nUSER:" }, MaxTokens = maxTokens };
do
{
// Evaluate if we have images
//
var imageMatches = Regex.Matches(prompt, "{([^}]*)}").Select(m => m.Value);
var imageCount = imageMatches.Count();
var hasImages = imageCount > 0;
byte[][] imageBytes = null;
if (hasImages)
{
var imagePathsWithCurlyBraces = Regex.Matches(prompt, "{([^}]*)}").Select(m => m.Value);
var imagePaths = Regex.Matches(prompt, "{([^}]*)}").Select(m => m.Groups[1].Value);
try
{
imageBytes = imagePaths.Select(File.ReadAllBytes).ToArray();
}
catch (IOException exception)
{
Console.ForegroundColor = ConsoleColor.Red;
Console.Write(
$"Could not load your {(imageCount == 1 ? "image" : "images")}:");
Console.Write($"{exception.Message}");
Console.ForegroundColor = ConsoleColor.Yellow;
Console.WriteLine("Please try again.");
break;
}
int index = 0;
foreach (var path in imagePathsWithCurlyBraces)
{
// Replace the first image with the <image> tag; remove the tags for the rest of the images
if (index++ == 0)
prompt = prompt.Replace(path, "<image>");
else
prompt = prompt.Replace(path, "");
}
Console.WriteLine();
// Initialize the images in the executor
//
ex.ImagePaths = imagePaths.ToList();
}
Console.ForegroundColor = ConsoleColor.White;
await foreach (var text in ex.InferAsync(prompt, inferenceParams))
{
Console.Write(text);
}
Console.Write(" ");
Console.ForegroundColor = ConsoleColor.Green;
prompt = Console.ReadLine();
Console.WriteLine();
// let the user finish with exit
//
if (prompt.Equals("/exit", StringComparison.OrdinalIgnoreCase))
break;
}
while (true);
```
*For more examples, please refer to [LLamaSharp.Examples](./LLama.Examples).*

View File

@ -1,44 +0,0 @@
# Tricks for FAQ
Sometimes, your application with an LLM and LLamaSharp may behave strangely. Before opening an issue to report a bug, the following tricks may be worth a try.
## Carefully set the anti-prompts
An anti-prompt can also be called a "stop keyword", which decides when to stop the response generation. Under interactive mode, the maximum token count is usually not set, which makes the LLM generate responses infinitely. Therefore, setting the anti-prompt correctly helps a lot to avoid strange behaviors. For example, the prompt file `chat-with-bob.txt` has the following content:
```
Transcript of a dialog, where the User interacts with an Assistant named Bob. Bob is helpful, kind, honest, good at writing, and never fails to answer the User's requests immediately and with precision.
User: Hello, Bob.
Bob: Hello. How may I help you today?
User: Please tell me the largest city in Europe.
Bob: Sure. The largest city in Europe is Moscow, the capital of Russia.
User:
```
Therefore, the anti-prompt should be set to "User:". If the last line of the prompt is removed, the LLM will automatically generate one question (user) and one response (bob) when running the chat session. Therefore, it's suggested to append the anti-prompt to the prompt when starting a chat session.
What if an extra line is appended? The string "User:" in the prompt will then be followed by a "\n" character. Thus, when running the model, the automatic generation of a question-and-response pair may appear because the anti-prompt is "User:" but the last token is "User:\n". Whether it appears is an undefined behavior that depends on the implementation inside the `LLamaExecutor`. Anyway, since it may lead to unexpected behaviors, it's recommended to trim your prompt or carefully keep it consistent with your anti-prompt.
## Pay attention to the length of prompt
Sometimes we want to input a long prompt to execute a task. However, the context size may limit the inference of LLama model. Please ensure the inequality below holds.
$$ len(prompt) + len(response) < len(context) $$
In this inequality, `len(response)` refers to the expected tokens for LLM to generate.
## Try different executors with a prompt
Some prompts work well under interactive mode, such as `chat-with-bob`, while others may work well with instruct mode, such as `alpaca`. Besides, if your input is a simple one-time job, such as "Q: what is the satellite of the earth? A: ", stateless mode will be a good choice.
If your chat bot performs badly, trying a different executor may make it work well.
## Choose model weights depending on your task
The differences between models may lead to very different behaviors under the same task. For example, if you're building a chat bot for a non-English language, a model fine-tuned specifically for that language will have a huge effect on the performance.
## Set the layer count you want to offload to GPU
Currently, the `GpuLayerCount` parameter, which decides the number of layers loaded into the GPU, is set to 20 by default. However, if you have a powerful GPU, setting it to a larger number will give you faster inference.

View File

@ -1,10 +1,70 @@
# Transforms in Chat Session
# LLamaSharp chat session
## Basic usages of ChatSession
`ChatSession` is a higher-level abstraction than the executors. In the context of a chat application like ChatGPT, a "chat session" refers to an interactive conversation or exchange of messages between the user and the chatbot. It represents a continuous flow of communication where the user enters input or asks questions, and the chatbot responds accordingly. A chat session typically starts when the user initiates a conversation with the chatbot and continues until the interaction comes to a natural end or is explicitly terminated by either the user or the system. During a chat session, the chatbot maintains the context of the conversation, remembers previous messages, and generates appropriate responses based on the user's inputs and the ongoing dialogue.
### Initialize a session
Currently, the only parameter that is accepted is an `ILLamaExecutor`, because this is the only parameter that we're sure will exist in all future versions. Since it's a high-level abstraction, we're conservative about the API design. In the future, more kinds of constructors may be added.
```cs
var parameters = new ModelParams(modelPath);
using var model = LLamaWeights.LoadFromFile(parameters);
using var context = model.CreateContext(parameters);
InteractiveExecutor ex = new(context);
ChatSession session = new ChatSession(ex);
```
### Chat with the bot
There are two kinds of input accepted by the `Chat` API: `ChatHistory` and `String`. The string-based API is quite similar to that of the executors, while the `ChatHistory`-based API aims to provide more flexible usage. For example, suppose you have had a chat with the bot in session A before you open session B. Session B has no memory of what you said before, so you can feed the history of A to B, as shown in the sketch after the following example.
```cs
string prompt = "What is C#?";
await foreach (var text in session.ChatAsync(prompt, new InferenceParams() { Temperature = 0.6f, AntiPrompts = new List<string> { "User:" } })) // the inference params should be changed depending on your statement
{
Console.Write(text);
}
```
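As a minimal sketch of the second case (the names `sessionA` and `executor` are only illustrative and not from the original document), the history of one session can be used to start another:
```cs
// `executor` and `sessionA` are assumed to exist already (see above).
// Start a new session that inherits the history of a previous one.
ChatSession sessionB = new ChatSession(executor, sessionA.History);

// Chat through the message-based API; the inherited history is taken into account.
await foreach (var text in sessionB.ChatAsync(
    new ChatHistory.Message(AuthorRole.User, "Could you summarize what we discussed before?"),
    new InferenceParams() { Temperature = 0.6f, AntiPrompts = new List<string> { "User:" } }))
{
    Console.Write(text);
}
```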
### Get the history
Currently `History` is a property of `ChatSession`.
```cs
foreach(var rec in session.History.Messages)
{
Console.WriteLine($"{rec.AuthorRole}: {rec.Content}");
}
```
## Save/Load Chat Session
Generally, you may want to switch between chat sessions, which requires the ability to load and save a session.
The API is quite simple: the files will be saved into a directory you specify. If the path does not exist, a new directory will be created.
```cs
string savePath = "<save dir>";
session.SaveSession(savePath);
session.LoadSession(savePath, loadTransforms:true);
session.LoadSession(savePath, loadTransforms:false);
```
You could also keep the state in memory and load it with the following APIs.
```cs
var sessionState = session.GetSessionState();
session.LoadSession(sessionState, loadTransforms:true);
session.LoadSession(sessionState, loadTransforms:false);
```
## Transforms in Chat Session
There are three important elements in `ChatSession`: input, output and history. Besides, there are some conversions between them. Since how they are processed varies under different conditions, LLamaSharp hands this part of the power over to the users.
Currently, there are three kinds of processes that can be customized, as introduced below.
## Input transform
### Input transform
In general, the input of the chat API is a text (without stream), therefore `ChatSession` processes it in a pipeline. If you want to use your customized transform, you need to define a transform that implements `ITextTransform` and add it to the pipeline of `ChatSession`.
@ -35,7 +95,7 @@ public class MyInputTransform2 : ITextTransform
session.AddInputTransform(new MyInputTransform1()).AddInputTransform(new MyInputTransform2());
```
## Output transform
### Output transform
Different from the input, the output of chat API is a text stream. Therefore you need to process it word by word, instead of getting the full text at once.
@ -145,7 +205,7 @@ public class KeywordTextOutputStreamTransform : ITextStreamTransform
}
```
## History transform
### History transform
The chat history can be converted to or from a text, which is exactly what its interface defines.
@ -242,4 +302,4 @@ public class DefaultHistoryTransform : IHistoryTransform
return text;
}
}
```
```

349
docs/Tutorials/Executors.md Normal file
View File

@ -0,0 +1,349 @@
# LLamaSharp executors
A LLamaSharp executor defines the behavior of the model when it is called. Currently, there are four kinds of executors: `InteractiveExecutor`, `InstructExecutor`, `StatelessExecutor` and `BatchedExecutor`.
In short, `InteractiveExecutor` is suitable for continuously getting answers to your questions from the LLM. `InstructExecutor` lets the LLM execute your instructions, such as "continue writing". `StatelessExecutor` is best for one-time jobs because the previous inference has no impact on the current one. `BatchedExecutor` can accept multiple inputs and generate outputs for different sessions at the same time, significantly improving the throughput of the program.
## Text-to-Text APIs of the executors
All the executors implement the interface `ILLamaExecutor`, which provides the API to execute text-to-text tasks.
```cs
public interface ILLamaExecutor
{
/// <summary>
/// The loaded context for this executor.
/// </summary>
public LLamaContext Context { get; }
// LLava Section
//
/// <summary>
/// Identify if it's a multi-modal model and there is an image to process.
/// </summary>
public bool IsMultiModal { get; }
/// <summary>
/// Multi-Modal Projections / Clip Model weights
/// </summary>
public LLavaWeights? ClipModel { get; }
/// <summary>
/// List of images: Image filename and path (jpeg images).
/// </summary>
public List<string> ImagePaths { get; set; }
/// <summary>
/// Asynchronously infers a response from the model.
/// </summary>
/// <param name="text">Your prompt</param>
/// <param name="inferenceParams">Any additional parameters</param>
/// <param name="token">A cancellation token.</param>
/// <returns></returns>
IAsyncEnumerable<string> InferAsync(string text, IInferenceParams? inferenceParams = null, CancellationToken token = default);
}
```
The output of this API is a **yield enumerable**. Therefore, when receiving the output, you can directly use `await foreach` to take actions on each token you get in order, instead of waiting for the whole process to complete.
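For example, a minimal sketch of streaming the output (assuming `executor` is one of the executors above and `prompt` is your input text) looks like this:
```cs
var inferenceParams = new InferenceParams()
{
    MaxTokens = 128,                            // limit the length of the response
    AntiPrompts = new List<string> { "User:" }  // stop generating when the anti-prompt appears
};

// Tokens are yielded one by one, so they can be printed as soon as they arrive.
await foreach (var text in executor.InferAsync(prompt, inferenceParams))
{
    Console.Write(text);
}
```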
## InteractiveExecutor & InstructExecutor
Both of them take "completing the prompt" as the goal when generating the response. For example, if you input `Long long ago, there was a fox who wanted to make friends with humans. One day`, then the LLM will continue to write the story.
Under interactive mode, you serve a role of user and the LLM serves the role of assistant. Then it will help you with your question or request.
Under instruct mode, you give LLM some instructions and it follows.
Though their behaviors sound similar, the choice could introduce many differences depending on your prompt. For example, "chat-with-bob" has good performance under interactive mode and `alpaca` does well with instruct mode.
```
// chat-with-bob
Transcript of a dialog, where the User interacts with an Assistant named Bob. Bob is helpful, kind, honest, good at writing, and never fails to answer the User's requests immediately and with precision.
User: Hello, Bob.
Bob: Hello. How may I help you today?
User: Please tell me the largest city in Europe.
Bob: Sure. The largest city in Europe is Moscow, the capital of Russia.
User:
```
```
// alpaca
Below is an instruction that describes a task. Write a response that appropriately completes the request.
```
Therefore, please modify the prompt correspondingly when switching from one mode to the other.
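As a minimal sketch (not part of the original document; `modelPath` is assumed to point to a GGUF file, and the `InstructExecutor` constructor shown here uses its default instruction prefix/suffix), both stateful executors are created from a context in the same way, and only the prompt style differs:
```cs
var parameters = new ModelParams(modelPath) { ContextSize = 1024 };
using var model = LLamaWeights.LoadFromFile(parameters);
using var context = model.CreateContext(parameters);

// Interactive mode: works well with dialog-style prompts such as "chat-with-bob".
var interactiveExecutor = new InteractiveExecutor(context);

// Instruct mode: works well with instruction-style prompts such as "alpaca".
var instructExecutor = new InstructExecutor(context);
```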
## StatelessExecutor
Despite the differences between interactive mode and instruct mode, both of them are stateful modes. That is, your previous question/instruction will impact the current response from the LLM. On the contrary, the stateless executor does not have such a "memory". No matter how many times you talk to it, it will only concentrate on what you say this time. It is very useful when you want a clean context, without being affected by previous inputs.
Since the stateless executor has no memory of previous conversations, you need to input your question together with the whole prompt to get a better answer.
For example, if you feed `Q: Who is Trump? A: ` to the stateless executor, it may give the following answer with the antiprompt `Q: `.
```
Donald J. Trump, born June 14, 1946, is an American businessman, television personality, politician and the 45th President of the United States (2017-2021). # Anexo:Torneo de Hamburgo 2022 (individual masculino)
## Presentación previa
* Defensor del título: Daniil Medvédev
```
It seems that things went well at first. However, after answering the question itself, the LLM began to talk about other things until the answer reached the token count limit. The reason for this strange behavior is that the anti-prompt cannot be matched. With this input, the LLM cannot decide whether to append the string "A: " at the end of the response.
As an improvement, let's take the following text as the input:
```
Q: What is the capital of the USA? A: Washington. Q: What is the sum of 1 and 2? A: 3. Q: Who is Trump? A:
```
Then, I got the following answer with the anti-prompt `Q: `.
```
45th president of the United States.
```
This time, by repeating the same pattern of `Q: xxx? A: xxx.`, the LLM outputs the anti-prompt we want, which helps it decide where to stop the generation.
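A minimal sketch of running the stateless executor with such a few-shot prompt (assuming `model` and `parameters` have been created as in the sketch above) could look like this:
```cs
var statelessExecutor = new StatelessExecutor(model, parameters);

var prompt = "Q: What is the capital of the USA? A: Washington. Q: What is the sum of 1 and 2? A: 3. Q: Who is Trump? A: ";
var inferenceParams = new InferenceParams() { MaxTokens = 64, AntiPrompts = new List<string> { "Q: " } };

// Every call is independent: the executor keeps no memory of previous calls.
await foreach (var text in statelessExecutor.InferAsync(prompt, inferenceParams))
{
    Console.Write(text);
}
```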
## BatchedExecutor
Different from the other executors, `BatchedExecutor` can accept multiple inputs from different sessions and generate outputs for them at the same time. Here is an example of using it.
```cs
using LLama.Batched;
using LLama.Common;
using LLama.Native;
using LLama.Sampling;
using Spectre.Console;
namespace LLama.Examples.Examples;
/// <summary>
/// This demonstrates using a batch to generate two sequences and then using one
/// sequence as the negative guidance ("classifier free guidance") for the other.
/// </summary>
public class BatchedExecutorGuidance
{
private const int n_len = 32;
public static async Task Run()
{
string modelPath = UserSettings.GetModelPath();
var parameters = new ModelParams(modelPath);
using var model = LLamaWeights.LoadFromFile(parameters);
var positivePrompt = AnsiConsole.Ask("Positive Prompt (or ENTER for default):", "My favourite colour is").Trim();
var negativePrompt = AnsiConsole.Ask("Negative Prompt (or ENTER for default):", "I hate the colour red. My favourite colour is").Trim();
var weight = AnsiConsole.Ask("Guidance Weight (or ENTER for default):", 2.0f);
// Create an executor that can evaluate a batch of conversations together
using var executor = new BatchedExecutor(model, parameters);
// Print some info
var name = executor.Model.Metadata.GetValueOrDefault("general.name", "unknown model name");
Console.WriteLine($"Created executor with model: {name}");
// Load the two prompts into two conversations
using var guided = executor.Create();
guided.Prompt(positivePrompt);
using var guidance = executor.Create();
guidance.Prompt(negativePrompt);
// Run inference to evaluate prompts
await AnsiConsole
.Status()
.Spinner(Spinner.Known.Line)
.StartAsync("Evaluating Prompts...", _ => executor.Infer());
// Fork the "guided" conversation. We'll run this one without guidance for comparison
using var unguided = guided.Fork();
// Run inference loop
var unguidedSampler = new GuidedSampler(null, weight);
var unguidedDecoder = new StreamingTokenDecoder(executor.Context);
var guidedSampler = new GuidedSampler(guidance, weight);
var guidedDecoder = new StreamingTokenDecoder(executor.Context);
await AnsiConsole
.Progress()
.StartAsync(async progress =>
{
var reporter = progress.AddTask("Running Inference", maxValue: n_len);
for (var i = 0; i < n_len; i++)
{
if (i != 0)
await executor.Infer();
// Sample from the "unguided" conversation. This is just a conversation using the same prompt, without any
// guidance. This serves as a comparison to show the effect of guidance.
var u = unguidedSampler.Sample(executor.Context.NativeHandle, unguided.Sample(), Array.Empty<LLamaToken>());
unguidedDecoder.Add(u);
unguided.Prompt(u);
// Sample from the "guided" conversation. This sampler will internally use the "guidance" conversation
// to steer the conversation. See how this is done in GuidedSampler.ProcessLogits (bottom of this file).
var g = guidedSampler.Sample(executor.Context.NativeHandle, guided.Sample(), Array.Empty<LLamaToken>());
guidedDecoder.Add(g);
// Use this token to advance both guided _and_ guidance. Keeping them in sync (except for the initial prompt).
guided.Prompt(g);
guidance.Prompt(g);
// Early exit if we reach the natural end of the guided sentence
if (g == model.EndOfSentenceToken)
break;
// Update progress bar
reporter.Increment(1);
}
});
AnsiConsole.MarkupLine($"[green]Unguided:[/][white]{unguidedDecoder.Read().ReplaceLineEndings(" ")}[/]");
AnsiConsole.MarkupLine($"[green]Guided:[/][white]{guidedDecoder.Read().ReplaceLineEndings(" ")}[/]");
}
private class GuidedSampler(Conversation? guidance, float weight)
: BaseSamplingPipeline
{
public override void Accept(SafeLLamaContextHandle ctx, LLamaToken token)
{
}
public override ISamplingPipeline Clone()
{
throw new NotSupportedException();
}
protected override void ProcessLogits(SafeLLamaContextHandle ctx, Span<float> logits, ReadOnlySpan<LLamaToken> lastTokens)
{
if (guidance == null)
return;
// Get the logits generated by the guidance sequences
var guidanceLogits = guidance.Sample();
// Use those logits to guide this sequence
NativeApi.llama_sample_apply_guidance(ctx, logits, guidanceLogits, weight);
}
protected override LLamaToken ProcessTokenDataArray(SafeLLamaContextHandle ctx, LLamaTokenDataArray candidates, ReadOnlySpan<LLamaToken> lastTokens)
{
candidates.Temperature(ctx, 0.8f);
candidates.TopK(ctx, 25);
return candidates.SampleToken(ctx);
}
}
}
```
## Inference parameters
Different from context parameters, which are described in [understand-llama-context](./UnderstandLLamaContext.md), executors accept parameters when you call their API to execute the inference. That means you can change the parameters every time you ask the model to generate outputs.
Here are the parameters for LLamaSharp executors.
```cs
/// <summary>
/// The parameters used for inference.
/// </summary>
public record InferenceParams
: IInferenceParams
{
/// <summary>
/// number of tokens to keep from initial prompt
/// </summary>
public int TokensKeep { get; set; } = 0;
/// <summary>
/// how many new tokens to predict (n_predict), set to -1 to infinitely generate response
/// until it completes.
/// </summary>
public int MaxTokens { get; set; } = -1;
/// <summary>
/// logit bias for specific tokens
/// </summary>
public Dictionary<LLamaToken, float>? LogitBias { get; set; } = null;
/// <summary>
/// Sequences where the model will stop generating further tokens.
/// </summary>
public IReadOnlyList<string> AntiPrompts { get; set; } = Array.Empty<string>();
/// <inheritdoc />
public int TopK { get; set; } = 40;
/// <inheritdoc />
public float TopP { get; set; } = 0.95f;
/// <inheritdoc />
public float MinP { get; set; } = 0.05f;
/// <inheritdoc />
public float TfsZ { get; set; } = 1.0f;
/// <inheritdoc />
public float TypicalP { get; set; } = 1.0f;
/// <inheritdoc />
public float Temperature { get; set; } = 0.8f;
/// <inheritdoc />
public float RepeatPenalty { get; set; } = 1.1f;
/// <inheritdoc />
public int RepeatLastTokensCount { get; set; } = 64;
/// <inheritdoc />
public float FrequencyPenalty { get; set; } = .0f;
/// <inheritdoc />
public float PresencePenalty { get; set; } = .0f;
/// <inheritdoc />
public MirostatType Mirostat { get; set; } = MirostatType.Disable;
/// <inheritdoc />
public float MirostatTau { get; set; } = 5.0f;
/// <inheritdoc />
public float MirostatEta { get; set; } = 0.1f;
/// <inheritdoc />
public bool PenalizeNL { get; set; } = true;
/// <inheritdoc />
public SafeLLamaGrammarHandle? Grammar { get; set; }
/// <inheritdoc />
public ISamplingPipeline? SamplingPipeline { get; set; }
}
```
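Since these parameters are passed per call, a minimal sketch of changing them between two inferences with the same executor (the prompts here are only illustrative) is:
```cs
// First call: creative sampling.
var creative = new InferenceParams() { Temperature = 1.0f, TopP = 0.95f, MaxTokens = 64 };
await foreach (var text in executor.InferAsync("Write a haiku about spring.", creative))
    Console.Write(text);

// Second call with the same executor, but more deterministic sampling.
var precise = new InferenceParams() { Temperature = 0.2f, TopK = 10, MaxTokens = 64 };
await foreach (var text in executor.InferAsync("List three prime numbers.", precise))
    Console.Write(text);
```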
## Save and load executor state
An executor also has its own state, which can be saved and loaded. This matters a lot when you want to support restoring a previous session for the user in your application.
The following code shows how to save and load the executor state.
```cs
InteractiveExecutor executor = new InteractiveExecutor(model);
// do some things...
executor.SaveState("executor.st");
var stateData = executor.GetStateData();
InteractiveExecutor executor2 = new InteractiveExecutor(model);
executor2.LoadState(stateData);
// do some things...
InteractiveExecutor executor3 = new InteractiveExecutor(model);
executor3.LoadState("executor.st");
// do some things...
```

View File

@ -1,4 +1,4 @@
# Get Embeddings
# Get embeddings
Getting the embeddings of a text from an LLM is sometimes useful, for example, to train other MLP models.
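As a minimal sketch (assuming the `LLamaEmbedder` API of recent versions; please check the current method names, and note that `modelPath` is only a placeholder):
```cs
var parameters = new ModelParams(modelPath) { EmbeddingMode = true };
using var model = LLamaWeights.LoadFromFile(parameters);
var embedder = new LLamaEmbedder(model, parameters);

// Texts with similar meanings are expected to produce vectors that are close to each other.
// GetEmbeddings is assumed here; the exact method name may differ between versions.
float[] embeddings = embedder.GetEmbeddings("LLamaSharp runs LLaMA models locally.");
Console.WriteLine($"Embedding dimension: {embeddings.Length}");
```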

View File

@ -0,0 +1,42 @@
# Customize the native library loading
As indicated in [Architecture](../Architecture.md), LLamaSharp uses a native library to run the LLM models. Sometimes you may want to compile the native library yourself, or just dynamically load the library depending on the environment of your application's users. Luckily, since version 0.7.0, dynamic loading of the native library has been supported! That allows you to customize the native library loading process.
## When you should compile the native library yourself
Before introducing the way to customize native library loading, please follow the tips below to see if you need to compile the native library yourself, rather than use the published backend packages, which contain native library files for multiple targets.
1. Your device/environment is not supported by any published backend packages. For example, vulkan is not supported yet. In this case, it would mean a lot if you open an issue to tell us you are using it. Since our support for a new backend will have some delay, you could compile it yourself before that.
2. You want to gain the best performance of LLamaSharp. Because LLamaSharp offloads the model to both GPU and CPU, the performance significantly depends on the CPU if your GPU memory size is small. AVX ([Advanced Vector Extensions](https://en.wikipedia.org/wiki/Advanced_Vector_Extensions)) and BLAS ([Basic Linear Algebra Subprograms](https://en.wikipedia.org/wiki/Basic_Linear_Algebra_Subprograms)) are the most important ways to accelerate CPU computation. By default, LLamaSharp disables BLAS and uses AVX2 for the CUDA backend. If you would like to enable BLAS or use AVX-512 along with CUDA, please compile the native library yourself, following the [instructions here](../ContributingGuide.md).
3. You want to debug the c++ code.
## Use NativeLibraryConfig
We provide the singleton class `LLama.Native.NativeLibraryConfig` to allow users to customize the loading process of the native library. Any of its methods should be called before loading the model, because a native library file must be decided before any model is loaded.
### Load specified native library file
All you need to do is add the following code at the very beginning of your program.
```cs
NativeLibraryConfig.Instance.WithLibrary("<Your native library path>");
```
### Automatically select one from multiple native library files
Let's consider this case: you don't know your users' devices when distributing your application, so you put all the possible native libraries in a folder and want to select the best one depending on the user's device. LLamaSharp allows you to define the strategy to do it with the methods below; a combined sketch follows the list.
- `NativeLibraryConfig.Instance.WithCuda`: decide if you want to use cuda if possible.
- `NativeLibraryConfig.Instance.WithAvx`: decide the highest AVX level you want to use if possible.
- `NativeLibraryConfig.Instance.WithSearchDirectory`: specify the directory to search the native library files.
- `NativeLibraryConfig.Instance.WithAutoFallback`: whether to allow fall back to other options if no native library that matches your specified settings could be found.
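As a minimal sketch (the directory path, the AVX level and the exact method signatures are assumptions; please check the API of the version you use), these options can be combined before any model is loaded:
```cs
NativeLibraryConfig.Instance
    .WithCuda(true)                             // prefer a cuda library if one is available
    .WithAvx(NativeLibraryConfig.AvxLevel.Avx2) // the highest AVX level allowed (enum name may differ by version)
    .WithSearchDirectory("./runtimes")          // where the candidate native library files are placed
    .WithAutoFallback(true);                    // fall back to other libraries if nothing matches exactly
```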
### Set the log level of native library loading
```cs
NativeLibraryConfig.Instance.WithLogs();
```
There are four log levels, which are error, warning, info and debug. If you are not sure if the correct library is selected, please set log level to `info` to see the full logs.

View File

@ -0,0 +1,54 @@
# Quantization
Quantization significantly accelerates model inference. Since there's little accuracy (performance) loss when quantizing the model, don't hesitate to quantize it!
To quantize the model, please call `Quantize` from `LLamaQuantizer`, which is a static method.
```cs
string srcPath = "<model.bin>";
string dstPath = "<model_q4_0.bin>";
LLamaQuantizer.Quantize(srcPath, dstPath, "q4_0");
// The following overload is also okay.
// LLamaQuantizer.Quantize(srcPath, dstPath, LLamaFtype.LLAMA_FTYPE_MOSTLY_Q4_0);
```
After calling it, a quantized model file will be saved.
The following types of quantization are currently supported:
```cpp
{ "Q4_0", LLAMA_FTYPE_MOSTLY_Q4_0, " 3.56G, +0.2166 ppl @ LLaMA-v1-7B", },
{ "Q4_1", LLAMA_FTYPE_MOSTLY_Q4_1, " 3.90G, +0.1585 ppl @ LLaMA-v1-7B", },
{ "Q5_0", LLAMA_FTYPE_MOSTLY_Q5_0, " 4.33G, +0.0683 ppl @ LLaMA-v1-7B", },
{ "Q5_1", LLAMA_FTYPE_MOSTLY_Q5_1, " 4.70G, +0.0349 ppl @ LLaMA-v1-7B", },
{ "IQ2_XXS",LLAMA_FTYPE_MOSTLY_IQ2_XXS," 2.06 bpw quantization", },
{ "IQ2_XS", LLAMA_FTYPE_MOSTLY_IQ2_XS, " 2.31 bpw quantization", },
{ "IQ2_S", LLAMA_FTYPE_MOSTLY_IQ2_S, " 2.5 bpw quantization", },
{ "IQ2_M", LLAMA_FTYPE_MOSTLY_IQ2_M, " 2.7 bpw quantization", },
{ "IQ1_S", LLAMA_FTYPE_MOSTLY_IQ1_S, " 1.56 bpw quantization", },
{ "IQ1_M", LLAMA_FTYPE_MOSTLY_IQ1_M, " 1.75 bpw quantization", },
{ "Q2_K", LLAMA_FTYPE_MOSTLY_Q2_K, " 2.63G, +0.6717 ppl @ LLaMA-v1-7B", },
{ "Q2_K_S", LLAMA_FTYPE_MOSTLY_Q2_K_S, " 2.16G, +9.0634 ppl @ LLaMA-v1-7B", },
{ "IQ3_XXS",LLAMA_FTYPE_MOSTLY_IQ3_XXS," 3.06 bpw quantization", },
{ "IQ3_S", LLAMA_FTYPE_MOSTLY_IQ3_S, " 3.44 bpw quantization", },
{ "IQ3_M", LLAMA_FTYPE_MOSTLY_IQ3_M, " 3.66 bpw quantization mix", },
{ "Q3_K", LLAMA_FTYPE_MOSTLY_Q3_K_M, "alias for Q3_K_M" },
{ "IQ3_XS", LLAMA_FTYPE_MOSTLY_IQ3_XS, " 3.3 bpw quantization" , },
{ "Q3_K_S", LLAMA_FTYPE_MOSTLY_Q3_K_S, " 2.75G, +0.5551 ppl @ LLaMA-v1-7B", },
{ "Q3_K_M", LLAMA_FTYPE_MOSTLY_Q3_K_M, " 3.07G, +0.2496 ppl @ LLaMA-v1-7B", },
{ "Q3_K_L", LLAMA_FTYPE_MOSTLY_Q3_K_L, " 3.35G, +0.1764 ppl @ LLaMA-v1-7B", },
{ "IQ4_NL", LLAMA_FTYPE_MOSTLY_IQ4_NL, " 4.50 bpw non-linear quantization", },
{ "IQ4_XS", LLAMA_FTYPE_MOSTLY_IQ4_XS, " 4.25 bpw non-linear quantization", },
{ "Q4_K", LLAMA_FTYPE_MOSTLY_Q4_K_M, "alias for Q4_K_M", },
{ "Q4_K_S", LLAMA_FTYPE_MOSTLY_Q4_K_S, " 3.59G, +0.0992 ppl @ LLaMA-v1-7B", },
{ "Q4_K_M", LLAMA_FTYPE_MOSTLY_Q4_K_M, " 3.80G, +0.0532 ppl @ LLaMA-v1-7B", },
{ "Q5_K", LLAMA_FTYPE_MOSTLY_Q5_K_M, "alias for Q5_K_M", },
{ "Q5_K_S", LLAMA_FTYPE_MOSTLY_Q5_K_S, " 4.33G, +0.0400 ppl @ LLaMA-v1-7B", },
{ "Q5_K_M", LLAMA_FTYPE_MOSTLY_Q5_K_M, " 4.45G, +0.0122 ppl @ LLaMA-v1-7B", },
{ "Q6_K", LLAMA_FTYPE_MOSTLY_Q6_K, " 5.15G, +0.0008 ppl @ LLaMA-v1-7B", },
{ "Q8_0", LLAMA_FTYPE_MOSTLY_Q8_0, " 6.70G, +0.0004 ppl @ LLaMA-v1-7B", },
{ "F16", LLAMA_FTYPE_MOSTLY_F16, "13.00G @ 7B", },
{ "F32", LLAMA_FTYPE_ALL_F32, "26.00G @ 7B", },
// Note: Ensure COPY comes after F32 to avoid ftype 0 from matching.
{ "COPY", LLAMA_FTYPE_ALL_F32, "only copy tensors, no quantizing", },
```

View File

@ -0,0 +1,122 @@
# Understand LLamaSharp context
`LLamaContext` is the most important component as a link between native APIs and higher-level APIs. It contains the basic settings for model inference and holds the kv-cache, which could significantly accelerate model inference. Since `LLamaContext` is not coupled with `LLamaWeights`, it's possible to create multiple contexts based on one set of model weights. Each `ILLamaExecutor` will hold a `LLamaContext` instance, but it's possible to switch to a different context in an executor.
If your application has multiple sessions, please take care of managing `LLamaContext`.
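For example, a minimal sketch of creating two independent contexts from one set of weights (so that two sessions do not interfere with each other; `modelPath` is only a placeholder) could be:
```cs
var parameters = new ModelParams(modelPath) { ContextSize = 1024 };
using var weights = LLamaWeights.LoadFromFile(parameters);

// The weights are loaded only once, but each context keeps its own kv-cache and state.
using var contextA = weights.CreateContext(parameters);
using var contextB = weights.CreateContext(parameters);

var executorA = new InteractiveExecutor(contextA);
var executorB = new InteractiveExecutor(contextB);
```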
`LLamaContext` takes the following parameters as its settings. Note that the parameters cannot be changed once the context has been created.
```cs
public interface IContextParams
{
/// <summary>
/// Model context size (n_ctx)
/// </summary>
uint? ContextSize { get; }
/// <summary>
/// batch size for prompt processing (must be >=32 to use BLAS) (n_batch)
/// </summary>
uint BatchSize { get; }
/// <summary>
/// Seed for the random number generator (seed)
/// </summary>
uint Seed { get; }
/// <summary>
/// Whether to use embedding mode. (embedding) Note that if this is set to true,
/// The LLamaModel won't produce text response anymore.
/// </summary>
bool EmbeddingMode { get; }
/// <summary>
/// RoPE base frequency (null to fetch from the model)
/// </summary>
float? RopeFrequencyBase { get; }
/// <summary>
/// RoPE frequency scaling factor (null to fetch from the model)
/// </summary>
float? RopeFrequencyScale { get; }
/// <summary>
/// The encoding to use for models
/// </summary>
Encoding Encoding { get; }
/// <summary>
/// Number of threads (null = autodetect) (n_threads)
/// </summary>
uint? Threads { get; }
/// <summary>
/// Number of threads to use for batch processing (null = autodetect) (n_threads)
/// </summary>
uint? BatchThreads { get; }
/// <summary>
/// YaRN extrapolation mix factor (null = from model)
/// </summary>
float? YarnExtrapolationFactor { get; }
/// <summary>
/// YaRN magnitude scaling factor (null = from model)
/// </summary>
float? YarnAttentionFactor { get; }
/// <summary>
/// YaRN low correction dim (null = from model)
/// </summary>
float? YarnBetaFast { get; }
/// <summary>
/// YaRN high correction dim (null = from model)
/// </summary>
float? YarnBetaSlow { get; }
/// <summary>
/// YaRN original context length (null = from model)
/// </summary>
uint? YarnOriginalContext { get; }
/// <summary>
/// YaRN scaling method to use.
/// </summary>
RopeScalingType? YarnScalingType { get; }
/// <summary>
/// Override the type of the K cache
/// </summary>
GGMLType? TypeK { get; }
/// <summary>
/// Override the type of the V cache
/// </summary>
GGMLType? TypeV { get; }
/// <summary>
/// Whether to disable offloading the KQV cache to the GPU
/// </summary>
bool NoKqvOffload { get; }
/// <summary>
/// defragment the KV cache if holes/size &gt; defrag_threshold, Set to &lt; 0 to disable (default)
/// </summary>
float DefragThreshold { get; }
/// <summary>
/// Whether to pool (sum) embedding results by sequence id (ignored if no pooling layer)
/// </summary>
bool DoPooling { get; }
}
```
`LLamaContext` has its own state, which can be saved and loaded.
```cs
LLamaContext.SaveState(string filename)
LLamaContext.GetState()
```
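A hedged sketch of the usage (the `LoadState` overloads are assumptions; please check the current API):
```cs
// Save the state to a file and also keep a copy in memory.
context.SaveState("context.st");
var state = context.GetState();

// ... run some inference ...

// Restore the state later, either from the file or from the in-memory copy.
// LoadState is assumed here; check the current method list of LLamaContext.
context.LoadState("context.st");
context.LoadState(state);
```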

View File

@ -2,27 +2,30 @@
![logo](./media/LLamaSharpLogo.png)
LLamaSharp is the C#/.NET binding of [llama.cpp](https://github.com/ggerganov/llama.cpp). It provides APIs to run inference with LLaMA models and deploy them in native environments or on the Web. It helps C# developers deploy LLMs (Large Language Models) locally and integrate them with C# apps.
## Main features
- Model inference
- Model quantization
- Generating embeddings
- Grammar parse
- Interactive/Instruct/Stateless executor mode
- Chat session APIs
- Save/load the state
- Integration with other applications like BotSharp and semantic-kernel
LLamaSharp is a cross-platform library to run 🦙LLaMA/LLaVA models (and others) on your local device. Based on [llama.cpp](https://github.com/ggerganov/llama.cpp), inference with LLamaSharp is efficient on both CPU and GPU. With its higher-level APIs and RAG support, it's convenient to deploy LLMs (Large Language Models) in your application with LLamaSharp.
## Essential insights for novice learners
If you are new to LLMs, here are some tips to help you get started with `LLamaSharp`. If you are experienced in this field, we'd still recommend taking a few minutes to read them, because some things behave differently compared to the cpp/python versions.
1. The main ability of LLamaSharp is to provide an efficient way to run inference of LLM (Large Language Model) locally (and fine-tune model in the future). The model weights, however, need to be downloaded from other resources such as [huggingface](https://huggingface.co).
2. Since LLamaSharp supports multiple platforms, The nuget package is split into `LLamaSharp` and `LLama.Backend`. After installing `LLamaSharp`, please install one of `LLama.Backend.Cpu`, `LLama.Backend.Cuda11` or `LLama.Backend.Cuda12`. If you use the source code, dynamic libraries can be found in `LLama/Runtimes`.
3. `LLaMa` originally refers to the weights released by Meta (Facebook Research). After that, many models are fine-tuned based on it, such as `Vicuna`, `GPT4All`, and `Pyglion`. Though all of these models are supported by LLamaSharp, some steps are necessary with different file formats. There're mainly three kinds of files, which are `.pth`, `.bin (ggml)`, `.bin (quantized)`. If you have the `.bin (quantized)` file, it could be used directly by LLamaSharp. If you have the `.bin (ggml)` file, you could use it directly but get higher inference speed after the quantization. If you have the `.pth` file, you need to follow [the instructions in llama.cpp](https://github.com/ggerganov/llama.cpp#prepare-data--run) to convert it to `.bin (ggml)` file at first.
4. LLamaSharp supports GPU acceleration, but it requires cuda installation. Please install cuda 11 or cuda 12 on your system before using LLamaSharp to enable GPU. If you have another cuda version, you could compile llama.cpp from source to get the dll. For building from source, please refer to [issue #5](https://github.com/SciSharp/LLamaSharp/issues/5).
1. The main ability of LLamaSharp is to provide an efficient way to run inference of LLMs on your device (and to fine-tune models in the future). The model weights, however, need to be downloaded from other resources such as [huggingface](https://huggingface.co).
2. To gain high performance, LLamaSharp interacts with a native library compiled from c++, which is called `backend`. We provide backend packages for Windows, Linux and MAC with CPU, Cuda, Metal and OpenCL. You **don't** need to handle anything about c++; just install the backend packages. If no published backend matches your device, please open an issue to let us know. If compiling c++ code is not difficult for you, you could also follow [this guide]() to compile a backend and run LLamaSharp with it.
3. `LLaMA` originally refers to the weights released by Meta (Facebook Research). After that, many models were fine-tuned based on it, such as `Vicuna`, `GPT4All`, and `Pyglion`. There are two popular file formats for these models now: the PyTorch format (.pth) and the Huggingface format (.bin). LLamaSharp uses `GGUF` format files, which can be converted from these two formats. There are two options for you to get a GGUF file. a) Search the model name + 'gguf' on [Huggingface](https://huggingface.co); you will find lots of model files that have already been converted to GGUF format. Please pay attention to their publishing time, because some older files only work with older versions of LLamaSharp. b) Convert the PyTorch or Huggingface format to GGUF format yourself. Please follow the instructions in [this part of the llama.cpp readme](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#prepare-and-quantize) to convert them with the python scripts.
4. LLamaSharp supports multi-modal models, which means that a model can take both text and images as input. Note that there are two model files required for multi-modal (LLaVA) inference: the main model and the mm-proj model. Here is a huggingface repo which shows that: [link](https://huggingface.co/ShadowBeast/llava-v1.6-mistral-7b-Q5_K_S-GGUF/tree/main). A minimal loading sketch is shown right after this list.
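Below is a minimal sketch of loading both model files for LLaVA (the paths are placeholders):
```cs
// A minimal sketch for multi-modal (LLaVA) usage: the main model and the mm-proj (clip)
// model are loaded separately. Both paths are placeholders.
var parameters = new ModelParams("<llava-main-model>.gguf");
using var model = LLamaWeights.LoadFromFile(parameters);
using var clipModel = LLavaWeights.LoadFromFile("<llava-mm-proj>.gguf");
```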
## Integrations
There are integrations with the following libraries, which help to expand the applications of LLamaSharp. The integrations for semantic-kernel and kernel-memory are developed in the LLamaSharp repository, while the others are developed in their own repositories.
- [semantic-kernel](https://github.com/microsoft/semantic-kernel): an SDK that integrates LLM like OpenAI, Azure OpenAI, and Hugging Face.
- [kernel-memory](https://github.com/microsoft/kernel-memory): a multi-modal AI Service specialized in the efficient indexing of datasets through custom continuous data hybrid pipelines, with support for RAG ([Retrieval Augmented Generation](https://en.wikipedia.org/wiki/Prompt_engineering#Retrieval-augmented_generation)), synthetic memory, prompt engineering, and custom semantic memory processing.
- [BotSharp](https://github.com/SciSharp/BotSharp): an open source machine learning framework for AI Bot platform builder.
- [Langchain](https://github.com/tryAGI/LangChain): a framework for developing applications powered by language models.
![LLamaSharp-Integrations](./media/LLamaSharp-Integrations.png)
## Welcome to join the development!
@ -32,6 +35,6 @@ Community effort is always one of the most important things in open-source proje
2. Open a PR if you've fixed something. Even just correcting a typo makes great sense.
3. Help to optimize the documentation.
4. Write an example or blog about how to integrate LLamaSharp with your APPs.
5. Ask for a missed feature and discuss with other developers.
5. Ask for a missing feature and discuss with us.
If you'd like to get deeply involved in development, please touch us in discord channel or send email to `AsakusaRinne@gmail.com`. :)
If you'd like to get deeply involved in development, please reach out to us in the discord channel or send an email to `AsakusaRinne@gmail.com`. 🤗

Binary file not shown.


BIN
docs/media/console_demo.gif Normal file

Binary file not shown.


BIN
docs/media/llava_demo.gif Normal file

Binary file not shown.


Binary file not shown.


Binary file not shown.

View File

@ -0,0 +1,20 @@
import os

dir = 'Examples'

if __name__ == '__main__':
    res = []
    # loop over all the files under `dir`
    for root, dirs, files in os.walk(dir):
        for file in files:
            with open(os.path.join(root, file), 'r', encoding='utf-8') as f:
                # the leading comment of each example is used as its title
                first_line = f.readline()
                title = first_line.split('#')[-1]
                filename = file.split('/')[-1].split('\\')[-1]
                res.append(f'- {title.strip()}: {dir}/{filename}')
    # print the collected titles as a markdown list
    for item in res:
        print(item)

View File

@ -2,6 +2,8 @@
## LLama
[AntipromptProcessor](./llama.antipromptprocessor.md)
[ChatSession](./llama.chatsession.md)
[InstructExecutor](./llama.instructexecutor.md)
@ -18,26 +20,66 @@
[LLamaWeights](./llama.llamaweights.md)
[LLavaWeights](./llama.llavaweights.md)
[SessionState](./llama.sessionstate.md)
[StatefulExecutorBase](./llama.statefulexecutorbase.md)
[StatelessExecutor](./llama.statelessexecutor.md)
[Utils](./llama.utils.md)
[StreamingTokenDecoder](./llama.streamingtokendecoder.md)
## LLama.Abstractions
[AdapterCollection](./llama.abstractions.adaptercollection.md)
[IContextParams](./llama.abstractions.icontextparams.md)
[IHistoryTransform](./llama.abstractions.ihistorytransform.md)
[IInferenceParams](./llama.abstractions.iinferenceparams.md)
[ILLamaExecutor](./llama.abstractions.illamaexecutor.md)
[ILLamaParams](./llama.abstractions.illamaparams.md)
[IModelParams](./llama.abstractions.imodelparams.md)
[ITextStreamTransform](./llama.abstractions.itextstreamtransform.md)
[ITextTransform](./llama.abstractions.itexttransform.md)
[LoraAdapter](./llama.abstractions.loraadapter.md)
[MetadataOverride](./llama.abstractions.metadataoverride.md)
[MetadataOverrideConverter](./llama.abstractions.metadataoverrideconverter.md)
[TensorSplitsCollection](./llama.abstractions.tensorsplitscollection.md)
[TensorSplitsCollectionConverter](./llama.abstractions.tensorsplitscollectionconverter.md)
## LLama.Batched
[AlreadyPromptedConversationException](./llama.batched.alreadypromptedconversationexception.md)
[BatchedExecutor](./llama.batched.batchedexecutor.md)
[CannotForkWhileRequiresInferenceException](./llama.batched.cannotforkwhilerequiresinferenceexception.md)
[CannotModifyWhileRequiresInferenceException](./llama.batched.cannotmodifywhilerequiresinferenceexception.md)
[CannotSampleRequiresInferenceException](./llama.batched.cannotsamplerequiresinferenceexception.md)
[CannotSampleRequiresPromptException](./llama.batched.cannotsamplerequirespromptexception.md)
[Conversation](./llama.batched.conversation.md)
[ConversationExtensions](./llama.batched.conversationextensions.md)
[ExperimentalBatchedExecutorException](./llama.batched.experimentalbatchedexecutorexception.md)
## LLama.Common
[AuthorRole](./llama.common.authorrole.md)
@ -46,12 +88,8 @@
[FixedSizeQueue&lt;T&gt;](./llama.common.fixedsizequeue-1.md)
[ILLamaLogger](./llama.common.illamalogger.md)
[InferenceParams](./llama.common.inferenceparams.md)
[LLamaDefaultLogger](./llama.common.llamadefaultlogger.md)
[MirostatType](./llama.common.mirostattype.md)
[ModelParams](./llama.common.modelparams.md)
@ -78,13 +116,17 @@
[GrammarUnknownEscapeCharacter](./llama.exceptions.grammarunknownescapecharacter.md)
[LLamaDecodeError](./llama.exceptions.llamadecodeerror.md)
[LoadWeightsFailedException](./llama.exceptions.loadweightsfailedexception.md)
[RuntimeError](./llama.exceptions.runtimeerror.md)
## LLama.Extensions
[IModelParamsExtensions](./llama.extensions.imodelparamsextensions.md)
[IContextParamsExtensions](./llama.extensions.icontextparamsextensions.md)
[KeyValuePairExtensions](./llama.extensions.keyvaluepairextensions.md)
[IModelParamsExtensions](./llama.extensions.imodelparamsextensions.md)
## LLama.Grammars
@ -94,6 +136,20 @@
## LLama.Native
[DecodeResult](./llama.native.decoderesult.md)
[GGMLType](./llama.native.ggmltype.md)
[GPUSplitMode](./llama.native.gpusplitmode.md)
[LLamaBatch](./llama.native.llamabatch.md)
[LLamaBeamsState](./llama.native.llamabeamsstate.md)
[LLamaBeamView](./llama.native.llamabeamview.md)
[LLamaChatMessage](./llama.native.llamachatmessage.md)
[LLamaContextParams](./llama.native.llamacontextparams.md)
[LLamaFtype](./llama.native.llamaftype.md)
@ -102,16 +158,52 @@
[LLamaGrammarElementType](./llama.native.llamagrammarelementtype.md)
[LLamaKvCacheView](./llama.native.llamakvcacheview.md)
[LLamaKvCacheViewCell](./llama.native.llamakvcacheviewcell.md)
[LLamaKvCacheViewSafeHandle](./llama.native.llamakvcacheviewsafehandle.md)
[LLamaLogLevel](./llama.native.llamaloglevel.md)
[LLamaModelKvOverrideType](./llama.native.llamamodelkvoverridetype.md)
[LLamaModelMetadataOverride](./llama.native.llamamodelmetadataoverride.md)
[LLamaModelParams](./llama.native.llamamodelparams.md)
[LLamaModelQuantizeParams](./llama.native.llamamodelquantizeparams.md)
[LLamaNativeBatch](./llama.native.llamanativebatch.md)
[LLamaPoolingType](./llama.native.llamapoolingtype.md)
[LLamaPos](./llama.native.llamapos.md)
[LLamaRopeType](./llama.native.llamaropetype.md)
[LLamaSeqId](./llama.native.llamaseqid.md)
[LLamaToken](./llama.native.llamatoken.md)
[LLamaTokenData](./llama.native.llamatokendata.md)
[LLamaTokenDataArray](./llama.native.llamatokendataarray.md)
[LLamaTokenDataArrayNative](./llama.native.llamatokendataarraynative.md)
[LLamaTokenType](./llama.native.llamatokentype.md)
[LLamaVocabType](./llama.native.llamavocabtype.md)
[LLavaImageEmbed](./llama.native.llavaimageembed.md)
[NativeApi](./llama.native.nativeapi.md)
[NativeLibraryConfig](./llama.native.nativelibraryconfig.md)
[RopeScalingType](./llama.native.ropescalingtype.md)
[SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
[SafeLLamaGrammarHandle](./llama.native.safellamagrammarhandle.md)
@ -120,48 +212,22 @@
[SafeLlamaModelHandle](./llama.native.safellamamodelhandle.md)
[SamplingApi](./llama.native.samplingapi.md)
[SafeLlavaImageEmbedHandle](./llama.native.safellavaimageembedhandle.md)
## LLama.OldVersion
[SafeLlavaModelHandle](./llama.native.safellavamodelhandle.md)
[ChatCompletion](./llama.oldversion.chatcompletion.md)
## LLama.Sampling
[ChatCompletionChoice](./llama.oldversion.chatcompletionchoice.md)
[BaseSamplingPipeline](./llama.sampling.basesamplingpipeline.md)
[ChatCompletionChunk](./llama.oldversion.chatcompletionchunk.md)
[DefaultSamplingPipeline](./llama.sampling.defaultsamplingpipeline.md)
[ChatCompletionChunkChoice](./llama.oldversion.chatcompletionchunkchoice.md)
[GreedySamplingPipeline](./llama.sampling.greedysamplingpipeline.md)
[ChatCompletionChunkDelta](./llama.oldversion.chatcompletionchunkdelta.md)
[ISamplingPipeline](./llama.sampling.isamplingpipeline.md)
[ChatCompletionMessage](./llama.oldversion.chatcompletionmessage.md)
[ISamplingPipelineExtensions](./llama.sampling.isamplingpipelineextensions.md)
[ChatMessageRecord](./llama.oldversion.chatmessagerecord.md)
[Mirostate2SamplingPipeline](./llama.sampling.mirostate2samplingpipeline.md)
[ChatRole](./llama.oldversion.chatrole.md)
[ChatSession&lt;T&gt;](./llama.oldversion.chatsession-1.md)
[Completion](./llama.oldversion.completion.md)
[CompletionChoice](./llama.oldversion.completionchoice.md)
[CompletionChunk](./llama.oldversion.completionchunk.md)
[CompletionLogprobs](./llama.oldversion.completionlogprobs.md)
[CompletionUsage](./llama.oldversion.completionusage.md)
[Embedding](./llama.oldversion.embedding.md)
[EmbeddingData](./llama.oldversion.embeddingdata.md)
[EmbeddingUsage](./llama.oldversion.embeddingusage.md)
[IChatModel](./llama.oldversion.ichatmodel.md)
[LLamaEmbedder](./llama.oldversion.llamaembedder.md)
[LLamaModel](./llama.oldversion.llamamodel.md)
[LLamaParams](./llama.oldversion.llamaparams.md)
[MirostateSamplingPipeline](./llama.sampling.mirostatesamplingpipeline.md)

View File

@ -0,0 +1,92 @@
# AdapterCollection
Namespace: LLama.Abstractions
A list of LoraAdapter objects
```csharp
public sealed class AdapterCollection : System.Collections.Generic.List`1[[LLama.Abstractions.LoraAdapter, LLamaSharp, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null]], System.Collections.Generic.IList`1[[LLama.Abstractions.LoraAdapter, LLamaSharp, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null]], System.Collections.Generic.ICollection`1[[LLama.Abstractions.LoraAdapter, LLamaSharp, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null]], System.Collections.Generic.IEnumerable`1[[LLama.Abstractions.LoraAdapter, LLamaSharp, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null]], System.Collections.IEnumerable, System.Collections.IList, System.Collections.ICollection, System.Collections.Generic.IReadOnlyList`1[[LLama.Abstractions.LoraAdapter, LLamaSharp, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null]], System.Collections.Generic.IReadOnlyCollection`1[[LLama.Abstractions.LoraAdapter, LLamaSharp, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null]], System.IEquatable`1[[LLama.Abstractions.AdapterCollection, LLamaSharp, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null]]
```
Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [List&lt;LoraAdapter&gt;](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.list-1) → [AdapterCollection](./llama.abstractions.adaptercollection.md)<br>
Implements [IList&lt;LoraAdapter&gt;](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.ilist-1), [ICollection&lt;LoraAdapter&gt;](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.icollection-1), [IEnumerable&lt;LoraAdapter&gt;](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.ienumerable-1), [IEnumerable](https://docs.microsoft.com/en-us/dotnet/api/system.collections.ienumerable), [IList](https://docs.microsoft.com/en-us/dotnet/api/system.collections.ilist), [ICollection](https://docs.microsoft.com/en-us/dotnet/api/system.collections.icollection), [IReadOnlyList&lt;LoraAdapter&gt;](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.ireadonlylist-1), [IReadOnlyCollection&lt;LoraAdapter&gt;](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.ireadonlycollection-1), [IEquatable&lt;AdapterCollection&gt;](https://docs.microsoft.com/en-us/dotnet/api/system.iequatable-1)
## Properties
### **Capacity**
```csharp
public int Capacity { get; set; }
```
#### Property Value
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
### **Count**
```csharp
public int Count { get; }
```
#### Property Value
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
### **Item**
```csharp
public LoraAdapter Item { get; set; }
```
#### Property Value
[LoraAdapter](./llama.abstractions.loraadapter.md)<br>
## Constructors
### **AdapterCollection()**
```csharp
public AdapterCollection()
```
## Methods
### **Equals(AdapterCollection)**
```csharp
public bool Equals(AdapterCollection other)
```
#### Parameters
`other` [AdapterCollection](./llama.abstractions.adaptercollection.md)<br>
#### Returns
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)<br>
### **Equals(Object)**
```csharp
public bool Equals(object obj)
```
#### Parameters
`obj` [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object)<br>
#### Returns
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)<br>
### **GetHashCode()**
```csharp
public int GetHashCode()
```
#### Returns
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
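A hedged usage sketch (the paths are placeholders): `ModelParams.LoraAdapters` exposes an `AdapterCollection`, so adapters can be added like entries of any list.

```csharp
using LLama.Abstractions;
using LLama.Common;

var parameters = new ModelParams("<model>.gguf");
// Attach a LoRA adapter with a scale (strength) of 1.0; the adapter path is a placeholder.
parameters.LoraAdapters.Add(new LoraAdapter("<adapter>.bin", 1.0f));
```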

View File

@ -0,0 +1,252 @@
# IContextParams
Namespace: LLama.Abstractions
The parameters for initializing a LLama context from a model.
```csharp
public interface IContextParams
```
## Properties
### **ContextSize**
Model context size (n_ctx)
```csharp
public abstract Nullable<uint> ContextSize { get; }
```
#### Property Value
[Nullable&lt;UInt32&gt;](https://docs.microsoft.com/en-us/dotnet/api/system.nullable-1)<br>
### **BatchSize**
batch size for prompt processing (must be &gt;=32 to use BLAS) (n_batch)
```csharp
public abstract uint BatchSize { get; }
```
#### Property Value
[UInt32](https://docs.microsoft.com/en-us/dotnet/api/system.uint32)<br>
### **Seed**
Seed for the random number generator (seed)
```csharp
public abstract uint Seed { get; }
```
#### Property Value
[UInt32](https://docs.microsoft.com/en-us/dotnet/api/system.uint32)<br>
### **EmbeddingMode**
Whether to use embedding mode. (embedding) Note that if this is set to true,
the LLamaModel won't produce text response anymore.
```csharp
public abstract bool EmbeddingMode { get; }
```
#### Property Value
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)<br>
### **RopeFrequencyBase**
RoPE base frequency (null to fetch from the model)
```csharp
public abstract Nullable<float> RopeFrequencyBase { get; }
```
#### Property Value
[Nullable&lt;Single&gt;](https://docs.microsoft.com/en-us/dotnet/api/system.nullable-1)<br>
### **RopeFrequencyScale**
RoPE frequency scaling factor (null to fetch from the model)
```csharp
public abstract Nullable<float> RopeFrequencyScale { get; }
```
#### Property Value
[Nullable&lt;Single&gt;](https://docs.microsoft.com/en-us/dotnet/api/system.nullable-1)<br>
### **Encoding**
The encoding to use for models
```csharp
public abstract Encoding Encoding { get; }
```
#### Property Value
[Encoding](https://docs.microsoft.com/en-us/dotnet/api/system.text.encoding)<br>
### **Threads**
Number of threads (null = autodetect) (n_threads)
```csharp
public abstract Nullable<uint> Threads { get; }
```
#### Property Value
[Nullable&lt;UInt32&gt;](https://docs.microsoft.com/en-us/dotnet/api/system.nullable-1)<br>
### **BatchThreads**
Number of threads to use for batch processing (null = autodetect) (n_threads)
```csharp
public abstract Nullable<uint> BatchThreads { get; }
```
#### Property Value
[Nullable&lt;UInt32&gt;](https://docs.microsoft.com/en-us/dotnet/api/system.nullable-1)<br>
### **YarnExtrapolationFactor**
YaRN extrapolation mix factor (null = from model)
```csharp
public abstract Nullable<float> YarnExtrapolationFactor { get; }
```
#### Property Value
[Nullable&lt;Single&gt;](https://docs.microsoft.com/en-us/dotnet/api/system.nullable-1)<br>
### **YarnAttentionFactor**
YaRN magnitude scaling factor (null = from model)
```csharp
public abstract Nullable<float> YarnAttentionFactor { get; }
```
#### Property Value
[Nullable&lt;Single&gt;](https://docs.microsoft.com/en-us/dotnet/api/system.nullable-1)<br>
### **YarnBetaFast**
YaRN low correction dim (null = from model)
```csharp
public abstract Nullable<float> YarnBetaFast { get; }
```
#### Property Value
[Nullable&lt;Single&gt;](https://docs.microsoft.com/en-us/dotnet/api/system.nullable-1)<br>
### **YarnBetaSlow**
YaRN high correction dim (null = from model)
```csharp
public abstract Nullable<float> YarnBetaSlow { get; }
```
#### Property Value
[Nullable&lt;Single&gt;](https://docs.microsoft.com/en-us/dotnet/api/system.nullable-1)<br>
### **YarnOriginalContext**
YaRN original context length (null = from model)
```csharp
public abstract Nullable<uint> YarnOriginalContext { get; }
```
#### Property Value
[Nullable&lt;UInt32&gt;](https://docs.microsoft.com/en-us/dotnet/api/system.nullable-1)<br>
### **YarnScalingType**
YaRN scaling method to use.
```csharp
public abstract Nullable<RopeScalingType> YarnScalingType { get; }
```
#### Property Value
[Nullable&lt;RopeScalingType&gt;](https://docs.microsoft.com/en-us/dotnet/api/system.nullable-1)<br>
### **TypeK**
Override the type of the K cache
```csharp
public abstract Nullable<GGMLType> TypeK { get; }
```
#### Property Value
[Nullable&lt;GGMLType&gt;](https://docs.microsoft.com/en-us/dotnet/api/system.nullable-1)<br>
### **TypeV**
Override the type of the V cache
```csharp
public abstract Nullable<GGMLType> TypeV { get; }
```
#### Property Value
[Nullable&lt;GGMLType&gt;](https://docs.microsoft.com/en-us/dotnet/api/system.nullable-1)<br>
### **NoKqvOffload**
Whether to disable offloading the KQV cache to the GPU
```csharp
public abstract bool NoKqvOffload { get; }
```
#### Property Value
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)<br>
### **DefragThreshold**
defragment the KV cache if holes/size &gt; defrag_threshold, Set to &lt; 0 to disable (default)
```csharp
public abstract float DefragThreshold { get; }
```
#### Property Value
[Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)<br>
### **DoPooling**
Whether to pool (sum) embedding results by sequence id (ignored if no pooling layer)
```csharp
public abstract bool DoPooling { get; }
```
#### Property Value
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)<br>

View File

@ -47,3 +47,15 @@ The chat history as plain text.
[ChatHistory](./llama.common.chathistory.md)<br>
The updated history.
### **Clone()**
Copy the transform.
```csharp
IHistoryTransform Clone()
```
#### Returns
[IHistoryTransform](./llama.abstractions.ihistorytransform.md)<br>

View File

@ -40,60 +40,24 @@ public abstract int MaxTokens { get; set; }
logit bias for specific tokens
```csharp
public abstract Dictionary<int, float> LogitBias { get; set; }
public abstract Dictionary<LLamaToken, float> LogitBias { get; set; }
```
#### Property Value
[Dictionary&lt;Int32, Single&gt;](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.dictionary-2)<br>
[Dictionary&lt;LLamaToken, Single&gt;](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.dictionary-2)<br>
### **AntiPrompts**
Sequences where the model will stop generating further tokens.
```csharp
public abstract IEnumerable<string> AntiPrompts { get; set; }
public abstract IReadOnlyList<string> AntiPrompts { get; set; }
```
#### Property Value
[IEnumerable&lt;String&gt;](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.ienumerable-1)<br>
### **PathSession**
path to file for saving/loading model eval state
```csharp
public abstract string PathSession { get; set; }
```
#### Property Value
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
### **InputSuffix**
string to suffix user inputs with
```csharp
public abstract string InputSuffix { get; set; }
```
#### Property Value
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
### **InputPrefix**
string to prefix user inputs with
```csharp
public abstract string InputPrefix { get; set; }
```
#### Property Value
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
[IReadOnlyList&lt;String&gt;](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.ireadonlylist-1)<br>
### **TopK**
@ -119,6 +83,18 @@ public abstract float TopP { get; set; }
[Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)<br>
### **MinP**
0.0 = disabled
```csharp
public abstract float MinP { get; set; }
```
#### Property Value
[Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)<br>
### **TfsZ**
1.0 = disabled
@ -266,3 +242,15 @@ public abstract SafeLLamaGrammarHandle Grammar { get; set; }
#### Property Value
[SafeLLamaGrammarHandle](./llama.native.safellamagrammarhandle.md)<br>
### **SamplingPipeline**
Set a custom sampling pipeline to use. If this is set, all other sampling parameters are ignored!
```csharp
public abstract ISamplingPipeline SamplingPipeline { get; set; }
```
#### Property Value
[ISamplingPipeline](./llama.sampling.isamplingpipeline.md)<br>
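A hedged usage sketch, assuming the built-in `DefaultSamplingPipeline` from `LLama.Sampling`:

```csharp
using LLama.Common;
using LLama.Sampling;

var inferenceParams = new InferenceParams
{
    // When a custom pipeline is set, all other sampling parameters are ignored.
    SamplingPipeline = new DefaultSamplingPipeline
    {
        Temperature = 0.6f
    }
};
```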

View File

@ -22,30 +22,43 @@ public abstract LLamaContext Context { get; }
[LLamaContext](./llama.llamacontext.md)<br>
## Methods
### **IsMultiModal**
### **Infer(String, IInferenceParams, CancellationToken)**
Infers a response from the model.
Identify if it's a multi-modal model and there is an image to process.
```csharp
IEnumerable<string> Infer(string text, IInferenceParams inferenceParams, CancellationToken token)
public abstract bool IsMultiModal { get; }
```
#### Parameters
#### Property Value
`text` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
Your prompt
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)<br>
`inferenceParams` [IInferenceParams](./llama.abstractions.iinferenceparams.md)<br>
Any additional parameters
### **ClipModel**
`token` [CancellationToken](https://docs.microsoft.com/en-us/dotnet/api/system.threading.cancellationtoken)<br>
A cancellation token.
Multi-Modal Projections / Clip Model weights
#### Returns
```csharp
public abstract LLavaWeights ClipModel { get; }
```
[IEnumerable&lt;String&gt;](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.ienumerable-1)<br>
#### Property Value
[LLavaWeights](./llama.llavaweights.md)<br>
### **ImagePaths**
List of images: Image filename and path (jpeg images).
```csharp
public abstract List<string> ImagePaths { get; set; }
```
#### Property Value
[List&lt;String&gt;](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.list-1)<br>
## Methods
### **InferAsync(String, IInferenceParams, CancellationToken)**

View File

@ -0,0 +1,15 @@
# ILLamaParams
Namespace: LLama.Abstractions
Convenience interface for implementing both types of parameters.
```csharp
public interface ILLamaParams : IModelParams, IContextParams
```
Implements [IModelParams](./llama.abstractions.imodelparams.md), [IContextParams](./llama.abstractions.icontextparams.md)
**Remarks:**
Mostly exists for backwards compatibility reasons, when these two were not split.
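For example, `ModelParams` implements `ILLamaParams`, so a single object can be passed wherever an `IModelParams` or an `IContextParams` is expected (a hedged sketch; the model path is a placeholder):

```csharp
using LLama;
using LLama.Common;

var parameters = new ModelParams("<model>.gguf")
{
    ContextSize = 2048,
    GpuLayerCount = 8
};
using var weights = LLamaWeights.LoadFromFile(parameters);  // consumed as IModelParams
using var context = weights.CreateContext(parameters);      // consumed as IContextParams
```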

View File

@ -10,21 +10,10 @@ public interface IModelParams
## Properties
### **ContextSize**
Model context size (n_ctx)
```csharp
public abstract int ContextSize { get; set; }
```
#### Property Value
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
### **MainGpu**
the GPU that is used for scratch and small tensors
main_gpu interpretation depends on split_mode:
None: the GPU that is used for the entire model. Row: the GPU that is used for small tensors and intermediate results. Layer: ignored.
```csharp
public abstract int MainGpu { get; set; }
@ -34,60 +23,36 @@ public abstract int MainGpu { get; set; }
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
### **LowVram**
### **SplitMode**
if true, reduce VRAM usage at the cost of performance
How to split the model across multiple GPUs
```csharp
public abstract bool LowVram { get; set; }
public abstract GPUSplitMode SplitMode { get; }
```
#### Property Value
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)<br>
[GPUSplitMode](./llama.native.gpusplitmode.md)<br>
### **GpuLayerCount**
Number of layers to run in VRAM / GPU memory (n_gpu_layers)
```csharp
public abstract int GpuLayerCount { get; set; }
public abstract int GpuLayerCount { get; }
```
#### Property Value
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
### **Seed**
Seed for the random number generator (seed)
```csharp
public abstract int Seed { get; set; }
```
#### Property Value
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
### **UseFp16Memory**
Use f16 instead of f32 for memory kv (memory_f16)
```csharp
public abstract bool UseFp16Memory { get; set; }
```
#### Property Value
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)<br>
### **UseMemorymap**
Use mmap for faster loads (use_mmap)
```csharp
public abstract bool UseMemorymap { get; set; }
public abstract bool UseMemorymap { get; }
```
#### Property Value
@ -99,19 +64,7 @@ public abstract bool UseMemorymap { get; set; }
Use mlock to keep model in memory (use_mlock)
```csharp
public abstract bool UseMemoryLock { get; set; }
```
#### Property Value
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)<br>
### **Perplexity**
Compute perplexity over the prompt (perplexity)
```csharp
public abstract bool Perplexity { get; set; }
public abstract bool UseMemoryLock { get; }
```
#### Property Value
@ -123,154 +76,69 @@ public abstract bool Perplexity { get; set; }
Model path (model)
```csharp
public abstract string ModelPath { get; set; }
public abstract string ModelPath { get; }
```
#### Property Value
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
### **ModelAlias**
model alias
```csharp
public abstract string ModelAlias { get; set; }
```
#### Property Value
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
### **LoraAdapter**
lora adapter path (lora_adapter)
```csharp
public abstract string LoraAdapter { get; set; }
```
#### Property Value
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
### **LoraBase**
base model path for the lora adapter (lora_base)
```csharp
public abstract string LoraBase { get; set; }
```
#### Property Value
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
### **Threads**
Number of threads (-1 = autodetect) (n_threads)
```csharp
public abstract int Threads { get; set; }
```
#### Property Value
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
### **BatchSize**
batch size for prompt processing (must be &gt;=32 to use BLAS) (n_batch)
```csharp
public abstract int BatchSize { get; set; }
```
#### Property Value
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
### **ConvertEosToNewLine**
Whether to convert eos to newline during the inference.
```csharp
public abstract bool ConvertEosToNewLine { get; set; }
```
#### Property Value
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)<br>
### **EmbeddingMode**
Whether to use embedding mode. (embedding) Note that if this is set to true,
the LLamaModel won't produce text response anymore.
```csharp
public abstract bool EmbeddingMode { get; set; }
```
#### Property Value
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)<br>
### **TensorSplits**
how split tensors should be distributed across GPUs
```csharp
public abstract Single[] TensorSplits { get; set; }
public abstract TensorSplitsCollection TensorSplits { get; }
```
#### Property Value
[Single[]](https://docs.microsoft.com/en-us/dotnet/api/system.single)<br>
[TensorSplitsCollection](./llama.abstractions.tensorsplitscollection.md)<br>
### **RopeFrequencyBase**
### **VocabOnly**
RoPE base frequency
Load vocab only (no weights)
```csharp
public abstract float RopeFrequencyBase { get; set; }
```
#### Property Value
[Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)<br>
### **RopeFrequencyScale**
RoPE frequency scaling factor
```csharp
public abstract float RopeFrequencyScale { get; set; }
```
#### Property Value
[Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)<br>
### **MulMatQ**
Use experimental mul_mat_q kernels
```csharp
public abstract bool MulMatQ { get; set; }
public abstract bool VocabOnly { get; }
```
#### Property Value
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)<br>
### **Encoding**
### **LoraAdapters**
The encoding to use for models
List of LoRA adapters to apply
```csharp
public abstract Encoding Encoding { get; set; }
public abstract AdapterCollection LoraAdapters { get; }
```
#### Property Value
[Encoding](https://docs.microsoft.com/en-us/dotnet/api/system.text.encoding)<br>
[AdapterCollection](./llama.abstractions.adaptercollection.md)<br>
### **LoraBase**
base model path for the lora adapter (lora_base)
```csharp
public abstract string LoraBase { get; }
```
#### Property Value
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
### **MetadataOverrides**
Override specific metadata items in the model
```csharp
public abstract List<MetadataOverride> MetadataOverrides { get; }
```
#### Property Value
[List&lt;MetadataOverride&gt;](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.list-1)<br>

View File

@ -10,22 +10,6 @@ public interface ITextStreamTransform
## Methods
### **Transform(IEnumerable&lt;String&gt;)**
Takes a stream of tokens and transforms them, returning a new stream of tokens.
```csharp
IEnumerable<string> Transform(IEnumerable<string> tokens)
```
#### Parameters
`tokens` [IEnumerable&lt;String&gt;](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.ienumerable-1)<br>
#### Returns
[IEnumerable&lt;String&gt;](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.ienumerable-1)<br>
### **TransformAsync(IAsyncEnumerable&lt;String&gt;)**
Takes a stream of tokens and transforms them, returning a new stream of tokens asynchronously.
@ -41,3 +25,15 @@ IAsyncEnumerable<string> TransformAsync(IAsyncEnumerable<string> tokens)
#### Returns
[IAsyncEnumerable&lt;String&gt;](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.iasyncenumerable-1)<br>
### **Clone()**
Copy the transform.
```csharp
ITextStreamTransform Clone()
```
#### Returns
[ITextStreamTransform](./llama.abstractions.itextstreamtransform.md)<br>

View File

@ -31,3 +31,15 @@ string Transform(string text)
#### Returns
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
### **Clone()**
Copy the transform.
```csharp
ITextTransform Clone()
```
#### Returns
[ITextTransform](./llama.abstractions.itexttransform.md)<br>

View File

@ -0,0 +1,118 @@
# LoraAdapter
Namespace: LLama.Abstractions
A LoRA adapter to apply to a model
```csharp
public struct LoraAdapter
```
Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [ValueType](https://docs.microsoft.com/en-us/dotnet/api/system.valuetype) → [LoraAdapter](./llama.abstractions.loraadapter.md)<br>
Implements [IEquatable&lt;LoraAdapter&gt;](https://docs.microsoft.com/en-us/dotnet/api/system.iequatable-1)
## Properties
### **Path**
Path to the LoRA file
```csharp
public string Path { get; set; }
```
#### Property Value
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
### **Scale**
Strength of this LoRA
```csharp
public float Scale { get; set; }
```
#### Property Value
[Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)<br>
## Constructors
### **LoraAdapter(String, Single)**
A LoRA adapter to apply to a model
```csharp
LoraAdapter(string Path, float Scale)
```
#### Parameters
`Path` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
Path to the LoRA file
`Scale` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)<br>
Strength of this LoRA
## Methods
### **ToString()**
```csharp
string ToString()
```
#### Returns
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
### **GetHashCode()**
```csharp
int GetHashCode()
```
#### Returns
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
### **Equals(Object)**
```csharp
bool Equals(object obj)
```
#### Parameters
`obj` [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object)<br>
#### Returns
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)<br>
### **Equals(LoraAdapter)**
```csharp
bool Equals(LoraAdapter other)
```
#### Parameters
`other` [LoraAdapter](./llama.abstractions.loraadapter.md)<br>
#### Returns
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)<br>
### **Deconstruct(String&, Single&)**
```csharp
void Deconstruct(String& Path, Single& Scale)
```
#### Parameters
`Path` [String&](https://docs.microsoft.com/en-us/dotnet/api/system.string&)<br>
`Scale` [Single&](https://docs.microsoft.com/en-us/dotnet/api/system.single&)<br>

View File

@ -0,0 +1,150 @@
# MetadataOverride
Namespace: LLama.Abstractions
An override for a single key/value pair in model metadata
```csharp
public sealed class MetadataOverride : System.IEquatable`1[[LLama.Abstractions.MetadataOverride, LLamaSharp, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null]]
```
Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [MetadataOverride](./llama.abstractions.metadataoverride.md)<br>
Implements [IEquatable&lt;MetadataOverride&gt;](https://docs.microsoft.com/en-us/dotnet/api/system.iequatable-1)
## Properties
### **Key**
Get the key being overridden by this override
```csharp
public string Key { get; }
```
#### Property Value
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
## Constructors
### **MetadataOverride(String, Int32)**
Create a new override for an int key
```csharp
public MetadataOverride(string key, int value)
```
#### Parameters
`key` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
`value` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
### **MetadataOverride(String, Single)**
Create a new override for a float key
```csharp
public MetadataOverride(string key, float value)
```
#### Parameters
`key` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
`value` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)<br>
### **MetadataOverride(String, Boolean)**
Create a new override for a boolean key
```csharp
public MetadataOverride(string key, bool value)
```
#### Parameters
`key` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
`value` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)<br>
## Methods
### **WriteValue(LLamaModelMetadataOverride&)**
```csharp
internal void WriteValue(LLamaModelMetadataOverride& dest)
```
#### Parameters
`dest` [LLamaModelMetadataOverride&](./llama.native.llamamodelmetadataoverride&.md)<br>
### **WriteValue(Utf8JsonWriter)**
```csharp
internal void WriteValue(Utf8JsonWriter writer)
```
#### Parameters
`writer` Utf8JsonWriter<br>
### **ToString()**
```csharp
public string ToString()
```
#### Returns
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
### **GetHashCode()**
```csharp
public int GetHashCode()
```
#### Returns
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
### **Equals(Object)**
```csharp
public bool Equals(object obj)
```
#### Parameters
`obj` [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object)<br>
#### Returns
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)<br>
### **Equals(MetadataOverride)**
```csharp
public bool Equals(MetadataOverride other)
```
#### Parameters
`other` [MetadataOverride](./llama.abstractions.metadataoverride.md)<br>
#### Returns
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)<br>
### **&lt;Clone&gt;$()**
```csharp
public MetadataOverride <Clone>$()
```
#### Returns
[MetadataOverride](./llama.abstractions.metadataoverride.md)<br>
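A hedged usage sketch (the key name and value are illustrative):

```csharp
using LLama.Abstractions;
using LLama.Common;

var parameters = new ModelParams("<model>.gguf");
// Override a single integer metadata item when the model is loaded.
parameters.MetadataOverrides.Add(new MetadataOverride("<some.metadata.key>", 4096));
```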

View File

@ -0,0 +1,65 @@
# MetadataOverrideConverter
Namespace: LLama.Abstractions
A JSON converter for [MetadataOverride](./llama.abstractions.metadataoverride.md)
```csharp
public class MetadataOverrideConverter : System.Text.Json.Serialization.JsonConverter`1[[LLama.Abstractions.MetadataOverride, LLamaSharp, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null]]
```
Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → JsonConverter → JsonConverter&lt;MetadataOverride&gt; → [MetadataOverrideConverter](./llama.abstractions.metadataoverrideconverter.md)
## Properties
### **HandleNull**
```csharp
public bool HandleNull { get; }
```
#### Property Value
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)<br>
## Constructors
### **MetadataOverrideConverter()**
```csharp
public MetadataOverrideConverter()
```
## Methods
### **Read(Utf8JsonReader&, Type, JsonSerializerOptions)**
```csharp
public MetadataOverride Read(Utf8JsonReader& reader, Type typeToConvert, JsonSerializerOptions options)
```
#### Parameters
`reader` Utf8JsonReader&<br>
`typeToConvert` [Type](https://docs.microsoft.com/en-us/dotnet/api/system.type)<br>
`options` JsonSerializerOptions<br>
#### Returns
[MetadataOverride](./llama.abstractions.metadataoverride.md)<br>
### **Write(Utf8JsonWriter, MetadataOverride, JsonSerializerOptions)**
```csharp
public void Write(Utf8JsonWriter writer, MetadataOverride value, JsonSerializerOptions options)
```
#### Parameters
`writer` Utf8JsonWriter<br>
`value` [MetadataOverride](./llama.abstractions.metadataoverride.md)<br>
`options` JsonSerializerOptions<br>

View File

@ -0,0 +1,92 @@
# TensorSplitsCollection
Namespace: LLama.Abstractions
A fixed size array to set the tensor splits across multiple GPUs
```csharp
public sealed class TensorSplitsCollection : System.Collections.Generic.IEnumerable`1[[System.Single, System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]], System.Collections.IEnumerable
```
Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [TensorSplitsCollection](./llama.abstractions.tensorsplitscollection.md)<br>
Implements [IEnumerable&lt;Single&gt;](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.ienumerable-1), [IEnumerable](https://docs.microsoft.com/en-us/dotnet/api/system.collections.ienumerable)
## Properties
### **Length**
The size of this array
```csharp
public int Length { get; }
```
#### Property Value
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
### **Item**
```csharp
public float Item { get; set; }
```
#### Property Value
[Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)<br>
## Constructors
### **TensorSplitsCollection(Single[])**
Create a new tensor splits collection, copying the given values
```csharp
public TensorSplitsCollection(Single[] splits)
```
#### Parameters
`splits` [Single[]](https://docs.microsoft.com/en-us/dotnet/api/system.single)<br>
#### Exceptions
[ArgumentException](https://docs.microsoft.com/en-us/dotnet/api/system.argumentexception)<br>
### **TensorSplitsCollection()**
Create a new tensor splits collection with all values initialised to the default
```csharp
public TensorSplitsCollection()
```
## Methods
### **Clear()**
Set all values to zero
```csharp
public void Clear()
```
### **Pin()**
```csharp
internal MemoryHandle Pin()
```
#### Returns
[MemoryHandle](https://docs.microsoft.com/en-us/dotnet/api/system.buffers.memoryhandle)<br>
### **GetEnumerator()**
```csharp
public IEnumerator<float> GetEnumerator()
```
#### Returns
[IEnumerator&lt;Single&gt;](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.ienumerator-1)<br>
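A hedged usage sketch: split the tensors across two GPUs with a 3:1 weighting (assuming `ModelParams.TensorSplits` is assignable).

```csharp
using LLama.Abstractions;
using LLama.Common;

var parameters = new ModelParams("<model>.gguf")
{
    TensorSplits = new TensorSplitsCollection(new[] { 3f, 1f })
};
```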

View File

@ -0,0 +1,65 @@
# TensorSplitsCollectionConverter
Namespace: LLama.Abstractions
A JSON converter for [TensorSplitsCollection](./llama.abstractions.tensorsplitscollection.md)
```csharp
public class TensorSplitsCollectionConverter : System.Text.Json.Serialization.JsonConverter`1[[LLama.Abstractions.TensorSplitsCollection, LLamaSharp, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null]]
```
Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → JsonConverter → JsonConverter&lt;TensorSplitsCollection&gt; → [TensorSplitsCollectionConverter](./llama.abstractions.tensorsplitscollectionconverter.md)
## Properties
### **HandleNull**
```csharp
public bool HandleNull { get; }
```
#### Property Value
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)<br>
## Constructors
### **TensorSplitsCollectionConverter()**
```csharp
public TensorSplitsCollectionConverter()
```
## Methods
### **Read(Utf8JsonReader&, Type, JsonSerializerOptions)**
```csharp
public TensorSplitsCollection Read(Utf8JsonReader& reader, Type typeToConvert, JsonSerializerOptions options)
```
#### Parameters
`reader` Utf8JsonReader&<br>
`typeToConvert` [Type](https://docs.microsoft.com/en-us/dotnet/api/system.type)<br>
`options` JsonSerializerOptions<br>
#### Returns
[TensorSplitsCollection](./llama.abstractions.tensorsplitscollection.md)<br>
### **Write(Utf8JsonWriter, TensorSplitsCollection, JsonSerializerOptions)**
```csharp
public void Write(Utf8JsonWriter writer, TensorSplitsCollection value, JsonSerializerOptions options)
```
#### Parameters
`writer` Utf8JsonWriter<br>
`value` [TensorSplitsCollection](./llama.abstractions.tensorsplitscollection.md)<br>
`options` JsonSerializerOptions<br>

View File

@ -0,0 +1,69 @@
# AntipromptProcessor
Namespace: LLama
AntipromptProcessor keeps track of past tokens looking for any set Anti-Prompts
```csharp
public sealed class AntipromptProcessor
```
Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [AntipromptProcessor](./llama.antipromptprocessor.md)
## Constructors
### **AntipromptProcessor(IEnumerable&lt;String&gt;)**
Initializes a new instance of the [AntipromptProcessor](./llama.antipromptprocessor.md) class.
```csharp
public AntipromptProcessor(IEnumerable<string> antiprompts)
```
#### Parameters
`antiprompts` [IEnumerable&lt;String&gt;](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.ienumerable-1)<br>
The antiprompts.
## Methods
### **AddAntiprompt(String)**
Add an antiprompt to the collection
```csharp
public void AddAntiprompt(string antiprompt)
```
#### Parameters
`antiprompt` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
### **SetAntiprompts(IEnumerable&lt;String&gt;)**
Overwrite all current antiprompts with a new set
```csharp
public void SetAntiprompts(IEnumerable<string> antiprompts)
```
#### Parameters
`antiprompts` [IEnumerable&lt;String&gt;](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.ienumerable-1)<br>
### **Add(String)**
Add some text and check if the buffer now ends with any antiprompt
```csharp
public bool Add(string text)
```
#### Parameters
`text` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
#### Returns
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)<br>
true if the text buffer ends with any antiprompt
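A hedged usage sketch: feed streamed chunks of generated text into the processor and stop once any antiprompt is reached.

```csharp
using LLama;

var processor = new AntipromptProcessor(new[] { "User:" });

foreach (var chunk in new[] { "Hello there.", " User:" })
{
    if (processor.Add(chunk))
    {
        // The accumulated text now ends with "User:", so generation should stop here.
        break;
    }
}
```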

View File

@ -0,0 +1,96 @@
# AlreadyPromptedConversationException
Namespace: LLama.Batched
This exception is thrown when "Prompt()" is called on a [Conversation](./llama.batched.conversation.md) which has
already been prompted and before "Infer()" has been called on the associated
[BatchedExecutor](./llama.batched.batchedexecutor.md).
```csharp
public class AlreadyPromptedConversationException : ExperimentalBatchedExecutorException, System.Runtime.Serialization.ISerializable
```
Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [Exception](https://docs.microsoft.com/en-us/dotnet/api/system.exception) → [ExperimentalBatchedExecutorException](./llama.batched.experimentalbatchedexecutorexception.md) → [AlreadyPromptedConversationException](./llama.batched.alreadypromptedconversationexception.md)<br>
Implements [ISerializable](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.serialization.iserializable)
## Properties
### **TargetSite**
```csharp
public MethodBase TargetSite { get; }
```
#### Property Value
[MethodBase](https://docs.microsoft.com/en-us/dotnet/api/system.reflection.methodbase)<br>
### **Message**
```csharp
public string Message { get; }
```
#### Property Value
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
### **Data**
```csharp
public IDictionary Data { get; }
```
#### Property Value
[IDictionary](https://docs.microsoft.com/en-us/dotnet/api/system.collections.idictionary)<br>
### **InnerException**
```csharp
public Exception InnerException { get; }
```
#### Property Value
[Exception](https://docs.microsoft.com/en-us/dotnet/api/system.exception)<br>
### **HelpLink**
```csharp
public string HelpLink { get; set; }
```
#### Property Value
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
### **Source**
```csharp
public string Source { get; set; }
```
#### Property Value
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
### **HResult**
```csharp
public int HResult { get; set; }
```
#### Property Value
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
### **StackTrace**
```csharp
public string StackTrace { get; }
```
#### Property Value
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>

View File

@ -0,0 +1,151 @@
# BatchedExecutor
Namespace: LLama.Batched
A batched executor that can infer multiple separate "conversations" simultaneously.
```csharp
public sealed class BatchedExecutor : System.IDisposable
```
Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [BatchedExecutor](./llama.batched.batchedexecutor.md)<br>
Implements [IDisposable](https://docs.microsoft.com/en-us/dotnet/api/system.idisposable)
## Properties
### **Context**
The [LLamaContext](./llama.llamacontext.md) this executor is using
```csharp
public LLamaContext Context { get; }
```
#### Property Value
[LLamaContext](./llama.llamacontext.md)<br>
### **Model**
The [LLamaWeights](./llama.llamaweights.md) this executor is using
```csharp
public LLamaWeights Model { get; }
```
#### Property Value
[LLamaWeights](./llama.llamaweights.md)<br>
### **BatchedTokenCount**
Get the number of tokens in the batch, waiting for [BatchedExecutor.Infer(CancellationToken)](./llama.batched.batchedexecutor.md#infercancellationtoken) to be called
```csharp
public int BatchedTokenCount { get; }
```
#### Property Value
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
### **IsDisposed**
Check if this executor has been disposed.
```csharp
public bool IsDisposed { get; private set; }
```
#### Property Value
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)<br>
## Constructors
### **BatchedExecutor(LLamaWeights, IContextParams)**
Create a new batched executor
```csharp
public BatchedExecutor(LLamaWeights model, IContextParams contextParams)
```
#### Parameters
`model` [LLamaWeights](./llama.llamaweights.md)<br>
The model to use
`contextParams` [IContextParams](./llama.abstractions.icontextparams.md)<br>
Parameters to create a new context
## Methods
### **Prompt(String)**
#### Caution
Use BatchedExecutor.Create instead
---
Start a new [Conversation](./llama.batched.conversation.md) with the given prompt
```csharp
public Conversation Prompt(string prompt)
```
#### Parameters
`prompt` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
#### Returns
[Conversation](./llama.batched.conversation.md)<br>
### **Create()**
Start a new [Conversation](./llama.batched.conversation.md)
```csharp
public Conversation Create()
```
#### Returns
[Conversation](./llama.batched.conversation.md)<br>
### **Infer(CancellationToken)**
Run inference for all conversations in the batch which have pending tokens.
If the result is `NoKvSlot` then there is not enough memory for inference; try disposing some conversation
threads and running inference again.
```csharp
public Task<DecodeResult> Infer(CancellationToken cancellation)
```
#### Parameters
`cancellation` [CancellationToken](https://docs.microsoft.com/en-us/dotnet/api/system.threading.cancellationtoken)<br>
#### Returns
[Task&lt;DecodeResult&gt;](https://docs.microsoft.com/en-us/dotnet/api/system.threading.tasks.task-1)<br>
### **Dispose()**
```csharp
public void Dispose()
```
### **GetNextSequenceId()**
```csharp
internal LLamaSeqId GetNextSequenceId()
```
#### Returns
[LLamaSeqId](./llama.native.llamaseqid.md)<br>
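A hedged end-to-end sketch (the model path is a placeholder, the string `Prompt` overload on `Conversation` is assumed, and sampling is omitted):

```csharp
using LLama;
using LLama.Batched;
using LLama.Common;
using LLama.Native;

var parameters = new ModelParams("<model>.gguf");
using var weights = LLamaWeights.LoadFromFile(parameters);
using var executor = new BatchedExecutor(weights, parameters);

// Start a conversation and queue a prompt for the next batch.
using var conversation = executor.Create();
conversation.Prompt("One day, ");

// Run inference for everything pending in the batch.
var result = await executor.Infer(CancellationToken.None);
if (result == DecodeResult.NoKvSlot)
{
    // Not enough kv-cache memory: dispose some conversations and call Infer() again.
}
```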

View File

@ -0,0 +1,94 @@
# CannotForkWhileRequiresInferenceException
Namespace: LLama.Batched
This exception is thrown when [Conversation.Fork()](./llama.batched.conversation.md#fork) is called when [Conversation.RequiresInference](./llama.batched.conversation.md#requiresinference) = true
```csharp
public class CannotForkWhileRequiresInferenceException : ExperimentalBatchedExecutorException, System.Runtime.Serialization.ISerializable
```
Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [Exception](https://docs.microsoft.com/en-us/dotnet/api/system.exception) → [ExperimentalBatchedExecutorException](./llama.batched.experimentalbatchedexecutorexception.md) → [CannotForkWhileRequiresInferenceException](./llama.batched.cannotforkwhilerequiresinferenceexception.md)<br>
Implements [ISerializable](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.serialization.iserializable)
## Properties
### **TargetSite**
```csharp
public MethodBase TargetSite { get; }
```
#### Property Value
[MethodBase](https://docs.microsoft.com/en-us/dotnet/api/system.reflection.methodbase)<br>
### **Message**
```csharp
public string Message { get; }
```
#### Property Value
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
### **Data**
```csharp
public IDictionary Data { get; }
```
#### Property Value
[IDictionary](https://docs.microsoft.com/en-us/dotnet/api/system.collections.idictionary)<br>
### **InnerException**
```csharp
public Exception InnerException { get; }
```
#### Property Value
[Exception](https://docs.microsoft.com/en-us/dotnet/api/system.exception)<br>
### **HelpLink**
```csharp
public string HelpLink { get; set; }
```
#### Property Value
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
### **Source**
```csharp
public string Source { get; set; }
```
#### Property Value
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
### **HResult**
```csharp
public int HResult { get; set; }
```
#### Property Value
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
### **StackTrace**
```csharp
public string StackTrace { get; }
```
#### Property Value
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>

View File

@ -0,0 +1,94 @@
# CannotModifyWhileRequiresInferenceException
Namespace: LLama.Batched
This exception is thrown when [Conversation.Modify(ModifyKvCache)](./llama.batched.conversation.md#modifymodifykvcache) is called when [Conversation.RequiresInference](./llama.batched.conversation.md#requiresinference) = true
```csharp
public class CannotModifyWhileRequiresInferenceException : ExperimentalBatchedExecutorException, System.Runtime.Serialization.ISerializable
```
Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [Exception](https://docs.microsoft.com/en-us/dotnet/api/system.exception) → [ExperimentalBatchedExecutorException](./llama.batched.experimentalbatchedexecutorexception.md) → [CannotModifyWhileRequiresInferenceException](./llama.batched.cannotmodifywhilerequiresinferenceexception.md)<br>
Implements [ISerializable](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.serialization.iserializable)
## Properties
### **TargetSite**
```csharp
public MethodBase TargetSite { get; }
```
#### Property Value
[MethodBase](https://docs.microsoft.com/en-us/dotnet/api/system.reflection.methodbase)<br>
### **Message**
```csharp
public string Message { get; }
```
#### Property Value
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
### **Data**
```csharp
public IDictionary Data { get; }
```
#### Property Value
[IDictionary](https://docs.microsoft.com/en-us/dotnet/api/system.collections.idictionary)<br>
### **InnerException**
```csharp
public Exception InnerException { get; }
```
#### Property Value
[Exception](https://docs.microsoft.com/en-us/dotnet/api/system.exception)<br>
### **HelpLink**
```csharp
public string HelpLink { get; set; }
```
#### Property Value
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
### **Source**
```csharp
public string Source { get; set; }
```
#### Property Value
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
### **HResult**
```csharp
public int HResult { get; set; }
```
#### Property Value
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
### **StackTrace**
```csharp
public string StackTrace { get; }
```
#### Property Value
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>

View File

@ -0,0 +1,96 @@
# CannotSampleRequiresInferenceException
Namespace: LLama.Batched
This exception is thrown when "Sample()" is called on a [Conversation](./llama.batched.conversation.md) which has
already been prompted and before "Infer()" has been called on the associated
[BatchedExecutor](./llama.batched.batchedexecutor.md).
```csharp
public class CannotSampleRequiresInferenceException : ExperimentalBatchedExecutorException, System.Runtime.Serialization.ISerializable
```
Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [Exception](https://docs.microsoft.com/en-us/dotnet/api/system.exception) → [ExperimentalBatchedExecutorException](./llama.batched.experimentalbatchedexecutorexception.md) → [CannotSampleRequiresInferenceException](./llama.batched.cannotsamplerequiresinferenceexception.md)<br>
Implements [ISerializable](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.serialization.iserializable)
## Properties
### **TargetSite**
```csharp
public MethodBase TargetSite { get; }
```
#### Property Value
[MethodBase](https://docs.microsoft.com/en-us/dotnet/api/system.reflection.methodbase)<br>
### **Message**
```csharp
public string Message { get; }
```
#### Property Value
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
### **Data**
```csharp
public IDictionary Data { get; }
```
#### Property Value
[IDictionary](https://docs.microsoft.com/en-us/dotnet/api/system.collections.idictionary)<br>
### **InnerException**
```csharp
public Exception InnerException { get; }
```
#### Property Value
[Exception](https://docs.microsoft.com/en-us/dotnet/api/system.exception)<br>
### **HelpLink**
```csharp
public string HelpLink { get; set; }
```
#### Property Value
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
### **Source**
```csharp
public string Source { get; set; }
```
#### Property Value
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
### **HResult**
```csharp
public int HResult { get; set; }
```
#### Property Value
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
### **StackTrace**
```csharp
public string StackTrace { get; }
```
#### Property Value
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>

View File

@ -0,0 +1,96 @@
# CannotSampleRequiresPromptException
Namespace: LLama.Batched
This exception is thrown when "Sample()" is called on a [Conversation](./llama.batched.conversation.md) which was not
first prompted.
[BatchedExecutor](./llama.batched.batchedexecutor.md).
```csharp
public class CannotSampleRequiresPromptException : ExperimentalBatchedExecutorException, System.Runtime.Serialization.ISerializable
```
Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [Exception](https://docs.microsoft.com/en-us/dotnet/api/system.exception) → [ExperimentalBatchedExecutorException](./llama.batched.experimentalbatchedexecutorexception.md) → [CannotSampleRequiresPromptException](./llama.batched.cannotsamplerequirespromptexception.md)<br>
Implements [ISerializable](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.serialization.iserializable)
## Properties
### **TargetSite**
```csharp
public MethodBase TargetSite { get; }
```
#### Property Value
[MethodBase](https://docs.microsoft.com/en-us/dotnet/api/system.reflection.methodbase)<br>
### **Message**
```csharp
public string Message { get; }
```
#### Property Value
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
### **Data**
```csharp
public IDictionary Data { get; }
```
#### Property Value
[IDictionary](https://docs.microsoft.com/en-us/dotnet/api/system.collections.idictionary)<br>
### **InnerException**
```csharp
public Exception InnerException { get; }
```
#### Property Value
[Exception](https://docs.microsoft.com/en-us/dotnet/api/system.exception)<br>
### **HelpLink**
```csharp
public string HelpLink { get; set; }
```
#### Property Value
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
### **Source**
```csharp
public string Source { get; set; }
```
#### Property Value
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
### **HResult**
```csharp
public int HResult { get; set; }
```
#### Property Value
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
### **StackTrace**
```csharp
public string StackTrace { get; }
```
#### Property Value
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>

View File

@ -0,0 +1,233 @@
# Conversation
Namespace: LLama.Batched
A single conversation thread that can be prompted (adding tokens from the user) or inferred (extracting a token from the LLM)
```csharp
public sealed class Conversation : System.IDisposable
```
Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [Conversation](./llama.batched.conversation.md)<br>
Implements [IDisposable](https://docs.microsoft.com/en-us/dotnet/api/system.idisposable)
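A minimal sketch of the typical prompt, infer and sample loop using the members documented below, assuming `executor` is an existing [BatchedExecutor](./llama.batched.batchedexecutor.md) and `conversation` was obtained from it; turning the logits into the next token is left to a sampling pipeline of your choice.
```csharp
// Queue up prompt tokens for this conversation.
conversation.Prompt("The quick brown fox");

// Run inference for every conversation managed by the executor.
await executor.Infer();

// Once sampling is required, read the raw logits for this conversation.
if (conversation.RequiresSampling)
{
    Span<float> logits = conversation.Sample();
    // ...select the next LLamaToken from the logits, then Prompt() the
    // conversation with that token and repeat the loop.
}
```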
## Properties
### **Executor**
The executor which this conversation belongs to
```csharp
public BatchedExecutor Executor { get; }
```
#### Property Value
[BatchedExecutor](./llama.batched.batchedexecutor.md)<br>
### **ConversationId**
Unique ID for this conversation
```csharp
public LLamaSeqId ConversationId { get; }
```
#### Property Value
[LLamaSeqId](./llama.native.llamaseqid.md)<br>
### **TokenCount**
Total number of tokens in this conversation, cannot exceed the context length.
```csharp
public int TokenCount { get; }
```
#### Property Value
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
### **IsDisposed**
Indicates if this conversation has been disposed; nothing can be done with a disposed conversation.
```csharp
public bool IsDisposed { get; }
```
#### Property Value
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)<br>
### **RequiresInference**
Indicates if this conversation is waiting for inference to be run on the executor. "Prompt" and "Sample" cannot be called when this is true.
```csharp
public bool RequiresInference { get; }
```
#### Property Value
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)<br>
### **RequiresSampling**
Indicates that this conversation should be sampled.
```csharp
public bool RequiresSampling { get; }
```
#### Property Value
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)<br>
## Methods
### **Finalize()**
Finalizer for Conversation
```csharp
protected void Finalize()
```
### **Dispose()**
End this conversation, freeing all resources used by it
```csharp
public void Dispose()
```
#### Exceptions
[ObjectDisposedException](https://docs.microsoft.com/en-us/dotnet/api/system.objectdisposedexception)<br>
### **Fork()**
Create a copy of the current conversation
```csharp
public Conversation Fork()
```
#### Returns
[Conversation](./llama.batched.conversation.md)<br>
#### Exceptions
[ObjectDisposedException](https://docs.microsoft.com/en-us/dotnet/api/system.objectdisposedexception)<br>
**Remarks:**
The copy shares internal state, so consumes very little extra memory.
### **Sample()**
Get the logits from this conversation, ready for sampling
```csharp
public Span<float> Sample()
```
#### Returns
[Span&lt;Single&gt;](https://docs.microsoft.com/en-us/dotnet/api/system.span-1)<br>
#### Exceptions
[ObjectDisposedException](https://docs.microsoft.com/en-us/dotnet/api/system.objectdisposedexception)<br>
[CannotSampleRequiresPromptException](./llama.batched.cannotsamplerequirespromptexception.md)<br>
Thrown if this conversation was not prompted before the previous call to infer
[CannotSampleRequiresInferenceException](./llama.batched.cannotsamplerequiresinferenceexception.md)<br>
Thrown if Infer() must be called on the executor
### **Prompt(String)**
Add tokens to this conversation
```csharp
public void Prompt(string input)
```
#### Parameters
`input` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
### **Prompt(List&lt;LLamaToken&gt;)**
Add tokens to this conversation
```csharp
public void Prompt(List<LLamaToken> tokens)
```
#### Parameters
`tokens` [List&lt;LLamaToken&gt;](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.list-1)<br>
#### Exceptions
[ObjectDisposedException](https://docs.microsoft.com/en-us/dotnet/api/system.objectdisposedexception)<br>
[AlreadyPromptedConversationException](./llama.batched.alreadypromptedconversationexception.md)<br>
### **Prompt(ReadOnlySpan&lt;LLamaToken&gt;)**
Add tokens to this conversation
```csharp
public void Prompt(ReadOnlySpan<LLamaToken> tokens)
```
#### Parameters
`tokens` [ReadOnlySpan&lt;LLamaToken&gt;](https://docs.microsoft.com/en-us/dotnet/api/system.readonlyspan-1)<br>
#### Exceptions
[ObjectDisposedException](https://docs.microsoft.com/en-us/dotnet/api/system.objectdisposedexception)<br>
[AlreadyPromptedConversationException](./llama.batched.alreadypromptedconversationexception.md)<br>
### **Prompt(LLamaToken)**
Add a single token to this conversation
```csharp
public void Prompt(LLamaToken token)
```
#### Parameters
`token` [LLamaToken](./llama.native.llamatoken.md)<br>
#### Exceptions
[ObjectDisposedException](https://docs.microsoft.com/en-us/dotnet/api/system.objectdisposedexception)<br>
[AlreadyPromptedConversationException](./llama.batched.alreadypromptedconversationexception.md)<br>
### **Modify(ModifyKvCache)**
Directly modify the KV cache of this conversation
```csharp
public void Modify(ModifyKvCache modifier)
```
#### Parameters
`modifier` [ModifyKvCache](./llama.batched.conversation.modifykvcache.md)<br>
#### Exceptions
[CannotModifyWhileRequiresInferenceException](./llama.batched.cannotmodifywhilerequiresinferenceexception.md)<br>
Thrown if this method is called while [Conversation.RequiresInference](./llama.batched.conversation.md#requiresinference) == true

View File

@ -0,0 +1,55 @@
# ConversationExtensions
Namespace: LLama.Batched
Extension methods for [Conversation](./llama.batched.conversation.md)
```csharp
public static class ConversationExtensions
```
Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [ConversationExtensions](./llama.batched.conversationextensions.md)
## Methods
### **Rewind(Conversation, Int32)**
Rewind a [Conversation](./llama.batched.conversation.md) back to an earlier state by removing tokens from the end
```csharp
public static void Rewind(Conversation conversation, int tokens)
```
#### Parameters
`conversation` [Conversation](./llama.batched.conversation.md)<br>
The conversation to rewind
`tokens` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
The number of tokens to rewind
#### Exceptions
[ArgumentOutOfRangeException](https://docs.microsoft.com/en-us/dotnet/api/system.argumentoutofrangeexception)<br>
Thrown if `tokens` parameter is larger than TokenCount
### **ShiftLeft(Conversation, Int32, Int32)**
Shift all tokens over to the left, removing "count" tokens from the start and shifting everything over.
Leaves "keep" tokens at the start completely untouched. This can be used to free up space when the context
gets full, keeping the prompt at the start intact.
```csharp
public static void ShiftLeft(Conversation conversation, int count, int keep)
```
#### Parameters
`conversation` [Conversation](./llama.batched.conversation.md)<br>
The conversation to shift
`count` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
How much to shift tokens over by
`keep` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
The number of tokens at the start which should not be shifted
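A short sketch of these helpers, assuming `conversation` is an existing [Conversation](./llama.batched.conversation.md); the numbers are illustrative.
```csharp
// Drop the last 8 tokens, e.g. to retry generation from an earlier point.
conversation.Rewind(8);

// Free up space when the context is nearly full: remove 256 tokens from the
// start while keeping the first 32 tokens (e.g. the system prompt) untouched.
conversation.ShiftLeft(256, 32);
```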

View File

@ -0,0 +1,94 @@
# ExperimentalBatchedExecutorException
Namespace: LLama.Batched
Base class for exceptions thrown from [BatchedExecutor](./llama.batched.batchedexecutor.md)
```csharp
public class ExperimentalBatchedExecutorException : System.Exception, System.Runtime.Serialization.ISerializable
```
Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [Exception](https://docs.microsoft.com/en-us/dotnet/api/system.exception) → [ExperimentalBatchedExecutorException](./llama.batched.experimentalbatchedexecutorexception.md)<br>
Implements [ISerializable](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.serialization.iserializable)
## Properties
### **TargetSite**
```csharp
public MethodBase TargetSite { get; }
```
#### Property Value
[MethodBase](https://docs.microsoft.com/en-us/dotnet/api/system.reflection.methodbase)<br>
### **Message**
```csharp
public string Message { get; }
```
#### Property Value
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
### **Data**
```csharp
public IDictionary Data { get; }
```
#### Property Value
[IDictionary](https://docs.microsoft.com/en-us/dotnet/api/system.collections.idictionary)<br>
### **InnerException**
```csharp
public Exception InnerException { get; }
```
#### Property Value
[Exception](https://docs.microsoft.com/en-us/dotnet/api/system.exception)<br>
### **HelpLink**
```csharp
public string HelpLink { get; set; }
```
#### Property Value
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
### **Source**
```csharp
public string Source { get; set; }
```
#### Property Value
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
### **HResult**
```csharp
public int HResult { get; set; }
```
#### Property Value
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
### **StackTrace**
```csharp
public string StackTrace { get; }
```
#### Property Value
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>

View File

@ -1,12 +1,6 @@
# ChatSession&lt;T&gt;
Namespace: LLama.OldVersion
#### Caution
The entire LLama.OldVersion namespace will be removed
---
Namespace: LLama
```csharp
public class ChatSession<T>
@ -16,7 +10,7 @@ public class ChatSession<T>
`T`<br>
Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [ChatSession&lt;T&gt;](./llama.oldversion.chatsession-1.md)
Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [ChatSession&lt;T&gt;](./llama.chatsession-1.md)
## Constructors
@ -32,10 +26,10 @@ public ChatSession(T model)
## Methods
### **Chat(String, String, String)**
### **Chat(String, String)**
```csharp
public IEnumerable<string> Chat(string text, string prompt, string encoding)
public IEnumerable<string> Chat(string text, string prompt)
```
#### Parameters
@ -44,48 +38,40 @@ public IEnumerable<string> Chat(string text, string prompt, string encoding)
`prompt` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
`encoding` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
#### Returns
[IEnumerable&lt;String&gt;](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.ienumerable-1)<br>
### **WithPrompt(String, String)**
### **WithPrompt(String)**
```csharp
public ChatSession<T> WithPrompt(string prompt, string encoding)
public ChatSession<T> WithPrompt(string prompt)
```
#### Parameters
`prompt` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
`encoding` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
#### Returns
[ChatSession&lt;T&gt;](./llama.oldversion.chatsession-1.md)<br>
[ChatSession&lt;T&gt;](./llama.chatsession-1.md)<br>
### **WithPromptFile(String, String)**
### **WithPromptFile(String)**
```csharp
public ChatSession<T> WithPromptFile(string promptFilename, string encoding)
public ChatSession<T> WithPromptFile(string promptFilename)
```
#### Parameters
`promptFilename` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
`encoding` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
#### Returns
[ChatSession&lt;T&gt;](./llama.oldversion.chatsession-1.md)<br>
[ChatSession&lt;T&gt;](./llama.chatsession-1.md)<br>
### **WithAntiprompt(String[])**
Set the keywords used to split the return value of the chat AI.
```csharp
public ChatSession<T> WithAntiprompt(String[] antiprompt)
```
@ -96,4 +82,4 @@ public ChatSession<T> WithAntiprompt(String[] antiprompt)
#### Returns
[ChatSession&lt;T&gt;](./llama.oldversion.chatsession-1.md)<br>
[ChatSession&lt;T&gt;](./llama.chatsession-1.md)<br>

View File

@ -20,6 +20,54 @@ The output transform used in this session.
public ITextStreamTransform OutputTransform;
```
### **MODEL_STATE_FILENAME**
The filename for the serialized model state (KV cache, etc).
```csharp
public static string MODEL_STATE_FILENAME;
```
### **EXECUTOR_STATE_FILENAME**
The filename for the serialized executor state.
```csharp
public static string EXECUTOR_STATE_FILENAME;
```
### **HISTORY_STATE_FILENAME**
The filename for the serialized chat history.
```csharp
public static string HISTORY_STATE_FILENAME;
```
### **INPUT_TRANSFORM_FILENAME**
The filename for the serialized input transform pipeline.
```csharp
public static string INPUT_TRANSFORM_FILENAME;
```
### **OUTPUT_TRANSFORM_FILENAME**
The filename for the serialized output transform.
```csharp
public static string OUTPUT_TRANSFORM_FILENAME;
```
### **HISTORY_TRANSFORM_FILENAME**
The filename for the serialized history transform.
```csharp
public static string HISTORY_TRANSFORM_FILENAME;
```
## Properties
### **Executor**
@ -27,7 +75,7 @@ public ITextStreamTransform OutputTransform;
The executor for this session.
```csharp
public ILLamaExecutor Executor { get; }
public ILLamaExecutor Executor { get; private set; }
```
#### Property Value
@ -39,7 +87,7 @@ public ILLamaExecutor Executor { get; }
The chat history for this session.
```csharp
public ChatHistory History { get; }
public ChatHistory History { get; private set; }
```
#### Property Value
@ -74,7 +122,7 @@ public List<ITextTransform> InputTransformPipeline { get; set; }
### **ChatSession(ILLamaExecutor)**
Create a new chat session.
```csharp
public ChatSession(ILLamaExecutor executor)
@ -85,8 +133,42 @@ public ChatSession(ILLamaExecutor executor)
`executor` [ILLamaExecutor](./llama.abstractions.illamaexecutor.md)<br>
The executor for this session
### **ChatSession(ILLamaExecutor, ChatHistory)**
Create a new chat session with a custom history.
```csharp
public ChatSession(ILLamaExecutor executor, ChatHistory history)
```
#### Parameters
`executor` [ILLamaExecutor](./llama.abstractions.illamaexecutor.md)<br>
`history` [ChatHistory](./llama.common.chathistory.md)<br>
## Methods
### **InitializeSessionFromHistoryAsync(ILLamaExecutor, ChatHistory)**
Create a new chat session and preprocess history.
```csharp
public static Task<ChatSession> InitializeSessionFromHistoryAsync(ILLamaExecutor executor, ChatHistory history)
```
#### Parameters
`executor` [ILLamaExecutor](./llama.abstractions.illamaexecutor.md)<br>
The executor for this session
`history` [ChatHistory](./llama.common.chathistory.md)<br>
History for this session
#### Returns
[Task&lt;ChatSession&gt;](https://docs.microsoft.com/en-us/dotnet/api/system.threading.tasks.task-1)<br>
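A minimal sketch of starting a session from a pre-built history, assuming `executor` is a configured [ILLamaExecutor](./llama.abstractions.illamaexecutor.md).
```csharp
var history = new ChatHistory();
history.AddMessage(AuthorRole.System, "You are a helpful assistant.");

// Processes the history (computing the KV cache) and returns a ready session.
ChatSession session = await ChatSession.InitializeSessionFromHistoryAsync(executor, history);
```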
### **WithHistoryTransform(IHistoryTransform)**
Use a custom history transform.
@ -137,7 +219,7 @@ public ChatSession WithOutputTransform(ITextStreamTransform transform)
### **SaveSession(String)**
Save a session to a directory.
```csharp
public void SaveSession(string path)
@ -146,53 +228,280 @@ public void SaveSession(string path)
#### Parameters
`path` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
The directory name to save the session. If the directory does not exist, a new directory will be created.
### **LoadSession(String)**
#### Exceptions
[ArgumentException](https://docs.microsoft.com/en-us/dotnet/api/system.argumentexception)<br>
### **GetSessionState()**
Get the session state.
```csharp
public void LoadSession(string path)
public SessionState GetSessionState()
```
#### Returns
[SessionState](./llama.sessionstate.md)<br>
SessionState object representing session state in-memory
### **LoadSession(SessionState, Boolean)**
Load a session from a session state.
```csharp
public void LoadSession(SessionState state, bool loadTransforms)
```
#### Parameters
`state` [SessionState](./llama.sessionstate.md)<br>
`loadTransforms` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)<br>
If true, loads the transforms saved in the session state.
#### Exceptions
[ArgumentException](https://docs.microsoft.com/en-us/dotnet/api/system.argumentexception)<br>
### **LoadSession(String, Boolean)**
Load a session from a directory.
```csharp
public void LoadSession(string path, bool loadTransforms)
```
#### Parameters
`path` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
The directory name to load the session.
### **Chat(ChatHistory, IInferenceParams, CancellationToken)**
`loadTransforms` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)<br>
If true, loads the transforms saved in the session state.
Get the response from the LLama model with chat histories.
#### Exceptions
[ArgumentException](https://docs.microsoft.com/en-us/dotnet/api/system.argumentexception)<br>
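A short sketch of persisting and restoring a session, assuming `session` is an existing ChatSession; the directory name is illustrative.
```csharp
// Persist the full session (model state, executor state, history, transforms).
session.SaveSession("./chat-session");

// Restore it later, optionally re-loading the serialized transforms as well.
session.LoadSession("./chat-session", loadTransforms: true);

// Or keep the state purely in memory instead of on disk.
SessionState state = session.GetSessionState();
session.LoadSession(state, loadTransforms: false);
```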
### **AddMessage(Message)**
Add a message to the chat history.
```csharp
public IEnumerable<string> Chat(ChatHistory history, IInferenceParams inferenceParams, CancellationToken cancellationToken)
public ChatSession AddMessage(Message message)
```
#### Parameters
`message` [Message](./llama.common.chathistory.message.md)<br>
#### Returns
[ChatSession](./llama.chatsession.md)<br>
### **AddSystemMessage(String)**
Add a system message to the chat history.
```csharp
public ChatSession AddSystemMessage(string content)
```
#### Parameters
`content` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
#### Returns
[ChatSession](./llama.chatsession.md)<br>
### **AddAssistantMessage(String)**
Add an assistant message to the chat history.
```csharp
public ChatSession AddAssistantMessage(string content)
```
#### Parameters
`content` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
#### Returns
[ChatSession](./llama.chatsession.md)<br>
### **AddUserMessage(String)**
Add a user message to the chat history.
```csharp
public ChatSession AddUserMessage(string content)
```
#### Parameters
`content` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
#### Returns
[ChatSession](./llama.chatsession.md)<br>
### **RemoveLastMessage()**
Remove the last message from the chat history.
```csharp
public ChatSession RemoveLastMessage()
```
#### Returns
[ChatSession](./llama.chatsession.md)<br>
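A short sketch of the history helpers above; each call returns a ChatSession, so the calls can be chained (the message contents are illustrative).
```csharp
session.AddSystemMessage("You are a concise assistant.")
       .AddUserMessage("What is LLamaSharp?")
       .AddAssistantMessage("A C# wrapper around llama.cpp.");

// Undo the last entry if needed.
session.RemoveLastMessage();
```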
### **AddAndProcessMessage(Message)**
Compute KV cache for the message and add it to the chat history.
```csharp
public Task<ChatSession> AddAndProcessMessage(Message message)
```
#### Parameters
`message` [Message](./llama.common.chathistory.message.md)<br>
#### Returns
[Task&lt;ChatSession&gt;](https://docs.microsoft.com/en-us/dotnet/api/system.threading.tasks.task-1)<br>
### **AddAndProcessSystemMessage(String)**
Compute KV cache for the system message and add it to the chat history.
```csharp
public Task<ChatSession> AddAndProcessSystemMessage(string content)
```
#### Parameters
`content` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
#### Returns
[Task&lt;ChatSession&gt;](https://docs.microsoft.com/en-us/dotnet/api/system.threading.tasks.task-1)<br>
### **AddAndProcessUserMessage(String)**
Compute KV cache for the user message and add it to the chat history.
```csharp
public Task<ChatSession> AddAndProcessUserMessage(string content)
```
#### Parameters
`content` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
#### Returns
[Task&lt;ChatSession&gt;](https://docs.microsoft.com/en-us/dotnet/api/system.threading.tasks.task-1)<br>
### **AddAndProcessAssistantMessage(String)**
Compute KV cache for the assistant message and add it to the chat history.
```csharp
public Task<ChatSession> AddAndProcessAssistantMessage(string content)
```
#### Parameters
`content` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
#### Returns
[Task&lt;ChatSession&gt;](https://docs.microsoft.com/en-us/dotnet/api/system.threading.tasks.task-1)<br>
### **ReplaceUserMessage(Message, Message)**
Replace a user message with a new message and remove all messages after the new message.
This is useful when the user wants to edit a message and regenerate the response.
```csharp
public ChatSession ReplaceUserMessage(Message oldMessage, Message newMessage)
```
#### Parameters
`oldMessage` [Message](./llama.common.chathistory.message.md)<br>
`newMessage` [Message](./llama.common.chathistory.message.md)<br>
#### Returns
[ChatSession](./llama.chatsession.md)<br>
### **ChatAsync(Message, Boolean, IInferenceParams, CancellationToken)**
Chat with the model.
```csharp
public IAsyncEnumerable<string> ChatAsync(Message message, bool applyInputTransformPipeline, IInferenceParams inferenceParams, CancellationToken cancellationToken)
```
#### Parameters
`message` [Message](./llama.common.chathistory.message.md)<br>
`applyInputTransformPipeline` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)<br>
`inferenceParams` [IInferenceParams](./llama.abstractions.iinferenceparams.md)<br>
`cancellationToken` [CancellationToken](https://docs.microsoft.com/en-us/dotnet/api/system.threading.cancellationtoken)<br>
#### Returns
[IAsyncEnumerable&lt;String&gt;](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.iasyncenumerable-1)<br>
#### Exceptions
[ArgumentException](https://docs.microsoft.com/en-us/dotnet/api/system.argumentexception)<br>
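A minimal sketch of streaming a reply, assuming `session` is an existing ChatSession, `inferenceParams` is a configured [IInferenceParams](./llama.abstractions.iinferenceparams.md), and that [Message](./llama.common.chathistory.message.md) can be constructed from a role and a content string.
```csharp
var message = new ChatHistory.Message(AuthorRole.User, "What is an apple?");

// The second argument applies the input transform pipeline before inference.
await foreach (var text in session.ChatAsync(message, true, inferenceParams, CancellationToken.None))
{
    Console.Write(text);
}
```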
### **ChatAsync(Message, IInferenceParams, CancellationToken)**
Chat with the model.
```csharp
public IAsyncEnumerable<string> ChatAsync(Message message, IInferenceParams inferenceParams, CancellationToken cancellationToken)
```
#### Parameters
`message` [Message](./llama.common.chathistory.message.md)<br>
`inferenceParams` [IInferenceParams](./llama.abstractions.iinferenceparams.md)<br>
`cancellationToken` [CancellationToken](https://docs.microsoft.com/en-us/dotnet/api/system.threading.cancellationtoken)<br>
#### Returns
[IAsyncEnumerable&lt;String&gt;](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.iasyncenumerable-1)<br>
### **ChatAsync(ChatHistory, Boolean, IInferenceParams, CancellationToken)**
Chat with the model.
```csharp
public IAsyncEnumerable<string> ChatAsync(ChatHistory history, bool applyInputTransformPipeline, IInferenceParams inferenceParams, CancellationToken cancellationToken)
```
#### Parameters
`history` [ChatHistory](./llama.common.chathistory.md)<br>
`inferenceParams` [IInferenceParams](./llama.abstractions.iinferenceparams.md)<br>
`cancellationToken` [CancellationToken](https://docs.microsoft.com/en-us/dotnet/api/system.threading.cancellationtoken)<br>
#### Returns
[IEnumerable&lt;String&gt;](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.ienumerable-1)<br>
### **Chat(String, IInferenceParams, CancellationToken)**
Get the response from the LLama model. Note that the prompt is not limited to the preset words;
it can also be the question you want to ask.
```csharp
public IEnumerable<string> Chat(string prompt, IInferenceParams inferenceParams, CancellationToken cancellationToken)
```
#### Parameters
`prompt` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
`applyInputTransformPipeline` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)<br>
`inferenceParams` [IInferenceParams](./llama.abstractions.iinferenceparams.md)<br>
@ -200,11 +509,15 @@ public IEnumerable<string> Chat(string prompt, IInferenceParams inferenceParams,
#### Returns
[IEnumerable&lt;String&gt;](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.ienumerable-1)<br>
[IAsyncEnumerable&lt;String&gt;](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.iasyncenumerable-1)<br>
#### Exceptions
[ArgumentException](https://docs.microsoft.com/en-us/dotnet/api/system.argumentexception)<br>
### **ChatAsync(ChatHistory, IInferenceParams, CancellationToken)**
Get the response from the LLama model with chat histories.
Chat with the model.
```csharp
public IAsyncEnumerable<string> ChatAsync(ChatHistory history, IInferenceParams inferenceParams, CancellationToken cancellationToken)
@ -222,22 +535,24 @@ public IAsyncEnumerable<string> ChatAsync(ChatHistory history, IInferenceParams
[IAsyncEnumerable&lt;String&gt;](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.iasyncenumerable-1)<br>
### **ChatAsync(String, IInferenceParams, CancellationToken)**
### **RegenerateAssistantMessageAsync(InferenceParams, CancellationToken)**
Get the response from the LLama model with chat histories asynchronously.
Regenerate the last assistant message.
```csharp
public IAsyncEnumerable<string> ChatAsync(string prompt, IInferenceParams inferenceParams, CancellationToken cancellationToken)
public IAsyncEnumerable<string> RegenerateAssistantMessageAsync(InferenceParams inferenceParams, CancellationToken cancellationToken)
```
#### Parameters
`prompt` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
`inferenceParams` [IInferenceParams](./llama.abstractions.iinferenceparams.md)<br>
`inferenceParams` [InferenceParams](./llama.common.inferenceparams.md)<br>
`cancellationToken` [CancellationToken](https://docs.microsoft.com/en-us/dotnet/api/system.threading.cancellationtoken)<br>
#### Returns
[IAsyncEnumerable&lt;String&gt;](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.iasyncenumerable-1)<br>
#### Exceptions
[InvalidOperationException](https://docs.microsoft.com/en-us/dotnet/api/system.invalidoperationexception)<br>

View File

@ -17,7 +17,7 @@ Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object)
List of messages in the chat
```csharp
public List<Message> Messages { get; }
public List<Message> Messages { get; set; }
```
#### Property Value
@ -34,6 +34,18 @@ Create a new instance of the chat content class
public ChatHistory()
```
### **ChatHistory(Message[])**
Create a new instance of the chat history from an array of messages
```csharp
public ChatHistory(Message[] messageHistory)
```
#### Parameters
`messageHistory` [Message[]](./llama.common.chathistory.message.md)<br>
## Methods
### **AddMessage(AuthorRole, String)**
@ -51,3 +63,31 @@ Role of the message author
`content` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
Message content
### **ToJson()**
Serialize the chat history to JSON
```csharp
public string ToJson()
```
#### Returns
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
### **FromJson(String)**
Deserialize a chat history from JSON
```csharp
public static ChatHistory FromJson(string json)
```
#### Parameters
`json` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
#### Returns
[ChatHistory](./llama.common.chathistory.md)<br>
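A short sketch of round-tripping a history through JSON; the file path is illustrative.
```csharp
var history = new ChatHistory();
history.AddMessage(AuthorRole.User, "Hello!");

// Serialize and persist.
File.WriteAllText("history.json", history.ToJson());

// Restore later.
var restored = ChatHistory.FromJson(File.ReadAllText("history.json"));
```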

View File

@ -6,7 +6,7 @@ A queue with fixed storage size.
Currently this is only a naive implementation and will be further optimized in the future.
```csharp
public class FixedSizeQueue<T> : IEnumerable<T>, System.Collections.IEnumerable
public class FixedSizeQueue<T> : IReadOnlyList<T>, IReadOnlyCollection<T>, IEnumerable<T>, System.Collections.IEnumerable
```
#### Type Parameters
@ -14,10 +14,20 @@ public class FixedSizeQueue<T> : , System.Collections.IEnumerable
`T`<br>
Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [FixedSizeQueue&lt;T&gt;](./llama.common.fixedsizequeue-1.md)<br>
Implements IEnumerable&lt;T&gt;, [IEnumerable](https://docs.microsoft.com/en-us/dotnet/api/system.collections.ienumerable)
Implements IReadOnlyList&lt;T&gt;, IReadOnlyCollection&lt;T&gt;, IEnumerable&lt;T&gt;, [IEnumerable](https://docs.microsoft.com/en-us/dotnet/api/system.collections.ienumerable)
## Properties
### **Item**
```csharp
public T Item { get; }
```
#### Property Value
T<br>
### **Count**
Number of items in this queue
@ -73,24 +83,6 @@ public FixedSizeQueue(int size, IEnumerable<T> data)
## Methods
### **FillWith(T)**
Replace every item in the queue with the given value
```csharp
public FixedSizeQueue<T> FillWith(T value)
```
#### Parameters
`value` T<br>
The value to replace all items with
#### Returns
[FixedSizeQueue&lt;T&gt;](./llama.common.fixedsizequeue-1.md)<br>
returns this
### **Enqueue(T)**
Enqueue an element.

View File

@ -1,30 +0,0 @@
# ILLamaLogger
Namespace: LLama.Common
Receives log messages from LLamaSharp.
```csharp
public interface ILLamaLogger
```
## Methods
### **Log(String, String, LogLevel)**
Write the log in customized way
```csharp
void Log(string source, string message, LogLevel level)
```
#### Parameters
`source` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
The source of the log. It may be a method name or class name.
`message` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
The message.
`level` [LogLevel](./llama.common.illamalogger.loglevel.md)<br>
The log level.

View File

@ -5,11 +5,11 @@ Namespace: LLama.Common
The parameters used for inference.
```csharp
public class InferenceParams : LLama.Abstractions.IInferenceParams
public class InferenceParams : LLama.Abstractions.IInferenceParams, System.IEquatable<InferenceParams>
```
Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [InferenceParams](./llama.common.inferenceparams.md)<br>
Implements [IInferenceParams](./llama.abstractions.iinferenceparams.md)
Implements [IInferenceParams](./llama.abstractions.iinferenceparams.md), [IEquatable&lt;InferenceParams&gt;](https://docs.microsoft.com/en-us/dotnet/api/system.iequatable-1)
## Properties
@ -43,65 +43,27 @@ public int MaxTokens { get; set; }
logit bias for specific tokens
```csharp
public Dictionary<int, float> LogitBias { get; set; }
public Dictionary<LLamaToken, float> LogitBias { get; set; }
```
#### Property Value
[Dictionary&lt;Int32, Single&gt;](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.dictionary-2)<br>
[Dictionary&lt;LLamaToken, Single&gt;](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.dictionary-2)<br>
### **AntiPrompts**
Sequences where the model will stop generating further tokens.
```csharp
public IEnumerable<string> AntiPrompts { get; set; }
public IReadOnlyList<string> AntiPrompts { get; set; }
```
#### Property Value
[IEnumerable&lt;String&gt;](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.ienumerable-1)<br>
### **PathSession**
path to file for saving/loading model eval state
```csharp
public string PathSession { get; set; }
```
#### Property Value
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
### **InputSuffix**
string to suffix user inputs with
```csharp
public string InputSuffix { get; set; }
```
#### Property Value
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
### **InputPrefix**
string to prefix user inputs with
```csharp
public string InputPrefix { get; set; }
```
#### Property Value
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
[IReadOnlyList&lt;String&gt;](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.ireadonlylist-1)<br>
### **TopK**
0 or lower to use vocab size
```csharp
public int TopK { get; set; }
```
@ -112,8 +74,6 @@ public int TopK { get; set; }
### **TopP**
1.0 = disabled
```csharp
public float TopP { get; set; }
```
@ -122,9 +82,17 @@ public float TopP { get; set; }
[Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)<br>
### **TfsZ**
### **MinP**
1.0 = disabled
```csharp
public float MinP { get; set; }
```
#### Property Value
[Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)<br>
### **TfsZ**
```csharp
public float TfsZ { get; set; }
@ -136,8 +104,6 @@ public float TfsZ { get; set; }
### **TypicalP**
1.0 = disabled
```csharp
public float TypicalP { get; set; }
```
@ -148,8 +114,6 @@ public float TypicalP { get; set; }
### **Temperature**
1.0 = disabled
```csharp
public float Temperature { get; set; }
```
@ -160,8 +124,6 @@ public float Temperature { get; set; }
### **RepeatPenalty**
1.0 = disabled
```csharp
public float RepeatPenalty { get; set; }
```
@ -172,8 +134,6 @@ public float RepeatPenalty { get; set; }
### **RepeatLastTokensCount**
last n tokens to penalize (0 = disable penalty, -1 = context size) (repeat_last_n)
```csharp
public int RepeatLastTokensCount { get; set; }
```
@ -184,9 +144,6 @@ public int RepeatLastTokensCount { get; set; }
### **FrequencyPenalty**
frequency penalty coefficient
0.0 = disabled
```csharp
public float FrequencyPenalty { get; set; }
```
@ -197,9 +154,6 @@ public float FrequencyPenalty { get; set; }
### **PresencePenalty**
presence penalty coefficient
0.0 = disabled
```csharp
public float PresencePenalty { get; set; }
```
@ -210,10 +164,6 @@ public float PresencePenalty { get; set; }
### **Mirostat**
Mirostat uses tokens instead of words; the algorithm is described in the paper https://arxiv.org/abs/2007.14966.
0 = disabled, 1 = Mirostat, 2 = Mirostat 2.0
```csharp
public MirostatType Mirostat { get; set; }
```
@ -224,8 +174,6 @@ public MirostatType Mirostat { get; set; }
### **MirostatTau**
target entropy
```csharp
public float MirostatTau { get; set; }
```
@ -236,8 +184,6 @@ public float MirostatTau { get; set; }
### **MirostatEta**
learning rate
```csharp
public float MirostatEta { get; set; }
```
@ -248,8 +194,6 @@ public float MirostatEta { get; set; }
### **PenalizeNL**
consider newlines as a repeatable token (penalize_nl)
```csharp
public bool PenalizeNL { get; set; }
```
@ -260,8 +204,6 @@ public bool PenalizeNL { get; set; }
### **Grammar**
A grammar to constrain the possible tokens
```csharp
public SafeLLamaGrammarHandle Grammar { get; set; }
```
@ -270,6 +212,16 @@ public SafeLLamaGrammarHandle Grammar { get; set; }
[SafeLLamaGrammarHandle](./llama.native.safellamagrammarhandle.md)<br>
### **SamplingPipeline**
```csharp
public ISamplingPipeline SamplingPipeline { get; set; }
```
#### Property Value
[ISamplingPipeline](./llama.sampling.isamplingpipeline.md)<br>
## Constructors
### **InferenceParams()**
@ -277,3 +229,77 @@ public SafeLLamaGrammarHandle Grammar { get; set; }
```csharp
public InferenceParams()
```
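A short sketch of configuring inference parameters with some of the properties documented above; the values are illustrative, not recommendations.
```csharp
var inferenceParams = new InferenceParams
{
    MaxTokens = 256,
    AntiPrompts = new[] { "User:" },
    Temperature = 0.7f,
    TopP = 0.9f,
    RepeatPenalty = 1.1f
};
```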
## Methods
### **ToString()**
```csharp
public string ToString()
```
#### Returns
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
### **PrintMembers(StringBuilder)**
```csharp
protected bool PrintMembers(StringBuilder builder)
```
#### Parameters
`builder` [StringBuilder](https://docs.microsoft.com/en-us/dotnet/api/system.text.stringbuilder)<br>
#### Returns
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)<br>
### **GetHashCode()**
```csharp
public int GetHashCode()
```
#### Returns
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
### **Equals(Object)**
```csharp
public bool Equals(object obj)
```
#### Parameters
`obj` [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object)<br>
#### Returns
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)<br>
### **Equals(InferenceParams)**
```csharp
public bool Equals(InferenceParams other)
```
#### Parameters
`other` [InferenceParams](./llama.common.inferenceparams.md)<br>
#### Returns
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)<br>
### **&lt;Clone&gt;$()**
```csharp
public InferenceParams <Clone>$()
```
#### Returns
[InferenceParams](./llama.common.inferenceparams.md)<br>

Some files were not shown because too many files have changed in this diff.