# LLamaSharp - .NET Binding for llama.cpp
![logo](Assets/LLamaSharpLogo.png)

[![Discord](https://img.shields.io/discord/1106946823282761851?label=Discord)](https://discord.gg/M2fS4PNj)
[![LLamaSharp Badge](https://img.shields.io/nuget/v/LLamaSharp?label=LLamaSharp)](https://www.nuget.org/packages/LLamaSharp)
[![LLamaSharp Badge](https://img.shields.io/nuget/v/LLamaSharp.Backend.Cpu?label=LLamaSharp.Backend.Cpu)](https://www.nuget.org/packages/LLamaSharp.Backend.Cpu)
[![LLamaSharp Badge](https://img.shields.io/nuget/v/LLamaSharp.Backend.Cuda11?label=LLamaSharp.Backend.Cuda11)](https://www.nuget.org/packages/LLamaSharp.Backend.Cuda11)
[![LLamaSharp Badge](https://img.shields.io/nuget/v/LLamaSharp.Backend.Cuda12?label=LLamaSharp.Backend.Cuda12)](https://www.nuget.org/packages/LLamaSharp.Backend.Cuda12)

The C#/.NET binding of [llama.cpp](https://github.com/ggerganov/llama.cpp). It provides APIs to run inference on LLaMa models and deploy them in native environments or on the Web. It works on both Windows and Linux and does NOT require you to compile llama.cpp yourself.
- Load LLaMa models and run inference
- Simple APIs for chat sessions
- Quantize the model in C#/.NET
- ASP.NET Core integration
- Native UI integration
## Installation
First, search for `LLamaSharp` in the NuGet package manager and install it:

```
PM> Install-Package LLamaSharp
```

Then search for and install one of the following backends:
```
LLamaSharp.Backend.Cpu
LLamaSharp.Backend.Cuda11
LLamaSharp.Backend.Cuda12
```
The latest versions of `LLamaSharp` and `LLamaSharp.Backend` may not always be the same. `LLamaSharp.Backend` tracks [llama.cpp](https://github.com/ggerganov/llama.cpp), because breaking changes there sometimes invalidate existing model weights. If you are not sure which backend version to install, install the latest one.

Note that version v0.2.1 has a package named `LLamaSharp.Cpu`. It will be dropped after v0.2.2.

We publish backends for CPU, CUDA 11 and CUDA 12 because they are the most popular ones. If none of them fits your environment, please compile [llama.cpp](https://github.com/ggerganov/llama.cpp) from source and put the resulting `libllama` library in your project's output path. When building from source, add `-DBUILD_SHARED_LIBS=ON` so that the shared library is generated.
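
Before loading a model, it can be useful to confirm that a self-built library really ended up in the output path. The following is a minimal sketch rather than part of LLamaSharp: `NativeLibraryCheck.FindLibLlama` is a hypothetical helper, and it assumes the file is named `libllama` with a platform-specific extension such as `.so` or `.dll`.

```cs
using System;
using System.IO;
using System.Linq;

// Look next to the running executable, which is the project's output path.
var found = NativeLibraryCheck.FindLibLlama(AppContext.BaseDirectory);
Console.WriteLine(found ?? "libllama was not found in the output path.");

static class NativeLibraryCheck
{
    // Hypothetical helper, not part of LLamaSharp.
    // Returns the path of the first file matching "libllama.*" in `directory`,
    // or null when the directory is missing or contains no such file.
    public static string? FindLibLlama(string directory)
    {
        if (!Directory.Exists(directory))
        {
            return null;
        }
        return Directory.EnumerateFiles(directory, "libllama.*").FirstOrDefault();
    }
}
```

If the helper prints the fallback message, the library was most likely built into a different folder or under a different file name.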
## FAQ
1. GPU out of memory: v0.2.3 puts all layers on the GPU by default. If the memory usage exceeds the capacity of your GPU, please set `n_gpu_layers` to a smaller number.
2. Unsupported model: `llama.cpp` is under rapid development and often introduces breaking changes. Please check the release date of the model and install a matching version of LLamaSharp.
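
For the first item, the layer count would be lowered when the model is created. The fragment below is only a sketch: it assumes that `n_gpu_layers` is accepted as a named argument of `LLamaParams`, in the same way as `n_ctx` and `repeat_penalty` in the usage example of this README, and the value `20` is purely illustrative.

```cs
// Assumption: `n_gpu_layers` is a named argument of `LLamaParams`, like `n_ctx`.
// The value 20 is only illustrative; lower it until the model fits in GPU memory.
var model = new LLamaModel(new LLamaParams(model: "<Your path>", n_ctx: 512, n_gpu_layers: 20));
```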
## Simple Benchmark
Currently this is only a simple benchmark to indicate that the performance of `LLamaSharp` is close to that of `llama.cpp`. The experiments were run on a computer with an Intel i7-12700 and a 3060 Ti GPU, using a 7B model. Note that the benchmark uses `LLamaModel` instead of `LLamaModelV1`.
#### Windows
- llama.cpp: 2.98 words / second
- LLamaSharp: 2.94 words / second
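
This README does not describe how the figures above were measured. As a rough sketch only, words per second for any streamed text output could be computed as below. `Throughput.Measure` is a hypothetical helper, not part of LLamaSharp, and the sample pieces are a stand-in for real model output.

```cs
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Text;

// Stand-in for streamed model output; pieces need not align with word boundaries.
var pieces = new[] { "Hello", " wor", "ld", " again" };
var (words, seconds) = Throughput.Measure(pieces);
Console.WriteLine($"{words} words in {seconds:F3} s");

static class Throughput
{
    // Hypothetical helper, not part of LLamaSharp.
    // Consumes the stream while timing it, then counts whitespace-separated
    // words in the concatenated text.
    public static (int Words, double Seconds) Measure(IEnumerable<string> outputs)
    {
        var text = new StringBuilder();
        var watch = Stopwatch.StartNew();
        foreach (var piece in outputs)
        {
            text.Append(piece);
        }
        watch.Stop();

        var separators = new[] { ' ', '\t', '\r', '\n' };
        var count = text.ToString().Split(separators, StringSplitOptions.RemoveEmptyEntries).Length;
        return (count, watch.Elapsed.TotalSeconds);
    }
}
```

Dividing the word count by the elapsed seconds gives words per second. With a real model the enumeration itself takes the time, so the stopwatch has to wrap the `foreach` as shown.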
## Usages
#### Model Inference and Chat Session

Currently, `LLamaSharp` provides two kinds of model, `LLamaModelV1` and `LLamaModel`. Both of them work, but `LLamaModel` is recommended because it is better aligned with the master branch of [llama.cpp](https://github.com/ggerganov/llama.cpp).

In addition, `ChatSession` makes it easier to build your own chat bot. The code below is a simple example. For all examples, please refer to [Examples](./LLama.Examples).
```cs
// Load the model, then wrap it in a chat session that starts from a prompt file
// and stops generating whenever the antiprompt "User:" appears.
var model = new LLamaModel(new LLamaParams(model: "<Your path>", n_ctx: 512, repeat_penalty: 1.0f));
var session = new ChatSession<LLamaModel>(model).WithPromptFile("<Your prompt file path>")
                .WithAntiprompt(new string[] { "User:" });
Console.Write("\nUser:");
while (true)
{
    // Show the user's input in green and the model's reply in white.
    Console.ForegroundColor = ConsoleColor.Green;
    var question = Console.ReadLine();
    Console.ForegroundColor = ConsoleColor.White;
    var outputs = session.Chat(question); // It's simple to use the chat API.
    foreach (var output in outputs)
    {
        Console.Write(output);
    }
}
```
#### Quantization
The following example shows how to quantize a model. With LLamaSharp you don't need to compile a C++ project or run scripts to quantize the model; you can do it directly in C#.
```cs
string srcFilename = "<Your source path>";
string dstFilename = "<Your destination path>";
string ftype = "q4_0";
// The branches below treat a `true` return value as success.
if (Quantizer.Quantize(srcFilename, dstFilename, ftype))
{
    Console.WriteLine("Quantization succeeded!");
}
else
{
    Console.WriteLine("Quantization failed!");
}
```
For more usage examples, please refer to [Examples](./LLama.Examples).
#### Web API
We provide an ASP.NET Core integration [here](./LLama.WebAPI). Since the API is not yet stable, please clone the repo to use it. In the future we'll publish it on NuGet.
## Demo
![demo-console](Assets/console_demo.gif)
## Roadmap
✅ LLaMa model inference

✅ Embeddings generation

✅ Chat session

✅ Quantization

✅ ASP.NET Core integration

🔳 UI integration

🔳 Keep up with llama.cpp and improve performance
## Assets
The model weights are too large to be included in the repository. However, some resources can be found below:

- [eachadea/ggml-vicuna-13b-1.1](https://huggingface.co/eachadea/ggml-vicuna-13b-1.1/tree/main)
- [TheBloke/wizardLM-7B-GGML](https://huggingface.co/TheBloke/wizardLM-7B-GGML)
- Magnet: [magnet:?xt=urn:btih:b8287ebfa04f879b048d4d4404108cf3e8014352&dn=LLaMA](magnet:?xt=urn:btih:b8287ebfa04f879b048d4d4404108cf3e8014352&dn=LLaMA)

The weights included in the magnet are exactly the weights from [Facebook LLaMa](https://github.com/facebookresearch/llama).

Prompts can be found below:

- [llama.cpp prompts](https://github.com/ggerganov/llama.cpp/tree/master/prompts)
- [ChatGPT_DAN](https://github.com/0xk1h0/ChatGPT_DAN)
- [awesome-chatgpt-prompts](https://github.com/f/awesome-chatgpt-prompts)
- [awesome-chatgpt-prompts-zh](https://github.com/PlexPt/awesome-chatgpt-prompts-zh) (Chinese)
## Contact us
Join our chat on [Discord](https://discord.gg/quBc2jrz).
## License
This project is licensed under the terms of the MIT license.