LLamaSharp/README.md

122 lines
3.8 KiB
Markdown
Raw Normal View History

2023-05-12 19:44:16 +08:00
# LLamaSharp - .NET Binding for llama.cpp
2023-05-11 16:32:07 +08:00
![logo](Assets/LLamaSharpLogo.png)
2023-05-11 21:18:40 +08:00
The C#/.NET binding of [llama.cpp](https://github.com/ggerganov/llama.cpp). It provides APIs to inference the LLaMa Models and deploy it on native environment or Web. It works on
2023-05-12 19:44:16 +08:00
both Windows and Linux and does NOT require compiling llama.cpp yourself.
2023-05-11 16:32:07 +08:00
2023-05-11 17:44:42 +08:00
- Load and inference LLaMa models
- Simple APIs for chat session
- Quantize the model in C#/.NET
- ASP.NET core integration
- Native UI integration
2023-05-11 16:32:07 +08:00
## Installation
2023-05-11 20:30:36 +08:00
Just search `LLamaSharp` in nuget package manager and install it!
2023-05-11 16:32:07 +08:00
```
2023-05-11 20:30:36 +08:00
PM> Install-Package LLamaSharp
2023-05-11 16:32:07 +08:00
```
2023-05-11 21:18:40 +08:00
## Simple Benchmark
Currently it's only a simple benchmark to indicate that the performance of `LLamaSharp` is close to `llama.cpp`. Experiments run on a computer
with Intel i7-12700, 3060Ti with 7B model. Note that the benchmark uses `LLamaModel` instead of `LLamaModelV1`.
#### Windows
- llama.cpp: 2.98 words / second
- LLamaSharp: 2.94 words / second
2023-05-11 16:32:07 +08:00
## Usages
2023-05-12 12:04:44 +08:00
#### Model Inference and Chat Session
2023-05-11 16:32:07 +08:00
Currently, `LLamaSharp` provides two kinds of model, `LLamaModelV1` and `LLamaModel`. Both of them works but `LLamaModel` is more recommended
because it provides better alignment with the master branch of [llama.cpp](https://github.com/ggerganov/llama.cpp).
Besides, `ChatSession` makes it easier to wrap your own chat bot. The code below is a simple example. For all examples, please refer to
[Examples](./LLama.Examples).
```cs
var model = new LLamaModel(new LLamaParams(model: "<Your path>", n_ctx: 512, repeat_penalty: 1.0f));
var session = new ChatSession<LLamaModel>(model).WithPromptFile("<Your prompt file path>")
.WithAntiprompt(new string[] { "User:" );
Console.Write("\nUser:");
while (true)
{
Console.ForegroundColor = ConsoleColor.Green;
var question = Console.ReadLine();
Console.ForegroundColor = ConsoleColor.White;
2023-05-11 16:34:30 +08:00
var outputs = session.Chat(question); // It's simple to use the chat API.
2023-05-11 16:32:07 +08:00
foreach (var output in outputs)
{
Console.Write(output);
}
}
```
2023-05-12 12:04:44 +08:00
#### Quantization
2023-05-11 17:44:42 +08:00
The following example shows how to quantize the model. With LLamaSharp you needn't to compile c++ project and run scripts to quantize the model, instead, just run it in C#.
```cs
string srcFilename = "<Your source path>";
string dstFilename = "<Your destination path>";
string ftype = "q4_0";
if(Quantizer.Quantize(srcFileName, dstFilename, ftype))
{
Console.WriteLine("Quantization succeed!");
}
else
{
Console.WriteLine("Quantization failed!");
}
```
2023-05-12 12:04:44 +08:00
#### Web API
We provide the integration of ASP.NET core [here](./LLama.WebAPI). Since currently the API is not stable, please clone the repo and use it. In the future we'll publish it on NuGet.
2023-05-11 16:32:07 +08:00
## Demo
![demo-console](Assets/console_demo.gif)
## Roadmap
✅ LLaMa model inference.
✅ Embeddings generation.
✅ Chat session.
2023-05-11 17:44:42 +08:00
✅ Quantization
2023-05-11 16:32:07 +08:00
2023-05-12 12:04:44 +08:00
✅ ASP.NET core Integration
2023-05-11 16:32:07 +08:00
🔳 WPF UI Integration
2023-05-12 19:44:16 +08:00
🔳 Follow up llama.cpp and improve performance
2023-05-11 16:32:07 +08:00
## Assets
The model weights is too large to include in the project. However some resources could be found below:
- [eachadea/ggml-vicuna-13b-1.1](https://huggingface.co/eachadea/ggml-vicuna-13b-1.1/tree/main)
- [TheBloke/wizardLM-7B-GGML](https://huggingface.co/TheBloke/wizardLM-7B-GGML)
- Magnet: [magnet:?xt=urn:btih:b8287ebfa04f879b048d4d4404108cf3e8014352&dn=LLaMA](magnet:?xt=urn:btih:b8287ebfa04f879b048d4d4404108cf3e8014352&dn=LLaMA)
The weights included in the magnet is exactly the weights from [Facebook LLaMa](https://github.com/facebookresearch/llama).
The prompts could be found below:
- [llama.cpp prompts](https://github.com/ggerganov/llama.cpp/tree/master/prompts)
- [ChatGPT_DAN](https://github.com/0xk1h0/ChatGPT_DAN)
- [awesome-chatgpt-prompts-zh](https://github.com/PlexPt/awesome-chatgpt-prompts-zh)
## License
This project is licensed under the terms of the MIT license.