# LLamaSharp - .NET Binding for llama.cpp

![logo](Assets/LLamaSharpLogo.png)

[![Discord](https://img.shields.io/discord/1106946823282761851?label=Discord)](https://discord.gg/M2fS4PNj)
[![LLamaSharp Badge](https://img.shields.io/nuget/v/LLamaSharp?label=LLamaSharp)](https://www.nuget.org/packages/LLamaSharp)
[![LLamaSharp Badge](https://img.shields.io/nuget/v/LLamaSharp.Backend.Cpu?label=LLamaSharp.Backend.Cpu)](https://www.nuget.org/packages/LLamaSharp.Backend.Cpu)
[![LLamaSharp Badge](https://img.shields.io/nuget/v/LLamaSharp.Backend.Cuda11?label=LLamaSharp.Backend.Cuda11)](https://www.nuget.org/packages/LLamaSharp.Backend.Cuda11)
[![LLamaSharp Badge](https://img.shields.io/nuget/v/LLamaSharp.Backend.Cuda12?label=LLamaSharp.Backend.Cuda12)](https://www.nuget.org/packages/LLamaSharp.Backend.Cuda12)

The C#/.NET binding of [llama.cpp](https://github.com/ggerganov/llama.cpp). It provides APIs to run inference with the LLaMa models and to deploy them in native environments or on the Web. It works on both Windows and Linux and does NOT require compiling llama.cpp yourself.

- Load and run inference on LLaMa models
- Simple APIs for chat sessions
- Quantize models in C#/.NET
- ASP.NET Core integration
- Native UI integration

## Installation

First, search for `LLamaSharp` in the NuGet package manager and install it.

```
PM> Install-Package LLamaSharp
```

Then, search for and install one of the following backends:

```
LLamaSharp.Backend.Cpu
LLamaSharp.Backend.Cuda11
LLamaSharp.Backend.Cuda12
```
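
If you prefer the .NET CLI, the equivalent `dotnet` commands work as well, for example:

```
dotnet add package LLamaSharp
dotnet add package LLamaSharp.Backend.Cpu
```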

The latest versions of `LLamaSharp` and `LLamaSharp.Backend` may not always match. `LLamaSharp.Backend` follows [llama.cpp](https://github.com/ggerganov/llama.cpp) closely because its breaking changes sometimes invalidate existing model weights. If you are not sure which backend version to install, just install the latest one.

Note that version v0.2.1 has a package named `LLamaSharp.Cpu`; it will be dropped after v0.2.2.
We publish backends for CPU, CUDA 11 and CUDA 12 because they are the most popular ones. If none of them matches your environment, please compile [llama.cpp](https://github.com/ggerganov/llama.cpp) from source and put the `libllama` library under your project's output path. When building from source, please add `-DBUILD_SHARED_LIBS=ON` to enable shared library generation.
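
As a rough sketch (the exact steps may vary with your platform and toolchain), a shared-library build with CMake looks like this:

```
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build -DBUILD_SHARED_LIBS=ON
cmake --build build --config Release
```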

## FAQ

1. GPU out of memory: since v0.2.3, all layers are put into the GPU by default. If the memory use exceeds the capacity of your GPU, please set `n_gpu_layers` to a smaller number (see the sketch after this list).
2. Unsupported model: `llama.cpp` is under rapid development and often has breaking changes. Please check the release date of the model and find a suitable version of LLamaSharp to install.
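
A minimal sketch, assuming `n_gpu_layers` is exposed as a parameter of `LLamaParams` (mirroring the llama.cpp option of the same name):

```cs
// Offload only 20 layers to the GPU instead of all of them;
// lower this number further if GPU memory is still exceeded.
var model = new LLamaModel(new LLamaParams(model: "<Your path>", n_ctx: 512, n_gpu_layers: 20));
```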

## Simple Benchmark

This is currently only a simple benchmark, intended to show that the performance of `LLamaSharp` is close to that of `llama.cpp`. Experiments were run on a computer with an Intel i7-12700 and an RTX 3060 Ti, using a 7B model. Note that the benchmark uses `LLamaModel` instead of `LLamaModelV1`.

#### Windows

- llama.cpp: 2.98 words / second
- LLamaSharp: 2.94 words / second

## Usages

#### Model Inference and Chat Session

Currently, `LLamaSharp` provides two kinds of model, `LLamaModelV1` and `LLamaModel`. Both of them work, but `LLamaModel` is recommended because it aligns better with the master branch of [llama.cpp](https://github.com/ggerganov/llama.cpp). Besides, `ChatSession` makes it easier to wrap your own chat bot. The code below is a simple example; for all examples, please refer to [Examples](./LLama.Examples).
```cs
// Load the model and wrap it in a chat session, with a prompt file and an antiprompt.
var model = new LLamaModel(new LLamaParams(model: "<Your path>", n_ctx: 512, repeat_penalty: 1.0f));
var session = new ChatSession<LLamaModel>(model).WithPromptFile("<Your prompt file path>")
    .WithAntiprompt(new string[] { "User:" });

Console.Write("\nUser:");
while (true)
{
    Console.ForegroundColor = ConsoleColor.Green;
    var question = Console.ReadLine();
    Console.ForegroundColor = ConsoleColor.White;

    var outputs = session.Chat(question); // It's simple to use the chat API.
    foreach (var output in outputs)
    {
        Console.Write(output);
    }
}
```

#### Quantization

The following example shows how to quantize a model. With LLamaSharp you don't need to compile a C++ project and run scripts to quantize the model; instead, just call the API from C#.
```cs
string srcFilename = "<Your source path>";
string dstFilename = "<Your destination path>";
string ftype = "q4_0";
if(Quantizer.Quantize(srcFileName, dstFilename, ftype))
{
Console.WriteLine("Quantization succeed!");
}
else
{
Console.WriteLine("Quantization failed!");
}
```

For more usages, please refer to [Examples](./LLama.Examples).

#### Web API

We provide an ASP.NET Core integration [here](./LLama.WebAPI). Since the API is not yet stable, please clone the repo and use it directly. In the future we'll publish it on NuGet.
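
As a rough illustration only (the endpoint below is a hypothetical sketch, not the actual code in [LLama.WebAPI](./LLama.WebAPI)), a chat session could be wrapped in a minimal ASP.NET Core endpoint like this:

```cs
// Hypothetical sketch: expose a shared ChatSession through a minimal API endpoint.
var builder = WebApplication.CreateBuilder(args);
var app = builder.Build();

var model = new LLamaModel(new LLamaParams(model: "<Your path>", n_ctx: 512));
var session = new ChatSession<LLamaModel>(model);

// POST /chat?prompt=... ; the response concatenates the streamed outputs.
app.MapPost("/chat", (string prompt) => string.Concat(session.Chat(prompt)));

app.Run();
```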

## Demo

![demo-console](Assets/console_demo.gif)

## Roadmap

✅ LLaMa model inference.

✅ Embeddings generation.

✅ Chat session.

✅ Quantization.

✅ ASP.NET Core integration.

🔳 UI integration.

🔳 Follow up llama.cpp and improve performance.

## Assets

The model weights are too large to be included in the repository. However, some resources can be found below:

- [eachadea/ggml-vicuna-13b-1.1](https://huggingface.co/eachadea/ggml-vicuna-13b-1.1/tree/main)
- [TheBloke/wizardLM-7B-GGML](https://huggingface.co/TheBloke/wizardLM-7B-GGML)
- Magnet: [magnet:?xt=urn:btih:b8287ebfa04f879b048d4d4404108cf3e8014352&dn=LLaMA](magnet:?xt=urn:btih:b8287ebfa04f879b048d4d4404108cf3e8014352&dn=LLaMA)

The weights included in the magnet are exactly the weights from [Facebook LLaMa](https://github.com/facebookresearch/llama).

The prompts can be found below:

- [llama.cpp prompts](https://github.com/ggerganov/llama.cpp/tree/master/prompts)
- [ChatGPT_DAN](https://github.com/0xk1h0/ChatGPT_DAN)
- [awesome-chatgpt-prompts](https://github.com/f/awesome-chatgpt-prompts)
- [awesome-chatgpt-prompts-zh](https://github.com/PlexPt/awesome-chatgpt-prompts-zh) (Chinese)

## Contact us

Join our chat on [Discord](https://discord.gg/quBc2jrz).

## License

This project is licensed under the terms of the MIT license.