Commit Graph

240 Commits

Author SHA1 Message Date
SignalRT 348f2c7d72 Update llama.cpp binaries to 5f631c2 and align the context to that version
This solves the problem with netstandard2 (is netstandard2 really still a thing?)
Changed the context to solve problems.

5f631c26794b6371fcf2660e8d0c53494a5575f7
2023-08-05 12:45:34 +02:00
Rinne 8d37abd787
Merge pull request #68 from martindevans/sampling_improvements
Fixed Memory pinning in Sampling API
2023-08-05 08:55:12 +08:00
Martin Evans add3d5528b Removed `MarshalAs` on array 2023-08-03 14:16:41 +01:00
Martin Evans 2245b84906
Update LLamaContextParams.cs 2023-08-02 23:13:07 +01:00
sa_ddam213 3e252c81f6 LLamaContextParams epsilon and tensor split changes 2023-07-28 19:15:19 +12:00
Martin Evans ec49bdd6eb - Most importantly: fixed an issue in `SamplingApi` where `Memory` was pinned but never unpinned!
- Moved the repeated code that converts a `LLamaTokenDataArray` into a `LLamaTokenDataArrayNative` into a helper method.
  - Modified all call sites to dispose the `MemoryHandle`.
- Saved one copy of the `List<LLamaTokenData>` into a `LLamaTokenData[]` in `LlamaModel`.
2023-07-27 20:45:59 +01:00
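The pinning bug described above follows a general .NET pattern: `Memory<T>.Pin()` returns a `MemoryHandle` that keeps the buffer pinned until it is disposed. A minimal sketch of that pattern (standalone illustration, not the actual `SamplingApi` code):

```csharp
using System;
using System.Buffers;

class PinningExample
{
    static void Main()
    {
        Memory<int> memory = new int[] { 1, 2, 3 };

        // Pin() returns a MemoryHandle; until it is disposed, the GC
        // cannot move or reclaim the underlying array. The bug was
        // pinning without ever disposing; wrapping the handle in a
        // `using` guarantees every call site unpins it.
        using (MemoryHandle handle = memory.Pin())
        {
            // While pinned, handle.Pointer can be passed to native code.
        }
        // Handle disposed here: the memory is unpinned again.
        Console.WriteLine("unpinned");
    }
}
```

Forgetting `Dispose` on the handle is a quiet leak: the array stays pinned for the process lifetime, fragmenting the GC heap.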
Martin Evans 6985d3ab60 Added comments on two properties 2023-07-27 18:58:29 +01:00
Martin Evans c974c8429e Removed leftover `using` 2023-07-25 20:30:10 +01:00
Martin Evans afb9d24f3a Added model `Tokenize` method 2023-07-25 20:29:35 +01:00
Martin Evans 369c915afe Added TokenToString conversion on model handle 2023-07-25 16:55:04 +01:00
Martin Evans b721072aa5 Exposed some extra model properties on safe handle 2023-07-25 16:41:17 +01:00
Martin Evans 44b1e93609 Moved LoRA loading into `SafeLlamaModelHandle` 2023-07-25 16:35:24 +01:00
Martin Evans c95b14d8b3 - Fixed null check
- Additional comments
2023-07-25 16:23:25 +01:00
Martin Evans f16aa58e12 Updated to use the new loading system in llama (llama_state). This new system has split model weights and contexts into two separate things, allowing one set of weights to be shared between many contexts.
This change _only_ implements the low level API and makes no effort to update the LlamaSharp higher level abstraction.

It is built upon llama `b3f138d`, necessary DLLs are **not** included in this commit.
2023-07-25 01:18:12 +01:00
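The weights/context split described in this commit can be illustrated with a hypothetical sketch (the type names below are illustrative, not the real `SafeLlamaModelHandle` API): the expensive weights are loaded once, and any number of cheap contexts share them while keeping independent evaluation state.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins for model weights and a per-conversation context.
class Weights
{
    public string Path;
    public Weights(string path) { Path = path; }
}

class Context
{
    public Weights Shared;                       // shared, loaded once
    public List<int> Tokens = new List<int>();   // independent per context
    public Context(Weights weights) { Shared = weights; }
}

class Program
{
    static void Main()
    {
        // Load the (large) weights a single time...
        var weights = new Weights("model.bin");

        // ...then create several contexts over the same weights.
        var a = new Context(weights);
        var b = new Context(weights);
        a.Tokens.Add(1);
        b.Tokens.Add(2);

        // Both contexts reference the exact same weights object.
        Console.WriteLine(ReferenceEquals(a.Shared, b.Shared)); // prints True
    }
}
```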
Rinne c5e8b3eba2
Merge pull request #56 from martindevans/memory_mapped_save_loading_and_saving
Memory Mapped LoadState/SaveState
2023-07-24 22:49:00 +08:00
Rinne d17fa991cc
Merge pull request #53 from martindevans/xml_docs_fixes
XML docs fixes
2023-07-24 22:31:51 +08:00
Rinne 1b0523f630
Merge branch 'master' into master 2023-07-22 23:27:50 +08:00
Martin Evans 4d72420a04 Replaced `SaveState` and `LoadState` implementations. These new implementations map the file into memory and then pass the pointer directly into the native API. This improves things in two ways:
- A C# array cannot exceed 2,147,483,591 bytes. In my own use of LlamaSharp I encountered this limit.
- This saves an extra copy of the entire state data into a C# `byte[]`, so it should be faster.

This does _not_ fix some other places where `GetStateData` is used. I'll look at those in a separate PR.
2023-07-21 18:54:31 +01:00
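The memory-mapped approach described above can be sketched with the standard `MemoryMappedFile` API (a simplified standalone illustration, not the actual `LoadState` code, which passes the mapped pointer directly into the native API):

```csharp
using System;
using System.IO;
using System.IO.MemoryMappedFiles;

class MmapStateExample
{
    static void Main()
    {
        // A small temp file standing in for a saved llama state.
        string path = Path.GetTempFileName();
        File.WriteAllBytes(path, new byte[] { 42, 0, 0, 0 });

        long size = new FileInfo(path).Length;
        using (var mmf = MemoryMappedFile.CreateFromFile(path, FileMode.Open))
        using (var view = mmf.CreateViewAccessor(0, size, MemoryMappedFileAccess.Read))
        {
            // The OS pages the file in on demand: no managed byte[] is
            // allocated, so the 2,147,483,591-byte array limit never
            // applies, and no extra copy of the state is made.
            byte first = view.ReadByte(0);
            Console.WriteLine(first); // prints 42
        }
        File.Delete(path);
    }
}
```

For files larger than 2 GB this is the only practical option in C#, since a single managed array simply cannot hold the data.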
Martin Evans 2e76b79af6 Various minor XML docs fixes 2023-07-20 16:07:53 +01:00
SignalRT 56a37a0d7d Update to latest llama.cpp
Adapt to the interface change in llama_backend_init
2023-07-15 11:42:19 +02:00
unknown dba866ffcf Update API method name 2023-07-13 22:39:26 -07:00
Yaohui Liu 1062fe1a7e
feat: upgrade the native libraries. 2023-06-21 15:21:27 +08:00
Yaohui Liu 9850417a12
feat: update quantize native params. 2023-06-20 23:32:58 +08:00
Yaohui Liu 3bf74ec9b9
feat: add chat session for refactored code. 2023-06-12 02:47:25 +08:00
Yaohui Liu 264fb9a706
refactor: LLamaModel and LLamaExecutor. 2023-06-10 18:37:58 +08:00
Yaohui Liu 3a62f087fe
fix: encoding error when using other languages. 2023-06-03 18:51:20 +08:00
Yaohui Liu 18c2ff2395
refactor: instruct mode and examples. 2023-05-21 20:36:49 +08:00
Yaohui Liu 55d5a8ae51
fix: quantization error with fp16. 2023-05-20 23:51:22 +08:00
Yaohui Liu 19979f664a
feat: support loading and saving state. 2023-05-20 14:01:20 +08:00
Yaohui Liu 00d91cf99e
refactor: some parts of code of LLamaModel. 2023-05-18 03:59:55 +08:00
Yaohui Liu 1fca06dc7f
fix: n_gpu_layers miss in llama context. 2023-05-17 04:22:54 +08:00
Yaohui Liu 4314f64b9c
feat: add check for backend package. 2023-05-17 03:40:45 +08:00
Yaohui Liu 6ffcb5306b
refactor: use official api of quantization instead. 2023-05-13 15:02:19 +08:00
Yaohui Liu 0958bbac2c
feat: add get-embedding api to LLamaModel. 2023-05-13 02:08:03 +08:00
Yaohui Liu 33067f990f
feat: run quantization in csharp. 2023-05-11 17:38:28 +08:00
Yaohui Liu 118d410d52
build: revise build information. 2023-05-11 13:57:57 +08:00
Yaohui Liu 856d6549de build: add linux support. 2023-05-11 04:20:56 +08:00
Yaohui Liu 02524ae4eb
build: add package informations. 2023-05-11 04:07:02 +08:00
Yaohui Liu d6a7997e46
feat: add gpt model. 2023-05-10 20:48:16 +08:00
Yaohui Liu 5a79edeb51
feat: add the framework and basic usages. 2023-05-10 02:13:41 +08:00