Commit Graph

240 Commits

Author SHA1 Message Date
Martin Evans 1970023ef4
Merge pull request #292 from martindevans/dotnet8.0
dotnet8.0
2023-11-17 01:30:00 +00:00
Martin Evans 89fef05362 This commit (5fe721bdbe) accidentally removed a load of stuff that it shouldn't have. Fixed that.
Originally from these PRs:
 - https://github.com/SciSharp/LLamaSharp/pull/263
 - https://github.com/SciSharp/LLamaSharp/pull/259
2023-11-15 01:36:33 +00:00
Martin Evans e9f5dbba89 Processing AVX512 branch on all dotnet versions 2023-11-15 01:18:22 +00:00
Martin Evans e850115b5f Added dotnet8.0 as a build target 2023-11-15 01:09:37 +00:00
Martin Evans b44e780b0f
Merge pull request #281 from martindevans/NativeLibraryConfig_improvements
CPU Feature Detection 2
2023-11-13 23:26:31 +00:00
Martin Evans e3468d04f0
Merge pull request #277 from martindevans/feature/min_p
MinP Sampler
2023-11-13 02:15:52 +00:00
Martin Evans a9d1f6cb47 - Renamed `NativeLibraryConfig.Default` to `NativeLibraryConfig.Instance`. It's not default any more as soon as you call `WithX`!
- Using `Lazy<T>` to initialize it automatically.
 - Added `AVX512` support for all dotnet versions (but not autodetected).
 - Added AVX version auto-detection.
2023-11-13 02:05:07 +00:00
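A minimal sketch of the `Lazy<T>` singleton pattern described in the commit above; the member names below are illustrative, not LLamaSharp's actual API surface.

```csharp
using System;

public sealed class NativeLibraryConfig
{
    private static readonly Lazy<NativeLibraryConfig> _instance =
        new Lazy<NativeLibraryConfig>(() => new NativeLibraryConfig());

    // Renamed from `Default`: once a `WithX` method mutates it,
    // it is no longer a default configuration.
    public static NativeLibraryConfig Instance => _instance.Value;

    private bool _avx512;

    private NativeLibraryConfig() { }

    // Hypothetical example of a `WithX` configuration method.
    public NativeLibraryConfig WithAvx512(bool enable = true)
    {
        _avx512 = enable;
        return this;
    }
}
```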
Rinne da6718c387
docs: adjust some descriptions. 2023-11-12 15:26:45 +08:00
Yaohui Liu d7675f7936
Merge branch 'master' of github.com:AsakusaRinne/LLamaSharp into cuda_detection 2023-11-12 12:10:31 +08:00
Martin Evans d743516070 - Added support for the MinP sampler
- Cleaned up comments in implementations of `IInferenceParams`
 - Removed default values for all parameters in `LLamaContext.Sample` - they're never used and probably _shouldn't_ ever be used
2023-11-12 00:05:18 +00:00
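A sketch of the min-p filtering rule referenced in the commit above (not LLamaSharp's implementation): keep only candidates whose probability is at least `minP` times the probability of the single most likely token.

```csharp
using System.Collections.Generic;
using System.Linq;

static class MinPExample
{
    public static List<(int Token, float Prob)> Apply(
        IReadOnlyList<(int Token, float Prob)> candidates, float minP)
    {
        // Threshold scales with the top candidate's probability.
        float threshold = candidates.Max(c => c.Prob) * minP;
        return candidates.Where(c => c.Prob >= threshold).ToList();
    }
}
```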
Yaohui Liu cb5fb210b1
feat: optimize apis for cuda feature detection. 2023-11-12 04:30:08 +08:00
SignalRT 97006a214f Merge remote-tracking branch 'upstream/master' into RuntimeDetection 2023-11-11 20:31:53 +01:00
Yaohui Liu bbbfbd20b5
fix: cannot load library under some conditions. 2023-11-12 01:55:09 +08:00
Martin Evans 31244ae691
Merge branch 'master' into YaRN_scaling_parameters 2023-11-11 16:27:42 +00:00
SignalRT 7691f83516 Test build and nuget packages 2023-11-11 13:12:07 +01:00
Yaohui Liu d03e1dbe30
feat: support cuda feature detection. 2023-11-11 19:44:58 +08:00
SignalRT 5fe721bdbe Revert "Merge branch 'pr/268' into RuntimeDetection"
This reverts commit 091b8d58b3, reversing
changes made to 9b2ca9cf8e.
2023-11-09 22:13:18 +01:00
SignalRT 200011e186 Revert "Merge feat: add detection template for cuda and avx. #268"
This reverts commit b4b3ea9d99.
2023-11-09 22:12:58 +01:00
SignalRT b4b3ea9d99 Merge feat: add detection template for cuda and avx. #268
Just merge cuda and avx detection and change layout.
2023-11-08 23:20:12 +01:00
Yaohui Liu b893c6f609
feat: add detection template for cuda and avx. 2023-11-09 03:27:21 +08:00
Martin Evans db1bc741b0 Modified `ContextSize` in parameters to be nullable. A null value means autodetect from the model. 2023-11-07 19:41:44 +00:00
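An illustration of the nullable convention above: null means "autodetect from the model". `trainedContextLength` is a hypothetical stand-in for the value read from model metadata.

```csharp
static uint ResolveContextSize(uint? requested, uint trainedContextLength)
    => requested ?? trainedContextLength; // null falls back to the model's value
```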
Martin Evans 04ee64a6be Exposed YaRN scaling parameters in IContextParams 2023-11-06 21:59:18 +00:00
SignalRT 46fb472d42 Align with llama.cpp b1488 2023-11-05 16:16:29 +01:00
Martin Evans a03fdc4818 Using a reference to an array instead of pointer arithmetic. This means it will benefit from bounds checking on the array. 2023-11-04 16:17:32 +00:00
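An illustration of the change above, purely an example and not the real code: indexing a managed array is bounds-checked, while raw pointer arithmetic is not.

```csharp
static unsafe float ReadUnchecked(float* logits, int i) => logits[i]; // no bounds check
static float ReadChecked(float[] logits, int i) => logits[i];         // throws if i is out of range
```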
Martin Evans 08c29d52c5 Slightly refactored `SafeLLamaGrammarHandle.Create` to solve CodeQL warning about pointer arithmetic. 2023-11-04 16:02:33 +00:00
Martin Evans b6d242193e Debugging slowdown by removing some things:
- Removed all `record struct` uses in native code
 - Removed usage of `readonly` in native structs

Minor fix:
 - Added sequential layout to `LLamaModelQuantizeParams`
2023-10-30 21:35:46 +00:00
Martin Evans 51c292ebd8 Added a safe method for `llama_get_logits_ith` 2023-10-28 23:15:45 +01:00
Martin Evans 7e3cde4c13 Moved helper methods into `LLamaBatchSafeHandle` 2023-10-28 22:09:09 +01:00
Martin Evans c7fdb9712c Added binaries, built from `6961c4bd0b` 2023-10-28 21:32:22 +01:00
Martin Evans e81b3023d5 Rewritten sampling API to be accessed through the `LLamaTokenDataArray` object 2023-10-28 21:32:21 +01:00
Martin Evans 3c5547b2b7 Reduced some uses of `NativeApi` in `BatchedDecoding` by adding some helper methods 2023-10-28 21:32:21 +01:00
Martin Evans a024d2242e It works!
Had to update the binary to `b1426`
2023-10-28 21:32:21 +01:00
Martin Evans 8cd81251b4 initial setup 2023-10-28 21:32:21 +01:00
Martin Evans 321d0b58c4
Merge pull request #202 from martindevans/multi_gpu
Multi GPU
2023-10-26 14:40:49 +01:00
Martin Evans a03fe003de Fixed decoding of text "accumulating" over time (never properly clearing buffer) 2023-10-23 16:42:38 +01:00
Martin Evans 51d4411a58 Added two new classes for detokenization tasks:
- `AntipromptProcessor` accepts chunks of text and returns a value indicating if any antiprompt has been detected.
 - `StreamingTokenDecoder` decodes tokens into text, maintaining some internal state to handle single characters which are encoded as multiple tokens.

Added tests for these classes and updated StatelessExecutor to use them.

Removed most DeTokenize methods, marked the rest as obsolete (should always use a `StreamingTokenDecoder`).
2023-10-23 00:33:50 +01:00
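A minimal sketch of the antiprompt idea from the commit above, not the actual `AntipromptProcessor` code: buffer streamed text and report when any antiprompt string appears, even when it spans two chunks.

```csharp
using System;
using System.Linq;

public sealed class AntipromptScanner
{
    private readonly string[] _antiprompts;
    private string _tail = "";

    public AntipromptScanner(params string[] antiprompts) => _antiprompts = antiprompts;

    public bool Add(string chunk)
    {
        _tail += chunk;
        bool found = _antiprompts.Any(a => _tail.Contains(a));

        // Keep just enough text to detect an antiprompt spanning into the next chunk.
        int keep = Math.Max(0, _antiprompts.Max(a => a.Length) - 1);
        if (_tail.Length > keep)
            _tail = _tail.Substring(_tail.Length - keep);

        return found;
    }
}
```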
Martin Evans efdf3d630c - Removed all `TokenToString` methods (it's never correct to use them, because sometimes one single character may be represented by multiple tokens).
- Built a new (hacky) `Detokenize` method which handles this
2023-10-22 21:43:36 +01:00
Martin Evans 1d0620e634 Created a test that "roundtrips" strings through tokenization. This reveals some flaws with certain characters 2023-10-22 15:28:36 +01:00
Martin Evans 04acbf8c42 Improved doc comment on `tensor_split` 2023-10-20 14:13:46 +01:00
Martin Evans 15db194c17 Added multi GPU support 2023-10-20 13:43:46 +01:00
Martin Evans e89ca5cc17 Fixed a few minor warnings 2023-10-19 00:43:50 +01:00
Martin Evans 9daf586ba8 Assorted cleanup leftover after the huge change in the last PR (comments, syntax style, etc) 2023-10-19 00:26:30 +01:00
Martin Evans 1f8c94e386 Added in the `special` parameter to the tokenizer (introduced in https://github.com/ggerganov/llama.cpp/pull/3538) 2023-10-17 23:55:46 +01:00
Martin Evans 2a38808bca - Added threads to context params, replaced all thread args with `uint?`
- Replaced all binaries
2023-10-12 18:49:41 +01:00
Martin Evans 9a0a0ae9fe Removed cloning support 2023-09-30 15:48:26 +01:00
Martin Evans 0d40338692 Fixed out-of-context handling in stateless executor 2023-09-29 23:53:07 +01:00
Martin Evans b306ac23dd Added `Decode` method to `SafeLLamaContextHandle` 2023-09-29 22:24:44 +01:00
Martin Evans 9e958e896b safe handle for batch 2023-09-29 22:18:23 +01:00
Martin Evans ce1fc51163 Added some more native methods 2023-09-29 16:05:19 +01:00
Martin Evans bca55eace0 Initial changes to match the llama.cpp changes 2023-09-29 01:18:21 +01:00
Haiping 10678a83d6
Merge pull request #65 from martindevans/alternative_dependency_loading
CPU Feature Detection
2023-09-17 10:21:37 -05:00
Martin Evans daf09eae64 Skipping tokenization of empty strings (saves allocating an empty array every time) 2023-09-12 01:03:27 +01:00
Martin Evans bba801f4b7 Added a property to get the KV cache size from a context 2023-09-11 00:10:08 +01:00
sa_ddam213 09d8f434f2
Extract LLamaLogLevel, Remove Logger class 2023-09-09 10:25:05 +12:00
Martin Evans d3b8ee988c
Beam Search (#155)
* Added the low level bindings to beam search.
2023-09-07 19:26:51 +01:00
Martin Evans 614ba40948 - Added a `TokensEndsWithAnyString` extension to `IReadOnlyList<int>` which efficiently checks if a set of tokens ends with one of a set of strings.
- Minimal number of characters converted
   - Allocation free
 - Added `TokensToSpan` to `SafeLlamaModelHandle` which converts as many tokens as possible into a character span
   - Allocation free
2023-09-06 19:44:19 +01:00
Martin Evans 6a842014ac Removed duplicate `llama_sample_classifier_free_guidance` method 2023-09-04 00:48:27 +01:00
Martin Evans 8f58a40fb9 Added Linux dependency loading 2023-09-02 14:21:06 +01:00
Martin Evans dd4957471f Changed paths to match what the GitHub build action produces 2023-09-02 14:10:18 +01:00
Martin Evans 756a1ad0ba Added a new way to load dependencies, performing CPU feature detection 2023-09-02 14:03:37 +01:00
Rinne 4e83e48ad1
Merge pull request #122 from martindevans/gguf
Add GGUF support
2023-09-02 11:54:50 +08:00
Martin Evans bcf06e2652 Added some comments on various native methods 2023-09-02 02:22:11 +01:00
Martin Evans a70c7170dd - Created a higher level `Grammar` class which is immutable and contains a list of grammar rules. This is the main "entry point" to the grammar system.
- Made all the mechanics of grammar parsing (GBNFGrammarParser, ParseState) internal. Just call `Grammar.Parse("whatever")`.
 - Added a `GrammarRule` class which validates elements on construction (this allows constructing grammar without parsing GBNF).
   - It should be impossible for a `GrammarRule` to represent an invalid rule.
2023-08-31 00:02:50 +01:00
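A sketch of the "validate on construction" idea from the commit above: a rule type that can never represent an invalid rule. The element layout and END-terminator check mirror llama.cpp's grammar elements, but the exact names here are illustrative.

```csharp
using System;
using System.Collections.Generic;

public enum GrammarElementType { End, Alt, RuleRef, Char }

public struct GrammarElement
{
    public GrammarElementType Type;
    public uint Value;
}

public sealed class GrammarRule
{
    public IReadOnlyList<GrammarElement> Elements { get; }

    public GrammarRule(IReadOnlyList<GrammarElement> elements)
    {
        // Reject invalid rules up front, so an invalid GrammarRule cannot exist.
        if (elements == null || elements.Count == 0 ||
            elements[elements.Count - 1].Type != GrammarElementType.End)
            throw new ArgumentException("A rule must be terminated by an END element.");
        Elements = elements;
    }
}
```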
Mihai 0bd495276b Add initial tests + fix bugs. Still WIP since the test is failing. 2023-08-30 14:10:56 +03:00
Martin Evans 2022b82947 Added binaries generated by this action: https://github.com/SciSharp/LLamaSharp/actions/runs/6002797872/job/16279896150
Based on this version: 6b73ef1201
2023-08-28 19:48:31 +01:00
Martin Evans 31287b5e6e Rewritten TokenToSpan/TokenToString to better fit the new way it's done in llama.cpp with a few different options:
- Just convert it to a `string`, nice and simple
 - Write the bytes to a `Span<byte>`, no allocations
 - Write the chars to a `StringBuilder`, potentially no allocations
2023-08-27 00:15:56 +01:00
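Hypothetical signatures for the three options described above; the real methods live on the model/context handles, so treat these shapes as assumptions.

```csharp
using System;
using System.Text;

interface ITokenText
{
    string TokenToString(int token);                      // simple, allocates a string
    int TokenToSpan(int token, Span<byte> dest);          // writes raw bytes, no allocation
    void TokenToString(int token, StringBuilder output);  // appends chars, usually no allocation
}
```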
Martin Evans 0c98ae1955 Passing ctx to `llama_token_nl(_ctx)` 2023-08-27 00:15:55 +01:00
Martin Evans 6ffa28f964 Removed `LLAMA_MAX_DEVICES` (not used) 2023-08-27 00:14:40 +01:00
Martin Evans 2056078aef Initial changes required for GGUF support 2023-08-27 00:14:40 +01:00
Martin Evans cf4754db44 Removed unnecessary parameters from some low level sampler methods 2023-08-26 21:38:24 +01:00
Martin Evans f70525fec2 Two small improvements to the native sampling API:
- Modified `llama_sample_token_mirostat` and `llama_sample_token_mirostat_v2` to take `ref float` instead of as a `float*`. Less pointers is always good.
 - Modified `llama_sample_repetition_penalty` and `llama_sample_frequency_and_presence_penalties` to take pointers instead of arrays. This allows the use of non-allocating types (e.g. `Span`) instead of arrays
 - Modified higher level API to accept `Memory<int>` instead of `int[]`, which can be used to reduce allocations at call sites
2023-08-26 01:25:48 +01:00
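A sketch of the signature change above: `ref float` lets callers pass a local variable directly, where a `float*` forces unsafe code or pinning. The native signature is abbreviated here and should be treated as an assumption.

```csharp
using System;
using System.Runtime.InteropServices;

static class NativeSketch
{
    // `ref float mu` replaces `float* mu`, so callers can write:
    //   float mu = 0; NativeSketch.llama_sample_token_mirostat(..., ref mu);
    [DllImport("llama")]
    public static extern int llama_sample_token_mirostat(
        IntPtr ctx, IntPtr candidates, float tau, float eta, int m, ref float mu);
}
```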
Martin Evans a911b77dec Various minor changes, resolving about 100 ReSharper code quality warnings 2023-08-24 23:15:53 +01:00
Martin Evans ebacdb666d - Moved the lower level state get/set methods onto SafeLLamaContextHandle
- Used those methods to add a `Clone` method to SafeLLamaContextHandle
 - Simplified `LLamaContext` by using the new methods
 - Sealed `LLamaContext` and `LLamaEmbedder`
2023-08-24 17:03:27 +01:00
Martin Evans 829f32b27d - Added `Obsolete` attributes to the entire `OldVersion` namespace, so they can be removed in the future
- Minor changes to cleanup some of the compiler warnings
2023-08-24 00:59:32 +01:00
zombieguy 45b01d5a78 Improved type conversion
Type conversion is now done in the property rather than the utils class and uses the System.Convert class to ensure consistency.
2023-08-23 19:36:14 +01:00
Martin Evans 2830e5755c - Applied a lot of minor R# code quality suggestions. Lots of unnecessary imports removed.
- Deleted `NativeInfo` (internal class, not used anywhere)
2023-08-22 23:20:13 +01:00
Martin Evans 4b7d718551 Added native symbol for CFG 2023-08-22 17:11:49 +01:00
Martin Evans 759ae26f36
Merge branch 'master' into grammar_basics 2023-08-22 14:06:57 +01:00
Martin Evans a9e6f21ab8 - Creating and destroying contexts in the stateless executor, saving memory. It now uses zero memory when not inferring!
- Passing encoding in the `IModelParams`, which reduces how often encoding needs to be passed around
2023-08-22 01:30:13 +01:00
Martin Evans ae8ef17a4a - Added various convenience overloads to `LLamaContext.Eval`
- Converted `SafeLLamaContextHandle` to take a `ReadOnlySpan` for Eval, narrower type better represents what's really needed
2023-08-22 01:28:28 +01:00
Martin Evans 64416ca23c - Created a slightly nicer way to create grammar (from `IReadOnlyList<IReadOnlyList<LLamaGrammarElement>>`)
- Integrated grammar into sampling
 - Added a test for the grammar sampling
2023-08-17 19:29:15 +01:00
Martin Evans 0294bb1303 Some of the basics of the grammar API 2023-08-17 19:28:17 +01:00
Rinne 62331852bc
Merge pull request #90 from martindevans/proposal_multi_context
Multi Context
2023-08-17 21:59:05 +08:00
zombieguy 10f88ebd0e
Potential fix for .Net Framework issues (#103)
* Added a bool to sbyte Utils convertor

As an attempt to avoid using any MarshalAs attribute for .Net Framework support, this Utils method takes in a bool value and returns an sbyte: 1 for true, 0 for false.

* Changed all bool "MarshalAs" types to sbytes

Changed all previous BOOL types with "MarshalAs" attributes to SBYTEs and changed all their setters to use the Utils.BoolToSignedByte() convertor method.

* Fixed Utils bool convertor & added sbyte to bool

Improved the Utils bool convertor by simply casting the sbyte value, getting rid of the unneeded sbyte array, and added an sbyte-to-bool convertor for the conversion back to a C# bool, assuming any value above 0 is true and no bools are packed in the single-byte integer.

* bool to & from sbyte conversions via properties

All 1-byte bools are now handled where they "sit", via public properties which perform the conversions, so all external data can communicate as it did before.
2023-08-16 00:09:52 +01:00
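A sketch of the property-based approach this PR lands on: the struct stores the native 1-byte value, and C# code sees a bool via a property. The struct and field names here are illustrative.

```csharp
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Sequential)]
public struct ExampleNativeParams
{
    private sbyte _useMemoryMap; // matches the native 1-byte bool exactly

    public bool UseMemoryMap
    {
        get => _useMemoryMap != 0;                       // any non-zero value is true
        set => _useMemoryMap = value ? (sbyte)1 : (sbyte)0;
    }
}
```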
Martin Evans 6c84accce8 Added `llama_sample_classifier_free_guidance` method from native API 2023-08-13 23:14:53 +01:00
Martin Evans 479ff57853 Renamed `EmbeddingCount` to `EmbeddingSize` 2023-08-13 01:10:09 +01:00
Martin Evans d0a7a8fcd6 - Cleaned up disposal in LLamaContext
- sealed some classes not intended to be extended
2023-08-13 01:10:08 +01:00
Martin Evans f3511e390f WIP demonstrating changes to support multi-context. You can see this in use in `TalkToYourself`, along with notes on what still needs improving.
The biggest single change is renaming `LLamaModel` to `LLamaContext`
2023-08-13 01:10:08 +01:00
Martin Evans d7f971fc22 Improved `NativeApi` file a bit:
- Added some more comments
 - Modified `llama_tokenize` to not allocate
 - Modified `llama_tokenize_native` to take a pointer instead of an array, allowing use with no allocations
 - Removed GgmlInitParams (not used)
2023-08-12 00:45:23 +01:00
Martin Evans 841cf88e3b
Merge pull request #96 from martindevans/minor_quantizer_improvements
Minor quantizer improvements
2023-08-10 18:01:40 +01:00
Martin Evans ce325b49c7 Rewritten comments 2023-08-10 17:00:54 +01:00
sa_ddam213 726987b761 Add native logging output 2023-08-10 23:01:50 +12:00
Martin Evans acd91341e6 Added lots of comments to all the LLamaFtype variants 2023-08-10 02:14:21 +01:00
Martin Evans 2b2d3af26b Moved `Eval` out of `Utils` and into `SafeLLamaContextHandle` 2023-08-07 15:15:34 +01:00
Martin Evans 0e5e00e300 Moved `TokenToString` from Utils into `SafeLLamaContextHandle` (thin wrappers around the same method in `SafeLlamaModelHandle`) 2023-08-07 15:15:34 +01:00
Martin Evans 2d811b2603 - Moved `GetLogits` into `SafeLLamaContextHandle`
- Added disposal check into `SafeLLamaContextHandle`
2023-08-07 15:13:24 +01:00
Martin Evans cd3cf2b77d - Moved tokenization from `Utils.Tokenize` into `SafeLLamaContextHandle.Tokenize`, one less thing in `Utils`.
- Also refactored it to return an `int[]` instead of an `IEnumerable<int>`, solving the "multiple enumeration" problems at the source!
2023-08-07 15:13:24 +01:00
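An illustration of why returning `int[]` helps, per the commit above: a lazily-built `IEnumerable<int>` re-runs the work on every enumeration. `Tokenize` here is a stand-in for the real method.

```csharp
using System.Collections.Generic;
using System.Linq;

static class EnumerationExample
{
    static IEnumerable<int> Tokenize(string text)
    {
        foreach (var c in text)
            yield return c; // imagine an expensive native call per item
    }

    static void Demo()
    {
        var lazy = Tokenize("hello");
        var n = lazy.Count();     // enumerates (tokenizes) once
        var first = lazy.First(); // enumerates (tokenizes) again!

        int[] eager = Tokenize("hello").ToArray(); // pay the cost exactly once
    }
}
```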
Rinne bfe9cc8961
Merge pull request #78 from SciSharp/rinne-dev
feat: update the llama backends.
2023-08-06 20:59:24 +08:00
Yaohui Liu bb46a990d0
fix: add bug info for native api. 2023-08-06 14:46:23 +08:00
sa_ddam213 372894e1d4 Expose some native classes 2023-08-06 14:44:46 +12:00
SignalRT 348f2c7d72 Update llama.cpp binaries to 5f631c2 and align the context to that version
It solves the problem with netstandard2 (is netstandard2 really a thing right now?).
Changed the context to solve problems.

5f631c26794b6371fcf2660e8d0c53494a5575f7
2023-08-05 12:45:34 +02:00
Rinne 8d37abd787
Merge pull request #68 from martindevans/sampling_improvements
Fixed Memory pinning in Sampling API
2023-08-05 08:55:12 +08:00
Martin Evans add3d5528b Removed `MarshalAs` on array 2023-08-03 14:16:41 +01:00
Martin Evans 2245b84906
Update LLamaContextParams.cs 2023-08-02 23:13:07 +01:00
sa_ddam213 3e252c81f6 LLamaContextParams epsilon and tensor split changes 2023-07-28 19:15:19 +12:00
Martin Evans ec49bdd6eb - Most importantly: Fixed issue in `SamplingApi`, `Memory` was pinned, but never unpinned!
- Moved repeated code to convert `LLamaTokenDataArray` into a `LLamaTokenDataArrayNative` into a helper method.
   - Modified all call sites to dispose the `MemoryHandle`
 - Saved one copy of the `List<LLamaTokenData>` into a `LLamaTokenData[]` in `LlamaModel`
2023-07-27 20:45:59 +01:00
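A sketch of the fix described above: `Memory<T>.Pin()` returns a `MemoryHandle`, and the memory stays pinned until that handle is disposed, so wrapping it in `using` guarantees the unpin that was previously missing.

```csharp
using System;
using System.Buffers;

static class PinningExample
{
    public static unsafe void UseNative(Memory<float> data)
    {
        using (MemoryHandle handle = data.Pin()) // pins; Dispose() unpins
        {
            float* ptr = (float*)handle.Pointer;
            // ... pass ptr to the native sampling call here ...
        } // handle disposed => memory unpinned
    }
}
```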
Martin Evans 6985d3ab60 Added comments on two properties 2023-07-27 18:58:29 +01:00
Martin Evans c974c8429e Removed leftover `using` 2023-07-25 20:30:10 +01:00
Martin Evans afb9d24f3a Added model `Tokenize` method 2023-07-25 20:29:35 +01:00
Martin Evans 369c915afe Added TokenToString conversion on model handle 2023-07-25 16:55:04 +01:00
Martin Evans b721072aa5 Exposed some extra model properties on safe handle 2023-07-25 16:41:17 +01:00
Martin Evans 44b1e93609 Moved LoRA loading into `SafeLlamaModelHandle` 2023-07-25 16:35:24 +01:00
Martin Evans c95b14d8b3 - Fixed null check
- Additional comments
2023-07-25 16:23:25 +01:00
Martin Evans f16aa58e12 Updated to use the new loading system in llama (llama_state). This new system has split model weights and contexts into two separate things, allowing one set of weights to be shared between many contexts.
This change _only_ implements the low level API and makes no effort to update the LlamaSharp higher level abstraction.

It is built upon llama `b3f138d`, necessary DLLs are **not** included in this commit.
2023-07-25 01:18:12 +01:00
Rinne c5e8b3eba2
Merge pull request #56 from martindevans/memory_mapped_save_loading_and_saving
Memory Mapped LoadState/SaveState
2023-07-24 22:49:00 +08:00
Rinne d17fa991cc
Merge pull request #53 from martindevans/xml_docs_fixes
XML docs fixes
2023-07-24 22:31:51 +08:00
Rinne 1b0523f630
Merge branch 'master' into master 2023-07-22 23:27:50 +08:00
Martin Evans 4d72420a04 Replaced `SaveState` and `LoadState` implementations. These new implementations map the file into memory and then pass the pointer directly into the native API. This improves things in two ways:
- A C# array cannot exceed 2,147,483,591 bytes. In my own use of LlamaSharp I encountered this limit.
 - This saves an extra copy of the entire state data into a C# `byte[]`, so it should be faster.

This does _not_ fix some other places where `GetStateData` is used. I'll look at those in a separate PR.
2023-07-21 18:54:31 +01:00
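A sketch of the memory-mapped approach described above: map the state file and hand the pointer straight to the native API, avoiding a `byte[]` (capped at 2,147,483,591 bytes) and an extra copy. The native call is shown as a comment because its exact binding is not reproduced here.

```csharp
using System;
using System.IO;
using System.IO.MemoryMappedFiles;

static class StateLoadExample
{
    public static unsafe void LoadState(string path)
    {
        using var file = MemoryMappedFile.CreateFromFile(path, FileMode.Open);
        using var view = file.CreateViewAccessor(0, 0, MemoryMappedFileAccess.Read);

        byte* ptr = null;
        view.SafeMemoryMappedViewHandle.AcquirePointer(ref ptr);
        try
        {
            // NativeApi.llama_set_state_data(ctx, ptr); // pointer passed directly
        }
        finally
        {
            view.SafeMemoryMappedViewHandle.ReleasePointer();
        }
    }
}
```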
Martin Evans 2e76b79af6 Various minor XML docs fixes 2023-07-20 16:07:53 +01:00
SignalRT 56a37a0d7d Update to latest llama.cpp
Adapt to the interface change in llama_backend_init
2023-07-15 11:42:19 +02:00
unknown dba866ffcf Update API method name 2023-07-13 22:39:26 -07:00
Yaohui Liu 1062fe1a7e
feat: upgrade the native libraries. 2023-06-21 15:21:27 +08:00
Yaohui Liu 9850417a12
feat: update quantize native params. 2023-06-20 23:32:58 +08:00
Yaohui Liu 3bf74ec9b9
feat: add chat session for refactored code. 2023-06-12 02:47:25 +08:00
Yaohui Liu 264fb9a706
refactor: LLamaModel and LLamaExecutor. 2023-06-10 18:37:58 +08:00
Yaohui Liu 3a62f087fe
fix: encoding error when using other languages. 2023-06-03 18:51:20 +08:00
Yaohui Liu 18c2ff2395
refactor: instruct mode and examples. 2023-05-21 20:36:49 +08:00
Yaohui Liu 55d5a8ae51
fix: quantization error with fp16. 2023-05-20 23:51:22 +08:00
Yaohui Liu 19979f664a
feat: support loading and saving state. 2023-05-20 14:01:20 +08:00
Yaohui Liu 00d91cf99e
refactor: some parts of code of LLamaModel. 2023-05-18 03:59:55 +08:00
Yaohui Liu 1fca06dc7f
fix: n_gpu_layers miss in llama context. 2023-05-17 04:22:54 +08:00
Yaohui Liu 4314f64b9c
feat: add check for backend package. 2023-05-17 03:40:45 +08:00
Yaohui Liu 6ffcb5306b
refactor: use official api of quantization instead. 2023-05-13 15:02:19 +08:00
Yaohui Liu 0958bbac2c
feat: add get-embedding api to LLamaModel. 2023-05-13 02:08:03 +08:00
Yaohui Liu 33067f990f
feat: run quantization in csharp. 2023-05-11 17:38:28 +08:00
Yaohui Liu 118d410d52
build: revise build information. 2023-05-11 13:57:57 +08:00
Yaohui Liu 856d6549de build: add linux support. 2023-05-11 04:20:56 +08:00
Yaohui Liu 02524ae4eb
build: add package informations. 2023-05-11 04:07:02 +08:00
Yaohui Liu d6a7997e46
feat: add gpt model. 2023-05-10 20:48:16 +08:00
Yaohui Liu 5a79edeb51
feat: add the framework and basic usages. 2023-05-10 02:13:41 +08:00