Martin Evans
1970023ef4
Merge pull request #292 from martindevans/dotnet8.0
...
dotnet8.0
2023-11-17 01:30:00 +00:00
Martin Evans
89fef05362
This commit (5fe721bdbe) accidentally removed a load of stuff that it shouldn't have. Fixed that.
...
Originally from these PRs:
- https://github.com/SciSharp/LLamaSharp/pull/263
- https://github.com/SciSharp/LLamaSharp/pull/259
2023-11-15 01:36:33 +00:00
Martin Evans
e9f5dbba89
Processing AVX512 branch on all dotnet versions
2023-11-15 01:18:22 +00:00
Martin Evans
e850115b5f
Added dotnet8.0 as a build target
2023-11-15 01:09:37 +00:00
Martin Evans
b44e780b0f
Merge pull request #281 from martindevans/NativeLibraryConfig_improvements
...
CPU Feature Detection 2
2023-11-13 23:26:31 +00:00
Martin Evans
e3468d04f0
Merge pull request #277 from martindevans/feature/min_p
...
MinP Sampler
2023-11-13 02:15:52 +00:00
Martin Evans
a9d1f6cb47
- Renamed `NativeLibraryConfig.Default` to `NativeLibraryConfig.Instance`. It's not the default any more as soon as you call `WithX`!
...
- using `Lazy<T>` to initialize it automatically.
- Added in `AVX512` support for all dotnet versions (but not autodetected).
- Added in AVX version auto detection.
2023-11-13 02:05:07 +00:00
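The renamed configuration API described in this commit might be used along these lines (a sketch based only on the commit message; everything except `NativeLibraryConfig.Instance` and the existence of a `WithX` method family is an assumption):

```csharp
// Hypothetical sketch: configure native library loading before any model is loaded.
// `WithCuda`/`WithAvx` stand in for the `WithX` methods mentioned in the commit.
using LLama.Native;

NativeLibraryConfig.Instance      // formerly NativeLibraryConfig.Default
    .WithCuda(false)              // e.g. force CPU-only
    .WithAvx(AvxLevel.Avx512);    // AVX512 is available on all dotnet versions, but not autodetected

// Per the commit, the config is then applied lazily (via Lazy<T>) the first
// time the native library is actually resolved.
```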
Rinne
da6718c387
docs: adjust some descriptions.
2023-11-12 15:26:45 +08:00
Yaohui Liu
d7675f7936
Merge branch 'master' of github.com:AsakusaRinne/LLamaSharp into cuda_detection
2023-11-12 12:10:31 +08:00
Martin Evans
d743516070
- Added support for the MinP sampler
...
- Cleaned up comments in implementations of `IInferenceParams`
- Removed default values for all parameters in `LLamaContext.Sample` - they're never used and probably _shouldn't_ ever be used
2023-11-12 00:05:18 +00:00
Yaohui Liu
cb5fb210b1
feat: optimize apis for cuda feature detection.
2023-11-12 04:30:08 +08:00
SignalRT
97006a214f
Merge remote-tracking branch 'upstream/master' into RuntimeDetection
2023-11-11 20:31:53 +01:00
Yaohui Liu
bbbfbd20b5
fix: cannot load library under some conditions.
2023-11-12 01:55:09 +08:00
Martin Evans
31244ae691
Merge branch 'master' into YaRN_scaling_parameters
2023-11-11 16:27:42 +00:00
SignalRT
7691f83516
Test build and nuget packages
2023-11-11 13:12:07 +01:00
Yaohui Liu
d03e1dbe30
feat: support cuda feature detection.
2023-11-11 19:44:58 +08:00
SignalRT
5fe721bdbe
Revert "Merge branch 'pr/268' into RuntimeDetection"
...
This reverts commit 091b8d58b3, reversing changes made to 9b2ca9cf8e.
2023-11-09 22:13:18 +01:00
SignalRT
200011e186
Revert "Merge feat: add detection template for cuda and avx. #268"
...
This reverts commit b4b3ea9d99.
2023-11-09 22:12:58 +01:00
SignalRT
b4b3ea9d99
Merge feat: add detection template for cuda and avx. #268
...
Just merge cuda and avx detection and change layout.
2023-11-08 23:20:12 +01:00
Yaohui Liu
b893c6f609
feat: add detection template for cuda and avx.
2023-11-09 03:27:21 +08:00
Martin Evans
db1bc741b0
Modified `ContextSize` in parameters to be nullable. A null value means autodetect from the model.
2023-11-07 19:41:44 +00:00
Martin Evans
04ee64a6be
Exposed YaRN scaling parameters in IContextParams
2023-11-06 21:59:18 +00:00
SignalRT
46fb472d42
Align with llama.cpp b1488
2023-11-05 16:16:29 +01:00
Martin Evans
a03fdc4818
Using a reference to an array instead of pointer arithmetic. This means it will benefit from bounds checking on the array.
2023-11-04 16:17:32 +00:00
Martin Evans
08c29d52c5
Slightly refactored `SafeLLamaGrammarHandle.Create` to solve CodeQL warning about pointer arithmetic.
2023-11-04 16:02:33 +00:00
Martin Evans
b6d242193e
Debugging slowdown by removing some things:
...
- Removed all `record struct` uses in native code
- Removed usage of `readonly` in native structs
Minor fix:
- Added sequential layout to `LLamaModelQuantizeParams`
2023-10-30 21:35:46 +00:00
Martin Evans
51c292ebd8
Added a safe method for `llama_get_logits_ith`
2023-10-28 23:15:45 +01:00
Martin Evans
7e3cde4c13
Moved helper methods into `LLamaBatchSafeHandle`
2023-10-28 22:09:09 +01:00
Martin Evans
c7fdb9712c
Added binaries, built from `6961c4bd0b`
2023-10-28 21:32:22 +01:00
Martin Evans
e81b3023d5
Rewritten sampling API to be accessed through the `LLamaTokenDataArray` object
2023-10-28 21:32:21 +01:00
Martin Evans
3c5547b2b7
Reduced some uses of `NativeApi` in `BatchedDecoding` by adding some helper methods
2023-10-28 21:32:21 +01:00
Martin Evans
a024d2242e
It works!
...
had to update binary to `b1426`
2023-10-28 21:32:21 +01:00
Martin Evans
8cd81251b4
initial setup
2023-10-28 21:32:21 +01:00
Martin Evans
321d0b58c4
Merge pull request #202 from martindevans/multi_gpu
...
Multi GPU
2023-10-26 14:40:49 +01:00
Martin Evans
a03fe003de
Fixed decoding of text "accumulating" over time (never properly clearing buffer)
2023-10-23 16:42:38 +01:00
Martin Evans
51d4411a58
Added two new classes for detokenization tasks:
...
- `AntipromptProcessor` accepts chunks of text and returns a value indicating if any antiprompt has been detected.
- `StreamingTokenDecoder` decodes tokens into text, maintaining some internal state to handle single characters which are encoded as multiple tokens.
Added tests for these classes and updated StatelessExecutor to use them.
Removed most DeTokenize methods, marked the rest as obsolete (should always use a `StreamingTokenDecoder`).
2023-10-23 00:33:50 +01:00
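The two detokenization classes described above might be wired together roughly like this (a sketch; the constructor and method signatures are assumptions inferred from the commit message):

```csharp
// Hypothetical usage of the new detokenization helpers.
var decoder = new StreamingTokenDecoder(context);
var antiprompt = new AntipromptProcessor(new[] { "User:" });

foreach (var token in tokens)
{
    decoder.Add(token);         // may buffer internally: one character can span multiple tokens
    var text = decoder.Read();  // only complete characters are emitted
    if (antiprompt.Add(text))   // returns true once any antiprompt has been detected
        break;
}
```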
Martin Evans
efdf3d630c
- Removed all `TokenToString` methods (it's never correct to use them, because sometimes one single character may be represented by multiple tokens).
...
- Built a new (hacky) `Detokenize` method which handles this
2023-10-22 21:43:36 +01:00
Martin Evans
1d0620e634
Created a test that "roundtrips" strings through tokenization. This reveals some flaws with certain characters
2023-10-22 15:28:36 +01:00
Martin Evans
04acbf8c42
Improved doc comment on `tensor_split`
2023-10-20 14:13:46 +01:00
Martin Evans
15db194c17
Added multi GPU support
2023-10-20 13:43:46 +01:00
Martin Evans
e89ca5cc17
Fixed a few minor warnings
2023-10-19 00:43:50 +01:00
Martin Evans
9daf586ba8
Assorted cleanup leftover after the huge change in the last PR (comments, syntax style, etc)
2023-10-19 00:26:30 +01:00
Martin Evans
1f8c94e386
Added the `special` parameter to the tokenizer (introduced in https://github.com/ggerganov/llama.cpp/pull/3538)
2023-10-17 23:55:46 +01:00
Martin Evans
2a38808bca
- Added threads to context params, replaced all thread args with `uint?`
...
- Replaced all binaries
2023-10-12 18:49:41 +01:00
Martin Evans
9a0a0ae9fe
Removed cloning support
2023-09-30 15:48:26 +01:00
Martin Evans
0d40338692
Fixed out-of-context handling in stateless executor
2023-09-29 23:53:07 +01:00
Martin Evans
b306ac23dd
Added `Decode` method to `SafeLLamaContextHandle`
2023-09-29 22:24:44 +01:00
Martin Evans
9e958e896b
safe handle for batch
2023-09-29 22:18:23 +01:00
Martin Evans
ce1fc51163
Added some more native methods
2023-09-29 16:05:19 +01:00
Martin Evans
bca55eace0
Initial changes to match the llama.cpp changes
2023-09-29 01:18:21 +01:00
Haiping
10678a83d6
Merge pull request #65 from martindevans/alternative_dependency_loading
...
CPU Feature Detection
2023-09-17 10:21:37 -05:00
Martin Evans
daf09eae64
Skipping tokenization of empty strings (saves allocating an empty array every time)
2023-09-12 01:03:27 +01:00
Martin Evans
bba801f4b7
Added a property to get the KV cache size from a context
2023-09-11 00:10:08 +01:00
sa_ddam213
09d8f434f2
Extract LLamaLogLevel, Remove Logger class
2023-09-09 10:25:05 +12:00
Martin Evans
d3b8ee988c
Beam Search (#155)
...
* Added the low level bindings to beam search.
2023-09-07 19:26:51 +01:00
Martin Evans
614ba40948
- Added a `TokensEndsWithAnyString` extension to `IReadOnlyList<int>` which efficiently checks if a set of tokens ends with one of a set of strings.
...
- Minimal amount of characters converted
- Allocation free
- Added `TokensToSpan` to `SafeLlamaModelHandle` which converts as many tokens as possible into a character span
- Allocation free
2023-09-06 19:44:19 +01:00
Martin Evans
6a842014ac
Removed duplicate `llama_sample_classifier_free_guidance` method
2023-09-04 00:48:27 +01:00
Martin Evans
8f58a40fb9
Added Linux dependency loading
2023-09-02 14:21:06 +01:00
Martin Evans
dd4957471f
Changed paths to match what the GitHub build action produces
2023-09-02 14:10:18 +01:00
Martin Evans
756a1ad0ba
Added a new way to load dependencies, performing CPU feature detection
2023-09-02 14:03:37 +01:00
Rinne
4e83e48ad1
Merge pull request #122 from martindevans/gguf
...
Add GGUF support
2023-09-02 11:54:50 +08:00
Martin Evans
bcf06e2652
Added some comments on various native methods
2023-09-02 02:22:11 +01:00
Martin Evans
a70c7170dd
- Created a higher level `Grammar` class which is immutable and contains a list of grammar rules. This is the main "entry point" to the grammar system.
...
- Made all the mechanics of grammar parsing (GBNFGrammarParser, ParseState) internal. Just call `Grammar.Parse("whatever")`.
- Added a `GrammarRule` class which validates elements on construction (this allows constructing grammar without parsing GBNF).
- It should be impossible for a `GrammarRule` to represent an invalid rule.
2023-08-31 00:02:50 +01:00
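As described, `Grammar.Parse` becomes the single entry point to the grammar system; a sketch of what that might look like (the GBNF string and the `Rules` property are illustrative assumptions):

```csharp
// Hypothetical: parse a GBNF grammar into an immutable Grammar of validated GrammarRules.
var grammar = Grammar.Parse("root ::= \"yes\" | \"no\"");

foreach (GrammarRule rule in grammar.Rules)
{
    // Each GrammarRule validated its elements on construction,
    // so it should be impossible for it to represent an invalid rule.
}
```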
Mihai
0bd495276b
Add initial tests + fix bugs. Still WIP since the test is failing.
2023-08-30 14:10:56 +03:00
Martin Evans
2022b82947
Added binaries generated by this action: https://github.com/SciSharp/LLamaSharp/actions/runs/6002797872/job/16279896150
...
Based on this version: 6b73ef1201
2023-08-28 19:48:31 +01:00
Martin Evans
31287b5e6e
Rewritten TokenToSpan/TokenToString to better fit the new way it's done in llama.cpp with a few different options:
...
- Just convert it to a `string`, nice and simple
- Write the bytes to a `Span<byte>`, no allocations
- Write the chars to a `StringBuilder`, potentially no allocations
2023-08-27 00:15:56 +01:00
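The three conversion options listed above might take shapes like these (a sketch; the exact overload signatures are assumptions):

```csharp
// Hypothetical shapes for the three TokenToSpan/TokenToString options:
string s = model.TokenToString(token);           // simple, allocates a string

Span<byte> buffer = stackalloc byte[32];
int written = model.TokenToSpan(token, buffer);  // raw bytes into a Span<byte>, no allocations

var sb = new StringBuilder();
model.TokenToString(token, sb);                  // appends chars, potentially no allocations
```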
Martin Evans
0c98ae1955
Passing ctx to `llama_token_nl(_ctx)`
2023-08-27 00:15:55 +01:00
Martin Evans
6ffa28f964
Removed `LLAMA_MAX_DEVICES` (not used)
2023-08-27 00:14:40 +01:00
Martin Evans
2056078aef
Initial changes required for GGUF support
2023-08-27 00:14:40 +01:00
Martin Evans
cf4754db44
Removed unnecessary parameters from some low level sampler methods
2023-08-26 21:38:24 +01:00
Martin Evans
f70525fec2
Two small improvements to the native sampling API:
...
- Modified `llama_sample_token_mirostat` and `llama_sample_token_mirostat_v2` to take `ref float` instead of a `float*`. Fewer pointers is always good.
- Modified `llama_sample_repetition_penalty` and `llama_sample_frequency_and_presence_penalties` to take pointers instead of arrays. This allows the use of non-allocating types (e.g. `Span`) instead of arrays
- Modified higher level API to accept `Memory<int>` instead of `int[]`, which can be used to reduce allocations at call sites
2023-08-26 01:25:48 +01:00
Martin Evans
a911b77dec
Various minor changes, resolving about 100 ReSharper code quality warnings
2023-08-24 23:15:53 +01:00
Martin Evans
ebacdb666d
- Moved the lower level state get/set methods onto SafeLLamaContextHandle
...
- Used those methods to add a `Clone` method to SafeLLamaContextHandle
- Simplified `LLamaContext` by using the new methods
- Sealed `LLamaContext` and `LLamaEmbedder`
2023-08-24 17:03:27 +01:00
Martin Evans
829f32b27d
- Added `Obsolete` attributes to the entire `OldVersion` namespace, so they can be removed in the future
...
- Minor changes to cleanup some of the compiler warnings
2023-08-24 00:59:32 +01:00
zombieguy
45b01d5a78
Improved type conversion
...
Type conversion is now done in the property rather than the utils class and uses the System.Convert class to ensure consistency.
2023-08-23 19:36:14 +01:00
Martin Evans
2830e5755c
- Applied a lot of minor R# code quality suggestions. Lots of unnecessary imports removed.
...
- Deleted `NativeInfo` (internal class, not used anywhere)
2023-08-22 23:20:13 +01:00
Martin Evans
4b7d718551
Added native symbol for CFG
2023-08-22 17:11:49 +01:00
Martin Evans
759ae26f36
Merge branch 'master' into grammar_basics
2023-08-22 14:06:57 +01:00
Martin Evans
a9e6f21ab8
- Creating and destroying contexts in the stateless executor, saving memory. It now uses zero memory when not inferring!
...
- Passing encoding in the `IModelParams`, which reduces how often encoding needs to be passed around
2023-08-22 01:30:13 +01:00
Martin Evans
ae8ef17a4a
- Added various convenience overloads to `LLamaContext.Eval`
...
- Converted `SafeLLamaContextHandle` to take a `ReadOnlySpan` for Eval; the narrower type better represents what's really needed
2023-08-22 01:28:28 +01:00
Martin Evans
64416ca23c
- Created a slightly nicer way to create grammar (from `IReadOnlyList<IReadOnlyList<LLamaGrammarElement>>`)
...
- Integrated grammar into sampling
- Added a test for the grammar sampling
2023-08-17 19:29:15 +01:00
Martin Evans
0294bb1303
Some of the basics of the grammar API
2023-08-17 19:28:17 +01:00
Rinne
62331852bc
Merge pull request #90 from martindevans/proposal_multi_context
...
Multi Context
2023-08-17 21:59:05 +08:00
zombieguy
10f88ebd0e
Potential fix for .Net Framework issues ( #103 )
...
* Added a bool to sbyte Utils convertor
To avoid using any MarshalAs attribute for .NET Framework support, this Utils method takes a bool value and returns an sbyte: 1 for true, 0 for false.
* Changed all bool "MarshalAs" types to sbytes
Changed all previous BOOL types with "MarshalAs" attributes to SBYTEs and changed all their setters to use the Utils.BoolToSignedByte() convertor method.
* Fixed Utils bool convertor & added sbyte to bool
Improved the Utils bool convertor by simply casting the sbyte value (removing the unneeded sbyte array), and added an sbyte-to-bool convertor for the way back to a C# bool, assuming any value above 0 is true and no bools are packed into the single-byte integer.
* bool to & from sbyte conversions via properties
All 1-byte bools are now handled where they "sit", via public properties which perform the conversions, so all external data can communicate as it did before.
2023-08-16 00:09:52 +01:00
Martin Evans
6c84accce8
Added `llama_sample_classifier_free_guidance` method from native API
2023-08-13 23:14:53 +01:00
Martin Evans
479ff57853
Renamed `EmbeddingCount` to `EmbeddingSize`
2023-08-13 01:10:09 +01:00
Martin Evans
d0a7a8fcd6
- Cleaned up disposal in LLamaContext
...
- sealed some classes not intended to be extended
2023-08-13 01:10:08 +01:00
Martin Evans
f3511e390f
WIP demonstrating changes to support multi-context. You can see this in use in `TalkToYourself`, along with notes on what still needs improving.
...
The biggest single change is renaming `LLamaModel` to `LLamaContext`
2023-08-13 01:10:08 +01:00
Martin Evans
d7f971fc22
Improved `NativeApi` file a bit:
...
- Added some more comments
- Modified `llama_tokenize` to not allocate
- Modified `llama_tokenize_native` to take a pointer instead of an array, allowing use with no allocations
- Removed GgmlInitParams (not used)
2023-08-12 00:45:23 +01:00
Martin Evans
841cf88e3b
Merge pull request #96 from martindevans/minor_quantizer_improvements
...
Minor quantizer improvements
2023-08-10 18:01:40 +01:00
Martin Evans
ce325b49c7
Rewritten comments
2023-08-10 17:00:54 +01:00
sa_ddam213
726987b761
Add native logging output
2023-08-10 23:01:50 +12:00
Martin Evans
acd91341e6
Added lots of comments to all the LLamaFtype variants
2023-08-10 02:14:21 +01:00
Martin Evans
2b2d3af26b
Moved `Eval` out of `Utils` and into `SafeLLamaContextHandle`
2023-08-07 15:15:34 +01:00
Martin Evans
0e5e00e300
Moved `TokenToString` from Utils into `SafeLLamaContextHandle` (thin wrappers around the same method in `SafeLlamaModelHandle`)
2023-08-07 15:15:34 +01:00
Martin Evans
2d811b2603
- Moved `GetLogits` into `SafeLLamaContextHandle`
...
- Added disposal check into `SafeLLamaContextHandle`
2023-08-07 15:13:24 +01:00
Martin Evans
cd3cf2b77d
- Moved tokenization from `Utils.Tokenize` into `SafeLLamaContextHandle.Tokenize`, one less thing in `Utils`.
...
- Also refactored it to return an `int[]` instead of an `IEnumerable<int>`, solving the "multiple enumeration" problems at the source!
2023-08-07 15:13:24 +01:00
Rinne
bfe9cc8961
Merge pull request #78 from SciSharp/rinne-dev
...
feat: update the llama backends.
2023-08-06 20:59:24 +08:00
Yaohui Liu
bb46a990d0
fix: add bug info for native api.
2023-08-06 14:46:23 +08:00
sa_ddam213
372894e1d4
Expose some native classes
2023-08-06 14:44:46 +12:00
SignalRT
348f2c7d72
Update llama.cpp binaries to 5f631c2 and align the context to that version
...
It solves the problem with netstandard2 (is netstandard2 really still a thing?).
Changed the context to solve problems.
5f631c26794b6371fcf2660e8d0c53494a5575f7
2023-08-05 12:45:34 +02:00
Rinne
8d37abd787
Merge pull request #68 from martindevans/sampling_improvements
...
Fixed Memory pinning in Sampling API
2023-08-05 08:55:12 +08:00
Martin Evans
add3d5528b
Removed `MarshalAs` on array
2023-08-03 14:16:41 +01:00
Martin Evans
2245b84906
Update LLamaContextParams.cs
2023-08-02 23:13:07 +01:00
sa_ddam213
3e252c81f6
LLamaContextParams epsilon and tensor split changes
2023-07-28 19:15:19 +12:00
Martin Evans
ec49bdd6eb
- Most importantly: Fixed issue in `SamplingApi`, `Memory` was pinned, but never unpinned!
...
- Moved repeated code to convert `LLamaTokenDataArray` into a `LLamaTokenDataArrayNative` into a helper method.
- Modified all call sites to dispose the `MemoryHandle`
- Saved one copy of the `List<LLamaTokenData>` into a `LLamaTokenData[]` in `LlamaModel`
2023-07-27 20:45:59 +01:00
Martin Evans
6985d3ab60
Added comments on two properties
2023-07-27 18:58:29 +01:00
Martin Evans
c974c8429e
Removed leftover `using`
2023-07-25 20:30:10 +01:00
Martin Evans
afb9d24f3a
Added model `Tokenize` method
2023-07-25 20:29:35 +01:00
Martin Evans
369c915afe
Added TokenToString conversion on model handle
2023-07-25 16:55:04 +01:00
Martin Evans
b721072aa5
Exposed some extra model properties on safe handle
2023-07-25 16:41:17 +01:00
Martin Evans
44b1e93609
Moved LoRA loading into `SafeLlamaModelHandle`
2023-07-25 16:35:24 +01:00
Martin Evans
c95b14d8b3
- Fixed null check
...
- Additional comments
2023-07-25 16:23:25 +01:00
Martin Evans
f16aa58e12
Updated to use the new loading system in llama (llama_state). This new system has split model weights and contexts into two separate things, allowing one set of weights to be shared between many contexts.
...
This change _only_ implements the low level API and makes no effort to update the LlamaSharp higher level abstraction.
It is built upon llama `b3f138d`, necessary DLLs are **not** included in this commit.
2023-07-25 01:18:12 +01:00
Rinne
c5e8b3eba2
Merge pull request #56 from martindevans/memory_mapped_save_loading_and_saving
...
Memory Mapped LoadState/SaveState
2023-07-24 22:49:00 +08:00
Rinne
d17fa991cc
Merge pull request #53 from martindevans/xml_docs_fixes
...
XML docs fixes
2023-07-24 22:31:51 +08:00
Rinne
1b0523f630
Merge branch 'master' into master
2023-07-22 23:27:50 +08:00
Martin Evans
4d72420a04
Replaced `SaveState` and `LoadState` implementations. These new implementations map the file into memory and then pass the pointer directly into the native API. This improves things in two ways:
...
- A C# array cannot exceed 2,147,483,591 bytes. In my own use of LlamaSharp I encountered this limit.
- This saves an extra copy of the entire state data into a C# `byte[]`, so it should be faster.
This does _not_ fix some other places where `GetStateData` is used. I'll look at those in a separate PR.
2023-07-21 18:54:31 +01:00
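The memory-mapped approach described above sidesteps the 2,147,483,591-byte limit on C# arrays by handing the native API a pointer into the mapped file. A minimal sketch of the idea (not the actual LlamaSharp code; the `llama_set_state_data` binding shown here is an assumption):

```csharp
// Sketch: map a state file into memory and pass the pointer straight to the
// native API, avoiding the intermediate byte[] copy (capped at ~2^31 bytes).
using System.IO;
using System.IO.MemoryMappedFiles;

using var mmf = MemoryMappedFile.CreateFromFile("state.bin", FileMode.Open);
using var accessor = mmf.CreateViewAccessor();
unsafe
{
    byte* ptr = null;
    accessor.SafeMemoryMappedViewHandle.AcquirePointer(ref ptr);
    try
    {
        // Hypothetical native binding taking the raw pointer directly.
        NativeApi.llama_set_state_data(context, ptr);
    }
    finally
    {
        accessor.SafeMemoryMappedViewHandle.ReleasePointer();
    }
}
```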
Martin Evans
2e76b79af6
Various minor XML docs fixes
2023-07-20 16:07:53 +01:00
SignalRT
56a37a0d7d
Update to latest llama.cpp
...
Adapt the interface change in llama_backend_init
2023-07-15 11:42:19 +02:00
unknown
dba866ffcf
Update API method name
2023-07-13 22:39:26 -07:00
Yaohui Liu
1062fe1a7e
feat: upgrade the native libraries.
2023-06-21 15:21:27 +08:00
Yaohui Liu
9850417a12
feat: update quantize native params.
2023-06-20 23:32:58 +08:00
Yaohui Liu
3bf74ec9b9
feat: add chat session for refactored code.
2023-06-12 02:47:25 +08:00
Yaohui Liu
264fb9a706
refactor: LLamaModel and LLamaExecutor.
2023-06-10 18:37:58 +08:00
Yaohui Liu
3a62f087fe
fix: encoding error when using other languages.
2023-06-03 18:51:20 +08:00
Yaohui Liu
18c2ff2395
refactor: instruct mode and examples.
2023-05-21 20:36:49 +08:00
Yaohui Liu
55d5a8ae51
fix: quantization error with fp16.
2023-05-20 23:51:22 +08:00
Yaohui Liu
19979f664a
feat: support loading and saving state.
2023-05-20 14:01:20 +08:00
Yaohui Liu
00d91cf99e
refactor: some parts of code of LLamaModel.
2023-05-18 03:59:55 +08:00
Yaohui Liu
1fca06dc7f
fix: n_gpu_layers miss in llama context.
2023-05-17 04:22:54 +08:00
Yaohui Liu
4314f64b9c
feat: add check for backend package.
2023-05-17 03:40:45 +08:00
Yaohui Liu
6ffcb5306b
refactor: use official api of quantization instead.
2023-05-13 15:02:19 +08:00
Yaohui Liu
0958bbac2c
feat: add get-embedding api to LLamaModel.
2023-05-13 02:08:03 +08:00
Yaohui Liu
33067f990f
feat: run quantization in csharp.
2023-05-11 17:38:28 +08:00
Yaohui Liu
118d410d52
build: revise build information.
2023-05-11 13:57:57 +08:00
Yaohui Liu
856d6549de
build: add linux support.
2023-05-11 04:20:56 +08:00
Yaohui Liu
02524ae4eb
build: add package information.
2023-05-11 04:07:02 +08:00
Yaohui Liu
d6a7997e46
feat: add gpt model.
2023-05-10 20:48:16 +08:00
Yaohui Liu
5a79edeb51
feat: add the framework and basic usages.
2023-05-10 02:13:41 +08:00