Commit Graph

240 Commits

Author SHA1 Message Date
ksanchez 46a9d603f4 Add method to get BOS token. 2024-05-02 23:29:33 -06:00
Rinne 495177fd0f fix: typos. 2024-04-29 18:19:20 +08:00
Martin Evans 377ebf3664 - Added `LoadFromFileAsync` method for `LLavaWeights`
- Fixed checking for invalid handles in `clip_model_load`
2024-04-27 23:31:07 +01:00
Martin Evans 00df7c1516 - Added `LLamaWeights.LoadFromFileAsync`.
- Async loading supports cancellation through a `CancellationToken`. If loading is cancelled an `OperationCanceledException` is thrown. If it fails for another reason a `LoadWeightsFailedException` is thrown.
 - Updated examples to use `LoadFromFileAsync`
2024-04-27 02:52:41 +01:00
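A minimal usage sketch of the async loading flow described in the two commits above. The model path is a placeholder, and the namespaces are assumptions based on LLamaSharp's usual layout:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using LLama;
using LLama.Common;
using LLama.Exceptions;

public static class AsyncLoadExample
{
    public static async Task Main()
    {
        // Hypothetical model path, substitute your own GGUF file
        var parameters = new ModelParams("models/model.gguf");

        using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(30));
        try
        {
            // Loads weights without blocking the caller; cancellable via the token
            using var weights = await LLamaWeights.LoadFromFileAsync(parameters, cts.Token);
            Console.WriteLine("Model loaded.");
        }
        catch (OperationCanceledException)
        {
            Console.WriteLine("Loading was cancelled.");
        }
        catch (LoadWeightsFailedException ex)
        {
            // Thrown when loading fails for any reason other than cancellation
            Console.WriteLine($"Loading failed: {ex.Message}");
        }
    }
}
```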
Martin Evans 18586cc43b
Merge pull request #696 from martindevans/safe_handle_constructor_refactor
Removed Unnecessary Constructor From Safe Handles
2024-04-26 16:14:42 +01:00
Martin Evans e9fd7f96e0
Merge pull request #691 from martindevans/empty_batch_check
Empty batch check
2024-04-26 16:14:28 +01:00
Martin Evans a2f8573831
Merge pull request #698 from martindevans/slightly_safer_quantize_params
Slightly Safer Quantize Params
2024-04-26 13:53:55 +01:00
Martin Evans d4f793a7eb Using `is` check instead of `== null` 2024-04-26 13:53:04 +01:00
Martin Evans ecb359c9e7
- Using more specific `LoadWeightsFailedException` when a llava model fails to load (#697)
- Passing model path, instead of a message, to `LoadWeightsFailedException` constructor
2024-04-26 13:39:09 +01:00
Martin Evans 58ec798bff Modified `llama_model_quantize` to accept argument by `ref` instead of pointer. 2024-04-26 01:35:13 +01:00
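A hedged sketch of what the by-`ref` binding in the commit above looks like; the struct fields shown are an illustrative subset, not the full llama.cpp definition:

```csharp
using System.Runtime.InteropServices;

// Illustrative subset of llama.cpp's llama_model_quantize_params;
// the real struct has more fields and must match the native layout exactly.
[StructLayout(LayoutKind.Sequential)]
public struct LLamaModelQuantizeParams
{
    public int nthread; // threads to use for quantizing (0 = hardware default)
    public int ftype;   // target quantization format
}

public static class QuantizeNative
{
    // Passing by `ref` lets the marshaller take the struct's address for the
    // duration of the call, avoiding raw pointer handling on the C# side.
    [DllImport("llama", CallingConvention = CallingConvention.Cdecl)]
    public static extern uint llama_model_quantize(
        [MarshalAs(UnmanagedType.LPStr)] string fname_inp,
        [MarshalAs(UnmanagedType.LPStr)] string fname_out,
        ref LLamaModelQuantizeParams param);
}
```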
Martin Evans 54dab273cd - Removed unnecessary constructors from safe handles
- Returning SafeLLamaGrammarHandle directly from `llama_grammar_init` and `llama_grammar_copy`
2024-04-26 01:03:26 +01:00
Martin Evans 25812762c9 Added checks in `Decode` to skip doing anything if the batch is empty. 2024-04-24 14:54:02 +01:00
Martin Evans 3c76440957 - Added tests for generating embeddings with generative model and embedding model
- Rewritten native API methods for embeddings to return pointers - null is a valid value for these methods to return so `Span` is not appropriate
2024-04-19 16:30:32 +01:00
Martin Evans c325ac9127
April 2024 Binary Update (#662)
* Updated binaries, using [this build](https://github.com/SciSharp/LLamaSharp/actions/runs/8654672719/job/23733195669) for llama.cpp commit `f7001ccc5aa359fcf41bba19d1c99c3d25c9bcc7`.

 - Added all new functions.
 - Moved some functions (e.g. `SafeLlamaModelHandle` specific functions) into `SafeLlamaModelHandle.cs`
 - Exposed tokens on `SafeLlamaModelHandle` and `LLamaWeights` through a `Tokens` property. As new special tokens are added in the future they can be added here.
 - Changed all token properties to return nullable tokens, to handle some models not having some tokens.
 - Fixed `DefaultSamplingPipeline` to handle no newline token in some models.

* Moved native methods to more specific locations.

 - Context specific things have been moved into `SafeLLamaContextHandle.cs` and made private - they're exposed through C# properties and methods already.
 - Checking that GPU layer count is zero if GPU offload is not supported.
 - Moved methods for creating default structs (`llama_model_quantize_default_params` and `llama_context_default_params`) into relevant structs.

* Removed exception if `GpuLayerCount > 0` when GPU is not supported.

* - Added low level wrapper methods for new per-sequence state load/save in `SafeLLamaContextHandle`
 - Added high level wrapper methods (save/load with `State` object or memory mapped file) in `LLamaContext`
 - Moved native methods for per-sequence state load/save into `SafeLLamaContextHandle`

* Added update and defrag methods for KV cache in `SafeLLamaContextHandle`

* Updated submodule to `f7001ccc5aa359fcf41bba19d1c99c3d25c9bcc7`

* Passing the sequence ID when saving a single sequence state
2024-04-16 23:19:47 +01:00
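One idea from the PR above, nullable token properties, sketched in a self-contained form. The real LLamaSharp types are richer; `ModelTokens` and its delegate are stand-ins:

```csharp
using System;

public readonly record struct LLamaToken(int Value);

public sealed class ModelTokens
{
    // Stand-in for the native call (e.g. llama_token_bos), which by llama.cpp
    // convention returns a negative id when the model does not define the token
    private readonly Func<int> _getBos;

    public ModelTokens(Func<int> getBos) => _getBos = getBos;

    // Nullable: models without a BOS token yield null instead of a bogus id
    public LLamaToken? BOS
    {
        get
        {
            var id = _getBos();
            return id < 0 ? (LLamaToken?)null : new LLamaToken(id);
        }
    }
}
```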
Martin Evans 58107bb5b9
Logging interceptor (#649)
* - Added `NativeLogConfig` which allows overriding the llama.cpp log callback
 - Delaying binding of this into llama.cpp until after `NativeLibraryConfig` has loaded

* Using the log callback to show loading log messages during loading.

* Registering log callbacks before any calls to llama.cpp except `llama_empty_call`; this method is specifically selected because it does nothing and exists only to trigger DLL loading.

* - Removed much of the complexity of logging from `NativeApi.Load`. It always calls whatever log callbacks you have registered.
 - Removed the alternative path for `ILogger` in `NativeLibraryConfig`; instead the logger is wrapped in a delegate.

* Saving a GC handle to keep the log callback alive

* Removed the prefix; the logger should already add that.

* Buffering up messages until a newline is encountered before passing the log message to the `ILogger`.

* - Added trailing `\n` to log messages from loading.
 - Using `ThreadLocal<StringBuilder>` to ensure messages from separate threads don't get mixed together.
2024-04-05 16:42:27 +01:00
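A sketch of the line-buffering idea from the PR above: llama.cpp can emit partial lines, so fragments are accumulated per thread (via `ThreadLocal<StringBuilder>`) and only complete lines are forwarded to the `ILogger`. Type and method names here are illustrative:

```csharp
using System.Text;
using System.Threading;
using Microsoft.Extensions.Logging;

public sealed class LineBufferedLogForwarder
{
    private readonly ILogger _logger;

    // Separate buffer per thread, so concurrent native log calls
    // don't interleave fragments of different messages
    private readonly ThreadLocal<StringBuilder> _buffer =
        new(() => new StringBuilder());

    public LineBufferedLogForwarder(ILogger logger) => _logger = logger;

    // Called with raw fragments from the native log callback
    public void OnNativeLog(LogLevel level, string fragment)
    {
        var sb = _buffer.Value!;
        sb.Append(fragment);

        // Flush every complete line; keep any trailing partial line buffered
        int newline;
        while ((newline = IndexOfNewline(sb)) >= 0)
        {
            _logger.Log(level, "{Message}", sb.ToString(0, newline));
            sb.Remove(0, newline + 1);
        }
    }

    private static int IndexOfNewline(StringBuilder sb)
    {
        for (var i = 0; i < sb.Length; i++)
            if (sb[i] == '\n')
                return i;
        return -1;
    }
}
```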
evolcano 353412923f Merge branch 'master' of https://github.com/SciSharp/LLamaSharp 2024-03-30 10:55:42 +08:00
evolcano 9d091c0316 Add path to find llama.dll for MAUI
This commit was originally made by lcarrere in https://github.com/SciSharp/LLamaSharp/issues/180 .

I have confirmed this modification works on my Windows 11 laptop, and have made this commit at the request of AsakusaRinne.
2024-03-30 10:54:44 +08:00
SignalRT 2d9a114f66 Include comments and add some checks 2024-03-26 23:19:32 +01:00
SignalRT e8732efadd Example InteractiveExecutor
Add an example and modify the interactive executor to enable LLaVA models.

Just a preview / demo
2024-03-26 23:19:32 +01:00
Martin Evans e2705be6c8
Fixed off by one error in LLamaBatch sampling position (#626) 2024-03-25 22:56:26 +00:00
Martin Evans 91d72e7465
Keeping track of positions where logits will be generated in a batch and what sequence those logits are associated with. (#624) 2024-03-25 21:02:48 +00:00
Martin Evans 024787225b
`SetDllImportResolver` based loading (#603)
- Modified library loading to be based on `SetDllImportResolver`. This replaces the built-in loading system and ensures there can't be two libraries loaded at once.
 - llava and llama are loaded separately, as needed.
 - All the previous loading logic is still used, within the `SetDllImportResolver`
 - Split out CUDA, AVX and MacOS paths to separate helper methods.
 - `Description` now specifies if it is for `llama` or `llava`
2024-03-17 19:54:20 +00:00
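A minimal sketch of the `SetDllImportResolver` pattern this PR describes, assuming the logical library name `llama` (llava would be handled analogously):

```csharp
using System;
using System.Reflection;
using System.Runtime.InteropServices;

public static class NativeLibraryLoader
{
    private static IntPtr _llamaHandle;

    public static void Install()
    {
        // Every [DllImport("llama")] in this assembly now resolves through
        // this single callback, so the same handle is reused and two copies
        // of the library can never be loaded at once.
        NativeLibrary.SetDllImportResolver(
            Assembly.GetExecutingAssembly(),
            (name, assembly, searchPath) =>
            {
                if (name != "llama")
                    return IntPtr.Zero; // defer to default resolution

                if (_llamaHandle == IntPtr.Zero)
                    _llamaHandle = NativeLibrary.Load("llama", assembly, searchPath);
                return _llamaHandle;
            });
    }
}
```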
jlsantiago 3b2836eac4
Llava api (#563)
* Add llava_binaries, update all binaries needed to run the tests

* Llava API + LlavaTest

Preliminary

* First prototype of Load + Unit Test

* Temporarily run tests on the LlavaAPI branch

* Disable Embed test to review the rest of the test

* Restore Embedding test

* Use BatchThread to eval image embeddings

Test the Threads default value to ensure it doesn't cause problems.

* Rename test file

* Update action versions

* Test only one method, no release embeddings

* Revert "Test only one method, no release embeddings"

This reverts commit 264e176dccc9cd0be318b800ae5e102a4635d01c.

* Correct API call

* Only test llava related functionality

* CUDA and CLBlast binaries

* Restore build policy

* Changes related with code review

* Add SafeHandles

* Set overwrite to upload-artifact@v4

* Revert to upload-artifact@v3

* revert to upload-artifact@v3
2024-03-13 22:10:44 +00:00
Martin Evans ce4de7d607
llama_decode lock (#595)
* Added a lock object into `SafeLlamaModelHandle` which all calls to `llama_decode` (in the `SafeLLamaContextHandle`) lock first. This prevents two contexts from running inference on the same model at the same time, which seems to be unsafe in llama.cpp.

* Modified the lock to be global over _all_ inferences. This seems to be necessary (at least with the CUDA backend).
2024-03-13 00:33:16 +00:00
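The locking scheme in this PR boils down to a single process-wide lock; a sketch, with a delegate standing in for the native `llama_decode` call:

```csharp
using System;

public static class GlobalInferenceLock
{
    // One lock shared by all contexts: concurrent llama_decode calls appear
    // to be unsafe in llama.cpp, even across separate contexts on some
    // backends (e.g. CUDA), so every inference serializes here.
    private static readonly object Lock = new();

    public static int Decode(Func<int> nativeDecode)
    {
        lock (Lock)
            return nativeDecode();
    }
}
```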
Clovis Henrique Ribeiro d0f79814e9
Added conditional compilation code to progress_callback (in the `LLamaModelParams` struct) so the struct plays nice with legacy .NET Framework 4.8 (#593) 2024-03-11 14:36:50 +00:00
Martin Evans f0b0bbcbb7
Mutable Logits (#586)
Modified LLamaBatch to not share tokens with other sequences if logits is true. This ensures that the logit span at the end is used by exactly one sequence, and is therefore safe to mutate. This removes the need for copying _very_ large arrays (vocab size) and simplifies sampling pipelines.
2024-03-10 13:56:11 +00:00
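Why the change above matters, in sketch form: once a logit span belongs to exactly one sequence, a sampling pipeline can edit it in place instead of copying the whole vocab-sized array first. The helper name is hypothetical:

```csharp
using System;

public static class LogitsEditing
{
    // Safe to mutate in place: no other sequence ever reads this span
    public static void BanToken(Span<float> logits, int tokenId)
        => logits[tokenId] = float.NegativeInfinity;
}
```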
Martin Evans a8ba9f05b3
March Binary Update (#565)
* Updated binaries to llama.cpp `3ab8b3a92ede46df88bc5a2dfca3777de4a2b2b6` (build run: https://github.com/SciSharp/LLamaSharp/actions/runs/8118890586)

* Added abort callback

* Added properties to get/set thread count on `LLamaContext`

* Fixed LLamaLogLevel numbering
2024-03-06 15:19:42 +00:00
Martin Evans 8ac1634233
Removed `llama_eval`. It is going to be completely removed in the next version of llama.cpp (#553) 2024-02-28 21:41:39 +00:00
Martin Evans f0e7e7cc0a
Removed `SamplingApi`. it has been marked as Obsolete for a while, replaced by instance methods on `LLamaTokenDataArray` (#552) 2024-02-28 19:30:53 +00:00
Martin Evans 7d84625a67
Classifier Free Guidance (#536)
* Added a `Guidance` method to `LLamaTokenDataArray` which applies classifier free guidance

* Factored out a safer `llama_sample_apply_guidance` method based on spans

* Created a guided sampling demo using the batched executor

* fixed comment, "classifier free" not "context free"

* Rebased onto master and fixed breakage due to changes in `BaseSamplingPipeline`

* Asking user for guidance weight

* Progress bar in batched fork demo

* Improved fork example (using tree display)

* Added proper disposal of resources in batched examples

* Added some more comments in BatchedExecutorGuidance
2024-02-26 15:41:57 +00:00
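A sketch of classifier-free guidance over raw logit spans, matching the span-based `llama_sample_apply_guidance` shape described above. The formula here is the standard CFG interpolation; the exact native semantics may differ:

```csharp
using System;

public static class Guidance
{
    // scale = 1 leaves logits unchanged; larger values push the guided
    // logits further away from the guidance (negative prompt) logits
    public static void Apply(Span<float> logits, ReadOnlySpan<float> guidanceLogits, float scale)
    {
        if (logits.Length != guidanceLogits.Length)
            throw new ArgumentException("Logit spans must have equal length");

        for (var i = 0; i < logits.Length; i++)
            logits[i] = guidanceLogits[i] + scale * (logits[i] - guidanceLogits[i]);
    }
}
```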
Scott W Harden a6394001a1
NativeLibraryConfig: WithLogs(LLamaLogLevel) (#529)
Adds a NativeLibraryConfig.WithLogs() overload to let the user indicate the log level (with "info" as the default)
2024-02-21 23:51:09 +00:00
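A possible usage sketch; the exact fluent surface of `NativeLibraryConfig` may differ between versions:

```csharp
using LLama.Native;

// Configure once at startup, before any other llama.cpp call is made
NativeLibraryConfig.Instance.WithLogs(LLamaLogLevel.Info); // Info is the default level
```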
Martin Evans c7d0dc915a Assorted small changes to clean up some code warnings 2024-02-17 23:07:10 +00:00
Martin Evans e9d9042576 Added `Divide` to `KvAccessor` 2024-02-12 15:54:13 +00:00
Martin Evans 949861a581 - Added a `Modify` method to `Conversation`. This grants **temporary** access to directly modify the KV cache.
- Re-implemented `Rewind` as an extension method using `Modify` internally
 - Implemented `ShiftLeft`, which shifts everything over except for some starting tokens. This is the same as the `StatelessExecutor` out-of-context handling.
 - Starting batch at epoch 1, this ensures that conversations (starting at zero) are below the current epoch. It also means `0` can always be used as a value guaranteed to be below the current epoch.
2024-02-11 23:20:05 +00:00
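A self-contained sketch of the pattern described above, `Rewind` layered on `Modify`; these stand-in types are much simpler than the real ones:

```csharp
using System;

// Minimal stand-ins so the sketch compiles; the real LLamaSharp types are richer
public interface IKvAccessor
{
    void Remove(int start, int count); // drop `count` cells starting at `start`
}

public sealed class Conversation
{
    private readonly IKvAccessor _kv;

    public int End { get; private set; } // position of the last token

    public Conversation(IKvAccessor kv, int end) { _kv = kv; End = end; }

    // Grants temporary KV cache access; the delegate returns the new end position
    public void Modify(Func<int, IKvAccessor, int> modifier) => End = modifier(End, _kv);
}

public static class ConversationExtensions
{
    // Rewind by `tokens` positions, implemented purely on top of Modify
    public static void Rewind(this Conversation conversation, int tokens)
    {
        conversation.Modify((end, kv) =>
        {
            kv.Remove(end - tokens + 1, tokens);
            return end - tokens;
        });
    }
}
```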
Martin Evans b0acecf080 Created a new `BatchedExecutor` which processes multiple "Conversations" in one single inference batch. This is faster, even when the conversations are unrelated, and is much faster if the conversations share some overlap (e.g. a common system prompt prefix).
Conversations can be "forked", to create a copy of a conversation at a given point. This allows e.g. prompting a conversation with a system prefix just once and then forking it again and again for each individual conversation. Conversations can also be "rewound" to an earlier state.

Added two new examples, demonstrating forking and rewinding.
2024-02-09 23:57:03 +00:00
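A hypothetical usage sketch of the fork pattern: evaluate the shared system prompt once, then fork cheap copies. Method names follow the commit's description and may not match the final API exactly:

```csharp
using System.Threading.Tasks;
using LLama;
using LLama.Batched;
using LLama.Common;

public static class ForkExample
{
    public static async Task Main()
    {
        var parameters = new ModelParams("models/model.gguf"); // hypothetical path
        using var model = LLamaWeights.LoadFromFile(parameters);
        using var executor = new BatchedExecutor(model, parameters);

        // Evaluate the shared system prompt exactly once...
        using var root = executor.Create();
        root.Prompt("You are a helpful assistant.");
        await executor.Infer();

        // ...then fork copies which share those KV cache cells with the root
        using var chatA = root.Fork();
        using var chatB = root.Fork();

        chatA.Prompt("How do I sort a list?");
        chatB.Prompt("What is a span?");
        await executor.Infer(); // both prompts decoded in a single batch
    }
}
```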
Martin Evans 90915c5a99 Added increment and decrement operators to `LLamaPos` 2024-02-07 17:04:57 +00:00
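A self-contained sketch of what such operators look like on a position wrapper; the real `LLamaPos` may differ in detail:

```csharp
// LLamaPos is a thin wrapper over an int position, so ++/-- make
// loops over positions read naturally
public readonly record struct LLamaPos(int Value)
{
    public static LLamaPos operator ++(LLamaPos pos) => new(pos.Value + 1);
    public static LLamaPos operator --(LLamaPos pos) => new(pos.Value - 1);

    public static implicit operator int(LLamaPos pos) => pos.Value;
    public static implicit operator LLamaPos(int value) => new(value);
}
```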
Martin Evans c5146bac23 - Exposed KV debug view through `SafeLLamaContextHandle`
- Added `KvCacheSequenceDivide`
 - Moved count tokens/cells methods to `SafeLLamaContextHandle`
2024-02-07 16:35:39 +00:00
Martin Evans 15a98b36d8 Updated everything to work with llama.cpp ce32060198b7e2d6a13a9b8e1e1369e3c295ae2a 2024-02-01 16:35:05 +00:00
Martin Evans 5da2a2f64b - Removed one of the constructors of `SafeLLamaHandleBase`, which implicitly states that memory is owned. Better to be explicit about this kind of thing!
- Also fixed `ToString()` in `SafeLLamaHandleBase`
2024-01-31 18:01:03 +00:00
Jason Couture ec59c5bf9e Fix missing library name prefix for cuda 2024-01-30 12:41:23 -05:00
Jason Couture 443ce4fff4 While the dllimport changes work, manual path searching needed to be updated 2024-01-30 11:10:51 -05:00
Jason Couture db7e1e88f8 Use llama instead of libllama in `[DllImport]`
This results in Windows users not needing to rename the DLL, and allows native llama builds to be dropped in, even on Windows.

I also took the time to update the documentation, removing references to renaming the files, since the names now match.

Fixes #463
2024-01-30 02:40:13 -05:00
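A sketch of why the plain name works everywhere: .NET applies the platform prefix/suffix itself when probing, so one logical name covers llama.dll, libllama.so and libllama.dylib. `llama_print_system_info` is just a stable function chosen to illustrate the import:

```csharp
using System;
using System.Runtime.InteropServices;

public static class NativeApiSketch
{
    // One logical name; .NET probes llama.dll / libllama.so / libllama.dylib
    [DllImport("llama", CallingConvention = CallingConvention.Cdecl)]
    private static extern IntPtr llama_print_system_info();

    public static string? SystemInfo()
        => Marshal.PtrToStringAnsi(llama_print_system_info());
}
```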
Martin Evans 92b9bbe779 Added methods to `SafeLLamaContextHandle` for KV cache manipulation 2024-01-23 16:16:02 +00:00
Martin Evans 96c26c25f5
Merge pull request #445 from martindevans/stateless_executor_llama_decode
Swapped `StatelessExecutor` to use `llama_decode`!
2024-01-23 03:02:51 +00:00
Martin Evans 9fe878ae1f - Fixed example
- Growing more than double, if necessary
2024-01-21 01:00:24 +00:00
Martin Evans 9ede1bedc2 Automatically growing batch n_seq_max when exceeded. This means no parameters need to be picked when the batch is created. 2024-01-21 00:55:14 +00:00
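A minimal sketch of the growth policy from the two commits above: double the capacity when exceeded, or jump straight to the required size if doubling is not enough. The class is a stand-in, not the real `LLamaBatch`:

```csharp
using System;

public sealed class GrowableBatch
{
    private int[] _tokens = new int[64];
    public int Count { get; private set; }

    public void Add(int token)
    {
        EnsureCapacity(Count + 1);
        _tokens[Count++] = token;
    }

    private void EnsureCapacity(int required)
    {
        if (required <= _tokens.Length)
            return;

        // At least double; "more than double, if necessary" covers the case
        // where a single request exceeds twice the current capacity
        var newSize = Math.Max(_tokens.Length * 2, required);
        Array.Resize(ref _tokens, newSize);
    }
}
```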
Martin Evans a2e29d393c Swapped `StatelessExecutor` to use `llama_decode`!
- Added `logits_i` argument to `Context.ApplyPenalty`
 - Added a new exception type for `llama_decode` return code
2024-01-20 21:18:35 +00:00
Martin Evans 99969e538e - Removed some unused `eval` methods.
- Added a `DecodeAsync` overload which runs the work in a task
 - Replaced some `NativeHandle` usage in `BatchedDecoding` with higher level equivalents.
 - Made the `LLamaBatch` grow when token capacity is exceeded, removing the need to manage token capacity externally.
2024-01-20 02:38:45 +00:00
Martin Evans 36a9335588 Removed `LLamaBatchSafeHandle` (using unmanaged memory, created by llama.cpp) and replaced it with a fully managed `LLamaBatch`. Modified the `BatchedDecoding` example to use new managed batch. 2024-01-19 23:26:36 +00:00
Martin Evans 1472704e12 Added a test with examples of troublesome strings from 0.9.1 2024-01-16 15:02:23 +00:00