- Async loading supports cancellation through a `CancellationToken`. If loading is cancelled, an `OperationCanceledException` is thrown; if it fails for any other reason, a `LoadWeightsFailedException` is thrown (see the sketch below).
- Updated examples to use `LoadFromFileAsync`
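A minimal sketch of cancellable loading. The exact `LoadFromFileAsync` signature and the exception namespaces are assumptions; the model path and timeout are placeholders:

```csharp
using System;
using System.Threading;
using LLama;
using LLama.Common;
using LLama.Exceptions;

var parameters = new ModelParams("path/to/model.gguf");
using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(30));

try
{
    // The token cancels loading if it fires before the weights finish loading
    using var weights = await LLamaWeights.LoadFromFileAsync(parameters, cts.Token);
}
catch (OperationCanceledException)
{
    // Loading was cancelled through the token
}
catch (LoadWeightsFailedException)
{
    // Loading failed for some other reason (e.g. a corrupt file)
}
```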
* Added the ability to save and load individual conversations in a batched executor (example after this list).
- New example
- Added `BatchedExecutor.Load(filepath)` method
- Added `Conversation.Save(filepath)` method
- Added new (currently internal) `SaveState`/`LoadState` methods in LLamaContext which can stash some extra binary data in the header
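A sketch of the file-based flow. The executor construction follows the pattern in the existing batched executor examples; the model path, prompt, and file name are placeholders:

```csharp
using LLama;
using LLama.Batched;
using LLama.Common;

var parameters = new ModelParams("path/to/model.gguf");
using var model = LLamaWeights.LoadFromFile(parameters);
using var executor = new BatchedExecutor(model, parameters);

// Create a conversation and prompt it
var conversation = executor.Create();
conversation.Prompt(executor.Context.Tokenize("The quick brown fox"));

// Save just this conversation to disk, then dispose it
conversation.Save("conversation.state");
conversation.Dispose();

// Later: restore the conversation into the executor
var restored = executor.Load("conversation.state");
```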
* Added ability to save/load a `Conversation` to an in-memory state, instead of to file.
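Continuing the sketch above, the in-memory variant might look like this; the parameterless `Save()` returning a state object and the matching `Load(state)` overload are assumptions based on the description:

```csharp
// Capture the conversation's state in memory instead of on disk
var state = conversation.Save();
conversation.Dispose();

// Rehydrate it later without touching the file system
var restored = executor.Load(state);
```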
* Moved the new save/load methods out to an extension class specifically for the batched executor.
* Removed unnecessary spaces
* Updated binaries, using [this build](https://github.com/SciSharp/LLamaSharp/actions/runs/8654672719/job/23733195669) for llama.cpp commit `f7001ccc5aa359fcf41bba19d1c99c3d25c9bcc7`.
- Added bindings for all of the new native functions.
- Moved some functions (e.g. `SafeLlamaModelHandle`-specific functions) into `SafeLlamaModelHandle.cs`
- Exposed tokens on `SafeLlamaModelHandle` and `LLamaWeights` through a `Tokens` property. As new special tokens are added in the future, they can be exposed here.
- Changed all token properties to return nullable tokens, to handle models which lack some tokens (see the sketch after this list).
- Fixed `DefaultSamplingPipeline` to handle no newline token in some models.
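A sketch of consuming the nullable token properties; the `Newline` member on `Tokens` is an assumption for illustration, and the model path is a placeholder:

```csharp
using System;
using LLama;
using LLama.Common;

using var weights = LLamaWeights.LoadFromFile(new ModelParams("path/to/model.gguf"));

// Not every model defines every special token, so the properties are nullable
if (weights.Tokens.Newline is null)
    Console.WriteLine("This model does not define a newline token");
```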
* Moved native methods to more specific locations.
- Context-specific things have been moved into `SafeLLamaContextHandle.cs` and made private; they're already exposed through C# properties and methods.
- Checked that the GPU layer count is zero if GPU offload is not supported.
- Moved methods for creating default structs (`llama_model_quantize_default_params` and `llama_context_default_params`) into the relevant structs.
* Removed the exception thrown if `GpuLayerCount > 0` when GPU is not supported (see the sketch below).
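In practice this means setting `GpuLayerCount` is now safe regardless of backend support, as in this small sketch (the model path is a placeholder):

```csharp
var parameters = new ModelParams("path/to/model.gguf")
{
    // On a CPU-only backend this no longer throws;
    // the offload is simply treated as zero layers
    GpuLayerCount = 32
};
```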
* Added low-level wrapper methods for the new per-sequence state load/save in `SafeLLamaContextHandle`
- Added high-level wrapper methods (save/load with a `State` object or a memory-mapped file) in `LLamaContext` (example after this list)
- Moved native methods for per-sequence state load/save into `SafeLLamaContextHandle`
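A sketch of the per-sequence flow, assuming an existing `LLamaContext` (`context`) and `SaveState`/`LoadState` overloads that accept a `LLamaSeqId` (the file name is a placeholder):

```csharp
using LLama.Native;

// Save the state of a single sequence rather than the whole context
context.SaveState("sequence0.state", (LLamaSeqId)0);

// Later: restore just that sequence back into the context
context.LoadState("sequence0.state", (LLamaSeqId)0);
```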
* Added update and defrag methods for KV cache in `SafeLLamaContextHandle`
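For example, assuming methods named `KvCacheDefrag` and `KvCacheUpdate` wrapping the corresponding native calls, and an existing `LLamaContext` (`context`):

```csharp
var handle = context.NativeHandle;

// Queue a defragmentation of the KV cache, then apply pending updates
handle.KvCacheDefrag();
handle.KvCacheUpdate();
```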
* Updated submodule to `f7001ccc5aa359fcf41bba19d1c99c3d25c9bcc7`
* Passed the sequence ID when saving a single-sequence state
* Added `NativeLogConfig`, which allows overriding the llama.cpp log callback (example below)
- Delayed binding this into llama.cpp until after the native library has been loaded via `NativeLibraryConfig`
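Registration might look like the following sketch; the `llama_log_set` overload taking a `(level, message)` delegate is assumed from the description:

```csharp
using System;
using LLama.Native;

// Register before any other llama.cpp call so loading messages are captured
NativeLogConfig.llama_log_set((level, message) =>
{
    Console.Write($"[{level}] {message}");
});
```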
* Used the log callback to show log messages during model loading.
* Registered log callbacks before any calls to llama.cpp except `llama_empty_call`; this method is specifically chosen because it does nothing and exists only to trigger DLL loading.
* Removed much of the complexity of logging from `NativeApi.Load`. It always calls whatever log callbacks you have registered.
- Removed the alternative `ILogger` path in `NativeLibraryConfig`; instead it redirects by wrapping the `ILogger` in a delegate.
* Saved a GC handle to keep the log callback alive.
* Removed the message prefix; the logger should already add that.
* Buffered up messages until a newline is encountered before passing them to the `ILogger`.
* Added a trailing `\n` to log messages from loading.
- Used `ThreadLocal<StringBuilder>` to ensure messages from separate threads don't get mixed together (sketch below).
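A self-contained sketch of that buffering idea (the names are illustrative, not the real internals):

```csharp
using System;
using System.Text;
using System.Threading;

// One buffer per thread, so concurrent native log fragments never interleave
var buffers = new ThreadLocal<StringBuilder>(() => new StringBuilder());

void OnNativeLog(string fragment)
{
    var buffer = buffers.Value;
    buffer.Append(fragment);

    // Only flush once a complete line has been assembled
    if (fragment.EndsWith("\n"))
    {
        Console.Write(buffer.ToString());
        buffer.Clear();
    }
}
```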
This change was originally made by lcarrere in https://github.com/SciSharp/LLamaSharp/issues/180. I have confirmed this modification works on my Windows 11 laptop, and made this commit at the request of AsakusaRinne.