Commit Graph

23 Commits

Author SHA1 Message Date
Martin Evans 1ec0fee5ba Added optional `IProgress` parameter to `LoadFromFileAsync` 2024-04-27 15:04:54 +01:00
Martin Evans 9867b4c85d Only setting the callback if the token can be cancelled. 2024-04-27 02:55:35 +01:00
Martin Evans 00df7c1516 - Added `LLamaWeights.LoadFromFileAsync`.
 - Async loading supports cancellation through a `CancellationToken`. If loading is cancelled an `OperationCanceledException` is thrown. If it fails for another reason a `LoadWeightsFailedException` is thrown.
 - Updated examples to use `LoadFromFileAsync`
2024-04-27 02:52:41 +01:00
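
The async loading flow described in the three commits above can be sketched as follows. This is a minimal usage sketch, not a confirmed signature: the parameter order and the `IProgress<float>` progress type are assumptions based on the commit messages.

```csharp
using System;
using System.Threading;
using LLama;
using LLama.Common;
using LLama.Exceptions;

var parameters = new ModelParams("path/to/model.gguf");
using var cts = new CancellationTokenSource(TimeSpan.FromMinutes(5));
var progress = new Progress<float>(p => Console.WriteLine($"Loading: {p:P0}"));

try
{
    // Assumed parameter order: model params, cancellation token, progress reporter.
    using var weights = await LLamaWeights.LoadFromFileAsync(parameters, cts.Token, progress);
    // ... create contexts and run inference ...
}
catch (OperationCanceledException)
{
    // Loading was cancelled through the token.
}
catch (LoadWeightsFailedException)
{
    // Loading failed for some other reason.
}
```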
Martin Evans c325ac9127
April 2024 Binary Update (#662)
* Updated binaries, using [this build](https://github.com/SciSharp/LLamaSharp/actions/runs/8654672719/job/23733195669) for llama.cpp commit `f7001ccc5aa359fcf41bba19d1c99c3d25c9bcc7`.

 - Added all new functions.
 - Moved some functions (e.g. `SafeLlamaModelHandle` specific functions) into `SafeLlamaModelHandle.cs`
 - Exposed tokens on `SafeLlamaModelHandle` and `LLamaWeights` through a `Tokens` property. As new special tokens are added in the future they can be added here.
 - Changed all token properties to return nullable tokens, to handle models which lack certain tokens.
 - Fixed `DefaultSamplingPipeline` to handle models with no newline token.

* Moved native methods to more specific locations.

 - Context specific things have been moved into `SafeLLamaContextHandle.cs` and made private - they're exposed through C# properties and methods already.
 - Checking that GPU layer count is zero if GPU offload is not supported.
 - Moved methods for creating default structs (`llama_model_quantize_default_params` and `llama_context_default_params`) into relevant structs.

* Removed exception if `GpuLayerCount > 0` when GPU is not supported.

* Added low level wrapper methods for new per-sequence state load/save in `SafeLLamaContextHandle`
 - Added high level wrapper methods (save/load with `State` object or memory mapped file) in `LLamaContext`
 - Moved native methods for per-sequence state load/save into `SafeLLamaContextHandle`

* Added update and defrag methods for KV cache in `SafeLLamaContextHandle`

* Updated submodule to `f7001ccc5aa359fcf41bba19d1c99c3d25c9bcc7`

* Passing the sequence ID when saving a single sequence state
2024-04-16 23:19:47 +01:00
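
One practical consequence of the nullable token properties introduced above: callers now have to check for special tokens that a model may not define. A minimal sketch, assuming a `Tokens.Newline` member (the member names are illustrative):

```csharp
using LLama;
using LLama.Common;
using LLama.Native;

var parameters = new ModelParams("path/to/model.gguf");
using var weights = LLamaWeights.LoadFromFile(parameters);

LLamaToken? newline = weights.Tokens.Newline; // assumed member name
if (newline is null)
{
    // This model defines no newline token, so anything consuming it must cope
    // (this is what the `DefaultSamplingPipeline` fix above addresses).
}
```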
Martin Evans 5b6e82a594 Improved the BatchedDecoding demo:
- Using `NativeHandle` directly in fewer places
 - Using `StreamingTokenDecoder` instead of the obsolete detokenize method
2024-01-20 17:39:50 +00:00
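
The `StreamingTokenDecoder` mentioned above accumulates tokens and decodes them incrementally, which copes with characters that span several tokens. A rough sketch with an assumed API shape (constructor and method names inferred from the commit message):

```csharp
using System.Collections.Generic;
using System.Text;
using LLama;
using LLama.Native;

// `weights` is a loaded model; `tokens` are tokens sampled during decoding.
static string Detokenize(LLamaWeights weights, IEnumerable<LLamaToken> tokens)
{
    var decoder = new StreamingTokenDecoder(Encoding.UTF8, weights); // assumed constructor
    foreach (var token in tokens)
        decoder.Add(token);

    return decoder.Read(); // assumed to return the text decoded so far
}
```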
Martin Evans 42be9b136d Switched from using raw integers to a `LLamaToken` struct 2024-01-02 20:47:21 +00:00
Martin Evans f860f88c36 Code cleanup driven by R# suggestions:
- Made `NativeApi` into a `static class` (it's not intended to be instantiated)
 - Moved `LLamaTokenType` enum out into a separate file
 - Made `LLamaSeqId` and `LLamaPos` into `record struct`s, which conveniently provides equality comparisons etc.
2024-01-02 03:20:21 +00:00
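
The motivation behind both the `LLamaToken` struct and the `record struct` conversions above can be shown with a standalone sketch. This is the general pattern rather than LLamaSharp's exact definitions:

```csharp
// A raw int can silently stand in for a token id, a sequence id or a position.
// Wrapping each in a record struct separates them at compile time and provides
// value equality and ToString for free.
public readonly record struct LLamaToken(int Value)
{
    public static explicit operator int(LLamaToken t) => t.Value;
    public static explicit operator LLamaToken(int v) => new(v);
}

public readonly record struct LLamaSeqId(int Value);
public readonly record struct LLamaPos(int Value);
```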
Martin Evans 2a1e1b6183 Removed unused imports 2023-12-20 15:47:09 +00:00
Martin Evans a2bae178fa Added a `Metadata` property to `LLamaWeights` 2023-12-20 15:45:24 +00:00
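
A quick sketch of what the `Metadata` property enables, assuming it exposes the model's key/value metadata as a read-only string dictionary:

```csharp
using System;
using LLama;
using LLama.Common;

using var weights = LLamaWeights.LoadFromFile(new ModelParams("path/to/model.gguf"));

// Assumed shape: IReadOnlyDictionary<string, string>.
foreach (var (key, value) in weights.Metadata)
    Console.WriteLine($"{key} = {value}");
```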
Martin Evans 3c5547b2b7 Reduced some uses of `NativeApi` in `BatchedDecoding` by adding some helper methods 2023-10-28 21:32:21 +01:00
Martin Evans f1e5a8f995 - Passing the `ILogger` through to every call to `CreateContext`
- Passing `ILogger` into executors
2023-10-19 21:09:44 +01:00
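
The logger propagation above suggests usage along these lines; treating `CreateContext` as accepting an optional `ILogger` is an assumption drawn from the commit message:

```csharp
using Microsoft.Extensions.Logging;
using LLama;
using LLama.Common;

using var factory = LoggerFactory.Create(builder => builder.AddConsole());
var logger = factory.CreateLogger("LLamaSharp");

var parameters = new ModelParams("path/to/model.gguf");
using var weights = LLamaWeights.LoadFromFile(parameters);

// Assumed: the logger is passed down so context creation and executors log through it.
using var context = weights.CreateContext(parameters, logger);
```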
sa_ddam213 4ec9aed47a
Revert LLamaSharp project changes 2023-10-20 08:29:26 +13:00
sa_ddam213 b4b4000342
Merge branch 'master' into upstream_master
# Conflicts:
#	LLama.Web/Common/ModelOptions.cs
#	LLama.Web/Services/ConnectionSessionService.cs
#	LLama/LLamaStatelessExecutor.cs
#	LLama/LLamaWeights.cs
2023-10-20 08:02:27 +13:00
Martin Evans 9daf586ba8 Assorted cleanup leftover after the huge change in the last PR (comments, syntax style, etc) 2023-10-19 00:26:30 +01:00
sa_ddam213 9b8de007dc Propagate ILogger 2023-10-04 13:47:08 +13:00
Martin Evans 669ae47ef7 - Split parameters into two interfaces
- params contains a list of loras, instead of just one
2023-09-30 16:21:18 +01:00
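
The split described above separates what is needed once to load the weights from what is needed each time a context is created. A conceptual sketch, with assumed interface names and members, just to illustrate the shape of the split:

```csharp
using System.Collections.Generic;

// Settings consumed when loading the weights, including a list of LoRA
// adapters rather than a single one (per the commit above).
public interface IModelParams
{
    string ModelPath { get; }
    IReadOnlyList<string> LoraAdapterPaths { get; } // assumed member
}

// Settings consumed whenever a context is created from those weights.
public interface IContextParams
{
    uint ContextSize { get; } // assumed member
}
```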
Martin Evans bca55eace0 Initial changes to match the llama.cpp changes 2023-09-29 01:18:21 +01:00
Martin Evans 93f24f8a51 Switched to properly typed `Encoding` property 2023-08-24 00:09:00 +01:00
Martin Evans 9fc17f3136 Fixed unit tests 2023-08-22 14:16:20 +01:00
Martin Evans a9e6f21ab8 - Creating and destroying contexts in the stateless executor, saving memory. It now uses zero memory when not inferring!
- Passing encoding in the `IModelParams`, which reduces how often encoding needs to be passed around
2023-08-22 01:30:13 +01:00
Martin Evans d0a7a8fcd6 - Cleaned up disposal in LLamaContext
- sealed some classes not intended to be extended
2023-08-13 01:10:08 +01:00
Martin Evans 20bdc2ec6f - Apply LoRA in `LLamaWeights.LoadFromFile`
- Sanity checking that weights are not disposed when creating a context from them
- Further simplified `Utils.InitLLamaContextFromModelParams`
2023-08-13 01:10:08 +01:00
Martin Evans e2fe08a9a2 Added a higher level `LLamaWeights` wrapper around `SafeLlamaModelHandle` 2023-08-13 01:10:08 +01:00
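
This earliest commit in the list introduces the wrapper that everything above builds on. The basic shape, as a hedged sketch (context creation from the weights is assumed here; at this point it still went through `Utils.InitLLamaContextFromModelParams`):

```csharp
using LLama;
using LLama.Common;

// LLamaWeights owns the underlying SafeLlamaModelHandle, so the native model
// is freed deterministically on Dispose, and multiple contexts can be created
// from one set of loaded weights.
var parameters = new ModelParams("path/to/model.bin");
using var weights = LLamaWeights.LoadFromFile(parameters);
using var context = weights.CreateContext(parameters); // assumed helper
```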