Commit Graph

65 Commits

Author SHA1 Message Date
Rinne b677cdc6a3
Merge pull request #560 from eublefar/feature/chat-session-state-management
Chat session state management
2024-03-26 11:44:29 +08:00
Martin Evans ad682fbebd
`BatchedExecutor.Create()` method (#613)
Replaced the `BatchedExecutor.Prompt(string)` method with a `BatchedExecutor.Create()` method. This improves the API in two ways:
 - A conversation can be created, without immediately prompting it
 - Other prompting overloads (e.g. prompt with token list) can be used without duplicating all the overloads onto `BatchedExecutor`

Added `BatchSize` property to `LLamaContext`
2024-03-20 02:20:35 +00:00
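A minimal sketch of the new flow, assuming a typical LLamaSharp setup (the model path is a placeholder):

```csharp
using LLama;
using LLama.Batched;
using LLama.Common;

// "model.gguf" is a placeholder; any GGUF model file works here.
var parameters = new ModelParams("model.gguf");
using var model = LLamaWeights.LoadFromFile(parameters);
using var executor = new BatchedExecutor(model, parameters);

// A conversation can now be created without immediately prompting it...
using var conversation = executor.Create();

// ...and prompted later with whichever overload fits (string shown here).
conversation.Prompt("The quick brown fox");
```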
eublefar a31391edd7 Polymorphic serialization for executor state and transforms 2024-03-17 15:34:36 +01:00
Martin Evans a8ba9f05b3
March Binary Update (#565)
* Updated binaries to llama.cpp `3ab8b3a92ede46df88bc5a2dfca3777de4a2b2b6` (build run: https://github.com/SciSharp/LLamaSharp/actions/runs/8118890586)

* Added abort callback

* Added properties to get/set thread count on `LLamaContext`

* Fixed LLamaLogLevel numbering
2024-03-06 15:19:42 +00:00
eublefar e05d5d4e14 Remove state-resetting ops and make SessionState.ExecutorState and SessionState.ContextState non-nullable 2024-03-02 20:07:17 +01:00
eublefar b2f7dbb39b AddPromptAsync method for stateful executors; chat session methods to initialize from history and to process system messages for pre-processing prompts. Executor state is serialized to JSON, to prevent saved states from being updated by reference. 2024-03-02 17:26:06 +01:00
eublefar 35153a77dd Chat session Get/Load in-memory state operations, reset state ops for stateful executors and context 2024-03-02 14:51:03 +01:00
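Plausibly the in-memory round-trip looks like this; the `GetSessionState`/`LoadSession` member names are assumptions beyond what the commit states, and `session` is an existing `ChatSession`:

```csharp
// Capture the session's full state (executor + context) in memory...
SessionState state = session.GetSessionState();

// ...mutate the session, then roll it back by loading the saved state.
session.LoadSession(state);
```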
Martin Evans 8ac1634233
Removed `llama_eval`. It is going to be completely removed in the next version of llama.cpp (#553) 2024-02-28 21:41:39 +00:00
Martin Evans 91a7967869
`ReadOnlySpan<float>` in ISamplingPipeline (#538)
* - Modified ISamplingPipeline to accept a `ReadOnlySpan<float>` of logits directly. This moves the responsibility for copying the logits into the pipeline.
 - Added a flag to `BaseSamplingPipeline` indicating whether a logit copy is necessary, so it can be skipped in most cases.

* Fixed `RestoreProtectedTokens` not working if logit processing is skipped

* - Implemented a new greedy sampling pipeline (always sample most likely token)
 - Moved `Grammar` into `BaseSamplingPipeline`
 - Removed "protected tokens" concept from `BaseSamplingPipeline`. Was introducing a lot of incidental complexity.
 - Implemented newline logit save/restore in `DefaultSamplingPipeline` (only place protected tokens was used)

* Implemented pipelines for mirostat v1 and v2
2024-02-25 02:12:00 +00:00
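The core of the greedy pipeline is simple enough to sketch directly: given the raw logits as a `ReadOnlySpan<float>`, take the argmax. (The real `ISamplingPipeline` has more members; this shows only the sampling step.)

```csharp
static int GreedySample(ReadOnlySpan<float> logits)
{
    // The most likely token is simply the index of the largest logit.
    var best = 0;
    for (var i = 1; i < logits.Length; i++)
        if (logits[i] > logits[best])
            best = i;
    return best;
}
```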
Martin Evans b0acecf080 Created a new `BatchedExecutor` which processes multiple "Conversations" in one single inference batch. This is faster, even when the conversations are unrelated, and is much faster if the conversations share some overlap (e.g. a common system prompt prefix).
Conversations can be "forked", to create a copy of a conversation at a given point. This allows e.g. prompting a conversation with a system prefix just once and then forking it again and again for each individual conversation. Conversations can also be "rewound" to an earlier state.

Added two new examples, demonstrating forking and rewinding.
2024-02-09 23:57:03 +00:00
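A rough sketch of the fork pattern, reusing the `executor` from the earlier example and assuming the pending batch must be inferred before a conversation can be forked:

```csharp
// Evaluate the shared system prefix exactly once...
using var root = executor.Create();
root.Prompt("You are a helpful assistant.");
await executor.Infer();

// ...then fork cheap copies of it for each individual conversation.
using var chatA = root.Fork();
using var chatB = root.Fork();
chatA.Prompt("User A: hello!");
chatB.Prompt("User B: hi there!");
```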
Martin Evans 15a98b36d8 Updated everything to work with llama.cpp ce32060198b7e2d6a13a9b8e1e1369e3c295ae2a 2024-02-01 16:35:05 +00:00
Martin Evans 5da2a2f64b - Removed one of the constructors of `SafeLLamaHandleBase`, which implicitly states that memory is owned. Better to be explicit about this kind of thing!
- Also fixed `ToString()` in `SafeLLamaHandleBase`
2024-01-31 18:01:03 +00:00
Martin Evans a2e29d393c Swapped `StatelessExecutor` to use `llama_decode`!
- Added `logits_i` argument to `Context.ApplyPenalty`
 - Added a new exception type for `llama_decode` return code
2024-01-20 21:18:35 +00:00
Martin Evans 99969e538e - Removed some unused `eval` methods.
- Added a `DecodeAsync` overload which runs the work in a task
 - Replaced some `NativeHandle` usage in `BatchedDecoding` with higher level equivalents.
 - Made the `LLamaBatch` grow when token capacity is exceeded, removing the need to manage token capacity externally.
2024-01-20 02:38:45 +00:00
Martin Evans 2eb52b1630 made casts to/from int explicit, fixed places affected 2024-01-02 20:57:37 +00:00
Martin Evans 42be9b136d Switched from using raw integers to a `LLamaToken` struct 2024-01-02 20:47:21 +00:00
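The shape of such a struct, as a sketch of the idea rather than the actual `LLamaToken` definition:

```csharp
// Wrapping the raw id in a struct makes every int conversion deliberate.
public readonly record struct LLamaToken(int Value)
{
    public static explicit operator int(LLamaToken token) => token.Value;
    public static explicit operator LLamaToken(int value) => new(value);
}

// Usage: both directions now require an explicit cast.
var token = (LLamaToken)42;
var raw = (int)token;
```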
Martin Evans f860f88c36 Code cleanup driven by R# suggestions:
- Made `NativeApi` into a `static class` (it's not intended to be instantiated)
 - Moved `LLamaTokenType` enum out into a separate file
 - Made `LLamaSeqId` and `LLamaPos` into `record struct`, convenient to have equality etc
2024-01-02 03:20:21 +00:00
Martin Evans dc8e5d88f7
Update LLama/LLamaContext.cs 2023-12-15 23:14:39 +00:00
Martin Evans 2df3e7617e Added a method to set the RNG seed on the context 2023-12-15 22:55:04 +00:00
Martin Evans b34f72a883 - Added `SamplingPipeline` to inference params which overrides all other options with an entirely custom pipeline.
- Added a `Sample` method to `LLamaContext` which uses a custom pipeline
 - Modified all executors to use the custom pipeline if it exists
2023-12-08 01:02:27 +00:00
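Presumably used along these lines; `DefaultSamplingPipeline` (from a later commit) is shown as an assumed example, but any `ISamplingPipeline` implementation would do:

```csharp
using LLama.Common;
using LLama.Sampling;

// When SamplingPipeline is set, executors ignore the other sampling
// options and use the custom pipeline instead.
var inferenceParams = new InferenceParams
{
    SamplingPipeline = new DefaultSamplingPipeline(),
};
```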
Martin Evans 16ab33ba3c Added Obsolete markings to all `Eval` overloads 2023-11-17 01:53:49 +00:00
Martin Evans d743516070 - Added support for the MinP sampler
- Cleaned up comments in implementations of `IInferenceParams`
 - Removed default values for all parameters in `LLamaContext.Sample` - they're never used and probably _shouldn't_ ever be used
2023-11-12 00:05:18 +00:00
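For reference, the MinP rule keeps only candidates whose probability is at least `MinP` times the top candidate's probability. A standalone sketch of that filter (not the library's implementation):

```csharp
using System.Collections.Generic;
using System.Linq;

static IEnumerable<(int Token, float Prob)> MinPFilter(
    IReadOnlyList<(int Token, float Prob)> candidates, float minP)
{
    // The threshold is relative to the most likely candidate.
    var threshold = candidates.Max(c => c.Prob) * minP;
    return candidates.Where(c => c.Prob >= threshold);
}
```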
Martin Evans dcc82e582e Fixed `Eval` on platforms < dotnet 5 2023-10-29 15:12:41 +00:00
Martin Evans e81b3023d5 Rewritten sampling API to be accessed through the `LLamaTokenDataArray` object 2023-10-28 21:32:21 +01:00
Martin Evans a024d2242e It works!
Had to update the binary to `b1426`
2023-10-28 21:32:21 +01:00
Martin Evans 36c71abcfb Fixed repeated `LLama.StreamingTokenDecoder` type-name spam in all executors except Stateless. 2023-10-25 13:57:00 +01:00
Martin Evans 51d4411a58 Added two new classes for detokenization tasks:
- `AntipromptProcessor` accepts chunks of text and returns a value indicating if any antiprompt has been detected.
 - `StreamingTokenDecoder` decodes tokens into text, maintaining some internal state to handle single characters which are encoded as multiple tokens.

Added tests for these classes and updated StatelessExecutor to use them.

Removed most DeTokenize methods, marked the rest as obsolete (should always use a `StreamingTokenDecoder`).
2023-10-23 00:33:50 +01:00
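A sketch of how the two classes fit together, assuming an existing `context` and a stream of generated tokens (member names beyond the class names are assumptions):

```csharp
var decoder = new StreamingTokenDecoder(context);
var antiprompts = new AntipromptProcessor(new[] { "User:" });

foreach (var token in generatedTokens)
{
    // Tokens go in one at a time; Read() only returns characters that
    // have been fully decoded (one character may span multiple tokens).
    decoder.Add(token);
    var text = decoder.Read();

    // Stop generating once any antiprompt has been detected.
    if (antiprompts.Add(text))
        break;
}
```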
Martin Evans efdf3d630c - Removed all `TokenToString` methods (it's never correct to use them, because sometimes one single character may be represented by multiple tokens).
- Built a new (hacky) `Detokenize` method which handles this
2023-10-22 21:43:36 +01:00
Martin Evans 9daf586ba8 Assorted cleanup leftover after the huge change in the last PR (comments, syntax style, etc) 2023-10-19 00:26:30 +01:00
Martin Evans d8434ea9d6
Merge pull request #185 from martindevans/wip_major_api_change
Major llama.cpp API Change
2023-10-18 20:50:32 +01:00
Martin Evans 1f8c94e386 Added in the `special` parameter to the tokenizer (introduced in https://github.com/ggerganov/llama.cpp/pull/3538) 2023-10-17 23:55:46 +01:00
Martin Evans 669ae47ef7 - Split parameters into two interfaces
- params contains a list of LoRAs, instead of just one
2023-09-30 16:21:18 +01:00
Martin Evans 9a0a0ae9fe Removed cloning support 2023-09-30 15:48:26 +01:00
Martin Evans 0d40338692 Fixed out-of-context handling in stateless executor 2023-09-29 23:53:07 +01:00
Martin Evans ce1fc51163 Added some more native methods 2023-09-29 16:05:19 +01:00
Martin Evans bca55eace0 Initial changes to match the llama.cpp changes 2023-09-29 01:18:21 +01:00
Martin Evans 08f1615e60 - Converted LLamaStatelessExecutor to run `Exec` calls inside an awaited task. This unblocks async callers while the model is being evaluated.
- Added a "spinner" to the `StatelessModeExecute` demo, which spins while waiting for the next token (demonstrating that it's not blocked).
2023-09-23 15:22:57 +01:00
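The practical effect for callers is that streaming no longer blocks the calling thread; something along these lines, with setup assumed:

```csharp
// Each token is awaited as it is produced, leaving the caller free to
// update a spinner, UI, etc. between tokens.
await foreach (var text in executor.InferAsync(prompt, inferenceParams))
    Console.Write(text);
```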
redthing1 b78044347c
fix opaque GetState (fixes #176) 2023-09-18 20:56:14 -07:00
Martin Evans 466722dcff
Merge pull request #165 from martindevans/better_instruct_antiprompt_checking
better_instruct_antiprompt_checking
2023-09-11 00:32:43 +01:00
Martin Evans d08a125020 Using the `TokensEndsWithAnyString` extensions for antiprompt checking in instruct executor. Simpler and more efficient. 2023-09-11 00:22:17 +01:00
Martin Evans bba801f4b7 Added a property to get the KV cache size from a context 2023-09-11 00:10:08 +01:00
Martin Evans 4dac142bd5
Merge pull request #160 from martindevans/GetState_fix
`GetState()` fix
2023-09-09 01:44:08 +01:00
Martin Evans 832bf7dbe0 Simplified implementation of `GetState` and fixed a memory leak (`bigMemory` was never freed) 2023-09-09 01:30:35 +01:00
Martin Evans 4f7b6ffdcc Removed `GenerateResult` method that was only used in one place 2023-09-09 01:09:27 +01:00
sa_ddam213 949b0cde16
Replace ILLamaLogger with ILogger 2023-09-09 10:13:07 +12:00
Martin Evans 31287b5e6e Rewritten TokenToSpan/TokenToString to better fit the new way it's done in llama.cpp with a few different options:
- Just convert it to a `string`, nice and simple
 - Write the bytes to a `Span<byte>` (no allocations)
 - Write the chars to a `StringBuilder` (potentially no allocations)
2023-08-27 00:15:56 +01:00
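The three options sketched as calls; the signatures here are assumptions, not the exact ones added:

```csharp
// 1. Simple: allocate and return a string.
string text = model.TokenToString(token, Encoding.UTF8);

// 2. Zero-allocation: write the raw bytes into a caller-supplied span.
Span<byte> bytes = stackalloc byte[32];
var written = model.TokenToSpan(token, bytes);

// 3. Potentially zero-allocation: append the characters to a StringBuilder.
var builder = new StringBuilder();
model.TokenToString(token, Encoding.UTF8, builder);
```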
Martin Evans 0c98ae1955 Passing ctx to `llama_token_nl(_ctx)` 2023-08-27 00:15:55 +01:00
Martin Evans 826c6aaec3 cleaned up higher level code using the sampling API:
- Fixed multiple enumeration
 - Fixed newline penalisation
2023-08-26 21:47:41 +01:00
Martin Evans a911b77dec Various minor changes, resolving about 100 ReSharper code quality warnings 2023-08-24 23:15:53 +01:00
Martin Evans 5a6c6de0dc
Merge pull request #115 from martindevans/model_params_record
ModelsParams record class
2023-08-24 22:54:23 +01:00