Commit Graph

535 Commits

Author SHA1 Message Date
Rinne b4317eebbe
Merge pull request #632 from AsakusaRinne/master
Release version 0.11.0
2024-04-01 02:59:44 +08:00
Rinne d67658a0d6
docs: update the information to v0.11.0. 2024-04-01 01:38:40 +08:00
evolcano 353412923f Merge branch 'master' of https://github.com/SciSharp/LLamaSharp 2024-03-30 10:55:42 +08:00
evolcano 9d091c0316 Add path to find llama.dll for MAUI
This commit is originally made by lcarrere in https://github.com/SciSharp/LLamaSharp/issues/180 .

I have confirmed this modification works on my Windows 11 laptop, and I made this commit at the request of AsakusaRinne.
2024-03-30 10:54:44 +08:00
SignalRT 43677c511c Change interface to support multiple images and add the capability to render the image in the console 2024-03-26 23:19:32 +01:00
SignalRT 2d9a114f66 Include comments and some checks 2024-03-26 23:19:32 +01:00
SignalRT 8907adcd8e Clean up duplicate property 2024-03-26 23:19:32 +01:00
SignalRT e8732efadd Example InteractiveExecutor
Add an Example and modifications to the interactive executor to enable Llava Models.

Just a preview / demo
2024-03-26 23:19:32 +01:00
Rinne b677cdc6a3
Merge pull request #560 from eublefar/feature/chat-session-state-management
Chat session state management
2024-03-26 11:44:29 +08:00
Martin Evans e2705be6c8
Fixed off by one error in LLamaBatch sampling position (#626) 2024-03-25 22:56:26 +00:00
Martin Evans 91d72e7465
Keeping track of positions where logits will be generated in a batch and what sequence those logits are associated with. (#624) 2024-03-25 21:02:48 +00:00
eublefar b8cd5b7ee5 loadTransforms flag for LoadSession methods 2024-03-21 12:18:38 +01:00
eublefar 9440f153da Make process message method more flexible 2024-03-21 12:14:15 +01:00
Martin Evans 268f3a6b07
BatchedExecutor Fixed Forking (#621)
* Previously, when a conversation was forked, both the parent and the child shared exactly the same logits. Since sampling is allowed to modify logits, this could lead to issues in sampling (e.g. one conversation is sampled and overwrites the logits with all zeros, then the second conversation is sampled and generates nonsense). Fixed this by setting a "forked" flag; logits are copied if this flag is set. The flag is cleared the next time the conversation is prompted, so this extra copying only happens once after a fork occurs.

* Removed finalizer from `BatchedExecutor`. This class does not directly own any unmanaged resources so it is not necessary.
2024-03-20 16:36:01 +00:00
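The copy-on-fork fix above can be sketched language-agnostically (the real change is C# inside `BatchedExecutor`/`Conversation`; all names below are illustrative, not LLamaSharp's API):

```python
# Hypothetical sketch of the "forked" flag described above: a forked
# conversation shares its parent's logits buffer, and the first access
# after the fork copies the buffer so in-place mutation stays safe.

class Conversation:
    def __init__(self, logits, forked=False):
        self._logits = logits          # possibly shared with a parent/child
        self._forked = forked

    def fork(self):
        # Child shares the same logits buffer; both sides are flagged so
        # the first sampling access triggers a one-time copy.
        self._forked = True
        return Conversation(self._logits, forked=True)

    def sample_logits(self):
        # Copy once if this conversation shares its buffer; the flag is
        # cleared so later accesses skip the extra copy.
        if self._forked:
            self._logits = list(self._logits)
            self._forked = False
        return self._logits  # now safe to mutate in place
```

Mutating the child's logits after a fork no longer corrupts the parent's, at the cost of one copy per fork rather than one per sample.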
Martin Evans ad682fbebd
`BatchedExecutor.Create()` method (#613)
Replaced `BatchedExecutor.Prompt(string)` method with `BatchedExecutor.Create()` method. This improves the API in two ways:
 - A conversation can be created, without immediately prompting it
 - Other prompting overloads (e.g. prompt with token list) can be used without duplicating all the overloads onto `BatchedExecutor`

Added `BatchSize` property to `LLamaContext`
2024-03-20 02:20:35 +00:00
Martin Evans 024787225b
`SetDllImportResolver` based loading (#603)
- Modified library loading to be based on `SetDllImportResolver`. This replaces the built in loading system and ensures there can't be two libraries loaded at once.
 - llava and llama are loaded separately, as needed.
 - All the previous loading logic is still used, within the `SetDllImportResolver`
 - Split out CUDA, AVX and MacOS paths to separate helper methods.
 - `Description` now specifies if it is for `llama` or `llava`
2024-03-17 19:54:20 +00:00
eublefar d88f9e1199 Return null executor state if it's serialized in an old way 2024-03-17 16:22:25 +01:00
eublefar 00c873a197 Avoid saving empty context state in binary format; it somehow interferes with llama.cpp 2024-03-17 15:55:35 +01:00
eublefar a31391edd7 Polymorphic serialization for executor state and transforms 2024-03-17 15:34:36 +01:00
eublefar 6f76d77350 Make text transform interfaces have explicit copy operation 2024-03-17 12:37:02 +01:00
eublefar 5f3803d23c Make state editable by the user, add deepcopy to fields that require it 2024-03-17 12:21:52 +01:00
eublefar 87fe982f10 Change method signature as suggested 2024-03-17 12:11:19 +01:00
eublefar af796fc3e9 Change List types in executor state to arrays to enforce copy on get/set operations 2024-03-17 11:58:26 +01:00
jlsantiago 3b2836eac4
Llava api (#563)
* Add llava_binaries, update all binaries to make the test

* Llava API + LlavaTest

Preliminary

* First prototype of Load + Unit Test

* Temporarily run tests on the LlavaAPI branch

* Disable Embed test to review the rest of the test

* Restore Embedding test

* Use BatchThread to eval image embeddings

Test the default Threads value to ensure it doesn't cause problems.

* Rename test file

* Update action versions

* Test only one method, no release embeddings

* Revert "Test only one method, no release embeddings"

This reverts commit 264e176dccc9cd0be318b800ae5e102a4635d01c.

* Correct API call

* Only test llava related functionality

* CUDA and CLBlast binaries

* Restore build policy

* Changes related with code review

* Add SafeHandles

* Set overwrite to upload-artifact@v4

* Revert to upload-artifact@v3

* revert to upload-artifact@v3
2024-03-13 22:10:44 +00:00
Martin Evans ce4de7d607
llama_decode lock (#595)
* Added a lock object into `SafeLlamaModelHandle` which all calls to `llama_decode` (in the `SafeLLamaContextHandle`) lock first. This prevents two contexts from running inference on the same model at the same time, which seems to be unsafe in llama.cpp.

* Modified the lock to be global over _all_ inferences. This seems to be necessary (at least with the CUDA backend).
2024-03-13 00:33:16 +00:00
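The global-lock approach in the commit above can be illustrated with a small sketch (the real change lives in LLamaSharp's C# `SafeLLamaContextHandle`; the Python names here are hypothetical):

```python
# Sketch of the llama_decode lock described above: one lock shared by
# ALL contexts, so no two decode calls run concurrently, even against
# different contexts (which appears unsafe in llama.cpp, at least on
# the CUDA backend).

import threading

_GLOBAL_DECODE_LOCK = threading.Lock()  # shared by every context

class Context:
    def __init__(self, name):
        self.name = name
        self.decoded = []

    def decode(self, batch):
        # All inference, on any context, funnels through one lock.
        with _GLOBAL_DECODE_LOCK:
            self.decoded.append(batch)
            return len(self.decoded)
```

The trade-off is serialized inference across contexts in exchange for safety; a per-model lock (the first version of the fix) was not enough.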
Clovis Henrique Ribeiro d0f79814e9
Added conditional compilation code to progress_callback (in LlamaModelParams struct) so the struct plays nice with legacy NET Framework 4.8 (#593) 2024-03-11 14:36:50 +00:00
Martin Evans f0b0bbcbb7
Mutable Logits (#586)
Modified LLamaBatch to not share tokens with other sequences if logits is true. This ensures that the logit span at the end is used by exactly one sequence, so it is safe to mutate. This removes the need for copying _very_ large arrays (vocab size) and simplifies sampling pipelines.
2024-03-10 13:56:11 +00:00
Martin Evans a8ba9f05b3
March Binary Update (#565)
* Updated binaries to llama.cpp `3ab8b3a92ede46df88bc5a2dfca3777de4a2b2b6` (build run: https://github.com/SciSharp/LLamaSharp/actions/runs/8118890586)

* Added abort callback

* Added properties to get/set thread count on `LLamaContext`

* Fixed LLamaLogLevel numbering
2024-03-06 15:19:42 +00:00
dependabot[bot] 4068a6f03b
build(deps): bump System.Text.Json from 8.0.1 to 8.0.2
Bumps [System.Text.Json](https://github.com/dotnet/runtime) from 8.0.1 to 8.0.2.
- [Release notes](https://github.com/dotnet/runtime/releases)
- [Commits](https://github.com/dotnet/runtime/compare/v8.0.1...v8.0.2)

---
updated-dependencies:
- dependency-name: System.Text.Json
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-03-04 07:04:42 +00:00
Martin Evans defac000ad
Added a `%(RecursiveDir)` element to the props file, this causes files to be copied along with the folder structure rather than dumped into the root. (#561) 2024-03-03 17:58:50 +00:00
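The kind of props-file change described above can be illustrated with a minimal MSBuild fragment (item names and paths here are hypothetical, not LLamaSharp's actual props file; `%(RecursiveDir)` is standard MSBuild well-known item metadata):

```xml
<!-- Illustrative .props fragment: %(RecursiveDir) preserves the
     sub-folder each matched file was found in, so files are copied
     with their folder structure instead of dumped into the root. -->
<ItemGroup>
  <None Include="runtimes\**\*.*">
    <Link>runtimes\%(RecursiveDir)%(Filename)%(Extension)</Link>
    <CopyToOutputDirectory>PreserveNewest</CopyToOutputDirectory>
  </None>
</ItemGroup>
```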
eublefar e05d5d4e14 Remove state-resetting ops and make SessionState.ExecutorState and SessionState.ContextState non-nullable 2024-03-02 20:07:17 +01:00
eublefar b2f7dbb39b AddPromptAsync method for stateful executors, plus chat-session methods to initialize from history and to process system messages when pre-processing prompts. Executor state is serialized to JSON to avoid saved states being updated by reference. 2024-03-02 17:26:06 +01:00
eublefar 35153a77dd Chat session Get/Load in-memory state operations, reset state ops for stateful executors and context 2024-03-02 14:51:03 +01:00
Martin Evans 8ac1634233
Removed `llama_eval`. It is going to be completely removed in the next version of llama.cpp (#553) 2024-02-28 21:41:39 +00:00
Martin Evans f0e7e7cc0a
Removed `SamplingApi`. It has been marked as Obsolete for a while, replaced by instance methods on `LLamaTokenDataArray` (#552) 2024-02-28 19:30:53 +00:00
Martin Evans 7d84625a67
Classifier Free Guidance (#536)
* Added a `Guidance` method to `LLamaTokenDataArray` which applies classifier free guidance

* Factored out a safer `llama_sample_apply_guidance` method based on spans

* Created a guided sampling demo using the batched executor

* fixed comment, "classifier free" not "context free"

* Rebased onto master and fixed breakage due to changes in `BaseSamplingPipeline`

* Asking user for guidance weight

* Progress bar in batched fork demo

* Improved fork example (using tree display)

* Added proper disposal of resources in batched examples

* Added some more comments in BatchedExecutorGuidance
2024-02-26 15:41:57 +00:00
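The guidance method above blends logits from the main context with logits from a guidance ("negative prompt") context. A hedged sketch of that blend, in the form llama.cpp's guidance sampler uses (this is an illustration, not LLamaSharp's actual `Guidance` method):

```python
# Classifier-free guidance over logits: move the main logits away from
# the guidance-context logits, controlled by a weight.

def apply_guidance(logits, guidance_logits, scale):
    """Return guided logits: g + scale * (l - g), element-wise.

    scale = 1.0 leaves the logits unchanged; larger values push the
    distribution further away from the guidance context.
    """
    return [g + scale * (l - g) for l, g in zip(logits, guidance_logits)]
```

With `scale = 1.0` the guidance context has no effect, which is why the demo asks the user for a guidance weight.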
Martin Evans 91a7967869
`ReadOnlySpan<float>` in ISamplingPipeline (#538)
* - Modified ISamplingPipeline to accept `ReadOnlySpan<float>` of logits directly. This moves responsibility to copy the logits into the pipeline.
 - Added a flag to `BaseSamplingPipeline` indicating if a logit copy is necessary. Skipping it in most cases.

* Fixed `RestoreProtectedTokens` not working if logit processing is skipped

* - Implemented a new greedy sampling pipeline (always sample most likely token)
 - Moved `Grammar` into `BaseSamplingPipeline`
 - Removed "protected tokens" concept from `BaseSamplingPipeline`. Was introducing a lot of incidental complexity.
 - Implemented newline logit save/restore in `DefaultSamplingPipeline` (only place protected tokens was used)

* Implemented pipelines for mirostat v1 and v2
2024-02-25 02:12:00 +00:00
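The "greedy sampling pipeline" introduced above always samples the most likely token, i.e. an argmax over the logits. A minimal sketch (names are illustrative; the real pipeline is a C# `ISamplingPipeline` implementation):

```python
# Greedy sampling: pick the index of the highest logit. No temperature,
# no top-k/top-p - deterministic given the logits.

def greedy_sample(logits):
    """Return the token index with the highest logit value."""
    best_token, best_value = 0, float("-inf")
    for token, value in enumerate(logits):
        if value > best_value:
            best_token, best_value = token, value
    return best_token
```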
Scott W Harden a6394001a1
NativeLibraryConfig: WithLogs(LLamaLogLevel) (#529)
Adds a NativeLibraryConfig.WithLogs() overload to let the user indicate the log level (with "info" as the default)
2024-02-21 23:51:09 +00:00
Scott W Harden 4c3077d0f0
ChatSession: improve exception message
The original message contained the word "preceeded" which should be spelled as "preceded"
2024-02-19 17:50:37 -05:00
Martin Evans c7d0dc915a Assorted small changes to clean up some code warnings 2024-02-17 23:07:10 +00:00
Martin Evans 174f21a385 0.10.0 2024-02-15 14:40:56 +00:00
Martin Evans d03c1a9201
Merge pull request #503 from martindevans/batched_executor_again
Introduced a new `BatchedExecutor`
2024-02-15 14:26:57 +00:00
Martin Evans d47b6afe4d Normalizing embeddings in `LLamaEmbedder`. As is done in llama.cpp: 2891c8aa9a/examples/embedding/embedding.cpp (L92) 2024-02-13 02:09:35 +00:00
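The normalization referenced above divides each embedding by its Euclidean norm, as llama.cpp's embedding example does. An illustrative sketch (`LLamaEmbedder` does the equivalent in C#):

```python
# L2-normalize an embedding vector so it has unit length, which makes
# cosine similarity a plain dot product between embeddings.

import math

def normalize_embedding(embedding):
    """Return the vector divided by its Euclidean (L2) norm."""
    norm = math.sqrt(sum(x * x for x in embedding))
    if norm == 0.0:
        return list(embedding)  # avoid division by zero on a zero vector
    return [x / norm for x in embedding]
```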
Martin Evans e9d9042576 Added `Divide` to `KvAccessor` 2024-02-12 15:54:13 +00:00
Martin Evans 1cc463b9b7 Added a finalizer to `BatchedExecutor` 2024-02-12 15:34:52 +00:00
Martin Evans 0c2cff0e1c Added a Finalizer for `Conversation` in case it is not correctly disposed. 2024-02-12 02:58:35 +00:00
Martin Evans 949861a581 - Added a `Modify` method to `Conversation`. This grants **temporary** access to directly modify the KV cache.
- Re-implemented `Rewind` as an extension method using `Modify` internally
 - Implemented `ShiftLeft`, which shifts everything over except for some starting tokens. This is the same as the `StatelessExecutor` out-of-context handling.
 - Starting batch at epoch 1, this ensures that conversations (starting at zero) are below the current epoch. It also means `0` can always be used as a value guaranteed to be below the current epoch.
2024-02-11 23:20:05 +00:00
Martin Evans b0acecf080 Created a new `BatchedExecutor` which processes multiple "Conversations" in one single inference batch. This is faster, even when the conversations are unrelated, and is much faster if the conversations share some overlap (e.g. a common system prompt prefix).
Conversations can be "forked", to create a copy of a conversation at a given point. This allows e.g. prompting a conversation with a system prefix just once and then forking it again and again for each individual conversation. Conversations can also be "rewound" to an earlier state.

Added two new examples, demonstrating forking and rewinding.
2024-02-09 23:57:03 +00:00
Martin Evans 90915c5a99 Added increment and decrement operators to `LLamaPos` 2024-02-07 17:04:57 +00:00
Martin Evans 82c471eac4
Merge pull request #500 from martindevans/improved_kv_cache_methods
Small KV Cache Handling Improvements
2024-02-07 16:54:32 +00:00