140 Commits

Author SHA1 Message Date
Joshua Lochner 0af1e2f3c2
Remove old import from `stream/web` for `ReadableStream` (#752)
Node 16 is no longer supported
2024-05-20 18:05:48 +02:00
Joshua Lochner 992f643e2a [version] Update to 2.17.1 2024-04-18 17:25:49 +02:00
Joshua Lochner 642743136e [version] Update to 2.17.0 2024-04-11 01:20:24 +02:00
Joshua Lochner aa542cf548
Update dependencies (#704)
* Update dependencies

* Bump protobufjs from 7.2.4 to 7.2.6 in /examples/semantic-image-search (#705)

Bumps [protobufjs](https://github.com/protobufjs/protobuf.js) from 7.2.4 to 7.2.6.
- [Release notes](https://github.com/protobufjs/protobuf.js/releases)
- [Changelog](https://github.com/protobufjs/protobuf.js/blob/master/CHANGELOG.md)
- [Commits](https://github.com/protobufjs/protobuf.js/compare/protobufjs-v7.2.4...protobufjs-v7.2.6)

---
updated-dependencies:
- dependency-name: protobufjs
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump protobufjs from 7.2.4 to 7.2.6 in /examples/next-client (#706)

Bumps [protobufjs](https://github.com/protobufjs/protobuf.js) from 7.2.4 to 7.2.6.
- [Release notes](https://github.com/protobufjs/protobuf.js/releases)
- [Changelog](https://github.com/protobufjs/protobuf.js/blob/master/CHANGELOG.md)
- [Commits](https://github.com/protobufjs/protobuf.js/compare/protobufjs-v7.2.4...protobufjs-v7.2.6)

---
updated-dependencies:
- dependency-name: protobufjs
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump protobufjs from 7.2.4 to 7.2.6 in /examples/next-server (#707)

Bumps [protobufjs](https://github.com/protobufjs/protobuf.js) from 7.2.4 to 7.2.6.
- [Release notes](https://github.com/protobufjs/protobuf.js/releases)
- [Changelog](https://github.com/protobufjs/protobuf.js/blob/master/CHANGELOG.md)
- [Commits](https://github.com/protobufjs/protobuf.js/compare/protobufjs-v7.2.4...protobufjs-v7.2.6)

---
updated-dependencies:
- dependency-name: protobufjs
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump protobufjs in /examples/semantic-image-search-client (#708)

Bumps [protobufjs](https://github.com/protobufjs/protobuf.js) from 7.2.4 to 7.2.6.
- [Release notes](https://github.com/protobufjs/protobuf.js/releases)
- [Changelog](https://github.com/protobufjs/protobuf.js/blob/master/CHANGELOG.md)
- [Commits](https://github.com/protobufjs/protobuf.js/compare/protobufjs-v7.2.4...protobufjs-v7.2.6)

---
updated-dependencies:
- dependency-name: protobufjs
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-04-11 00:24:51 +02:00
Joshua Lochner d50b3193fb [version] Update to 2.16.1 2024-03-20 16:44:39 +02:00
Joshua Lochner f0ef2e8eee
Update tokenizer `apply_chat_template` functionality (#647)
* Allow custom kwargs in `tokenizer.apply_chat_template`

* Update jinja dependency version

* Add `tokenizer_kwargs` options

* Add support for dictionaries of chat templates in the tokenizer config

* Add `CohereTokenizer`

* `apply_chat_template` is no longer async

* Add unit test for multiple chat templates

* Update tokenizers.js

* Also update when `chat_template` is undefined

* Support setting tokenizer and text from URL

* Update Claude tokenizer display name

* Add Cohere Command-R tokenizer to playground

* Add `Grok1Tokenizer`

* Throw error if chat template object is malformed

* Improved error checking

* Remove redundant error check

* `template_dict` can be a null-prototype object
2024-03-20 15:22:01 +02:00
Joshua Lochner 314b7f0dc4 [version] Update to 2.16.0 2024-03-07 15:45:22 +02:00
Joshua Lochner 8694a856ee
Bump `@huggingface/jinja` (#629) 2024-03-07 01:59:06 +02:00
Joshua Lochner 2e53f51ce7
Update `@huggingface/jinja` -> 0.2.0 (#627) 2024-03-05 17:04:08 +02:00
Joshua Lochner 7772d1db0a [version] Update to 2.15.1 2024-02-21 16:26:05 +02:00
Joshua Lochner 68ed7f6cbb
Add Gemma Tokenizer (#598)
* Fix styling for whitespace tokens

* Add `GemmaTokenizer`

* Update minimum `@huggingface/jinja` version

* Add Gemma to tokenizer playground

* Add Gemma tokenizer unit test

* Update tokenizer names in playground

* Update Gemma tokenizer test
2024-02-21 16:22:22 +02:00
Joshua Lochner 41f98b761f [version] Update to 2.15.0 2024-02-06 15:06:50 +02:00
Joshua Lochner dbeb314323
Update `jsdoc-to-markdown` dev dependency (#574) 2024-02-06 14:59:51 +02:00
Joshua Lochner 9f877eea95 [version] Update to 2.14.2 2024-01-29 14:02:11 +02:00
Joshua Lochner a2fcd110a3 [version] Update to 2.14.1 2024-01-25 15:26:57 +02:00
Joshua Lochner 5b5aa4cf6a [version] Update to 2.14.0 2024-01-10 18:30:36 +02:00
Joshua Lochner 07df34ff33 [version] Update to 2.13.4 2024-01-04 19:08:32 +02:00
Joshua Lochner f3482baa51 [version] Update to 2.13.3 2024-01-04 02:14:10 +02:00
Joshua Lochner 733f98277d [version] Update to 2.13.2 2024-01-03 16:41:35 +02:00
Joshua Lochner e8d1236c11 [version] Update to 2.13.1 2024-01-03 12:59:13 +02:00
Joshua Lochner 61459e38d8 [version] Update to 2.13.0 2023-12-27 16:27:51 +02:00
Joshua Lochner 0bf6e6712f [version] Update to 2.12.1 2023-12-18 23:25:00 +02:00
Joshua Lochner 1427125dc3
Update jinja dependency (#459)
* Make `@huggingface/jinja` a dependency

* Update package-lock.json

* Update JSDoc
2023-12-18 23:22:24 +02:00
Joshua Lochner 81aab022ff [version] Update to 2.12.0 2023-12-18 17:04:41 +02:00
Joshua Lochner d4f7cd5024
Add support for chat templates (#408)
* Add basic support for chat templates

* Cleanup

* JSDoc improvements

* Support conversion of user-defined functions

* Cleanup

* Fix function creation

* Add unit tests for templates

* Cleanup

* Improve JSDoc

* Add missing return types

* Add chat templates docs to table of contents

* Add support for logical negation

* Fix nested logical negation

* Add unit tests for logical operators

* Add loop variables

* Add support for `RuntimeValue` built-in functions

* Add unit tests for string instance methods

* Fix conversion of normal function to `FunctionValue`

* Update object method unit tests

* Save chat template to tokenizer_config.json during conversion

* Fix `raise_exception` error

* Add `!=` operator for booleans

* Remember to increment loop index

* Cleanup for loop evaluator

* Use `is` helper function

* Add support for text nodes

i.e., non-Jinja statements/expressions

* Add auto-generated templating tests

* Update unit tests

* Remove unused function

* Add default chat templates

* Use repo with up-to-date tokenizer config

* Temporarily disable zephyr test

* Delete templates.test.js

* Move Jinja functionality to `@huggingface/jinja`

* Fix template cache type

* Update chat template unit tests

* Update `@huggingface/jinja` version

* Fix default llama2 system prompt usage

* Add unit test for llama2 w/o chat template set

* Update jinja version

* Update jinja version

* Add unit test for user-defined chat templates

Example from https://discuss.huggingface.co/t/issue-with-llama-2-chat-template-and-out-of-date-documentation/61645/3

* Add `AddedToken` for improved tokenization

* Add example usage for chat templates

* Add 'first' Metaspace pretokenizer prepend scheme

* Formatting

* Update wav2vec2 converter special tokens whitespace split

* Fix Metaspace pretokenizer split criteria

* Update inputs of `PreTokenizerSequence`

* Improve Metaspace pretokenizer

* Update llama tokenizer tests

* Improve handling of legacy llama tokenizer

* Re-enable SPM tests

* Add static tokenizer test cases

* Add llama2 static tests

* Allow user to override legacy tokenizer behaviour in `.from_pretrained`

* Add legacy tokenizer unit tests

* Bump jinja version to 0.1.0
2023-12-18 17:00:50 +02:00
Joshua Lochner 6129e45b2b [version] Update to 2.11.0 2023-12-13 15:19:12 +02:00
Joshua Lochner cb8a5961df [version] Update to 2.10.1 2023-12-06 18:48:41 +02:00
Joshua Lochner 57487744e7 [version] Update to 2.10.0 2023-12-05 15:34:53 +02:00
Joshua Lochner 768a2e26d7 [version] Update to 2.9.0 2023-11-21 14:51:11 +02:00
Joshua Lochner b8719b12dd
Ensure WASM fallback does not crash in GH actions (#402)
* Ensure WASM fallback does not crash in GH actions

* Add unit test for WordPiece `max_input_chars_per_word`

* Cleanup

* Set max test concurrency to 1
2023-11-19 08:06:49 +02:00
Joshua Lochner c98073042f [version] Update to 2.8.0 2023-11-09 18:00:58 +02:00
Kit PANG e457c4b5d0
Upgrade typescript dependency version (#368)
Fix typegen incorrectly outputting `const` for non-readonly fields
2023-10-24 19:52:24 +02:00
Joshua Lochner d0915294ae [version] Update to 2.7.0 2023-10-23 16:39:27 +02:00
Joshua Lochner 5b31129218 [version] Update to 2.6.2 2023-09-27 15:15:09 +02:00
Joshua Lochner b3a2a5b00f [version] Update to 2.6.1 2023-09-18 14:56:06 +02:00
Joshua Lochner ad7e8758bc [version] Update to 2.6.0 2023-09-08 15:41:59 +02:00
Joshua Lochner 0c2dcc7498 [version] Update to 2.5.4 2023-08-28 20:07:06 +02:00
Joshua Lochner 7076c8e401 [version] Update to 2.5.3 2023-08-22 23:31:00 +02:00
Joshua Lochner 254e99ef9a [version] Update to 2.5.2 2023-08-14 22:55:54 +02:00
Joshua Lochner b420a8841e [version] Update to 2.5.1 2023-08-09 22:25:53 +02:00
Joshua Lochner 9aa1a29dac [version] Update to 2.5.0 2023-08-01 14:24:56 +02:00
Joshua Lochner 27920d8483 [version] Update to 2.4.4 2023-07-28 13:28:37 +02:00
Joshua Lochner da67f41434 [version] Update to 2.4.3 2023-07-27 06:06:42 +02:00
Joshua Lochner f181e135d4 [version] Update to 2.4.2 2023-07-22 05:08:12 +02:00
Joshua Lochner 4e947aa657 [version] Update to 2.4.1 2023-07-11 02:12:28 +02:00
Joshua Lochner 86de50d0f2
Whisper word-level timestamps (#184)
* Support outputting attentions in generate function

* Add unit tests for concatenating tensors

* Implement `cat` for `dim>0`

* Add `cat` unit tests for > 2 tensors

* Allow for negative indexing + bounds checking

* Add test case for `cat` with negative indexing

* Clean up `safeIndex` helper function

* Allow indexing error message to include dimension

* Reuse `safeIndex` helper function for `normalize_`

* Optimize `cat` indexing

* Implement `stack` tensor operation

+ add unit tests

* Add TODOs

* Implement `mean` tensor operation

* Implement `std_mean` tensor ops

* Fix order of `std_mean` returns

* Implement median filter

* Implement dynamic time warping

* Implement `neg` tensor op

* Throw error if audio sent to processor is not a `Float32Array`

* Add `round` helper function

* [WIP] Implement basic version of word-level-timestamps

Known issues:
- timestamps not correct for index > 0
- punctuation not the same as the Python version

* Fix typo

* Fix timestamps

* Round to 2 decimals

* Fix punctuation

* Fix typing

* Remove debug statements

* Cleanup code

* Cleanup

* Remove debug statements

* Update JSDoc for extract token timestamps function

* Add return type for `std_mean` tensor function

* Improve typing of private whisper tokenizer functions

* Indicate method is private

* Allow whisper feature extractor to be called with Float64Array input

* Fix typo

* Throw error if `cross_attentions` are not present in model output when extracting token timestamps

* Throw error during generate function

* Allow whisper models to be exported with `output_attentions=True`

* Add alignment heads to generation config

* Remove print statement

* Update versions

* Override protobufjs version

* Update package-lock.json

* Require onnx==1.13.1 for conversion

Will update once onnxruntime-web supports onnx IR version 9

* Add unit test for word-level timestamps

* Extract add attentions function out of `generate`

* Fix `findLongestCommonSequence` return types

* Downgrade back to onnxruntime 1.14.0

1.15.1 is a little too unstable right now.

* Cleanup

- use `.map`
- rename variables

* Update comments

* Add examples for how to transcribe w/ word-level timestamps

* Add example for transcribing/translating audio longer than 30 seconds

* Make example more compact
2023-07-09 23:21:43 +02:00
Joshua Lochner 1563b434bc [version] Update to 2.3.1 2023-07-01 04:15:13 +02:00
Joshua Lochner 8d6622ef9b [version] Update to 2.3.0 2023-06-22 15:36:02 +02:00
Joshua Lochner d90f58110a
Add whisper unit tests (#155)
* Only run encoder with required inputs

* Add basic whisper unit tests

* Add newline after heading for docs

* Add unit test for transcribing English with timestamps

* Add multilingual test case
2023-06-21 23:58:16 +02:00
Joshua Lochner 035f69f79a [version] Update to 2.2.0 2023-06-09 15:18:29 +02:00