* Allow custom kwargs in `tokenizer.apply_chat_template`
* Update jinja dependency version
* Add `tokenizer_kwargs` options
* Add support for dictionaries of chat templates in the tokenizer config
* Add `CohereTokenizer`
* `apply_chat_template` is no longer async
* Add unit test for multiple chat templates
* Update tokenizers.js
* Also update when `chat_template` is undefined
* Support setting tokenizer and text from URL
* Update Claude tokenizer display name
* Add Cohere Command-R tokenizer to playground
* Add `Grok1Tokenizer`
* Throw error if chat template object is malformed
* Improved error checking
* Remove redundant error check
* `template_dict` can be a null-prototype object
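The dictionary-of-templates support and the null-prototype fix above could look roughly like this (the helper name `resolveChatTemplate` is hypothetical; the actual library logic differs):

```javascript
// Sketch: resolve a chat template that may be a plain string or a
// name -> template dictionary (hypothetical helper, not the library API).
// Note: a dictionary built with Object.create(null) has no prototype, so a
// check like `template instanceof Object` would wrongly reject it --
// use `typeof` instead.
function resolveChatTemplate(chatTemplate, name = "default") {
  if (typeof chatTemplate === "string") return chatTemplate;
  if (typeof chatTemplate === "object" && chatTemplate !== null) {
    const template = chatTemplate[name];
    if (template === undefined) {
      throw new Error(`Chat template "${name}" not found`);
    }
    return template;
  }
  throw new Error("Malformed chat template config");
}

// Works for a null-prototype dictionary too:
const templates = Object.create(null);
templates.default = "{% for m in messages %}{{ m.content }}{% endfor %}";
console.log(resolveChatTemplate(templates) === templates.default); // true
```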
* Add basic support for chat templates
* Cleanup
* JSDoc improvements
* Support conversion of user-defined functions
* Cleanup
* Fix function creation
* Add unit tests for templates
* Cleanup
* Improve JSDoc
* Add missing return types
* Add chat templates docs to table of contents
* Add support for logical negation
* Fix nested logical negation
* Add unit tests for logical operators
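The logical-operator work above (negation, nested negation, `!=` for booleans) can be illustrated with a minimal Jinja-like AST evaluator (illustrative only; `@huggingface/jinja`'s real evaluator is far richer):

```javascript
// Sketch of logical-operator evaluation over a tiny expression AST.
function evaluate(node) {
  switch (node.type) {
    case "Literal":
      return node.value;
    case "Unary": // e.g. `not x`, including nested `not not x`
      if (node.operator === "not") return !evaluate(node.argument);
      throw new Error(`Unknown unary operator: ${node.operator}`);
    case "Binary": {
      const left = evaluate(node.left);
      const right = evaluate(node.right);
      switch (node.operator) {
        case "and": return left && right;
        case "or":  return left || right;
        case "==":  return left === right;
        case "!=":  return left !== right; // works for booleans too
        default: throw new Error(`Unknown operator: ${node.operator}`);
      }
    }
    default:
      throw new Error(`Unknown node type: ${node.type}`);
  }
}

// Nested negation: `not not true` -> true
const ast = {
  type: "Unary", operator: "not",
  argument: { type: "Unary", operator: "not",
              argument: { type: "Literal", value: true } },
};
console.log(evaluate(ast)); // true
```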
* Add loop variables
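The loop variables in question mirror Jinja's `loop.index` / `loop.first` / `loop.last` semantics; a hypothetical helper sketching the idea:

```javascript
// Sketch of Jinja-style `loop` variables made available inside a for-loop
// body (hypothetical helper, not the library's evaluator).
function forEachWithLoop(items, body) {
  const out = [];
  for (let i = 0; i < items.length; ++i) {
    out.push(body(items[i], {
      index: i + 1,               // 1-based, like Jinja's loop.index
      index0: i,                  // 0-based
      first: i === 0,
      last: i === items.length - 1,
      length: items.length,
      revindex: items.length - i, // 1-based distance from the end
    }));
  }
  return out;
}

// Render "a, b, c" using loop.last to decide the separator:
const parts = forEachWithLoop(["a", "b", "c"],
  (item, loop) => item + (loop.last ? "" : ", "));
console.log(parts.join("")); // "a, b, c"
```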
* Add support for `RuntimeValue` built-in functions
* Add unit tests for string instance methods
* Fix conversion of normal function to `FunctionValue`
* Update object method unit tests
* Save chat template to tokenizer_config.json during conversion
* Fix `raise_exception` error
* Add `!=` operator for booleans
* Remember to increment loop index
* Cleanup for loop evaluator
* Use `is` helper function
* Add support for text nodes
i.e., non-Jinja statements/expressions
* Add auto-generated templating tests
* Update unit tests
* Remove unused function
* Add default chat templates
* Use repo with up-to-date tokenizer config
* Temporarily disable zephyr test
* Delete templates.test.js
* Move Jinja functionality to `@huggingface/jinja`
* Fix template cache type
* Update chat template unit tests
* Update `@huggingface/jinja` version
* Fix default llama2 system prompt usage
* Add unit test for llama2 w/o chat template set
* Update jinja version
* Update jinja version
* Add unit test for user-defined chat templates
Example from https://discuss.huggingface.co/t/issue-with-llama-2-chat-template-and-out-of-date-documentation/61645/3
* Add `AddedToken` for improved tokenization
* Add example usage for chat templates
* Add 'first' Metaspace pretokenizer prepend scheme
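Roughly, the Metaspace pre-tokenizer replaces spaces with `▁` and optionally prepends one; the `'first'` scheme prepends only to the first section of the input. A simplified sketch (the real tokenizer handles sections and special tokens more carefully):

```javascript
// Sketch of the Metaspace pre-tokenizer's prepend schemes
// ("always" | "never" | "first"); simplified illustration only.
const METASPACE = "\u2581"; // ▁

function metaspace(text, { prependScheme = "always", isFirstSection = true } = {}) {
  let normalized = text.split(" ").join(METASPACE);
  const prepend =
    prependScheme === "always" ||
    (prependScheme === "first" && isFirstSection);
  if (prepend && !normalized.startsWith(METASPACE)) {
    normalized = METASPACE + normalized;
  }
  return normalized;
}

console.log(metaspace("Hello world"));
// "▁Hello▁world"
console.log(metaspace("Hello world", { prependScheme: "first", isFirstSection: false }));
// "Hello▁world"
```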
* Formatting
* Update wav2vec2 converter special tokens whitespace split
* Fix Metaspace pretokenizer split criteria
* Update inputs of `PreTokenizerSequence`
* Improve Metaspace pretokenizer
* Update llama tokenizer tests
* Improve handling of legacy llama tokenizer
* Re-enable SPM tests
* Add static tokenizer test cases
* Add llama2 static tests
* Allow user to override legacy tokenizer behaviour in `.from_pretrained`
* Add legacy tokenizer unit tests
* Bump jinja version to 0.1.0
* Support outputting attentions in generate function
* Add unit tests for concatenating tensors
* Implement `cat` for `dim>0`
* Add `cat` unit tests for > 2 tensors
* Allow for negative indexing + bounds checking
* Add test case for `cat` with negative indexing
* Clean up `safeIndex` helper function
* Allow indexing error message to include dimension
* Reuse `safeIndex` helper function for `normalize_`
* Optimize `cat` indexing
* Implement `stack` tensor operation
+ add unit tests
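The `safeIndex`, `cat`, and `stack` work above can be sketched over plain nested arrays (the library operates on flat typed arrays with shape metadata, so this only illustrates the semantics):

```javascript
// Sketch of `safeIndex` (negative indexing + bounds checking) and
// nested-array `cat`/`stack`; illustrative, not the library implementation.
function safeIndex(index, size, dimension = null) {
  if (index < -size || index >= size) {
    const where = dimension === null ? "" : ` for dimension ${dimension}`;
    throw new Error(`Index ${index} out of bounds${where} (size ${size})`);
  }
  return index < 0 ? index + size : index; // negative indices wrap around
}

// Concatenate nested arrays along dimension `dim` (may be negative).
function cat(tensors, dim = 0) {
  const rank = (t) => (Array.isArray(t) ? 1 + rank(t[0]) : 0);
  dim = safeIndex(dim, rank(tensors[0]));
  if (dim === 0) return [].concat(...tensors);
  // Recurse: concatenate matching sub-arrays along dim - 1.
  return tensors[0].map((_, i) => cat(tensors.map((t) => t[i]), dim - 1));
}

// `stack` adds a new leading dimension, then concatenates along it.
const stack = (tensors) => tensors.map((t) => [t]).reduce((a, b) => cat([a, b]));

console.log(cat([[[1], [2]], [[3], [4]]], -1)); // [[1, 3], [2, 4]]
console.log(stack([[1, 2], [3, 4]]));           // [[1, 2], [3, 4]]
```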
* Add TODOs
* Implement `mean` tensor operation
* Implement `std_mean` tensor ops
* Fix order of `std_mean` returns
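Like PyTorch's `torch.std_mean`, the op returns the standard deviation first and the mean second, which is what the ordering fix above addresses. A sketch over a flat array:

```javascript
// Sketch of `std_mean` over a flat array; returns [std, mean] in that
// order, matching torch.std_mean. `correction = 1` gives the sample
// (Bessel-corrected) standard deviation.
function std_mean(data, correction = 1) {
  const mean = data.reduce((a, b) => a + b, 0) / data.length;
  const variance =
    data.reduce((a, b) => a + (b - mean) ** 2, 0) / (data.length - correction);
  return [Math.sqrt(variance), mean];
}

const [std, mean] = std_mean([1, 2, 3, 4]);
console.log(mean); // 2.5
console.log(std);  // ≈ 1.2909944487358056
```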
* Implement median filter
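A minimal sketch of a 1-D median filter with reflect padding, of the kind used in Whisper's word-level timestamp extraction (simplified; window size must be odd):

```javascript
// Sketch of a 1-D median filter with reflect padding at the boundaries.
function medianFilter(data, windowSize) {
  if (windowSize % 2 === 0 || windowSize <= 0) {
    throw new Error("Window size must be a positive odd number");
  }
  const half = Math.floor(windowSize / 2);
  const output = new Array(data.length);
  for (let i = 0; i < data.length; ++i) {
    const window = [];
    for (let j = i - half; j <= i + half; ++j) {
      // Reflect out-of-bounds indices back into range.
      let k = j;
      if (k < 0) k = -k;
      if (k >= data.length) k = 2 * data.length - k - 2;
      window.push(data[k]);
    }
    window.sort((a, b) => a - b);
    output[i] = window[half]; // middle element = median
  }
  return output;
}

console.log(medianFilter([5, 2, 8, 9, 2, 5, 4], 3)); // [2, 5, 8, 8, 5, 4, 5]
```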
* Implement dynamic time warping
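Dynamic time warping aligns tokens to time frames by finding the monotonic minimum-cost path through a cost matrix built from cross-attention weights. A simplified sketch of the accumulate-and-backtrace approach:

```javascript
// Sketch of dynamic time warping: given a cost matrix (tokens x frames),
// find the monotonic path of minimal total cost (simplified version).
function dynamicTimeWarping(matrix) {
  const [rows, cols] = [matrix.length, matrix[0].length];
  // Accumulated-cost table, padded with Infinity on the top/left border.
  const D = Array.from({ length: rows + 1 }, () =>
    new Array(cols + 1).fill(Infinity));
  D[0][0] = 0;
  for (let i = 1; i <= rows; ++i) {
    for (let j = 1; j <= cols; ++j) {
      D[i][j] = matrix[i - 1][j - 1] +
        Math.min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1]);
    }
  }
  // Backtrace from the bottom-right corner.
  let [i, j] = [rows, cols];
  const path = [];
  while (i > 0 || j > 0) {
    path.push([i - 1, j - 1]);
    const diag = D[i - 1]?.[j - 1] ?? Infinity;
    const up   = D[i - 1]?.[j] ?? Infinity;
    const left = D[i]?.[j - 1] ?? Infinity;
    if (diag <= up && diag <= left) { --i; --j; }
    else if (up <= left) { --i; }
    else { --j; }
  }
  return path.reverse(); // list of [tokenIndex, frameIndex] pairs
}

const path = dynamicTimeWarping([
  [0, 1, 2],
  [1, 0, 1],
  [2, 1, 0],
]);
console.log(path); // [[0, 0], [1, 1], [2, 2]]
```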
* Implement `neg` tensor op
* Throw error if audio sent to processor is not a `Float32Array`
* Add `round` helper function
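One common way to implement such a helper (e.g. for rounding timestamps to 2 decimal places):

```javascript
// Sketch of a `round` helper that rounds to a fixed number of decimals.
function round(num, decimals) {
  const pow = 10 ** decimals;
  return Math.round(num * pow) / pow;
}

console.log(round(1.23456, 2)); // 1.23
```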
* [WIP] Implement basic version of word-level timestamps
Known issues:
- timestamps are not correct for index > 0
- punctuation does not match the Python version
* Fix typo
* Fix timestamps
* Round to 2 decimals
* Fix punctuation
* Fix typing
* Remove debug statements
* Cleanup code
* Cleanup
* Remove debug statements
* Update JSDoc for extract token timestamps function
* Add return type for `std_mean` tensor function
* Improve typing of private whisper tokenizer functions
* Indicate method is private
* Allow whisper feature extractor to be called with Float64Array input
* Fix typo
* Throw error if `cross_attentions` are not present in model output when extracting token timestamps
* Throw error during generate function
* Allow whisper models to be exported with `output_attentions=True`
* Add alignment heads to generation config
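A hedged sketch of what the `generation_config.json` addition might look like; the `[layer, head]` pairs below are made-up placeholder values, not any real model's alignment heads:

```json
{
  "alignment_heads": [[5, 3], [7, 2], [9, 0]]
}
```

Each pair names a decoder layer and a cross-attention head whose attention weights are used for word-level timestamp alignment.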
* Remove print statement
* Update versions
* Override protobufjs version
* Update package-lock.json
* Require onnx==1.13.1 for conversion
Will update once onnxruntime-web supports ONNX IR version 9
* Add unit test for word-level timestamps
* Extract add attentions function out of `generate`
* Fix `findLongestCommonSequence` return types
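The idea behind merging overlapping transcription chunks can be sketched as follows (heavily simplified; the real `findLongestCommonSequence` slides over multiple candidate alignments with a matches-per-length score):

```javascript
// Simplified sketch of merging two overlapping token sequences by finding
// the best-matching overlap between the end of `left` and the start of
// `right`; illustrative only, not the library's algorithm.
function mergeOverlapping(left, right) {
  let best = 0;       // overlap length with the best match ratio
  let bestScore = 0;
  for (let len = 1; len <= Math.min(left.length, right.length); ++len) {
    let matches = 0;
    for (let k = 0; k < len; ++k) {
      if (left[left.length - len + k] === right[k]) ++matches;
    }
    const score = matches / len;
    if (matches > 0 && score >= bestScore) { bestScore = score; best = len; }
  }
  // Drop the overlapping prefix of `right` before concatenating.
  return left.concat(right.slice(best));
}

console.log(mergeOverlapping([1, 2, 3, 4], [3, 4, 5])); // [1, 2, 3, 4, 5]
```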
* Downgrade back to onnxruntime 1.14.0
1.15.1 is a little too unstable right now.
* Cleanup
- use `.map`
- rename variables
* Update comments
* Add examples for how to transcribe w/ word-level timestamps
* Add example for transcribing/translating audio longer than 30 seconds
* Make example more compact
* Only run encoder with required inputs
* Add basic whisper unit tests
* Add newline after heading for docs
* Add unit test for transcribing english with timestamps
* Add multilingual test case