* Allow custom kwargs in `tokenizer.apply_chat_template`
* Update jinja dependency version
* Add `tokenizer_kwargs` options
* Add support for dictionaries of chat templates in the tokenizer config
* Add `CohereTokenizer`
* `apply_chat_template` is no longer async
* Add unit test for multiple chat templates
* Update tokenizers.js
* Also update when `chat_template` is undefined
* Support setting tokenizer and text from URL
* Update Claude tokenizer display name
* Add Cohere Command-R tokenizer to playground
* Add `Grok1Tokenizer`
* Throw error if chat template object is malformed
* Improved error checking
* Remove redundant error check
* `template_dict` can be a null-prototype object
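The null-prototype case matters because objects created with `Object.create(null)` have no `hasOwnProperty` method of their own. A minimal sketch of a safe lookup (the names `selectChatTemplate` and `templateDict` are illustrative, not the library's actual internals):

```javascript
// Select a named chat template from a tokenizer-config dict, where
// `templateDict` may be a null-prototype object (so calling
// `templateDict.hasOwnProperty(...)` directly would throw).
function selectChatTemplate(templateDict, name = 'default') {
  if (typeof templateDict !== 'object' || templateDict === null) {
    throw new Error('chat_template must be an object mapping names to templates');
  }
  // Object.prototype.hasOwnProperty.call also works for null-prototype objects.
  if (!Object.prototype.hasOwnProperty.call(templateDict, name)) {
    throw new Error(`No chat template named "${name}"`);
  }
  const template = templateDict[name];
  if (typeof template !== 'string') {
    throw new Error(`Chat template "${name}" is malformed (expected a string)`);
  }
  return template;
}

// Works for both plain and null-prototype objects:
const templates = Object.create(null);
templates.default = '{% for message in messages %}{% endfor %}';
```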
* Add vocoder to export
* Add tokenizer.json export for speecht5 models
* Update speecht5 supported models
* Create `SpeechT5Tokenizer`
* Add `ones` and `ones_like` tensor functions
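A rough sketch of what `ones` / `ones_like` helpers do, over a plain flat-buffer representation (illustrative only, not the library's tensor class):

```javascript
// Create a "tensor" of the given shape filled with ones.
function ones(shape) {
  const size = shape.reduce((a, b) => a * b, 1);
  return { data: new Float32Array(size).fill(1), shape };
}

// Create a ones tensor with the same shape as an existing tensor.
function onesLike(tensor) {
  return ones(tensor.shape);
}
```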
* Add support for speecht5 text-to-speech
* Disambiguate `SpeechSeq2Seq` and `Seq2SeqLM`
* Create `TextToAudioPipeline`
* List `text-to-audio` / `text-to-speech` as supported tasks
* Use unquantized vocoder by default
* Skip speecht5 unit tests for now
Due to a bug in transformers: https://github.com/huggingface/transformers/issues/26547
* Update example pipeline output
* Create simple in-browser TTS demo
* Add template README
* Delete package-lock.json
* Update required transformers.js version
* Add link to Transformers.js
* Double -> Single quotes
* Add link to text-to-speech demo
* Update sample speaker embeddings
* Update transformers.js version
* Use Singleton object in electron tutorial
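The singleton pattern here lazily creates one shared instance of an expensive resource (such as a pipeline) and reuses it on every call. A minimal sketch, with an illustrative class name and a placeholder standing in for the real pipeline construction:

```javascript
// Lazily construct a single shared instance; repeated calls return the
// same object instead of re-creating the expensive resource.
class PipelineSingleton {
  static instance = null;

  static async getInstance() {
    if (this.instance === null) {
      // In the real tutorial this would be an expensive call such as
      // constructing a pipeline; a placeholder object is used here.
      this.instance = { createdAt: Date.now() };
    }
    return this.instance;
  }
}
```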
* Create package-lock.json
* Remove models folder
* Remove step for copying models to local folder
* Add `CodeLlamaTokenizer`
* Add `codellama` for testing
* Update default quantization settings
* Refactor `PretrainedModel`
* Remove unnecessary error message
* Update llama-code-tokenizer test
* Add support for `GPTNeoX` models
* Fix `GPTNeoXPreTrainedModel` config
* Add support for `GPTJ` models
* Add support for `WavLM` models
* Update list of supported models
- CodeLlama
- GPT NeoX
- GPT-J
- WavLM
* Add support for XLM models
* Add support for `ResNet` models
* Add support for `BeiT` models
* Fix casing of `BeitModel`
* Remove duplicate code
* Update variable name
* Remove `ts-ignore`
* Remove unnecessary duplication
* Update demo model sizes
* [demo] Update default summarization parameters
* Update default quantization parameters for new models
* Remove duplication in mapping
* Update list of supported marian models
* Add support for `CamemBERT` models
* Add support for `MBart` models
* Add support for `OPT` models
* Add `MBartTokenizer` and `MBart50Tokenizer`
* Add example of multilingual translation with MBart models
* Add `CamembertTokenizer`
* Add support for `HerBERT` models
* Add support for `XLMTokenizer`
* Fix `fuse_unk` config
* Do not remove duplicate keys for `Unigram` models
See https://huggingface.co/camembert-base for an example of a Unigram tokenizer that has two tokens with the same value (`<unk>`)
* Update HerBERT supported model text
* Update generate_tests.py
* Update list of supported models
* Use enum object instead of classes for model types
Fixes https://github.com/xenova/transformers.js/issues/283
* Add link to issue
* Update dependencies for unit tests
* Add `sentencepiece` as a testing requirement
* Add `protobuf` to test dependency
* Remove duplicated models to test
* Add example `wav2vec2` models
* Add support for `CTCDecoder` and `Wav2Vec2CTCTokenizer`
* Generate tokenizer.json files for wav2vec2 models
* Fix wav2vec2 custom tokenizer generation
* Implement wav2vec2 automatic-speech-recognition
* Add `Wav2Vec2` as a supported architecture
* Update README.md
* Update generate_tests.py
* Ignore invalid tests
* Update supported wav2vec2 models
* Update supported_models.py
* Simplify pipeline construction
* Implement basic audio classification pipeline
* Update default topk value for audio classification pipeline
* Add example usage for the audio classification pipeline
* Move `loadAudio` to utils file
* Add audio classification unit test
* Add wav2vec2 ASR unit test
* Improve generated wav2vec2 tokenizer json
* Update supported_models.py
* Allow `added_tokens_regex` to be null
* Support exporting mms vocabs
* Support nested vocabularies
* Update supported tasks and models
* Add warnings that language and task are ignored for wav2vec2 models
Will be added in a future release
* Mark internal methods as private
* Add typing to audio variable
* Update node-audio-processing.mdx
* Move node-audio-processing to guides
* Update table of contents
* Add example code for performing feature extraction w/ `Wav2Vec2Model`
NOTE: feature extraction of MMS models is currently broken in the python library, but it works correctly here. See
https://github.com/huggingface/transformers/issues/25485 for more info
* Refactor `Pipeline` class params
* Fix `pipeline` function
* Fix typo in `pipeline` JSDoc
* Fix second typo
* Create basic tokenizer playground app
* Default to no display when the user enters a large body of text
* Optimize BPE algorithm
- Use map instead of object for `bpe_ranks`
- Replace reduction in BPE algorithm with for loop
- Avoid conversions between sets and arrays
* Use for loop to avoid stack issues with `.push(...items)`
* Fix `mergeArrays` typing
* Remove unnecessary try-catch block in BPE
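The optimizations above can be sketched as follows: a `Map` for merge ranks, a plain for loop to find the lowest-rank pair, and loop-based appending instead of `out.push(...items)` (which can throw a `RangeError` for very large arrays). Names and the tiny rank table are illustrative:

```javascript
// Map lookups on string keys are faster and safer than plain-object
// property access (no prototype collisions).
const bpeRanks = new Map([['a b', 0], ['ab c', 1]]);

// Find the index of the adjacent pair with the lowest merge rank,
// or null if no pair is mergeable.
function lowestRankPair(tokens) {
  let best = null, bestRank = Infinity;
  for (let i = 0; i < tokens.length - 1; ++i) {
    const rank = bpeRanks.get(tokens[i] + ' ' + tokens[i + 1]);
    if (rank !== undefined && rank < bestRank) {
      bestRank = rank;
      best = i;
    }
  }
  return best;
}

// Append with a for loop to avoid stack issues with spread arguments.
function appendAll(out, items) {
  for (let i = 0; i < items.length; ++i) out.push(items[i]);
  return out;
}
```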
* Add Llama, T5, and BERT tokenizers to the playground
* Improve how BERT/T5 tokens are displayed
* Improve how token margins are displayed
* Use `Map` for cache
* Add efficient heap-based priority queue implementation
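A heap-based priority queue makes each "take the highest-priority item" operation O(log n) instead of a linear scan. A minimal binary max-heap sketch (illustrative, not the library's exact implementation):

```javascript
class PriorityQueue {
  constructor(comparator = (a, b) => a > b) {
    this._heap = [];
    this._comparator = comparator;
  }
  get size() { return this._heap.length; }
  push(value) {
    this._heap.push(value);
    let i = this._heap.length - 1;
    // Sift up until the heap property holds.
    while (i > 0) {
      const parent = (i - 1) >> 1;
      if (!this._comparator(this._heap[i], this._heap[parent])) break;
      [this._heap[i], this._heap[parent]] = [this._heap[parent], this._heap[i]];
      i = parent;
    }
  }
  pop() {
    const top = this._heap[0];
    const last = this._heap.pop();
    if (this._heap.length > 0) {
      this._heap[0] = last;
      // Sift down until the heap property holds.
      let i = 0;
      for (;;) {
        let best = i;
        const l = 2 * i + 1, r = 2 * i + 2;
        if (l < this._heap.length && this._comparator(this._heap[l], this._heap[best])) best = l;
        if (r < this._heap.length && this._comparator(this._heap[r], this._heap[best])) best = r;
        if (best === i) break;
        [this._heap[i], this._heap[best]] = [this._heap[best], this._heap[i]];
        i = best;
      }
    }
    return top;
  }
}
```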
* Add more unit tests for LlamaTokenizer
Selected from https://github.com/belladoreai/llama-tokenizer-js/blob/master/llama-tokenizer.js#L381-L452
* Implement priority-queue-based BPE algorithm
* Remove old code
* Update `bpe` docstring
* Add `data-structures` page to docs
* Update JSDoc for data-structures.js
* Update data-structures.js
* Move `TokenLattice` and `CharTrie` to data-structures module
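For context, a `CharTrie` stores vocabulary strings and efficiently yields every stored string that is a prefix of a query, which is what Unigram-style tokenization needs. A minimal sketch (illustrative, not the module's exact code):

```javascript
class CharTrie {
  constructor() { this.root = new Map(); }

  // Insert a string into the trie.
  push(text) {
    let node = this.root;
    for (const ch of text) {
      if (!node.has(ch)) node.set(ch, new Map());
      node = node.get(ch);
    }
    node.set('\0', true); // end-of-word marker
  }

  // Return every stored string that is a prefix of `text`.
  commonPrefixSearch(text) {
    const results = [];
    let node = this.root, prefix = '';
    for (const ch of text) {
      node = node.get(ch);
      if (node === undefined) break;
      prefix += ch;
      if (node.get('\0')) results.push(prefix);
    }
    return results;
  }
}
```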
* Minor refactoring
* Update extension to be module
* Update example extension
* Allow user to specify a custom cache system
* Implement custom cache system
Emulates the Web Cache API using chrome's local storage API
* Use custom cache system in extension
* Fix serialization
* Remove old folders
* Update extension readme
* Add note about JSON requirement for local storage
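The JSON requirement arises because extension local storage only holds JSON-serializable values, so cached responses must be serialized to strings. A rough sketch of emulating a Web-Cache-like `match`/`put` interface over such a store (an in-memory `Map` stands in for the real extension storage API; all names here are illustrative):

```javascript
class LocalStorageCache {
  constructor() {
    this.store = new Map(); // stand-in for a JSON-only key/value store
  }

  // Return the cached response for a URL, or undefined on a miss.
  async match(url) {
    const serialized = this.store.get(url);
    if (serialized === undefined) return undefined;
    return JSON.parse(serialized);
  }

  // Store a response; only JSON-serializable data survives, hence the
  // JSON requirement noted above.
  async put(url, response) {
    this.store.set(url, JSON.stringify(response));
  }
}
```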
* Define custom CLIP ONNX configs
* Update conversion script
* Support specifying custom model file name
* Use int64 for CLIP input ids
* Add support for CLIP text and vision models
* Fix JSDoc
* Add docs for `CLIPTextModelWithProjection`
* Add docs for `CLIPVisionModelWithProjection`
* Add unit test for CLIP text models
* Add unit test for CLIP vision models
* Set resize precision to 3 decimal places
* Fix `RawImage.save()` function
* Throw error when reading image and status != 200
* Create basic semantic image search application
* Separate out components
* Add `update-database` script
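The core ranking step of such a semantic search app is cosine similarity between a query embedding and precomputed database embeddings. A self-contained sketch over plain arrays (function and field names are illustrative):

```javascript
// Cosine similarity between two equal-length vectors.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; ++i) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank database entries by similarity to the query embedding, best first.
function rankBySimilarity(queryEmbedding, database) {
  return database
    .map(({ id, embedding }) => ({ id, score: cosineSimilarity(queryEmbedding, embedding) }))
    .sort((x, y) => y.score - x.score);
}
```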
* Update transformers.js version
* Add example code for running text-generation models
* Fix non-greedy sampling functions
* Update samplers
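For reference, non-greedy (multinomial) sampling converts logits to probabilities with a softmax and then draws from that distribution. A minimal sketch with illustrative helper names (the random value is injectable for testability):

```javascript
// Numerically stable softmax over an array of logits.
function softmax(logits) {
  const max = Math.max(...logits);
  const exps = logits.map(x => Math.exp(x - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map(x => x / sum);
}

// Draw a token index from the softmax distribution via inverse CDF.
function sample(logits, randomValue = Math.random()) {
  const probs = softmax(logits);
  let cumulative = 0;
  for (let i = 0; i < probs.length; ++i) {
    cumulative += probs[i];
    if (randomValue < cumulative) return i;
  }
  return probs.length - 1; // guard against floating-point rounding
}
```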
* Remove duplicate requirement
`onnxruntime` is specified in `optimum[onnxruntime]`
* Align `generate` function output with python library
Include starting tokens in output
* [docs] Add example text-generation code
* Update demo site text streaming for causal language models
* Override default code highlighting for operators
* Fix order of link
* Link to the conversion Space for maximum simplicity
* Add some types to the script (very optional)
* Fix typo
* No need for a trailing slash here
* Node is also a valid option
* Document how to find a compatible checkpoint on the hub
* Update README
* Fix typing
* Update docs index
Co-authored-by: Julien Chaumond <julien@huggingface.co>
* Fix typo in node tutorial
* Create node audio processing tutorial
* Point to tutorial in `read_audio` function
* Rename `.md` to `.mdx`
* Add node audio processing tutorial to table of contents
* Add link to model in tutorial
* Update error message grammar