* Add `CodeLlamaTokenizer`
* Add `codellama` for testing
* Update default quantization settings
* Refactor `PretrainedModel`
* Remove unnecessary error message
* Update llama-code-tokenizer test
* Add support for `GPTNeoX` models
* Fix `GPTNeoXPreTrainedModel` config
* Add support for `GPTJ` models
* Add support for `WavLM` models
* Update list of supported models
- CodeLlama
- GPT NeoX
- GPT-J
- WavLM
* Add support for XLM models
* Add support for `ResNet` models
* Add support for `BeiT` models
* Fix casing of `BeitModel`
* Remove duplicate code
* Update variable name
* Remove `ts-ignore`
* Remove unnecessary duplication
* Update demo model sizes
* [demo] Update default summarization parameters
* Update default quantization parameters for new models
* Remove duplication in mapping
* Update list of supported marian models
* Add support for `CamemBERT` models
* Add support for `MBart` models
* Add support for `OPT` models
* Add `MBartTokenizer` and `MBart50Tokenizer`
* Add example of multilingual translation with MBart models
* Add `CamembertTokenizer`
* Add support for `HerBERT` models
* Add support for `XLMTokenizer`
* Fix `fuse_unk` config
* Do not remove duplicate keys for `Unigram` models
See https://huggingface.co/camembert-base for an example of a Unigram tokenizer that has two tokens with the same value (`<unk>`)
* Update HerBERT supported model text
* Update generate_tests.py
* Update list of supported models
* Use enum object instead of classes for model types
Fixes https://github.com/xenova/transformers.js/issues/283
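The enum-based approach can be sketched as follows (a minimal illustration; the identifiers below are hypothetical, not the library's actual names):

```javascript
// Sketch: model types as a frozen enum-like object instead of classes.
// Lookups are plain property accesses, and comparisons are cheap
// equality checks rather than instanceof checks.
const ModelType = Object.freeze({
    EncoderOnly: 0,
    EncoderDecoder: 1,
    Seq2Seq: 2,
    DecoderOnly: 3,
});

function isEncoderDecoder(type) {
    return type === ModelType.EncoderDecoder || type === ModelType.Seq2Seq;
}
```

Freezing the object prevents accidental mutation of the "enum" members at runtime.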
* Add link to issue
* Update dependencies for unit tests
* Add `sentencepiece` as a testing requirement
* Add `protobuf` to test dependencies
* Remove duplicated models to test
* Make // @ts-ignore obsolete for _call overrides by respecting LSP
* Oops: the value can't be `undefined`; revert to previous behavior
* Use `...unused` instead to fix LSP errors
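The `...unused` trick can be sketched like this (class and method names are illustrative only, not the library's exact code):

```javascript
// A base class whose `_call` accepts rest parameters, so subclass
// overrides with extra parameters remain assignable to the base
// signature and no longer need a // @ts-ignore.
class Pipeline {
    /** @param {...any} args */
    _call(...args) {
        throw Error('_call should be implemented in a subclass.');
    }
}

class TextPipeline extends Pipeline {
    // Extra arguments beyond `texts` are accepted but ignored via
    // `...unused`, keeping the override compatible with the base class.
    _call(texts, ...unused) {
        return Array.isArray(texts) ? texts.length : 1;
    }
}
```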
* Add support for `DeiT` models
* Add `Swin` models for image classification
* Add support for `yolos` models
* Add `YolosFeatureExtractor`
* Remove unused import
* Update list of supported models
* Remove SAM for now
Move SAM support to next release
* Add example code for zero shot image classification
* Add example code for text classification pipeline
* Fix links to custom usage from pipelines docs
Reported on Discord https://discord.com/channels/879548962464493619/1142943169068154950/1142943169068154950
* Fix relative links
* Rename .mdx -> .md
GitHub recently changed how mdx files are displayed, breaking a lot of the formatting. So, we just use .md now (same as transformers)
* Add example code for token classification pipeline
* Add example code for fill-mask pipeline
* Add text2text and summarization pipeline examples
* Add example code for image segmentation pipeline
* Remove redundant `@extends Pipeline`
* Add example code for image-to-text pipeline
* Cleanup example code outputs
* Cleanup JSDoc
* Cleanup pipeline example code
* Update codegen example
* Add example `wav2vec2` models
* Add support for `CTCDecoder` and `Wav2Vec2CTCTokenizer`
* Generate tokenizer.json files for wav2vec2 models
* Fix wav2vec2 custom tokenizer generation
* Implement wav2vec2 automatic speech recognition
* Add `Wav2Vec2` as a supported architecture
* Update README.md
* Update generate_tests.py
* Ignore invalid tests
* Update supported wav2vec2 models
* Update supported_models.py
* Simplify pipeline construction
* Implement basic audio classification pipeline
* Update default topk value for audio classification pipeline
* Add example usage for the audio classification pipeline
* Move `loadAudio` to utils file
* Add audio classification unit test
* Add wav2vec2 ASR unit test
* Improve generated wav2vec2 tokenizer json
* Update supported_models.py
* Allow `added_tokens_regex` to be null
* Support exporting MMS vocabs
* Support nested vocabularies
* Update supported tasks and models
* Add warnings that `language` and `task` options are ignored for wav2vec2 models
Support will be added in a future release
* Mark internal methods as private
* Add typing to audio variable
* Update node-audio-processing.mdx
* Move node-audio-processing to guides
* Update table of contents
* Add example code for performing feature extraction w/ `Wav2Vec2Model`
NOTE: feature extraction of MMS models is currently broken in the python library, but it works correctly here. See
https://github.com/huggingface/transformers/issues/25485 for more info
* Refactor `Pipeline` class params
* Fix `pipeline` function
* Fix typo in `pipeline` JSDoc
* Fix second typo
* Add `M2M100Tokenizer`
* Allow `added_tokens` list to be empty
* Apply hot-fix for issue in HF's `M2M100Tokenizer`
* Skip M2M100 tokenizer tests for now
TODO: Remove when https://github.com/huggingface/transformers/pull/25478 is merged
* Fix `_build_translation_inputs` for `M2M100Tokenizer`
* Add example code in JSDoc for `TranslationPipeline`
* Update supported_models.py
* Create basic tokenizer playground app
* Default to no display when the user enters a large body of text
* Optimize BPE algorithm
- Use map instead of object for `bpe_ranks`
- Replace reduction in BPE algorithm with for loop
- Avoid conversions between sets and arrays
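A minimal sketch of the `Map`-based rank lookup (the key format follows standard BPE merge tables; exact identifiers are assumptions, not the library's code):

```javascript
// BPE merge ranks keyed by the space-joined candidate pair.
// A Map avoids prototype-chain pitfalls of plain objects (e.g. a token
// literally named "constructor") and is faster for large vocabularies.
const bpeRanks = new Map([
    ['l o', 0],
    ['lo w', 1],
    ['e r', 2],
]);

// Unknown pairs get rank Infinity, so they are never chosen for merging.
function getRank(pair) {
    const rank = bpeRanks.get(pair);
    return rank === undefined ? Infinity : rank;
}
```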
* Use a for loop to avoid stack issues with `.push(...items)`
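The underlying issue: spreading a large array into `push` passes every element as a separate call argument, which can overflow the call stack. A sketch of the loop-based fix (implementation details are illustrative):

```javascript
// Merge arrays without spreading into push(), so arbitrarily large
// inputs cannot exhaust the argument/call stack limit.
function mergeArrays(...arrs) {
    const merged = [];
    for (const arr of arrs) {
        for (const item of arr) {
            merged.push(item);
        }
    }
    return merged;
}
```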
* Fix `mergeArrays` typing
* Remove unnecessary try-catch block in BPE
* Add Llama, T5, and BERT tokenizers to the playground
* Improve how BERT/T5 tokens are displayed
* Improve how token margins are displayed
* Use `Map` for cache
* Add efficient heap-based priority queue implementation
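A compact binary min-heap illustrates the general approach (a sketch of the technique; the library's actual implementation and API may differ):

```javascript
// Binary-heap priority queue: push/pop are O(log n), so repeatedly
// extracting the best merge candidate in BPE stays efficient.
class PriorityQueue {
    constructor(comparator = (a, b) => a < b) {
        this._heap = [];
        this._comparator = comparator;
    }
    get size() { return this._heap.length; }
    push(value) {
        this._heap.push(value);
        // Sift the new value up until the heap property is restored.
        let i = this._heap.length - 1;
        while (i > 0) {
            const parent = (i - 1) >> 1;
            if (!this._comparator(this._heap[i], this._heap[parent])) break;
            [this._heap[i], this._heap[parent]] = [this._heap[parent], this._heap[i]];
            i = parent;
        }
    }
    pop() {
        const top = this._heap[0];
        const last = this._heap.pop();
        if (this._heap.length > 0) {
            this._heap[0] = last;
            // Sift the moved value down to its correct position.
            let i = 0;
            while (true) {
                let best = i;
                const l = 2 * i + 1;
                const r = 2 * i + 2;
                if (l < this._heap.length && this._comparator(this._heap[l], this._heap[best])) best = l;
                if (r < this._heap.length && this._comparator(this._heap[r], this._heap[best])) best = r;
                if (best === i) break;
                [this._heap[i], this._heap[best]] = [this._heap[best], this._heap[i]];
                i = best;
            }
        }
        return top;
    }
}
```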
* Add more unit tests for LlamaTokenizer
Selected from https://github.com/belladoreai/llama-tokenizer-js/blob/master/llama-tokenizer.js#L381-L452
* Implement priority-queue-based BPE algorithm
* Remove old code
* Update `bpe` docstring
* Add `data-structures` page to docs
* Update JSDoc for data-structures.js
* Update data-structures.js
* Move `TokenLattice` and `CharTrie` to data-structures module
* Minor refactoring
* Update extension to be module
* Update example extension
* Allow user to specify a custom cache system
* Implement custom cache system
Emulates the Web Cache API using chrome's local storage API
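A hedged sketch of the idea: a custom cache exposing Web-Cache-API-like `match`/`put` methods over a pluggable key-value backend. A `Map` stands in for the real backing store here; the extension uses chrome's local storage, which requires JSON-serializable values, hence string bodies.

```javascript
// Minimal stand-in for the Web Cache API over a key-value backend.
class CustomCache {
    constructor(backend = new Map()) {
        this._backend = backend;
    }
    // Returns the cached body, or undefined on a miss (like Cache.match).
    match(key) {
        return this._backend.get(key);
    }
    // Stores the body as a string so it survives JSON serialization.
    put(key, body) {
        this._backend.set(key, String(body));
    }
}
```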
* Use custom cache system in extension
* Fix serialization
* Remove old folders
* Update extension readme
* Add note about JSON requirement for local storage
* Define custom CLIP ONNX configs
* Update conversion script
* Support specifying custom model file name
* Use int64 for CLIP input ids
* Add support for CLIP text and vision models
* Fix JSDoc
* Add docs for `CLIPTextModelWithProjection`
* Add docs for `CLIPVisionModelWithProjection`
* Add unit test for CLIP text models
* Add unit test for CLIP vision models
* Set resize precision to 3 decimal places
* Fix `RawImage.save()` function
* Throw error when reading image and status != 200
* Create basic semantic image search application
* Separate out components
* Add `update-database` script
* Update transformers.js version
* Add new tokenizer unit test (#199)
* Perform `NFKC` normalization for sentencepiece models w/ precompiled charmap
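JavaScript can perform this normalization natively via `String.prototype.normalize`; a sketch of the core operation (the real normalizer sits inside the tokenizer's normalization pipeline):

```javascript
// NFKC applies compatibility decomposition followed by canonical
// composition, e.g. mapping circled digits and ligatures to plain text.
function nfkc(text) {
    return text.normalize('NFKC');
}
```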
* Fix JSDoc indentation
* Add problematic string to unit tests
* Use consistent BPE split token
* Add second problematic string