* Add `Sequence` PostProcessor
Required by https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct/
* Support `return_token_type_ids`
* Add llama3 tokenizer to unit tests
* Add test allowing the user to request token type ids
* Add JSDoc
* Update generate_tests.py
* Treat blob URIs as valid URLs
* Create function to detect the blob URI
* Rename to `isValidUrl`
* Remove comment
Co-authored-by: Joshua Lochner <admin@xenova.com>
* Merge `isValidHttpUrl` into `isValidUrl`
* Correct implementation
* Update docs
* Add test
* Remove export for `isValidUrl`
* Test read blob via `getFile`
* Use `res.text()` instead of `res.body`
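The merged validator can be sketched roughly as below; the `protocols`/`validHosts` parameters are illustrative assumptions, not the exact library signature:

```javascript
// Hypothetical sketch: a single validator accepting http(s) and blob: URLs.
function isValidUrl(string, protocols = null, validHosts = null) {
    let url;
    try {
        url = new URL(string);
    } catch {
        return false; // Not parseable as a URL at all
    }
    if (protocols && !protocols.includes(url.protocol)) {
        return false;
    }
    if (validHosts && !validHosts.includes(url.hostname)) {
        return false;
    }
    return true;
}
```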
* Add `return_full_text` option for text-generation models
* [wip] Support chat inputs in text-generation pipeline
* Align return type with python version
* Remove conversational task (moved to text-generation)
* Fix typos
* Allow custom kwargs in `tokenizer.apply_chat_template`
* Update jinja dependency version
* Add `tokenizer_kwargs` options
* Add support for dictionaries of chat templates in the tokenizer config
* Add `CohereTokenizer`
* `apply_chat_template` is no longer async
* Add unit test for multiple chat templates
* Update tokenizers.js
* Also update when `chat_template` is undefined
* Support setting tokenizer and text from URL
* Update Claude tokenizer display name
* Add Cohere Command-R tokenizer to playground
* Add `Grok1Tokenizer`
* Throw error if chat template object is malformed
* Improved error checking
* Remove redundant error check
* `template_dict` can be a null-prototype object
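The template-dictionary handling can be sketched as follows; the function name and the `{ name, template }` entry shape are assumptions for illustration:

```javascript
// Hypothetical sketch: resolve a chat template that may be a single string
// or a list of named templates. Uses a null-prototype object so lookups
// do not accidentally hit Object.prototype members (e.g. "constructor").
function resolveChatTemplate(chat_template, name = 'default') {
    if (typeof chat_template === 'string') {
        return chat_template;
    }
    if (!Array.isArray(chat_template)) {
        throw new Error('chat_template must be a string or an array of {name, template} objects');
    }
    const template_dict = Object.create(null);
    for (const entry of chat_template) {
        if (!entry || typeof entry.name !== 'string' || typeof entry.template !== 'string') {
            throw new Error('Malformed chat template entry');
        }
        template_dict[entry.name] = entry.template;
    }
    if (name in template_dict) {
        return template_dict[name];
    }
    throw new Error(`Chat template "${name}" not found`);
}
```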
* Add tensor permute unit tests
* Rename transpose -> permute
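A permute over a tensor stored as a flat array can be sketched like this; the helper name and the `[data, shape]` return convention are illustrative assumptions:

```javascript
// Hypothetical sketch: permute the dimensions of a flat-array tensor.
function permuteData(data, dims, axes) {
    const shape = axes.map((i) => dims[i]);
    // Row-major strides of the original layout.
    const strides = new Array(dims.length);
    for (let i = dims.length - 1, s = 1; i >= 0; --i) {
        strides[i] = s;
        s *= dims[i];
    }
    const permutedStrides = axes.map((i) => strides[i]);
    const out = new Array(data.length);
    for (let idx = 0; idx < data.length; ++idx) {
        // Convert the destination index to coordinates in the new shape,
        // then map back to a source index via the permuted strides.
        let srcIdx = 0;
        for (let j = shape.length - 1, rem = idx; j >= 0; --j) {
            srcIdx += (rem % shape[j]) * permutedStrides[j];
            rem = Math.floor(rem / shape[j]);
        }
        out[idx] = data[srcIdx];
    }
    return [out, shape];
}
```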
* Fix padding for non-square images
* Add vitmatte padding unit test
* Create `RawImage.toTensor` helper function
* Add bankers rounding test case
* `.toBe()` -> `.toBeCloseTo()` for floating point numbers
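Banker's rounding (round-half-to-even) differs from `Math.round`, which rounds halves toward +∞; a minimal sketch:

```javascript
// Hypothetical sketch of banker's rounding: halfway cases go to the
// nearest even integer instead of always rounding up.
function bankersRound(x) {
    const r = Math.round(x);
    return Math.abs(x) % 1 === 0.5 ? (r % 2 === 0 ? r : r - 1) : r;
}
```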
* Add povey window function
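The Povey window (used in Kaldi-style feature extraction) is a Hann window raised to the power 0.85; a sketch, assuming the symmetric (length − 1 denominator) formulation:

```javascript
// Hypothetical sketch of the "povey" window function.
function poveyWindow(length) {
    const window = new Float64Array(length);
    for (let i = 0; i < length; ++i) {
        // Hann window term, raised to 0.85.
        window[i] = Math.pow(0.5 - 0.5 * Math.cos((2 * Math.PI * i) / (length - 1)), 0.85);
    }
    return window;
}
```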
* Add `SeamlessM4TFeatureExtractor`
* Add support for wav2vec2-bert models
* Add `SeamlessM4TFeatureExtractor` processor unit tests
* Add pipeline support for `wav2vec2-bert` models
* Update JSDoc
* Update SamModel
* Make `AutoModel.from_pretrained` work with SamModel
* Add listed support for SAM (Segment Anything Model)
* Update types of `calculateDimensions`
* Throw error if reading image from tensor with dims.length != 3
* Make SamProcessor input points optional
* Fix type errors
* `let` -> `const`
* `cat` -> `stack`
* Expose `reshape_input_points` in `SamProcessor`
* Add `input_labels` input parameter for SAM
* Add `input_labels` to sam processor
* Update SAM unit tests
* Remove TODOs
* Update JSDoc
* Add custom VITS tokenizer converter
* Do not decode if expected input_ids is empty
* Update vits tokenizer tests
* Implement `VitsTokenizer`
* Add support for VITS model
* Support VITS through pipeline API
* Update JSDoc
* Add TTS unit test
* Add speecht5 unit test
* Fix typo
* Fix typo
* Update speecht5 model id
* Add note about using quantized speecht5 in unit tests
* Monkey-patch `BigInt64Array` and `BigUint64Array`
* Add `RoFormerTokenizer`
* Use `clean_text` in bert normalizer config
* Add control characters test
* Add support for RoFormer models
* Use default label if id2label is not specified
* Update requirements.txt
* Skip roformer tokenizer tests
* Add basic support for chat templates
* Cleanup
* JSDoc improvements
* Support conversion of user-defined functions
* Cleanup
* Fix function creation
* Add unit tests for templates
* Cleanup
* Improve JSDoc
* Add missing return types
* Add chat templates docs to table of contents
* Add support for logical negation
* Fix nested logical negation
* Add unit tests for logical operators
* Add loop variables
* Add support for `RuntimeValue` built-in functions
* Add unit tests for string instance methods
* Fix conversion of normal function to `FunctionValue`
* Update object method unit tests
* Save chat template to tokenizer_config.json during conversion
* Fix `raise_exception` error
* Add `!=` operator for booleans
* Remember to increment loop index
* Cleanup for loop evaluator
* Use `is` helper function
* Add support for text nodes
i.e., non-Jinja statements/expressions
* Add auto-generated templating tests
* Update unit tests
* Remove unused function
* Add default chat templates
* Use repo with up-to-date tokenizer config
* Temporarily disable zephyr test
* Delete templates.test.js
* Move Jinja functionality to `@huggingface/jinja`
* Fix template cache type
* Update chat template unit tests
* Update `@huggingface/jinja` version
* Fix default llama2 system prompt usage
* Add unit test for llama2 w/o chat template set
* Update jinja version
* Update jinja version
* Add unit test for user-defined chat templates
Example from https://discuss.huggingface.co/t/issue-with-llama-2-chat-template-and-out-of-date-documentation/61645/3
* Add `AddedToken` for improved tokenization
* Add example usage for chat templates
* Add 'first' Metaspace pretokenizer prepend scheme
* Formatting
* Update wav2vec2 converter special tokens whitespace split
* Fix Metaspace pretokenizer split criteria
* Update inputs of `PreTokenizerSequence`
* Improve Metaspace pretokenizer
* Update llama tokenizer tests
* Improve handling of legacy llama tokenizer
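The Metaspace behaviour can be sketched roughly as below; the option names mirror a tokenizers-style config (`replacement`, `prepend_scheme`), and `section_index` is an assumption standing in for "is this the first text section":

```javascript
// Hypothetical sketch of Metaspace pre-tokenization: replace spaces with
// the U+2581 marker and optionally prepend it, depending on the scheme.
function metaspace(text, { replacement = '\u2581', prepend_scheme = 'always', section_index = 0 } = {}) {
    let normalized = text.replaceAll(' ', replacement);
    if (
        !normalized.startsWith(replacement) &&
        (prepend_scheme === 'always' ||
            (prepend_scheme === 'first' && section_index === 0))
    ) {
        normalized = replacement + normalized;
    }
    return normalized;
}
```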
* Re-enable SPM tests
* Add static tokenizer test cases
* Add llama2 static tests
* Allow user to override legacy tokenizer behaviour in `.from_pretrained`
* Add legacy tokenizer unit tests
* Bump jinja version to 0.1.0
* Update `VitMatteImageProcessor` test comment
* Add support for ChineseCLIP models
* Add chinese-clip to list of supported models
* Sort zero-shot-image-classification results by score (desc)
* Update expected zero-shot image classification output
* Add support for `VitMatte` models
* Add `VitMatteImageProcessor`
* Add `VitMatteImageProcessor` unit test
* Fix typo
* Add example code for `VitMatteForImageMatting`
* Fix JSDoc
* Fix typo
* Add support for ESM models
* Add ESM tokenizer conversion methods
* Add special test cases for ESM tokenizer
* Add special tokens in conversion script
* Do not save decoder
* Add special tokens tokenizer test
* Join tokens with space if decoder is null
* Treat all tokens as added tokens
* Use `WhitespaceSplit` pretokenizer
* `<eos>` and `<bos>` are not special tokens
* Update more supported ESM models
* Add `--tokenizer_id` to conversion script
* Add supported models comments
* Add link to optimum docs for supported architectures
Closes #288
* Refactor `SUPPORTED_MODELS` dict to include task
* Update example model id
* Update list of supported models
* Update generate_tests.py
* Remove requirement of `output_attentions` revision
* Add demo site to examples section (closes #233)
* Fix typo
* Include examples in docs index
* Update github issue templates
* Create config.yml
* Order supported models
* Cleanup
* Update 4_feature-request.yml
* Add FFT unit tests
* Refactor maths.js and audio.js
* Refactor audio processors
* Add support for AST models
* Add another audio-classification example
* Add audio processing unit tests
* Implement `log_mel='dB'` in `spectrogram` function
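The dB conversion can be sketched as follows; the function name and the clamping floor are illustrative assumptions:

```javascript
// Hypothetical sketch of amplitude→decibel conversion, with a small floor
// to avoid taking the log of zero.
function amplitudeToDB(spectrogram, reference = 1.0, minValue = 1e-5) {
    return spectrogram.map((x) => 20 * Math.log10(Math.max(x, minValue) / reference));
}
```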
* Add `ClapFeatureExtractor`
* Implement `ClapFeatureExtractor` unit tests
* Add support for `CLAP`
* Add `ZeroShotAudioClassificationPipeline`
* Add listed support for `zero-shot-audio-classification` pipeline tag
* Cleanup
* `let` -> `const`
* Update `mel_filter_bank` unit test
* Add `'Xenova/tiny-random-ClapModel'`
* Add `ClapAudioModelWithProjection` and `ClapTextModelWithProjection`
* Move audio validation to helper function
* Optimize `mel_filter_bank` computation (~30ms faster)
* Update mel filters unit test
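For reference, the HTK-style Hz↔mel conversions that underpin a mel filter bank; this is one of several mel scales, and which variant a given feature extractor uses depends on its config:

```javascript
// Hypothetical sketch of HTK-style frequency/mel conversions.
function hertzToMel(freq) {
    return 2595.0 * Math.log10(1.0 + freq / 700.0);
}

function melToHertz(mel) {
    return 700.0 * (Math.pow(10.0, mel / 2595.0) - 1.0);
}
```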
* Cleanup
* Optimizations
* Fix jsdoc
* Optimizations
* Add WIP conversion scripts
Will be updated once https://github.com/huggingface/optimum/pull/1552 is merged
* Add `size` getter to `RawImage`
* Add `DPTFeatureExtractor`
* Add depth-estimation w/ DPT models
* Add GLPN models for depth estimation
* Add missing import in example
* Add `DPTFeatureExtractor` processor test
* Add unit test for GLPN processor
* Add support for `GLPNFeatureExtractor`
Uses `size_divisor` to determine resize width and height
* Add `GLPNForDepthEstimation` example code
* Add DPT to list of supported models
* Add GLPN to list of supported models
* Add `DepthEstimationPipeline`
* Add listed support for depth estimation pipeline
* Add depth estimation pipeline unit tests
* Fix formatting
* Update `pipeline` JSDoc
* Fix typo from merge
* Add `NougatTokenizer`
* Add nougat unit tests
* Add support for `NougatImageProcessor`
* Add `crop` function to `RawImage`
* Fix `RawImage` save function
OffscreenCanvas does not have `toDataURL` function
* Add listed support for nougat models
* Fix `min`/`max` function typing
* Add unknown token to tokenizer class
* Implement `NoBadWordsLogitsProcessor`
* Use `NoBadWordsLogitsProcessor` in `generate`
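The processor's core check can be sketched as: whenever the generated ids end with all but the last token of a banned sequence, the final token of that sequence is masked out. The function shape below is an illustrative assumption:

```javascript
// Hypothetical sketch: mask out the last token of any banned sequence
// whose prefix matches the tail of the generated ids.
function applyNoBadWords(logits, inputIds, badWordsIds) {
    for (const badWord of badWordsIds) {
        const prefix = badWord.slice(0, -1);
        const start = inputIds.length - prefix.length;
        if (start < 0) continue; // Not enough context generated yet
        const matches = prefix.every((id, i) => inputIds[start + i] === id);
        if (matches) {
            logits[badWord[badWord.length - 1]] = -Infinity;
        }
    }
    return logits;
}
```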
* Fix regex group substitutions
Python uses \1, \2, etc. for group substitutions, but JavaScript uses $1, $2, etc.
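A minimal sketch of that conversion (helper name is an assumption); note that `$$` in a JavaScript replacement string emits a literal `$`:

```javascript
// Hypothetical sketch: rewrite Python-style group references (\1, \2, ...)
// into JavaScript-style ones ($1, $2, ...).
function convertGroupReferences(replacement) {
    return replacement.replace(/\\(\d+)/g, '$$$1');
}
```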
* Create `regexSplit` helper function to split but keep delimiter
* Fix splitting for String pattern types
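A delimiter-keeping split can be sketched as below (helper shape is an assumption); the regex must carry the `g` flag for `matchAll`:

```javascript
// Hypothetical sketch: split `text` on `regex`, keeping each delimiter
// as its own element (useful for "isolated"-style pretokenizer behavior).
function regexSplit(text, regex) {
    const result = [];
    let prev = 0;
    for (const match of text.matchAll(regex)) {
        if (match.index > prev) {
            result.push(text.slice(prev, match.index));
        }
        if (match[0].length > 0) {
            result.push(match[0]);
        }
        prev = match.index + match[0].length;
    }
    if (prev < text.length) {
        result.push(text.slice(prev));
    }
    return result;
}
```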
* Fix docstring
* Set `batch_size=1` for owlvit exports
* Add support for owlvit models
* Update default quantization settings
* Add list of supported models
* Revert update of owlvit quantization settings
* Add `OwlViTProcessor`
* Move `get_bounding_box` to utils
* Add `ZeroShotObjectDetectionPipeline`
* Add unit tests
* Add owlvit processor test
* Add listed support for `zero-shot-object-detection`
* Add OWL-ViT to list of supported models
* Update README.md
* Fix typo from merge
* Add `Swin2SRImageProcessor`
* Add `RawImage.fromTensor` helper function
* Add clamp tensor function
* Add support for `.to` data type conversion
* Add `round` tensor function
* Add support for `mul` tensor function
* Fix image padding
* Only perform padding if it will affect size
* Create basic processors unit test suite
* Add SamProcessor test case
* Move `CONTENT_TYPE_MAP` outside `RawImage` class
* Perform reflective padding for swin2sr models
* Add swin2sr models for image super-resolution
* Add listed support for Swin2SR models
* Add image-to-image pipeline
* Add listed support for image-to-image task
* Add image-to-image unit tests
* Add `add` tensor functions
* Generalize `pad_image` helper function
* Add more unit tests for image processors
* Fix typo
* Add vocoder to export
* Add tokenizer.json export for speecht5 models
* Update speecht5 supported models
* Create `SpeechT5Tokenizer`
* Add `ones` and `ones_like` tensor functions
* Add support for speecht5 text-to-speech
* Disambiguate `SpeechSeq2Seq` and `Seq2SeqLM`
* Create `TextToAudioPipeline`
* Add listed support for `text-to-audio` / `text-to-speech`
* Use unquantized vocoder by default
* Skip speecht5 unit tests for now
Due to bug in transformers: https://github.com/huggingface/transformers/issues/26547
* Update example pipeline output
* Create simple in-browser TTS demo
* Add template README
* Delete package-lock.json
* Update required transformers.js version
* Add link to Transformers.js
* Double -> Single quotes
* Add link to text-to-speech demo
* Update sample speaker embeddings
* Add `add_special_tokens` option to tokenizers
* Improve error messages for loading processors
* Add `DonutFeatureExtractor`
* Add `DonutSwinModel` and `MBartForCausalLM` models
* Fix `addPastKeyValues` for `VisionEncoderDecoder` models
* Add `Donut` to list of supported models
* Make encode parameters optional
* Support batched decoder input ids
* Remove unused import
* Add `do_thumbnail` for donut image processing
* Fix `TypeError: decoder_input_ids[i].map is not a function`
* Only pad if width and height specified in size
* Only pad if `pad_size` is defined
* Only cut `decoder_input_ids` if there is past model output
* Add donut model
* Add example usage to JSDoc for `DonutSwinModel`
* Add support for `DocumentQuestionAnsweringPipeline`
* Add simple document question answering unit test
* Add listed support for document QA pipeline
* Add support for `Blenderbot` models
Closes #37
References #29
* Add support for `BlenderbotTokenizer`
* Add blenderbot to supported models
* Add support for `BlenderbotSmallTokenizer`
* Add custom tests for blenderbot-small
* Add support for `BlenderbotSmall` models
* Update list of supported models
* Improve `addPastKeyValues` function
* Allow skipping of adding encoder past key values
* Add support for `MinNewTokensLengthLogitsProcessor`
* Add support for `MinLengthLogitsProcessor`
* Fix `generation_config` defaults
* Fix `input_ids_seq_length`
* Add unit tests for generation
* Fix generation parameters test case
* Allow specification of multiple `eos_token_ids`
* Add `CodeLlamaTokenizer`
* Add `codellama` for testing
* Update default quantization settings
* Refactor `PretrainedModel`
* Remove unnecessary error message
* Update llama-code-tokenizer test
* Add support for `GPTNeoX` models
* Fix `GPTNeoXPreTrainedModel` config
* Add support for `GPTJ` models
* Add support for `WavLM` models
* Update list of supported models
- CodeLlama
- GPT NeoX
- GPT-J
- WavLM
* Add support for XLM models
* Add support for `ResNet` models
* Add support for `BeiT` models
* Fix casing of `BeitModel`
* Remove duplicate code
* Update variable name
* Remove `ts-ignore`
* Remove unnecessary duplication
* Update demo model sizes
* [demo] Update default summarization parameters
* Update default quantization parameters for new models
* Remove duplication in mapping
* Update list of supported marian models
* Add support for `CamemBERT` models
* Add support for `MBart` models
* Add support for `OPT` models
* Add `MBartTokenizer` and `MBart50Tokenizer`
* Add example of multilingual translation with MBart models
* Add `CamembertTokenizer`
* Add support for `HerBERT` models
* Add support for `XLMTokenizer`
* Fix `fuse_unk` config
* Do not remove duplicate keys for `Unigram` models
See https://huggingface.co/camembert-base for an example of a Unigram tokenizer that has two tokens with the same value (`<unk>`)
* Update HerBERT supported model text
* Update generate_tests.py
* Update list of supported models
* Use enum object instead of classes for model types
Fixes https://github.com/xenova/transformers.js/issues/283
* Add link to issue
* Update dependencies for unit tests
* Add `sentencepiece` as a testing requirement
* Add `protobuf` to test dependency
* Remove duplicated models to test