* Add link to optimum docs for supported architectures
Closes #288
* Refactor `SUPPORTED_MODELS` dict to include task
* Update example model id
* Update list of supported models
* Update generate_tests.py
* Remove requirement of `output_attentions` revision
* Add demo site to examples section (closes #233)
* Fix typo
* Include examples in docs index
* Update github issue templates
* Create config.yml
* Order supported models
* Cleanup
* Update 4_feature-request.yml
* Add example code for zero shot image classification
* Add example code for text classification pipeline
* Fix links to custom usage from pipelines docs
Reported on Discord: https://discord.com/channels/879548962464493619/1142943169068154950/1142943169068154950
* Fix relative links
* Rename .mdx -> .md
GitHub recently changed how .mdx files are rendered, which broke much of the formatting, so we now use .md (same as transformers).
* Add example code for token classification pipeline
* Add example code for fill-mask pipeline
* Add text2text and summarization pipeline examples
* Add example code for image segmentation pipeline
* Remove redundant `@extends Pipeline`
* Add example code for image-to-text pipeline
* Cleanup example code outputs
* Cleanup JSDoc
* Cleanup pipeline example code
* Update codegen example
* Add example `wav2vec2` models
* Add support for `CTCDecoder` and `Wav2Vec2CTCTokenizer`
* Generate tokenizer.json files for wav2vec2 models
* Fix wav2vec2 custom tokenizer generation
* Implement wav2vec2 automatic-speech-recognition
* Add `Wav2Vec2` as a supported architecture
* Update README.md
* Update generate_tests.py
* Ignore invalid tests
* Update supported wav2vec2 models
* Update supported_models.py
* Simplify pipeline construction
* Implement basic audio classification pipeline
* Update default topk value for audio classification pipeline
* Add example usage for the audio classification pipeline
* Move `loadAudio` to utils file
* Add audio classification unit test
* Add wav2vec2 ASR unit test
* Improve generated wav2vec2 tokenizer json
* Update supported_models.py
* Allow `added_tokens_regex` to be null
* Support exporting mms vocabs
* Support nested vocabularies
* Update supported tasks and models
* Add warnings to ignore language and task for wav2vec2 models
Support for these options will be added in the future.
* Mark internal methods as private
* Add typing to audio variable
* Update node-audio-processing.mdx
* Move node-audio-processing to guides
* Update table of contents
* Add example code for performing feature extraction w/ `Wav2Vec2Model`
NOTE: Feature extraction for MMS models is currently broken in the Python library, but it works correctly here. See
https://github.com/huggingface/transformers/issues/25485 for more info.
* Refactor `Pipeline` class params
* Fix `pipeline` function
* Fix typo in `pipeline` JSDoc
* Fix second typo
* Create basic tokenizer playground app
* Default to no display when the user adds a large body of text
* Optimize BPE algorithm
- Use `Map` instead of a plain object for `bpe_ranks`
- Replace reduction in BPE algorithm with for loop
- Avoid conversions between sets and arrays
* Use for loop to avoid stack issues with `.push(...items)`
* Fix `mergeArrays` typing
* Remove unnecessary try-catch block in BPE
* Add Llama, T5, and BERT tokenizers to the playground
* Improve how BERT/T5 tokens are displayed
* Improve how token margins are displayed
* Use `Map` for cache
* Add efficient heap-based priority queue implementation
* Add more unit tests for LlamaTokenizer
Selected from https://github.com/belladoreai/llama-tokenizer-js/blob/master/llama-tokenizer.js#L381-L452
* Implement priority-queue-based BPE algorithm
* Remove old code
* Update `bpe` docstring
* Add `data-structures` page to docs
* Update JSDoc for data-structures.js
* Update data-structures.js
* Move `TokenLattice` and `CharTrie` to data-structures module
* Minor refactoring
* Link to the conversion Space for maximum simplicity
* Add some types to script (very optional)
* Fix typo
* Remove unneeded trailing slash
* Node is also a valid option
* Document how to find a compatible checkpoint on the hub
* Update README
* Fix typing
* Update docs index
---------
Co-authored-by: Julien Chaumond <julien@huggingface.co>
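
Note on the "Use for loop to avoid stack issues with `.push(...items)`" entry above: spreading a very large array into a call passes every element as a separate argument, which can exceed the engine's argument limit. The following is only a minimal sketch of the loop-based alternative; `appendAll` is a hypothetical helper name and not necessarily how the library implements it.

```javascript
// Hypothetical helper (illustration only, not the library's actual code).
// Appends every element of `items` to `target` one at a time, so the number
// of function arguments stays constant regardless of how large `items` is.
function appendAll(target, items) {
    for (const item of items) {
        target.push(item);
    }
    return target;
}

// A large input that could be risky to spread into a single `push` call
// on some engines; the loop handles it without relying on argument limits.
const largeInput = new Array(1000000).fill(0);
const merged = appendAll([1, 2, 3], largeInput);
console.log(merged.length);
```

The trade-off is a slightly longer function in exchange for behavior that does not depend on input size.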
* Only run encoder with required inputs
* Add basic whisper unit tests
* Add newline after heading for docs
* Add unit test for transcribing english with timestamps
* Add multilingual test case
* Fix typo in Node tutorial
* Create node audio processing tutorial
* Point to tutorial in `read_audio` function
* Rename `.md` to `.mdx`
* Add node audio processing tutorial to table of contents
* Add link to model in tutorial
* Update error message grammar
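
Note on the "Add efficient heap-based priority queue implementation" entry in the first commit list: the sketch below shows what a binary-heap priority queue can look like in general. It is an assumption-laden illustration; the class name `SimplePriorityQueue`, its method names, and the `rank` field are hypothetical and do not claim to match the code added to the data-structures module.

```javascript
// Hypothetical binary-heap priority queue (illustration only).
// `comparator(a, b)` returns true when `a` should come out before `b`.
class SimplePriorityQueue {
    constructor(comparator = (a, b) => a > b) {
        this._heap = [];
        this._comparator = comparator;
    }

    get size() {
        return this._heap.length;
    }

    // Insert a value and restore the heap property by sifting it up.
    push(value) {
        this._heap.push(value);
        this._siftUp(this._heap.length - 1);
    }

    // Remove and return the highest-priority value (undefined if empty).
    pop() {
        const top = this._heap[0];
        const last = this._heap.pop();
        if (this._heap.length > 0) {
            this._heap[0] = last;
            this._siftDown(0);
        }
        return top;
    }

    _swap(i, j) {
        const tmp = this._heap[i];
        this._heap[i] = this._heap[j];
        this._heap[j] = tmp;
    }

    _siftUp(i) {
        while (i > 0) {
            const parent = (i - 1) >> 1;
            if (!this._comparator(this._heap[i], this._heap[parent])) break;
            this._swap(i, parent);
            i = parent;
        }
    }

    _siftDown(i) {
        const n = this._heap.length;
        while (true) {
            const left = 2 * i + 1;
            const right = left + 1;
            let best = i;
            if (left < n && this._comparator(this._heap[left], this._heap[best])) best = left;
            if (right < n && this._comparator(this._heap[right], this._heap[best])) best = right;
            if (best === i) break;
            this._swap(i, best);
            i = best;
        }
    }
}

// Usage: pop candidate merges in order of lowest rank first, as a
// priority-queue-based BPE loop might (the ranks here are made up).
const queue = new SimplePriorityQueue((a, b) => a.rank < b.rank);
queue.push({ pair: 'l o', rank: 7 });
queue.push({ pair: 'h e', rank: 2 });
queue.push({ pair: 'l l', rank: 5 });

const order = [];
while (queue.size > 0) {
    order.push(queue.pop().pair);
}
console.log(order); // [ 'h e', 'l l', 'l o' ]
```

Both `push` and `pop` take time logarithmic in the queue size, which is the usual reason to prefer a heap over rescanning all candidates for the best one on every step.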