Commit Graph

1043 Commits

Author SHA1 Message Date
Joshua Lochner 5216fb461d
Fix `ByteLevel` pretokenizer
* Re-enable other whisper tests

* Fix `ByteLevel` pretokenizer

Only add a prefix space to the first word, when the option is enabled.
2023-09-10 00:37:04 +02:00
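
The `ByteLevel` fix above changes where the prefix space is applied. A minimal sketch of the corrected behaviour (not the library's actual code; the split pattern is simplified):

```js
// Prepend the prefix space once, to the start of the input, rather than to
// every pre-tokenized word, and only when the option is enabled.
function byteLevelPreTokenize(text, { add_prefix_space = true } = {}) {
    if (add_prefix_space && !text.startsWith(' ')) {
        text = ' ' + text;
    }
    // Simplified stand-in for the GPT-2 byte-level split pattern.
    return text.match(/ ?\S+/gu) ?? [];
}
```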
Joshua Lochner ad7e8758bc [version] Update to 2.6.0 2023-09-08 15:41:59 +02:00
Joshua Lochner 9a3339239e
New models and refactoring (#276)
* Add `CodeLlamaTokenizer`

* Add `codellama` for testing

* Update default quantization settings

* Refactor `PretrainedModel`

* Remove unnecessary error message

* Update llama-code-tokenizer test

* Add support for `GPTNeoX` models

* Fix `GPTNeoXPreTrainedModel` config

* Add support for `GPTJ` models

* Add support for `WavLM` models

* Update list of supported models

- CodeLlama
- GPT NeoX
- GPT-J
- WavLM

* Add support for XLM models

* Add support for `ResNet` models

* Add support for `BeiT` models

* Fix casing of `BeitModel`

* Remove duplicate code

* Update variable name

* Remove `ts-ignore`

* Remove unnecessary duplication

* Update demo model sizes

* [demo] Update default summarization parameters

* Update default quantization parameters for new models

* Remove duplication in mapping

* Update list of supported marian models

* Add support for `CamemBERT` models

* Add support for `MBart` models

* Add support for `OPT` models

* Add `MBartTokenizer` and `MBart50Tokenizer`

* Add example of multilingual translation with MBart models

* Add `CamembertTokenizer`

* Add support for `HerBERT` models

* Add support for `XLMTokenizer`

* Fix `fuse_unk` config

* Do not remove duplicate keys for `Unigram` models

See https://huggingface.co/camembert-base for an example of a Unigram tokenizer that has two tokens with the same value (`<unk>`)

* Update HerBERT supported model text

* Update generate_tests.py

* Update list of supported models

* Use enum object instead of classes for model types

Fixes https://github.com/xenova/transformers.js/issues/283

* Add link to issue

* Update dependencies for unit tests

* Add `sentencepiece` as a testing requirement

* Add `protobuf` to test dependency

* Remove duplicated models to test
2023-09-08 15:17:05 +02:00
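
The multilingual translation example added in the commit above looks roughly like the following; the checkpoint name and mBART-50 language codes are assumptions, so consult the library docs for the exact identifiers:

```js
import { pipeline } from '@xenova/transformers';

const translator = await pipeline('translation', 'Xenova/mbart-large-50-many-to-many-mmt');
const output = await translator('The head of the United Nations says there is no military solution in Syria', {
    src_lang: 'en_XX', // English
    tgt_lang: 'fr_XX', // French
});
```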
Joshua Lochner 109a7f9711 Fix unit test 2023-09-04 23:53:05 +02:00
Joshua Lochner dbea8a2990 Update to `checkout@v4`
See https://github.com/actions/checkout/issues/1448 for more info.
2023-09-04 23:20:57 +02:00
Hermann Rolfes 1488079f81
Make // @ts-ignore obsolete for _call overrides by respecting LSP (#278)
* Make // @ts-ignore obsolete for _call overrides by respecting LSP

* oops can't be undefined, back to how it was

* Use `...unused` instead to fix LSP errors
2023-09-04 23:06:44 +02:00
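
The pattern behind the `...unused` change is sketched below with illustrative classes (not the library's actual `Callable` implementation): overrides of `_call` keep a trailing rest parameter so their narrower signatures still satisfy the Liskov Substitution Principle, and TypeScript no longer needs `// @ts-ignore`.

```js
class Callable {
    /** @param {...any} args */
    _call(...args) {
        throw new Error('Must implement _call method in subclass');
    }
}

class TextPipeline extends Callable {
    /**
     * @param {string} text
     * @param {...any} unused
     */
    _call(text, ...unused) {
        return `processed: ${text}`;
    }
}
```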
Joshua Lochner 57f2b5cd17
Add support for MPT models (Fixes #166) (#272)
* Add support for MPT models

* Fix `use_cache_branch`

* Update list of supported models
2023-09-02 22:17:01 +02:00
Joshua Lochner 96b9143b33 Update masked-lm tests 2023-09-02 03:47:06 +02:00
Joshua Lochner 9077c21540
Add support for BLOOM models (#273)
* Add support for Bloom models

* Update `BloomTokenizer` to fix the default (invalid) regex

* Update supported models

* Update default quantization settings for bloom models

* Fix `use_cache_branch`
2023-09-01 22:07:04 +02:00
Joshua Lochner 62159eb383 Fix `CustomWhisperOnnxConfig` 2023-09-01 16:14:49 +02:00
Joshua Lochner 0c2dcc7498 [version] Update to 2.5.4 2023-08-28 20:07:06 +02:00
Joshua Lochner 09cf91abd0
Add `DeiT`, `Swin`, and `Yolos` vision models (#262)
* Add support for `DeiT` models

* Add `Swin` models for image classification

* Add support for `yolos` models

* Add `YolosFeatureExtractor`

* Remove unused import

* Update list of supported models

* Remove SAM for now

Move SAM support to the next release
2023-08-28 17:29:15 +02:00
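
Usage of the new vision models follows the existing pipeline API; a sketch, assuming converted checkpoints such as the ones named below exist on the Hub:

```js
import { pipeline } from '@xenova/transformers';

// Image classification with a Swin (or DeiT) checkpoint.
const classifier = await pipeline('image-classification', 'Xenova/swin-tiny-patch4-window7-224');
// Object detection with a YOLOS checkpoint.
const detector = await pipeline('object-detection', 'Xenova/yolos-tiny');

const url = 'https://example.com/cats.jpg';
console.log(await classifier(url));
console.log(await detector(url, { threshold: 0.9 }));
```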
Joshua Lochner f0573175fd Add `DeiTFeatureExtractor` 2023-08-26 23:54:27 +02:00
Per Harald Borgen 76b8556110
Rename how-to guides to developer guides (#261) 2023-08-25 17:56:18 +02:00
Joshua Lochner 7076c8e401 [version] Update to 2.5.3 2023-08-22 23:31:00 +02:00
josephrocca 9bb6923242
[docs] Add links and compatible models to supported tasks table (#257) 2023-08-22 23:19:48 +02:00
Joshua Lochner 3fab8265cb
Update whisper unit test (#258) 2023-08-22 22:18:17 +02:00
Joshua Lochner 9c449c151c
Fix caching for LFS files from the Hugging Face Hub (#251)
* Fix model caching for LFS files from the HF Hub

* Ignore local model check on demo site
2023-08-22 18:28:37 +02:00
Joshua Lochner f61cc66e0e Fix link to API reference 2023-08-22 17:19:49 +02:00
Joshua Lochner c3af596443
Fix word-level timestamps for non-English languages w/ Whisper (#253)
* Fix language detection

* Remove debug statement

* Fix punctuation regex for whisper decoding (Closes #223)

* Fix word-level timestamps for audio < 30 seconds

Issue in python library: https://github.com/huggingface/transformers/issues/25605
PR for above: https://github.com/huggingface/transformers/pull/25607

* Add multilingual transcription w/ word-level timestamps unit test

* Fix unit tests
2023-08-22 15:50:30 +02:00
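
Word-level timestamps for multilingual transcription are requested through the ASR pipeline; a sketch (the checkpoint name and audio URL are placeholders):

```js
import { pipeline } from '@xenova/transformers';

const transcriber = await pipeline('automatic-speech-recognition', 'Xenova/whisper-small');
const output = await transcriber('https://example.com/french-audio.wav', {
    language: 'french',
    task: 'transcribe',
    return_timestamps: 'word',
});
// output.chunks contains { text, timestamp: [start, end] } entries per word.
```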
Joshua Lochner 276bdd06b8
Improve pipeline docs (w/ example code) - closes #134 (#255)
* Add example code for zero shot image classification

* Add example code for text classification pipeline

* Fix links to custom usage from pipelines docs

Reported on discord https://discord.com/channels/879548962464493619/1142943169068154950/1142943169068154950

* Fix relative links

* Rename .mdx -> .md

GitHub recently changed how mdx files are displayed, breaking a lot of the formatting. So, we just use .md now (same as transformers)

* Add example code for token classification pipeline

* Add example code for fill-mask pipeline

* Add text2text and summarization pipeline examples

* Add example code for image segmentation pipeline

* Remove redundant `@extends Pipeline`

* Add example code for image-to-text pipeline

* Cleanup example code outputs

* Cleanup JSDoc

* Cleanup pipeline example code

* Update codegen example
2023-08-22 04:30:56 +02:00
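
The example code added throughout the pipeline docs follows the same shape; for instance, zero-shot image classification (model name, image URL, and scores are illustrative):

```js
import { pipeline } from '@xenova/transformers';

const classifier = await pipeline('zero-shot-image-classification', 'Xenova/clip-vit-base-patch32');
const output = await classifier('https://example.com/tiger.jpg', ['tiger', 'horse', 'dog']);
// e.g. [{ score: 0.98, label: 'tiger' }, { score: 0.01, label: 'horse' }, { score: 0.01, label: 'dog' }]
```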
Joshua Lochner 254e99ef9a [version] Update to 2.5.2 2023-08-14 22:55:54 +02:00
Joshua Lochner d479953a62
[WIP] Add MMS and Wav2Vec2 models (Closes #209) (#220)
* Add example `wav2vec2` models

* Add support for `CTCDecoder` and `Wav2Vec2CTCTokenizer`

* Generate tokenizer.json files for wav2vec2 models

* Fix wav2vec2 custom tokenizer generation

* Implement wav2vec2 audio-speech-recognition

* Add `Wav2Vec2` as a supported architecture

* Update README.md

* Update generate_tests.py

* Ignore invalid tests

* Update supported wav2vec2 models

* Update supported_models.py

* Simplify pipeline construction

* Implement basic audio classification pipeline

* Update default topk value for audio classification pipeline

* Add example usage for the audio classification pipeline

* Move `loadAudio` to utils file

* Add audio classification unit test

* Add wav2vec2 ASR unit test

* Improve generated wav2vec2 tokenizer json

* Update supported_models.py

* Allow `added_tokens_regex` to be null

* Support exporting mms vocabs

* Support nested vocabularies

* Update supported tasks and models

* Add warnings to ignore language and task for wav2vec2 models

Will add in a future release

* Mark internal methods as private

* Add typing to audio variable

* Update node-audio-processing.mdx

* Move node-audio-processing to guides

* Update table of contents

* Add example code for performing feature extraction w/ `Wav2Vec2Model`

NOTE: feature extraction of MMS models is currently broken in the python library, but it works correctly here. See
https://github.com/huggingface/transformers/issues/25485 for more info

* Refactor `Pipeline` class params

* Fix `pipeline` function

* Fix typo in `pipeline` JSDoc

* Fix second typo
2023-08-14 22:18:44 +02:00
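
With the wav2vec2 work above, speech recognition and audio classification both go through pipelines; a sketch (checkpoint names are assumptions):

```js
import { pipeline } from '@xenova/transformers';

// CTC-based speech recognition with a converted wav2vec2 checkpoint.
const transcriber = await pipeline('automatic-speech-recognition', 'Xenova/wav2vec2-base-960h');
console.log(await transcriber('https://example.com/speech.wav'));

// Audio classification (e.g. MMS language identification).
const classifier = await pipeline('audio-classification', 'Xenova/mms-lid-126');
console.log(await classifier('https://example.com/speech.wav', { topk: 5 }));
```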
Joshua Lochner 060ac830fc
Add M2M100 tokenizer (Closes #235) (#250)
* Add `M2M100Tokenizer`

* Allow `added_tokens` list to be empty

* Apply hot-fix for issue in HF's `M2M100Tokenizer`

* Skip M2M100 tokenizer tests for now

TODO: Remove when https://github.com/huggingface/transformers/pull/25478 is merged

* Fix `_build_translation_inputs` for `M2M100Tokenizer`

* Add example code in JSDoc for `TranslationPipeline`

* Update supported_models.py
2023-08-14 17:22:20 +02:00
Joshua Lochner cc4b857d54
Add problem type (Fixes #248) (#249)
* Add support for `problem_type` in text classification

* Add unit test for `multi_label_classification` problem type

* Update supported_models.py
2023-08-14 16:35:13 +02:00
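
The `problem_type` support boils down to choosing which activation is applied to the logits; a rough sketch of the idea (not the library's code):

```js
function scoresFromLogits(logits, problem_type) {
    if (problem_type === 'multi_label_classification') {
        // Multi-label: independent sigmoid per class.
        return logits.map(x => 1 / (1 + Math.exp(-x)));
    }
    // Single-label (default): softmax over all classes.
    const max = Math.max(...logits);
    const exps = logits.map(x => Math.exp(x - max));
    const sum = exps.reduce((a, b) => a + b, 0);
    return exps.map(x => x / sum);
}
```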
Joshua Lochner d7a734342c Update tokenizer example documentation (Closes #245) 2023-08-13 23:24:50 +02:00
Joshua Lochner 2f70a5d37c
Fix typo in supported-tasks snippet 2023-08-11 01:45:39 +02:00
Celso Dias cfdfe9c6f1
Correct word in readme (#247) 2023-08-11 01:41:20 +02:00
Joshua Lochner b420a8841e [version] Update to 2.5.1 2023-08-09 22:25:53 +02:00
Joshua Lochner 46dd49064f
[Llama + LLama2] Add model support (#232)
* Add support for llama models

* Fix JSDoc
2023-08-09 13:35:28 +02:00
Joshua Lochner 1e157ba2d8
Add support for Deberta models (#244)
* add documentation for zero shot classification

* add multi_label example

* review comments

* edit examples data

* Add deberta and deberta-v2 model definitions

* Update model mapping

* Implement missing `Strip` normalizer

* Add deberta and deberta-v2 tokenizers

* Add fast path to `Strip` normalizer

* Add token types to deberta tokenizer output

* Update supported_models.py

* Fix default Precompiled normalization

* Update supported models list

* Update JSDoc

* Support `not_entailment` label

* Update multi-label example JSDoc

---------

Co-authored-by: Aschen <amaret93@gmail.com>
2023-08-09 11:58:16 +02:00
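
The zero-shot classification documentation added in this PR covers the `multi_label` option; a sketch of such a call (the model name is illustrative):

```js
import { pipeline } from '@xenova/transformers';

const classifier = await pipeline('zero-shot-classification', 'Xenova/mobilebert-uncased-mnli');
const output = await classifier(
    'I have a problem with my iphone that needs to be resolved asap!',
    ['urgent', 'not urgent', 'phone', 'tablet', 'computer'],
    { multi_label: true },
);
// output.labels and output.scores are sorted by score; with multi_label, scores are independent.
```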
Joshua Lochner db7d0f0f83
Tokenization improvements (#234)
* Create basic tokenizer playground app

* Default to no display when user adding large body of text

* Optimize BPE algorithm

- Use map instead of object for `bpe_ranks`
- Replace reduction in BPE algorithm with for loop
- Avoid conversions between sets and arrays

* Use for loop to avoid stack issues with `.push(...items)`

* Fix `mergeArrays` typing

* Remove unnecessary try-catch block in BPE

* Add Llama, T5, and BERT tokenizers to the playground

* Improve how BERT/T5 tokens are displayed

* Improve how token margins are displayed

* Use `Map` for cache

* Add efficient heap-based priority queue implementation

* Add more unit tests for LlamaTokenizer

Selected from https://github.com/belladoreai/llama-tokenizer-js/blob/master/llama-tokenizer.js#L381-L452

* Implement priority-queue-based BPE algorithm

* Remove old code

* Update `bpe` docstring

* Add `data-structures` page to docs

* Update JSDoc for data-structures.js

* Update data-structures.js

* Move `TokenLattice` and `CharTrie` to data-structures module

* Minor refactoring
2023-08-08 12:11:35 +02:00
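
The core of the BPE optimization is storing merge ranks in a `Map` and repeatedly applying the best-ranked merge; a simplified greedy sketch (the library's version additionally uses the heap-based priority queue mentioned above):

```js
// `ranks` maps a space-joined symbol pair (e.g. 'h e') to its merge priority.
function bpe(word, ranks) {
    let symbols = Array.from(word);
    while (symbols.length > 1) {
        let best = null;
        for (let i = 0; i < symbols.length - 1; ++i) {
            const rank = ranks.get(symbols[i] + ' ' + symbols[i + 1]);
            if (rank !== undefined && (best === null || rank < best.rank)) {
                best = { rank, index: i };
            }
        }
        if (best === null) break; // no applicable merges left
        symbols.splice(best.index, 2, symbols[best.index] + symbols[best.index + 1]);
    }
    return symbols;
}
```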
Joshua Lochner ebc9722305 Update supported_models.py 2023-08-01 22:27:40 +02:00
Joshua Lochner d2a0aa9133 Add link to semantic image search application 2023-08-01 18:56:51 +02:00
Joshua Lochner a9a955c76f Update .env.local.example 2023-08-01 18:55:46 +02:00
Joshua Lochner 99db37864d Update semantic image search example README 2023-08-01 18:55:41 +02:00
Joshua Lochner b1537e28dc Create package-lock.json 2023-08-01 15:30:52 +02:00
Joshua Lochner 9aa1a29dac [version] Update to 2.5.0 2023-08-01 14:24:56 +02:00
Joshua Lochner f867226c7e
Improve browser extension sample/template (#196)
* Update extension to be module

* Update example extension

* Allow user to specify a custom cache system

* Implement custom cache system

Emulates the Web Cache API using chrome's local storage API

* Use custom cache system in extension

* Fix serialization

* Remove old folders

* Update extension readme

* Add note about JSON requirement for local storage
2023-08-01 14:23:21 +02:00
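
The custom cache system described above mirrors the two Web Cache API methods the library relies on; a sketch, assuming the extension wires an object like this into the library's custom-cache setting (exact option names may differ):

```js
// Emulate `Cache.match` / `Cache.put` on top of chrome.storage.local.
class ChromeStorageCache {
    async match(request) {
        const url = typeof request === 'string' ? request : request.url;
        const stored = await chrome.storage.local.get(url);
        return stored[url] === undefined ? undefined : new Response(stored[url]);
    }
    async put(request, response) {
        const url = typeof request === 'string' ? request : request.url;
        // chrome.storage.local only persists JSON-serializable values,
        // hence the note above about the JSON requirement.
        await chrome.storage.local.set({ [url]: await response.text() });
    }
}
```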
Joshua Lochner 2fde656791
Add support for computing CLIP image and text embeddings separately (Closes #148) (#227)
* Define custom CLIP ONNX configs

* Update conversion script

* Support specifying custom model file name

* Use int64 for CLIP input ids

* Add support for CLIP text and vision models

* Fix JSDoc

* Add docs for `CLIPTextModelWithProjection`

* Add docs for `CLIPVisionModelWithProjection`

* Add unit test for CLIP text models

* Add unit test for CLIP vision models

* Set resize precision to 3 decimal places

* Fix `RawImage.save()` function

* Throw error when reading image and status != 200

* Create basic semantic image search application

* Separate out components

* Add `update-database` script

* Update transformers.js version
2023-08-01 14:01:04 +02:00
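
Computing CLIP text embeddings on their own now looks roughly like this (the model id is an assumption; a matching `CLIPVisionModelWithProjection` call exists for images):

```js
import { AutoTokenizer, CLIPTextModelWithProjection } from '@xenova/transformers';

const model_id = 'Xenova/clip-vit-base-patch16';
const tokenizer = await AutoTokenizer.from_pretrained(model_id);
const text_model = await CLIPTextModelWithProjection.from_pretrained(model_id);

const inputs = tokenizer(['a photo of a cat', 'a photo of a dog'], { padding: true, truncation: true });
const { text_embeds } = await text_model(inputs);
```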
Joshua Lochner 27920d8483 [version] Update to 2.4.4 2023-07-28 13:28:37 +02:00
Joshua Lochner 2015c685c7
Add Starcoder model support + demo (#225)
* Add support for `gpt_bigcode` models

* Create basic code-completion sample application

* Update sidebar

* Remove debug statement

* Disable 1B model (for now)

* Display progress bars

* Reuse config if not specified

* Update supported_models.py

* Update comment

* Add temperature/sample/topk generation params

* Update sidebar

* Add `gpt_bigcode` to supported models list

* Add code playground example

* Update title

* Cleanup

* Ignore `bigcode/starcoderbase-1b` from tests

* Update transformers.js version for demo
2023-07-28 13:24:32 +02:00
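
The temperature/sample/top-k parameters added for the demo map onto the text-generation pipeline's generation options; a sketch (the checkpoint name is illustrative):

```js
import { pipeline } from '@xenova/transformers';

const generator = await pipeline('text-generation', 'Xenova/tiny_starcoder_py');
const output = await generator('def fibonacci(n):', {
    max_new_tokens: 60,
    temperature: 0.7,
    do_sample: true,
    top_k: 5,
});
```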
Joshua Lochner da67f41434 [version] Update to 2.4.3 2023-07-27 06:06:42 +02:00
Joshua Lochner 961c0cf860 Add MPNet to README 2023-07-27 06:01:50 +02:00
Joshua Lochner f163f1a318
Add support for `mpnet` models (#221) 2023-07-27 05:59:23 +02:00
Joshua Lochner 09ff83b90e
Create example next.js application (Closes #210) (#211)
* Create example next app

* Link to example app

* Update next configs

* Create tutorial for next.js application

* Update next.js tutorial

* Rename project `next` -> `next-client`

* Clone `next-server` from `next-client`

* Update next.config.js for server-side inference

* Create basic server-side next.js application

* Update example links

* Update subheading for client-side next.js app

* Update next.config.js files

* Create example Dockerfile

* Update next tutorial to include server-side inference

* Improve wording

* Update Dockerfile

* Add step to create a Dockerfile

* Update examples snippet

* Fix wording
2023-07-26 01:48:13 +02:00
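
For the client-side variant of the Next.js example, the `next.config.js` changes amount to roughly the following; this is a sketch of the tutorial's approach, and the option values are assumptions:

```js
/** @type {import('next').NextConfig} */
const nextConfig = {
    output: 'export', // static export: inference runs entirely in the browser
    webpack: (config) => {
        config.resolve.alias = {
            ...config.resolve.alias,
            // Node-only optional dependencies that must not be bundled for the browser.
            'sharp$': false,
            'onnxruntime-node$': false,
        };
        return config;
    },
};

module.exports = nextConfig;
```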
Joshua Lochner f181e135d4 [version] Update to 2.4.2 2023-07-22 05:08:12 +02:00
Joshua Lochner 1165f04a9f
Fix BPE tokenization for weird whitespace characters (Closes #199) (#208)
* Add new tokenizer unit test (#199)

* Perform `NFKC` normalization for sentencepiece models w/ precompiled charmap

* Fix JSDoc indentation

* Add problematic string to unit tests

* Use consistent BPE split token

* Add second problematic string
2023-07-22 04:51:11 +02:00
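
The precompiled-charmap fix approximates sentencepiece's normalization with Unicode NFKC, which folds problematic whitespace and full-width characters into their plain forms; for example:

```js
const input = 'ｗｅｉｒｄ\u00a0spaces'; // full-width letters plus a no-break space
console.log(input.normalize('NFKC'));  // "weird spaces"
```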
Joshua Lochner 86e68bf9c0
Add support for private/gated model access (Closes #198) (#202)
* Allow user to specify HF token as an environment variable

* Add documentation for how to make authorized requests

* Improve docs
2023-07-21 17:31:37 +02:00
Joshua Lochner 00c0e2935e Fix documentation (Closes #201) 2023-07-21 16:57:46 +02:00