* Add new tokenizer unit test (#199)
* Perform `NFKC` normalization for sentencepiece models w/ precompiled charmap
* Fix JSDoc indentation
* Add problematic string to unit tests
* Use consistent BPE split token
* Add second problematic string
* Support outputting attentions in generate function
* Add unit tests for concatenating tensors
* Implement `cat` for `dim>0`
* Add `cat` unit tests for > 2 tensors
* Allow for negative indexing + bounds checking
* Add test case for `cat` with negative indexing
* Clean up `safeIndex` helper function
* Allow indexing error message to include dimension
* Reuse `safeIndex` helper function for `normalize_`
* Optimize `cat` indexing
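The `cat` commits above (dim > 0 support, negative indexing, bounds checking via a `safeIndex` helper) can be illustrated with a standalone sketch over plain nested arrays. This is illustrative only, not the library's actual `Tensor`-based implementation, which operates on a flat data buffer:

```javascript
// Hypothetical helper: map negative indices to positive and bounds-check.
function safeIndex(index, size, dimension = null) {
  if (index < -size || index >= size) {
    throw new Error(
      `IndexError: index ${index} is out of bounds for dimension` +
      `${dimension === null ? '' : ' ' + dimension} with size ${size}`
    );
  }
  return index < 0 ? index + size : index;
}

// Concatenate nested arrays along `dim`, supporting negative dims.
function cat(tensors, dim = 0) {
  const rank = a => Array.isArray(a) ? 1 + rank(a[0]) : 0;
  dim = safeIndex(dim, rank(tensors[0]), 0);
  if (dim === 0) return [].concat(...tensors);
  // Recurse: concatenate matching sub-arrays along dim - 1.
  return tensors[0].map((_, i) => cat(tensors.map(t => t[i]), dim - 1));
}
```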
* Implement `stack` tensor operation
+ add unit tests
* Add TODOs
* Implement `mean` tensor operation
* Implement `std_mean` tensor ops
* Fix order of `std_mean` returns
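The two `std_mean` commits can be sketched over a flat array; like PyTorch's `torch.std_mean`, the fixed return order is std first, then mean. An illustrative sketch, not the library's implementation:

```javascript
// Return [std, mean], matching PyTorch's torch.std_mean ordering.
function std_mean(data, correction = 1) {
  const mean = data.reduce((a, b) => a + b, 0) / data.length;
  // Bessel's correction (correction = 1) by default.
  const variance = data.reduce((a, b) => a + (b - mean) ** 2, 0) /
    (data.length - correction);
  return [Math.sqrt(variance), mean];
}
```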
* Implement median filter
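A 1-D median filter (used to smooth attention weights before timestamp extraction) can be sketched as follows; the edge handling here (window clamped at the boundaries) is an assumption, as implementations vary:

```javascript
// Sketch of a 1-D median filter; the window is clamped at array edges.
function medianFilter(data, windowSize) {
  if (windowSize % 2 === 0 || windowSize <= 0) {
    throw new Error('Window size must be a positive odd number');
  }
  const half = Math.floor(windowSize / 2);
  return data.map((_, i) => {
    const lo = Math.max(0, i - half);
    const hi = Math.min(data.length, i + half + 1);
    const window = data.slice(lo, hi).sort((a, b) => a - b);
    return window[Math.floor(window.length / 2)];
  });
}
```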
* Implement dynamic time warping
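Dynamic time warping is what aligns token positions with audio frames for word-level timestamps. A minimal monotonic-alignment DTW over a cost matrix might look like this (illustrative sketch, not the library's code):

```javascript
// DTW over an N x M cost matrix: returns the minimal-cost monotonic
// alignment path from (0, 0) to (N-1, M-1) as [i, j] pairs.
function dynamicTimeWarping(cost) {
  const N = cost.length, M = cost[0].length;
  // D[i][j] = minimal accumulated cost to reach cell (i-1, j-1).
  const D = Array.from({ length: N + 1 }, () => new Array(M + 1).fill(Infinity));
  D[0][0] = 0;
  for (let i = 1; i <= N; ++i) {
    for (let j = 1; j <= M; ++j) {
      D[i][j] = cost[i - 1][j - 1] +
        Math.min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1]);
    }
  }
  // Backtrack from the bottom-right corner to recover the path.
  let i = N, j = M;
  const path = [];
  while (i > 0 && j > 0) {
    path.push([i - 1, j - 1]);
    const best = Math.min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1]);
    if (best === D[i - 1][j - 1]) { --i; --j; }
    else if (best === D[i - 1][j]) { --i; }
    else { --j; }
  }
  return path.reverse();
}
```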
* Implement `neg` tensor op
* Throw error if audio sent to processor is not a `Float32Array`
* Add `round` helper function
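A round-to-N-decimals helper of the kind referenced above (e.g. for rounding timestamps to 2 decimals) can be sketched as:

```javascript
// Round `num` to `decimals` decimal places.
function round(num, decimals) {
  const pow = 10 ** decimals;
  return Math.round(num * pow) / pow;
}
```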
* [WIP] Implement basic version of word-level-timestamps
Known issues:
- timestamps incorrect for index > 0
- punctuation not the same as in the Python version
* Fix typo
* Fix timestamps
* Round to 2 decimals
* Fix punctuation
* Fix typing
* Remove debug statements
* Cleanup code
* Cleanup
* Remove debug statements
* Update JSDoc for extract token timestamps function
* Add return type for `std_mean` tensor function
* Improve typing of private whisper tokenizer functions
* Indicate method is private
* Allow whisper feature extractor to be called with Float64Array input
* Fix typo
* Throw error if `cross_attentions` are not present in model output when extracting token timestamps
* Throw error during generate function
* Allow whisper models to be exported with `output_attentions=True`
* Add alignment heads to generation config
* Remove print statement
* Update versions
* Override protobufjs version
* Update package-lock.json
* Require onnx==1.13.1 for conversion
Will update once onnxruntime-web supports ONNX IR version 9
* Add unit test for word-level timestamps
* Extract add attentions function out of `generate`
* Fix `findLongestCommonSequence` return types
* Downgrade back to onnxruntime 1.14.0
1.15.1 is a little too unstable right now.
* Cleanup
- use `.map`
- rename variables
* Update comments
* Add examples for how to transcribe w/ word-level timestamps
* Add example for transcribing/translating audio longer than 30 seconds
* Make example more compact
* Add example code for running text-generation models
* Fix non-greedy sampling functions
* Update samplers
* Remove duplicate requirement
`onnxruntime` is specified in `optimum[onnxruntime]`
* Align `generate` function output with python library
Include starting tokens in output
* [docs] Add example text-generation code
* Update demo site text streaming for causal language models
* Override default code highlighting for operators
* Fix order of link
* Link to the conversion Space for maximum simplicity
* Add some types to script (very optional)
* Fix typo
* No need for trailing slash here
* Node is also a valid option
* Document how to find a compatible checkpoint on the hub
* Update README
* Fix typing
* Update docs index
---------
Co-authored-by: Julien Chaumond <julien@huggingface.co>
* Recursively replace tensors with custom class
* Add mobile vit models
* Add example code for `ImageClassificationPipeline`
* Fix example urls
* Add MobileViT models and processors
* Update optimum requirement in conversion script
Previous name is deprecated
* Update supported models
* Update supported_models.py
* Update supported_models.py
* Update tokenizer test generator script
* Add special test case for falcon tokenizers
* Update tokenizer test script
* Add support for `FalconTokenizer`
* Update `BertPreTokenizer` call parameter types
* Add `GPTNeoXTokenizer` tokenizer (mpt)
* Use transformers from source when testing
* Reuse `prepare_model_inputs` function type
Better than using `@see {@link ... }` since it works with IntelliSense.
* Allow user to set `per_channel` and `reduce_range` quantization parameters (#156)
Also save quantization options
* Get operators of graph and subgraphs
* Only run encoder with required inputs
* Add basic whisper unit tests
* Add newline after heading for docs
* Add unit test for transcribing english with timestamps
* Add multilingual test case
* Update typo in node tutorial
* Create node audio processing tutorial
* Point to tutorial in `read_audio` function
* Rename `.md` to `.mdx`
* Add node audio processing tutorial to table of contents
* Add link to model in tutorial
* Update error message grammar
* Override `LOAD_FUNCTION` for decoder-only models
* Use object destructuring in `_call` functions
* Allow decoder-only models to be called
* Fix detection of default call function
* Update default `_call` JSDoc
* Mark helper functions as private
* Remove outdated comments
* Fix JSDoc
* Rename functions
* Specify model types
Reduces major code duplication
* Improve model output classes
* Remove `encoder_input_name` from seq2seq forward method
* Extract `validateInputs` helper function from `sessionRun`
* Move `compare` helper function to separate utility file
* Default `model_type` to null
* Reduce duplication when loading models using `.from_pretrained`
* Add unit tests for loading models using `.from_pretrained()`
* Compute attention mask for decoder if not given
* Improve decoder attention computation
* Implement `flatten` and `view` tensor ops
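The semantics of the new `flatten` and `view` ops can be illustrated with plain nested arrays (the library itself operates on a flat data buffer plus a dims array; this sketch is illustrative only):

```javascript
// Recursively flatten nested arrays to a single flat array.
function flatten(arr) {
  return Array.isArray(arr) ? arr.flatMap(flatten) : [arr];
}

// Reshape nested arrays to the given dims; one dim may be -1 (inferred).
function view(arr, ...dims) {
  const flat = flatten(arr);
  const inferred = dims.indexOf(-1);
  if (inferred !== -1) {
    const known = dims.filter(d => d !== -1).reduce((a, b) => a * b, 1);
    dims[inferred] = flat.length / known;
  }
  if (dims.reduce((a, b) => a * b, 1) !== flat.length) {
    throw new Error('Cannot view: element count mismatch');
  }
  // Rebuild nested arrays depth-first from the flat buffer.
  let offset = 0;
  const build = ds => ds.length === 0
    ? flat[offset++]
    : Array.from({ length: ds[0] }, () => build(ds.slice(1)));
  return build(dims);
}
```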
* Add documentation for new tensor ops
* Fix `flatten` input types
* Align `.generate()` return type with python library
* Add multilingual transcription + translation for whisper models (#87, #95)
* Include `return_timestamps` in calculation of `forced_decoder_ids`
* Only return non-null `forced_decoder_ids`
* Allow user to specify task in any case
* Only set `forced_decoder_ids` when non-empty
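The `forced_decoder_ids` logic above (language/task in any case, timestamps included in the calculation, only non-empty results returned) might be assembled roughly as follows. The function name and the token-id map are placeholders, not the library's real API or vocabulary ids:

```javascript
// Hypothetical sketch: build forced_decoder_ids as [position, tokenId] pairs.
// `tokenIds` is a placeholder lookup table, not real Whisper vocabulary ids.
function getDecoderPrompt({ language = null, task = null, return_timestamps = false }, tokenIds) {
  const forced = [];
  let rank = 1; // Position 0 is the start-of-transcript token.
  // Accept language/task in any case by lowercasing before lookup.
  if (language) forced.push([rank++, tokenIds.language[language.toLowerCase()]]);
  if (task) forced.push([rank++, tokenIds.task[task.toLowerCase()]]);
  // return_timestamps participates in the calculation: suppress the
  // no-timestamps token only when timestamps are requested.
  if (!return_timestamps) forced.push([rank++, tokenIds.no_timestamps]);
  // Only return the list when non-empty.
  return forced.length > 0 ? forced : null;
}
```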
* Implement `SuppressTokensAtBeginLogitsProcessor`