transformers

Author	SHA1	Message	Date
SaulLu	1025a9b742	add a warning in `SpmConverter` for sentencepiece's model using the byte fallback feature (#16629) * update proto sentencepiece model * Revert "update proto sentencepiece model" This reverts commit `b07f671747`. * add check * add test * Revert "Revert "update proto sentencepiece model"" This reverts commit `46108257b8`. * test for log level * test for log level 2 * warning at the warning level * clean * format * add explanation in docstring	2022-04-11 11:06:10 +02:00
Sylvain Gugger	81156d20cd	Add model like (#14992) * Add new model like command * Bad doc-styler * black and doc-styler, stop fighting! * black and doc-styler, stop fighting! * At last * Clean up * Typo * Bad doc-styler * Bad doc-styler * All good maybe? * Use constants * Add doc and type hints * More cleaning * Add doc * Fix Copied from * Doc template * Use typing.Pattern instead * Framework-specific files * Fixes * Select frameworks clean model init * Deal with frameworks in main init * fixes * Last fix * Prompt user for info * Delete exemple config * Last fixes * Add test config * Fix bug with model_type included in each other * Fixes * More fixes * More fixes * Adapt config * Remove print statements * Will fix tokenization later, leave it broken for now * Add test * Quality * Try this way * Debug * Maybe by setting the path? * Let's try another way * It should go better when actually passing the arg... * Remove debug statements and style * Fix config * Add tests * Test require the three backends * intermediate commit * Revamp pattern replacements and start work on feature extractors * Adapt model info * Finalize code for processors * Fix in main init additions * Finish questionnaire for processing classes * Fix file name * Fix for real * Fix patterns * Style * Remove needless warnings * Copied from should work now. * Include Copied form in blocks * Add test * More fixes and tests * Apply suggestions from code review Co-authored-by: Lysandre Debut <lysandre@huggingface.co> * Address review comment Co-authored-by: Lysandre Debut <lysandre@huggingface.co>	2022-01-24 15:25:10 -05:00
Ryokan RI	824fd44fc3	Feature/fix slow test in mluke (#14749) * make MLukeTokenizerTest fast * make LukeTokenizerTest fast * add entry to _toctree.yaml	2021-12-22 06:35:59 -05:00
Patrick von Platen	ee4fa2e465	[AutoProcessor] Add Wav2Vec2WithLM & small fix (#14675) * [AutoProcessor] Add Wav2Vec2WithLM & small fix * revert line removal * Update src/transformers/__init__.py * add test * up * up * small fix	2021-12-08 15:51:28 +01:00
Sylvain Gugger	204d251310	Auto processor (#14465) * Add AutoProcessor class * Init and tests * Add doc * Fix init * Update src/transformers/models/auto/processing_auto.py Co-authored-by: Lysandre Debut <lysandre@huggingface.co> * Reverts to tokenizer or feature extractor when available * Adapt test Co-authored-by: Lysandre Debut <lysandre@huggingface.co>	2021-11-22 12:17:38 -05:00
Patrick von Platen	9f3aa46f45	Add Unispeech & Unispeech-SAT (#13963) * unispeech * add copy from * remove hubert copy from * finish for today * add unispeech-sat * adapt more * up * up * up * up * add modeling * add tests * up * up * finish * up * Apply suggestions from code review * up * up * Apply suggestions from code review Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * up * up Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>	2021-10-26 18:59:58 +02:00
Patrick von Platen	8e908c8c74	[AutoTokenizer] Allow creation of tokenizers by tokenizer type (#13668) * up * up	2021-09-22 00:29:38 +02:00
Nathan Raw	79815090ea	Fix img classification tests (#13456) * ✅ Update image-classification example's tests * 🔥 remove cats_and_dogs test samples * 💄 fix flake8	2021-09-07 05:58:45 -04:00
Nathan Raw	76c4d8bf26	✨ Add PyTorch image classification example (#13134) * ✨ add pytorch image classification example * 🔥 remove utils.py * 💄 fix flake8 style issues * 🔥 remove unnecessary line * ✨ limit dataset sizes * 📌 update reqs * 🎨 restructure - use datasets lib * 🎨 import transforms directly * 📝 add comments * 💄 style * 🔥 remove flag * 📌 update requirement warning * 📝 add vision README.md * 📝 update README.md * 📝 update README.md * 🎨 add image-classification tag to model card * 🚚 rename vision ➡️ image-classification * 📝 update image-classification README.md	2021-09-02 13:29:42 -06:00
NielsRogge	d3eacbb829	Add DETR (#11653) * Squash all commits of modeling_detr_v7 branch into one * Improve docs * Fix tests * Style * Improve docs some more and fix most tests * Fix slow tests of ViT, DeiT and DETR * Improve replacement of batch norm * Restructure timm backbone forward * Make DetrForSegmentation support any timm backbone * Fix name of output * Address most comments by @LysandreJik * Give better names for variables * Conditional imports + timm in setup.py * Address additional comments by @sgugger * Make style, add require_timm and require_vision to testsé * Remove train_backbone attribute of DetrConfig, add methods to freeze/unfreeze backbone * Add png files to fixtures * Fix type hint * Add timm to workflows * Add `BatchNorm2d` to the weight initialization * Fix retain_grad test * Replace model checkpoints by Facebook namespace * Fix name of checkpoint in test * Add user-friendly message when scipy is not available * Address most comments by @patrickvonplaten * Remove return_intermediate_layers attribute of DetrConfig and simplify Joiner * Better initialization * Scipy is necessary to get sklearn metrics * Rename TimmBackbone to DetrTimmConvEncoder and rename DetrJoiner to DetrConvModel * Make style * Improve docs and add 2 community notebooks Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>	2021-06-09 11:51:13 -04:00
NielsRogge	fa84540e98	Vit deit fixes (#11309) * Improve docs of DeiT and ViT, add community notebook * Add gitignore for test_samples * Add notebook with Trainer Co-authored-by: Lysandre Debut <lysandre@huggingface.co>	2021-05-12 11:46:02 -04:00
Lysandre Debut	39084ca663	Add the ImageClassificationPipeline (#11598) * Add the ImageClassificationPipeline * Code review Co-authored-by: patrickvonplaten <patrick.v.platen@gmail.com> * Have `load_image` at the module level Co-authored-by: patrickvonplaten <patrick.v.platen@gmail.com>	2021-05-07 08:08:40 -04:00
Sylvain Gugger	dabeb15292	Examples reorg (#11350) * Base move * Examples reorganization * Update references * Put back test data * Move conftest * More fixes * Move test data to test fixtures * Update path * Apply suggestions from code review Co-authored-by: Lysandre Debut <lysandre@huggingface.co> * Address review comments and clean Co-authored-by: Lysandre Debut <lysandre@huggingface.co>	2021-04-21 11:11:20 -04:00
Sylvain Gugger	403d530eec	Auto feature extractor (#11097) * AutoFeatureExtractor * Init and first tests * Tests * Damn you gitignore * Quality * Defensive test for when not all backends are here * Use pattern for Speech2Text models	2021-04-06 19:20:08 -04:00
Nicolas Patry	c9837a0d27	Conversion from slow to fast for BPE spm vocabs contained an error. (#10120) * Conversion from slow to fast for BPE spm vocabs contained an error. - There is only 1 test currently (tokenizers + slow) that used the modified path and it's reformer, which does not contain any ids modification so the bug was silent for now. - The real issue is that vocab variable was overloaded by SentencePieceExtractor, leading to Slow specific vocab oddities to be completely ignored - The bug was reported here https://github.com/huggingface/transformers/issues/9518 - Ran the complete tokenization test suite with slow without error (`RUN_SLOW=1 pytest -sv tests/test_tokenization_`) Remove rebase error. * Adding the fixture.	2021-02-13 08:24:53 -05:00
Sylvain Gugger	e4c06ed664	New run_seq2seq script (#9605) * New run_seq2seq script * Add tests * Mark as slow * Update examples/seq2seq/run_seq2seq.py Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Update src/transformers/data/data_collator.py Co-authored-by: Suraj Patil <surajp815@gmail.com> * Update src/transformers/data/data_collator.py Co-authored-by: Suraj Patil <surajp815@gmail.com> * Address review comments Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> Co-authored-by: Suraj Patil <surajp815@gmail.com>	2021-01-19 15:22:17 -05:00
Sylvain Gugger	9a25c5bd3a	Add new run_swag example (#9175) * Add new run_swag example * Add check * Add sample * Apply suggestions from code review Co-authored-by: Lysandre Debut <lysandre@huggingface.co> * Very important change to make Lysandre happy Co-authored-by: Lysandre Debut <lysandre@huggingface.co>	2020-12-18 14:19:24 -05:00
Sylvain Gugger	447808c85f	New squad example (#8992) * Add new SQUAD example * Same with a task-specific Trainer * Address review comment. * Small fixes * Initial work for XLNet * Apply suggestions from code review Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Final clean up and working XLNet script * Test and debug * Final working version * Add new SQUAD example * Same with a task-specific Trainer * Address review comment. * Small fixes * Initial work for XLNet * Apply suggestions from code review Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Final clean up and working XLNet script * Test and debug * Final working version * Add tick * Update README * Address review comments Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>	2020-12-08 14:39:29 -05:00
Sylvain Gugger	908a28894c	Add new token classification example (#8340) * Add new token classification example * Remove txt file * Add test * With actual testing done * Less warmup is better * Update examples/token-classification/run_ner_new.py Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com> * Address review comments * Fix test * Make Lysandre happy * Last touches and rename * Rename in tests * Address review comments * More run_ner -> run_ner_old Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>	2020-11-09 11:39:55 -05:00
Sylvain Gugger	2e5052d4f1	New run glue script (#7917) * Start simplification * More progress * Finished script * Address comments and update tests instructions * Wrong test * Accept files as inputs and fix test * Update src/transformers/trainer_utils.py Co-authored-by: Julien Chaumond <chaumond@gmail.com> * Fix labels and add combined score * Add special labels * Update TPU command * Revert to old label strategy * Use model labels * Fix for STT-B * Styling * Apply suggestions from code review Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com> * Code styling * Fix review comments Co-authored-by: Julien Chaumond <chaumond@gmail.com> Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>	2020-10-22 11:42:22 -04:00
Thomas Wolf	ba8c4d0ac0	[Dependencies|tokenizers] Make both SentencePiece and Tokenizers optional dependencies (#7659) * splitting fast and slow tokenizers [WIP] * [WIP] splitting sentencepiece and tokenizers dependencies * update dummy objects * add name_or_path to models and tokenizers * prefix added to file names * prefix * styling + quality * spliting all the tokenizer files - sorting sentencepiece based ones * update tokenizer version up to 0.9.0 * remove hard dependency on sentencepiece 🎉 * and removed hard dependency on tokenizers 🎉 * update conversion script * update missing models * fixing tests * move test_tokenization_fast to main tokenization tests - fix bugs * bump up tokenizers * fix bert_generation * update ad fix several tokenizers * keep sentencepiece in deps for now * fix funnel and deberta tests * fix fsmt * fix marian tests * fix layoutlm * fix squeezebert and gpt2 * fix T5 tokenization * fix xlnet tests * style * fix mbart * bump up tokenizers to 0.9.2 * fix model tests * fix tf models * fix seq2seq examples * fix tests without sentencepiece * fix slow => fast conversion without sentencepiece * update auto and bert generation tests * fix mbart tests * fix auto and common test without tokenizers * fix tests without tokenizers * clean up tests lighten up when tokenizers + sentencepiece are both off * style quality and tests fixing * add sentencepiece to doc/examples reqs * leave sentencepiece on for now * style quality split hebert and fix pegasus * WIP Herbert fast * add sample_text_no_unicode and fix hebert tokenization * skip FSMT example test for now * fix style * fix fsmt in example tests * update following Lysandre and Sylvain's comments * Update src/transformers/testing_utils.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/testing_utils.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/tokenization_utils_base.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/tokenization_utils_base.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>	2020-10-18 20:51:24 +02:00
Stas Bekman	b0f05e0c4c	[pegasus] Faster tokenizer tests (#7672)	2020-10-09 11:10:32 -04:00
Yu Liu	762cba3bda	Albert pretrain datasets/ datacollator (#6168) * add dataset for albert pretrain * datacollator for albert pretrain * naming, comprehension, file reading change * data cleaning is no needed after this modification * delete prints * fix a bug * file structure change * add tests for albert datacollator * remove random seed * add back len and get item function * sample file for testing and test code added * format change for black * more format change * Style * var assignment issue resolve * add back wrongly deleted DataCollatorWithPadding in init file * Style Co-authored-by: Lysandre Debut <lysandre@huggingface.co> Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>	2020-09-10 07:56:29 -04:00
Julien Chaumond	0ae96ff8a7	BIG Reorganize examples (#4213) * Created using Colaboratory * [examples] reorganize files * remove run_tpu_glue.py as superseded by TPU support in Trainer * Bugfix: int, not tuple * move files around	2020-05-07 13:48:44 -04:00
Julien Chaumond	4d1c98c012	AutoConfig + other Auto classes honor model_type	2020-01-11 02:46:17 +00:00
alberduris	81d6841b4b	GPU text generation: mMoved the encoded_prompt to correct device	2020-01-06 15:11:12 +01:00
alberduris	dd4df80f0b	Moved the encoded_prompts to correct device	2020-01-06 15:11:12 +01:00
Aymeric Augustin	067395d5c5	Move tests outside of library.	2019-12-22 13:47:17 +01:00

28 Commits