Commit Graph

8054 Commits

Author SHA1 Message Date
Sylvain Gugger cd66539662
Don't modify labels inplace in `LabelSmoother` (#13464) 2021-09-08 07:45:36 -04:00
Suraj Patil c164c651dc
[CLIP] fix logit_scale init (#13436)
* fix logit_scale init

* add logit_scale_init_value as config param
2021-09-08 14:21:13 +05:30
Kevin Canwen Xu f667d5b260
Deprecate Mirror for Downloading (#13470)
* Deprecated Mirror

* revert

* revert

* revert

* fix
2021-09-08 16:09:44 +08:00
Suraj Patil f5d3bb1dd2
fix CLIP conversion script (#13474) 2021-09-08 12:57:18 +05:30
shabie 4be082ce39
[docs] update dead quickstart link on reusing past for GPT2 (#13455)
* [docs] update dead quickstart link on reusing past for GPT2

The dead link has been replaced by two links to the forward and call methods of the GPT2 class for torch and tensorflow respectively.

* [docs] fix formatting for gpt2 page update
2021-09-07 16:57:58 -04:00
Anton Lozhkov 2146833767
Add unit_divisor to downloads (#13468) 2021-09-07 13:47:52 -07:00
guillaume-be 63b90a51aa
Optimized bad word ids (#13433)
* Optimized bad word ids generation

* Fixed optimized bad token ids

* Updated style
2021-09-07 16:51:04 +02:00
Nicolas Patry 5c7789d416
Fixing by correctly raising UnicodeDecodeError. (#13449) 2021-09-07 16:45:45 +02:00
Nathan Raw 79815090ea
Fix img classification tests (#13456)
*  Update image-classification example's tests

* 🔥 remove cats_and_dogs test samples

* 💄 fix flake8
2021-09-07 05:58:45 -04:00
Anurag Kumar 92d4ef9ab0
Update setup.py (#13421) 2021-09-06 17:32:24 -04:00
Shiv Dhar 75858ca156
Update version of `packaging` package (#13454) 2021-09-06 17:19:02 -04:00
Anton Lozhkov f8363e49f9
Install libsndfile (#13403) 2021-09-06 17:12:43 -04:00
NielsRogge 5642a555ae
Add TAPAS MLM-only models (#13408)
* Add conversion of TapasForMaskedLM

* Add copied from statements
2021-09-06 19:19:30 +02:00
Suraj Patil 2dd975b235
skip image classification test (#13451) 2021-09-06 21:46:25 +05:30
Nils Reimers c8be8a9adb
Update model configs - Allow setters for common properties (#13026)
* refactor GPT Config to allow dyn. properties

* make attribute_map a class attribute

* remove old code

* update unit test to test config: Add test for common properties setter

* update unit test to test config: Add test for common properties passed as parameters to __init__

* update to black code format

* Allow that setters are not defined for certain config classes

* update config classes to implement attribute_map

* bugfix lxmert config - id2labels was not defined when num_labels was set

* update broken configs - add attribute_maps

* update bart config

* update black codestyle

* update documentation on common config attributes

* update GPTJ config to new attribute map

* update docs on common attributes

* gptj config: add max_position_embeddings

* gptj config: format with black

* update speech to text 2 config

* format doc file to max_len 119

* update config template
2021-09-06 16:30:13 +02:00
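
A minimal sketch of the idea behind the `attribute_map` bullets in #13026 above, assuming a class-level dictionary that redirects common property names to model-specific ones on both read and write. The class names `SketchConfig` and `SketchGPTConfig` are hypothetical, and the mapping shown is only an illustration, not the library's actual code.

```python
class SketchConfig:
    # Class attribute: maps a common name to the model-specific attribute name.
    attribute_map = {}

    def __setattr__(self, key, value):
        # Redirect writes to a common name onto the model-specific attribute.
        key = type(self).attribute_map.get(key, key)
        super().__setattr__(key, value)

    def __getattribute__(self, key):
        # Redirect reads of a common name onto the model-specific attribute.
        key = type(self).attribute_map.get(key, key)
        return super().__getattribute__(key)


class SketchGPTConfig(SketchConfig):
    # Hypothetical mapping, for illustration only.
    attribute_map = {"hidden_size": "n_embd", "num_hidden_layers": "n_layer"}

    def __init__(self, n_embd=768, n_layer=12, **kwargs):
        self.n_embd = n_embd
        self.n_layer = n_layer
        # Common names passed to __init__ go through the same setter.
        for name, value in kwargs.items():
            setattr(self, name, value)


config = SketchGPTConfig(hidden_size=1024)
config.num_hidden_layers = 24
print(config.n_embd, config.n_layer)
```

Keeping the map on the class rather than on each instance means the redirect is available before `__init__` has set anything, which is what allows common names to be passed as constructor arguments in this sketch.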
Nicolas Patry cf4eb8b3f9
Adding a test for multibytes unicode. (#13447)
* Adding a test for multibytes unicode.

* Adding some accents.

* Making sure decoding works.

* Make tests passing by being cheesy.
2021-09-06 16:11:23 +02:00
Patrick von Platen 607611f240
up (#13448) 2021-09-06 16:09:24 +02:00
Suraj Patil 6b29bff852
add torchvision in example test requirements (#13438) 2021-09-06 15:17:54 +02:00
Anton Lozhkov 26700a9516
Fix scheduled tests for `SpeechEncoderDecoderModel` (#13422)
* Add inputs to pretrained tests

* Make style
2021-09-06 14:55:13 +02:00
Yih-Dar 73ad258806
Fix tests without any real effect (#13406)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2021-09-06 14:51:45 +02:00
Nathan Raw 76c4d8bf26
Add PyTorch image classification example (#13134)
*  add pytorch image classification example

* 🔥 remove utils.py

* 💄 fix flake8 style issues

* 🔥 remove unnecessary line

*  limit dataset sizes

* 📌 update reqs

* 🎨 restructure - use datasets lib

* 🎨 import transforms directly

* 📝 add comments

* 💄 style

* 🔥 remove flag

* 📌 update requirement warning

* 📝 add vision README.md

* 📝 update README.md

* 📝 update README.md

* 🎨 add image-classification tag to model card

* 🚚 rename vision ➡️ image-classification

* 📝 update image-classification README.md
2021-09-02 13:29:42 -06:00
Patrick von Platen 9bd5d97cdd
up (#13396) 2021-09-02 18:47:09 +02:00
Patrick von Platen efa4f5f0ea
fix (#13395) 2021-09-02 18:11:26 +02:00
Aman Madaan 596bb85f2f
[docs] Update perplexity.rst to use negative log likelihood (#13386)
* [docs] Update perplexity.rst to use negative log likelihood

Model `forward` returns the negative log likelihood. The document correctly defines and calculates perplexity, but the description and variable names are inconsistent, which might cause confusion.

* [docs] restyle perplexity.rst
2021-09-02 07:49:12 -04:00
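
As a hedged illustration of the relationship the commit message for #13386 describes: when per-token negative log likelihoods are available, perplexity is the exponential of their mean. The function name `perplexity` and the toy inputs are assumptions for illustration and are not taken from perplexity.rst.

```python
import math


def perplexity(token_nlls):
    """Perplexity from per-token negative log likelihoods (natural log)."""
    if not token_nlls:
        raise ValueError("need at least one token")
    mean_nll = sum(token_nlls) / len(token_nlls)
    return math.exp(mean_nll)


# Toy case: a model that assigns probability 0.25 to each of four tokens.
nlls = [-math.log(0.25)] * 4
print(perplexity(nlls))
```

In the toy case the result is close to 4, matching the intuition that the model is as uncertain as a uniform choice among four options.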
Apoorv Garg b91e65afe0
Correct order of overflowing_tokens for slow tokenizer (#13179)
* correct order of overflowing_tokens for slow tokenizer (issue fix #13148)

* python 3.9 requires sentencepiece version 0.1.94 or above

* slicing of ids fixed in truncated_sequence()

* Update setup.py

* Correct order of overflowing tokens for pair of sentences

* code reformatted

* Update tokenization_utils_base.py

* reformatting file

* test to check single_input added

* missing function restored

* test to check pair_input overflowing tokens order

* test to check pair_input overflowing tokens order

* test to check pair_input overflowing tokens order

* added an error message for pair of seq and longest_first strategy

* test for pair_input modified

* variable name corrected

* fixed a typo in error message

* requested changes implemented

* required test added

* Corrected the message to match test message

* added error message for Luke Tokenizer

* lost test recovered

* docstring for truncate_sequences and prepare_for_model updated

* docstring for luke tokenizer updated

* updated ENCODE_PLUS_ADDITIONAL_KWARGS_DOCSTRING

* aligned text and fixed punctuation

* improved style and quality of code

* fixed error_msg in truncate_sequences

* replaced encode_plus method with regular call method

* clean up

* rephrased the docstring
2021-09-02 05:58:23 -04:00
Nicolas Patry c9184a2e03
Enabling automatic loading of tokenizer with `pipeline` for `audio-classification`. (#13376)
2021-09-02 05:37:42 -04:00
Suraj Patil e92140c567
fix example (#13387) 2021-09-02 11:32:18 +02:00
NielsRogge 4114c9a75b
Add tokenizer docs (#13373) 2021-09-02 09:46:05 +02:00
Sachin Abeywardana 872e6be03d
Update clip loss calculation (#13217)
* Update clip loss calculation

Hello, I'm the author of the blog you took the snippet from. I think this way of calculating is possibly slightly more accurate for calculation.

* Apply suggestions from code review

Co-authored-by: Suraj Patil <surajp815@gmail.com>
2021-09-02 12:15:56 +05:30
Eduardo Gonzalez Ponferrada 0a22335e66
[Flax/run_hybrid_clip] Fix duplicating images when captions_per_image exceeds the number of captions, enable truncation 2021-09-02 11:19:49 +05:30
Sylvain Gugger c1c2d68d37
Fix name and get_class method in AutoFeatureExtractor (#13385) 2021-09-01 20:54:49 -04:00
Patrick von Platen a105c9b776
fix (#13383) 2021-09-01 23:12:01 +02:00
Patrick von Platen 4475f1dc2a
[Flax] Fix BigBird (#13380)
* finish

* finish
2021-09-01 18:33:54 +02:00
Lysandre Debut ecd5397106
Fix RemBERT (#13375) 2021-09-01 11:11:32 -04:00
Lysandre Debut 33b7c9a8aa
Add missing feature extractors (#13374) 2021-09-01 11:10:49 -04:00
Anton Lozhkov 2406892a2e
Add `Hubert` to the `AutoFeatureExtractor` (#13366)
* Add Hubert to the auto feature extractor

* Fix import structure
2021-09-01 18:09:02 +03:00
Sylvain Gugger 6b3532643f
Properly register missing submodules in main init (#13372) 2021-09-01 10:57:43 -04:00
NielsRogge 4b7988eb49
Fix assertion (#13369) 2021-09-01 16:42:59 +02:00
SaulLu c4d78f01de
Fix tokenizer saving during training with `Trainer` (#12806)
* add test in trainer and test tokenizer saving with trainer

* quality

* reverse trainer changes

* replace test in test_trainer by a test for all the tokenizers

* format

* add can_save_slow_tokenizer attribute to all tokenizers

* fix Herbert

* format

* Change comment in error

* add comments and a new assert

* Update src/transformers/models/albert/tokenization_albert_fast.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* change ValueError barthez

* change ValueError BigBird

* change ValueError Camembert

* change ValueError Mbart50

* change ValueError Pegasus

* change ValueError ReFormer

* change ValueError T5

* change ValueError RoBERTa

* XLNET fast

* Update tests/test_tokenization_common.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* change `assert` into `self.assertIn`

* format

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-09-01 16:32:56 +02:00
Sylvain Gugger c1b20e42f5 Redeploy stable documentation 2021-09-01 09:21:50 -04:00
Li-Huai (Allan) Lin 85cb447766 Revert "Correct wrong function signatures on the docs website (#13198)"
This reverts commit ffecfea949.
2021-09-01 09:17:08 -04:00
NielsRogge 4766e009b0
Improve T5 docs (#13240)
* Remove disclaimer

* First draft

* Fix rebase

* Improve docs some more

* Add inference section

* Improve example scripts section

* Improve code examples of modeling files

* Add docs regarding task prefix

* Address @craffel's comments

* Apply suggestions from @patrickvonplaten's review

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Add suggestions from code review

* Apply @sgugger's suggestions

* Fix Flax code examples

* Fix index.rst

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2021-09-01 15:05:40 +02:00
donggyukimc ba1b3db709
fix wrong 'cls' masking for bigbird qa model output (#13143) 2021-09-01 14:03:16 +02:00
Sylvain Gugger 7a26307e31
Fixes for the documentation (#13361) 2021-09-01 07:54:28 -04:00
Patrick von Platen 0b8c84e110
Add SpeechEncoderDecoder & Speech2Text2 (#13186)
* fix_torch_device_generate_test

* remove @

* up

* correct some bugs

* correct model

* finish speech2text extension

* up

* up

* up

* up

* Update utils/custom_init_isort.py

* up

* up

* update with tokenizer

* correct old tok

* correct old tok

* fix bug

* up

* up

* add more tests

* up

* fix docs

* up

* fix some more tests

* add better config

* correct some more things

* fix tests

* improve docs

* Apply suggestions from code review

* Apply suggestions from code review

* final fixes

* finalize

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* apply suggestions Lysandre and Sylvain

* apply nicos suggestions

* upload everything

* finish

Co-authored-by: Patrick von Platen <patrick@huggingface.co>
Co-authored-by: your_github_username <your_github_email>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-09-01 13:33:31 +02:00
Lysandre Debut 9396b40433
Fix GPT-J _CHECKPOINT_FOR_DOC typo (#13368) 2021-09-01 06:57:43 -04:00
Hamid Shojanazeri 53ee995ac9
Fix for the issue of device-id getting hardcoded for token_type_ids during Tracing for ConvBert (#12287)
* added token_type_ids buffer to fix the issue #5664

* Handling the case that position_id buffer is not registered

* added token_type_ids buffer to fix the issue #5664

* modified to support device conversion when the model is traced
2021-09-01 04:47:58 -04:00
Hamid Shojanazeri 5adf5cab2f
Fix for the issue of device-id getting hardcoded for position-ids during Tracing for Distillbert (#12290)
* registered buffer for position-ids to address issues similar to issue#5664

* added comment

* added the flag to prevent from adding the buffer into the state_dict
2021-09-01 04:47:25 -04:00
Hamid Shojanazeri 5d1a3d135c
Fix for the issue of device-id getting hardcoded for position-ids during Tracing for Flaubert (#12292)
* adding position_ids buffer to fix the issue similar to #5664

* adding position-id buffer to address similar issues to #5664
2021-09-01 04:46:58 -04:00
Lysandre Debut 58e999b7e6
Torchscript test for Flaubert (#13353)
* Torchscript test for Flaubert

* Update tests/test_modeling_flaubert.py

* Update tests/test_modeling_flaubert.py
2021-09-01 04:44:31 -04:00