Commit Graph

11235 Commits

Author SHA1 Message Date
Joao Gante a0f8674303
Generate: TF contrastive search with XLA support (#20050)
* Add contrastive search
2022-11-07 10:54:29 +00:00
Christopher Akiki 504db92e7d
Update hub.py (#20075) 2022-11-04 22:25:02 +01:00
Christopher Akiki 4b86e44693
Update modeling_tf_utils.py (#20076) 2022-11-04 22:24:37 +01:00
amyeroberts d68c46026b
Update defaults and logic to match old FE (#20065)
* Update defaults and logic to match old FE

* Use docker run rest values
2022-11-04 19:14:56 +00:00
Yih-Dar c06d555647
Show installed libraries and their versions in GA jobs (#20069)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-11-04 18:03:18 +01:00
Yih-Dar 2d02178e5c
Allow passing arguments to model testers for CLIP-like models (#20044)
* POC

* For more CLIP-like models

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-11-04 18:01:41 +01:00
Jordan Clive 3bd0007e87
Update documentation on seq2seq models with absolute positional embeddings, to be in line with Tips section for BERT and GPT2 (#20068)
Co-authored-by: jordiclive <jordiclive19@imperial.ac.uk>
2022-11-04 11:32:44 -04:00
Matt 6e1c5786dc
Update READMEs for ESMFold and add notebooks (#20067)
* Update READMEs for ESMFold and add notebooks

* Fix PyCharm formatting

* make fix-copies
2022-11-04 15:10:13 +00:00
H. Jhoo 707b12a353
change constant torch.tensor to torch.full (#20061) 2022-11-04 10:41:56 -04:00
NielsRogge 787620e2a2
[Swin] Add Swin SimMIM checkpoints (#20034)
* Fix Swin

* Remove file

* Update code snippet

* Add copied from to maskformer

* Fix docstring

* Add whole name to replace

Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
2022-11-04 15:32:44 +01:00
amyeroberts 3936411b9d
PoolformerImageProcessor defaults to match previous FE (#20048)
* Poolformer image processor defaults to previous FE

* Remove unnecessary math.floor
2022-11-04 13:52:58 +00:00
Sanchit Gandhi 94e17c456c
[Trainer] Fix model name in push_to_hub (#20064) 2022-11-04 13:40:21 +00:00
Sourab Mangrulkar 19067711e7
fix `tokenizer_type` to avoid error when loading checkpoint back (#20062) 2022-11-04 19:04:01 +05:30
bhuang 3502c202f9
Update README.md (#20063) 2022-11-04 08:56:54 -04:00
Matt 1076d587b5
Fix ESM LM head test (#20045)
* Fix esm lm head test

* make fixup
2022-11-04 12:45:34 +00:00
Patrick Deutschmann d447c460b1
Speed up TF token classification postprocessing by converting complete tensors to numpy (#19976)
* Speed up TF postprocessing by converting to numpy before

* Fix bug that was triggered when offset_mapping was None

Co-authored-by: Patrick Deutschmann <patrick.deutschmann@dedalus.com>
2022-11-03 16:56:22 +00:00
Sylvain Gugger 06886d5a68
Only resize embeddings when necessary (#20043)
* Only resize embeddings when necessary

* Add comment
2022-11-03 12:05:04 -04:00
Michael Benayoun 9080607b2c
Fixed torch.finfo issue with torch.fx (#20040) 2022-11-03 16:14:44 +01:00
Matt 6f257bb3c2
Update esmfold conversion script (#20028)
* Update ESM conversion script for ESMfold

* Fix bug in ESMFold example

* make fixup and move restypes to one line
2022-11-03 14:58:06 +00:00
Wang, Yi 2564f0c21d
fix jit trace error for model forward sequence is not aligned with jit.trace tuple input sequence, update related doc (#19891)
* fix jit trace error for classification usecase, update related doc

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

* add implementation in torch 1.14.0

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

* update_doc

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

* update_doc

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2022-11-03 10:50:03 -04:00
Arthur 737bff6a36
[FuturWarning] Add futur warning for LEDForSequenceClassification (#19066)
* fix led eos_mask

* add Futur Warning

* revert useless changes

* Update src/transformers/models/led/modeling_led.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-11-03 15:26:09 +01:00
Sanchit Gandhi 06d488061f
[Whisper Tokenizer] Make more user-friendly (#19921)
* [Whisper Tokenizer] Make more user-friendly

* use property

* make indexing rigorous

* small clean-up

* tests

* skip seq2seq tests

* remove multilingual arg

* reorder args

* collapse to one function

Co-authored-by: ArthurZucker <arthur@huggingface.co>

* option to override attributes

Co-authored-by: ArthurZucker <arthur@huggingface.co>

* add to docs

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* make comment more clear

Co-authored-by: sgugger <sylvain@huggingface.co>

* don't add special tokens in get_decoder_prompt_ids

* add test for set_prefix_tokens

Co-authored-by: ArthurZucker <arthur@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: sgugger <sylvain@huggingface.co>
2022-11-03 14:22:40 +00:00
Saad Mahmud 790ff2544a
[Doctest] Add configuration_camembert.py (#20039)
* Add example docstring for CamembertConfig

* Add configuration_camembert to documentation_tests
2022-11-03 14:50:42 +01:00
Yih-Dar 9ccea7acb1
Fix some doctests after PR 15775 (#20036)
* Add skip_special_tokens=True in some doctest

* For T5

* Fix for speech_to_text.mdx

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-11-03 14:18:45 +01:00
amyeroberts a639ea9e8a
Add **kwargs (#20037) 2022-11-03 12:51:49 +00:00
Nicolas Patry ec6878f6ca
Now supporting pathlike in pipelines too. (#20030) 2022-11-03 09:14:45 +01:00
Steven Liu aa39967b28
reorganize glossary (#20010) 2022-11-02 16:58:17 -07:00
Yih-Dar 305e8718b4
Show installed libraries and their versions in CI jobs (#20026)
* Show versions

* check

* store outputs

* revert

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-11-02 20:52:39 +01:00
Ben Eyal 9f9ddcc2de
🚨 🚨 🚨 Fix Issue 15003: SentencePiece Tokenizers Not Adding Special Tokens in `convert_tokens_to_string` (#15775)
* Add test for SentencePiece not adding special tokens to strings

* Add SentencePieceStringConversionMixin to fix issue 15003

* Fix conversion from tokens to string for most SentencePiece tokenizers

Tokenizers fixed:
- AlbertTokenizer
- BarthezTokenizer
- CamembertTokenizer
- FNetTokenizer
- M2M100Tokenizer
- MBart50Tokenizer
- PegasusTokenizer
- Speech2TextTokenizer

* Fix MarianTokenizer, adjust SentencePiece test to accommodate vocab

* Fix DebertaV2Tokenizer

* Ignore LayoutXLMTokenizer in SentencePiece string conversion test

* Run 'make style' and 'make quality'

* Clean convert_tokens_to_string test

Instead of explicitly ignoring LayoutXLMTokenizer in the test,
override the test in LayoutLMTokenizationTest and do nothing in it.

* Remove commented out code

* Improve robustness of convert_tokens_to_string test

Instead of comparing lengths of re-tokenized text and input_ids,
check that converting all special tokens to string yields a string
with all special tokens.

* Inline and remove SentencePieceStringConversionMixin

The convert_tokens_to_string method is now implemented
in each relevant SentencePiece tokenizer.

* Run 'make style' and 'make quality'

* Revert removal of space in convert_tokens_to_string

* Remove redundant import

* Revert test text to original

* Uncomment the lowercasing of the reverse_text variable

* Mimic Rust tokenizer behavior for tokenizers

- Albert
- Barthez
- Camembert
- MBart50
- T5

* Fix accidentally skipping test in wrong tokenizer

* Add test for equivalent Rust and slow tokenizer behavior

* Override _decode in BigBirdTokenizer to mimic Rust behavior

* Override _decode in FNetTokenizer to mimic Rust behavior

* Override _decode in XLNetTokenizer to mimic Rust behavior

* Remove unused 're' import

* Update DebertaV2Tokenizer to mimic Rust tokenizer

* Deberta tokenizer now behaves like Albert and its `convert_tokens_to_string` is not tested.

* Ignore problematic tests in Deberta V2

* Add comment on why the Deberta V2 tests are skipped
2022-11-02 15:45:38 -04:00
Yih-Dar fb7cbe236b
Fix doctest (#20023)
* Fix doctest

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-11-02 19:37:25 +01:00
Yih-Dar f69eb24b5a
Improve model tester (#19984)
* part 1

* part 2

* part 3

* fix

* For CANINE

* For ESMFold

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-11-02 17:38:44 +01:00
Saad Mahmud 7487743793
[Doctest] Add configuration_deberta_v2.py (#19995)
* Add example docstring for DebertaV2Config

* Add DebertaV2Config to documentation_tests

* Fix mistake with directory name
2022-11-02 16:22:11 +01:00
amyeroberts 9aedce99b0
Update auto processor to check image processor created (#20021) 2022-11-02 15:19:33 +00:00
Sylvain Gugger 49b77b89ea
Quality (#20002) 2022-11-02 09:53:37 -04:00
Yih-Dar c6c9db3d0c
Fix gradient checkpoint test in encoder-decoder (#20017)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-11-02 14:15:09 +01:00
amyeroberts a6b7759880
Add Image Processors (#19796)
* Add CLIP image processor

* Crop size as dict too

* Update warning

* Actually use logger this time

* Normalize doesn't change dtype of input

* Add perceiver image processor

* Tidy up

* Add DPT image processor

* Add Vilt image processor

* Tidy up

* Add poolformer image processor

* Tidy up

* Add LayoutLM v2 and v3 image processors

* Tidy up

* Add Flava image processor

* Tidy up

* Add deit image processor

* Tidy up

* Add ConvNext image processor

* Tidy up

* Add levit image processor

* Add segformer image processor

* Add in post processing

* Fix up

* Add ImageGPT image processor

* Fixup

* Add mobilevit image processor

* Tidy up

* Add postprocessing

* Fixup

* Add VideoMAE image processor

* Tidy up

* Add ImageGPT image processor

* Fixup

* Add ViT image processor

* Tidy up

* Add beit image processor

* Add mobilevit image processor

* Tidy up

* Add postprocessing

* Fixup

* Fix up

* Fix flava and remove tree module

* Fix image classification pipeline failing tests

* Update feature extractor in trainer scripts

* Update pad_if_smaller to accept tuple and int size

* Update for image segmentation pipeline

* Update src/transformers/models/perceiver/image_processing_perceiver.py

Co-authored-by: Alara Dirik <8944735+alaradirik@users.noreply.github.com>

* Update src/transformers/image_processing_utils.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/beit/image_processing_beit.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* PR comments - docstrings; remove accidentally added resize; var names

* Update docstrings

* Add exception if size is not in the right format

* Fix exception check

* Fix up

* Use shortest_edge in tuple in script

Co-authored-by: Alara Dirik <8944735+alaradirik@users.noreply.github.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
2022-11-02 11:57:36 +00:00
Ripose 2e3452af0f
make sentencepiece import conditional in bertjapanesetokenizer (#20012) 2022-11-02 07:44:37 -04:00
Yih-Dar 8827e1b217
clean up vision/text config dict arguments (#19954)
* clean up

* For backward compatibility

* clean up

* Same changes for more models

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-11-02 12:03:43 +01:00
Alara Dirik cb630ffab8
Update object detection pipeline to use post_process_object_detection methods (#20004) 2022-11-02 10:26:36 +03:00
Steven Liu 79c720c062
fix typo (#20006) 2022-11-01 11:30:36 -07:00
Joao Gante 831590f6a9
Generate: contrastive search with full optional outputs (#19963)
* Use beam search functionality; Add extra outputs and test

* Add full tests for contrastive search

* Add error message on unconventional cache format
2022-11-01 18:15:36 +00:00
Steven Liu ab74ac11e4
Add LayoutLMv3 resource (#19932)
* add layoutlmv3 resource

* add layoutlmv2 resources

* fix button
2022-11-01 11:10:46 -07:00
Steven Liu dec8578e70
Add BERT resources (#19852)
* add resources for bert

* add course chapters

* apply reviews

* add pipeline icons and community resource

* fix buttons
2022-11-01 11:09:53 -07:00
Steven Liu 1f6885bad0
add dataset (#20005) 2022-11-01 10:37:20 -07:00
Matt 4f1e5e4efd
Add ESMFold code sample (#20000)
* Add ESMFold code sample

* sorry sylvain

* make fixup

* sorry sylvain again
2022-11-01 13:21:12 +00:00
Ikko Ashimine 38e5b71abb
Add Japanese translated README (#19945)
* Add japanese translated README.md

* Add README_ja.md link

* Add japanese translate to check_copies.py

* Add guide to Japanese README.md

* Update README_ja.md

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* Update utils/check_copies.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-11-01 09:18:08 -04:00
Wang Ran (汪然) 4f90fc1db8
typo (#20001) 2022-11-01 09:04:53 -04:00
Sayak Paul c87ae86a8f
Update image_classification.mdx (#19996) 2022-11-01 07:54:41 -04:00
Mohit Sharma c796b6dea6
Added onnx config whisper (#19525)
* Added onnx config whisper

* added whisper support onnx

* add audio input data

* added whisper support onnx

* fixed the seqlength value

* Updated the whisper onnx config

* restore files to old version

* removed attention mask from inputs

* Updated get_dummy_input_onnxruntime docstring

* Updated relative imports and token generation

* update docstring
2022-11-01 07:50:42 -04:00
Sylvain Gugger c3a93d8d82
v4.25.0.dev0 2022-10-31 21:48:40 -04:00