Joao Gante
a0f8674303
Generate: TF contrastive search with XLA support ( #20050 )
...
* Add contrastive search
2022-11-07 10:54:29 +00:00
Christopher Akiki
504db92e7d
Update hub.py ( #20075 )
2022-11-04 22:25:02 +01:00
Christopher Akiki
4b86e44693
Update modeling_tf_utils.py ( #20076 )
2022-11-04 22:24:37 +01:00
amyeroberts
d68c46026b
Update defaults and logic to match old FE ( #20065 )
...
* Update defaults and logic to match old FE
* Use docker run rest values
2022-11-04 19:14:56 +00:00
Yih-Dar
c06d555647
Show installed libraries and their versions in GA jobs ( #20069 )
...
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-11-04 18:03:18 +01:00
Yih-Dar
2d02178e5c
Allow passing arguments to model testers for CLIP-like models ( #20044 )
...
* POC
* For more CLIP-like models
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-11-04 18:01:41 +01:00
Jordan Clive
3bd0007e87
Update documentation on seq2seq models with absolute positional embeddings, to be in line with Tips section for BERT and GPT2 ( #20068 )
...
Co-authored-by: jordiclive <jordiclive19@imperial.ac.uk>
2022-11-04 11:32:44 -04:00
Matt
6e1c5786dc
Update READMEs for ESMFold and add notebooks ( #20067 )
...
* Update READMEs for ESMFold and add notebooks
* Fix PyCharm formatting
* make fix-copies
2022-11-04 15:10:13 +00:00
H. Jhoo
707b12a353
change constant torch.tensor to torch.full ( #20061 )
2022-11-04 10:41:56 -04:00
NielsRogge
787620e2a2
[Swin] Add Swin SimMIM checkpoints ( #20034 )
...
* Fix Swin
* Remove file
* Update code snippet
* Add copied from to maskformer
* Fix docstring
* Add whole name to replace
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
2022-11-04 15:32:44 +01:00
amyeroberts
3936411b9d
PoolformerImageProcessor defaults to match previous FE ( #20048 )
...
* Poolformer image processor defaults to previous FE
* Remove unnecessary math.floor
2022-11-04 13:52:58 +00:00
Sanchit Gandhi
94e17c456c
[Trainer] Fix model name in push_to_hub ( #20064 )
2022-11-04 13:40:21 +00:00
Sourab Mangrulkar
19067711e7
fix `tokenizer_type` to avoid error when loading checkpoint back ( #20062 )
2022-11-04 19:04:01 +05:30
bhuang
3502c202f9
Update README.md ( #20063 )
2022-11-04 08:56:54 -04:00
Matt
1076d587b5
Fix ESM LM head test ( #20045 )
...
* Fix esm lm head test
* make fixup
2022-11-04 12:45:34 +00:00
Patrick Deutschmann
d447c460b1
Speed up TF token classification postprocessing by converting complete tensors to numpy ( #19976 )
...
* Speed up TF postprocessing by converting to numpy before
* Fix bug that was triggered when offset_mapping was None
Co-authored-by: Patrick Deutschmann <patrick.deutschmann@dedalus.com>
2022-11-03 16:56:22 +00:00
Sylvain Gugger
06886d5a68
Only resize embeddings when necessary ( #20043 )
...
* Only resize embeddings when necessary
* Add comment
2022-11-03 12:05:04 -04:00
Michael Benayoun
9080607b2c
Fixed torch.finfo issue with torch.fx ( #20040 )
2022-11-03 16:14:44 +01:00
Matt
6f257bb3c2
Update esmfold conversion script ( #20028 )
...
* Update ESM conversion script for ESMfold
* Fix bug in ESMFold example
* make fixup and move restypes to one line
2022-11-03 14:58:06 +00:00
Wang, Yi
2564f0c21d
fix jit trace error for model forward sequence is not aligned with jit.trace tuple input sequence, update related doc ( #19891 )
...
* fix jit trace error for classification usecase, update related doc
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
* add implementation in torch 1.14.0
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
* update_doc
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
* update_doc
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2022-11-03 10:50:03 -04:00
Arthur
737bff6a36
[FutureWarning] Add future warning for LEDForSequenceClassification ( #19066 )
...
* fix led eos_mask
* add FutureWarning
* revert useless changes
* Update src/transformers/models/led/modeling_led.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-11-03 15:26:09 +01:00
Sanchit Gandhi
06d488061f
[Whisper Tokenizer] Make more user-friendly ( #19921 )
...
* [Whisper Tokenizer] Make more user-friendly
* use property
* make indexing rigorous
* small clean-up
* tests
* skip seq2seq tests
* remove multilingual arg
* reorder args
* collapse to one function
Co-authored-by: ArthurZucker <arthur@huggingface.co>
* option to override attributes
Co-authored-by: ArthurZucker <arthur@huggingface.co>
* add to docs
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* make comment more clear
Co-authored-by: sgugger <sylvain@huggingface.co>
* don't add special tokens in get_decoder_prompt_ids
* add test for set_prefix_tokens
Co-authored-by: ArthurZucker <arthur@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: sgugger <sylvain@huggingface.co>
2022-11-03 14:22:40 +00:00
Saad Mahmud
790ff2544a
[Doctest] Add configuration_camembert.py ( #20039 )
...
* Add example docstring for CamembertConfig
* Add configuration_camembert to documentation_tests
2022-11-03 14:50:42 +01:00
Yih-Dar
9ccea7acb1
Fix some doctests after PR 15775 ( #20036 )
...
* Add skip_special_tokens=True in some doctest
* For T5
* Fix for speech_to_text.mdx
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-11-03 14:18:45 +01:00
amyeroberts
a639ea9e8a
Add **kwargs ( #20037 )
2022-11-03 12:51:49 +00:00
Nicolas Patry
ec6878f6ca
Now supporting pathlike in pipelines too. ( #20030 )
2022-11-03 09:14:45 +01:00
Steven Liu
aa39967b28
reorganize glossary ( #20010 )
2022-11-02 16:58:17 -07:00
Yih-Dar
305e8718b4
Show installed libraries and their versions in CI jobs ( #20026 )
...
* Show versions
* check
* store outputs
* revert
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-11-02 20:52:39 +01:00
Ben Eyal
9f9ddcc2de
🚨 🚨 🚨 Fix Issue 15003: SentencePiece Tokenizers Not Adding Special Tokens in `convert_tokens_to_string` ( #15775 )
...
* Add test for SentencePiece not adding special tokens to strings
* Add SentencePieceStringConversionMixin to fix issue 15003
* Fix conversion from tokens to string for most SentencePiece tokenizers
Tokenizers fixed:
- AlbertTokenizer
- BarthezTokenizer
- CamembertTokenizer
- FNetTokenizer
- M2M100Tokenizer
- MBart50Tokenizer
- PegasusTokenizer
- Speech2TextTokenizer
* Fix MarianTokenizer, adjust SentencePiece test to accommodate vocab
* Fix DebertaV2Tokenizer
* Ignore LayoutXLMTokenizer in SentencePiece string conversion test
* Run 'make style' and 'make quality'
* Clean convert_tokens_to_string test
Instead of explicitly ignoring LayoutXLMTokenizer in the test,
override the test in LayoutLMTokenizationTest and do nothing in it.
* Remove commented out code
* Improve robustness of convert_tokens_to_string test
Instead of comparing lengths of re-tokenized text and input_ids,
check that converting all special tokens to string yields a string
with all special tokens.
* Inline and remove SentencePieceStringConversionMixin
The convert_tokens_to_string method is now implemented
in each relevant SentencePiece tokenizer.
* Run 'make style' and 'make quality'
* Revert removal of space in convert_tokens_to_string
* Remove redundant import
* Revert test text to original
* Uncomment the lowercasing of the reverse_text variable
* Mimic Rust tokenizer behavior for tokenizers
- Albert
- Barthez
- Camembert
- MBart50
- T5
* Fix accidentally skipping test in wrong tokenizer
* Add test for equivalent Rust and slow tokenizer behavior
* Override _decode in BigBirdTokenizer to mimic Rust behavior
* Override _decode in FNetTokenizer to mimic Rust behavior
* Override _decode in XLNetTokenizer to mimic Rust behavior
* Remove unused 're' import
* Update DebertaV2Tokenizer to mimic Rust tokenizer
* Deberta tokenizer now behaves like Albert and its `convert_tokens_to_string` is not tested.
* Ignore problematic tests in Deberta V2
* Add comment on why the Deberta V2 tests are skipped
2022-11-02 15:45:38 -04:00
Yih-Dar
fb7cbe236b
Fix doctest ( #20023 )
...
* Fix doctest
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-11-02 19:37:25 +01:00
Yih-Dar
f69eb24b5a
Improve model tester ( #19984 )
...
* part 1
* part 2
* part 3
* fix
* For CANINE
* For ESMFold
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-11-02 17:38:44 +01:00
Saad Mahmud
7487743793
[Doctest] Add configuration_deberta_v2.py ( #19995 )
...
* Add example docstring for DebertaV2Config
* Add DebertaV2Config to documentation_tests
* Fix mistake with directory name
2022-11-02 16:22:11 +01:00
amyeroberts
9aedce99b0
Update auto processor to check image processor created ( #20021 )
2022-11-02 15:19:33 +00:00
Sylvain Gugger
49b77b89ea
Quality ( #20002 )
2022-11-02 09:53:37 -04:00
Yih-Dar
c6c9db3d0c
Fix gradient checkpoint test in encoder-decoder ( #20017 )
...
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-11-02 14:15:09 +01:00
amyeroberts
a6b7759880
Add Image Processors ( #19796 )
...
* Add CLIP image processor
* Crop size as dict too
* Update warning
* Actually use logger this time
* Normalize doesn't change dtype of input
* Add perceiver image processor
* Tidy up
* Add DPT image processor
* Add Vilt image processor
* Tidy up
* Add poolformer image processor
* Tidy up
* Add LayoutLM v2 and v3 image processors
* Tidy up
* Add Flava image processor
* Tidy up
* Add deit image processor
* Tidy up
* Add ConvNext image processor
* Tidy up
* Add levit image processor
* Add segformer image processor
* Add in post processing
* Fix up
* Add ImageGPT image processor
* Fixup
* Add mobilevit image processor
* Tidy up
* Add postprocessing
* Fixup
* Add VideoMAE image processor
* Tidy up
* Add ImageGPT image processor
* Fixup
* Add ViT image processor
* Tidy up
* Add beit image processor
* Add mobilevit image processor
* Tidy up
* Add postprocessing
* Fixup
* Fix up
* Fix flava and remove tree module
* Fix image classification pipeline failing tests
* Update feature extractor in trainer scripts
* Update pad_if_smaller to accept tuple and int size
* Update for image segmentation pipeline
* Update src/transformers/models/perceiver/image_processing_perceiver.py
Co-authored-by: Alara Dirik <8944735+alaradirik@users.noreply.github.com>
* Update src/transformers/image_processing_utils.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Update src/transformers/models/beit/image_processing_beit.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* PR comments - docstrings; remove accidentally added resize; var names
* Update docstrings
* Add exception if size is not in the right format
* Fix exception check
* Fix up
* Use shortest_edge in tuple in script
Co-authored-by: Alara Dirik <8944735+alaradirik@users.noreply.github.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
2022-11-02 11:57:36 +00:00
Ripose
2e3452af0f
make sentencepiece import conditional in bertjapanesetokenizer ( #20012 )
2022-11-02 07:44:37 -04:00
Yih-Dar
8827e1b217
clean up vision/text config dict arguments ( #19954 )
...
* clean up
* For backward compatibility
* clean up
* Same changes for more models
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-11-02 12:03:43 +01:00
Alara Dirik
cb630ffab8
Update object detection pipeline to use post_process_object_detection methods ( #20004 )
2022-11-02 10:26:36 +03:00
Steven Liu
79c720c062
fix typo ( #20006 )
2022-11-01 11:30:36 -07:00
Joao Gante
831590f6a9
Generate: contrastive search with full optional outputs ( #19963 )
...
* Use beam search functionality; Add extra outputs and test
* Add full tests for contrastive search
* Add error message on unconventional cache format
2022-11-01 18:15:36 +00:00
Steven Liu
ab74ac11e4
Add LayoutLMv3 resource ( #19932 )
...
* add layoutlmv3 resource
* add layoutlmv2 resources
* fix button
2022-11-01 11:10:46 -07:00
Steven Liu
dec8578e70
Add BERT resources ( #19852 )
...
* add resources for bert
* add course chapters
* apply reviews
* add pipeline icons and community resource
* fix buttons
2022-11-01 11:09:53 -07:00
Steven Liu
1f6885bad0
add dataset ( #20005 )
2022-11-01 10:37:20 -07:00
Matt
4f1e5e4efd
Add ESMFold code sample ( #20000 )
...
* Add ESMFold code sample
* sorry sylvain
* make fixup
* sorry sylvain again
2022-11-01 13:21:12 +00:00
Ikko Ashimine
38e5b71abb
Add Japanese translated README ( #19945 )
...
* Add Japanese translated README.md
* Add README_ja.md link
* Add Japanese translation to check_copies.py
* Add guide to Japanese README.md
* Update README_ja.md
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* Update utils/check_copies.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-11-01 09:18:08 -04:00
Wang Ran (汪然)
4f90fc1db8
typo ( #20001 )
2022-11-01 09:04:53 -04:00
Sayak Paul
c87ae86a8f
Update image_classification.mdx ( #19996 )
2022-11-01 07:54:41 -04:00
Mohit Sharma
c796b6dea6
Added onnx config whisper ( #19525 )
...
* Added onnx config whisper
* added whisper support onnx
* add audio input data
* added whisper support onnx
* fixed the seqlength value
* Updated the whisper onnx config
* restore files to old version
* removed attention mask from inputs
* Updated get_dummy_input_onnxruntime docstring
* Updated relative imports and token generation
* update docstring
2022-11-01 07:50:42 -04:00
Sylvain Gugger
c3a93d8d82
v4.25.0.dev0
2022-10-31 21:48:40 -04:00