Commit Graph

336 Commits

Author SHA1 Message Date
Patrick von Platen 2e12b907ae
TF generate refactor - Greedy Search (#15562)
* TF generate start refactor

* Add tf tests for sample generate

* re-organize

* boom boom

* Apply suggestions from code review

* re-add

* add all code

* make random greedy pass

* make encoder-decoder random work

* further improvements

* delete bogus file

* make gpt2 and t5 tests work

* finish logits tests

* correct logits processors

* correct past / encoder_outputs drama

* refactor some methods

* another fix

* refactor shape_list

* fix more shape list

* import shape
_list

* finish docs

* fix imports

* make style

* correct tf utils

* Fix TFRag as well

* Apply Lysandre's and Sylvais suggestions

* Update tests/test_generation_tf_logits_process.py

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>

* Update src/transformers/tf_utils.py

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>

* remove cpu according to gante

* correct logit processor

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
2022-02-15 17:54:43 +01:00
Yih-Dar 6a5472a8e1
Force use_cache to be False in PyTorch (#15385)
* use_cache = False for PT models if labels is passed

* Fix for BigBirdPegasusForConditionalGeneration

* add warning if users specify use_cache=True

* Use logger.warning instead of warnings.warn

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-02-08 16:20:53 +01:00
SaulLu 7b8bdd8601
fix the `tokenizer_config.json` file for the slow tokenizer when a fast version is available (#15319)
* add new test

* update test

* remove `tokenizer_file` from `additional_files_names` in `tokenization_utils_base.py`

* add `tokenizer_file` for the fast only tokenizer

* change global variables layoutxml

* remove `"tokenizer_file"` from DPR tokenizer's Global variables

* remove `tokenizer_file` from herbert slow tokenizer init

* `"tokenizer_file"` from LED tokenizer's Global variables

* remove `tokenizer_file` from mbart slow tokenizer init

* remove `tokenizer_file` from slow tokenizer template

* adapt to versioning

* adapt the `test_tokenizer_mismatch_warning` test

* clean test

* clarify `VOCAB_FILES_NAMES` in tokenization_utils_fast.py

* Revert "remove `tokenizer_file` from mbart slow tokenizer init"

This reverts commit 0dbb723fa9.

* Revert "`"tokenizer_file"` from LED tokenizer's Global variables"

This reverts commit 5a3f879bdd.

* Revert "remove `tokenizer_file` from herbert slow tokenizer init"

This reverts commit f5e10007b7.

* Revert "remove `"tokenizer_file"` from DPR tokenizer's Global variables"

This reverts commit da0895330b.

* set `tokenizer_file` in super `__init__` of mbart
2022-02-01 16:48:25 +01:00
Yih-Dar dc05dd539f
Fix TF Causal LM models' returned logits (#15256)
* Fix TF Causal LM models' returned logits

* Fix expected shape in the tests

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-02-01 11:04:07 +00:00
Yih-Dar 554d333ece
Fix loss calculation in TFXXXForTokenClassification models (#15294)
* Fix loss calculation in TFFunnelForTokenClassification

* revert the change in TFFunnelForTokenClassification

* fix FunnelForTokenClassification loss

* fix other TokenClassification loss

* fix more

* fix more

* add num_labels to ElectraForTokenClassification

* revert the change to research projects

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-01-31 11:43:08 -05:00
Sylvain Gugger 7fc6f41d91
Add doc for add-new-model-like command (#15433) 2022-01-31 11:10:45 -05:00
Yih-Dar c15bb3fe19
[Fix doc example] fix missing import jnp (#15291)
* fix missing import jnp

* Fix missing jax and k=1

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-01-24 14:54:23 +01:00
Jonas Kuball c962c2adbf
Adds missing module_specs for usages of _LazyModule (#15230)
* Add missing __spec__ for transformers.models.auto

* Moves the __spec__-test to the UnitTest class

* Adds module_spec to all instances of _LazyModule

* Refactors an old test from pytest to unittest
2022-01-21 07:30:12 -05:00
Matt 2708bfa127
Rename compute_loss in TF models (#15207)
* Rename compute_loss to hf_compute_loss to avoid conflicts with the new Keras method

* make style

* Adding deprecation warning to `compute_loss`

* Fix sneaky reference to compute_loss

* Replace logger.warning with warnings.warn

* Clarifying warning and deprecation timeline
2022-01-19 13:29:07 +00:00
Sylvain Gugger 5f3c57fc84
Check the repo consistency in model templates test (#15141)
* Check the repo consistency in model templates test

* Fix doc template

* Fix docstrings

* Fix last docstring
2022-01-14 04:52:38 -05:00
Sylvain Gugger 1a00863e95 Fix typo in doc template 2022-01-11 15:22:15 -05:00
NielsRogge 6ea6266625
Fix cookiecutter (#15100) 2022-01-11 05:57:26 -05:00
Suraj Patil 3e9fdcf019
[DOC] fix doc examples for bart-like models (#15093)
* fix doc examples

* remove double colons
2022-01-10 18:13:28 +01:00
Sylvain Gugger 61d18ae035
Happy New Year! (#15094) 2022-01-10 12:05:57 -05:00
Sylvain Gugger 207594be81
Convert rst files (#14888)
* Convert all tutorials and guides

* Convert all remaining rst to mdx

* Track and fix bad links
2021-12-22 16:14:35 -05:00
Sylvain Gugger 27b3031de2
Mass conversion of documentation from rst to Markdown (#14866)
* Convert docstrings of all configurations and tokenizers

* Processors and fixes

* Last modeling files and fixes to models

* Pipeline modules

* Utils files

* Data submodule

* All the other files

* Style

* Missing examples

* Style again

* Fix copies

* Say bye bye to rst docstrings forever
2021-12-21 15:06:33 -05:00
Sylvain Gugger 7af80f6618
Convert docstrings of modeling files (#14850)
* Convert file_utils docstrings to Markdown

* Test on BERT

* Return block indent

* Temporarily disable doc styler

* Remove from quality checks as well

* Remove doc styler mess

* Remove check from circleCI

* Fix typo

* Convert file_utils docstrings to Markdown

* Test on BERT

* Return block indent

* Temporarily disable doc styler

* Remove from quality checks as well

* Remove doc styler mess

* Remove check from circleCI

* Fix typo

* Let's go on all other model files

* Add templates too

* Styling and quality
2021-12-21 05:37:32 -05:00
Daniel Stancl ff066119ca
Implement head_mask for Flax BERT and other models copied from BERT (#14620)
* Implement head_mask for Flax BERT and other models copied from BERT

* Remove `from jax._src.nn.functions import sigmoid`

Remove `from jax._src.nn.functions import sigmoid` unintentionally added by IDE

* Remove no more valid copy statement

* Apply patil-suraj's suggestions from code review

* Apply suggestions from the code review

* Update Flax template

* Fix a typo

* Also update template for CausalLM modules
2021-12-17 17:06:59 +01:00
Lysandre Debut 8010fda9bf
Removes images to put them in a dataset (#14781)
* First try

* Update instructions
2021-12-16 04:42:02 -05:00
Yih-Dar 15a9d01519
Avoid using tf.tile in embeddings for TF models (#14735)
* avoid tf.tile in embeddings

* remove more tf.tile in embeddings

* clean

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2021-12-13 17:30:46 +00:00
Yih-Dar 32eb29fef9
Fix doc examples: modify config before super().__init__ (#14697)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2021-12-13 12:50:02 +01:00
Yih-Dar 59d684fa92
Fix examples: 'CausalLMOutputWithCrossAttentions' object has no attribute 'last_hidden_state' (#14678)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2021-12-10 14:55:54 +01:00
Thomas Viehmann 6ed9882ddb
use functional interface for softmax in attention (#14198)
* use functional interface instead of instantiating module and immediately calling it

* fix torch.nn.functional to nn.functional. Thank you Stas!
2021-11-30 11:47:33 -05:00
Sylvain Gugger d83b0e0c07
Add a post init method to all models (#14431)
* Add a post init method to all models

* Fix tests

* Fix last tests

* Fix templates

* Add comment

* Forgot to save
2021-11-18 08:38:09 -05:00
Suraj Patil e92190c0f8
Fix Flax params dtype (#13098)
* fix inits

* fix embed dtype

* fix embed dtype

* add test to check default dtype

* quality

* add type conversion methods for flax models

* more robust casting

* cast sinusoidal positions

* update pegasus

* update albert

* update test

* make sure dtype is passed to every module

* style

* fix electra dense

* fix t5

* quality

* add more tests

* better name

* use the dtype for lm head computation

* fix albert

* style

* fix albert embed dtype

* more tests

* fix vision enc-dec

* cleanup

* fix embed dtype pegasus

* fix default param test

* doc

* update template

* fix final_logits_bias dtype

* Apply suggestions from code review

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* fix doc

* fix doc

* add detailed docstring for dtype parameter

* remove un-necessary import

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2021-11-11 14:45:20 +05:30
Patrick von Platen e81d8d7fa9
[Bert2Bert] allow bert2bert + relative embeddings (#14324)
* [Bert2Bert] allow bert2bert + relative embeddings

* up

* Update README_ko.md

* up

* up
2021-11-09 14:26:58 -05:00
Sylvain Gugger c28bc80bbb
Generalize problem_type to all sequence classification models (#14180)
* Generalize problem_type to all classification models

* Missing import

* Deberta BC and fix tests

* Fix template

* Missing imports

* Revert change to reformer test

* Fix style
2021-10-29 10:32:56 -04:00
Patrick von Platen f5af873617
[Docs] More general docstrings (#14028)
* up

* finish

* up

* up

* finish
2021-10-16 00:48:37 +02:00
Yih-Dar 8b240a0661
Add TFEncoderDecoderModel + Add cross-attention to some TF models (#13222)
* Add cross attentions to TFGPT2Model

* Add TFEncoderDecoderModel

* Add TFBaseModelOutputWithPoolingAndCrossAttentions

* Add cross attentions to TFBertModel

* Fix past or past_key_values argument issue

* Fix generation

* Fix save and load

* Add some checks and comments

* Clean the code that deals with past keys/values

* Add kwargs to processing_inputs

* Add serving_output to TFEncoderDecoderModel

* Some cleaning + fix use_cache value issue

* Fix tests + add bert2bert/bert2gpt2 tests

* Fix more tests

* Ignore crossattention.bias when loading GPT2 weights into TFGPT2

* Fix return_dict_in_generate in tf generation

* Fix is_token_logit_eos_token bug in tf generation

* Finalize the tests after fixing some bugs

* Fix another is_token_logit_eos_token bug in tf generation

* Add/Update docs

* Add TFBertEncoderDecoderModelTest

* Clean test script

* Add TFEncoderDecoderModel to the library

* Add cross attentions to TFRobertaModel

* Add TFRobertaEncoderDecoderModelTest

* make style

* Change the way of position_ids computation

* bug fix

* Fix copies in tf_albert

* Remove some copied from and apply some fix-copies

* Remove some copied

* Add cross attentions to some other TF models

* Remove encoder_hidden_states from TFLayoutLMModel.call for now

* Make style

* Fix TFRemBertForCausalLM

* Revert the change to longformer + Remove copies

* Revert the change to albert and convbert + Remove copies

* make quality

* make style

* Add TFRembertEncoderDecoderModelTest

* make quality and fix-copies

* test TFRobertaForCausalLM

* Fixes for failed tests

* Fixes for failed tests

* fix more tests

* Fixes for failed tests

* Fix Auto mapping order

* Fix TFRemBertEncoder return value

* fix tf_rembert

* Check copies are OK

* Fix missing TFBaseModelOutputWithPastAndCrossAttentions is not defined

* Add TFEncoderDecoderModelSaveLoadTests

* fix tf weight loading

* check the change of use_cache

* Revert the change

* Add missing test_for_causal_lm for TFRobertaModelTest

* Try cleaning past

* fix _reorder_cache

* Revert some files to original versions

* Keep as many copies as possible

* Apply suggested changes - Use raise ValueError instead of assert

* Move import to top

* Fix wrong require_torch

* Replace more assert by raise ValueError

* Add test_pt_tf_model_equivalence (the test won't pass for now)

* add test for loading/saving

* finish

* finish

* Remove test_pt_tf_model_equivalence

* Update tf modeling template

* Remove pooling, added in the prev. commit, from MainLayer

* Update tf modeling test template

* Move inputs["use_cache"] = False to modeling_tf_utils.py

* Fix torch.Tensor in the comment

* fix use_cache

* Fix missing use_cache in ElectraConfig

* Add a note to from_pretrained

* Fix style

* Change test_encoder_decoder_save_load_from_encoder_decoder_from_pt

* Fix TFMLP (in TFGPT2) activation issue

* Fix None past_key_values value in serving_output

* Don't call get_encoderdecoder_model in TFEncoderDecoderModelTest.test_configuration_tie until we have a TF checkpoint on Hub

* Apply review suggestions - style for cross_attns in serving_output

* Apply review suggestions - change assert + docstrings

* break the error message to respect the char limit

* deprecate the argument past

* fix docstring style

* Update the encoder-decoder rst file

* fix Unknown interpreted text role "method"

* fix typo

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2021-10-13 00:10:34 +02:00
Tommy Chiang a2ef9c5446
Use torch.unique_consecutive to check same element (#13637)
We use `torch.unique` here only to check whether every elements have
the same value.
Therefore, we can use `torch.unique_consecutive` here.

This function eliminates all but the first element from every consecutive
group of equivalent elements.
Like, if we apply this function to `[1, 2, 2, 1]`, it will result in
`[1, 2, 1]`.

As you could see, this is enough for checking whether every elements
have the same value.

Since `torch.unique_consecutive` do less thing, it is much more faster.
On my computer, it is 25x faster on GPU and 15x faster on CPU.
2021-09-24 10:31:23 +02:00
Sylvain Gugger 27d4639779
Make gradient_checkpointing a training argument (#13657)
* Make gradient_checkpointing a training argument

* Update src/transformers/modeling_utils.py

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Update src/transformers/configuration_utils.py

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Fix tests

* Style

* document Gradient Checkpointing as a performance feature

* Small rename

* PoC for not using the config

* Adapt BC to new PoC

* Forgot to save

* Rollout changes to all other models

* Fix typo

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Stas Bekman <stas@stason.org>
2021-09-22 07:51:38 -04:00
Bhadresh Savani 3fbb55c757
[Flax] Fixes typo in Bart based Flax Models (#13565) 2021-09-15 11:03:52 +05:30
Nils Reimers c8be8a9adb
Update model configs - Allow setters for common properties (#13026)
* refactor GPT Config to allow dyn. properties

* make attribute_map a class attribute

* remove old code

* update unit test to test config: Add test for common properties setter

* update unit test to test config: Add test for common properties passed as parameters to __init__

* update to black code format

* Allow that setters are not defined for certain config classes

* update config classes to implement attribute_map

* bugfix lxmert config - id2labels was not defined when num_labels was set

* update broken configs - add attribute_maps

* update bart config

* update black codestyle

* update documentation on common config attributes

* update GPTJ config to new attribute map

* update docs on common attributes

* gptj config: add max_position_embeddings

* gptj config: format with black

* update speech to text 2 config

* format doc file to max_len 119

* update config template
2021-09-06 16:30:13 +02:00
Patrick von Platen 02039352b2
Update README.md 2021-09-01 09:50:21 +02:00
Jonathan Chang d160782a53
Add template for adding flax models (#12441)
* Add option to add flax

* Add flax template for __init__.py

* Add flax template for .rst

* Copy TF modeling template

* Add a missing line in modeling_tf_... template

* Update first half of modeling_flax_..

* Update encoder flax template

* Copy test_modeling_tf... as test_modeling_flax...

* Replace some TF to Flax in test_modeling_flax_...

* Replace tf to np

some function might not work, like _assert_tensors_equal

* Replace remaining tf to np (might not work)

* Fix cookiecutter

* Add Flax in to_replace_... template

* Update transformers-cli add-new-model

* Save generate_flax in configuration.json

This will be read by transformers-cli

* Fix to_replace_... and cli

* Fix replace cli

* Fix cookiecutter name

* Move docstring earlier to avoid not defined error

* Fix a missing Module

* Add encoder-decoder flax template from bart

* Fix flax test

* Make style

* Fix endif

* Fix replace all "utf-8 -> unp-8"

* Update comment

* Fix flax template (add missing ..._DOCSTRING)

* Use flax_bart imports in template (was t5)

* Fix unp

* Update templates/adding_a_new_model/tests

* Revert "Fix unp"

This reverts commit dc9002a41d.

* Remove one line of copied from to suppress CI error

* Use generate_tensorflow_pytorch_and_flax

* Add a missing part

* fix typo

* fix flax config

* add examples for flax

* small rename

* correct modeling imports

* correct auto loading

* corrects some flax tests

* correct small typo

* correct as type

* finish modif

* correct more templates

* final fixes

* add file testers

* up

* make sure tests match template regex

* correct pytorch

* correct tf

* correct more tf

* correct imports

* minor error

* minor error

* correct init

* more fixes

* correct more flax tests

* correct flax test

* more fixes

* correct docs

* update

* fix

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2021-09-01 09:49:03 +02:00
Jongheon Kim ef8d6f2b4a
Set missing seq_length variable when using inputs_embeds with ALBERT & Remove code duplication (#13152)
* Set seq_length variable when using inputs_embeds

* remove code duplication
2021-08-31 06:51:25 -04:00
Allan Lin 91ff480e26
Update namespaces inside torch.utils.data to the latest. (#13167)
* Update torch.utils.data namespaces to the latest.

* Format

* Update Dataloader.

* Style
2021-08-19 14:29:51 +02:00
Sylvain Gugger 9870093f7b
[WIP] Disentangle auto modules from other modeling files (#13023)
* Initial work

* All auto models

* All tf auto models

* All flax auto models

* Tokenizers

* Add feature extractors

* Fix typos

* Fix other typo

* Use the right config

* Remove old mapping names and update logic in AutoTokenizer

* Update check_table

* Fix copies and check_repo script

* Fix last test

* Add back name

* clean up

* Update template

* Update template

* Forgot a )

* Use alternative to fixup

* Fix TF model template

* Address review comments

* Address review comments

* Style
2021-08-06 13:12:30 +02:00
Sylvain Gugger 790f1c9545
Fix template for inputs docstrings (#12976) 2021-08-03 08:28:25 +02:00
Lysandre Debut c3d9ac7607
Expose get_config() on ModelTesters (#12812)
* Expose get_config() on ModelTesters

* Typo
2021-07-21 04:13:11 -04:00
Sylvain Gugger 0a6b9048d1
Init pickle (#12567)
* Try to pickle transformers

* Deal with special objs better

* Make picklable
2021-07-08 07:20:46 -04:00
Michal Szutenberg 0d2bffad31
Remove tf.roll wherever not needed (#12512)
It was used in shift_right.
After this change TF code is more similar to Pytorch implementations
Also, TF graphs are optimized (one node less)
2021-07-07 16:17:30 +01:00
Bhadresh Savani 04dbea31a9
[Examples] Added context manager to datasets map (#12367)
* added cotext manager to datasets map

* fixed style and spaces

* fixed warning of deprecation

* changed desc
2021-06-28 09:14:00 -07:00
Bhadresh Savani 9a7545943d
updated example template (#12365) 2021-06-25 20:50:30 -07:00
Stas Bekman 4a872caef4
remove extra white space from log format (#12360) 2021-06-25 13:20:14 -07:00
Sylvain Gugger 9eda6b52e2
Add all XxxPreTrainedModel to the main init (#12314)
* Add all XxxPreTrainedModel to the main init

* Add to template

* Add to template bis

* Add FlaxT5
2021-06-23 10:40:54 -04:00
Hamid Shojanazeri af6e01c5bc
Fix for the issue of device-id getting hardcoded for token_type_ids during Tracing [WIP] (#11252)
* registering a buffer for token_type_ids, to pass the error of device-id getting hardcoded when tracing

* sytle format

* adding persistent flag to the resgitered buffers that prevent from adding them to the state_dict and addresses the Backward compatibility issue

* adding the try catch to the fix as persistent flag is only available from PT >1.6

* adding version check

* added the condition to only use the token_type_ids buffer when its autogenerated not passed by user

* adding comments and making the conidtion where token_type_ids are None to use the registered buffer

* taking out position-embeddding from the if block

* adding comments

* handling the case if buffer for position_ids was not registered

* reverted the changes on position_ids, fix the issue with size of token_type_ids buffer, moved the modification for generated token_type_ids to Bertmodel, instead of Embeddings

* reverting the token_type_ids in case of None to the previous version

* reverting changes on position_ids adding back the if block

* changes added by running make fix-copies

* changes added by running make fix-copies and added the import version as it was getting used

* changes added by running make fix-copies

* changes added by running make fix-copies

* fixing the import format

* fixing the import format

* modified to use temp tensor for trimed and expanded token_type_ids buffer

* changes made by fix-copies after temp tensor modifications

* changes made by fix-copies after temp tensor modifications

* changes made by fix-copies after temp tensor modifications

* clean up

* clean up

* clean up

* clean up

* Nit

* Nit

* Nit

* modified according to support device conversion on traced models

* modified according to support device conversion on traced models

* modified according to support device conversion on traced models

* modified according to support device conversion on traced models

* changes based on latest in master

* Adapt templates

* Add version import

Co-authored-by: Ubuntu <ubuntu@ip-172-31-32-81.us-west-2.compute.internal>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2021-06-22 05:21:30 -04:00
Xa9aX ツ f3558bbcfd
Depreciate pythonic Mish and support PyTorch 1.9 version of Mish (#12240)
* Moved Mish to Torch 1.9 version

* Run black formatting
2021-06-18 09:13:45 -04:00
Stas Bekman a156da9a23
consistent nn. and nn.functional: p2 templates (#12153) 2021-06-14 11:41:24 -07:00
François Lagunas f8bd8c6c7e
Fixes bug that appears when using QA bert and distilation. (#12026)
* Fixing bug that appears when using distilation (and potentially other uses).
During backward pass Pytorch complains with:
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation
This happens because the QA model code modifies the start_positions and end_positions input tensors, using clamp_ function: as a consequence the teacher and the student both modifies the inputs, and backward pass fails.

* Fixing all models QA clamp_ bug.
2021-06-07 11:21:59 -04:00
Suraj Patil 185122ef22
fix docs of past_key_values (#12049) 2021-06-07 15:24:03 +05:30
Sylvain Gugger f086652b16
Add option to log only once in multinode training (#11819)
* Add option to long only once in multinode training

* Use an alternate property
2021-05-25 08:03:43 -04:00
Sylvain Gugger 7eee950ac3
Re-styling in seq2seq attention (#11613) 2021-05-06 14:24:19 -04:00
Bhadresh Savani 1d30ec95c7
[Examples] Fixes inconsistency around eval vs val and predict vs test (#11380)
* added changes for uniformity

* modified files

* corrected typo

* fixed qa scripts

* fix typos

* fixed predict typo in qa no trainer

* fixed test file

* reverted trainer changes

* reverted trainer changes in custom exmaples

* updated readme

* added changes in deepspeed test

* added changes for predict and eval
2021-04-26 09:24:31 -07:00
Daniel Stancl 38a716cd41
TF BART models - Add `cross_attentions` to model output and fix cross-attention head masking (#10699)
* Add cross_attn_head_mask to BART

* Fix cross_attentions in TFBart-like models

* This commit enables returning of `cross_attentions`
for TFBart-like models

* It also fixes attention head masking in cross-attenion module

* Update TF model templates

* Fix missing , in TF model templates

* Fix typo: congig -> config
2021-04-26 14:16:21 +02:00
Daniel Stancl e3ff165aa5
Fix cross-attention head mask for Torch encoder-decoder models (#10605)
* Fix cross-attention head mask for Torch BART models

* Fix head masking for cross-attention module for the following
models: BART, Blenderbot, Blenderbot_small, M2M_100, Marian, MBart,
Pegasus

* Enable test_headmasking for M2M_100 model

* Fix cross_head_mask for FSMT, LED and T5

* This commit fixes `head_mask` for cross-attention modules
in the following models: FSMT, LED, T5

* It also contains some smaller changes in doc so that
it is be perfectly clear the shape of `cross_head_mask`
is the same as of `decoder_head_mask`

* Update template

* Fix template for BartForCausalLM

* Fix cross_head_mask for Speech2Text models

* Fix cross_head_mask in templates

* Fix args order in BartForCausalLM template

* Fix doc in BART templates

* Make more explicit naming

* `cross_head_mask` -> `cross_attn_head_mask`

* `cross_layer_head_mask` -> `cross_attn_layer_head_mask`

* Fix doc

* make style quality

* Fix speech2text docstring
2021-04-23 18:58:06 +02:00
Sylvain Gugger 74712e22f3
Honor contributors to models (#11329)
* Honor contributors to models

* Fix typo

* Address review comments

* Add more authors
2021-04-21 09:47:27 -04:00
Sylvain Gugger 45fc8c7951
Make `get_special_tokens_mask` consider all tokens (#11163) 2021-04-09 11:57:44 -04:00
Stas Bekman c9035e4537
fix: The 'warn' method is deprecated (#11105)
* The 'warn' method is deprecated

* fix test
2021-04-07 09:20:06 -04:00
Sylvain Gugger acc3bd9d2a
Enforce string-formatting with f-strings (#10980)
* First third

* Styling and fix mistake

* Quality

* All the rest

* Treat %s and %d

* typo

* Missing )

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-03-31 10:00:27 -04:00
Sylvain Gugger 700229f8a4
Fixes in the templates (#10951)
* Fixes in the templates

* Define in all cases

* Dimensionality -> Dimension

Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2021-03-29 17:36:13 -04:00
Bhadresh Savani 7ef40120a0
[Examples] Added predict stage and Updated Example Template (#10868)
* added predict stage

* added test keyword in exception message

* removed example specific saving predictions

* fixed f-string error

* removed extra line

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
2021-03-23 10:37:59 -07:00
Sylvain Gugger bf1f43fbd7
Update the example template for a no Trainer option (#10865) 2021-03-23 10:02:39 -04:00
Sylvain Gugger 2295d783d5
Copy tokenizer files in each of their repo (#10624)
* Move tokenizer files in each repo

* Fix mBART50 tests

* Fix mBART tests

* Fix Marian tests

* Update templates
2021-03-10 11:26:23 -05:00
Bhadresh Savani ac17f71159
added max_sample args and metrics changes (#10602) 2021-03-09 12:06:56 -05:00
Sylvain Gugger 7da995c00c
Fix embeddings for PyTorch 1.8 (#10549)
* Fix embeddings for PyTorch 1.8

* Try with PyTorch 1.8.0

* Fix embeddings init

* Fix copies

* Typo

* More typos
2021-03-05 16:18:48 -05:00
Patrick von Platen 2d2ed2cc18
[T5] Fix speed degradation bug t5 (#10496)
* fix speed degradation bug t5

* fix for all models

* fix code quality
2021-03-03 12:42:41 +03:00
mingruimingrui 894db6701e
Bugfix: Removal of padding_idx in BartLearnedPositionalEmbedding (#10200)
* Assumption of padding_idx <2 might not stand

* Use offset instead of 2

* Fix with black

* Change behavior to warning instead for backward compatibility.

* Fix with black

* Remove warning

* Make padding_idx non-required

* padding_idx fix for blenderbot

* padding_idx fix for blenderbot_small

* padding_idx fix for led

* padding_idx fix for mbart

* Remove extra whitespaces

* padding_idx fix for template

* Fix padding_idx passed to nn.Embedding mistake

* Fixed padding_idx passed to positional embedding in template

* Remove padding_idx from pytorch learned positional embeddings

* Remove accidentally added quotes

* Remove padding_idx from tf learned positional embeddings

* Remove zeroing of weights in __init__

Co-authored-by: Wang Ming Rui <mingrui.wang@C02CJTUYMD6M.local>
2021-02-25 14:33:13 +03:00
Julien Plu 83d803ba02
Making TF BART-like models XLA and AMP compliant (#10191)
* Update BART

* Update Blenderbot

* Update BlenderbotSmall

* Update Marian

* Update MBart

* Update MBart

* Update Pegasus

* Update template

* Fix Marian and Pegasus

* Apply style

* Default initializer

* Default initializer

* Default initializer

* Remove int32 casts

* Fix template

* Remove more cast
2021-02-17 17:48:56 +01:00
Julien Plu 31b0560ab4
Add AMP for Albert (#10141) 2021-02-15 17:18:33 +01:00
Julien Plu 570218878a
Fix TF template (#10189)
* Fix template

* Update Seq2Seq tests
2021-02-15 09:21:57 -05:00
Patrick von Platen 8e13b73593
Update README.md 2021-02-11 18:35:27 +03:00
Patrick von Platen d6b4f48ecb
Update ADD_BIG_BIRD.md 2021-02-11 18:34:17 +03:00
Patrick von Platen 4cda2d73ef
Update ADD_BIG_BIRD.md 2021-02-09 19:58:35 +03:00
Julien Plu b82fe7d258
Replace strided slice with tf.expand_dims (#10078)
* Replace tf.newaxis -> tf.expand_dims

* Fix tests

* Fix tests

* Use reshape when a tensors needs a double expand

* Fix GPT2

* Fix GPT2
2021-02-09 11:48:28 -05:00
Lysandre Debut c9df1b1d53
Model templates (#10072) 2021-02-08 09:07:02 -05:00
Julien Plu cdd8659231
Fix TF template (#10069)
* Fix template

* Fix template
2021-02-08 08:10:50 -05:00
Julien Plu 31563e056d
Restore TF embeddings and attention layers to their previous version (#9890)
* Refacto BERT

* Restore all the concerned models

* Remove print

* Update template

* Apply Sylvain's and Morgan's comments

* Fix cast

* Put the cast inside call

* Remove cond in ebds

* Fix funnel

* Restore previous dot product (attention_scores) computation

* Add ConvBERT and BART

* Make all the S2S models ONNX compliant

* Fix test

* Fix check copies
2021-02-08 14:36:30 +03:00
Lysandre Debut ae37ceacbd
Fix typo (#10064) 2021-02-08 06:02:05 -05:00
Stas Bekman 8ea412a86f
[examples] make run scripts executable (#10037)
* make executable

* make executable

* same for the template

* cleanup
2021-02-05 15:51:18 -08:00
Patrick von Platen 89be094e29
[Templates] Add template "call-for-model" markdown and "call-for-big-bird" markdown (#9921)
* add big bird

* change teacher to mentor

* add proposal template

* adapt template

* delete old template

* correct some links

* finish template

* create big bird from template

* add big bird

* improve boxes

* finish boxes

* add pointers for BigBird

* finish big bird

* up

* up

* up

* up

* apply lysandres and sylvains suggestions

* delete bogus file

* correct markdown

* try different style

* try different style

* finalize
2021-02-05 15:47:54 +03:00
Lysandre Debut e89c959af9
Fix model templates (#9999) 2021-02-04 07:47:26 -05:00
demSd 00031785a8
BartForCausalLM analogs to `ProphetNetForCausalLM` (#9128)
* initiliaze bart4causalLM

* create BartDecoderWrapper, setters/getters

* delete spaces

* forward and additional methods

* update cache function, loss function, remove ngram* params in data class.

* add bartcausallm, bartdecoder testing

* correct bart for causal lm

* remove at

* add mbart as well

* up

* fix typo

* up

* correct

* add pegasusforcausallm

* add blenderbotforcausallm

* add blenderbotsmallforcausallm

* add marianforcausallm

* add test for MarianForCausalLM

* add Pegasus test

* add BlenderbotSmall test

* add blenderbot test

* fix a fail

* fix an import fail

* a fix

* fix

* Update modeling_pegasus.py

* fix models

* fix inputs_embeds setting getter

* adapt tests

* correct repo utils check

* finish test improvement

* fix tf models as well

* make style

* make fix-copies

* fix copies

* run all tests

* last changes

* fix all tests

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2021-02-04 11:56:12 +03:00
Patrick von Platen 0f4dc5d864
fix typo in naming (#9944) 2021-02-02 12:22:42 +03:00
Patrick von Platen 538b3b4607
[Tokenizer Utils Base] Make pad function more flexible (#9928)
* change tokenizer requirement

* split line

* Correct typo from list to str

* improve style

* make other function pretty as well

* add comment

* correct typo

* add new test

* pass tests for tok without padding token

* Apply suggestions from code review
2021-02-02 10:35:27 +03:00
Patrick von Platen 0e3be1ac8f
Add new model docs (#9667)
* add new model logic

* fix docs

* change structure

* improve add_new_model

* push new changes

* up

* up

* correct spelling

* improve docstring

* correct line length

* update readme

* correct links

* correct typos

* only add rst file for now

* Apply suggestions from code review 1

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Bram Vanroy <Bram.Vanroy@UGent.be>

* Apply suggestions from code review

Co-authored-by: Bram Vanroy <Bram.Vanroy@UGent.be>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Stefan Schweter <stefan@schweter.it>
Co-authored-by: Bram Vanroy <Bram.Vanroy@UGent.be>

* Apply suggestions from code review

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Pierric Cistac <Pierrci@users.noreply.github.com>

* finish adding all suggestions

* make style

* apply Niels feedback

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* apply sylvains suggestions

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Bram Vanroy <Bram.Vanroy@UGent.be>
Co-authored-by: Stefan Schweter <stefan@schweter.it>
Co-authored-by: Pierric Cistac <Pierrci@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-02-01 17:55:10 +03:00
Sylvain Gugger d85691ac75
Doc title in the template (#9910) 2021-02-01 03:05:31 -05:00
Sylvain Gugger b4e559cfa1
Deprecate model_path in Trainer.train (#9854) 2021-01-28 08:32:46 -05:00
Funtowicz Morgan 2ee9f9b69e
Fix computation of attention_probs when head_mask is provided. (#9853)
* Fix computation of attention_probs when head_mask is provided.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Apply changes to the template

Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2021-01-28 06:11:52 -05:00
Lysandre Debut 763ece2fea
Fix model templates (#9842) 2021-01-27 08:20:58 -05:00
Julien Plu bd701ab1a0
Fix template (#9840) 2021-01-27 07:40:30 -05:00
Julien Plu 4adbdce5ee
Clean TF Bert (#9788)
* Start cleaning BERT

* Clean BERT and all those depends of it

* Fix attribute name

* Apply style

* Apply Sylvain's comments

* Apply Lysandre's comments

* remove unused import
2021-01-27 11:28:11 +01:00
Julien Plu a1720694a5
Remove a TF usage warning and rework the documentation (#9756)
* Rework documentation

* Update the template

* Trigger CI

* Restore the warning but with the TF logger

* Update convbert doc
2021-01-27 10:45:42 +01:00
Sylvain Gugger f2fabedbab
Setup logging with a stdout handler (#9816) 2021-01-27 03:39:11 -05:00
Lysandre 897a24c869 Fix head_mask for model templates 2021-01-26 11:02:48 +01:00
Andrea Cappelli 10e5f28212
Improve pytorch examples for fp16 (#9796)
* Pad to 8x for fp16 multiple choice example (#9752)

* Pad to 8x for fp16 squad trainer example (#9752)

* Pad to 8x for fp16 ner example (#9752)

* Pad to 8x for fp16 swag example (#9752)

* Pad to 8x for fp16 qa beam search example (#9752)

* Pad to 8x for fp16 qa example (#9752)

* Pad to 8x for fp16 seq2seq example (#9752)

* Pad to 8x for fp16 glue example (#9752)

* Pad to 8x for fp16 new ner example (#9752)

* update script template #9752

* Update examples/multiple-choice/run_swag.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update examples/question-answering/run_qa.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update examples/question-answering/run_qa_beam_search.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* improve code quality #9752

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-01-26 04:47:07 -05:00
Sylvain Gugger caf4abf768
Auto-resume training from checkpoint (#9776)
* Auto-resume training from checkpoint

* Update examples/text-classification/run_glue.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Roll out to other examples

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-01-25 12:03:51 -05:00
Julien Plu a7dabfb3d1
Fix TF s2s models (#9478)
* Fix Seq2Seq models for serving

* Apply style

* Fix lonfgormer

* Fix mBart/Pegasus/Blenderbot

* Apply style

* Add a main intermediate layer

* Apply style

* Remove import

* Apply tf.function to Longformer

* Fix utils check_copy

* Update S2S template

* Fix BART + Blenderbot

* Fix BlenderbotSmall

* Fix BlenderbotSmall

* Fix BlenderbotSmall

* Fix MBart

* Fix Marian

* Fix Pegasus + template

* Apply style

* Fix common attributes test

* Forgot to fix the LED test

* Apply Patrick's comment on LED Decoder
2021-01-21 17:03:29 +01:00
Julien Plu 3f290e6c84
Fix mixed precision in TF models (#9163)
* Fix Gelu precision

* Fix gelu_fast

* Naming

* Fix usage and apply style

* add TF gelu approximate version

* add TF gelu approximate version

* add TF gelu approximate version

* Apply style

* Fix albert

* Remove the usage of the Activation layer
2021-01-21 07:00:11 -05:00
Julien Plu 7251a4736d
Fix template (#9697) 2021-01-20 09:04:53 -05:00
Julien Plu 14042d560f
New TF embeddings (cleaner and faster) (#9418)
* Create new embeddings + add to BERT

* Add Albert

* Add DistilBert

* Add Albert + Electra + Funnel

* Add Longformer + Lxmert

* Add last models

* Apply style

* Update the template

* Remove unused imports

* Rename attribute

* Import embeddings in their own model file

* Replace word_embeddings per weight

* fix naming

* Fix Albert

* Fix Albert

* Fix Longformer

* Fix Lxmert Mobilebert and MPNet

* Fix copy

* Fix template

* Update the get weights function

* Update src/transformers/modeling_tf_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/electra/modeling_tf_electra.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* address Sylvain's comments

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-01-20 12:08:12 +01:00
Sylvain Gugger 7e662e6a3b
Fix model templates and use less than 119 chars (#9684)
* Fix model templates and use less than 119 chars

* Missing new line
2021-01-19 17:11:22 -05:00
Yusuke Mori b020a736c3
Update `past_key_values` in GPT-2 (#9596)
* Update past_key_values in gpt2 (#9391)

* Update generation_utils, and rename some items

* Update modeling_gpt2 to avoid an error in gradient_checkpointing

* Remove 'reorder_cache' from util and add variations to XLNet, TransfoXL, GPT-2

* Change the location of '_reorder_cache' in modeling files

* Add '_reorder_cache' in modeling_ctrl

* Fix a bug of my last commit in CTRL

* Add '_reorder_cache' to GPT2DoubleHeadsModel

* Manage 'use_cache' in config of test_modeling_gpt2

* Clean up the doc string

* Update src/transformers/models/gpt2/modeling_gpt2.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Fix the doc string (GPT-2, CTRL)

* improve gradient_checkpointing_behavior

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2021-01-19 16:00:15 +01:00
Patrick von Platen 7f28613213
[TFBart] Split TF-Bart (#9497)
* make templates ready

* make add_new_model_command_ready

* finish tf bart

* prepare tf mbart

* finish tf bart

* add tf mbart

* add marian

* prep pegasus

* add tf pegasus

* push blenderbot tf

* add blenderbot

* add blenderbot small

* clean-up

* make fix copy

* define blend bot tok

* fix

* up

* make style

* add to docs

* add copy statements

* overwrite changes

* improve

* fix docs

* finish

* fix last slow test

* fix missing git conflict line

* fix blenderbot

* up

* fix blenderbot small

* load changes

* finish copied from

* upload fix
2021-01-12 02:06:32 +01:00
Julien Plu 1e3c362235
Fix template (#9512) 2021-01-11 08:03:28 -05:00
Julien Plu 1243ee7d0c
Full rework of the TF input/output embeddings and bias resizing (#9193)
* Start rework resizing

* Rework bias/decoder resizing

* Full resizing rework

* Full resizing rework

* Start to update the models with the new approach

* Finish to update the models

* Update all the tests

* Update the template

* Fix tests

* Fix tests

* Test a new approach

* Refactoring

* Refactoring

* Refactoring

* New rework

* Rework BART

* Rework bert+blenderbot

* Rework CTRL

* Rework Distilbert

* Rework DPR

* Rework Electra

* Rework Flaubert

* Rework Funnel

* Rework GPT2

* Rework Longformer

* Rework Lxmert

* Rework marian+mbart

* Rework mobilebert

* Rework mpnet

* Rework openai

* Rework pegasus

* Rework Roberta

* Rework T5

* Rework xlm+xlnet

* Rework template

* Fix TFT5EncoderOnly + DPRs

* Restore previous methods

* Fix Funnel

* Fix CTRL and TransforXL

* Apply style

* Apply Sylvain's comments

* Restore a test in DPR

* Address the comments

* Fix bug

* Apply style

* remove unused import

* Fix test

* Forgot a method

* missing test

* Trigger CI

* naming update

* Rebase

* Trigger CI
2021-01-11 06:27:28 -05:00
Julien Plu cf416764f4
Fix template (#9504) 2021-01-11 05:21:25 -05:00
Julien Plu 4f7022d68d
Reformat (#9482) 2021-01-10 15:10:15 +01:00
Sylvain Gugger 1bdf42409c
Fast imports part 3 (#9474)
* New intermediate inits

* Update template

* Avoid importing torch/tf/flax in tokenization unless necessary

* Styling

* Shutup flake8

* Better python version check
2021-01-08 07:40:59 -05:00
Sylvain Gugger 758ed3332b
Transformers fast import part 2 (#9446)
* Main init work

* Add version

* Change from absolute to relative imports

* Fix imports

* One more typo

* More typos

* Styling

* Make quality script pass

* Add necessary replace in template

* Fix typos

* Spaces are ignored in replace for some reason

* Forgot one models.

* Fixes for import

Co-authored-by: LysandreJik <lysandre.debut@reseau.eseo.fr>

* Add documentation

* Styling

Co-authored-by: LysandreJik <lysandre.debut@reseau.eseo.fr>
2021-01-07 09:36:14 -05:00
Julien Plu 812045adcc
New serving (#9419)
* Add a serving method

* Add albert

* Add serving for BERT and BART

* Add more models

* Finish the serving addition

* Temp fix

* Restore DPR

* Fix funnel attribute

* Fix attributes GPT2

* Fix OpenAIGPT attribute

* Fix T5 attributes

* Fix Bart attributes

* Fix TransfoXL attributes

* Add versioning

* better test

* Update template

* Fix Flaubert

* Fix T5

* Apply style

* Remove unused imports

* Deactivate extra parameters

* Remove too long test + saved_model default to False

* Ignore the saved model test for some models

* Fix some inputs

* Fix mpnet serving

* Trigger CI

* Address all comments
2021-01-07 11:48:49 +01:00
Patrick von Platen b8462b5b2a
[GenerationOutputs] Fix GenerationOutputs Tests (#9443)
* fix generation models

* fix led

* fix docs

* add is_decoder

* fix last docstrings

* make style

* fix t5 cross attentions

* correct t5
2021-01-06 19:37:02 +01:00
Sylvain Gugger 453a70d4cb
Allow example to use a revision and work with private models (#9407)
* Allow example to use a revision and work with private models

* Copy to other examples and template

* Styling
2021-01-06 06:49:23 -05:00
Patrick von Platen eef66035a2
[PyTorch Bart] Split Bart into different models (#9343)
* first try

* remove old template

* finish bart

* finish mbart

* delete unnecessary line

* init pegasus

* save intermediate

* correct pegasus

* finish pegasus

* remove cookie cutter leftover

* add marian

* finish blenderbot

* replace in file

* correctly split blenderbot

* delete "old" folder

* correct "add statement"

* adapt config for tf comp

* correct configs for tf

* remove ipdb

* fix more stuff

* fix mbart

* push pegasus fix

* fix mbart

* more fixes

* fix research projects code

* finish docs for bart, mbart, and marian

* delete unnecessary file

* correct attn typo

* correct configs

* remove pegasus for seq class

* correct peg docs

* correct peg docs

* finish configs

* further improve docs

* add copied from statements to mbart

* fix copied from in mbart

* add copy statements to marian

* add copied from to marian

* add pegasus copied from

* finish pegasus

* finish copied from

* Apply suggestions from code review

* make style

* backward comp blenderbot

* apply lysandres and sylvains suggestions

* apply suggestions

* push last fixes

* fix docs

* fix tok tests

* fix imports code style

* fix doc
2021-01-05 22:00:05 +01:00
Patrick von Platen 912f6881d2
add import math (#9346) 2020-12-29 19:35:06 +01:00
Patrick von Platen 785e52cd30
improve templates (#9342) 2020-12-29 16:48:44 +01:00
Patrick von Platen 83fdd252f6
[Seq2Seq Templates] Correct some TF-serving errors and add gradient checkpointing to PT by default. (#9334)
* correct tests

* correct shape and get_tf_activation

* more correction tf

* add gradient checkpointing to templates

* correct typo
2020-12-28 17:51:04 +01:00
Patrick von Platen 6c091abef2
[Templates] Adapt Bert (#9284)
* adapt templates

* adapt config

* add test as well

* fix output type

* fix cache false naming

* finish tests

* last fix
2020-12-24 01:44:33 +01:00
Patrick von Platen d5db6c37d4
[Seq2Seq Templates] Fix check_repo.py templates file (#9277)
* add enc dec pt model to check repo

* fix indent
2020-12-23 11:40:20 +01:00
Patrick von Platen cbe63949d7
Model Templates for Seq2Seq (#9251)
* adapt cookie cutter

* fix copy past statement

* delete copy statements for now

* remove unused import from template

* make doc rst

* correct config docstring

* correct training

* correct inputs processing tf enc dec

* make style

* adapt templates

* clean tabs

* correct tensor -> Tensor naming

* correct indent

* correct templates

* fix the test

* break lines to avoid > 119

* Apply suggestions from code review
2020-12-22 23:41:20 +01:00
Sylvain Gugger ab17758874
Add speed metrics to all example scripts + template (#9260) 2020-12-22 14:02:26 -05:00
Julien Plu 161a6461db
Fix TF template (#9234) 2020-12-21 13:52:16 +01:00
Julien Plu 5a8a4eb187
Improve BERT-like models performance with better self attention (#9124)
* Improve BERT-like models attention layers

* Apply style

* Put back error raising instead of assert

* Update template

* Fix copies

* Apply raising valueerror in MPNet

* Restore the copy check for the Intermediate layer in Longformer

* Update longformer
2020-12-21 13:10:15 +01:00
Lysandre Debut 6587cf9f84
Patch *ForCausalLM model (#9092) 2020-12-14 00:39:55 -05:00
Julien Plu 51d9c569fa
Fix embeddings resizing in TF models (#8657)
* Resize the biases in same time than the embeddings

* Trigger CI

* Biases are not reset anymore

* Remove get_output_embeddings + better LM model detection in generation utils

* Apply style

* First test on BERT

* Update docstring + new name

* Apply the new resizing logic to all the models

* fix tests

* Apply style

* Update the template

* Fix naming

* Fix naming

* Apply style

* Apply style

* Remove unused import

* Revert get_output_embeddings

* Trigger CI

* Update num parameters

* Restore get_output_embeddings in TFPretrainedModel and add comments

* Style

* Add decoder resizing

* Style

* Fix tests

* Separate bias and decoder resize

* Fix tests

* Fix tests

* Apply style

* Add bias resizing in MPNet

* Trigger CI

* Apply style
2020-12-13 23:05:24 -05:00
Lysandre Debut 67ff1c314a
Templates overhaul 1 (#8993) 2020-12-08 18:00:07 -05:00
Sylvain Gugger 00aa9dbca2
Copyright (#8970)
* Add copyright everywhere missing

* Style
2020-12-07 18:36:34 -05:00
Julien Plu dcd3046f98
Better booleans handling in the TF models (#8777)
* Apply on BERT and ALBERT

* Update TF Bart

* Add input processing to TF BART

* Add input processing for TF CTRL

* Add input processing to TF Distilbert

* Add input processing to TF DPR

* Add input processing to TF Electra

* Add deprecated arguments

* Add input processing to TF XLM

* Add input processing to TF Funnel

* Add input processing to TF GPT2

* Add input processing to TF Longformer

* Add input processing to TF Lxmert

* Apply style

* Add input processing to TF Mobilebert

* Add input processing to TF GPT

* Add input processing to TF Roberta

* Add input processing to TF T5

* Add input processing to TF TransfoXL

* Apply style

* Rebase on master

* Bug fix

* Retry to bugfix

* Retry bug fix

* Fix wrong model name

* Try another fix

* Fix BART

* Fix input precessing

* Apply style

* Put the deprecated warnings in the input processing function

* Remove the unused imports

* Raise an error when len(kwargs)>0

* test ModelOutput instead of TFBaseModelOutput

* Bug fix

* Address Patrick's comments

* Address Patrick's comments

* Address Sylvain's comments

* Add boolean processing for the inputs

* Apply style

* Missing optional

* Fix missing some input proc

* Update the template

* Fix missing inputs

* Missing input

* Fix args parameter

* Trigger CI

* Trigger CI

* Trigger CI

* Address Patrick's and Sylvain's comments

* Replace warn by warning

* Trigger CI

* Fix XLNET

* Fix detection
2020-12-04 09:08:29 -05:00
Patrick von Platen 443f67e887
[PyTorch] Refactor Resize Token Embeddings (#8880)
* fix resize tokens

* correct mobile_bert

* move embedding fix into modeling_utils.py

* refactor

* fix lm head resize

* refactor

* break lines to make sylvain happy

* add news tests

* fix typo

* improve test

* skip bart-like for now

* check if base_model = get(...) is necessary

* clean files

* improve test

* fix tests

* revert style templates

* Update templates/adding_a_new_model/cookiecutter-template-{{cookiecutter.modelname}}/modeling_{{cookiecutter.lowercase_modelname}}.py
2020-12-02 19:19:50 +01:00
Julien Plu 29d4992453
New TF model inputs (#8602)
* Apply on BERT and ALBERT

* Update TF Bart

* Add input processing to TF BART

* Add input processing for TF CTRL

* Add input processing to TF Distilbert

* Add input processing to TF DPR

* Add input processing to TF Electra

* Add input processing for TF Flaubert

* Add deprecated arguments

* Add input processing to TF XLM

* remove unused imports

* Add input processing to TF Funnel

* Add input processing to TF GPT2

* Add input processing to TF Longformer

* Add input processing to TF Lxmert

* Apply style

* Add input processing to TF Mobilebert

* Add input processing to TF GPT

* Add input processing to TF Roberta

* Add input processing to TF T5

* Add input processing to TF TransfoXL

* Apply style

* Rebase on master

* Bug fix

* Retry to bugfix

* Retry bug fix

* Fix wrong model name

* Try another fix

* Fix BART

* Fix input precessing

* Apply style

* Put the deprecated warnings in the input processing function

* Remove the unused imports

* Raise an error when len(kwargs)>0

* test ModelOutput instead of TFBaseModelOutput

* Bug fix

* Address Patrick's comments

* Address Patrick's comments

* Address Sylvain's comments

* Add the new inputs in new Longformer models

* Update the template with the new input processing

* Remove useless assert

* Apply style

* Trigger CI
2020-11-24 13:55:00 -05:00
Stas Bekman e84786aaa6
consistent ignore keys + make private (#8737)
* consistent ignore keys + make private

* style

* - authorized_missing_keys    => _keys_to_ignore_on_load_missing
  - authorized_unexpected_keys => _keys_to_ignore_on_load_unexpected

* move public doc of private attributes to private comment
2020-11-23 12:33:13 -08:00
Sylvain Gugger a0c62d2493
Fix training from scratch in new scripts (#8623) 2020-11-18 12:15:26 -05:00
Sylvain Gugger dd52804f5f
Remove deprecated (#8604)
* Remove old deprecated arguments

Co-authored-by: LysandreJik <lysandre.debut@reseau.eseo.fr>

* Remove needless imports

* Fix tests

Co-authored-by: LysandreJik <lysandre.debut@reseau.eseo.fr>
2020-11-17 15:11:29 -05:00
Sylvain Gugger 36a19915ea
Fix model templates (#8595)
* First fixes

* Fix imports and add init

* Fix typo

* Move init to final dest

* Fix tokenization import

* More fixes

* Styling
2020-11-17 10:35:38 -05:00
Julien Chaumond 042a6aa777
Tokenizers: ability to load from model subfolder (#8586)
* <small>tiny typo</small>

* Tokenizers: ability to load from model subfolder

* use subfolder for local files as well

* Uniformize model shortcut name => model id

* from s3 => from huggingface.co

Co-authored-by: Quentin Lhoest <lhoest.q@gmail.com>
2020-11-17 08:58:45 -05:00
Sylvain Gugger c89bdfbe72
Reorganize repo (#8580)
* Put models in subfolders

* Styling

* Fix imports in tests

* More fixes in test imports

* Sneaky hidden imports

* Fix imports in doc files

* More sneaky imports

* Finish fixing tests

* Fix examples

* Fix path for copies

* More fixes for examples

* Fix dummy files

* More fixes for example

* More model import fixes

* Is this why you're unhappy GitHub?

* Fix imports in conver command
2020-11-16 21:43:42 -05:00
Sylvain Gugger 1073a2bde5
Switch `return_dict` to `True` by default. (#8530)
* Use the CI to identify failing tests

* Remove from all examples and tests

* More default switch

* Fixes

* More test fixes

* More fixes

* Last fixes hopefully

* Use the CI to identify failing tests

* Remove from all examples and tests

* More default switch

* Fixes

* More test fixes

* More fixes

* Last fixes hopefully

* Run on the real suite

* Fix slow tests
2020-11-16 11:43:00 -05:00
Lysandre Debut 826f04576f
Model templates encoder only (#8509)
* Model templates

* TensorFlow

* Remove pooler

* CI

* Tokenizer + Refactoring

* Encoder-Decoder

* Let's go testing

* Encoder-Decoder in TF

* Let's go testing in TF

* Documentation

* README

* Fixes

* Better names

* Style

* Update docs

* Choose to skip either TF or PT

* Code quality fixes

* Add to testing suite

* Update file path

* Cookiecutter path

* Update `transformers` path

* Handle rebasing

* Remove seq2seq from model templates

* Remove s2s config

* Apply Sylvain and Patrick comments

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Last fixes from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-11-13 11:59:30 -05:00
Sylvain Gugger cf89724696
Fix validation file loading in scripts (#8298) 2020-11-04 10:42:18 -05:00
Sylvain Gugger cdc48ce92d
Finalize lm examples (#8188)
* Finish the cleanup of the language-modeling examples

* Update main README

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Apply suggestions from code review

Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>

* Propagate changes

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
2020-10-30 14:20:18 -04:00
Sylvain Gugger 691176283d
Add a template for examples and apply it for mlm and plm examples (#8153)
* Add a template for example scripts and apply it to mlm

* Formatting

* Fix test

* Add plm script

* Add a template for example scripts and apply it to mlm

* Formatting

* Fix test

* Add plm script

* Add a template for example scripts and apply it to mlm

* Formatting

* Fix test

* Add plm script

* Styling
2020-10-29 13:38:11 -04:00
Santiago Castro 969859d5f6
Fix doc errors and typos across the board (#8139)
* Fix doc errors and typos across the board

* Fix a typo

* Fix the CI

* Fix more typos

* Fix CI

* More fixes

* Fix CI

* More fixes

* More fixes
2020-10-29 10:33:33 -04:00
Sylvain Gugger 378142afdf
Rename add_start_docstrings_to_callable (#8120) 2020-10-28 13:42:31 -04:00
Stas Bekman 3e31e7f956
[testing] rename skip targets + docs (#7863)
* rename skip targets + docs

* fix quotes

* style

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* small improvements

* fix

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-10-20 04:39:13 -04:00
Thomas Wolf ba8c4d0ac0
[Dependencies|tokenizers] Make both SentencePiece and Tokenizers optional dependencies (#7659)
* splitting fast and slow tokenizers [WIP]

* [WIP] splitting sentencepiece and tokenizers dependencies

* update dummy objects

* add name_or_path to models and tokenizers

* prefix added to file names

* prefix

* styling + quality

* spliting all the tokenizer files - sorting sentencepiece based ones

* update tokenizer version up to 0.9.0

* remove hard dependency on sentencepiece 🎉

* and removed hard dependency on tokenizers 🎉

* update conversion script

* update missing models

* fixing tests

* move test_tokenization_fast to main tokenization tests - fix bugs

* bump up tokenizers

* fix bert_generation

* update ad fix several tokenizers

* keep sentencepiece in deps for now

* fix funnel and deberta tests

* fix fsmt

* fix marian tests

* fix layoutlm

* fix squeezebert and gpt2

* fix T5 tokenization

* fix xlnet tests

* style

* fix mbart

* bump up tokenizers to 0.9.2

* fix model tests

* fix tf models

* fix seq2seq examples

* fix tests without sentencepiece

* fix slow => fast  conversion without sentencepiece

* update auto and bert generation tests

* fix mbart tests

* fix auto and common test without tokenizers

* fix tests without tokenizers

* clean up tests lighten up when tokenizers + sentencepiece are both off

* style quality and tests fixing

* add sentencepiece to doc/examples reqs

* leave sentencepiece on for now

* style quality split hebert and fix pegasus

* WIP Herbert fast

* add sample_text_no_unicode and fix hebert tokenization

* skip FSMT example test for now

* fix style

* fix fsmt in example tests

* update following Lysandre and Sylvain's comments

* Update src/transformers/testing_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/testing_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/tokenization_utils_base.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/tokenization_utils_base.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-10-18 20:51:24 +02:00
Sylvain Gugger 13c1857718
Fix typo in all model docs (#7714) 2020-10-12 04:06:59 -04:00
Sylvain Gugger b2b7fc7814
Check and update model list in index.rst automatically (#7527)
* Check and update model list in index.rst automatically

* Check and update model list in index.rst automatically

* Adapt template
2020-10-05 09:40:45 -04:00
Forrest Iandola 02ef825be2
SqueezeBERT architecture (#7083)
* configuration_squeezebert.py

thin wrapper around bert tokenizer

fix typos

wip sb model code

wip modeling_squeezebert.py. Next step is to get the multi-layer-output interface working

set up squeezebert to use BertModelOutput when returning results.

squeezebert documentation

formatting

allow head mask that is an array of [None, ..., None]

docs

docs cont'd

path to vocab

docs and pointers to cloud files (WIP)

line length and indentation

squeezebert model cards

formatting of model cards

untrack modeling_squeezebert_scratchpad.py

update aws paths to vocab and config files

get rid of stub of NSP code, and advise users to pretrain with mlm only

fix rebase issues

redo rebase of modeling_auto.py

fix issues with code formatting

more code format auto-fixes

move squeezebert before bert in tokenization_auto.py and modeling_auto.py because squeezebert inherits from bert

tests for squeezebert modeling and tokenization

fix typo

move squeezebert before bert in modeling_auto.py to fix inheritance problem

disable test_head_masking, since squeezebert doesn't yet implement head masking

fix issues exposed by the test_modeling_squeezebert.py

fix an issue exposed by test_tokenization_squeezebert.py

fix issue exposed by test_modeling_squeezebert.py

auto generated code style improvement

issue that we inherited from modeling_xxx.py: SqueezeBertForMaskedLM.forward() calls self.cls(), but there is no self.cls, and I think the goal was actually to call self.lm_head()

update copyright

resolve failing 'test_hidden_states_output' and remove unused encoder_hidden_states and encoder_attention_mask

docs

add integration test. rename squeezebert-mnli --> squeezebert/squeezebert-mnli

autogenerated formatting tweaks

integrate feedback from patrickvonplaten and sgugger to programming style and documentation strings

* tiny change to order of imports
2020-10-05 04:25:43 -04:00
Sylvain Gugger 0ccb6f5c6d
Clean RAG docs and template docs (#7348)
* Clean RAG docs and template docs

* Fix typo

* Better doc
2020-09-24 09:24:41 -04:00
Stas Bekman b0cbcdb05b
[logging] remove no longer needed verbosity override (#7100) 2020-09-15 04:01:14 -04:00