* improve names and tests longformer
* more and better tests for longformer
* add first tf test
* finalize tf basic op functions
* fix merge
* tf shape test passes
* narrow down discrepancies
* make longformer local attn tf work
* correct tf longformer
* add first global attn function
* add more global longformer func
* advance tf longformer
* finish global attn
* upload big model
* finish all tests
* correct false any statement
* fix common tests
* make all tests pass except keras save load
* fix some tests
* fix torch test import
* finish tests
* fix test
* fix torch tf tests
* add docs
* finish docs
* Update src/transformers/modeling_longformer.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Update src/transformers/modeling_tf_longformer.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* apply Lysandres suggestions
* reverse to assert statement because function will fail otherwise
* applying sylvains recommendations
* Update src/transformers/modeling_longformer.py
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
* Update src/transformers/modeling_tf_longformer.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
* improve unit tests
this is a sample of one test according to the request in https://github.com/huggingface/transformers/issues/5973
before I apply it to the rest
* batch 1
* batch 2
* batch 3
* batch 4
* batch 5
* style
* non-tf template
* last deletion of check_loss_output
* DOC: Replace instances of ``config.output_attentions`` with function argument ``output_attentions``
* DOC: Apply Black Formatting
* Fix errors where output_attentions was undefined
* Remove output_attentions in classes per review
* Fix regressions on tests having `output_attention`
* Fix further regressions in tests relating to `output_attentions`
Ensure proper propagation of `output_attentions` as a function parameter
to all model subclasses
* Fix more regressions in `test_output_attentions`
* Fix issues with BertEncoder
* Rename related variables to `output_attentions`
* fix pytorch tests
* fix bert and gpt2 tf
* Fix most TF tests for `test_output_attentions`
* Fix linter errors and more TF tests
* fix conflicts
* DOC: Apply Black Formatting
* Fix errors where output_attentions was undefined
* Remove output_attentions in classes per review
* Fix regressions on tests having `output_attention`
* fix conflicts
* fix conflicts
* fix conflicts
* fix conflicts
* fix pytorch tests
* fix conflicts
* fix conflicts
* Fix linter errors and more TF tests
* fix tf tests
* make style
* fix isort
* improve output_attentions
* improve tensorflow
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* better api
* improve automatic setting of global attention mask
* fix longformer bug
* fix global attention mask in test
* fix global attn mask flatten
* fix slow tests
* update docstring
* update docs and make more robust
* improve attention mask
* add multiple choice for longformer
* add models to docs
* adapt docstring
* add test to longformer
* add longformer for mc in init and modeling auto
* fix tests
* added LongformerForQuestionAnswering
* add LongformerForQuestionAnswering
* fix import for LongformerForMaskedLM
* add LongformerForQuestionAnswering
* hardcoded sep_token_id
* compute attention_mask if not provided
* combine global_attention_mask with attention_mask when provided
* update example in docstring
* add assert error messages, better attention combine
* add test for longformerForQuestionAnswering
* typo
* cast gloabl_attention_mask to long
* make style
* Update src/transformers/configuration_longformer.py
* Update src/transformers/configuration_longformer.py
* fix the code quality
* Merge branch 'longformer-for-question-answering' of https://github.com/patil-suraj/transformers into longformer-for-question-answering
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* first commit
* bug fixes
* better examples
* undo padding
* remove wrong VOCAB_FILES_NAMES
* License
* make style
* make isort happy
* unit tests
* integration test
* make `black` happy by undoing `isort` changes!!
* lint
* no need for the padding value
* batch_size not bsz
* remove unused type casting
* seqlen not seq_len
* staticmethod
* `bert` selfattention instead of `n2`
* uint8 instead of bool + lints
* pad inputs_embeds using embeddings not a constant
* black
* unit test with padding
* fix unit tests
* remove redundant unit test
* upload model weights
* resolve todo
* simpler _mask_invalid_locations without lru_cache + backward compatible masked_fill_
* increase unittest coverage