* Only access loss tensor every logging_steps
* tensor.item() was being called every step. This must not be done
for XLA:TPU tensors as it's terrible for performance causing TPU<>CPU
communication at each step. On RoBERTa MLM for example, it reduces step
time by 30%, should be larger for smaller step time models/tasks.
* Train batch size was not correct in case a user uses the
`per_gpu_train_batch_size` flag
* Avg reduce loss accross eval shards
* Fix style (#6803)
* t5 model should make decoder_attention_mask (#6800)
* [s2s] Test hub configs in self-scheduled CI (#6809)
* [s2s] round runtime in run_eval (#6798)
* Pegasus finetune script: add --adafactor (#6811)
* [bart] rename self-attention -> attention (#6708)
* [tests] fix typos in inputs (#6818)
* Fixed open in colab link (#6825)
* Add model card for singbert lite. Update widget for singbert and singbert-large. (#6827)
* BR_BERTo model card (#6793)
* clearly indicate shuffle=False (#6312)
* Clarify shuffle
* clarify shuffle
Co-authored-by: Kevin Canwen Xu <canwenxu@126.com>
* [s2s README] Add more dataset download instructions (#6737)
* Style
* Patch logging issue
* Set default logging level to `WARNING` instead of `INFO`
* TF Flaubert w/ pre-norm (#6841)
* Dataset and DataCollator for BERT Next Sentence Prediction (NSP) task (#6644)
* add datacollator and dataset for next sentence prediction task
* bug fix (numbers of special tokens & truncate sequences)
* bug fix (+ dict inputs support for data collator)
* add padding for nsp data collator; renamed cached files to avoid conflict.
* add test for nsp data collator
* Style
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
* Fix in Adafactor docstrings (#6845)
* Fix resuming training for Windows (#6847)
* Only access loss tensor every logging_steps
* tensor.item() was being called every step. This must not be done
for XLA:TPU tensors as it's terrible for performance causing TPU<>CPU
communication at each step. On RoBERTa MLM for example, it reduces step
time by 30%, should be larger for smaller step time models/tasks.
* Train batch size was not correct in case a user uses the
`per_gpu_train_batch_size` flag
* Avg reduce loss accross eval shards
* comments
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Thomas Ashish Cherian <6967017+PandaWhoCodes@users.noreply.github.com>
Co-authored-by: Zane Lim <zyuanlim@gmail.com>
Co-authored-by: Rodolfo De Nadai <rdenadai@gmail.com>
Co-authored-by: xujiaze13 <37360975+xujiaze13@users.noreply.github.com>
Co-authored-by: Kevin Canwen Xu <canwenxu@126.com>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Huang Lianzhe <hlz@pku.edu.cn>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* add datacollator and dataset for next sentence prediction task
* bug fix (numbers of special tokens & truncate sequences)
* bug fix (+ dict inputs support for data collator)
* add padding for nsp data collator; renamed cached files to avoid conflict.
* add test for nsp data collator
* Style
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
* Improved tokenization with sacremoses
* The TransfoXLTokenizer is now using sacremoses for tokenization
* Added tokenization of comma-separated and floating point numbers.
* Removed prepare_for_tokenization() from tokenization_transfo_xl.py because punctuation is handled by sacremoses
* Added corresponding tests
* Removed test comapring TransfoXLTokenizer and TransfoXLTokenizerFast
* Added deprecation warning to TransfoXLTokenizerFast
* isort change
Co-authored-by: Teven <teven.lescao@gmail.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* AdaFactor optimizer ported from fairseq. Tested for T5 finetuning and MLM -- reduced memory consumption compared to ADAM.
* update PR fixes, add basic test
* bug -- incorrect params in test
* bugfix -- import Adafactor into test
* bugfix -- removed accidental T5 include
* resetting T5 to master
* bugfix -- include Adafactor in __init__
* longer loop for adafactor test
* remove double error class declare
* lint
* black
* isort
* Update src/transformers/optimization.py
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
* single docstring
* Cleanup docstring
Co-authored-by: Nikolai Y <nikolai.yakovenko@point72.com>
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
* add tf graph compile tests
* fix conflict
* remove more tf transpose statements
* fix conflicts
* fix comment typos
* move function to class function
* fix black
* fix black
* make style