* Add cache_dir for saving TextDataset features
This is for the case where the dataset lives on a read-only filesystem, which is
the case in tests (GKE TPU tests).
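For reference, a minimal usage sketch, assuming the `TextDataset` signature after this change (paths here are illustrative):

```python
from transformers import GPT2Tokenizer, TextDataset

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

# With cache_dir set, cached features are written there instead of next to
# file_path, so this works even when the dataset sits on a read-only filesystem.
dataset = TextDataset(
    tokenizer=tokenizer,
    file_path="/ro-data/train.txt",       # hypothetical read-only location
    block_size=128,
    cache_dir="/tmp/text_dataset_cache",  # hypothetical writable location
)
```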
* style
* Introduce HPO checkpointing for PBT
* Moved checkpoint saving
* Fixed passing of the checkpoint subdirectory
* Fixed style
* Enable/disable checkpointing and check the conditions for the various Tune schedulers, including PBT
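A hedged sketch of how these pieces might fit together via the Ray Tune backend (`trainer` is an existing `Trainer`; the search space and mutation values are illustrative):

```python
from ray import tune
from ray.tune.schedulers import PopulationBasedTraining

# Illustrative search space; keys must match TrainingArguments fields.
def hp_space(trial):
    return {
        "learning_rate": tune.loguniform(1e-5, 1e-3),
        "per_device_train_batch_size": tune.choice([8, 16, 32]),
    }

scheduler = PopulationBasedTraining(
    time_attr="training_iteration",
    metric="objective",  # the Ray integration reports the objective under this key
    mode="max",
    perturbation_interval=1,
    hyperparam_mutations={"learning_rate": tune.loguniform(1e-5, 1e-3)},
)

# Extra kwargs are forwarded to ray.tune.run; checkpointing is what lets
# PBT clone and perturb trials mid-training.
best_run = trainer.hyperparameter_search(
    hp_space=hp_space,
    backend="ray",
    n_trials=8,
    scheduler=scheduler,
    keep_checkpoints_num=1,
)
```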
* Adjust number of GPUs to number of jobs
* Avoid model pickling in Ray
* Move hp search to integrations
* Only access loss tensor every logging_steps
* tensor.item() was being called every step. This must not be done for XLA:TPU
tensors, as it forces TPU<->CPU communication at every step and is terrible for
performance. On RoBERTa MLM, for example, this change reduces step time by 30%;
the gain should be larger for models/tasks with shorter step times.
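A minimal, runnable sketch of the pattern (plain torch stands in for the XLA device):

```python
import torch

device = torch.device("cpu")  # stands in for xm.xla_device() on TPU
logging_steps = 10

tr_loss = torch.tensor(0.0, device=device)  # running loss stays a tensor
logging_loss_scalar = 0.0

for step in range(100):
    loss = torch.rand(1, device=device).squeeze()  # stand-in for one step's loss
    tr_loss += loss.detach()                       # tensor add: no device<->host sync

    if (step + 1) % logging_steps == 0:
        # .item() forces device<->host communication, so call it only here.
        tr_loss_scalar = tr_loss.item()
        print({"loss": (tr_loss_scalar - logging_loss_scalar) / logging_steps})
        logging_loss_scalar = tr_loss_scalar
```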
* The train batch size was not correct when a user passed the
`per_gpu_train_batch_size` flag
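The fix amounts to scaling by the device count; a sketch:

```python
# per_gpu_train_batch_size is per device, so the effective train batch size
# must be multiplied by the number of GPUs in use (1 on CPU).
per_gpu_train_batch_size = 8
n_gpu = 4
train_batch_size = per_gpu_train_batch_size * max(1, n_gpu)  # 32, not 8
```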
* Average-reduce the loss across eval shards
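A hedged sketch of that reduction with torch.distributed (the Trainer's own helper may differ; this requires an initialized process group):

```python
import torch
import torch.distributed as dist

def mean_across_shards(loss: torch.Tensor) -> torch.Tensor:
    # Sum the per-shard mean losses, then divide by the world size so every
    # process ends up holding the global average.
    loss = loss.clone()
    dist.all_reduce(loss, op=dist.ReduceOp.SUM)
    return loss / dist.get_world_size()
```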
* Fix style (#6803)
* t5 model should make decoder_attention_mask (#6800)
* [s2s] Test hub configs in self-scheduled CI (#6809)
* [s2s] round runtime in run_eval (#6798)
* Pegasus finetune script: add --adafactor (#6811)
* [bart] rename self-attention -> attention (#6708)
* [tests] fix typos in inputs (#6818)
* Fixed the "Open in Colab" link (#6825)
* Add model card for singbert lite. Update widget for singbert and singbert-large. (#6827)
* BR_BERTo model card (#6793)
* clearly indicate shuffle=False (#6312)
* Clarify shuffle
Co-authored-by: Kevin Canwen Xu <canwenxu@126.com>
* [s2s README] Add more dataset download instructions (#6737)
* Style
* Patch logging issue
* Set default logging level to `WARNING` instead of `INFO`
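Users who want the old verbosity back can opt in explicitly; a minimal example, assuming the library's centralized logging utilities:

```python
from transformers import logging

# The default is now WARNING; restore the previous INFO-level output:
logging.set_verbosity_info()

# Or switch back to the new default:
logging.set_verbosity_warning()
```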
* TF Flaubert w/ pre-norm (#6841)
* Dataset and DataCollator for BERT Next Sentence Prediction (NSP) task (#6644)
* add data collator and dataset for the next sentence prediction task
* bug fix (number of special tokens & sequence truncation)
* bug fix (+ dict input support for the data collator)
* add padding to the NSP data collator; renamed cached files to avoid conflicts
* add test for the NSP data collator
* Style
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
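A hedged usage sketch of the new dataset and collator (argument defaults are approximate; the corpus path is hypothetical):

```python
from transformers import (
    BertTokenizer,
    DataCollatorForNextSentencePrediction,
    TextDatasetForNextSentencePrediction,
)

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# Expects one sentence per line, with blank lines separating documents.
dataset = TextDatasetForNextSentencePrediction(
    tokenizer=tokenizer,
    file_path="corpus.txt",  # hypothetical path
    block_size=128,
)

# Pads each batch and builds the NSP (and, with mlm=True, the MLM) labels.
data_collator = DataCollatorForNextSentencePrediction(
    tokenizer=tokenizer,
    mlm=True,
    block_size=128,
)
```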
* Fix in Adafactor docstrings (#6845)
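For reference, a sketch of the two documented ways to configure the learning rate (treat exact defaults as approximate):

```python
import torch
from transformers.optimization import Adafactor

model = torch.nn.Linear(4, 2)  # stand-in for any model

# Time-dependent internal learning rate (relative_step=True requires lr=None):
optimizer = Adafactor(model.parameters(), lr=None, relative_step=True, warmup_init=True)

# Or a fixed external learning rate:
optimizer = Adafactor(model.parameters(), lr=1e-3, relative_step=False, scale_parameter=False)
```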
* Fix resuming training for Windows (#6847)
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Thomas Ashish Cherian <6967017+PandaWhoCodes@users.noreply.github.com>
Co-authored-by: Zane Lim <zyuanlim@gmail.com>
Co-authored-by: Rodolfo De Nadai <rdenadai@gmail.com>
Co-authored-by: xujiaze13 <37360975+xujiaze13@users.noreply.github.com>
Co-authored-by: Kevin Canwen Xu <canwenxu@126.com>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Huang Lianzhe <hlz@pku.edu.cn>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>