Commit Graph

5117 Commits

Author SHA1 Message Date
Sam Shleifer 61b7ba93f5
Marian distill scripts + integration test (#6799) 2020-08-31 13:48:26 -04:00
Jin Young (Daniel) Sohn 02d09c8fcc
Only access loss tensor every logging_steps (#6802)
* Only access loss tensor every logging_steps

* tensor.item() was being called every step. This must not be done
for XLA:TPU tensors as it's terrible for performance causing TPU<>CPU
communication at each step. On RoBERTa MLM for example, it reduces step
time by 30%, should be larger for smaller step time models/tasks.
* Train batch size was not correct in case a user uses the
`per_gpu_train_batch_size` flag
* Avg reduce loss accross eval shards

* Fix style (#6803)

* t5 model should make decoder_attention_mask (#6800)

* [s2s] Test hub configs in self-scheduled CI (#6809)

* [s2s] round runtime in run_eval (#6798)

* Pegasus finetune script: add --adafactor (#6811)

* [bart] rename self-attention -> attention (#6708)

* [tests] fix typos in inputs (#6818)

* Fixed open in colab link (#6825)

* Add model card for singbert lite. Update widget for singbert and singbert-large. (#6827)

* BR_BERTo model card (#6793)

* clearly indicate shuffle=False (#6312)

* Clarify shuffle

* clarify shuffle

Co-authored-by: Kevin Canwen Xu <canwenxu@126.com>

* [s2s README] Add more dataset download instructions (#6737)

* Style

* Patch logging issue

* Set default logging level to `WARNING` instead of `INFO`

* TF Flaubert w/ pre-norm (#6841)

* Dataset and DataCollator for BERT Next Sentence Prediction (NSP) task (#6644)

* add datacollator and dataset for next sentence prediction task

* bug fix (numbers of special tokens & truncate sequences)

* bug fix (+ dict inputs support for data collator)

* add padding for nsp data collator; renamed cached files to avoid conflict.

* add test for nsp data collator

* Style

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>

* Fix in Adafactor docstrings (#6845)

* Fix resuming training for Windows (#6847)

* Only access loss tensor every logging_steps

* tensor.item() was being called every step. This must not be done
for XLA:TPU tensors as it's terrible for performance causing TPU<>CPU
communication at each step. On RoBERTa MLM for example, it reduces step
time by 30%, should be larger for smaller step time models/tasks.
* Train batch size was not correct in case a user uses the
`per_gpu_train_batch_size` flag
* Avg reduce loss accross eval shards

* comments

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Thomas Ashish Cherian <6967017+PandaWhoCodes@users.noreply.github.com>
Co-authored-by: Zane Lim <zyuanlim@gmail.com>
Co-authored-by: Rodolfo De Nadai <rdenadai@gmail.com>
Co-authored-by: xujiaze13 <37360975+xujiaze13@users.noreply.github.com>
Co-authored-by: Kevin Canwen Xu <canwenxu@126.com>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Huang Lianzhe <hlz@pku.edu.cn>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-08-31 11:35:51 -04:00
Sylvain Gugger c48546c7f7
Fix resuming training for Windows (#6847) 2020-08-31 11:02:30 -04:00
Sylvain Gugger d2f9cb838e
Fix in Adafactor docstrings (#6845) 2020-08-31 10:52:47 -04:00
Huang Lianzhe 2de7ee0385
Dataset and DataCollator for BERT Next Sentence Prediction (NSP) task (#6644)
* add datacollator and dataset for next sentence prediction task

* bug fix (numbers of special tokens & truncate sequences)

* bug fix (+ dict inputs support for data collator)

* add padding for nsp data collator; renamed cached files to avoid conflict.

* add test for nsp data collator

* Style

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2020-08-31 08:25:00 -04:00
Lysandre Debut 895d394669
TF Flaubert w/ pre-norm (#6841) 2020-08-31 04:53:20 -04:00
Lysandre 4561f05c5f Set default logging level to `WARNING` instead of `INFO` 2020-08-31 09:56:25 +02:00
Lysandre 05c3214153 Patch logging issue 2020-08-31 09:37:08 +02:00
Sam Shleifer dfa10a41ba
[s2s README] Add more dataset download instructions (#6737) 2020-08-30 16:29:24 -04:00
xujiaze13 32fe44086c
clearly indicate shuffle=False (#6312)
* Clarify shuffle

* clarify shuffle

Co-authored-by: Kevin Canwen Xu <canwenxu@126.com>
2020-08-30 19:26:10 +08:00
Rodolfo De Nadai 0eecaceac7
BR_BERTo model card (#6793) 2020-08-30 19:02:46 +08:00
Zane Lim d176aaad7f
Add model card for singbert lite. Update widget for singbert and singbert-large. (#6827) 2020-08-30 18:21:49 +08:00
Thomas Ashish Cherian a5847619e3
Fixed open in colab link (#6825) 2020-08-30 18:21:00 +08:00
Stas Bekman 563485bf95
[tests] fix typos in inputs (#6818) 2020-08-30 18:19:57 +08:00
Sam Shleifer 22933e661f
[bart] rename self-attention -> attention (#6708) 2020-08-29 18:03:08 -04:00
Sam Shleifer 0f58903bb6
Pegasus finetune script: add --adafactor (#6811) 2020-08-29 17:43:32 -04:00
Sam Shleifer ac47458a02
[s2s] round runtime in run_eval (#6798) 2020-08-29 17:36:31 -04:00
Sam Shleifer 5ab21b072f
[s2s] Test hub configs in self-scheduled CI (#6809) 2020-08-28 17:05:52 -04:00
Sam Shleifer 3cac867fac
t5 model should make decoder_attention_mask (#6800) 2020-08-28 15:22:33 -04:00
Sam Shleifer 20f7786453
Fix style (#6803) 2020-08-28 15:02:25 -04:00
Sam Shleifer 9336086ab5
prepare_seq2seq_batch makes labels/ decoder_input_ids made later. (#6654)
* broken test

* batch parity

* tests pass

* boom boom

* boom boom

* split out bart tokenizer tests

* fix tests

* boom boom

* Fixed dataset bug

* Fix marian

* Undo extra

* Get marian working

* Fix t5 tok tests

* Test passing

* Cleanup

* better assert msg

* require torch

* Fix mbart tests

* undo extra decoder_attn_mask change

* Fix import

* pegasus tokenizer can ignore src_lang kwargs

* unused kwarg test cov

* boom boom

* add todo for pegasus issue

* cover one word translation edge case

* Cleanup

* doc
2020-08-28 11:15:17 -04:00
RafaelWO cb276b41de
Transformer-XL: Improved tokenization with sacremoses (#6322)
* Improved tokenization with sacremoses

 * The TransfoXLTokenizer is now using sacremoses for tokenization
 * Added tokenization of comma-separated and floating point numbers.
 * Removed prepare_for_tokenization() from tokenization_transfo_xl.py because punctuation is handled by sacremoses
 * Added corresponding tests
 * Removed test comapring TransfoXLTokenizer and TransfoXLTokenizerFast
 * Added deprecation warning to TransfoXLTokenizerFast

* isort change

Co-authored-by: Teven <teven.lescao@gmail.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-08-28 09:56:17 -04:00
Ahmed Elnaggar 930153e7d2
Add ProtBert model card (#6764) 2020-08-28 12:12:28 +08:00
Stas Bekman 743d131d76
[style] set the minimal required version for `black` (#6784)
`make style` with `black` < 20.8b1 is a no go (in case some other package forced a lower version) - so make it explicit to avoid confusion
2020-08-28 11:38:09 +08:00
Sam Shleifer fb78a90d6a
PL: --adafactor option (#6776) 2020-08-27 22:19:46 -04:00
Stas Bekman 92ac2fa7d1
[transformers-cli] fix logger getter (#6777) 2020-08-27 20:01:17 -04:00
Lysandre 42fddacd1c Format 2020-08-27 18:31:51 +02:00
Stas Bekman 70fccc5cf3
new Makefile target: docs (#6510)
* [doc] multiple corrections to "Summary of the tasks"

* add a new "docs" target to validate docs and document it

* fix mixup
2020-08-27 12:25:16 -04:00
Stas Bekman dbfe34f2f5
[test schedulers] adjust to test the first step's reading (#6429)
* [test schedulers] small improvement

* cleanup
2020-08-27 12:23:28 -04:00
Stas Bekman e6b811f0a7
[testing] replace hardcoded paths to allow running tests from anywhere (#6523)
* [testing] replace hardcoded paths to allow running tests from anywhere

* fix the merge conflict
2020-08-27 12:22:18 -04:00
Sam Shleifer 9d1b4db2aa
add nlp install (#6767) 2020-08-27 11:08:14 -04:00
Tom Grek c225e872ed
Fix it to work with BART (#6756) 2020-08-27 09:04:50 -04:00
Lysandre 0d2c111a0c Format 2020-08-27 14:56:47 +02:00
Julien Plu 6f289dc97a
Fix the TF Trainer gradient accumulation and the TF NER example (#6713)
* Align TF NER example over the PT one

* Fix Dataset call

* Fix gradient accumulation training

* Apply style

* Address Sylvain's comments

* Address Sylvain's comments

* Apply style
2020-08-27 08:45:34 -04:00
Lysandre Debut 41aa2b4ef1
Adafactor docs (#6765) 2020-08-27 05:16:50 -04:00
Nikolai Yakovenko 971d1802d0
Add AdaFactor optimizer from fairseq (#6722)
* AdaFactor optimizer ported from fairseq. Tested for T5 finetuning and MLM -- reduced memory consumption compared to ADAM.

* update PR fixes, add basic test

* bug -- incorrect params in test

* bugfix -- import Adafactor into test

* bugfix -- removed accidental T5 include

* resetting T5 to master

* bugfix -- include Adafactor in __init__

* longer loop for adafactor test

* remove double error class declare

* lint

* black

* isort

* Update src/transformers/optimization.py

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>

* single docstring

* Cleanup docstring

Co-authored-by: Nikolai Y <nikolai.yakovenko@point72.com>
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
2020-08-27 04:58:13 -04:00
Sam Shleifer 4bd7be9a42
s2s distillation uses AutoModelForSeqToSeqLM (#6761) 2020-08-26 23:25:11 -04:00
Ahmed Elnaggar 05e7150a53
create ProtBert-BFD model card. (#6724) 2020-08-27 02:19:19 +02:00
Sam Shleifer 61518e2df3
[s2s] run_eval.py QOL improvements and cleanup(#6746) 2020-08-26 18:59:20 -04:00
Igli Manaj 434936f34a
Model Card for Multilingual Passage Reranking BERT (#6755) 2020-08-26 18:00:27 -04:00
Joe Davison 10a34501f1
add __init__.py to utils (#6754) 2020-08-26 23:51:10 +02:00
Ali Safaya 61b9ed8074
Model card for kuisailab/albert-large-arabic (#6730)
* Create README.md

* Update README.md
2020-08-26 17:27:56 -04:00
Ali Safaya 8e0d51e4f2
Model card for kuisailab/albert-xlarge-arabic (#6731)
* Create README.md

* Update README.md
2020-08-26 17:27:42 -04:00
Ali Safaya 70c96a10e9
Model card for kuisailab/albert-base-arabic (#6729)
* Create README.md

* Update README.md
2020-08-26 17:27:34 -04:00
Sagor Sarker cc4ba79f68
added model card for codeswitch-spaeng-sentiment-analysis-lince (#6727)
* added model card for codeswitch-spaeng-sentiment-analysis-lince model also update other model card

* fixed typo

* fixed typo

* fixed typo

* fixed typo

* fixed typo

* fixed typo

* fixed typo

* Update README.md
2020-08-26 17:26:32 -04:00
Tanmay Thakur e10fb9cbe6
Create model card for lordtt13/COVID-SciBERT (#6718) 2020-08-26 17:22:25 -04:00
Adam Montgomerie baeba53e88
Adding model cards for 5 models (#6703)
* Added model cards for 4 models

Added model cards for:
- roberta-base-bulgarian
- roberta-base-bulgarian-pos
- roberta-small-bulgarian
- roberta-small-bulgarian-pos

* fixed link text

* Update README.md

* Create README.md

* removed trailing bracket

* Add language metadata

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-08-26 17:20:55 -04:00
Julien Chaumond 3242e4d942 [model_cards] Fix tiny typos 2020-08-26 23:16:06 +02:00
Joe Davison 99407f9d1e
add xlm-roberta-large-xnli model card (#6723)
* add xlm-roberta-large-xnli model card

* update pt example

* typo
2020-08-26 16:05:59 -04:00
Patrick von Platen 858b7d5873
[TF Longformer] Improve Speed for TF Longformer (#6447)
* add tf graph compile tests

* fix conflict

* remove more tf transpose statements

* fix conflicts

* fix comment typos

* move function to class function

* fix black

* fix black

* make style
2020-08-26 14:55:41 -04:00