Commit Graph

196 Commits

Author SHA1 Message Date
Stas Bekman 5423f2a9d4
[testing] port test_trainer_distributed to distributed pytest + TestCasePlus enhancements (#8107)
* move the helper code into testing_utils

* port test_trainer_distributed to work with pytest

* improve docs

* simplify notes

* doc

* doc

* style

* doc

* further improvements

* torch might not be available

* real fix

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-10-28 11:51:32 -04:00
Patrick von Platen 664c7ec453
[Seq2Seq Trainer] Make sure padding is implemented for models without pad_token (#8043)
* make sure padding is implemented for non-padding tokens models as well

* add better error message

* add better warning

* remove results files

* Update examples/seq2seq/seq2seq_trainer.py

* remove unnecessary copy line

* correct usage of labels

* delete test files
2020-10-26 17:28:16 +01:00
Patrick von Platen 3c682ea15c
[Examples] Allow EncoderDecoderModels to be trained with Seq2Seq (#7809)
* Make Seq2Seq Trainer more similar to Trainer

* fix typo

* fix seq2seq trainer

* remove from tests

* remove lock

* remove train files

* delete test files

* correct typo

* check at init

* make sure trainer is not slowed down on TPU

* correct isort

* remove use cache

* fix use cache

* add last use cache = false
2020-10-23 23:05:51 +02:00
Stas Bekman 023f0f3708
[s2s trainer] tests to use distributed on multi-gpu machine (#7965) 2020-10-22 17:26:22 -04:00
Stas Bekman 8b38173398
[seq2seq testing] multigpu test run via subprocess (#7281)
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
2020-10-21 17:20:53 -04:00
Stas Bekman 0e24e4c136
[s2s] create doc for pegasus/fsmt replication (#7934) 2020-10-20 15:07:52 -04:00
Stas Bekman 3e31e7f956
[testing] rename skip targets + docs (#7863)
* rename skip targets + docs

* fix quotes

* style

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* small improvements

* fix

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-10-20 04:39:13 -04:00
Stas Bekman 9f7b2b2432
[s2s testing] turn all to unittests, use auto-delete temp dirs (#7859) 2020-10-17 14:33:21 -04:00
Stas Bekman 1652ddad35
[seq2seq testing] improve readability (#7845) 2020-10-16 09:05:29 -04:00
Sam Shleifer 96e47d9229
[cleanup] assign todos, faster bart-cnn test (#7835)
* 2 beam output

* unassign/remove TODOs

* remove one more
2020-10-16 03:11:18 -04:00
Stas Bekman 2255c2c7a0
[seq2seq] get_git_info fails gracefully (#7843)
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
2020-10-16 00:22:43 -04:00
Sylvain Gugger a1d1b332d0
Add predict step accumulation (#7767)
* Add eval_accumulation_step and clean distributed eval

* Add TPU test

* Add TPU stuff

* Fix arg name

* Fix Seq2SeqTrainer

* Fix total_size

* Update src/transformers/trainer_pt_utils.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Doc and add test to TPU

* Add unit test

* Adapt name

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-10-14 11:41:45 -04:00
Tiger 7e73c12805
fixed lots of typos. (#7758) 2020-10-13 10:00:20 -04:00
Sam Shleifer 9c2b2db2cd
[marian] Automate Tatoeba-Challenge conversion (#7709) 2020-10-12 12:24:25 -04:00
Sam Shleifer 827c519494
[examples] bump pl=0.9.0 (#7053) 2020-10-11 16:39:38 -04:00
Sam Shleifer 297233fa92
[s2s] Switch README urls to cdn (#7670) 2020-10-08 21:22:22 -04:00
Sam Shleifer a1ecc90d6b
[pseudo] Switch URLS to CDN (#7661) 2020-10-08 14:12:39 -04:00
Suraj Patil 06a973fd2a
[s2s] configure lr_scheduler from command line (#7641) 2020-10-08 13:06:35 -04:00
Sam Shleifer aba4e22944
[pseudolabels] cleanup markdown table (#7653) 2020-10-07 23:04:18 -04:00
Sam Shleifer e2bb9abb6a
[s2s] release pseudolabel links and instructions (#7639) 2020-10-07 11:20:44 -04:00
Sylvain Gugger 08ba4b4902
Trainer callbacks (#7596)
* Initial callback proposal

* Finish various callbacks

* Post-rebase conflicts

* Fix tests

* Don't use something that's not set

* Documentation

* Remove unwanted print.

* Document all models can work

* Add tests + small fixes

* Update docs/source/internal/trainer_utils.rst

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Address review comments

* Fix TF tests

* Real fix this time

* This one should work

* Fix typo

* Really fix typo

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-10-07 10:50:21 -04:00
Sam Shleifer 500be01c5d
[s2s] save first batch to json for debugging purposes (#6810) 2020-10-06 16:11:56 -04:00
Sam Shleifer d5d2744aa7
Support T5 Distillation w/hidden state supervision (#7599) 2020-10-05 21:31:48 -04:00
Suraj Patil 99cb924bfb
[s2s] add config params like Dropout in Seq2SeqTrainingArguments (#7532) 2020-10-04 12:42:30 -04:00
Sam Shleifer 9bdce3a4f9
[s2s] fix lockfile and peg distillation constants (#7545) 2020-10-02 15:58:14 -04:00
Sam Shleifer de4d7b004a
[s2s] Adafactor support for builtin trainer (#7522) 2020-10-01 17:27:45 -04:00
Sam Shleifer d3a9601a11
[s2s] trainer scripts: Remove --run_name, thanks sylvain! (#7521) 2020-10-01 17:18:47 -04:00
Sylvain Gugger bdcc4b78a2
Fix seq2seq example test (#7518)
* Fix seq2seq example test

* Fix bad copy-paste

* Also save the state
2020-10-01 14:13:29 -04:00
Sam Shleifer 2a358f45ef
[s2s] fix nltk pytest race condition with FileLock (#7515) 2020-10-01 12:51:09 -04:00
Suraj Patil 72d363d979
[examples/s2s] clean up finetune_trainer (#7509) 2020-10-01 12:19:29 -04:00
Sam Shleifer 48f23f92a8
[s2sTrainer] test + code cleanup (#7467) 2020-10-01 00:33:01 -04:00
Sam Shleifer 03e46c1de3
[s2s] fix kwargs style (#7488) 2020-09-30 17:00:06 -04:00
Sam Shleifer 6fe8a693eb
[s2s] Fix t5 warning for distributed eval (#7487) 2020-09-30 16:58:03 -04:00
Amanpreet Singh c031d01023
Seq2SeqDataset: avoid passing src_lang everywhere (#7470)
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
2020-09-30 13:27:48 -04:00
Suraj Patil 08939cfdf7
[s2strainer] fix eval dataset loading (#7477) 2020-09-30 12:39:13 -04:00
Sam Shleifer 74d8d69bd4
[s2s] consistent output format across eval scripts (#7435) 2020-09-28 23:20:03 -04:00
Sam Shleifer 748425d47d
[T5] allow config.decoder_layers to control decoder size (#7409)
* Working asymmetrical T5

* rename decoder_layers -> num_decoder_layers

* Fix docstring

* Allow creation of asymmetric t5 students
2020-09-28 03:08:04 -04:00
Sam Shleifer 7296fea1d6
[s2s] rougeLSum expects \n between sentences (#7410)
Co-authored-by: Swetha Mandava <smandava@nvidia.com>
2020-09-27 16:27:19 -04:00
Suraj Patil eab5f59682
[s2s] add create student script (#7290)
Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
2020-09-27 15:10:46 -04:00
Suraj Patil 415071b4c2
doc changes (#7385) 2020-09-25 08:00:36 -04:00
Suraj Patil 9e68d075a4
Seq2SeqTrainer (#6769)
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
2020-09-24 18:46:58 -04:00
Sam Shleifer d9d0f1140b
[s2s] distributed eval allows num_return_sequences > 1 (#7254) 2020-09-24 17:30:09 -04:00
Stas Bekman eadd870b2f
[seq2seq] make it easier to run the scripts (#7274) 2020-09-24 15:23:48 -04:00
Sam Shleifer 78387cc63e
[s2s] only save metrics.json from rank zero (#7331) 2020-09-22 18:27:28 -04:00
Sam Shleifer e53138a1b9
[s2s] add src_lang kwarg for distributed eval (#7300) 2020-09-22 18:26:37 -04:00
Sam Shleifer 25b0463d0b
[s2s] add supported architectures to MD (#7252) 2020-09-22 13:09:35 -04:00
Sam Shleifer 656c27c3a3
[s2s] save hostname with repo info (#7301)
* save hostname
2020-09-21 17:26:24 -04:00
Stas Bekman af4b98ed97
[s2s] adjust finetune + test to work with fsmt (#7263) 2020-09-21 15:13:19 -04:00
Stas Bekman 8d562a2d1a
[s2s] s/alpha_loss_encoder/alpha_encoder_loss/ (#7298)
fix to match `distillation.py: self.alpha_encoder_loss`
2020-09-21 14:14:26 -04:00
Stas Bekman cbb2f75a16
[s2s tests] fix test_run_eval_search (#7297) 2020-09-21 14:00:40 -04:00
Stas Bekman 7cbf0f722d
examples/seq2seq/__init__.py mutates sys.path (#7194) 2020-09-20 16:54:42 -04:00
Sam Shleifer 83dba10b8f
[s2s] distributed_eval.py saves better speed info (#7242) 2020-09-18 15:46:01 -04:00
Sam Shleifer 67d9fc50d9
[s2s] remove double assert (#7223) 2020-09-17 18:32:31 -04:00
Sam Shleifer a5638b2b3a
[s2s] dynamic batch size with --max_tokens_per_batch (#7030) 2020-09-17 15:19:34 -04:00
Stas Bekman efeab6a3f1
[s2s] run_eval/run_eval_search tweaks (#7192)
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
2020-09-17 14:26:38 -04:00
Stas Bekman 1eeb206bef
[ported model] FSMT (FairSeq MachineTranslation) (#6940)
* ready for PR

* cleanup

* correct FSMT_PRETRAINED_MODEL_ARCHIVE_LIST

* fix

* perfectionism

* revert change from another PR

* odd, already committed this one

* non-interactive upload workaround

* backup the failed experiment

* store langs in config

* workaround for localizing model path

* doc clean up as in https://github.com/huggingface/transformers/pull/6956

* style

* back out debug mode

* document: run_eval.py --num_beams 10

* remove unneeded constant

* typo

* re-use bart's Attention

* re-use EncoderLayer, DecoderLayer from bart

* refactor

* send to cuda and fp16

* cleanup

* revert (moved to another PR)

* better error message

* document run_eval --num_beams

* solve the problem of tokenizer finding the right files when model is local

* polish, remove hardcoded config

* add a note that the file is autogenerated to avoid losing changes

* prep for org change, remove unneeded code

* switch to model4.pt, update scores

* s/python/bash/

* missing init (but doesn't impact the finetuned model)

* cleanup

* major refactor (reuse-bart)

* new model, new expected weights

* cleanup

* cleanup

* full link

* fix model type

* merge porting notes

* style

* cleanup

* have to create a DecoderConfig object to handle vocab_size properly

* doc fix

* add note (not a public class)

* parametrize

* - add bleu scores integration tests

* skip test if sacrebleu is not installed

* cache heavy models/tokenizers

* some tweaks

* remove tokens that aren't used

* more purging

* simplify code

* switch to using decoder_start_token_id

* add doc

* Revert "major refactor (reuse-bart)"

This reverts commit 226dad15ca.

* decouple from bart

* remove unused code #1

* remove unused code #2

* remove unused code #3

* update instructions

* clean up

* move bleu eval to examples

* check import only once

* move data+gen script into files

* reuse via import

* take less space

* add prepare_seq2seq_batch (auto-tested)

* cleanup

* recode test to use json instead of yaml

* ignore keys not needed

* use the new -y in transformers-cli upload -y

* [xlm tok] config dict: fix str into int to match definition (#7034)

* [s2s] --eval_max_generate_length (#7018)

* Fix CI with change of name of nlp (#7054)

* nlp -> datasets

* More nlp -> datasets

* Woopsie

* More nlp -> datasets

* One last

* extending to support allen_nlp wmt models

- allow a specific checkpoint file to be passed
- more arg settings
- scripts for allen_nlp models

* sync with changes

* s/fsmt-wmt/wmt/ in model names

* s/fsmt-wmt/wmt/ in model names (p2)

* s/fsmt-wmt/wmt/ in model names (p3)

* switch to a better checkpoint

* typo

* make non-optional args such - adjust tests where possible or skip when there is no other choice

* consistency

* style

* adjust header

* cards moved (model rename)

* use best custom hparams

* update info

* remove old cards

* cleanup

* s/stas/facebook/

* update scores

* s/allen_nlp/allenai/

* url maps aren't needed

* typo

* move all the doc / build /eval generators to their own scripts

* cleanup

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* fix indent

* duplicated line

* style

* use the correct add_start_docstrings

* oops

* resizing can't be done with the core approach, due to 2 dicts

* check that the arg is a list

* style

* style

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-09-17 11:31:29 -04:00
Sam Shleifer 45b0b1ff2f
[s2s] fix kwarg typo (#7196) 2020-09-16 21:58:57 -04:00
Sam Shleifer 0203ad43bc
[s2s] distributed eval cleanup (#7186) 2020-09-16 15:38:37 -04:00
sgugger 3babef815c
Formatting 2020-09-16 14:57:09 -04:00
Stas Bekman fdaf8ab349
[s2s run_eval] new features (#7109)
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
2020-09-16 13:59:57 -04:00
Sam Shleifer 33d479d2b2
[s2s] distributed eval in one command (#7124) 2020-09-14 15:57:56 -04:00
Sam Shleifer 0fab39695a
[s2s distill] allow pegasus-12-12 (#7104) 2020-09-14 00:03:59 -04:00
Sam Shleifer de9e297964
[s2s] distributed eval cleanup (#7110) 2020-09-13 23:40:38 -04:00
Sam Shleifer e7f8d2ab64
[s2s] two stage run_distributed_eval.py (#7105) 2020-09-13 17:28:18 -04:00
Sam Shleifer b76cb1c3df
[s2s] run_eval supports --prefix clarg. (#6953) 2020-09-12 01:08:21 -04:00
Sam Shleifer 77950c485a
[wip/s2s] DistributedSortishSampler (#7056) 2020-09-10 15:23:44 -04:00
Sylvain Gugger 514486739c
Fix CI with change of name of nlp (#7054)
* nlp -> datasets

* More nlp -> datasets

* Woopsie

* More nlp -> datasets

* One last
2020-09-10 14:51:08 -04:00
Sam Shleifer e9a2f772bc
[s2s] --eval_max_generate_length (#7018) 2020-09-10 14:11:34 -04:00
Sam Shleifer ce37be9d94
[s2s] warn if --fp16 for torch 1.6 (#6977) 2020-09-06 20:41:29 -04:00
Sam Shleifer a4fc0c80b1
[s2s] run_eval.py parses generate_kwargs (#6948) 2020-09-04 14:19:31 -04:00
Sam Shleifer 6078b12098
[s2s] distill: --normalize_hidden --supervise_forward (#6834) 2020-09-04 14:05:56 -04:00
Sam Shleifer e95d262f25
[s2s] support early stopping based on loss, rather than rouge (#6927) 2020-09-03 17:31:35 -04:00
Sam Shleifer 207ed8cb78
[s2s] use --eval_beams command line arg (#6926) 2020-09-03 12:42:09 -04:00
Sam Shleifer 39ed68d597
[s2s] allow task_specific_params=summarization_xsum (#6923) 2020-09-03 11:11:40 -04:00
Sam Shleifer 5a318f075a
[s2s]: script to convert pl checkpoints to hf checkpoints (#6911)
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-09-03 09:47:00 -04:00
brett koonce b8e4906c97
tweak tar command in readme (#6919) 2020-09-03 09:29:01 -04:00
Sam Shleifer b9772897ec
[s2s] command line args for faster val steps (#6833) 2020-08-31 16:16:10 -04:00
Sam Shleifer 61b7ba93f5
Marian distill scripts + integration test (#6799) 2020-08-31 13:48:26 -04:00
Sam Shleifer dfa10a41ba
[s2s README] Add more dataset download instructions (#6737) 2020-08-30 16:29:24 -04:00
Sam Shleifer 0f58903bb6
Pegasus finetune script: add --adafactor (#6811) 2020-08-29 17:43:32 -04:00
Sam Shleifer ac47458a02
[s2s] round runtime in run_eval (#6798) 2020-08-29 17:36:31 -04:00
Sam Shleifer 5ab21b072f
[s2s] Test hub configs in self-scheduled CI (#6809) 2020-08-28 17:05:52 -04:00
Sam Shleifer 9336086ab5
prepare_seq2seq_batch makes labels/ decoder_input_ids made later. (#6654)
* broken test

* batch parity

* tests pass

* boom boom

* boom boom

* split out bart tokenizer tests

* fix tests

* boom boom

* Fixed dataset bug

* Fix marian

* Undo extra

* Get marian working

* Fix t5 tok tests

* Test passing

* Cleanup

* better assert msg

* require torch

* Fix mbart tests

* undo extra decoder_attn_mask change

* Fix import

* pegasus tokenizer can ignore src_lang kwargs

* unused kwarg test cov

* boom boom

* add todo for pegasus issue

* cover one word translation edge case

* Cleanup

* doc
2020-08-28 11:15:17 -04:00
Sam Shleifer fb78a90d6a
PL: --adafactor option (#6776) 2020-08-27 22:19:46 -04:00
Sam Shleifer 4bd7be9a42
s2s distillation uses AutoModelForSeqToSeqLM (#6761) 2020-08-26 23:25:11 -04:00
Sam Shleifer 61518e2df3
[s2s] run_eval.py QOL improvements and cleanup (#6746) 2020-08-26 18:59:20 -04:00
Lysandre a75c64d80c
Black 20 release 2020-08-26 17:20:22 +02:00
Sam Shleifer 0344428f79
[s2s] round bleu, rouge to 4 digits (#6704) 2020-08-25 00:33:11 -04:00
Sylvain Gugger a573777901
Update repo to isort v5 (#6686)
* Run new isort

* More changes

* Update CI, CONTRIBUTING and benchmarks
2020-08-24 11:03:01 -04:00
Sam Shleifer d2da2cb232
allow spaces in bash args with "$@" (#6521) 2020-08-17 09:06:35 -04:00
Sam Shleifer 84c265ffcc
[lightning_base] fix s2s logging, only make train_loader once (#6404) 2020-08-16 22:49:41 -04:00
Sam Shleifer 72add6c98f
[s2s] docs, document desired filenames nicely (#6525) 2020-08-16 20:31:22 -04:00
Kyle Piira 2060181126
Fixes paths with spaces in seq2seq example (#6493) 2020-08-16 13:36:38 -04:00
Sam Shleifer e92efcf728
Mult rouge by 100: standard units (#6359) 2020-08-13 12:15:54 -04:00
Sam Shleifer f94a52cd79
[s2s] add BartTranslationDistiller for distilling mBART (#6363) 2020-08-12 11:41:04 -04:00
Stas Bekman 87b359439f
[test] replace capsys with the more refined CaptureStderr/CaptureStdout (#6422)
* replace capsys with the more refined CaptureStderr/CaptureStdout

* Update examples/seq2seq/test_seq2seq_examples.py

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
2020-08-12 07:54:28 -04:00
Sam Shleifer be1520d3a3
rename prepare_translation_batch -> prepare_seq2seq_batch (#6103) 2020-08-11 15:57:07 -04:00
Sam Shleifer 66fa8ceaea
PegasusForConditionalGeneration (torch version) (#6340)
Co-authored-by: Jingqing Zhang <jingqing.zhang15@imperial.ac.uk>
2020-08-11 14:31:23 -04:00
Stas Bekman f6cb0f806e
[s2s] wmt download script use less ram (#6405) 2020-08-11 12:04:17 -04:00
Sam Shleifer b9ecd92ee4
[s2s] Script to save wmt data to disk (#6403) 2020-08-10 22:49:39 -04:00