39 Commits

Author SHA1 Message Date
Manuel Romero 66a14a2f6f
Fix link to old NER fine-tuning script (#9182) 2020-12-17 19:50:01 -05:00
Sylvain Gugger 783d7d2629
Reorganize examples (#9010)
* Reorganize example folder

* Continue reorganization

* Change requirements for tests

* Final cleanup

* Finish regroup with tests all passing

* Copyright

* Requirements and readme

* Make a full link for the documentation

* Address review comments

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Add symlink

* Reorg again

* Apply suggestions from code review

Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>

* Adapt title

* Update to new structure

* Remove test

* Update READMEs

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
2020-12-11 10:07:02 -05:00
Sylvain Gugger 7f9ccffc5b
Use word_ids to get labels in run_ner (#8962)
* Use word_ids to get labels in run_ner

* Add sanity check
2020-12-07 14:26:36 -05:00
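
The commit above mentions using word_ids to get labels. A minimal sketch of that idea follows, assuming `word_ids` is the list returned by a fast tokenizer's `word_ids()` call (one entry per token, `None` for special tokens). The function name `align_labels`, the `label_all_tokens` flag and the constant name are hypothetical and chosen for illustration; this is not the code from the commit itself.

```python
from typing import List, Optional

# Assumption: -100 is used as the "ignore" label, matching the default
# ignore_index of PyTorch's cross-entropy loss.
IGNORE_INDEX = -100


def align_labels(
    word_ids: List[Optional[int]],
    word_labels: List[int],
    label_all_tokens: bool = False,
) -> List[int]:
    """Map word-level labels onto tokens using each token's word index."""
    aligned = []
    previous = None
    for word_id in word_ids:
        if word_id is None:
            # Special tokens belong to no word and are ignored by the loss.
            aligned.append(IGNORE_INDEX)
        elif word_id != previous:
            # First token of a word carries that word's label.
            aligned.append(word_labels[word_id])
        else:
            # Later sub-tokens either repeat the label or are ignored.
            aligned.append(word_labels[word_id] if label_all_tokens else IGNORE_INDEX)
        previous = word_id
    return aligned


# Hypothetical example: three words, the second split into two sub-tokens.
example = align_labels([None, 0, 1, 1, 2, None], [3, 0, 7])
```

Only the first sub-token of each word keeps a real label by default, so a word split into several pieces is not counted several times in the loss.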
Stefan Schweter 19fa01ce2a
token-classification: use is_world_process_zero instead of deprecated is_world_master() (#8828) 2020-11-30 09:21:56 -05:00
Sylvain Gugger 20b658607e
Fix run_ner script (#8664)
* Fix run_ner script

* Pin datasets
2020-11-19 13:59:30 -05:00
Sylvain Gugger dd52804f5f
Remove deprecated (#8604)
* Remove old deprecated arguments

Co-authored-by: LysandreJik <lysandre.debut@reseau.eseo.fr>

* Remove needless imports

* Fix tests

Co-authored-by: LysandreJik <lysandre.debut@reseau.eseo.fr>
2020-11-17 15:11:29 -05:00
Julien Chaumond 042a6aa777
Tokenizers: ability to load from model subfolder (#8586)
* tiny typo

* Tokenizers: ability to load from model subfolder

* use subfolder for local files as well

* Uniformize model shortcut name => model id

* from s3 => from huggingface.co

Co-authored-by: Quentin Lhoest <lhoest.q@gmail.com>
2020-11-17 08:58:45 -05:00
Julien Plu 27b3ff316a
Try to understand and apply Sylvain's comments (#8458) 2020-11-12 13:43:00 -05:00
sarnoult a38d1c7c31
Example NER script predicts on tokenized dataset (#8468)
The new run_ner.py script tries to run prediction on the input
test set `datasets["test"]`, but it should be the tokenized set
`tokenized_datasets["test"]`
2020-11-11 10:28:23 -05:00
Stas Bekman 02bdfc0251
using multi_gpu consistently (#8446)
* s|multiple_gpu|multi_gpu|g; s|multigpu|multi_gpu|g

* doc
2020-11-10 13:23:58 -05:00
Stas Bekman 190df58560
[github CI] add a multi-gpu job for all example tests (#8341)
* add a multi-gpu job for all example tests

* run only ported tests

* rename

* explain why env is re-activated on each step

* mark all unported/checked tests with @require_torch_non_multigpu_but_fix_me

* style

* Apply suggestions from code review

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
2020-11-09 15:47:38 -05:00
Sylvain Gugger 5c766ecb50 Fix typo 2020-11-09 11:50:51 -05:00
Sylvain Gugger 908a28894c
Add new token classification example (#8340)
* Add new token classification example

* Remove txt file

* Add test

* With actual testing done

* Less warmup is better

* Update examples/token-classification/run_ner_new.py

Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>

* Address review comments

* Fix test

* Make Lysandre happy

* Last touches and rename

* Rename in tests

* Address review comments

* More run_ner -> run_ner_old

Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
2020-11-09 11:39:55 -05:00
Bobby Donchev 52f44dd6d2
change TokenClassificationTask class methods to static methods (#7902)
* change TokenClassificationTask class methods to static methods

Since the methods of TokenClassificationTask do not use self, we should probably switch them to static methods. Also, because TokenClassificationTask does not define a constructor, it is currently unusable as is. Switching to static methods removes the need to document the intent of the broken class.

The get_labels and read_examples_from_file methods are meant to be implemented by subclasses. Static methods can still be overridden after inheritance, like any other method.

* Trigger Build

Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2020-11-05 09:38:30 -05:00
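
The reasoning in the commit above can be illustrated with a small sketch. The class and method names `TokenClassificationTask`, `get_labels` and `read_examples_from_file` come from the commit message; the simplified signatures, the return types and the `ToyTask` subclass are assumptions made for illustration only.

```python
from typing import List


class TokenClassificationTask:
    """Base task: the methods need no instance state, so they are static."""

    @staticmethod
    def get_labels(path: str) -> List[str]:
        raise NotImplementedError

    @staticmethod
    def read_examples_from_file(data_dir: str, mode: str) -> List[List[str]]:
        raise NotImplementedError


class ToyTask(TokenClassificationTask):
    """Hypothetical subclass showing that static methods can be overridden."""

    @staticmethod
    def get_labels(path: str) -> List[str]:
        # A real task would read the labels from `path`; here they are fixed.
        return ["O", "B-PER", "I-PER"]

    @staticmethod
    def read_examples_from_file(data_dir: str, mode: str) -> List[List[str]]:
        # A real task would parse a file under `data_dir` for the given split.
        return [["Hello", "Ada"]]


# No instance (and therefore no constructor) is needed to call the methods.
labels = ToyTask.get_labels("labels.txt")
examples = ToyTask.read_examples_from_file("data", "train")
```

Calling the methods on the class directly is what makes the missing constructor irrelevant.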
Sean Naren 5e24982e58
Upgrade PyTorch Lightning to 1.0.2 (#7852)
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
2020-10-28 14:59:14 -04:00
Stefan Schweter ee9eae4e06
token-classification: update url of GermEval 2014 dataset (#6571) 2020-09-18 06:18:06 -04:00
Julien Plu 6f289dc97a
Fix the TF Trainer gradient accumulation and the TF NER example (#6713)
* Align TF NER example over the PT one

* Fix Dataset call

* Fix gradient accumulation training

* Apply style

* Address Sylvain's comments

* Address Sylvain's comments

* Apply style
2020-08-27 08:45:34 -04:00
Lysandre a75c64d80c Black 20 release 2020-08-26 17:20:22 +02:00
vblagoje dd522da004
Fix PL token classification examples (#6682) 2020-08-24 11:30:06 -04:00
Sylvain Gugger a573777901
Update repo to isort v5 (#6686)
* Run new isort

* More changes

* Update CI, CONTRIBUTING and benchmarks
2020-08-24 11:03:01 -04:00
Sam Shleifer 84c265ffcc
[lightning_base] fix s2s logging, only make train_loader once (#6404) 2020-08-16 22:49:41 -04:00
vblagoje eda07efaa5
Add POS tagging and Phrase chunking token classification examples (#6457)
* Add more token classification examples

* POS tagging example

* Phrase chunking example

* PR review fixes

* Add conllu to third party list (used in token classification examples)
2020-08-13 12:09:51 -04:00
Stas Bekman 7c6a085ebf
pl version: examples/requirements.txt is single source of truth (#6309) 2020-08-11 10:58:54 -04:00
Sam Shleifer 9a5ef83748
[s2s] fix --gpus clarg collision (#6358) 2020-08-08 21:51:37 -04:00
Stas Bekman 6695450a23
[examples] consistently use --gpus, instead of --n_gpu (#6315) 2020-08-07 10:36:32 -04:00
Julien Plu 54f9fbeff8
Rework TF trainer (#6038)
* Fully rework training/prediction loops

* fix method name

* Fix variable name

* Fix property name

* Fix scope

* Fix method name

* Fix tuple index

* Fix tuple index

* Fix indentation

* Fix variable name

* fix eval before log

* Add drop remainder for test dataset

* Fix step number + fix logging datetime

* fix eval loss value

* use global step instead of step + fix logging at step 0

* Fix logging datetime

* Fix global_step usage

* Fix breaking loop + logging datetime

* Fix step in prediction loop

* Fix step breaking

* Fix train/test loops

* Force TF at least 2.2 for the trainer

* Use assert_cardinality to facilitate the dataset size computation

* Log steps per epoch

* Make tfds compliant with TPU

* Make tfds compliant with TPU

* Use TF dataset enumerate instead of the Python one

* revert previous commit

* Fix data_dir

* Apply style

* rebase on master

* Address Sylvain's comments

* Address Sylvain's and Lysandre's comments

* Trigger CI

* Remove unused import
2020-07-29 14:32:01 -04:00
Patrick von Platen 4dc65591b5
[Almost all TF models] TF clean up: add missing CLM / MLM loss; fix T5 naming and keras compile (#5395)
* add first version of clm tf

* make style

* add more tests for bert

* update tf clm loss

* fix tests

* correct tf ner script

* add mlm loss

* delete bogus file

* clean tf auto model + add tests

* finish adding clm loss everywhere

* fix training in distilbert

* fix flake8

* save intermediate

* fix tf t5 naming

* remove prints

* finish up

* up

* fix tf gpt2

* fix new test utils import

* fix flake8

* keep backward compatibility

* Update src/transformers/modeling_tf_albert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/modeling_tf_auto.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/modeling_tf_electra.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/modeling_tf_roberta.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/modeling_tf_mobilebert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/modeling_tf_auto.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/modeling_tf_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/modeling_tf_distilbert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* apply sylvains suggestions

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-07-07 18:15:53 +02:00
Sylvain Gugger 734a28a767
Clean up diffs in Trainer/TFTrainer (#5417)
* Cleanup and unify Trainer/TFTrainer

* Forgot to adapt TFTrainingArgs

* In tf scripts n_gpu -> n_replicas

* Update src/transformers/training_args.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Address review comments

* Formatting

* Fix typo

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-07-01 11:00:20 -04:00
Hong Xu 501040fd30
In the run_ner.py example, give the optional label arg a default value (#5326)
Otherwise, if label is not specified, the following error occurs:

	Traceback (most recent call last):
	  File "run_ner.py", line 303, in <module>
	    main()
	  File "run_ner.py", line 101, in main
	    model_args, data_args, training_args = parser.parse_json_file(json_file=os.path.abspath(sys.argv[1]))
	  File "/home/user/anaconda3/envs/bert/lib/python3.7/site-packages/transformers/hf_argparser.py", line 159, in parse_json_file
	    obj = dtype(**inputs)
	TypeError: __init__() missing 1 required positional argument: 'labels'
2020-06-30 19:45:35 -04:00
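
The traceback in the commit above shows a dataclass being built with `dtype(**inputs)` while `labels` is missing. A minimal sketch of why a default value avoids the error follows, using only the standard library. The class names, the `data_dir` field and the help text are hypothetical and are not taken from run_ner.py.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ArgsWithoutDefault:
    data_dir: str
    # Optional[...] only describes the type; the argument is still required.
    labels: Optional[str]


@dataclass
class ArgsWithDefault:
    data_dir: str
    # A default value makes the argument truly optional.
    labels: Optional[str] = field(
        default=None,
        metadata={"help": "Path to a file containing all labels."},
    )


# Parsed inputs that do not mention `labels`, as in the failing JSON file.
inputs = {"data_dir": "./data"}

error_message = ""
try:
    ArgsWithoutDefault(**inputs)
except TypeError as err:
    # Same kind of failure as in the traceback: a required argument is missing.
    error_message = str(err)

args = ArgsWithDefault(**inputs)
```

With the default in place the same inputs construct the arguments object, and `labels` is simply `None` when it is not specified.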
Stefan Schweter d812e6d76e
NER: fix construction of input examples for RoBERTa (#4943)
* utils_ner: do not add extra sep token for RoBERTa model

* run_pl_ner: do not add extra sep token for RoBERTa model
2020-06-15 08:30:40 -04:00
Stefan Schweter 2a4b9e09c0
NER: Add new WNUT’17 example (#4681)
* ner: add preprocessing script for examples that splits longer sentences

* ner: example shell scripts use local preprocessing now

* ner: add new example section for WNUT’17 NER task. Remove old English CoNLL-03 results

* ner: satisfy black and isort
2020-06-04 19:13:17 -04:00
Julien Chaumond d4c2cb402d
Kill model archive maps (#4636)
* Kill model archive maps

* Fixup

* Also kill model_archive_map for MaskedBertPreTrainedModel

* Unhook config_archive_map

* Tokenizers: align with model id changes

* make style && make quality

* Fix CI
2020-06-02 09:39:33 -04:00
Lysandre Debut 6a17688021
per_device instead of per_gpu/error thrown when argument unknown (#4618)
* per_device instead of per_gpu/error thrown when argument unknown

* [docs] Restore examples.md symlink

* Correct absolute links so that symlink to the doc works correctly

* Update src/transformers/hf_argparser.py

Co-authored-by: Julien Chaumond <chaumond@gmail.com>

* Warning + reorder

* Docs

* Style

* not for squad

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-05-27 11:36:55 -04:00
Julien Chaumond 5e7fe8b585
Distributed eval: SequentialDistributedSampler + gather all results (#4243)
* Distributed eval: SequentialDistributedSampler + gather all results

* For consistency only write to disk from world_master

Close https://github.com/huggingface/transformers/issues/4272

* Working distributed eval

* Hook into scripts

* Fix #3721 again

* TPU.mesh_reduce: stay in tensor space

Thanks @jysohn23

* Just a small comment

* whitespace

* torch.hub: pip install packaging

* Add test scenarii
2020-05-18 22:02:39 -04:00
Julien Chaumond c547f15a17 Use Filelock to ensure distributed barriers
see context in https://github.com/huggingface/transformers/pull/4223
2020-05-14 11:58:32 -04:00
Julien Chaumond 241759101e
(v2) Improvements to the wandb integration (#4324)
* Improvements to the wandb integration

* small reorg + no global necessary

* feat(trainer): log epoch and final metrics

* Simplify logging a bit

* Fixup

* Fix crash when just running eval

Co-authored-by: Chris Van Pelt <vanpelt@gmail.com>
Co-authored-by: Boris Dayma <boris.dayma@gmail.com>
2020-05-12 21:52:01 -04:00
Stefan Schweter 3f42eb979f
Documentation: fix links to NER examples (#4279)
* docs: fix link to token classification (NER) example

* examples: fix links to NER scripts
2020-05-11 12:48:21 -04:00
Julien Chaumond 7b75aa9fa5
[TPU] Doc, fix xla_spawn.py, only preprocess dataset once (#4223)
* [TPU] Doc, fix xla_spawn.py, only preprocess dataset once

* Update examples/README.md

* [xla_spawn] Add `_mp_fn` to other Trainer scripts

* [TPU] Fix: eval dataloader was None
2020-05-08 14:10:05 -04:00
Julien Chaumond 0ae96ff8a7 BIG Reorganize examples (#4213)
* Created using Colaboratory

* [examples] reorganize files

* remove run_tpu_glue.py as superseded by TPU support in Trainer

* Bugfix: int, not tuple

* move files around
2020-05-07 13:48:44 -04:00