Commit Graph

3332 Commits

Author SHA1 Message Date
Lysandre Debut d3eb7d23a4
Pipeline doc (#3055)
* Pipeline doc initial commit

* pipeline abstraction

* Remove modelcard argument from pipeline

* Task-specific pipelines can be instantiated with no model or tokenizer

* All pipelines doc
2020-03-02 14:07:10 -05:00
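As a quick illustration of the pipeline abstraction documented in the commit above, a task-specific pipeline can be instantiated with no model or tokenizer, in which case a default checkpoint for the task is used (a minimal sketch; the sample sentence and printed output are illustrative):

```python
from transformers import pipeline

# No model or tokenizer given: the task's default checkpoint is downloaded and used.
nlp = pipeline("sentiment-analysis")
print(nlp("We are very happy to show you the 🤗 Transformers library."))
# e.g. [{'label': 'POSITIVE', 'score': 0.99...}]
```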
Manuel Romero 2c7749784c Update README.md
- Add example of usage
- Update metrics
2020-03-02 13:35:34 -05:00
Julien Chaumond 0e56b37e80 rm bogus file
cc @patrickvonplaten
2020-03-02 12:27:12 -05:00
Patrick von Platen 2fdc7f6ce8
correct greedy generation when doing beam search (#3078)
* correct greedy generation when doing beam search

* improve comment
2020-03-02 12:00:09 -05:00
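The fix above concerns `generate` when greedy decoding is combined with beam search; a minimal sketch of that setting (model choice and prompt are illustrative):

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

input_ids = tokenizer.encode("The weather today is", return_tensors="pt")
# do_sample=False with num_beams > 1 is exactly the code path the commit corrects.
output = model.generate(input_ids, max_length=30, num_beams=5, do_sample=False)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```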
Julien Chaumond 13afb71208 [ci] Ensure that TF does not preempt all GPU memory for itself
see https://www.tensorflow.org/guide/gpu#limiting_gpu_memory_growth

Co-Authored-By: Funtowicz Morgan <mfuntowicz@users.noreply.github.com>
Co-Authored-By: Lysandre Debut <lysandre.debut@reseau.eseo.fr>
2020-03-02 11:56:45 -05:00
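The linked TensorFlow guide suggests enabling memory growth so TF allocates GPU memory on demand instead of reserving all of it upfront; a short sketch of that pattern:

```python
import tensorflow as tf

# Allocate GPU memory on demand rather than preempting the whole card.
gpus = tf.config.experimental.list_physical_devices("GPU")
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)
```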
Patrick von Platen c0135194eb
Force pad_token_id to be set before padding for standard tokenizer (#3035)
* force pad_token_id to be set before padding

* fix tests and forbid padding without having a pad_token_id set
2020-03-02 10:53:55 -05:00
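With the change above, padding fails unless the tokenizer has a pad token, so tokenizers that ship without one (GPT-2, for instance) need one assigned first. A minimal sketch, reusing the EOS token as pad, which is one common workaround rather than the only option:

```python
from transformers import GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
# GPT-2 has no pad token by default, so padding would now be refused.
tokenizer.pad_token = tokenizer.eos_token
encoded = tokenizer.encode_plus("Hello world", max_length=16, pad_to_max_length=True)
print(encoded["input_ids"])
```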
Sam Shleifer b54ef78d0c
Bart-CNN (#3059)
`generate` code that produces 99% identical summarizations to fairseq on CNN test data, with caching.
2020-03-02 10:35:53 -05:00
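A hedged sketch of what that `generate` path looks like from user code; the class and checkpoint names (`BartForConditionalGeneration`, `facebook/bart-large-cnn`) follow later releases and may have differed slightly at the time of this commit:

```python
from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-large-cnn")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-large-cnn")

article = "New York (CNN) ..."  # a long news article would go here
inputs = tokenizer.encode(article, return_tensors="pt", max_length=1024)
# Beam search with caching is what makes the summaries track fairseq so closely.
summary_ids = model.generate(inputs, num_beams=4, max_length=140)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```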
Victor SANH 6b1ff25084
fix n_gpu count when no_cuda flag is activated (#3077)
* fix n_gpu count when no_cuda flag is activated

* someone was left behind
2020-03-02 10:20:21 -05:00
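The gist of the fix, sketched outside the actual example scripts (the `args` namespace here is a stand-in for the scripts' argparse setup): the GPU count must be forced to zero whenever --no_cuda is passed, even if CUDA devices are visible.

```python
import argparse
import torch

parser = argparse.ArgumentParser()
parser.add_argument("--no_cuda", action="store_true")
args = parser.parse_args([])  # empty list just for illustration

# Respect --no_cuda even on a machine with visible GPUs.
args.device = torch.device("cpu" if args.no_cuda or not torch.cuda.is_available() else "cuda")
args.n_gpu = 0 if args.no_cuda else torch.cuda.device_count()
```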
Julien Chaumond 298bed16a8 make style 2020-03-01 14:08:01 -05:00
VictorSanh 852e032ca6 include roberta in run_squad_w_distillation - cc @graviraja 2020-03-01 01:56:50 +00:00
VictorSanh b5509abb36 --do_lower_case will always trick me... 2020-03-01 01:39:24 +00:00
Julien Chaumond d6ef587a10
[ci] Fixup e36bd94345 2020-02-28 23:19:17 -05:00
Julien Chaumond e36bd94345
[ci] Run all tests on (self-hosted) GPU (#3020)
* Create self-hosted.yml

* Update self-hosted.yml

* Update self-hosted.yml

* Update self-hosted.yml

* Update self-hosted.yml

* Update self-hosted.yml

* do not run slow tests, for now

* [ci] For comparison with circleci, let's also run CPU-tests

* [ci] reorganize

* clearer filenames

* [ci] Final tweaks before merging

* rm slow tests on circle ci

* Trigger CI

* On GPU this concurrency was way too high
2020-02-28 21:11:08 -05:00
srush 908fa43b54
Changes to NER examples for PLT and TPU (#3053)
* changes to allow for tpu training

* black

* tpu

* tpu
2020-02-27 16:45:32 -05:00
Lysandre Debut 8bcb37bfb8
NER support for Albert in run_ner.py and NerPipeline (#2983)
* Added support for Albert when fine-tuning for NER

* Added support for Albert in NER pipeline

* Added command-line options to examples/ner/run_ner.py to better control tokenization

* Added class AlbertForTokenClassification

* Changed output for NerPipeline to use .convert_ids_to_tokens(...) instead of .decode(...) to better reflect tokens

* Added ,

* Now passes style guide enforcement

* Changes from reviews.

* Code now passes style enforcement

* Added test for AlbertForTokenClassification

* Added test for AlbertForTokenClassification
2020-02-27 10:22:55 -05:00
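A minimal sketch of the new pieces working together: `AlbertForTokenClassification` loaded directly and handed to the NER pipeline. The checkpoint below carries no fine-tuned NER head, so the predictions are only illustrative:

```python
from transformers import AlbertForTokenClassification, AlbertTokenizer, pipeline

model = AlbertForTokenClassification.from_pretrained("albert-base-v2", num_labels=9)
tokenizer = AlbertTokenizer.from_pretrained("albert-base-v2")

# NerPipeline now accepts Albert models; per the commit above, output tokens
# come from convert_ids_to_tokens rather than decode.
ner = pipeline("ner", model=model, tokenizer=tokenizer)
print(ner("Hugging Face is based in New York City"))
```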
Sam Shleifer 6a37588041
spelling: strictly (#3042) 2020-02-27 10:22:35 -05:00
Cola f4ff44a6d9
Fix batch_encode_plus (#3041) 2020-02-27 09:56:47 -05:00
Martin Malmsten f71157529e Added test for AlbertForTokenClassification 2020-02-27 12:24:20 +01:00
Martin Malmsten aceb6a0907 Added test for AlbertForTokenClassification 2020-02-27 11:52:46 +01:00
Martin Malmsten d762d4289c Code now passes style enforcement 2020-02-26 23:50:40 +01:00
Martin Malmsten 9495d38b0d Changes from reviews. 2020-02-26 23:36:39 +01:00
Julien Chaumond b370cc7e99 [gpu] Fixup fdd61b1992 2020-02-26 21:48:49 +00:00
Julien Chaumond f5516805c2 Fix bart slow test 2020-02-26 20:47:49 +00:00
Andrew Walker 5bc99e7f33
fix several typos in Distil* readme (#3034) 2020-02-26 12:39:54 -05:00
Patrick von Platen fdd61b1992
Fix attn mask gpt2 when using past (#3033)
* fix issue and add some tests

* fix issue and add some tests

* updated doc string gpt2
2020-02-26 12:04:37 -05:00
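A hedged sketch of the situation the fix addresses: incremental decoding with a cached `past`, where the attention mask has to cover the cached tokens as well as the new input (argument names follow the 2.x era; `past` was later renamed `past_key_values`):

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

input_ids = tokenizer.encode("Hello, my dog", return_tensors="pt")
attention_mask = torch.ones_like(input_ids)

outputs = model(input_ids, attention_mask=attention_mask)
logits, past = outputs[0], outputs[1]

for _ in range(5):
    next_token = logits[:, -1, :].argmax(dim=-1, keepdim=True)
    # Grow the mask so it also covers the tokens already stored in `past`.
    attention_mask = torch.cat([attention_mask, torch.ones_like(next_token)], dim=-1)
    outputs = model(next_token, past=past, attention_mask=attention_mask)
    logits, past = outputs[0], outputs[1]
```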
Julien Chaumond 9cda3620b6
Fix (non-slow) tests on GPU (torch) (#3024)
* Fix tests on GPU (torch)

* Fix bart slow tests

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
2020-02-26 11:59:25 -05:00
Sam Shleifer 9df74b8bc4
Delete all mentions of Model2Model (#3019) 2020-02-26 11:36:27 -05:00
Lysandre Debut bb7c468520
Documentation (#2989)
* All Tokenizers

BertTokenizer + few fixes
RobertaTokenizer
OpenAIGPTTokenizer + Fixes
GPT2Tokenizer + fixes
TransfoXLTokenizer
Correct rst for TransformerXL
XLMTokenizer + fixes
XLNet Tokenizer + Style
DistilBERT + Fix XLNet RST
CTRLTokenizer
CamemBERT Tokenizer
FlaubertTokenizer
XLMRobertaTokenizer
cleanup

* cleanup
2020-02-25 18:43:36 -05:00
Patrick von Platen c913eb9c38
Add integration tests for xlm roberta modelling and xlm roberta tokenizer (#3014)
* add first files

* add xlm roberta integration tests

* make style

* flake 8 issues solved
2020-02-25 16:51:25 -05:00
srush e8ce63ff21
Change masking to direct labeling for TPU support. (#2982)
* change masking to direct labeling

* fix black

* switch to ignore index

* .

* fix black
2020-02-25 14:47:43 -05:00
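The technique is the standard PyTorch one: instead of boolean-masking the loss (which creates dynamic shapes that XLA/TPU handles poorly), ignored positions are labeled with the loss function's ignore index so every tensor keeps a static shape. A small self-contained sketch:

```python
import torch
import torch.nn as nn

logits = torch.randn(8, 5)                             # (num_tokens, num_labels)
labels = torch.tensor([1, 2, -100, 0, 4, -100, 3, 1])  # -100 marks ignored positions

# CrossEntropyLoss skips positions whose label equals ignore_index,
# so no masking or dynamic reshaping is needed.
loss = nn.CrossEntropyLoss(ignore_index=-100)(logits, labels)
```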
Jhuo IH 7a7ee28cb9
missing ner link (#2967) 2020-02-25 14:06:57 -05:00
Lysandre Debut 65e7c90a77
Adding usage examples for common tasks (#2850)
* Usage: Sequence Classification & Question Answering

* Pipeline example

* Language modeling

* TensorFlow code for Sequence classification

* Custom TF/PT toggler in docs

* QA + LM for TensorFlow

* Finish Usage for both PyTorch and TensorFlow

* Addressing Julien's comments

* More assertive

* cleanup

* Favicon
- added favicon option in conf.py along with the favicon image
- updated 🤗 logo. Slightly smaller and should appear more consistent across editing programs (no more tongue on the outside of the mouth)

Co-authored-by: joshchagani <joshua@joshuachagani.com>
2020-02-25 13:48:24 -05:00
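In the spirit of the new usage pages, a short sequence-classification example (the MRPC-finetuned checkpoint and the sentence pair are illustrative choices, not lifted from the docs themselves):

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased-finetuned-mrpc")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-cased-finetuned-mrpc")

# Paraphrase classification on a sentence pair.
inputs = tokenizer.encode_plus("The company is doing well.",
                               "Business is going great.",
                               return_tensors="pt")
logits = model(**inputs)[0]
print(torch.softmax(logits, dim=-1))
```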
Julien Chaumond e693cd1e87 [ci] Run slow tests every day 2020-02-24 19:54:47 -05:00
Julien Chaumond 4fc63151af [ci] Attempt to fix #2844 2020-02-24 19:51:34 -05:00
Lysandre Debut b90745c590
Test correct tokenizers after default switch (#3003) 2020-02-24 18:45:53 -05:00
Lysandre Debut 3716c3d8af
False by default (#3002) 2020-02-24 18:30:57 -05:00
Lysandre f9ec5ca90b Release: v2.5.1 2020-02-24 18:22:54 -05:00
Funtowicz Morgan 4cd9c0971c
Fix for fast tokenizers save_pretrained compatibility with Python. (#2933)
* Renamed file generated by tokenizers when calling save_pretrained to match Python.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Added save_vocabulary tests.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Remove python quick and dirty fix for clean Rust impl.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Bump tokenizers dependency to 0.5.1

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* TransfoXLTokenizerFast uses a json vocabulary file + warning about incompatibility between Python and Rust

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Added some save_pretrained / from_pretrained unittests.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Update tokenizers to 0.5.2

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Quality and format.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* flake8

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Making sure there is really a bug in unittest

* Fix TransfoXL constructor vocab_file / pretrained_vocab_file mixin.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
2020-02-24 18:20:42 -05:00
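A round trip makes the point of the fix concrete: files written by a fast (Rust-backed) tokenizer should be loadable by the plain Python class again. A minimal sketch, with an illustrative path and model id:

```python
import os
from transformers import BertTokenizer, BertTokenizerFast

fast = BertTokenizerFast.from_pretrained("bert-base-uncased")
os.makedirs("./my-bert-tokenizer", exist_ok=True)
fast.save_pretrained("./my-bert-tokenizer")

# After the fix the saved files follow the Python naming scheme,
# so the slow tokenizer can read them back.
slow = BertTokenizer.from_pretrained("./my-bert-tokenizer")
```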
Sandro Cavallari ee60840ee6
fix _update_memory fn call in transformer-xl (#2971) 2020-02-24 17:50:24 -05:00
Patrick von Platen 6a50d501ec
add explaining example to XLNet LM modeling (#2997)
* add explaining example to XLNet LM modeling

* improve docstring for xlnet
2020-02-24 15:42:38 -05:00
Patrick von Platen 65d74c4965
Add preprocessing step for transfo-xl tokenization to avoid tokenizing words followed by punctuation to <unk> (#2987)
* add preprocessing to add space before punctuation for transfo_xl

* improve warning messages

* make style

* compile regex at instantiation of tokenizer object
2020-02-24 15:11:10 -05:00
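A sketch of the idea only, not the tokenizer's actual implementation: inserting a space before punctuation keeps a word-level vocabulary like transfo-xl's from mapping "Henson?" to <unk>.

```python
import re

PUNCTUATION = r"[.,!?;:]"
text = "Who was Jim Henson? Jim Henson was a puppeteer."

# Split punctuation off the preceding word so each piece is a known token.
spaced = re.sub(rf"(\S)({PUNCTUATION})", r"\1 \2", text)
print(spaced)  # Who was Jim Henson ? Jim Henson was a puppeteer .
```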
Bram Vanroy a143d9479e
Add local_files_only parameter to pretrained items (#2930)
* Add disable_outgoing to pretrained items

Setting disable_outgoing=True disables outgoing traffic:
- etags are not looked up
- models are not downloaded

* parameter name change

* Remove forgotten print
2020-02-24 14:58:15 -05:00
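Usage of the new parameter under its final name, `local_files_only`; this only succeeds if the files are already in the local cache, since no network requests are made:

```python
from transformers import AutoModel, AutoTokenizer

# No etag lookups, no downloads: everything must already be cached locally.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased", local_files_only=True)
model = AutoModel.from_pretrained("bert-base-uncased", local_files_only=True)
```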
Manuel Romero 286d1ec746 Create README.md 2020-02-24 14:33:49 -05:00
Lysandre Debut 7984a70ee4
kwargs are passed to both model and configuration in AutoModels (#2998) 2020-02-24 14:19:39 -05:00
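A short snippet shows the behavior: configuration-level kwargs passed to an AutoModel's `from_pretrained` now end up on the configuration object.

```python
from transformers import AutoModel

model = AutoModel.from_pretrained("bert-base-uncased", output_attentions=True)
assert model.config.output_attentions  # the kwarg reached the configuration
```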
Lysandre Debut 21d8b6a33e
Testing that batch_encode_plus is the same as encode_plus (#2973)
* Testing that encode_plus and batch_encode_plus behave the same way

Spoiler alert: they don't

* Testing rest of arguments in batch_encode_plus

* Test tensor return in batch_encode_plus

* Addressing Sam's comments

* flake8

* Simplified with `num_added_tokens`
2020-02-24 12:09:46 -05:00
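The property under test, sketched directly (exact return containers can vary between versions, but the ids should agree):

```python
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
sentences = ["Hello world", "Transformers are great"]

one_by_one = [tokenizer.encode_plus(s)["input_ids"] for s in sentences]
batched = tokenizer.batch_encode_plus(sentences)["input_ids"]
assert one_by_one == batched  # encode_plus and batch_encode_plus should agree
```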
Patrick von Platen 17c45c39ed
Add slow generate tests for pretrained lm models (#2909)
* add slow generate lm_model tests

* fix conflicts

* merge conflicts

* fix conflicts

* add slow generate lm_model tests

* make style

* delete unused variable

* fix conflicts

* fix conflicts

* fix conflicts

* delete unused variable

* fix conflicts

* finished hard coded tests
2020-02-24 11:51:57 -05:00
Lysandre Debut 8194df8e0c
Warning on `add_special_tokens` (#2966)
Warning on `add_special_tokens` when passed to `encode`, `encode_plus` and `batch_encode_plus`
2020-02-24 08:42:54 -05:00
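For reference, what the flag controls when passed to `encode`; the new warning itself fires under the conditions described in the PR, and this snippet only shows the effect of the flag:

```python
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

print(tokenizer.encode("Hello world", add_special_tokens=True))   # includes [CLS] ... [SEP]
print(tokenizer.encode("Hello world", add_special_tokens=False))  # wordpiece ids only
```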
Patrick von Platen 38f5fe9e02
add_ctags_to_git_ignore (#2984) 2020-02-23 16:55:32 -05:00
Martin Malmsten 105dcb4162 Now passes style guide enforcement 2020-02-23 21:47:59 +01:00
Martin Malmsten 33eb8a165d Added , 2020-02-23 21:43:31 +01:00