transformers

Commit Graph

Author	SHA1	Message	Date
Funtowicz Morgan	db0076a9df	Conversion script to export transformers models to ONNX IR. (#4253 ) * Added generic ONNX conversion script for PyTorch model. * WIP initial TF support. * TensorFlow/Keras ONNX export working. * Print framework version info * Add possibility to check the model is correctly loading on ONNX runtime. * Remove quantization option. * Specify ONNX opset version when exporting. * Formatting. * Remove unused imports. * Make functions more generally reusable from other part of the code. * isort happy. * flake happy * Export only feature-extraction for now * Correctly check inputs order / filter before export. * Removed task variable * Fix invalid args call in load_graph_from_args. * Fix invalid args call in convert. * Fix invalid args call in infer_shapes. * Raise exception and catch in caller function instead of exit. * Add 04-onnx-export.ipynb notebook * More WIP on the notebook * Remove unused imports * Simplify & remove unused constants. * Export with constant_folding in PyTorch * Let's try to put function args in the right order this time ... * Disable external_data_format temporary * ONNX notebook draft ready. * Updated notebooks charts + wording * Correct error while exporting last chart in notebook. * Adressing @LysandreJik comment. * Set ONNX opset to 11 as default value. * Set opset param mandatory * Added ONNX export unittests * Quality. * flake8 happy * Add keras2onnx dependency on extras["tf"] * Pin keras2onnx on github master to v1.6.5 * Second attempt. * Third attempt. * Use the right repo URL this time ... * Do the same for onnxconverter-common * Added keras2onnx and onnxconveter-common to 1.7.0 to supports TF2.2 * Correct commit hash. * Addressing PR review: Optimization are enabled by default. * Addressing PR review: small changes in the notebook * setup.py comment about keras2onnx versioning.	2020-05-14 16:35:52 -04:00
Julien Chaumond	448c467256	Fix: unpin flake8 and fix cs errors (#4367 ) * Fix: unpin flake8 and fix cs errors * Ok we still need to quote those	2020-05-14 13:14:26 -04:00
Julien Chaumond	015f7812ed	[ci skip] Pin isort	2020-05-14 10:12:18 -04:00
Lysandre	7cb203fae4	Release: v2.9.1	2020-05-13 17:38:50 -04:00
Funtowicz Morgan	7d7fe4997f	Allow BatchEncoding to be initialized empty. (#4316 ) * Allow BatchEncoding to be initialized empty. This is required by recent changes introduced in TF 2.2. * Attempt to unpin Tensorflow to 2.2 with the previous commit.	2020-05-12 15:02:46 -04:00
Lysandre Debut	30e343862f	pin TF to 2.1 (#4297 ) * pin TF to 2.1 * Pin flake8 as well	2020-05-11 21:03:30 -04:00
Julien Plu	94b57bf796	[TF 2.2 compat] use tf.VariableAggregation.ONLY_FIRST_REPLICA (#4283 ) * Fix the issue to properly run the accumulator with TF 2.2 * Apply style * Fix training_args_tf for TF 2.2 * Fix the TF training args when only one GPU is available * Remove the fixed version of TF in setup.py	2020-05-11 11:28:37 -04:00
Lysandre	2e57824374	Pin isort and tf <= 2.1.0	2020-05-07 14:42:00 -04:00
Lysandre	e7cfc1a313	Release: v2.9.0	2020-05-07 14:15:20 -04:00
Lysandre Debut	79b1c6966b	Pytorch 1.5.0 (#3973 ) * Standard deviation can no longer be set to 0 * Remove torch pinned version * 9th instead of 10th, silly me	2020-05-05 10:23:01 -04:00
Sam Shleifer	18db92dd9a	[testing] add timeout_decorator (#3543 )	2020-05-01 09:05:47 -04:00
Julien Chaumond	97a375484c	rm boto3 dependency	2020-04-27 11:17:14 -04:00
Anthony MOI	13dd2acca4	Bump tokenizers version to final 0.7.0 (#3898 )	2020-04-22 11:02:29 -04:00
Julien Chaumond	eb5601b0a5	[ci] Pin torch version while we update	2020-04-21 15:46:18 -04:00
Thomas Wolf	827d6d6ef0	Cleanup fast tokenizers integration (#3706 ) * First pass on utility classes and python tokenizers * finishing cleanup pass * style and quality * Fix tests * Updating following @mfuntowicz comment * style and quality * Fix Roberta * fix batch_size/seq_length inBatchEncoding * add alignement methods + tests * Fix OpenAI and Transfo-XL tokenizers * adding trim_offsets=True default for GPT2 et RoBERTa * style and quality * fix tests * add_prefix_space in roberta * bump up tokenizers to rc7 * style * unfortunately tensorfow does like these - removing shape/seq_len for now * Update src/transformers/tokenization_utils.py Co-Authored-By: Stefan Schweter <stefan@schweter.it> * Adding doc and docstrings * making flake8 happy Co-authored-by: Stefan Schweter <stefan@schweter.it>	2020-04-18 13:43:57 +02:00
Anthony MOI	b7cf9f43d2	Update tokenizers to 0.7.0-rc5 (#3705 )	2020-04-10 14:23:49 -04:00
Funtowicz Morgan	96ab75b8dd	Tokenizers v3.0.0 (#3185 ) * Renamed num_added_tokens to num_special_tokens_to_add Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Cherry-Pick: Partially fix space only input without special tokens added to the output #3091 Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Added property is_fast on PretrainedTokenizer and PretrainedTokenizerFast Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Make fast tokenizers unittests work on Windows. * Entirely refactored unittest for tokenizers fast. * Remove ABC class for CommonFastTokenizerTest * Added embeded_special_tokens tests from allenai @dirkgr * Make embeded_special_tokens tests from allenai more generic * Uniformize vocab_size as a property for both Fast and normal tokenizers * Move special tokens handling out of PretrainedTokenizer (SpecialTokensMixin) * Ensure providing None input raise the same ValueError than Python tokenizer + tests. * Fix invalid input for assert_padding when testing batch_encode_plus * Move add_special_tokens from constructor to tokenize/encode/[batch_]encode_plus methods parameter. * Ensure tokenize() correctly forward add_special_tokens to rust. * Adding None checking on top on encode / encode_batch for TransfoXLTokenizerFast. Avoid stripping on None values. * unittests ensure tokenize() also throws a ValueError if provided None * Added add_special_tokens unittest for all supported models. * Style * Make sure TransfoXL test run only if PyTorch is provided. * Split up tokenizers tests for each model type. * Fix invalid unittest with new tokenizers API. * Filter out Roberta openai detector models from unittests. * Introduce BatchEncoding on fast tokenizers path. This new structure exposes all the mappings retrieved from Rust. It also keeps the current behavior with model forward. * Introduce BatchEncoding on slow tokenizers path. Backward compatibility. * Improve error message on BatchEncoding for slow path * Make add_prefix_space True by default on Roberta fast to match Python in majority of cases. * Style and format. * Added typing on all methods for PretrainedTokenizerFast * Style and format * Added path for feeding pretokenized (List[str]) input to PretrainedTokenizerFast. * Style and format * encode_plus now supports pretokenized inputs. * Remove user warning about add_special_tokens when working on pretokenized inputs. * Always go through the post processor. * Added support for pretokenized input pairs on encode_plus * Added is_pretokenized flag on encode_plus for clarity and improved error message on input TypeError. * Added pretokenized inputs support on batch_encode_plus * Update BatchEncoding methods name to match Encoding. * Bump setup.py tokenizers dependency to 0.7.0rc1 * Remove unused parameters in BertTokenizerFast * Make sure Roberta returns token_type_ids for unittests. * Added missing typings * Update add_tokens prototype to match tokenizers side and allow AddedToken * Bumping tokenizers to 0.7.0rc2 * Added documentation for BatchEncoding * Added (unused) is_pretokenized parameter on PreTrainedTokenizer encode_plus/batch_encode_plus methods. * Added higher-level typing for tokenize / encode_plus / batch_encode_plus. * Fix unittests failing because add_special_tokens was defined as a constructor parameter on Rust Tokenizers. * Fix text-classification pipeline using the wrong tokenizer * Make pipelines works with BatchEncoding * Turn off add_special_tokens on tokenize by default. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Remove add_prefix_space from tokenize call in unittest. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Style and quality Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Correct message for batch_encode_plus none input exception. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Fix invalid list comprehension for offset_mapping overriding content every iteration. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * TransfoXL uses Strip normalizer. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Bump tokenizers dependency to 0.7.0rc3 Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Support AddedTokens for special_tokens and use left stripping on mask for Roberta. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * SpecilaTokenMixin can use slots to faster access to underlying attributes. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Remove update_special_tokens from fast tokenizers. * Ensure TransfoXL unittests are run only when torch is available. * Style. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Style * Style 🙏🙏 * Remove slots on SpecialTokensMixin, need deep dive into pickle protocol. * Remove Roberta warning on __init__. * Move documentation to Google style. Co-authored-by: LysandreJik <lysandre.debut@reseau.eseo.fr>	2020-04-07 00:29:15 +02:00
LysandreJik	ea6dba2787	Re-pin isort	2020-04-06 10:09:54 -04:00
LysandreJik	11c3257a18	unpin isort for pypi	2020-04-06 10:06:41 -04:00
LysandreJik	36bffc81b3	Release: v2.8.0	2020-04-06 10:03:53 -04:00
LysandreJik	eff757f2e3	Re-pin isort version	2020-03-30 09:00:47 -04:00
LysandreJik	a009d751c2	Un-pin isort for v2.7.0 pypi	2020-03-30 08:55:10 -04:00
LysandreJik	6f5a12a583	Release: v2.7.0	2020-03-30 08:49:24 -04:00
Patrick von Platen	b4fb94fe6d	revert unpin isort commit	2020-03-26 13:19:18 -04:00
Julien Chaumond	83272a3853	Experiment w/ dataclasses (including Py36) (#3423 ) * [ci] Also run test_examples in py37 (will revert at the end of the experiment) * InputExample: use immutable dataclass * [deps] Install dataclasses for Py<3.7 * [skip ci] Revert "[ci] Also run test_examples in py37" This reverts commit `d29afd9959`.	2020-03-25 11:10:20 -04:00
LysandreJik	fbc5bf10cf	v2.6.0 release: isort un-pinned	2020-03-24 11:52:02 -04:00
LysandreJik	471cce24b3	Release: v2.6.0	2020-03-24 10:37:32 -04:00
Julien Chaumond	ec6766a363	[deps] scikit-learn's transient issue was fixed	2020-03-23 18:38:09 -04:00
LysandreJik	e52482909b	Correct order for dev/quality dependencies cc @julien-c	2020-03-23 12:01:23 -04:00
Julien Chaumond	18eec3a984	[ci] simpler way to load correct version of isort hat/tip @bramvanroy	2020-03-23 10:03:22 -04:00
Bram Vanroy	115abd2166	Handle pinned version of isort The CONTRIBUTING file pins to a specific version of isort, so we might as well install that in `dev` . This makes it easier for contributors so they don't have to manually install the specific commit.	2020-03-20 18:00:04 -04:00
Thomas Wolf	2187c49f5c	CPU/GPU memory benchmarking utilities - Remove support for python 3.5 (now only 3.6+) (#3186 ) * memory benchmark rss * have both forward pass and line-by-line mem tracing * cleaned up tracing * refactored and cleaning up API * no f-strings yet... * add GPU mem logging * fix GPU memory monitoring * style and quality * clean up and doc * update with comments * Switching to python 3.6+ * fix quality	2020-03-17 10:17:11 -04:00
Patrick von Platen	34de670dbe	fix sklearn release circle ci [temporary] (#3123 )	2020-03-04 11:25:23 -05:00
Lysandre	f9ec5ca90b	Release: v2.5.1	2020-02-24 18:22:54 -05:00
Funtowicz Morgan	4cd9c0971c	Fix for fast tokenizers save_pretrained compatibility with Python. (#2933 ) * Renamed file generate by tokenizers when calling save_pretrained to match python. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Added save_vocabulary tests. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Remove python quick and dirty fix for clean Rust impl. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Bump tokenizers dependency to 0.5.1 Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * TransfoXLTokenizerFast uses a json vocabulary file + warning about incompatibility between Python and Rust Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Added some save_pretrained / from_pretrained unittests. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Update tokenizers to 0.5.2 Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Quality and format. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * flake8 Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Making sure there is really a bug in unittest * Fix TransfoXL constructor vocab_file / pretrained_vocab_file mixin. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>	2020-02-24 18:20:42 -05:00
Lysandre	59c23ad9c9	README link + better instructions for release	2020-02-19 11:57:17 -05:00
Lysandre	fb560dcb07	Release: v2.5.0 Welcome Rust Tokenizers	2020-02-19 11:46:19 -05:00
Funtowicz Morgan	3f3fa7f7da	Integrate fast tokenizers library inside transformers (#2674 ) * Implemented fast version of tokenizers Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Bumped tokenizers version requirements to latest 0.2.1 Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Added matching tests Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Matching OpenAI GPT tokenization ! Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Matching GPT2 on tokenizers Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Expose add_prefix_space as constructor parameter for GPT2 Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Matching Roberta tokenization ! Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Removed fast implementation of CTRL. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Binding TransformerXL tokenizers to Rust. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Updating tests accordingly. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Added tokenizers as top-level modules. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Black & isort. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Rename LookupTable to WordLevel to match Rust side. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Black. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Use "fast" suffix instead of "ru" for rust tokenizers implementations. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Introduce tokenize() method on fast tokenizers. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * encode_plus dispatchs to batch_encode_plus Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * batch_encode_plus now dispatchs to encode if there is only one input element. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Bind all the encode_plus parameter to the forwarded batch_encode_plus call. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Bump tokenizers dependency to 0.3.0 Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Formatting. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Fix tokenization_auto with support for new (python, fast) mapping schema. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Give correct fixtures path in test_tokenization_fast.py for the CLI. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Expose max_len_ properties on BertTokenizerFast Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Move max_len_ properties to PreTrainedTokenizerFast and override in specific subclasses. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * _convert_encoding should keep the batch axis tensor if only one sample in the batch. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Add warning message for RobertaTokenizerFast if used for MLM. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Added use_fast (bool) parameter on AutoTokenizer.from_pretrained(). This allows to easily enable/disable Rust-based tokenizer instantiation. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Let's tokenizers handle all the truncation and padding stuff. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Allow to provide tokenizer arguments during pipeline creation. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Update test_fill_mask pipeline to not use fast tokenizers. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Fix too much parameters for convert_encoding. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * When enabling padding, max_length should be set to None. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Avoid returning nested tensors of length 1 when calling encode_plus Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Ensure output is padded when return_tensor is not None. Tensor creation requires the inital list input to be of the exact same size. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Disable transfoxl unittest if pytorch is not available (required to load the model) Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * encode_plus should not remove the leading batch axis if return_tensor is set Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Temporary disable fast tokenizers on QA pipelines. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Fix formatting issues. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Update tokenizers to 0.4.0 * Update style * Enable truncation + stride unit test on fast tokenizers. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Add unittest ensuring special_tokens set match between Python and Rust. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Ensure special_tokens are correctly set during construction. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Give more warning feedback to the user in case of padding without pad_token. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * quality & format. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Added possibility to add a single token as str Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Added unittest for add_tokens and add_special_tokens on fast tokenizers. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Fix rebase mismatch on pipelines qa default model. QA requires cased input while the tokenizers would be uncased. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Addressing review comment: Using offset mapping relative to the original string + unittest. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Addressing review comment: save_vocabulary requires folder and file name Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Addressing review comment: Simplify import for Bert. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Addressing review comment: truncate_and_pad disables padding according to the same heuristic than the one enabling padding. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Addressing review comment: Remove private member access in tokenize() Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Addressing review comment: Bump tokenizers dependency to 0.4.2 Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * format & quality. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Addressing review comment: Use named arguments when applicable. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Addressing review comment: Add Github link to Roberta/GPT2 space issue on masked input. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Addressing review comment: Move max_len_single_sentence / max_len_sentences_pair to PreTrainedTokenizerFast + tests. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Addressing review comment: Relax type checking to include tuple and list object. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Addressing review comment: Document the truncate_and_pad manager behavior. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Raise an exception if return_offsets_mapping is not available with the current tokenizer. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Ensure padding is set on the tokenizers before setting any padding strategy + unittest. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * On pytorch we need to stack tensor to get proper new axis. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Generalize tests to different framework removing hard written return_tensors="..." Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Bump tokenizer dependency for num_special_tokens_to_add Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Overflowing tokens in batch_encode_plus are now stacked over the batch axis. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Improved error message for padding strategy without pad token. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Bumping tokenizers dependency to 0.5.0 for release. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Optimizing convert_encoding around 4x improvement. 🚀 Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * expose pad_to_max_length in encode_plus to avoid duplicating the parameters in kwargs Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Generate a proper overflow_to_sampling_mapping when return_overflowing_tokens is True. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Fix unittests for overflow_to_sampling_mapping not being returned as tensor. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Format & quality. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Remove perfect alignment constraint for Roberta (allowing 1% difference max) Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Triggering final CI Co-authored-by: MOI Anthony <xn1t0x@gmail.com>	2020-02-19 11:35:40 -05:00
Morgan Funtowicz	6aa7973aec	Fix circleci cuInit error on Tensorflow >= 2.1.0. Tensorflow 2.1.0 introduce a new dependency model where pip install tensorflow would install tf with GPU support. Before it would just install with CPU support, thus CircleCI is looking for NVidia driver version at initialization of the tensorflow related tests but fails as their is no NVidia Driver running. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>	2020-02-10 13:24:37 +01:00
Lysandre	d426b58b9e	Patch: v2.4.1	2020-01-31 14:55:33 -05:00
Lysandre	8036ceb7c5	Update commands for pypi test	2020-01-31 09:48:15 -05:00
Lysandre	6664ea943d	Release: v2.4.0	2020-01-31 09:40:32 -05:00
Julien Chaumond	5004d5af42	[serving] Update dependencies	2020-01-27 19:58:00 -05:00
Brendan Roof	23c6998bf4	Add lower bound to tqdm for tqdm.auto - It appears that `tqdm` only introduced `tqdm.auto` in 4.27. - See https://github.com/tqdm/tqdm/releases/tag/v4.27.0. - Without the lower bound I received the following stack trace in an environment where I already had tqdm installed: ``` File "/home/brendanr/anaconda3/envs/allennlp/lib/python3.6/site-packages/transformers/__init__.py", line 20, in <module> from .file_utils import (TRANSFORMERS_CACHE, PYTORCH_TRANSFORMERS_CACHE, PYTORCH_PRETRAINED_BERT_CACHE, File "/home/brendanr/anaconda3/envs/allennlp/lib/python3.6/site-packages/transformers/file_utils.py", line 24, in <module> from tqdm.auto import tqdm ModuleNotFoundError: No module named 'tqdm.auto' ```	2020-01-17 18:29:11 -05:00
alberduris	81d6841b4b	GPU text generation: mMoved the encoded_prompt to correct device	2020-01-06 15:11:12 +01:00
alberduris	dd4df80f0b	Moved the encoded_prompts to correct device	2020-01-06 15:11:12 +01:00
Anthony MOI	bfe870be65	Hotfix tokenizers version for sdist installs	2019-12-27 11:05:52 -05:00
Anthony MOI	1f82a5d910	Update for changes in tokenizers API	2019-12-26 14:37:55 -05:00
Anthony MOI	734d29b03d	tokenizers is now a real dependency	2019-12-24 13:32:41 -05:00
Aymeric Augustin	7a865821d9	Remove stray egg-info directory automatically. If a user or contributor ran `pip install -e .` on transformers < 3.0, pip created a transformers.egg-info directory next to the transformers directory at the root of the repository. In transformers 3.0, the source is in a `src` subdirectory. `pip install -e .` creates a transformers.egg-info directory there. However, pip will still pick transformers.egg-info from the previous location. This is a bug: https://github.com/pypa/pip/issues/5466 Users and contributors are likely to hit this problem because the documentation for transformers 3.0 relies heavily on extra_requires which didn't exist in earlier versions, so aren't defined in a stale transformers.egg-info directory. If such a directory exists, remove it. It's autogenerated, gitignored and not supposed to contain anything of value.	2019-12-23 21:06:23 +01:00
Aymeric Augustin	d73eb552e8	Remove requirements.txt. It's redundant with setup.py and, also, incomplete (e.g. numpy).	2019-12-23 19:15:08 +01:00
Aymeric Augustin	76a1417f2a	Include all optional dependencies in extras. Take advantage of this to simplify the Circle CI configuration. Don't bother with tensorboardX: it's a fallback for PyTorch < 1.1.0.	2019-12-23 19:14:31 +01:00
Aymeric Augustin	f2522869ea	Review and update setup.py.	2019-12-23 18:45:42 +01:00
Aymeric Augustin	1c62e87b34	Use built-in open(). On Python 3, `open is io.open`.	2019-12-22 18:38:56 +01:00
Aymeric Augustin	d6eaf4e6d2	Update comments mentioning Python 2.	2019-12-22 18:38:56 +01:00
Aymeric Augustin	6be7cdda66	Move source code inside a src subdirectory. This prevents transformers from being importable simply because the CWD is the root of the git repository, while not being importable from other directories. That led to inconsistent behavior, especially in examples. Once you fetch this commit, in your dev environment, you must run: $ pip uninstall transformers $ pip install -e .	2019-12-22 14:15:13 +01:00
Aymeric Augustin	158e82e061	Sort imports with isort. This is the result of: $ isort --recursive examples templates transformers utils hubconf.py setup.py	2019-12-22 10:57:46 +01:00
Aymeric Augustin	fa84ae26d6	Reformat source code with black. This is the result of: $ black --line-length 119 examples templates transformers utils hubconf.py setup.py There's a lot of fairly long lines in the project. As a consequence, I'm picking the longest widely accepted line length, 119 characters. This is also Thomas' preference, because it allows for explicit variable names, to make the code easier to understand.	2019-12-21 17:52:29 +01:00
Aymeric Augustin	a4c9338b83	Prevent parallel downloads of the same file with a lock. Since the file is written to the filesystem, a filesystem lock is the way to go here. Add a dependency on the third-party filelock library to get cross-platform functionality.	2019-12-21 08:43:19 +01:00
Lysandre	a436574bfd	Release: v2.3.0	2019-12-20 16:22:20 -05:00
thomwolf	73fcebf7ec	update serving command	2019-12-20 13:47:35 +01:00
thomwolf	407093b3fa	Merge branch 'cli' of https://github.com/huggingface/transformers into cli	2019-12-19 20:26:51 +01:00
thomwolf	c7be096c39	Merge branch 'master' into cli	2019-12-19 20:26:08 +01:00
Morgan Funtowicz	a305067f2d	Removed __main__	2019-12-19 19:41:48 +01:00
Lysandre	5e289f69bc	regex 2019.12.17 install fails with Python 2	2019-12-17 15:54:05 -05:00
Morgan Funtowicz	d7c62661a3	Provide serving dependencies for tensorflow and pytorch (serving-tf, serving-torch)	2019-12-17 11:23:39 +01:00
Lysandre	7bd11dda6f	Release: v2.2.2	2019-12-13 16:45:30 -05:00
thomwolf	72c36b9ea2	[WIP] - CLI	2019-12-10 11:33:14 +01:00
Aymeric Augustin	35401fe50f	Remove dependency on pytest for running tests (#2055 ) * Switch to plain unittest for skipping slow tests. Add a RUN_SLOW environment variable for running them. * Switch to plain unittest for PyTorch dependency. * Switch to plain unittest for TensorFlow dependency. * Avoid leaking open files in the test suite. This prevents spurious warnings when running tests. * Fix unicode warning on Python 2 when running tests. The warning was: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal * Support running PyTorch tests on a GPU. Reverts `27e015bd`. * Tests no longer require pytest. * Make tests pass on cuda	2019-12-06 13:57:38 -05:00
Julien Chaumond	e4fbf3e2cc	CLI for authenticated file sharing	2019-12-04 00:52:23 -05:00
LysandreJik	8101924a68	Patch: v2.2.1	2019-12-03 11:20:26 -05:00
Lysandre	ae98d45991	Release: v2.2.0	2019-11-26 14:12:44 -05:00
Lysandre	3ddce1d74c	Release: 2.1.1	2019-10-11 06:37:49 -04:00
LysandreJik	9c2e0a4acf	Release: 2.1.0	2019-10-09 12:14:03 -04:00
thomwolf	e4e35296fb	update setup.py metadata	2019-09-26 13:52:24 +02:00
thomwolf	9676d1a2a8	update readme and setup.py	2019-09-26 13:47:58 +02:00
thomwolf	31c23bd5ee	[BIG] pytorch-transformers => transformers	2019-09-26 10:15:53 +02:00
thomwolf	9d0a11a68c	update dependencies and circle-ci	2019-09-08 15:02:06 +03:00
thomwolf	89fd3450a6	Release: 1.2.0	2019-09-04 13:32:18 +02:00
Shijie Wu	ca4baf8ca1	Match order of casing in OSS XLM; Improve document; Clean up dependency	2019-08-27 20:03:18 -04:00
Shijie Wu	e85123d398	Add custom tokenizer for zh and ja	2019-08-23 20:27:52 -04:00
Shijie Wu	436ce07218	Tokenization behave the same as original XLM proprocessing for most languages except zh, ja and th; Change API to allow specifying language in `tokenize`	2019-08-23 14:40:17 -04:00
LysandreJik	fe02e45e48	Release: 1.1.0	2019-08-15 11:15:08 -04:00
thomwolf	58830807d1	inidicate we only support pytorch 1.0.0+ now	2019-08-05 14:38:59 +02:00
thomwolf	6b70760204	typos	2019-07-16 21:21:03 +02:00
thomwolf	ed7549bb1a	release version 1.0	2019-07-16 16:10:58 +02:00
thomwolf	eb91f6437e	update readme and setup	2019-07-05 12:30:15 +02:00
thomwolf	0bab55d5d5	[BIG] name change	2019-07-05 11:55:36 +02:00
thomwolf	32da75486b	add tokenizer and tests	2019-06-21 11:09:51 +02:00
thomwolf	b832d5bb8a	Release: 0.6.2	2019-04-25 21:37:47 +02:00
thomwolf	e0855e8929	forgot to add regex to requirements :(	2019-02-18 11:54:51 +01:00
thomwolf	009ee86a19	fix tests - bump up version	2019-02-17 23:57:23 +01:00
thomwolf	321d70a7a9	bump up to 0.5.1	2019-02-13 10:11:20 +01:00
thomwolf	448937c00d	python 2 compatibility	2019-02-06 00:07:46 +01:00
thomwolf	eed51c5bdf	add OpenAI GPT	2019-01-08 12:26:58 +01:00
Patrick Sodré	87c1244c7d	Convert scripts into entry_points The recommended approach to create launch scripts is to use entry_points and console_scripts. xref: https://packaging.python.org/guides/distributing-packages-using-setuptools/#scripts	2018-12-19 02:26:08 +00:00
thomwolf	ae88eb88a4	set encoding to 'utf-8' in calls to open	2018-12-14 13:48:58 +01:00
thomwolf	1cbb32a542	include version number + comment in setup.py	2018-12-13 12:50:44 +01:00
thomwolf	ce52177638	added version in __init__.py	2018-12-13 12:50:44 +01:00
thomwolf	258eb50086	bump up version	2018-11-30 22:55:33 +01:00
thomwolf	ce37b8e481	bump version in setup.py	2018-11-26 10:45:48 +01:00
thomwolf	c8cba67742	clean up readme and examples	2018-11-17 12:19:16 +01:00
thomwolf	a99b971738	bump up version minor	2018-11-17 10:43:39 +01:00
thomwolf	4e46affc34	updating examples	2018-11-17 10:30:54 +01:00
thomwolf	f920eff8c3	update readme	2018-11-17 08:42:45 +01:00
thomwolf	1de35b624b	preparing for first release	2018-11-15 20:56:10 +01:00

... 6 7 8 9 10

456 Commits