Commit Graph

7492 Commits

Author SHA1 Message Date
Patrick von Platen 9bc9e59869
[Flax generate] Add params to generate (#12171)
* fix_torch_device_generate_test

* remove @

* add params as input

* finish
2021-06-15 11:50:12 +01:00
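
A minimal sketch of how the new `params` argument from this PR might be used (model choice is illustrative):

```python
from transformers import AutoTokenizer, FlaxGPT2LMHeadModel

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = FlaxGPT2LMHeadModel.from_pretrained("gpt2")

inputs = tokenizer("Hello, my dog is", return_tensors="np")
# Weights can now be passed to generate() explicitly instead of implicitly
# using model.params, e.g. when generating with replicated or modified weights.
output_ids = model.generate(inputs["input_ids"], params=model.params, max_length=20)
print(tokenizer.decode(output_ids.sequences[0]))
```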
Sylvain Gugger a55dc157e3
Add video links to the documentation (#12162) 2021-06-15 06:37:37 -04:00
Stas Bekman 040283170c
consistent nn. and nn.functional: part 5 docs (#12161) 2021-06-14 13:34:32 -07:00
Stas Bekman 88e84186e5
[style] consistent nn. and nn.functional: part 4 `examples` (#12156)
* consistent nn. and nn.functional: p4 examples

* restore
2021-06-14 12:28:24 -07:00
Stas Bekman 372ab9cd6d
[style] consistent nn. and nn.functional: part 3 `tests` (#12155)
* consistent nn. and nn.functional: p3 templates

* restore
2021-06-14 12:18:22 -07:00
Vasudev Gupta d9c0d08f9a
Flax Big Bird (#11967)
* add flax bert

* bert -> bigbird

* original_full ported

* add debugger

* init block sparse

* fix copies ; gelu_fast -> gelu_new

* block sparse port

* fix block sparse

* block sparse working

* all ckpts working

* fix-copies

* make quality

* init tests

* temporary fix for FlaxBigBirdForMultipleChoice

* skip test_attention_outputs

* fix

* gelu_fast -> gelu_new ; fix multiple choice model

* remove nsp

* fix sequence classifier

* fix

* make quality

* make fix-copies

* finish

* Delete debugger.ipynb

* Update src/transformers/models/big_bird/modeling_flax_big_bird.py

* make style

* finish

* bye bye jit flax tests

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2021-06-14 20:01:03 +01:00
Stas Bekman a156da9a23
consistent nn. and nn.functional: p2 templates (#12153) 2021-06-14 11:41:24 -07:00
Patrick von Platen 007be9e402
[Flax] Fix flax pt equivalence tests (#12154)
* fix_torch_device_generate_test

* remove @

* upload
2021-06-14 19:19:10 +01:00
Will Rice d438eee030
Adding TFWav2Vec2Model (#11617)
* [WIP] Add TFWav2Vec2Model

Work in progress for adding a tensorflow version of Wav2Vec2

* feedback changes

* small fix

* Test Feedback Round 1

* Add SpecAugment and CTC Loss

* correct spec augment mask creation

* docstring and correct copyright

* correct bugs

* remove bogus file

* finish test corrections

* del unnecessary layers

* Update src/transformers/models/wav2vec2/modeling_tf_wav2vec2.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* make style

* correct final bug

* Feedback Changes

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2021-06-14 18:58:54 +01:00
Stas Bekman 1ed2ebf60d
[style] consistent nn. and nn.functional (#12124)
* consistent nn. and nn.functional

* fix glitch

* fix glitch #2
2021-06-14 09:44:28 -07:00
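
For context, the style this series of PRs converges on appears to be referencing functional ops through `nn.functional` rather than a separate `F` alias — a rough illustration inferred from the titles, not taken from the diffs:

```python
import torch
from torch import nn

x = torch.randn(4)
# One namespace for both modules and functional ops, instead of mixing
# `import torch.nn.functional as F` with `torch.nn.Linear`:
layer = nn.Linear(4, 2)
y = nn.functional.relu(layer(x))
```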
Stas Bekman ff7c81687a
[optim] implement AdafactorSchedule (#12123)
* implement AdafactorSchedule

* typo

* fix

* Update src/transformers/optimization.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-06-14 09:43:48 -07:00
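
Usage along the lines of the documented pairing — a sketch, with the model as a stand-in:

```python
import torch
from transformers.optimization import Adafactor, AdafactorSchedule

model = torch.nn.Linear(10, 2)  # stand-in for a real model
# With relative_step=True, Adafactor computes its learning rate internally...
optimizer = Adafactor(
    model.parameters(),
    scale_parameter=True, relative_step=True, warmup_init=True, lr=None,
)
# ...and AdafactorSchedule proxies that internal value so schedule-aware code
# (e.g. Trainer logging) can still call get_last_lr().
lr_scheduler = AdafactorSchedule(optimizer)
print(lr_scheduler.get_last_lr())
```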
Suraj Patil fe3576488a
fix error message (#12148) 2021-06-14 14:12:18 +01:00
Kumar Abhishek 9de62cfbce
[lm examples] Replicate --config_overrides addition to other LM examples (#12135)
* [lm examples] Replicate --config_overrides addition to other LM examples

* Remove changes to the no_trainer files

* Update README

Co-authored-by: Kumar Abhishek <kabhishek@expedia.com>
2021-06-14 08:12:22 -04:00
Nicholas Broad cd7961b632
Use text_column_name variable instead of "text" (#12132)
* Use text_column_name variable instead of "text"

`text_column_name` was already defined above where I made the changes, and it was also used below them.

This is a very minor change. If a dataset does not use "text" as its column name, the `tokenize_function` will now use whatever column is assigned to `text_column_name`, which falls back to the first column name when "text" is absent (see the sketch after this entry). This makes the function a little more robust, though presumably 90%+ of datasets use "text" anyway.

* black formatting

* make style

Co-authored-by: Nicholas Broad <nicholas@nmbroad.com>
2021-06-14 08:11:13 -04:00
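
A condensed sketch of the fallback logic described above (dataset and checkpoint names are placeholders):

```python
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
raw_datasets = load_dataset("imdb")  # any text dataset works here
column_names = raw_datasets["train"].column_names
# Prefer a column literally named "text"; otherwise fall back to the first column.
text_column_name = "text" if "text" in column_names else column_names[0]

def tokenize_function(examples):
    # Index by text_column_name instead of the hard-coded "text".
    return tokenizer(examples[text_column_name])
```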
Sylvain Gugger b8ab541340
Don't log anything before logging is setup in examples (#12121)
* Don't log anything before logging is setup in examples

* Last example
2021-06-14 08:03:33 -04:00
Patrick von Platen 7566fefa69
[Flax] Add links to google colabs (#12146)
* fix_torch_device_generate_test

* remove @

* add colab links
2021-06-14 11:00:29 +01:00
SaulLu 476ba679dd
Feature to use the PreTrainedTokenizerFast class as a stand-alone tokenizer (#11810)
* feature for tokenizer without slow/legacy version

* format

* modify common test

* add tests

* add PreTrainedTokenizerFast to AutoTokenizer

* format

* change tokenizer common test in order to be able to run test without a slow version

* update tokenizer fast test in order to use `rust_tokenizer_class` attribute instead of `tokenizer_class`

* add AutoTokenizer test

* replace `if self.tokenizer_class is not None` with `if self.tokenizer_class is None`

* remove obsolete change in comment

* Update src/transformers/tokenization_utils_base.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/tokenization_utils_fast.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* change `get_main_tokenizer` into `get_tokenizers`

* clarify `get_tokenizers` method

* homogenize with `test_slow_tokenizer` and `test_rust_tokenizer`

* add `test_rust_tokenizer = False` to tokenizers which don't define a fast version

* `test_rust_tokenizer = False` for BertJapaneseTokenizer

* `test_rust_tokenizer = False` for BertJapaneseCharacterTokenizationTest

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-06-14 11:58:44 +02:00
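
The feature in a nutshell — a sketch assuming a local `tokenizer.json` produced with the tokenizers library:

```python
from transformers import PreTrainedTokenizerFast

# Load a tokenizers-library tokenizer.json directly, with no slow/legacy
# Python tokenizer class backing it.
fast_tokenizer = PreTrainedTokenizerFast(tokenizer_file="tokenizer.json")
print(fast_tokenizer("Hello world")["input_ids"])
```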
Daniel Stancl 4a51b1dd9b
FlaxBart (#11537)
* Start working on FlaxBart

* Create modeling_flax_bart.py

* Write FlaxBartAttention

* Add FlaxBartEncoderLayer

* Add FlaxBartDecoderLayer and some typing

* Add helper function for FlaxBart

* shift_tokens_right

* _make_causal_mask

* _expand_mask

* Add PositionalEmbedding and fix init_std naming

* Add FlaxBartPretrainedModel

* Add FlaxBartEncoder

* Add FlaxBartEncoder

* Add FlaxBartEncoder among modules to be imported

* YET WE CANNOT INITIALIZE THAT!! :(

* Make BartEncoder work

Change BartEncoder to an instance of nn.Module for now

* Add FlaxBartDecoder

* Add FlaxBartModel

* TODO to make model run -> Prepare model inputs

* Resolve padding

* Add FlaxBartModel

* Add FlaxBartModel into importable modules

* Remove FlaxBartEncoder and FlaxBartDecoder from importable modules

* make style; not properly working

* make style; make quality does not pass due to an import I left

* Remove TODO for padding_idx in nn.Embed so far

* Add FlaxBartForConditionalGeneration

* Incorporate Flax model output classes, i.e. return_dict

* Add other models and incorporate use_cache arg

* Add FlaxBartForSequenceClassification and FlaxBartForQuestionAnswering

* Incorporate use_cache arg from PyTorch implementation

* Add all necessary Flax output utils

* Add FlaxBartForCausalLM; not working yet

* Add minor improvements; still lacks some functionality

* Update docs, src and tests

* Add support of FlaxBart to docs/source

* Fix some bugs in FlaxBart source code

* Add some necessary tests for FlaxBart models - jit_compilation not passing

* Fix tests and add test_head_masking

* Fix tests for @jax.jit computation

* Add test_head_masking

* Migrate FlaxBart tests from jax.numpy to numpy

* Remove FlaxBartForCausalLM

* Clean repo

* fix bart model weight structure

* Fix FlaxBartForSequenceClassification

Slicing cannot be used under jit; therefore, selecting the sentence
representation from hidden_states must be changed (see the static-shape sketch after this entry).

* Allow FlaxBartForSequenceClassification for testing pt_flax equivalence

* Allow testing for FlaxBartForQA for pt_flax equivalence

* Add a comment to FlaxBartForSequenceClassification + change noise from 1e-3 to 1e-6

* remove past_key_values

* remove inputs_embeds and make input_ids required

* add position ids

* re-write attention layer

* fix dataclass

* fix pos embeds and attention output

* fix pos embeds

* expose encode method

* expose decode method

* move docstring to top

* add cache for causal attn layer

* remove head masking for now

* s2s greedy search first pass

* boom boom

* fix typos

* fix greedy generate for bart

* use encoder, decoder layers instead of num_hidden_layers

* handle encoder_outputs

* cleanup

* simplify decoding

* more clean-up

* typos

* Change header + add {decoder_,}position_ids into 2 models

* add BartConfig

* fix existing tests

* add encode, decode methods

* Fix shift_tokens_right for JIT compilation + clarify one condition

* fix decode

* encoder => encode

* simplify generate

* add tests for encode and decode

* style

* add tests for cache

* fix equivalence tests

* sample generate now works with seq2seq

* generation tests

* initialize dense layers

* docstring and cleanup

* quality

* remove get/set input_embeddings

* address Patrick's suggestions

* decode for every model, remove encoder_outputs from call

* update tests accordingly

* decode returns only decoder outputs and logits

* fix arguments

* doc encode, decode methods

* correct base_model_prefix

* fix test for seq classif model

* fix docs

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
2021-06-14 15:16:08 +05:30
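
Regarding the jit/slicing note in the FlaxBartForSequenceClassification item above — an illustrative static-shape gather (not the exact FlaxBart code): boolean-mask indexing produces data-dependent shapes and fails under `jax.jit`, so the selection has to be expressed with fixed-shape ops.

```python
import jax
import jax.numpy as jnp

@jax.jit
def pool_eos(hidden_states, input_ids, eos_token_id):
    # hidden_states[input_ids == eos_token_id] would have a data-dependent
    # shape; a mask-weighted sum keeps every intermediate shape static.
    eos_mask = (input_ids == eos_token_id).astype(hidden_states.dtype)
    eos_mask = eos_mask / jnp.maximum(eos_mask.sum(axis=1, keepdims=True), 1)
    return jnp.einsum("bs,bsh->bh", eos_mask, hidden_states)

hidden = jnp.ones((2, 5, 8))
ids = jnp.array([[5, 7, 2, 1, 1], [5, 9, 9, 2, 1]])
print(pool_eos(hidden, ids, 2).shape)  # (2, 8)
```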
Suraj Patil d36fce8237
add readme for flax clm (#12111)
* add readme for flax clm

* use section link for tokenizer

* Apply suggestions from code review

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* update metrics

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2021-06-14 15:03:55 +05:30
Patrick von Platen 16c0efca2c
Add mlm pretraining xla torch readme (#12011)
* fix_torch_device_generate_test

* remove @

* upload

* Apply suggestions from code review

* Apply suggestions from code review

* Apply suggestions from code review

* Update examples/flax/language-modeling/README.md

* add more info

* finish

* fix

Co-authored-by: Patrick von Platen <patrick@huggingface.co>
2021-06-14 10:31:21 +01:00
Guido Novati ecd6efe7cb
Fix megatron_gpt2 attention block's causal mask (#12007)
* Fix megatron_gpt2 attention block's causal mask.

* compatibility with checkpoints created with recent versions of Megatron-LM

* added integration test for the released Megatron-GPT2 model

* code style changes

* added option to megatron conversion script to read from config file

Co-authored-by: Guido Novati <gnovati@nvidia.com>
2021-06-14 04:57:55 -04:00
Jonathan Chang 783b0dd589
Fix t5 error message (#12136)
* Fix t5 error message

* Fix again
2021-06-13 12:02:57 +01:00
Lysandre Debut 3b1f5caff2
Add from_pretrained to dummy timm objects (#12097)
* Add from_pretrained to dummy timm

* Fix at the source

* Update utils/check_dummies.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Missing pretrained dummies

* Style

Co-authored-by: Sylvain Gugger <sylvain.gugger@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-06-11 12:27:10 -04:00
Suraj Patil 15b498f3b8
Flax CLM script (#12023)
* first draft

* max_seq_length => block_size

* fix arg names

* fix typos

* fix loss calculation (see the loss sketch after this entry)

* add max examples, fix train eval steps, metrics

* optimizer mask

* fix perplexity, metric logging

* fix logging

* data_collator => data_loader

* refactor loss_fn

* support single GPU

* pass distributed to write_metric

* fix jitting

* fix single device training

* fix single device metrics

* close inner progress bars once finished

* add overwrite_cache arg

* fix dataset caching issue

* add more logs

* few small fixes

* address Nicholas's suggestions

* fix docstr

* address Patrick's suggestions

* make flake happy

* pass the new new_dropout_rng to apply_gradients

* reset train metrics after every epoch

* remove distributed logic, small fixes
2021-06-11 15:16:20 +05:30
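
On the "fix loss calculation" item: a minimal sketch of the shifted causal-LM loss this kind of script computes (shapes and names are assumptions, not the script's exact helpers):

```python
import jax
import jax.numpy as jnp
import optax

def clm_loss(logits, labels, vocab_size):
    # Shift so that tokens < n predict token n, the standard causal-LM setup.
    shift_logits = logits[..., :-1, :]
    shift_labels = labels[..., 1:]
    one_hot = jax.nn.one_hot(shift_labels, vocab_size)
    return optax.softmax_cross_entropy(shift_logits, one_hot).mean()

logits = jnp.zeros((2, 6, 100))
labels = jnp.zeros((2, 6), dtype=jnp.int32)
print(clm_loss(logits, labels, 100))
```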
Patrick von Platen e47765d884
Fix head masking generate tests (#12110)
* fix_torch_device_generate_test

* remove @

* fix tests
2021-06-11 04:04:07 -04:00
Bhavitvya Malik d2753dcbec
add relevant description to tqdm in examples (#11927)
* add relevant `desc` in examples

* require_version datasets>=1.8.0
2021-06-10 15:59:55 -04:00
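
What the change looks like in practice — a small sketch of the `desc` argument on `datasets.map` (requires datasets>=1.8.0, per the second item above):

```python
from datasets import load_dataset

dataset = load_dataset("imdb", split="train[:1%]")
# `desc` labels the tqdm bar so multi-step preprocessing logs stay readable.
dataset = dataset.map(lambda ex: {"n_chars": len(ex["text"])}, desc="Counting characters")
```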
Jayendra 9a9314f6d9
Flax VisionTransformer (#11951)
* adding vit for flax

* added test for Flax-vit and some bug-fixes

* overrode methods where variable changes were necessary for the flax_vit test

* added FlaxViTForImageClassification for test

* Update src/transformers/models/vit/modeling_flax_vit.py

Co-authored-by: Suraj Patil <surajp815@gmail.com>

* made changes suggested in PR

* Adding jax-vit models for autoimport

* swapping num_channels and height/width dimensions

* fixing the docstring for torch-like inputs for VIT

* add model to main init

* add docs

* doc, fix-copies

* docstrings

* small test fixes

* fix docs

* fix docstr

* Apply suggestions from code review

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* style

Co-authored-by: jayendra <jayendra@infocusp.in>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2021-06-10 21:17:13 +05:30
Daniel Stancl 0eaeae2e36
Fix a condition in test_generate_with_head_masking (#11911)
* Fix a condition in test_generate_with_head_masking

* Fix usage of head_mask in bigbird_pegasus

* Fix head masking for speech2text

* Resolve copy mismatch + drop unwanted print statement

* Fix the condition
2021-06-10 15:28:07 +01:00
Matt bebbdd0fc9
Appending label2id and id2label to models to ensure inference works properly (#12102) 2021-06-10 15:25:04 +01:00
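
Why this matters — a sketch of attaching label mappings to a config so downstream inference returns readable labels (checkpoint and label names are placeholders):

```python
from transformers import AutoConfig

id2label = {0: "NEGATIVE", 1: "POSITIVE"}  # example labels, not from the PR
config = AutoConfig.from_pretrained(
    "bert-base-cased",
    num_labels=2,
    id2label=id2label,
    label2id={v: k for k, v in id2label.items()},
)
# A model saved with this config lets pipelines map class indices back to names.
```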
Matt 4cda08decb
Minor style edits 2021-06-10 15:10:57 +01:00
Matt 7f08dbd10a
Update README.md to cover the TF GLUE example. 2021-06-10 14:33:42 +01:00
Sylvain Gugger d72e5a3a6d
Fix quality 2021-06-10 09:27:11 -04:00
Matt 73a532651a
New TF GLUE example (#12028)
* Pushing partially-complete new GLUE example

* First draft of the new TF GLUE example! Needs a little more testing to be sure but it's almost ready.

* Fix to the fit() call

* Bugfixes, making sure TPU and multi-GPU support is ready

* Remove logger line that depends on Pytorch

* Style pass

* Deleting old TF GLUE example

* Include label2id and id2label in the saved model config

* Don't clobber the existing model.config.label2id

* Style fixes

* Update examples/tensorflow/text-classification/run_glue.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-06-10 14:14:37 +01:00
Tobias Norlund 9d2cee8b48
CLIPFeatureExtractor should resize images with kept aspect ratio (#11994)
* Resize with kept aspect ratio

* Fixed failed test

* Overload center_crop and resize methods instead

* resize should handle non-PIL images

* update slow test

* Tensor => tensor

Co-authored-by: patil-suraj <surajp815@gmail.com>
2021-06-10 18:40:41 +05:30
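
The idea behind the fix, as an illustrative helper (not the exact CLIPFeatureExtractor code): resize the shorter edge first, then center-crop, instead of squashing the image to a square.

```python
from PIL import Image

def resize_shorter_edge(image: Image.Image, size: int) -> Image.Image:
    # Scale so the shorter edge equals `size`, preserving aspect ratio; a
    # subsequent center crop to (size, size) yields the square input CLIP
    # expects without distorting the image.
    w, h = image.size
    if w <= h:
        new_w, new_h = size, int(round(h * size / w))
    else:
        new_w, new_h = int(round(w * size / h)), size
    return image.resize((new_w, new_h), Image.BICUBIC)
```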
kumapo 472a867626
Add text_column_name and label_column_name to run_ner and run_ner_no_trainer args (#12083)
* Add text_column_name and label_column_name to run_ner args

* Minor fix: grouping for text and label column name
2021-06-10 08:03:20 -04:00
Patrick von Platen bc6f51e539
[Wav2Vec2ForPretraining] Correct checkpoints wav2vec2 & fix tests (#12089)
* fix_torch_device_generate_test

* remove @

* fix tests
2021-06-09 20:41:59 +01:00
Stas Bekman 61e191987d
rm require_version_examples (#12088) 2021-06-09 11:02:52 -07:00
Suraj Patil d1500d9151
pass decay_mask fn to optimizer (#12087) 2021-06-09 18:49:27 +01:00
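
In the spirit of the decay-mask pattern the Flax example scripts use — a simplified sketch (the real predicate may also exclude other parameter paths):

```python
import optax
from flax import traverse_util

def decay_mask_fn(params):
    # Apply weight decay to everything except bias and LayerNorm scale params.
    flat = traverse_util.flatten_dict(params)
    mask = {path: path[-1] not in ("bias", "scale") for path in flat}
    return traverse_util.unflatten_dict(mask)

optimizer = optax.adamw(learning_rate=3e-4, weight_decay=0.01, mask=decay_mask_fn)
```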
Anton Lozhkov d472bd7b18
Wav2Vec2 Pretraining (#11306)
* Working quantizer forward

* Working quantizer forward

* Clean up unused model parts, test reproducibility

* Working quantizer forward

* Clean up unused model parts, test reproducibility

* Remove custom outputs from the shared ones

* correct conversion

* correct bug

* add first pretrain script

* save intermediate

* static shapes

* save intermediate

* finish first pretrain script version

* more refactor

* remove wandb

* refactor more

* improve test

* correct perplexity compute bug

* finish model implementation

* add to docs

* finish docs

* finish pretraining script

* finish pretraining script

* remove wandb

* finish PR for merge

* finish config

* finish

* make deepspeed work

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* apply suggestions

* fix flaky test

Co-authored-by: patrickvonplaten <patrick.v.platen@gmail.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-06-09 18:40:56 +01:00
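
On the perplexity items above: a hedged sketch of the codevector-perplexity diagnostic wav2vec 2.0 pretraining tracks — the formula follows the paper, not necessarily the exact in-repo code:

```python
import torch

def codevector_perplexity(probs: torch.Tensor) -> torch.Tensor:
    # probs: (num_vectors, num_codevectors) soft codebook assignments from the
    # Gumbel quantizer. Perplexity = exp(entropy of the average usage
    # distribution); low values signal codebook collapse during pretraining.
    avg_probs = probs.mean(dim=0)
    return torch.exp(-torch.sum(avg_probs * torch.log(avg_probs + 1e-7)))
```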
Stas Bekman b1a8aa94f0
[test] support more than 2 gpus (#12074)
* support more than 2 gpus

* style
2021-06-09 09:23:47 -07:00
NielsRogge d3eacbb829
Add DETR (#11653)
* Squash all commits of modeling_detr_v7 branch into one

* Improve docs

* Fix tests

* Style

* Improve docs some more and fix most tests

* Fix slow tests of ViT, DeiT and DETR

* Improve replacement of batch norm

* Restructure timm backbone forward

* Make DetrForSegmentation support any timm backbone

* Fix name of output

* Address most comments by @LysandreJik

* Give better names for variables

* Conditional imports + timm in setup.py

* Address additional comments by @sgugger

* Make style, add require_timm and require_vision to tests

* Remove train_backbone attribute of DetrConfig, add methods to freeze/unfreeze backbone

* Add png files to fixtures

* Fix type hint

* Add timm to workflows

* Add `BatchNorm2d` to the weight initialization

* Fix retain_grad test

* Replace model checkpoints by Facebook namespace

* Fix name of checkpoint in test

* Add user-friendly message when scipy is not available

* Address most comments by @patrickvonplaten

* Remove return_intermediate_layers attribute of DetrConfig and simplify Joiner

* Better initialization

* Scipy is necessary to get sklearn metrics

* Rename TimmBackbone to DetrTimmConvEncoder and rename DetrJoiner to DetrConvModel

* Make style

* Improve docs and add 2 community notebooks

Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2021-06-09 11:51:13 -04:00
Stas Bekman d14e0af274
sync LayerDrop for Wav2Vec2Encoder + tests (#12076) 2021-06-09 13:21:03 +01:00
Koichi Yasuoka 82a2b76c95
Update run_ner.py with id2label config (#12001) 2021-06-09 07:27:05 -04:00
Stas Bekman 0e82f0cbc2
typo 2021-06-08 12:55:17 -07:00
Stas Bekman 11d86d3de4
[Deepspeed Wav2vec2] integration (#11638)
* wip

* wip - but working with https://github.com/microsoft/DeepSpeed/pull/1044

* cleanup

* workaround

* working 5/8 modes

* solve fp32 distributed zero3

* style

* sync

* sync

* rework

* deprecation

* cleanup

* https://github.com/microsoft/DeepSpeed/pull/1044 pr was merged

* clean up

* add a guide

* more prose

* more prose

* fix

* more prose

* sub_group_size was too big

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* refactor

* bug fix

* make the true check explicit

* new deepspeed release

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-06-08 12:32:03 -07:00
Stas Bekman 32290d87f6
[Deepspeed] various fixes (#12058)
* replace deprecated config

* sub_group_size was too big

* complete deprecation removal
2021-06-08 08:36:15 -07:00
Sylvain Gugger fd6902838a
Properly indent block_size (#12070) 2021-06-08 10:27:02 -04:00
cdleong 49bee0aea4
Add torch to requirements.txt in language-modeling (#12040)
* Add torch to requirements.txt in language-modeling

* Update examples/pytorch/language-modeling/requirements.txt

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-06-08 09:02:35 -04:00
Mario Šaško f5eec0d8e9
Replace legacy tensor.Tensor with torch.tensor/torch.empty (#12027)
* Replace legacy torch.Tensor constructor with torch.{tensor, empty}

* Remove torch.Tensor in examples
2021-06-08 13:58:38 +01:00
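
A quick illustration of why the legacy constructor is being replaced: `torch.Tensor(...)` interprets a bare int as a shape and leaves memory uninitialized, which is easy to misread as `torch.tensor(...)`.

```python
import torch

torch.tensor(3)       # 0-d tensor holding the value 3
torch.Tensor(3)       # legacy: 1-d FloatTensor of size 3, memory uninitialized
torch.empty(3)        # explicit, intention-revealing uninitialized tensor
torch.tensor([1, 2])  # rule of thumb: data -> torch.tensor, shape -> torch.empty
```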
Shamane Siri e33085d648
updated the original RAG implementation to be compatible with latest Pytorch-Lightning (#11806)
* updated the original RAG implementation to be compatible with the latest PL version

* updated the requirements.txt file

* execute make style

* code quality test

* code quality

* conflict resolved in requirements.txt

* code quality

* changed the MyDDP class name to CustomDDP
2021-06-08 13:42:49 +01:00