transformers

Commit Graph

Author	SHA1	Message	Date
Yih-Dar	cb19c2afdc	Fix expected loss values in some (m)T5 tests (#18177 ) * fix expected loss values Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2022-07-18 15:26:21 +02:00
Wang, Yi	7417f3acb7	[HPO] update to sigopt new experiment api (#18147 ) * [HPO] update to sigopt new experiment api * follow https://docs.sigopt.com/experiments Signed-off-by: Wang, Yi A <yi.a.wang@intel.com> * [HPO] use new API if sigopt version >= 8.0.0 Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>	2022-07-18 15:19:40 +02:00
gcheron	8c14b342aa	add ONNX support for LeVit (#18154 ) Co-authored-by: Guilhem Chéron <guilhemc@authentifier.com>	2022-07-18 15:17:07 +02:00
Lysandre Debut	c1c79b0655	NLLB tokenizer (#18126 ) * NLLB tokenizer * Apply suggestions from code review - Thanks Stefan! Co-authored-by: Stefan Schweter <stefan@schweter.it> * Final touches * Style :) * Update docs/source/en/model_doc/nllb.mdx Co-authored-by: Stefan Schweter <stefan@schweter.it> * Apply suggestions from code review Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * PR reviews * Auto models Co-authored-by: Stefan Schweter <stefan@schweter.it> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>	2022-07-18 08:12:34 -04:00
John Giorgi	a4f97e6ce0	Fix incorrect type hint for lang (#18161 )	2022-07-18 09:53:18 +02:00
John Giorgi	c46d39f390	Fix check for falsey inputs in run_summarization (#18155 )	2022-07-18 09:50:32 +02:00
Nicolas Patry	ccc0897804	Adding support for `device_map` directly in `pipeline(..)` function. (#17902 ) * Adding support for `device_map` directly in `pipeline(..)` function. * Updating the docstring. * Adding a better docstring * Put back type hints. * Blacked. (`make fixup` didn't work ??!!)	2022-07-15 15:54:26 +02:00
Nicolas Patry	fca66ec4ef	Fixing a hard to trigger bug for `text-generation` pipeline. (#18131 ) * Fixing a bug where attention mask was not passed to generate. * Fixing zero-size prompts. * Comment on top.	2022-07-15 15:54:07 +02:00
amyeroberts	8581a798c0	Add TF DeiT implementation (#17806 ) * Initial TF DeiT implementation * Fix copies naming issues * Fix up + docs * Properly same main layer * Name layers properly * Initial TF DeiT implementation * Fix copies naming issues * Fix up + docs * Properly same main layer * Name layers properly * Fixup * Fix import * Fix import * Fix import * Fix weight loading for tests whilst not on hub * Add doc tests and remove to_2tuple * Add back to_2tuple Removing to_2tuple results in many downstream changes needed because of the copies checks * Incorporate updates in Improve vision models #17731 PR * Don't hard code num_channels * Copy PyTorch DeiT embeddings and remove pytorch operations with mask * Fix patch embeddings & tidy up * Update PixelShuffle to move logic into class layer * Update doc strings - remove PT references * Use NHWC format in internal layers * Fix up * Use linear activation layer * Remove unused import * Apply suggestions from code review Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Move dataclass to top of file * Remove from_pt now weights on hub * Fixup Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by: Amy Roberts <amyeroberts@users.noreply.github.com>	2022-07-13 18:04:08 +01:00
Wei	7ea6ccc2b3	Enable torchdynamo with torch_tensorrt(fx path) (#17765 ) * enable fx2trt * Update perf_train_gpu_one.mdx * Update perf_train_gpu_one.mdx * add lib check * update * format * update * fix import check * fix isort * improve doc * refactor ctx manager * fix isort * black format * isort fix * fix format * update args * update black * cleanups * Update perf_train_gpu_one.mdx * code refactor * code refactor to init * remove redundancy * isort * replace self.args with args Co-authored-by: Stas Bekman <stas@stason.org>	2022-07-13 12:43:28 -04:00
Sylvain Gugger	37aeb5787a	Make sharded checkpoints work in offline mode (#18125 ) * Make sharded checkpoints work in offline mode * Add test	2022-07-13 12:43:08 -04:00
Sylvain Gugger	0a21a48564	Revert "Make sharded checkpoints work in offline mode" This reverts commit `3564c65786`.	2022-07-13 10:53:25 -04:00
Sylvain Gugger	3564c65786	Make sharded checkpoints work in offline mode	2022-07-13 10:51:56 -04:00
lmagne	56e6487c40	add dataset split and config to model-index in TrainingSummary.from_trainer (#18064 ) * added metadata to training summary * Apply suggestions from code review Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>	2022-07-13 16:07:20 +02:00
John Giorgi	fde22c75a1	Add summarization name mapping for MultiNews (#18117 ) * Add summarization name mapping for MultiNews * Add summarization name mapping for MultiNews	2022-07-13 08:19:20 -04:00
Sebastian Sosa	195133363e	supported python versions reference (#18116 ) * supported python versions reference * Update CONTRIBUTING.md removing commit hash from link Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>	2022-07-13 08:18:44 -04:00
Joao Gante	20509ab0e0	TF: unpack_inputs decorator independent from main_input_name (#18110 )	2022-07-13 10:43:41 +01:00
Joao Gante	fcefa200b2	TF: remove graph mode distinction when processing boolean options (#18102 )	2022-07-12 19:05:31 +01:00
Niklas Muennighoff	bc34c21191	Fix BLOOM dtype (#17995 ) * Add fp16 option * Fix BLOOM dtype * Formatting * Remove torch_dtype arg * Revert formatting * Apply formatting * Add n_embed backward compat	2022-07-12 10:36:08 -04:00
Joao Gante	981714efe1	CLI: reenable `pt_to_tf` test (#18108 )	2022-07-12 13:38:05 +01:00
wei zhao	f5221c06e4	Report value for a step instead of epoch. (#18095 ) * Report value for a step instead of epoch. Report an objective function value for a step instead of epoch to optuna. I made this modification for the following reason: If "eval_steps" is less than steps per epoch, there maybe warnings like this: "optuna/trial/_trial.py:592: UserWarning: The reported value is ignored because this `step` 0 is already reported.". So "step" are more appropriate than "epoch" here. * MOD: make style. Co-authored-by: zhaowei01 <zhaowei01@yuanfudao.com>	2022-07-12 08:18:35 -04:00
Sijun He	d4ebd4e112	speed up test (#18106 )	2022-07-12 04:28:28 -04:00
jianan-gu	b7d8bd378c	Enhance IPEX integration in Trainer (#18072 ) * enhance ipex import * refine codes * refine style * add link * style Co-authored-by: Stas Bekman <stas@stason.org>	2022-07-11 21:34:09 -07:00
Younes Belkada	a462fc9232	Bloom Optimize operations (#17866 ) * fix tolerance for a bloom slow test * enhance alibi padding - get rid of for loops - deals better with padded batched input - avoid useless cpu/gpu communication when creating alibi Co-authored-by: justheuristic <justheuristic@gmail.com> * optimize attention mask * fix scaled softmax limit values * optimize building alibi tensor Co-authored-by: Younes Belkada <younesbelkada@users.noreply.github.com> * fix attention_mask shape when it's None * minor fixes - fix docstring + arg names * remove colons in docstring * Apply suggestions from code review Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * apply suggestion * remove unsued arg * refactor a bit - use [:, None] for consistency * refactor attention block Co-authored-by: Nouamane Tazi <nouamane98@gmail.com> * quick fixes * first attempt * refactor attention block and fix all tests except "test_simple_generation" - added comments to better explain attention block * remove debug lines and add TODO comment * change `torch.bmm` to `torch.baddbmm` - fixes `test_simple_generation`but breaks `test_batch_generation_padd` * styling * all tests are passing now - use `bmm` - add explanation for `allow_fp16_reduced_precision_reduction` Co-authored-by: Younes Belkada <younesbelkada@users.noreply.github.com> * styling Co-authored-by: Younes Belkada <younesbelkada@users.noreply.github.com> * fix support for accelerate Co-authored-by: Younes Belkada <younesbelkada@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * remove attn softmax in fp32 * refactor comments * refactor a bit - remove warning message - remove print on test * refer to pytorch t5 * change the slow tests - do the tests in fp32 - remove some comments - keep large comments * update expected output for `test_simple_generation` - we now test using fp32 * make style + change comments a bit * fix dtype padd test Co-authored-by: justheuristic <justheuristic@gmail.com> Co-authored-by: Nouamane Tazi <nouamane98@gmail.com> Co-authored-by: Younes Belkada <younesbelkada@users.noreply.github.com> Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>	2022-07-11 13:16:13 -04:00
Sylvain Gugger	5ff6f853d7	Mark slow test as such	2022-07-11 12:48:57 -04:00
Sylvain Gugger	b1b8222d80	Add filename to info diaplyed when downloading things in from_pretrained (#18099 )	2022-07-11 12:45:06 -04:00
Sylvain Gugger	6c8017a5c8	Fix image segmentation and object detection pipeline tests (#18100 )	2022-07-11 12:41:56 -04:00
Sylvain Gugger	b0520f594c	Skip failing tests	2022-07-11 10:16:54 -04:00
Duong A. Nguyen	1e8140caad	Fix RESOURCE_EXHAUSTED error when dealing with large datasets in Flax example scripts (#18069 ) * Fix RESOURCE_EXHAUSTED error for large datasets on Flax example scripts * using np.permutation for creating batch_idx * train_samples_idx -> training_samples_idx * fix type hints	2022-07-11 15:59:08 +02:00
Yih-Dar	ac98a88fbc	Fix torchscript tests for GPT-NeoX (#18012 ) * fix dtype issue in _attn * fix RotaryEmbedding * fix RotaryEmbedding 2 * clean up Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2022-07-11 05:02:54 -04:00
Yulv-git	95113d1365	Fix some typos. (#17560 ) * Fix some typos. Signed-off-by: Yulv-git <yulvchi@qq.com> * Fix typo. Signed-off-by: Yulv-git <yulvchi@qq.com> * make fixup.	2022-07-11 05:00:13 -04:00
Stas Bekman	ad28ca291b	[bloom] fix alibi device placement (#18087 )	2022-07-10 09:11:46 -07:00
neverix	8b332a6a16	Make predict() close progress bars after finishing (#17952 ) (#18078 ) * Make Trainer.predict call on_evaluate (#17952) * Add on_predict * Small fix * Small and different fix * Add tests	2022-07-08 16:44:24 -04:00
Sylvain Gugger	7c046c5c22	Update localized READMES when template is filled. (#18062 )	2022-07-08 11:08:52 -04:00
BOSEOP KIM	94ca7d2faa	Fix type issue in using bucketing with Trainer (#18051 ) * Fix type issue in using bucketing with Trainer - Fix type issues in LengthGrouperSampler, DistributedLengthGroupedSampler refs: #18003 * Change logging type in LengthGroupedSampler - Change `logger.warning` to `logger.info` Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Change logging type in DistributedLengthGroupedSampler - Change `logger.warning` to `logger.info` Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Remove adundant clause in LengthGroupedSampler - Use `elif` Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Remove adundant clause in DistributedLengthGroupedSampler - Use `elif` Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Apply black, isort to modified codes in the script Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>	2022-07-08 11:06:00 -04:00
Sylvain Gugger	9bd3968509	Fix slow CI by pinning resampy (#18077 ) * Fix slow CI by pinning resampy * Actually put it in the speech dependencies	2022-07-08 10:51:24 -04:00
Matt	de46cde14b	Drop columns after loading samples in prepare_tf_dataset (#17967 ) * Drop columns after loading samples, rather than before, to avoid breaking transforms * make fixup * Add workaround so this PR can work with current datasets version	2022-07-07 18:02:22 +01:00
Patrick von Platen	2544c1434f	[Generate Tests] Make sure no tokens are force-generated (#18053 )	2022-07-07 15:08:34 +02:00
varshith	91c4a3ab1a	Added Command for windows VENV activation in installation docs (#18008 ) * Added command for windows VENV activation * changed linux and macos specification	2022-07-07 08:18:44 -04:00
Sylvain Gugger	1b749a7f8d	Sort doc toc (#18034 ) * Add script to sort doc ToC * Style and fixes * Add check to quality job	2022-07-07 08:17:58 -04:00
Sylvain Gugger	1b5ea74783	Place inputs on device when include_inputs_for_metrics is True (#18046 )	2022-07-07 08:17:49 -04:00
Sylvain Gugger	870ff9e1da	Skip failing test until @gante fix it.	2022-07-06 15:13:28 -04:00
Sylvain Gugger	2e90c3df8f	Doc to dataset (#18037 ) * Link to the Datasets doc * Remove unwanted file	2022-07-06 12:10:06 -04:00
Matt	be79cd7d8e	Protect `TFGenerationMixin.seed_generator` so it's not created at import (#18044 )	2022-07-06 16:36:28 +01:00
Joao Gante	360719a6a4	TF: GPT-J compatible with XLA generation (#17986 )	2022-07-06 15:02:07 +01:00
ADAning	bf37e5c7f6	Fix T5 incorrect weight decay in Trainer and official summarization example (#18002 ) * Add ALL_LAYERNORM_LAYERS for LayerNorm * fix bug of appending layer norm	2022-07-06 09:44:19 -04:00
NielsRogge	22edb68d49	Squash commits (#17981 ) Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>	2022-07-06 08:11:48 -04:00
Yih-Dar	f681437203	Enable Past CI (#17919 ) Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2022-07-05 18:08:36 +02:00
Matt	5ae087cf8e	Fix T5/mT5 tests (#18029 )	2022-07-05 16:22:03 +01:00
Sanchit Gandhi	ec07eccc7d	[Flax] Bump to v0.4.1 (#17966 )	2022-07-05 15:17:17 +01:00

10417 Commits All Branches Search