transformers

Commit Graph

Author	SHA1	Message	Date
Aaron Jimenez	6b8ec2588e	[docs] Sort es/toctree.yml \| Translate performance.md (#28262 ) * Sort es/_toctree.yml like en/_toctree.yml * Run make style * Add -Rendimiento y escalabilidad- section to es/_toctree.yml * Run make style * Add s to section * Add translate of performance.md * Add performance.md to es/_toctree.yml * Run make styele * Fix docs links * Run make style	2024-01-03 14:35:58 -08:00
Mayfsz	3ea8833676	Translate contributing.md into Chinese (#28243 ) * Translate contributing.md into Chinese * Update review comments	2024-01-03 14:35:02 -08:00
Connor Henderson	d83ff5eeff	Add FastSpeech2Conformer (#23439 ) * start - docs, SpeechT5 copy and rename * add relevant code from FastSpeech2 draft, have tests pass * make it an actual conformer, demo ex. * matching inference with original repo, includes debug code * refactor nn.Sequentials, start more desc. var names * more renaming * more renaming * vocoder scratchwork * matching vocoder outputs * hifigan vocoder conversion script * convert model script, rename some config vars * replace postnet with speecht5's implementation * passing common tests, file cleanup * expand testing, add output hidden states and attention * tokenizer + passing tokenizer tests * variety of updates and tests * g2p_en pckg setup * import structure edits * docstrings and cleanup * repo consistency * deps * small cleanup * forward signature param order * address comments except for masks and labels * address comments on attention_mask and labels * address second round of comments * remove old unneeded line * address comments part 1 * address comments pt 2 * rename auto mapping * fixes for failing tests * address comments part 3 (bart-like, train loss) * make style * pass config where possible * add forward method + tests to WithHifiGan model * make style * address arg passing and generate_speech comments * address Arthur comments * address Arthur comments pt2 * lint changes * Sanchit comment * add g2p-en to doctest deps * move up self.encoder * onnx compatible tensor method * fix is symbolic * fix paper url * move models to espnet org * make style * make fix-copies * update docstring * Arthur comments * update docstring w/ new updates * add model architecture images * header size * md wording update * make style	2024-01-03 18:01:06 +00:00
lain	6eba901d88	fix documentation for zero_shot_object_detection (#28267 ) remove broken space	2024-01-03 09:20:34 -08:00
Dean Wyatte	cad9f5c6cc	Update docs around mixing hf scheduler with deepspeed optimizer (#28223 ) update docs around mixing hf scheduler with deepspeed optimizer	2024-01-02 11:48:17 +00:00
Anindyadeep	74d9d0cebb	Fixing visualization code for object detection to support both types of bounding box. (#27842 ) * fix: minor enhancement and fix in bounding box visualization example The example that was trying to visualize the bounding box was not considering an edge case, where the bounding box can be un-normalized. So using the same set of code, we can not get results with a different dataset with un-normalized bounding box. This commit fixes that. * run make clean * add an additional note on the scenarios where the box viz code works --------- Co-authored-by: Anindyadeep <anindya@pop-os.localdomain>	2023-12-22 13:24:40 +00:00
Yih-Dar	71f460578d	Update `docs/source/en/perf_infer_gpu_one.md` (#28198 ) fix Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2023-12-22 10:40:22 +01:00
Younes Belkada	3a8769f6a9	[`Docs`] Add 4-bit serialization docs (#28182 ) * add 4-bit serialization docs * up * up	2023-12-22 10:18:32 +01:00
Joao Gante	45b70384a7	Generate: fix speculative decoding (#28166 ) Co-authored-by: Merve Noyan <merveenoyan@gmail.com>	2023-12-20 18:55:35 +00:00
Steven Liu	01c081d138	[docs] Trainer docs (#28145 ) * fsdp, debugging, gpu selection * fix hfoption * fix	2023-12-20 10:37:23 -08:00
Sourab Mangrulkar	def581ef51	Fix FA2 integration (#28142 ) * fix fa2 * fix FA2 for popular models * improve warning and add Younes as co-author Co-Authored-By: Younes Belkada <49240599+younesbelkada@users.noreply.github.com> * Update src/transformers/modeling_utils.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * fix the warning * Add Tip * typo fix * nit --------- Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2023-12-20 14:25:07 +05:30
Aaron Jimenez	38611086d2	[docs] Fix mistral link in mixtral.md (#28143 ) Fix mistral link in mixtral.md	2023-12-19 10:34:14 -08:00
Aaron Jimenez	4edffda636	[Doc] Fix token link in What 🤗 Transformers can do (#28123 ) Fix token link	2023-12-18 15:06:54 -08:00
Steven Liu	a52e180a0f	[docs] General doc fixes (#28087 ) * doc fix friday * deprecated objects * update not_doctested * update toctree	2023-12-18 10:44:09 -08:00
Rockerz	08a6e7a702	Fix indentation error - semantic_segmentation.md (#28117 ) Update semantic_segmentation.md	2023-12-18 12:47:54 -05:00
Aeneas Stankowski	7f2a8f92e4	Spelling correction (#28110 ) Update mixtral.md correct minor typo in overview	2023-12-18 14:04:05 +00:00
Steven Liu	ebfdb9ca62	[docs] MPS (#28016 ) * mps docs * toctree	2023-12-15 13:17:29 -08:00
Steven Liu	0d63d17765	[docs] Trainer (#27986 ) * first draft * add to toctree * edits * feedback	2023-12-15 12:06:55 -08:00
Younes Belkada	1faeff85ce	Fix Vip-llava docs (#28085 ) * Update vipllava.md * Update modeling_vipllava.py	2023-12-15 20:16:47 +01:00
Cylis	70a127a37a	doc: Correct spelling mistake (#28064 )	2023-12-15 13:01:39 +00:00
Sanchit Gandhi	52c37882fb	[Seamless] Fix links in docs (#27905 ) * [Seamless] Fix links in docs * apply suggestions from code review	2023-12-14 15:14:13 +00:00
Rockerz	fe44b1f1a9	Add model_docs from cpmant.md to derformable_detr.md (#27884 ) * upfaste * Update * Update docs/source/ja/model_doc/deformable_detr.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/ja/model_doc/data2vec.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/ja/model_doc/cvt.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * add suggestions * Toctree update * remove git references * Update docs/source/ja/_toctree.yml Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/ja/model_doc/decision_transformer.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2023-12-13 10:02:29 -08:00
Aaron Jimenez	815ea8e8a2	[Doc] Spanish translation of glossary.md (#27958 ) * Add glossary to es/_toctree.yml * Add glossary.md to es/ * A section translated * B and C section translated * Fix typo in en/glossary.md C section * D section translated \| Add a extra line in en/glossary.md * E and F section translated \| Fix typo in en/glossary.md * Fix words preentrenado * H and I section translated \| Fix typo in en/glossary.md * L section translated * M and N section translated * P section translated * R section translated * S section translated * T section translated * U and Z section translated \| Fix TensorParallel link in both files * Fix word	2023-12-13 09:21:59 -08:00
Younes Belkada	c7f076a00e	Adds VIP-llava to transformers (#27932 ) * v1 * add-new-model-like * revert * fix forward and conversion script * revert * fix copies * fixup * fix * Update docs/source/en/index.md * Apply suggestions from code review * push * fix * fixes here and there * up * fixup and fix tests * Apply suggestions from code review * add docs * fixup * fixes * docstring * add docstring * fixup * docstring * fixup * nit * docs * more copies * fix copies * nit * update test	2023-12-13 10:42:24 +01:00
Stas Bekman	9936143014	[doc] fix typo (#27981 )	2023-12-12 20:32:42 +00:00
Anthony Susevski	e660424717	fixed typos (issue 27919) (#27920 ) * fixed typos (issue 27919) * Update docs/source/en/tasks/knowledge_distillation_for_image_classification.md Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2023-12-11 18:44:23 -05:00
Steven Liu	35478182ce	[docs] Fused AWQ modules (#27896 ) streamline	2023-12-11 10:41:33 -08:00
NielsRogge	67b1335cb9	Update bounding box format everywhere (#27944 ) Update formats	2023-12-11 18:03:42 +00:00
Timon Käch	5cec306cdc	Fix parameter count in readme for mixtral 45b (#27945 ) fix parameter count in readme	2023-12-11 14:58:48 +00:00
Merve Noyan	b911c1f10f	Docs for AutoBackbone & Backbone (#27456 ) * Initial commit for AutoBackbone & Backbone * Added timm and clarified out_indices * Swapped the example to out_indices * fix toctree * Update autoclass_tutorial.md * Update backbones.md * Update autoclass_tutorial.md * Add dummy torch input instead * Add dummy torch input * Update autoclass_tutorial.md * Update backbones.md * minor fix * Update docs/source/en/main_classes/backbones.md Co-authored-by: Maria Khalusova <kafooster@gmail.com> * Update docs/source/en/autoclass_tutorial.md Co-authored-by: Maria Khalusova <kafooster@gmail.com> * Added illustrations and explained backbone & neck * Update docs/source/en/main_classes/backbones.md Co-authored-by: Maria Khalusova <kafooster@gmail.com> * Update backbones.md --------- Co-authored-by: Maria Khalusova <kafooster@gmail.com>	2023-12-11 08:22:17 -05:00
Arthur	accccdd008	[`Add Mixtral`] Adds support for the Mixtral MoE (#27942 ) * up * up * test * logits ok * up * up * few fixes * conversion script * up * nits * nits * update * nuke * more updates * nites * fix many issues * nit * scatter * nit * nuke megablocks * nits * fix conversion script * nit * remove * nits * nit * update * oupsssss * change * nits device * nits * fixup * update * merge * add copied from * fix the copy mentions * update tests * more fixes * nits * conversion script * add parts of the readme * Update tests/models/mixtral/test_modeling_mixtral.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * new test + conversion script * Apply suggestions from code review Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Apply suggestions from code review * fix * fix copies * fix copies * ooops * fix config * Apply suggestions from code review * fix nits * nit * add copies * add batched tests * docs * fix flash attention * let's add more verbose * add correct outputs * support router ouptus * ignore copies where needed * fix * cat list if list is given for now * nits * Update docs/source/en/model_doc/mixtral.md * finish router refactoring * fix forward * fix expected values * nits * fixup * fix * fix bug * fix * fix dtype mismatch * fix * grrr grrr I support item assignment * fix CI * docs * fixup * remove some copied form * fix weird diff * skip doctest fast on the config and modeling * mark that is supports flash attention in the doc * update * Update src/transformers/models/mixtral/modeling_mixtral.py Co-authored-by: Lysandre Debut <hi@lysand.re> * Update docs/source/en/model_doc/mixtral.md Co-authored-by: Lysandre Debut <hi@lysand.re> * revert router logits config issue * update doc accordingly * Update src/transformers/models/mixtral/convert_mixtral_weights_to_hf.py * nits * use torch testing asssert close * fixup * doc nits --------- Co-authored-by: younesbelkada <younesbelkada@gmail.com> Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com> Co-authored-by: Lysandre Debut <hi@lysand.re>	2023-12-11 12:50:27 +01:00
NielsRogge	7ea21f1f03	[LLaVa] Some improvements (#27895 ) * More improvements * Improve variable names * Update READMEs, improve docs	2023-12-11 10:22:26 +01:00
fxmarty	80377eb018	F.scaled_dot_product_attention support (#26572 ) * add sdpa * wip * cleaning * add ref * yet more cleaning * and more :) * wip llama * working llama * add output_attentions=True support * bigcode sdpa support * fixes * gpt-bigcode support, require torch>=2.1.1 * add falcon support * fix conflicts falcon * style * fix attention_mask definition * remove output_attentions from attnmaskconverter * support whisper without removing any Copied from statement * fix mbart default to eager renaming * fix typo in falcon * fix is_causal in SDPA * check is_flash_attn_2_available in the models init as well in case the model is not initialized through from_pretrained * add warnings when falling back on the manual implementation * precise doc * wip replace _flash_attn_enabled by config.attn_implementation * fix typo * add tests * style * add a copy.deepcopy on the config in from_pretrained, as we do not want to modify it inplace * obey to config.attn_implementation if a config is passed in from_pretrained * fix is_torch_sdpa_available when torch is not installed * remove dead code * Update src/transformers/modeling_attn_mask_utils.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/modeling_attn_mask_utils.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/modeling_attn_mask_utils.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/modeling_attn_mask_utils.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/modeling_attn_mask_utils.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/models/bart/modeling_bart.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * remove duplicate pretraining_tp code * add dropout in llama * precise comment on attn_mask * add fmt: off for _unmask_unattended docstring * precise num_masks comment * nuke pretraining_tp in LlamaSDPAAttention following Arthur's suggestion * cleanup modeling_utils * backward compatibility * fix style as requested * style * improve documentation * test pass * style * add _unmask_unattended tests * skip meaningless tests for idefics * hard_check SDPA requirements when specifically requested * standardize the use if XXX_ATTENTION_CLASSES * fix SDPA bug with mem-efficient backend on CUDA when using fp32 * fix test * rely on SDPA is_causal parameter to handle the causal mask in some cases * fix FALCON_ATTENTION_CLASSES * remove _flash_attn_2_enabled occurences * fix test * add OPT to the list of supported flash models * improve test * properly test on different SDPA backends, on different dtypes & properly handle separately the pad tokens in the test * remove remaining _flash_attn_2_enabled occurence * Update src/transformers/modeling_utils.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/modeling_utils.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/modeling_utils.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/modeling_attn_mask_utils.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update docs/source/en/perf_infer_gpu_one.md Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * remove use_attn_implementation * fix docstring & slight bug * make attn_implementation internal (_attn_implementation) * typos * fix tests * deprecate use_flash_attention_2=True * fix test * add back llama that was removed by mistake * fix tests * remove _flash_attn_2_enabled occurences bis * add check & test that passed attn_implementation is valid * fix falcon torchscript export * fix device of mask in tests * add tip about torch.jit.trace and move bt doc below sdpa * fix parameterized.expand order * move tests from test_modeling_attn_mask_utils to test_modeling_utils as a relevant test class is already there * update sdpaattention class with the new cache * Update src/transformers/configuration_utils.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/models/bark/modeling_bark.py * address review comments * WIP torch.jit.trace fix. left: test both eager & sdpa * add test for torch.jit.trace for both eager/sdpa * fix falcon with torch==2.0 that needs to use sdpa * fix doc * hopefully last fix * fix key_value_length that has no default now in mask converter * is it flacky? * fix speculative decoding bug * tests do pass * fix following #27907 --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2023-12-09 05:38:14 +09:00
Aaron Jimenez	d6c3a3f137	[Doc] Spanish translation of pad_truncation.md (#27890 ) * Add pad_truncation to es/_toctree.yml * Add pad_truncation.md to es/ * Translated first two paragraph * Translated paddig argument section * Translated truncation argument section * Translated final paragraphs * Translated table * Fixed typo in the table of en/pad_truncation.md * Run make style \| Fix a word * Add Padding (relleno) y el Truncation (truncamiento) in the final paragraphs * Fix relleno and truncamiento words	2023-12-08 10:32:18 -08:00
Tom Aarsen	633215ba58	Generate: New `Cache` abstraction and Attention Sinks support (#26681 ) * Draft version of new KV Caching This should allow Attention Sinks (https://github.com/tomaarsen/attention_sinks) / StreamingLLM (https://arxiv.org/abs/2309.17453) to be easily implemented in a third-party or in transformers directly * Address numerous PR suggestions 1. Move layer_idx from cache to ...Attention. Removes confusing set_layer_idx magic. 2. Always convert past_key_values to Cache instance at the start of ...Attention, removes all other isinstance calls. 3. Remove __bool__ and __getitem__ magic as they're confusing. 4. past_key_values.update(key, value, idx) now returns key, value. 5. Add use_legacy_cache flag, defaults to None, i.e. Falsey. This breaks generate for now, until 1) the cache is used is generate() or 2) use_legacy_cache is defaulted to True in generate() until we change it in another PR. 6. Separate key_cache and value_cache. Some work is still needed to see if the SinkCache can conveniently be implemented with just one update method. * Implement the SinkCache through backward+forward rotations * Integrate (Sink)Cache with Llama FA2 * Set use_legacy_cache=True as default, allows for test passes * Move from/to_legacy_cache to ...Model class * Undo unnecessary newline change * Remove copy utility from deprecated OpenLlama * Match import style * manual rebase with main * Cache class working with generate (#1) * Draft version of new KV Caching This should allow Attention Sinks (https://github.com/tomaarsen/attention_sinks) / StreamingLLM (https://arxiv.org/abs/2309.17453) to be easily implemented in a third-party or in transformers directly * Address numerous PR suggestions 1. Move layer_idx from cache to ...Attention. Removes confusing set_layer_idx magic. 2. Always convert past_key_values to Cache instance at the start of ...Attention, removes all other isinstance calls. 3. Remove __bool__ and __getitem__ magic as they're confusing. 4. past_key_values.update(key, value, idx) now returns key, value. 5. Add use_legacy_cache flag, defaults to None, i.e. Falsey. This breaks generate for now, until 1) the cache is used is generate() or 2) use_legacy_cache is defaulted to True in generate() until we change it in another PR. 6. Separate key_cache and value_cache. Some work is still needed to see if the SinkCache can conveniently be implemented with just one update method. * Integrate (Sink)Cache with Llama FA2 * Move from/to_legacy_cache to ...Model class * Undo unnecessary newline change * Match import style * working generate * Add tests; Simplify code; Apply changes to Mistral and Persimmon * fix rebase mess * a few more manual fixes * last manual fix * propagate changes to phi * upgrade test * add use_legacy_cache docstring; beef up tests * reintroduce unwanted deletes --------- Co-authored-by: Tom Aarsen <Cubiegamedev@gmail.com> * move import * add default to model_kwargs.get('use_legacy_cache') * correct failing test * Apply suggestions from code review Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * apply PR suggestions * fix failing test * Apply suggestions from code review Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> Co-authored-by: Tom Aarsen <37621491+tomaarsen@users.noreply.github.com> * PR comments * tmp commit * add docstrings * more tests, more docstrings, add to docs * derp * tmp commit * tmp dbg * more dbg * fix beam search bug * cache can be a list of tuples in some models * fix group beam search * all but sinkcache integration tests * fix sink cache and add hard integration test * now also compatible with input_embeds input * PR comments * add Cache support to Phi+FA2 * make fixup --------- Co-authored-by: Joao Gante <joao@huggingface.co> Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>	2023-12-08 09:00:17 +01:00
Rockerz	0ea42ef0f9	Translate `model_doc` files from `clip` to `cpm` to JP (#27774 ) * Add models * Add more models * Update docs/source/ja/model_doc/convnextv2.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/ja/model_doc/convbert.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/ja/model_doc/codegen.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update translation errors and author names * link update --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2023-12-07 11:12:24 -08:00
Dina Suehiro Jones	79b79ae2db	Updates the distributed CPU training documentation to add instructions for running on a Kubernetes cluster (#27780 ) * Updates the Distributed CPU documentation to add a Kubernetes example * Small edits * Fixing link * Adding missing new lines * Minor edits * Update to include Dockerfile snippet * Add comment about tuning env var * Updates based on review comments	2023-12-07 10:50:45 -08:00
Steven Liu	f7595760ed	[docs] Custom semantic segmentation dataset (#27859 ) * custom dataset * fix link * feedback	2023-12-07 10:47:35 -08:00
Joao Gante	58e7f9bb2f	Generate: All logits processors are documented and have examples (#27796 ) Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2023-12-07 15:11:35 +00:00
Younes Belkada	44b5506d29	[`Llava`] Add Llava to transformers (#27662 ) * add model like * logits match * minor fixes * fixes * up * up * add todo * llava processor * keep the processor simple * add conversion script * fixup * fix copies * up * add to index * fix config + logits * fix * refactor * more refactor * more refactor * fix copies * add authors * v1 tests * add `LlavaProcessor` in init * remove unneeded import * up * up * docs * up * fix CI * fix CI * add attention mask in test * make fixup * remove the vision model * that' s the dirty way to do it * nits * nits * updates * add more tests * add input tests * fixup * more styling * nits * updates amd cleanup * fixup the generation expected results * fix the testing script * some cleanup and simplification which does not work yet but almost there! * make correct dispatch operations * vectorize works for batch of images and text * last todos * nits * update test and modeling code * remove useless function for now * fix few issues * fix generation * some nits * add bakllava * nits * remove duplicated code * finis merge * cleanup * missed this line * fill the todos * add left padding offset * add left and rignt padding logic * bool to properly index * make sure * more cleanups * batch is fixed 😉 * add correct device for tensor creation * fix some dtype missmatch * ruff * update conversion script * Update src/transformers/__init__.py * fa 2 support + fix conversion script * more * correct reshaping * fix test dict * fix copies by ignoring * fix nit * skip clip vision model * fixup * fixup * LlavaForVisionText2Text -> LlavaForCausalLM * update * fix * raise correct errors * fix * docs * nuke for now * nits here and there * fixup * fix remaining tests * update LlavaForConditionalGeneration instead of CausalLM * fixups * pipeline support * slow and piepline tests * supports batch * nits * cleanup * fix first integration tests * add pad token where needed * correct etsts * fixups * update pipeline testr * fix quality * nits * revert unneeded change * nit * use BatchFeature * from ...feature_extraction_utils import BatchFeature * nits * nits * properly update * more f*** nits * fix copies * comment * keep slow test slow * Update src/transformers/models/llava/processing_llava.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * add piepline example * add pixel values in docstrign * update pr doctest * fix * fix slow tests * remove hack * fixup * small note * forward contrib credits from PR25789 * forward contrib credits from original implementation and work * add arthur * Update src/transformers/models/llava/processing_llava.py Co-authored-by: Lysandre Debut <hi@lysand.re> * update docstring * nit * move to not doctested because of timeout issues * fixup * add description * more * fix-copies * fix docs * add beam search * add more comments * add typehints on processor * add speedup plot * update slow tests and docs * push test * push batched test * fix batched generation with different number of images * remove benchmark due to a bug * fix test * fix copies * add gcolab demo --------- Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> Co-authored-by: shauray8 <shauray8@users.noreply.github.com> Co-authored-by: haotian-liu <haotian-liu@users.noreply.github.com> Co-authored-by: Lysandre Debut <hi@lysand.re>	2023-12-07 09:30:47 +01:00
Susnato Dhar	f84d85ba67	[`FA-2`] Add Flash Attention to `Phi` (#27661 ) * add FA and modify doc file * test_flash_attn_2_generate_padding_right test overwritten * comment * modify persimmon modeling file * added speedup graph * more changes	2023-12-07 07:57:48 +01:00
Nolwenn Bernard	06f561687c	[i18n-fr] Translate autoclass tutorial to French (#27659 ) * Translation of autoclass tutorial * Update totree to keep only tutorial section * Translate title toctree * Fix typos * Update review comments	2023-12-07 07:44:14 +01:00
Alex McKinney	75336c1794	Add Llama Flax Implementation (#24587 ) * Copies `modeling_flax_gpt_neo.py` to start * MLP Block. WIP Attention and Block * Adds Flax implementation of `LlamaMLP` Validated with in-file test. Some slight numeric differences, but assuming it isn't an issue * Adds `FlaxLlamaRMSNorm` layer `flax.linen` includes `RMSNorm` layer but not necessarily in all versions. Hence, we add in-file. * Adds FlaxLlamaAttention Copied from GPT-J as it has efficient caching implementation as well as rotary embeddings. Notice numerically different, but not by a huge amount. Needs investigating * Adds `FlaxLlamaDecoderLayer` numerically inaccurate, debugging.. * debugging rotary mismatch gptj uses interleaved whilst llama uses contiguous i think they match now but still final result is wrong. maybe drop back to just debugging attention layer? * fixes bug with decoder layer still somewhat numerically inaccurate, but close enough for now * adds markers for what to implement next the structure here diverges a lot from the PT version. not a big fan of it, but just get something working for now * implements `FlaxLlamaBlockCollection`] tolerance must be higher than expected, kinda disconcerting * Adds `FlaxLlamaModule` equivalent PyTorch model is `LlamaModel` yay! a language model🤗 * adds `FlaxLlamaForCausalLMModule` equivalent to `LlamaForCausalLM` still missing returning dict or tuple, will add later * start porting pretrained wrappers realised it probably needs return dict as a prereq * cleanup, quality, style * readds `return_dict` and model output named tuples * (tentatively) pretrained wrappers work 🔥 * fixes numerical mismatch in `FlaxLlamaRMSNorm` seems `jax.lax.rsqrt` does not match `torch.sqrt`. manually computing `1 / jax.numpy.sqrt` results in matching values. * [WIP] debugging numerics * numerical match I think issue was accidental change of backend. forcing CPU fixes test. We expect some mismatch on GPU. * adds in model and integration tests for Flax Llama summary of failing: - mul invalid combination of dimensions - one numerical mismatch - bf16 conversion (maybe my local backend issue) - params are not FrozenDict * adds missing TYPE_CHECKING import and `make fixup` * adds back missing docstrings needs review on quality of docstrings, not sure what is required. Furthermore, need to check if `CHECKPOINT_FOR_DOC` is valid. See TODO * commenting out equivalence test as can just use common * debugging * Fixes bug where mask and pos_ids were swapped in pretrained models This results in all tests passing now 🔥 * cleanup of modeling file * cleanup of test file * Resolving simpler review comments * addresses more minor review comments * fixing introduced pytest errors from review * wip additional slow tests * wip tests need to grab a GPU machine to get real logits for comparison otherwise, slow tests should be okay * `make quality`, `make style` * adds slow integration tests - checking logits - checking hidden states - checking generation outputs * `make fix-copies` * fix mangled function following `make fix-copies` * adds missing type checking imports * fixes missing parameter checkpoint warning * more finegrained 'Copied from' tags avoids issue of overwriting `LLAMA_INPUTS_DOCSTRING` * swaps import guards ??? how did these get swapped initially? * removing `inv_freq` again as pytorch version has now removed * attempting to get CI to pass * adds doc entries for llama flax models * fixes typo in __init__.py imports * adds back special equivalence tests these come from the gpt neo flax tests. there is special behaviour for these models that needs to override the common version * overrides tests with dummy to see if CI passes need to fill in these tests later * adds my contribution to docs * `make style; make quality` * replaces random masking with fixed to work with flax version * `make quality; make style` * Update src/transformers/models/llama/modeling_flax_llama.py Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> * Update src/transformers/models/llama/modeling_flax_llama.py Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> * Update src/transformers/models/llama/modeling_flax_llama.py Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> * Update src/transformers/models/llama/modeling_flax_llama.py Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> * Update src/transformers/models/llama/modeling_flax_llama.py Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> * Update src/transformers/models/llama/modeling_flax_llama.py Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> * updates `x`->`tensor` in `rotate_half` * addresses smaller review comments * Update docs/source/en/model_doc/llama.md Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> * adds integration test class * adds `dtype` to rotary embedding to cast outputs * adds type to flax llama rotary layer * `make style` * `make fix-copies` * Apply suggestions from code review Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> * applies suggestions from review * Update modeling_flax_llama.py * `make fix-copies` * Update tests/models/llama/test_modeling_llama.py Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> * Update src/transformers/models/llama/modeling_flax_llama.py Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> * fixes shape mismatch in FlaxLlamaMLP * applies some suggestions from reviews * casts attn output logits to f32 regardless of dtype * adds attn bias using `LlamaConfig.attention_bias` * adds Copied From comments to Flax Llama test * mistral and persimmon test change -copy from llama * updates docs index * removes Copied from in tests it was preventing `make fix-copies` from succeeding * quality and style * ignores FlaxLlama input docstring * adds revision to `_CHECKPOINT_FOR_DOC` * repo consistency and quality * removes unused import * removes copied from from Phi test now diverges from llama tests following FlaxLlama changes * adds `_REAL_CHECKPOINT_FOR_DOC` * removes refs from pr tests * reformat to make ruff happy --------- Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>	2023-12-07 07:05:00 +01:00
Rockerz	9660e27cd0	Translating en/model_doc folder docs to Japanese(from `blip` to `clap`) 🇯🇵 (#27673 ) * Add models * Add models and update `_toctree.yml` * Update docs/source/ja/model_doc/chinese_clip.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/ja/model_doc/camembert.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/ja/model_doc/bros.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/ja/model_doc/bros.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/ja/model_doc/blip-2.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/ja/model_doc/camembert.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * solve merge conflicts and update paper titles * Update docs/source/ja/model_doc/bridgetower.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/ja/model_doc/canine.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/ja/model_doc/chinese_clip.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update the authons name in bros..md --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2023-12-06 10:38:21 -08:00
Younes Belkada	9270ab0827	[`Flash Attention 2`] Add flash attention 2 for GPT-Neo-X (#26463 ) * add flash-attn-2 support for GPT-neo-x * fixup * add comment * revert * fixes * update docs * comment * again * fix copies * add plot + fix copies * Update docs/source/en/model_doc/gpt_neox.md	2023-12-06 17:22:32 +01:00
Younes Belkada	ba52dec47f	[`Docs`] Update broken image on fused modules (#27856 ) Update quantization.md	2023-12-05 12:33:58 -08:00
Aaron Jimenez	da1d0d404f	Documentation: Spanish translation of perplexity.mdx (#27807 ) * Copy perplexity.md file to es/ folder * Adding perplexity to es/_toctree.yml * Translate first section * Calculating PPL section translate * Example section translate * fix translate of log-likehood * Fix title translate * Fix \ in second paragraph * Change verosimilitud for log-likelihood * Run 'make style'	2023-12-05 10:53:55 -08:00
Arindam Jati	b242d0f297	[Time series] Add PatchTSMixer (#26247 ) * patchtsmixer initial commit * x,y->context_values,target_values, unittest addded * cleanup code * minor * return hidden states * model tests, partial integration tests * ettm notebook temporary * minor * config mask bug fix, tests updated * final ETT notebooks * add selfattn * init * added docstrings * PatchTSMixerForPretraining -> PatchTSMixerForMaskPretraining * functionality tests added * add start and input docstrings * docstring edits * testcase edits * minor changes * docstring error fixed * ran make fixup * finalize integration tests and docs * minor * cleaned gitignore * added dataclass decorator, ran black formatter * ran ruff * formatting * add slow decorator * renamed in_Channel to input_size and default to 1 * shorten dataclass names * use smaller model for testing * moved the 3 heads to the modeling file * use scalers instead of revin * support forecast_channel_indices * fix regression scaling * undo reg. scaling * removed unneeded classes * forgot missing * add more layers * add copied positional_encoding * use patchmask from patchtst * removed dependency on layers directory * formatting * set seed * removed unused imports * fixed forward signature test * adding distributional head for PatchTSMixerForecasting * add generate to forecast * testcases for generate * add generate and distributional head for regression * raise Exception for negative values for neg binominal distribution * formatting changes * remove copied from patchtst and add TODO for test passing * make copies * doc edits * minor changes * format issues * minor changes * minor changes * format docstring * change some class names to PatchTSMixer + class name Transpose to PatchTSMixerTranspose GatedAttention to PatchTSMixerGatedAttention * change NormLayer to PatchTSMixerNormLayer * change MLP to PatchTSMixerMLP * change PatchMixer to PatchMixerBlock, FeatureMixer to FeatureMixerBlock * change ChannelFeatureMixer to ChannelFeatureMixerBlock * change PatchMasking to PatchTSMixerMasking * change Patchify to PatchTSMixerPatchify * list to `list` * fix docstrings * formatting * change bs to batch_size, edit forecast_masking * edit random_masking * change variable name and update docstring in PatchTSMixerMasking * change variable name and update docstring in InjectScalerStatistics4D * update forward call in PatchTSMixerTranspose * change variable name and update docstring in PatchTSMixerNormLayer * change variable name and update docstring in PatchTSMixerMLP * change variable name and update docstring in ChannelFeatureMixerBlock * formatting * formatting issues * docstring issue * fixed observed_mask type in docstrings * use FloatTensor type * formatting * fix rescaling issue in forecasting, fixed integration tests * add docstring from decorator * fix docstring * Update README.md Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * Update src/transformers/models/patchtsmixer/configuration_patchtsmixer.py Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * Update src/transformers/models/patchtsmixer/modeling_patchtsmixer.py Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * Update src/transformers/models/patchtsmixer/configuration_patchtsmixer.py Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * Update src/transformers/models/patchtsmixer/modeling_patchtsmixer.py Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * PatchTSMixerChannelFeatureMixerBlock * formatting * ForPretraining * use num_labels instead of n_classes * remove commented out code * docstring fixed * nn.functional used instead of one letter F * x_tmp renamed * one letter variable x removed from forward calls * one letter variable y removed * remove commented code * rename patch_size, in_channels, PatchTSMixerBackbone * add config to heads * add config to heads tests * code reafactoring to use config instead of passing individual params * Cdocstring fixes part 1 * docstring fixes part 2 * removed logger.debug * context_values -> past_values * formatting changes * pe -> positional_encoding * removed unused target variable * self.mode logic fixed * formatting change * edit docstring and var name * change n_targets to num_targets * rename input_size to num_input_channels * add head names with prefix PatchTSMixer * edit docstring in PatchTSMixerForRegression * fix var name change in testcases * add PatchTSMixerAttention * return dict for all exposed classes, test cases added * format * move loss function to forward call * make style * adding return dict/tuple * make repo-consistency * remove flatten mode * code refactoring * rename data * remove PatchTSMixer and keep only PatchTSMixerEncoder * docstring fixes * removed unused code * format * format * remove contiguous and formatting changes * remove model description from config * replace asserts with ValueError * remove nn.Sequential from PatchTSMixerNormLayer * replace if-else with map * remove all nn.Sequential * format * formatting * fix gradient_checkpointing error after merge, and formatting * make fix-copies * remove comments * reshape * doesnt support gradient checkpointing * corect Patchify * masking updates * batchnorm copy from * format checks * scaler edits * remove comments * format changes * remove self.config * correct class PatchTSMixerMLP(nn.Module): * makr fix * doc updates * fix-copies * scaler class correction * doc edits * scaler edits * update readme with links * injectstatistics add * fix-copies * add norm_eps option to LayerNorm * format changes * fix copies * correct make copies * use parametrize * fix doc string * add docs to toctree * make style * doc segmenting * docstring edit * change forecast to prediction * edit doc * doc edits * remove PatchTSMixerTranspose * add PatchTSMixerPositionalEncoding and init position_enc * remove positional_encoding * edit forecast_masking, remove forecast_mask_ratios * fix broken code * var rename target_values -> future_values * num_features -> d_model * fix broken code after master merge * repo consistency * use postional embedding * prediction_logits -> prediction_outputs, make fix-copies * uncommented @slow * minor changes * loss first in tuple * tuple and dict same ordering * style edits * minor changes * dict/tuple consistent enablement * Update src/transformers/models/patchtsmixer/modeling_patchtsmixer.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update tests/models/patchtsmixer/test_modeling_patchtsmixer.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/models/patchtsmixer/modeling_patchtsmixer.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * fix formatting * formatting * usage tip * test on cpu only * add sample usage * change PatchTSMixerForClassification to PatchTSMixerForTimeSeriesClassification * push changes * fix copies * std scaling set to default True case * minor changes * stylechanges --------- Co-authored-by: Arindam Jati <arindam.jati@ibm.com> Co-authored-by: vijaye12 <vijaye12@in.ibm.com> Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com> Co-authored-by: nnguyen <nnguyen@us.ibm.com> Co-authored-by: vijaye12 <vijaykr.e@gmail.com> Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> Co-authored-by: Nam Nguyen <namctin@gmail.com> Co-authored-by: Wesley Gifford <79663411+wgifford@users.noreply.github.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2023-12-05 15:31:35 +01:00
Younes Belkada	fdb85be40f	Faster generation using AWQ + Fused modules (#27411 ) * v1 fusing modules * add fused mlp support * up * fix CI * block save_pretrained * fixup * small fix * add new condition * add v1 docs * add some comments * style * fix nit * adapt from suggestion * add check * change arg names * change variables name * Update src/transformers/integrations/awq.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * style * split up into 3 different private methods * more conditions * more checks * add fused tests for custom models * fix * fix tests * final update docs * final fixes * fix importlib metadata * Update src/transformers/utils/quantization_config.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * change it to `do_fuse` * nit * Update src/transformers/utils/quantization_config.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Update src/transformers/utils/quantization_config.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Update src/transformers/utils/quantization_config.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * few fixes * revert * fix test * fix copies * raise error if model is not quantized * add test * use quantization_config.config when fusing * Update src/transformers/modeling_utils.py --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>	2023-12-05 12:14:45 +01:00
Rockerz	235e5d4991	Translate `en/tasks` folder docs to Japanese 🇯🇵 (#27098 ) * Create asr.md * Create audio_classification.md * Create document_question_answering.md * Update document_question_answering.md * add * add * ggg * gg * add masked_language_modeling.md * add monocular_depth estimation * new * dd * add * add * cl * add * Add Traslation.md * hgf * Added docs to Toctree file * Update docs/source/ja/tasks/asr.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/ja/tasks/asr.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/ja/tasks/image_classification.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/ja/tasks/idefics.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/ja/tasks/image_captioning.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Fix docs and revert changes * Update docs/source/en/tasks/idefics.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/ja/tasks/language_modeling.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/ja/tasks/language_modeling.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/ja/tasks/language_modeling.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/ja/tasks/prompting.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/ja/tasks/masked_language_modeling.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/ja/tasks/masked_language_modeling.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/ja/tasks/prompting.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/ja/tasks/object_detection.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/ja/tasks/semantic_segmentation.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/ja/tasks/semantic_segmentation.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/ja/tasks/token_classification.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/ja/tasks/translation.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/ja/tasks/visual_question_answering.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/ja/tasks/summarization.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * changes in review 1 and 2 * add * Update docs/source/ja/tasks/asr.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/ja/tasks/translation.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * changes * Update docs/source/ja/_toctree.yml Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/ja/_toctree.yml Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/ja/_toctree.yml Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update _toctree.yml --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2023-12-04 14:10:54 -08:00

1 2 3 4 5 ...

2360 Commits