transformers

Abhi Venigalla 005b957fb8 Add DBRX Model (#29921)	2024-04-18 15:18:52 +02:00

* wip
* fix __init__.py
* add docs
* Apply suggestions from code review
  Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* address comments 1
* work on make fixup
* pass configs down
* add sdpa attention
* remove DbrxBlock
* add to configuration_auto
* docstring now passes formatting test
* fix style
* update READMEs
* add dbrx to modeling_auto
* make fix-copies generated this
* add DBRX_PRETRAINED_CONFIG_ARCHIVE_MAP
* config docstring passes formatting test
* rename moe_loss_weight to router_aux_loss_coef
* add to flash-attn documentation
* fix model-path in tests
* Explicitly make `"suli"` the default `ffn_act_fn`
  Co-authored-by: Wing Lian <wing.lian@gmail.com>
* default to using router_aux_loss_coef over ffn_config[moe_loss_weight]
* fix _flash_attn_uses_top_left_mask and is_causal
* fix tests path
* don't use token type IDs
* follow Llama and remove token_type_ids from test
* init ConfigTester differently so tests pass
* remove multiple choice test
* remove question + answer test
* remove sequence classification test
* remove token classification test
* copy Llama tests and remove token_type_ids from test inputs
* do not test pruning or headmasking; style code
* add _tied_weights_keys parameter to pass test
* add type hints
* fix type check
* update config tester
* remove masked_lm test
* remove encoder tests
* initialize DbrxModelTester with correct params
* style
* torch_dtype does not rely on torch
* run make fixup, fix-copies
* use https://huggingface.co/v2ray/dbrx-base-fixed/blob/main/modeling_dbrx.py
* add copyright info
* fix imports and DbrxRotaryEmbedding
* update DbrxModel docstring
* use copies
* change model path in docstring
* use config in DbrxFFN
* fix flashattention2, sdpaattention
* input config to DbrXAttention, DbrxNormAttentionNorm
* more fixes
* fix
* fix again!
* add informative comment
* fix ruff?
* remove print statement + style
* change doc-test
* fix doc-test
* fix docstring
* delete commented out text
* make defaults match dbrx-instruct
* replace `router_aux_loss_coef` with `moe_loss_weight`
* is_decoder=True
* remove is_decoder from configtester
* implement sdpa properly
* make is_decoder pass tests
* start on the GenerationTesterMixin tests
* add dbrx to sdpa documentation
* skip weight typing test
* style
* initialize smaller model
  Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* Add DBRX to toctree
* skip test_new_cache_format
* make config defaults smaller again
* add pad_token_id
* remove pad_token_id from config
* Remove all references to DBRX_PRETRAINED_CONFIG_ARCHIVE_MAP
* Update src/transformers/models/dbrx/__init__.py
  Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/dbrx/modeling_dbrx.py
  Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update docs/source/en/model_doc/dbrx.md
  Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* Update src/transformers/models/dbrx/configuration_dbrx.py
  Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update docs/source/en/model_doc/dbrx.md
  Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* fix typo
* Apply suggestions from code review
  Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* update docs, fix configuration_auto.py
* address pr comments
* remove is_decoder flag
* slice
* fix requires grad
* remove grad
* disconnect differently
* remove grad
* enable grads
* patch
* detach expert
* nissan al ghaib
* Update modeling_dbrx.py
* Update src/transformers/models/dbrx/modeling_dbrx.py
  Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* replace "Gemma" with "Dbrx"
* remove # type: ignore
* don't hardcode vocab_size
* remove ToDo
* Re-add removed idefics2 line
* Update test to use tiny-random!
* Remove TODO
* Remove one more case of loading the entire dbrx-instruct in the tests
* Update src/transformers/models/dbrx/modeling_dbrx.py
  Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* address some comments
* small model
* add dbrx to tokenization_auto
* More docstrings with add_start_docstrings
* Dbrx for now
* add PipelineTesterMixin
* Update src/transformers/models/dbrx/configuration_dbrx.py
  Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* remove flash-attn2 import error
* fix docstring
  Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* add useage example
* put on one line
  Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* fix ffn_act_fn
  Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* change "dbrx" to "DBRX" for display purposes.
* fix __init__.py?
* fix __init__.py
* fix README
* return the aux_loss
* remove extra spaces
* fix configuration_auto.py
* fix format in tokenization_auto
* remove new line
* add more useage examples

---------

Co-authored-by: Abhi Venigalla <abhi.venigalla@databricks.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Eitan Turok <eitan.turok@databricks.com>
Co-authored-by: Eitan Turok <150733043+eitanturok@users.noreply.github.com>
Co-authored-by: Wing Lian <wing.lian@gmail.com>
Co-authored-by: Eitan Turok <eitanturok@gmail.com>
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
Co-authored-by: Matt <rocketknight1@gmail.com>
Co-authored-by: Your Name <you@example.com>
Co-authored-by: Mihir Patel <mihir.v.patel7@gmail.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
..
asr.md	Add new meta w2v2-conformer BERT-like model (#28165)	2024-01-18 13:37:34 +00:00
audio_classification.md	Add new meta w2v2-conformer BERT-like model (#28165)	2024-01-18 13:37:34 +00:00
document_question_answering.md	Migrate doc files to Markdown. (#24376)	2023-06-20 18:07:47 -04:00
idefics.md	[Docs] Fix broken links and syntax issues (#28918)	2024-02-08 14:13:35 -08:00
image_captioning.md	[Docs] Fix backticks in inline code and documentation links (#28875)	2024-02-06 11:15:44 -08:00
image_classification.md	[Trainer] Undo #29896 (#30129)	2024-04-09 12:55:42 +02:00
image_feature_extraction.md	Fix header in IFE task guide (#29859)	2024-03-26 12:32:37 +01:00
image_to_image.md	Image-to-Image Task Guide (#26595)	2023-10-16 15:12:03 +02:00
knowledge_distillation_for_image_classification.md	fixed typos (issue 27919) (#27920)	2023-12-11 18:44:23 -05:00
language_modeling.md	Add DBRX Model (#29921)	2024-04-18 15:18:52 +02:00
mask_generation.md	Mask Generation Task Guide (#28897)	2024-02-14 18:29:49 +00:00
masked_language_modeling.md	Update all references to canonical models (#29001)	2024-02-16 08:16:58 +01:00
monocular_depth_estimation.md	Add Depth Anything (#28654)	2024-01-25 09:34:50 +01:00
multiple_choice.md	Update all references to canonical models (#29001)	2024-02-16 08:16:58 +01:00
object_detection.md	[Trainer] Undo #29896 (#30129)	2024-04-09 12:55:42 +02:00
prompting.md	Fix doctest more (for `docs/source/en`) (#30247)	2024-04-15 14:10:59 +02:00
question_answering.md	fix the post-processing link (#29091)	2024-02-19 10:15:58 +00:00
semantic_segmentation.md	[docs] Fix image segmentation guide (#30132)	2024-04-09 09:08:37 -07:00
sequence_classification.md	Add jamba (#29943)	2024-04-18 11:04:02 +02:00
summarization.md	Update all references to canonical models (#29001)	2024-02-16 08:16:58 +01:00
text-to-speech.md	Add FastSpeech2Conformer (#23439)	2024-01-03 18:01:06 +00:00
token_classification.md	Update all references to canonical models (#29001)	2024-02-16 08:16:58 +01:00
translation.md	Configuring Translation Pipelines documents update #27753 (#29986)	2024-04-17 11:27:49 +02:00
video_classification.md	[Trainer] Undo #29896 (#30129)	2024-04-09 12:55:42 +02:00
visual_question_answering.md	VQA task guide (#25244)	2023-08-09 08:29:06 -04:00
zero_shot_image_classification.md	[docs] Fix model reference in zero shot image classification example (#26206)	2023-09-19 00:45:12 +02:00
zero_shot_object_detection.md	[Docs] Update README and default pipelines (#28864)	2024-02-12 10:21:36 +01:00