transformers/docs/source/en/tasks
Abhi Venigalla 005b957fb8
Add DBRX Model (#29921)
* wip

* fix __init__.py

* add docs

* Apply suggestions from code review

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* address comments 1

* work on make fixup

* pass configs down

* add sdpa attention

* remove DbrxBlock

* add to configuration_auto

* docstring now passes formatting test

* fix style

* update READMEs

* add dbrx to modeling_auto

* make fix-copies generated this

* add DBRX_PRETRAINED_CONFIG_ARCHIVE_MAP

* config docstring passes formatting test

* rename moe_loss_weight to router_aux_loss_coef

* add to flash-attn documentation

* fix model-path in tests

* Explicitly make `"suli"` the default `ffn_act_fn`

Co-authored-by: Wing Lian <wing.lian@gmail.com>

* default to using router_aux_loss_coef over ffn_config[moe_loss_weight]

* fix _flash_attn_uses_top_left_mask and is_causal

* fix tests path

* don't use token type IDs

* follow Llama and remove token_type_ids from test

* init ConfigTester differently so tests pass

* remove multiple choice test

* remove question + answer test

* remove sequence classification test

* remove token classification test

* copy Llama tests and remove token_type_ids from test inputs

* do not test pruning or headmasking; style code

* add _tied_weights_keys parameter to pass test

* add type hints

* fix type check

* update config tester

* remove masked_lm test

* remove encoder tests

* initialize DbrxModelTester with correct params

* style

* torch_dtype does not rely on torch

* run make fixup, fix-copies

* use https://huggingface.co/v2ray/dbrx-base-fixed/blob/main/modeling_dbrx.py

* add copyright info

* fix imports and DbrxRotaryEmbedding

* update DbrxModel docstring

* use copies

* change model path in docstring

* use config in DbrxFFN

* fix flashattention2, sdpaattention

* input config to DbrXAttention, DbrxNormAttentionNorm

* more fixes

* fix

* fix again!

* add informative comment

* fix ruff?

* remove print statement + style

* change doc-test

* fix doc-test

* fix docstring

* delete commented out text

* make defaults match dbrx-instruct

* replace `router_aux_loss_coef` with `moe_loss_weight`

* is_decoder=True

* remove is_decoder from configtester

* implement sdpa properly

* make is_decoder pass tests

* start on the GenerationTesterMixin tests

* add dbrx to sdpa documentation

* skip weight typing test

* style

* initialize smaller model

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>

* Add DBRX to toctree

* skip test_new_cache_format

* make config defaults smaller again

* add pad_token_id

* remove pad_token_id from config

* Remove all references to DBRX_PRETRAINED_CONFIG_ARCHIVE_MAP

* Update src/transformers/models/dbrx/__init__.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/dbrx/modeling_dbrx.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update docs/source/en/model_doc/dbrx.md

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>

* Update src/transformers/models/dbrx/configuration_dbrx.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update docs/source/en/model_doc/dbrx.md

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fix typo

* Apply suggestions from code review

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* update docs, fix configuration_auto.py

* address pr comments

* remove is_decoder flag

* slice

* fix requires grad

* remove grad

* disconnect differently

* remove grad

* enable grads

* patch

* detach expert

* nissan al ghaib

* Update modeling_dbrx.py

* Update src/transformers/models/dbrx/modeling_dbrx.py

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>

* replace "Gemma" with "Dbrx"

* remove # type: ignore

* don't hardcode vocab_size

* remove ToDo

* Re-add removed idefics2 line

* Update test to use tiny-random!

* Remove TODO

* Remove one more case of loading the entire dbrx-instruct in the tests

* Update src/transformers/models/dbrx/modeling_dbrx.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* address some comments

* small model

* add dbrx to tokenization_auto

* More docstrings with add_start_docstrings

* Dbrx for now

* add PipelineTesterMixin

* Update src/transformers/models/dbrx/configuration_dbrx.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* remove flash-attn2 import error

* fix docstring

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* add useage example

* put on one line

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* fix ffn_act_fn

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* change "dbrx" to "DBRX" for display purposes.

* fix __init__.py?

* fix __init__.py

* fix README

* return the aux_loss

* remove extra spaces

* fix configuration_auto.py

* fix format in tokenization_auto

* remove new line

* add more useage examples

---------

Co-authored-by: Abhi Venigalla <abhi.venigalla@databricks.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Eitan Turok <eitan.turok@databricks.com>
Co-authored-by: Eitan Turok <150733043+eitanturok@users.noreply.github.com>
Co-authored-by: Wing Lian <wing.lian@gmail.com>
Co-authored-by: Eitan Turok <eitanturok@gmail.com>
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
Co-authored-by: Matt <rocketknight1@gmail.com>
Co-authored-by: Your Name <you@example.com>
Co-authored-by: Mihir Patel <mihir.v.patel7@gmail.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-04-18 15:18:52 +02:00
..
asr.md Add new meta w2v2-conformer BERT-like model (#28165) 2024-01-18 13:37:34 +00:00
audio_classification.md Add new meta w2v2-conformer BERT-like model (#28165) 2024-01-18 13:37:34 +00:00
document_question_answering.md Migrate doc files to Markdown. (#24376) 2023-06-20 18:07:47 -04:00
idefics.md [Docs] Fix broken links and syntax issues (#28918) 2024-02-08 14:13:35 -08:00
image_captioning.md [Docs] Fix backticks in inline code and documentation links (#28875) 2024-02-06 11:15:44 -08:00
image_classification.md [Trainer] Undo #29896 (#30129) 2024-04-09 12:55:42 +02:00
image_feature_extraction.md Fix header in IFE task guide (#29859) 2024-03-26 12:32:37 +01:00
image_to_image.md Image-to-Image Task Guide (#26595) 2023-10-16 15:12:03 +02:00
knowledge_distillation_for_image_classification.md fixed typos (issue 27919) (#27920) 2023-12-11 18:44:23 -05:00
language_modeling.md Add DBRX Model (#29921) 2024-04-18 15:18:52 +02:00
mask_generation.md Mask Generation Task Guide (#28897) 2024-02-14 18:29:49 +00:00
masked_language_modeling.md Update all references to canonical models (#29001) 2024-02-16 08:16:58 +01:00
monocular_depth_estimation.md Add Depth Anything (#28654) 2024-01-25 09:34:50 +01:00
multiple_choice.md Update all references to canonical models (#29001) 2024-02-16 08:16:58 +01:00
object_detection.md [Trainer] Undo #29896 (#30129) 2024-04-09 12:55:42 +02:00
prompting.md Fix doctest more (for `docs/source/en`) (#30247) 2024-04-15 14:10:59 +02:00
question_answering.md fix the post-processing link (#29091) 2024-02-19 10:15:58 +00:00
semantic_segmentation.md [docs] Fix image segmentation guide (#30132) 2024-04-09 09:08:37 -07:00
sequence_classification.md Add jamba (#29943) 2024-04-18 11:04:02 +02:00
summarization.md Update all references to canonical models (#29001) 2024-02-16 08:16:58 +01:00
text-to-speech.md Add FastSpeech2Conformer (#23439) 2024-01-03 18:01:06 +00:00
token_classification.md Update all references to canonical models (#29001) 2024-02-16 08:16:58 +01:00
translation.md Configuring Translation Pipelines documents update #27753 (#29986) 2024-04-17 11:27:49 +02:00
video_classification.md [Trainer] Undo #29896 (#30129) 2024-04-09 12:55:42 +02:00
visual_question_answering.md VQA task guide (#25244) 2023-08-09 08:29:06 -04:00
zero_shot_image_classification.md [docs] Fix model reference in zero shot image classification example (#26206) 2023-09-19 00:45:12 +02:00
zero_shot_object_detection.md [Docs] Update README and default pipelines (#28864) 2024-02-12 10:21:36 +01:00