* implement convert_mamba_ssm_checkpoint_to_pytorch
* Add test test_model_from_mamba_ssm_conversion
* moved convert_ssm_config_to_hf_config to inside mamba_ssm_available check
* fix skipif clause
* moved skips to inside test since skipif decorator isn't working for some reason
* Added validation
* removed test
* fixup
* only compare logits
* remove weight rename
* Update src/transformers/models/mamba/convert_mamba_ssm_checkpoint_to_pytorch.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* nits
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Fix generate_with_fallback **kwargs
* Change pop to get
* Delete keys from kwargs to prevent overriding generation_config
* Revert to passing kwargs by reference, but make a (shallow) copy
* dict -> copy.copy
* Add test_whisper_longform_multi_batch_beam
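The kwargs handling above can be sketched roughly as follows. This is a minimal illustration, not the actual Whisper code: the helper name `strip_generation_overrides` and the key list are hypothetical. It shows why a shallow copy is made before keys are deleted, so the caller's dict is left untouched.

```python
import copy

# Hypothetical key names, chosen for illustration only.
OVERRIDE_KEYS = ("temperature", "do_sample")


def strip_generation_overrides(kwargs: dict) -> dict:
    """Return a shallow copy of kwargs without the keys that would
    otherwise override values already set on generation_config."""
    local_kwargs = copy.copy(kwargs)  # shallow copy: values are shared, the dict is not
    for key in OVERRIDE_KEYS:
        local_kwargs.pop(key, None)  # deleting from the copy leaves the caller's dict alone
    return local_kwargs


caller_kwargs = {"temperature": 0.2, "num_beams": 2}
cleaned = strip_generation_overrides(caller_kwargs)
```

After the call, `cleaned` no longer carries `temperature`, while `caller_kwargs` still does, so a later fallback attempt sees the original arguments.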
To fix NaN logit outputs for certain combinations of the
`image_size`, `patch_size` and `depths` configuration parameters,
add an assertion that the resulting `window_size` field in the
model's Self Attention class is greater than 1. This prevents
division by zero when normalizing `relative_coords_table`.
Fix: #28675
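A minimal sketch of why `window_size` must be greater than 1, assuming the coordinate table is normalized by dividing each relative offset by `window_size - 1`. The function name is hypothetical, and the sketch raises `ValueError` where the commit describes an assertion:

```python
def normalized_relative_coords(window_size: int) -> list[float]:
    """Relative offsets in [-(window_size - 1), window_size - 1], scaled to [-1, 1].

    Illustrative only: assumes normalization divides by (window_size - 1),
    which is zero when window_size == 1.
    """
    if window_size <= 1:
        # Guard corresponding to the check described above.
        raise ValueError(f"window_size must be greater than 1, got {window_size}")
    offsets = range(-(window_size - 1), window_size)
    return [offset / (window_size - 1) for offset in offsets]
```

With `window_size == 1` the only offset is 0 and the divisor is 0, so an unguarded tensor division would yield 0/0, which is NaN.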
* Hard error when ignoring tensors. (#27484)
* [WIP] Hard error when ignoring tensors.
* Better selection/error when saving a checkpoint.
- Find all names we should normally drop (those are in the transformers
  config).
- Find all disjoint tensors (for those we can safely trigger a copy to
  get rid of the sharing before saving).
- Clone those disjoint tensors, which gets rid of the issue.
- Find all identical names (those should be declared in the config,
  but we try to find them all anyway).
- For all identical names:
  - If they are in the config, just ignore them; everything is fine.
  - If they are not, warn about them.
- For all remaining tensors which are shared yet neither identical nor
  disjoint, raise a hard error.
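The selection logic in the list above can be sketched as follows. This is a simplified illustration under stated assumptions, not the code in this PR: each tensor in a shared storage is modeled as a half-open byte range, and `plan_save`, `Span` and `declared_tied` are hypothetical names.

```python
Span = tuple[int, int]  # half-open byte range [start, end) inside one shared storage


def overlaps(a: Span, b: Span) -> bool:
    """True if the two byte ranges share at least one byte."""
    return a[0] < b[1] and b[0] < a[1]


def plan_save(spans: dict[str, Span], declared_tied: set[str]) -> dict[str, list[str]]:
    """Classify tensors that share a storage before saving.

    - disjoint from every other tensor  -> "clone" (copy removes the sharing)
    - identical to what it overlaps     -> "ignore" if declared, else "warn"
    - partially overlapping             -> hard error
    """
    plan: dict[str, list[str]] = {"ignore": [], "clone": [], "warn": []}
    names = sorted(spans)
    for name in names:
        touching = [o for o in names if o != name and overlaps(spans[name], spans[o])]
        if not touching:
            plan["clone"].append(name)
        elif all(spans[o] == spans[name] for o in touching):
            plan["ignore" if name in declared_tied else "warn"].append(name)
        else:
            raise RuntimeError(f"{name} shares memory but is neither identical nor disjoint")
    return plan
```

For example, `plan_save({"a": (0, 4), "b": (0, 4), "c": (4, 8)}, {"b"})` ignores `b`, warns about `a`, and clones `c`.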
* Adding a failing test on `main` that passes here.
* We don't need to keep the subfolder logic in this test.
* Apply suggestions from code review
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Add small tests.
* Dead variable.
* Fixup.
* Fixing tied_weights_keys on generic models.
* Fixup + T5 encoder/decoder tying (with different layers)
* Code quality.
* Dynamic member.
* trigger
* Fixing encoder name for other types of encoder/decoder combos.
* Fix scoping.
* Update .github/workflows/self-scheduled.yml
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Fixing the tied_weights after the call.
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Fix skip_special_tokens process for Wav2Vec2CTCTokenizer._decode
* Fix skip_special_tokens for Wav2Vec2CTCTokenizer._decode
* Exclude pad_token filtering since it is used as CTC-blank token
* Add small test for skip_special_tokens
* Update decoding test for added new token
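A small sketch of why the pad token is excluded from special-token filtering, assuming standard CTC decoding (collapse repeats, then drop the blank). The function and the token strings are illustrative, not the tokenizer's actual implementation:

```python
from itertools import groupby


def ctc_decode(tokens, blank="<pad>", special=("<s>", "</s>", "<pad>"),
               skip_special_tokens=True):
    """Toy CTC decode: filter special tokens, collapse repeats, drop the blank."""
    # Keep the blank during filtering even though it is a special token:
    # it is what separates genuine double letters.
    kept = [
        t for t in tokens
        if not (skip_special_tokens and t in special and t != blank)
    ]
    collapsed = [token for token, _ in groupby(kept)]  # merge consecutive repeats
    return "".join(t for t in collapsed if t != blank)


decoded = ctc_decode(["<s>", "h", "e", "l", "l", "<pad>", "l", "o", "</s>"])
```

If the blank were filtered together with the other special tokens, the two `l` runs would become adjacent and collapse into one, giving "helo" instead of "hello".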
* add FA2 to the original Musicgen
* make style
* add FA2 support to Musicgen Melody
* add generation FA2 tests to the original Musicgen
* make style and fix copies
* add Musicgen to FA2 docs + deprecate list
* add sdpa support to the Musicgen models
* make style and fix copies
* refactor attention implementation arguments
* add Copied from to sdpa tests
* add copied from in sdpa tests melody
* add copied for FA2 generation tests
* add FA2 inference copied from
* make style
* fix issue with logit processor in beam search in Flax
* adding FlaxNoRepeatNGramLogitsProcessor class + unit test
* style correction and code verification
* add FlaxNoRepeatNGramLogitsProcessor to the test_processor_list and test_processor_list_jitted tests
* fix an issue where ngrams are banned only if they appear exactly once + update description of get_previous_ngrams
* replace non-jit compatible masking of ngrams that are not yet generated with jittable version
* Revert "fix issue with logit processor in beam search in Flax"
This reverts commit 09b70d7e4d.
* add FlaxNoRepeatNGramLogitsProcessor to _get_logits_processor
* change the method of casting to boolean of banned tokens indices
* fix code style
* remove some useless operations + significantly faster computation of update indices using jax.lax.fori_loop
* remove useless loop iterations
* set some variables that were calculated and used multiple times
* fix format
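For reference, the no-repeat n-gram rule that the processor enforces can be sketched in plain Python. This is only an illustration of the rule itself, not the jittable Flax implementation (which works on fixed-shape arrays and `jax.lax.fori_loop`), and `banned_next_tokens` is a hypothetical name:

```python
def banned_next_tokens(tokens: list[int], ngram_size: int) -> set[int]:
    """Tokens that would complete an n-gram already present in `tokens`."""
    # The last (ngram_size - 1) generated tokens form the prefix of the next n-gram.
    prefix = tuple(tokens[len(tokens) - (ngram_size - 1):])
    banned = set()
    # Scan every complete n-gram seen so far, however many times it occurs.
    for start in range(len(tokens) - ngram_size + 1):
        if tuple(tokens[start:start + ngram_size - 1]) == prefix:
            banned.add(tokens[start + ngram_size - 1])
    return banned
```

A token is banned whenever it completes an n-gram that has occurred at least once, so a prefix that appears several times contributes every continuation it has had.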
* Fix sinusoidal_embeddings in FlaubertModel
* Fix for Informer
* Fix for XLM
* Move sinusoidal emb for XLM
* Move sinusoidal emb for Flaubert
* Small cleanup
* Add comments on tests code copied from
* Add with Distilbert->
* fix bug and add tests
* nit
* another way to get the current length instead of using the attention mask
* more places where this might have been broken
* nit
* oops
* inputs_embeds vs input_embeds
* test generated outputs
* style
* nit
* fix
* skip failing biogpt
* Start rework
* Fix failing test
* Include max
* Update src/transformers/trainer.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* add functions to trainer.py to get the number of params which require grad, the optimizer group for parameters, and the learning rates of param groups
* add tests and raise ValueError when optimizer is None
* add second layer to test and freeze its weights
* check if torch is available before running tests
* use decorator to check if torch is available
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* fix test indentation
Co-authored-by: Zach Mueller <muellerzr@gmail.com>
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Zach Mueller <muellerzr@gmail.com>
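A rough, duck-typed sketch of the three helpers described in the trainer.py commit above. The function names and bodies here are assumptions for illustration, not the code that was merged; the sketch relies only on the PyTorch conventions that parameters expose `requires_grad` and `numel()`, and optimizers expose `param_groups` as a list of dicts with `"params"` and `"lr"`. Stand-in objects replace real tensors so the example runs without torch.

```python
from types import SimpleNamespace


def get_num_trainable_parameters(parameters) -> int:
    """Count the elements of all parameters that require grad."""
    return sum(p.numel() for p in parameters if p.requires_grad)


def get_learning_rates(optimizer) -> list:
    """Learning rate of each param group; errors if there is no optimizer."""
    if optimizer is None:
        raise ValueError("optimizer is None")
    return [group["lr"] for group in optimizer.param_groups]


def get_optimizer_group(optimizer, param):
    """Return the param group containing `param`, or None if it is in no group."""
    if optimizer is None:
        raise ValueError("optimizer is None")
    for group in optimizer.param_groups:
        # Identity check avoids elementwise tensor comparison.
        if any(p is param for p in group["params"]):
            return group
    return None


# Stand-ins for two layers, the second one frozen.
trainable = SimpleNamespace(requires_grad=True, numel=lambda: 8)
frozen = SimpleNamespace(requires_grad=False, numel=lambda: 4)
optimizer = SimpleNamespace(param_groups=[{"params": [trainable], "lr": 0.01}])

num_trainable = get_num_trainable_parameters([trainable, frozen])
learning_rates = get_learning_rates(optimizer)
group = get_optimizer_group(optimizer, trainable)
```

The frozen stand-in is skipped by the count and belongs to no optimizer group, which mirrors the test setup described above with a second, frozen layer.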
* Safe import of LRScheduler
* Update src/transformers/trainer_pt_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/trainer_pt_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Fix up
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>