transformers/tests/models/mixtral
Khai Mai c5c69096b3
Exclude the load balancing loss of padding tokens in Mixtral-8x7B (#28517)
* fix the function load_balancing_loss_func in the Mixtral MoE model to take attention_mask into account

* format code using black and ruff

* skip computing the mask if attention_mask is None

* add tests for the Mixtral MoE load balancing loss

* fix the assertion that the losses differ in the Mixtral test

* fix pad_len

* use assertNotAlmostEqual and print to debug

* remove debug print

* minor updates

* reduce rtol and atol
2024-01-24 10:12:14 +01:00
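
The change summarized above routes the attention mask into the auxiliary (load-balancing) loss so that padding tokens no longer contribute to the expert-usage statistics. The snippet below is a minimal, illustrative sketch of that idea, not the exact transformers implementation: the function name `load_balancing_loss_sketch`, the single-layer `gate_logits` input, and the tensor shapes are assumptions made here for clarity.

```python
# Simplified sketch of a Switch-Transformers-style load-balancing loss that
# ignores padding tokens via an attention mask. Names and shapes are
# illustrative; they do not reproduce the exact transformers code.
import torch
import torch.nn.functional as F


def load_balancing_loss_sketch(gate_logits, num_experts, top_k=2, attention_mask=None):
    # gate_logits: (batch * seq_len, num_experts) router logits for one MoE layer
    routing_weights = F.softmax(gate_logits, dim=-1)
    _, selected_experts = torch.topk(routing_weights, top_k, dim=-1)
    expert_mask = F.one_hot(selected_experts, num_experts).float()  # (tokens, top_k, experts)

    if attention_mask is None:
        # Without padding information, average over all tokens.
        tokens_per_expert = expert_mask.mean(dim=0)           # (top_k, experts)
        router_prob_per_expert = routing_weights.mean(dim=0)  # (experts,)
    else:
        # attention_mask: (batch, seq_len) with 1 for real tokens, 0 for padding.
        mask = attention_mask.reshape(-1).float()              # (tokens,)
        denom = mask.sum()
        # Weight each token's contribution by the mask so padding tokens drop out.
        tokens_per_expert = (expert_mask * mask[:, None, None]).sum(dim=0) / denom
        router_prob_per_expert = (routing_weights * mask[:, None]).sum(dim=0) / denom

    overall_loss = torch.sum(tokens_per_expert * router_prob_per_expert.unsqueeze(0))
    return overall_loss * num_experts
```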
__init__.py [`Add Mixtral`] Adds support for the Mixtral MoE (#27942) 2023-12-11 12:50:27 +01:00
test_modeling_mixtral.py Exclude the load balancing loss of padding tokens in Mixtral-8x7B (#28517) 2024-01-24 10:12:14 +01:00
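
For context on the test added in test_modeling_mixtral.py, the sketch below shows the kind of check the commit message describes: with random router logits, the auxiliary loss computed with padding masked out should not coincide with the loss computed as if every token were real. It reuses the `load_balancing_loss_sketch` helper defined above; the class and test names are hypothetical and do not mirror the actual library test.

```python
# Hedged sketch of a regression-style test: masking padding tokens should
# change the value of the auxiliary load-balancing loss.
# Assumes load_balancing_loss_sketch (defined in the earlier sketch) is in scope.
import unittest

import torch


class LoadBalancingLossMaskTest(unittest.TestCase):
    def test_padding_changes_aux_loss(self):
        torch.manual_seed(0)
        batch, seq_len, num_experts = 2, 8, 4
        gate_logits = torch.randn(batch * seq_len, num_experts)

        # Mark the second half of each sequence as padding.
        attention_mask = torch.ones(batch, seq_len, dtype=torch.long)
        attention_mask[:, seq_len // 2:] = 0

        loss_unmasked = load_balancing_loss_sketch(gate_logits, num_experts)
        loss_masked = load_balancing_loss_sketch(
            gate_logits, num_experts, attention_mask=attention_mask
        )
        # With random logits the two values should essentially never coincide.
        self.assertNotAlmostEqual(loss_unmasked.item(), loss_masked.item())


if __name__ == "__main__":
    unittest.main()
```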