transformers


Khai Mai c5c69096b3	2024-01-24 10:12:14 +01:00
__init__.py	[`Add Mixtral`] Adds support for the Mixtral MoE (#27942)	2023-12-11 12:50:27 +01:00
test_modeling_mixtral.py	Exclude the load balancing loss of padding tokens in Mixtral-8x7B (#28517)	2024-01-24 10:12:14 +01:00

Exclude the load balancing loss of padding tokens in Mixtral-8x7B (#28517)

* fix the function load_balancing_loss_func in Mixtral_Moe to include attention_mask

* format code using black and ruff

* skip computing mask if attention_mask=None

* add tests for load balancing loss Mixtral-Moe

* fix assert loss is different in mixtral_test

* fix pad_leng

* use assertNotAlmostEqual and print to debug

* remove print for debug

* minor updates

* reduce rtol and atol

2024-01-24 10:12:14 +01:00
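The idea behind this commit (masking padding tokens out of the MoE load-balancing loss) can be illustrated with a minimal pure-Python sketch. It assumes a Switch-Transformer-style auxiliary loss (number of experts times the sum, over experts, of the fraction of tokens routed to each expert multiplied by that expert's mean router probability), a flat list of per-token router logits, and an `attention_mask` holding 1 for real tokens and 0 for padding. The names `masked_load_balancing_loss` and `softmax` are hypothetical; this is not the transformers `load_balancing_loss_func` implementation.

```python
import math
from typing import List, Optional, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one token's router logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def masked_load_balancing_loss(
    router_logits: Sequence[Sequence[float]],
    top_k: int,
    attention_mask: Optional[Sequence[int]] = None,
) -> float:
    """Hypothetical Switch-style auxiliary loss that ignores padding tokens.

    router_logits: one list of per-expert logits per token (assumed layout).
    attention_mask: 1 for real tokens, 0 for padding; None keeps every token.
    """
    num_experts = len(router_logits[0])
    if attention_mask is None:
        # No mask given: skip masking and count every token.
        attention_mask = [1] * len(router_logits)
    # Drop padding tokens before computing any routing statistics.
    kept = [t for t, m in zip(router_logits, attention_mask) if m]
    n = len(kept)
    if n == 0:
        return 0.0

    tokens_per_expert = [0.0] * num_experts  # fraction of kept tokens routed to each expert
    prob_per_expert = [0.0] * num_experts  # mean router probability per expert
    for logits in kept:
        probs = softmax(logits)
        # The top_k experts this token is dispatched to (ties keep expert order).
        chosen = sorted(range(num_experts), key=lambda e: probs[e], reverse=True)[:top_k]
        for e in chosen:
            tokens_per_expert[e] += 1.0 / n
        for e in range(num_experts):
            prob_per_expert[e] += probs[e] / n

    return num_experts * sum(f * p for f, p in zip(tokens_per_expert, prob_per_expert))


# Usage: the second token is padding, so it is excluded from the loss.
logits = [[2.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 5.0]]
print(masked_load_balancing_loss(logits, top_k=2, attention_mask=[1, 0]))
```

With a mask, padding tokens contribute neither to the routed-token fractions nor to the mean router probabilities, so the loss depends only on real tokens; with `attention_mask=None` every token is counted.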
