![]() * fix the function load_balancing_loss_func in Mixtral_Moe to include attention_mask * format code using black and ruff * skip computing mask if attention_mask=None * add tests for load balancing loss Mixtral-Moe * fix assert loss is different in mixtral_test * fix pad_leng * use assertNotAlmostEqual and print to debug * remove print for debug * minor updates * reduce rtol and atol |
||
---|---|---|
.. | ||
__init__.py | ||
test_modeling_mixtral.py |