transformers

Aritra Roy Gosthipaty 965e98dc54 [Port] TensorFlow implementation of Mistral (#29708)	2024-05-23 17:48:49 +01:00

* chore: initial commit
* chore: adding imports and inits
* chore: adding the causal and classification code
* chore: adding names to the layers
* chore: using single self attn layer
* chore: built the model and layers
* chore: start with testing
* chore: docstring change, transpose fix
* fix: rotary embedding
* chore: adding cache implementation
* remove unused torch
* chore: fixing the indexing issue
* make fix-copies
* Use modeling_tf_utils.keras
* make fixup
* chore: fixing tests
* chore: adding past key value logic
* chore: adding multi label classfication test
* fix: switching on the built parameters in the layers
* fixing repo consistency
* ruff formats
* style changes
* fix: tf and pt equivalence
* removing returns from docstrings
* fix docstrings
* fix docstrings
* removing todos
* fix copies
* fix docstring
* fix docstring
* chore: using easier rotate_half
* adding integration tests
* chore: addressing review related to rotary embedding layer
* review changes
* [run-slow] mistral
* skip: test save load after resize token embedding
* style

---------

Co-authored-by: Matt <rocketknight1@gmail.com>

__init__.py	[Mistral] Mistral-7B-v0.1 support (#26447)	2023-09-27 18:30:46 +02:00
test_modeling_flax_mistral.py	Flax mistral (#26943)	2024-01-31 14:19:02 +01:00
test_modeling_mistral.py	CI: AMD MI300 tests fix (#30797)	2024-05-21 12:46:07 +01:00
test_modeling_tf_mistral.py	[Port] TensorFlow implementation of Mistral (#29708)	2024-05-23 17:48:49 +01:00