* add test_vocab_size for sentencepiece tok.
* add test_get_vocab for sentencepiece tok.
* add test_convert_token_and_id for sentencepiece tok.
* add test_tokenize_and_convert_tokens_to_string for all tok.
* improve test_tokenize_and_convert_tokens_to_string for sp. tok.
* add common tokenizer integration tests
- for albert
- for barthez
* add tokenizer integration tests to bert gen.
* add most tokenizer integration tests
* fix camembert tokenizer integration test
* add tokenizer integration test to marian
* add tokenizer integration test to reformer
* add typing and doc to tokenizer_integration_test_util
* fix tokenizer integration test of reformer
* improve test_sentencepiece_tokenize_and_convert_tokens_to_string
* empty commit to trigger CI
* fix tokenizer integration test of reformer
* remove code not needed anymore
* empty commit to trigger CI
* empty commit to trigger CI
* improve slow class tok usage at xlm rob
* add subword regularization for barthez
* improve barthez tok. test
* fix tokenizer tests
* add subword regularization for camembert
* add subword regularization for deberta v2 tokenizer
* add more doc to deberta v2 tokenizer
* add subword regularization for speech to text tok.
* fix sp_model_kwargs type in speech 2 text tok.
* add subword regularization for M2M100 tok.
* add more concrete type hints
* fix tests for m2m100 and s2t tok.
* add missing Any import
* fix syntax error in m2m100 tok.
* fix unpickle of m2m100 and s2t tok.
* fix test of m2m100 and s2t tok.
* improve unpickle of deberta v2 tok.
* add test for pickle of barthez & camembert
* fix pickle of barthez & camembert
* add test for deberta v2 tok. pickle
* fix m2m100 tok. pickle
* fix s2t tok. pickle
* add subword regularization to albert tok.
* refactor subword reg. test into TokenizerTesterMixin
improve albert tok. test
remove sample argument form albert tok.
check subword reg. using TokenizerTesterMixin
improve tok. tests
improve xlm roberta tok. tests
improve xlm roberta tok. tests
* add subword regularization for big bird t.
* improve xlm roberta tok. test
* add subword regularization for mbart50 tok.
* add subword regularization for pegasus tok.
* add subword regularization for reformer tok.
* add subword regularization for T5 tok.
* fix t5 tok. test formatting
* add subword regularization for xlm_proph. tok.
* add subword regularization for xlnet tok.
* add subword regularization for gert_gen tok.
* add typing to tokenizers
* add typing to xlm rob. tok
* add subword regularization for marian tok.
* add reverse tok. test
* fix marian tok test
* fix marian tok test
* fix casing in tok. tests
* fix style of tok. common test
* fix deberta v2 tok test
* add type annotations to tok. tests
* add type annotations to tok. __init__
* add typing to kokenizer
* add type annotations to tok. __init__
* don't specify the default when it's None
* fix barthez tok. doc
* move sentencepiece tok. tests to TokenizerTesterMixin
* fix unused imports
* fix albert tok. test
* add comment to sentencepiece test options
* fix Any import at big bird tok.
* fix Any import at xlm prophetnet tok.
* empty commit to trigger CI
* [WIP] Enabling multilingual models for translation pipelines.
* decoder_input_ids -> forced_bos_token_id
* Improve docstring.
* Rebase
* Fixing 2 bugs
- Type token_ids coming from `_parse_and_tokenize`
- Wrong index from tgt_lang.
* Fixing black version.
* Adding tests for _build_translation_inputs and add them for all
tokenizers.
* Mbart actually puts the lang code at the end.
* Fixing m2m100.
* Adding TF support to `deep_round`.
* Update src/transformers/pipelines/text2text_generation.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Adding one line comment.
* Fixing M2M100 `_build_translation_input_ids`, and fix the call site.
* Fixing tests + deep_round -> nested_simplify
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>