transformers/scripts/tatoeba
Julien Chaumond 9129fd0377
`transformers-cli login` => `huggingface-cli login` (#18490)
2022-08-06 09:42:55 +02:00

README.md

Set up transformers following the instructions in README.md (I would fork first).

git clone git@github.com:huggingface/transformers.git
cd transformers
pip install -e .
pip install pandas GitPython wget

Get required metadata

curl https://cdn-datasets.huggingface.co/language_codes/language-codes-3b2.csv > language-codes-3b2.csv
curl https://cdn-datasets.huggingface.co/language_codes/iso-639-3.csv > iso-639-3.csv
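
Before moving on, it can help to confirm that both downloads actually landed, since a failed curl with a redirect can leave an empty file behind. The following is a minimal sketch, not part of the repository; the helper name `missing_metadata` is hypothetical, and the file names are taken from the curl commands above.

```python
from pathlib import Path

# File names taken from the curl commands above.
REQUIRED_METADATA = ("language-codes-3b2.csv", "iso-639-3.csv")


def missing_metadata(directory="."):
    """Return the required metadata files that are absent or empty in `directory`."""
    base = Path(directory)
    missing = []
    for name in REQUIRED_METADATA:
        path = base / name
        # A failed download can leave a zero-byte file behind, so check size too.
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing


if __name__ == "__main__":
    problems = missing_metadata()
    if problems:
        print("Missing or empty metadata files:", ", ".join(problems))
    else:
        print("All metadata files are present.")
```

Run it from the directory where the CSV files were saved; it only checks presence and size, not the contents of the files.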

Clone the Tatoeba-Challenge repo inside the transformers directory

git clone git@github.com:Helsinki-NLP/Tatoeba-Challenge.git

To convert a few models, call the conversion script from the command line:

python src/transformers/models/marian/convert_marian_tatoeba_to_pytorch.py --models heb-eng eng-heb --save_dir converted

To convert lots of models, pass your list of Tatoeba model names to resolver.convert_models from a Python session or script.

from transformers.models.marian.convert_marian_tatoeba_to_pytorch import TatoebaConverter
resolver = TatoebaConverter(save_dir='converted')
resolver.convert_models(['heb-eng', 'eng-heb'])
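
For a long list, one option is to keep the model names in a text file and feed them to the converter. This is only a sketch under stated assumptions: the helpers `read_model_names` and `convert_from_file` and the one-name-per-line file layout are hypothetical, and the import path is assumed from the script location used in the command above. It is meant to be run from the transformers root with Tatoeba-Challenge cloned as described.

```python
from pathlib import Path


def read_model_names(list_file):
    """Read model names from a text file, one per line.

    Blank lines and lines starting with '#' are skipped; duplicates are
    dropped while keeping the first-seen order.
    """
    names = []
    seen = set()
    for line in Path(list_file).read_text(encoding="utf-8").splitlines():
        name = line.strip()
        if not name or name.startswith("#") or name in seen:
            continue
        seen.add(name)
        names.append(name)
    return names


def convert_from_file(list_file, save_dir="converted"):
    """Convert every model listed in `list_file` (hypothetical helper)."""
    # Imported lazily so the list can be parsed without transformers installed.
    # Module path assumed from the script location used on the command line.
    from transformers.models.marian.convert_marian_tatoeba_to_pytorch import TatoebaConverter

    resolver = TatoebaConverter(save_dir=save_dir)
    resolver.convert_models(read_model_names(list_file))
```

With a file such as models.txt (a hypothetical name) containing heb-eng and eng-heb on separate lines, convert_from_file("models.txt") would make the same convert_models call as the snippet above.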

Upload converted models

Since version v3.5.0, the model sharing workflow has switched to a git-based system. Refer to the model sharing doc for more details.

To upload all converted models:

  1. Install git-lfs.

  2. Log in with huggingface-cli:

huggingface-cli login

  3. Run the upload_models script:

./scripts/tatoeba/upload_models.sh
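
Since the upload depends on external tools, a quick check that they are on PATH can save a failed run partway through. The sketch below is an illustration, not part of the repository; the helper name `missing_tools` is hypothetical, and the executable names git, git-lfs and huggingface-cli are assumed to be the ones the steps above rely on.

```python
import shutil


def missing_tools(tools=("git", "git-lfs", "huggingface-cli")):
    """Return the executables from `tools` that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    absent = missing_tools()
    if absent:
        print("Not found on PATH:", ", ".join(absent))
    else:
        print("All required tools were found.")
```

This only confirms that the executables exist; it does not verify that you are logged in.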

Modifications

  • To change the naming logic, change the code near os.rename. The model card creation code may also need to change.
  • To change the model card content, modify TatoebaConverter.write_model_card.