ukr-roberta-base
Pre-training corpora
Below is the list of corpora used, along with the output of the `wc` command (counting lines, words, and characters). These corpora were concatenated and tokenized with the HuggingFace RoBERTa tokenizer.
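As a minimal sketch of that tokenization step, the released tokenizer can be applied to one line of the concatenated corpus as shown below. The Hub repository id and the example sentence are assumptions for illustration, not details taken from this card.

```python
from transformers import RobertaTokenizerFast

# Load the released tokenizer; the repository id below is an assumption —
# replace it with the actual Hub id of this model.
tokenizer = RobertaTokenizerFast.from_pretrained("youscan/ukr-roberta-base")

# Tokenize one line of the concatenated corpus the same way the
# pre-training data was prepared (byte-level BPE, RoBERTa-style).
line = "Приклад речення українською мовою."
encoding = tokenizer(line, truncation=True, max_length=512)

print(encoding.tokens())      # subword pieces
print(encoding["input_ids"])  # corresponding vocabulary ids
```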
Pre-training details
- Ukrainian RoBERTa was trained with the code provided in the HuggingFace tutorial
- The currently released model follows the roberta-base-cased architecture (12 layers, 768 hidden size, 12 attention heads, 125M parameters)
- The model was trained on 4×V100 GPUs for 85 hours
- The training configuration can be found in the original repository; a tutorial-style training sketch is shown after this list
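To make the tutorial-style setup concrete, below is a minimal sketch based on the HuggingFace language-modeling tutorial. The file paths, vocabulary size, batch size, and other hyperparameters are illustrative placeholders, not the actual training configuration, which lives in the original repository.

```python
from transformers import (
    RobertaConfig,
    RobertaForMaskedLM,
    RobertaTokenizerFast,
    DataCollatorForLanguageModeling,
    LineByLineTextDataset,
    Trainer,
    TrainingArguments,
)

# roberta-base-sized architecture: 12 layers, 768 hidden size, 12 attention heads.
config = RobertaConfig(
    vocab_size=52_000,            # placeholder; use the size of the trained tokenizer
    max_position_embeddings=514,
    hidden_size=768,
    num_hidden_layers=12,
    num_attention_heads=12,
    type_vocab_size=1,
)

# Placeholder path to a locally trained tokenizer.
tokenizer = RobertaTokenizerFast.from_pretrained("./ukr-roberta-tokenizer", model_max_length=512)
model = RobertaForMaskedLM(config=config)

# Placeholder path to the concatenated corpus file.
dataset = LineByLineTextDataset(
    tokenizer=tokenizer,
    file_path="./uk_corpus_concatenated.txt",
    block_size=128,
)
data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.15
)

# Illustrative arguments only; the real configuration is in the original repository.
training_args = TrainingArguments(
    output_dir="./ukr-roberta-base",
    num_train_epochs=1,
    per_device_train_batch_size=64,
    save_steps=10_000,
)

trainer = Trainer(
    model=model,
    args=training_args,
    data_collator=data_collator,
    train_dataset=dataset,
)
trainer.train()
```

The sketch above runs on a single device; the multi-GPU launch used for the actual 4×V100 run is omitted here.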
Author
Vitalii Radchenko - contact me on Twitter @vitaliradchenko