transformers/model_cards/urduhack/roberta-urdu-small
Ikram Ali 98ee802023
[model_cards] Add model cards for Urduhack model (roberta-urdu-small) (#6536)
* [model_cards] roberta-urdu-small added.

* [model_cards] typo fixed.

* Tweak license format (yaml expects a simple string)

Co-authored-by: Ikram Ali <mrikram1989>
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-08-17 16:04:29 -04:00

README.md

language: ur
thumbnail: https://raw.githubusercontent.com/urduhack/urduhack/master/docs/_static/urduhack.png
tags: roberta-urdu-small, urdu, transformers
license: mit

roberta-urdu-small

License: MIT

Overview

Language model: roberta-urdu-small
Model size: 125M
Language: Urdu
Training data: News data from Urdu news sources in Pakistan

About roberta-urdu-small

roberta-urdu-small is a masked language model for the Urdu language. It can be loaded through the transformers fill-mask pipeline:

from transformers import pipeline
fill_mask = pipeline("fill-mask", model="urduhack/roberta-urdu-small", tokenizer="urduhack/roberta-urdu-small")
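Once the pipeline is built, it can be called on a sentence containing the tokenizer's mask token. The sketch below is one possible way to do that, not an official usage recipe: the helper names `build_masked_text`, `top_tokens`, and `main` are hypothetical, and the example sentence ("Pakistan is a beautiful country") is an assumption chosen for illustration. It relies on the fill-mask pipeline returning a list of dicts with `score` and `token_str` keys.

```python
def build_masked_text(words, mask_index, mask_token):
    """Replace the word at mask_index with the tokenizer's mask token."""
    masked = list(words)
    masked[mask_index] = mask_token
    return " ".join(masked)


def top_tokens(predictions, k=3):
    """Return the k highest-scoring (token_str, score) pairs from fill-mask output."""
    ranked = sorted(predictions, key=lambda p: p["score"], reverse=True)
    return [(p["token_str"].strip(), round(p["score"], 4)) for p in ranked[:k]]


def main():
    # Imported here so the helpers above work without transformers installed.
    from transformers import pipeline

    fill_mask = pipeline(
        "fill-mask",
        model="urduhack/roberta-urdu-small",
        tokenizer="urduhack/roberta-urdu-small",
    )
    # Illustrative sentence (an assumption): "Pakistan is a beautiful country."
    words = ["پاکستان", "ایک", "خوبصورت", "ملک", "ہے"]
    # Read the mask token from the tokenizer instead of hard-coding it.
    text = build_masked_text(words, 3, fill_mask.tokenizer.mask_token)
    for token, score in top_tokens(fill_mask(text)):
        print(token, score)
```

Calling `main()` downloads the model on first use and prints the top candidate tokens for the masked position with their scores.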

Training procedure

roberta-urdu-small was trained on an Urdu news corpus. The training data was normalized with the normalization module from urduhack to eliminate characters from other languages, such as Arabic.
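To give a sense of what this kind of character normalization involves, here is a minimal standard-library sketch. It is not urduhack's implementation: the function name `normalize_characters_sketch` and the three-entry mapping are assumptions for illustration, covering only a few Arabic code points that are commonly replaced with their Urdu counterparts.

```python
# Illustrative mapping only (an assumption); urduhack's real table is larger.
ARABIC_TO_URDU = str.maketrans({
    "\u0643": "\u06a9",  # ARABIC LETTER KAF -> ARABIC LETTER KEHEH
    "\u064a": "\u06cc",  # ARABIC LETTER YEH -> ARABIC LETTER FARSI YEH
    "\u0647": "\u06c1",  # ARABIC LETTER HEH -> ARABIC LETTER HEH GOAL
})


def normalize_characters_sketch(text):
    """Replace selected Arabic code points with their Urdu equivalents."""
    return text.translate(ARABIC_TO_URDU)
```

Applying the same mapping to input text at inference time keeps it consistent with the normalized training data.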

About Urduhack

Urduhack is a Natural Language Processing (NLP) library for the Urdu language. GitHub: https://github.com/urduhack/urduhack