* [model_cards] roberta-urdu-small added.
* [model_cards] typo fixed.
* Tweak license format (yaml expects a simple string)

Co-authored-by: Ikram Ali <mrikram1989>
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
README.md
language | thumbnail | tags | license
---|---|---|---
ur | https://raw.githubusercontent.com/urduhack/urduhack/master/docs/_static/urduhack.png | | mit
roberta-urdu-small
Overview
Language model: roberta-urdu-small
Model size: 125M
Language: Urdu
Training data: News data from Urdu news sources in Pakistan
About roberta-urdu-small
roberta-urdu-small is a language model for the Urdu language.
from transformers import pipeline
fill_mask = pipeline("fill-mask", model="urduhack/roberta-urdu-small", tokenizer="urduhack/roberta-urdu-small")
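A minimal sketch of querying this pipeline, assuming a recent transformers release in which the fill-mask pipeline accepts `top_k` at call time and returns dicts with `token_str` and `score` keys. The helper `top_tokens`, the function `demo`, and the Urdu example sentence are illustrative and not part of the original model card.

```python
def top_tokens(fill_mask, template, k=3):
    """Fill the {mask} slot in `template` and return (token, score) pairs.

    `fill_mask` is expected to behave like a transformers fill-mask pipeline.
    """
    # Use the tokenizer's own mask token instead of hard-coding "<mask>".
    text = template.format(mask=fill_mask.tokenizer.mask_token)
    results = fill_mask(text, top_k=k)
    return [(r["token_str"], r["score"]) for r in results]


def demo():
    """Download the model and print the top predictions for one sentence."""
    from transformers import pipeline  # imported here so the helper stays lightweight

    fill_mask = pipeline(
        "fill-mask",
        model="urduhack/roberta-urdu-small",
        tokenizer="urduhack/roberta-urdu-small",
    )
    # Illustrative sentence: "Karachi is Pakistan's largest {mask}."
    template = "کراچی پاکستان کا سب سے بڑا {mask} ہے۔"
    for token, score in top_tokens(fill_mask, template, k=3):
        print(f"{token!r}\t{score:.4f}")
```

Calling `demo()` downloads the model on first use and prints one candidate token per line with its score.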
Training procedure
roberta-urdu-small was trained on an Urdu news corpus. The training data was normalized with the normalization module from urduhack to eliminate characters from other languages, such as Arabic.
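To illustrate the kind of character normalization described above, here is a small self-contained sketch. It is not urduhack's implementation: the table `ARABIC_TO_URDU` and the function `normalize_urdu_chars` are hypothetical, and the three mappings are only a sample of the Arabic code points that commonly get replaced by their Urdu counterparts.

```python
# Hypothetical sample mapping from Arabic code points to the Urdu letters
# that usually replace them. The real urduhack module covers far more cases.
ARABIC_TO_URDU = {
    "\u0643": "\u06a9",  # ARABIC LETTER KAF  -> ARABIC LETTER KEHEH
    "\u064a": "\u06cc",  # ARABIC LETTER YEH  -> ARABIC LETTER FARSI YEH
    "\u0647": "\u06c1",  # ARABIC LETTER HEH  -> ARABIC LETTER HEH GOAL
}

# str.translate expects a table keyed by code point.
_TRANSLATION_TABLE = str.maketrans(ARABIC_TO_URDU)


def normalize_urdu_chars(text):
    """Replace the Arabic characters listed above with Urdu equivalents."""
    return text.translate(_TRANSLATION_TABLE)
```

Text that is already written with Urdu code points passes through unchanged, so the function can be applied to a whole corpus without checking each line first.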
About Urduhack
Urduhack is a Natural Language Processing (NLP) library for the Urdu language.
GitHub: https://github.com/urduhack/urduhack