Create camembert-base-README.md

Benjamin Muller 2020-03-13 20:08:53 +08:00 committed by Julien Chaumond
parent afea70c01c
commit cc4c37952a

# CamemBERT
CamemBERT is a state-of-the-art language model for French. It is based on the RoBERTa architecture and pretrained on the French subcorpus of OSCAR, a newly available multilingual corpus.

CamemBERT was originally evaluated on four downstream tasks for French: part-of-speech (POS) tagging, dependency parsing, named entity recognition (NER) and natural language inference (NLI). It improved the state of the art on most of these tasks over previous monolingual and multilingual approaches, which confirms the effectiveness of large pretrained language models for French.

CamemBERT was trained and evaluated by Louis Martin, Benjamin Muller, Pedro Javier Ortiz Suárez, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah and Benoît Sagot.

The preprint is available on arXiv: [CamemBERT: a Tasty French Language Model](https://arxiv.org/abs/1911.03894).
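
A minimal usage sketch, not an official example from the authors. It assumes that the Hugging Face `transformers` library is installed with a PyTorch backend, and that the checkpoint can be downloaded from the Hub under the id `camembert-base`, the id suggested by this README's filename. The sketch queries the masked-language-modeling head that CamemBERT was pretrained with, through the generic `fill-mask` pipeline.

```python
# Minimal sketch: query CamemBERT's masked-LM head via the transformers fill-mask pipeline.
# Assumptions: `transformers` + PyTorch are installed and the Hub id "camembert-base" resolves.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="camembert-base", tokenizer="camembert-base")

# Read the mask token from the tokenizer instead of hard-coding it.
mask = fill_mask.tokenizer.mask_token
results = fill_mask(f"Le camembert est {mask} :)", top_k=5)

for prediction in results:
    # Each prediction is a dict holding the completed sentence and its probability score.
    print(f"{prediction['score']:.3f}  {prediction['sequence']}")
```

The pipeline hides tokenization and decoding. To get contextual embeddings for a downstream task such as POS tagging or NER, you would load the encoder itself, for example with `AutoModel.from_pretrained("camembert-base")`, and fine-tune it.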