<!--Copyright 2022 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.

⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.

-->
# XLM-RoBERTa-XL

## Overview

The XLM-RoBERTa-XL model was proposed in [Larger-Scale Transformers for Multilingual Masked Language Modeling](https://arxiv.org/abs/2105.00572) by Naman Goyal, Jingfei Du, Myle Ott, Giri Anantharaman, Alexis Conneau.

The abstract from the paper is the following:

*Recent work has demonstrated the effectiveness of cross-lingual language model pretraining for cross-lingual understanding. In this study, we present the results of two larger multilingual masked language models, with 3.5B and 10.7B parameters. Our two new models dubbed XLM-R XL and XLM-R XXL outperform XLM-R by 1.8% and 2.4% average accuracy on XNLI. Our model also outperforms the RoBERTa-Large model on several English tasks of the GLUE benchmark by 0.3% on average while handling 99 more languages. This suggests pretrained models with larger capacity may obtain both strong performance on high-resource languages while greatly improving low-resource languages. We make our code and models publicly available.*

This model was contributed by [Soonhwan-Kwon](https://github.com/Soonhwan-Kwon) and [stefan-it](https://huggingface.co/stefan-it). The original code can be found [here](https://github.com/pytorch/fairseq/tree/master/examples/xlmr).

## Usage tips

XLM-RoBERTa-XL is a multilingual model trained on 100 different languages. Unlike some XLM multilingual models, it does not require `lang` tensors to understand which language is used, and should be able to determine the correct language from the input ids.
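For example, a masked token can be filled without passing any language identifier. The snippet below is only a minimal sketch; it assumes the `facebook/xlm-roberta-xl` checkpoint name on the Hub (the 10.7B XXL variant would be loaded the same way under its own checkpoint name).

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

# Minimal sketch: the checkpoint name below is an assumption, adjust it to the one you use.
# No `lang` tensor is passed; the language is inferred from the input ids alone.
tokenizer = AutoTokenizer.from_pretrained("facebook/xlm-roberta-xl")
model = AutoModelForMaskedLM.from_pretrained("facebook/xlm-roberta-xl")

# A French sentence with a single masked token (XLM-RoBERTa models use <mask>).
inputs = tokenizer("Bonjour, je suis un modèle <mask>.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Pick the highest-scoring token at the masked position and decode it.
mask_positions = (inputs.input_ids == tokenizer.mask_token_id)[0].nonzero(as_tuple=True)[0]
predicted_ids = logits[0, mask_positions].argmax(dim=-1)
print(tokenizer.decode(predicted_ids))
```

The task-specific heads documented below (`XLMRobertaXLForSequenceClassification`, `XLMRobertaXLForTokenClassification`, and so on) follow the same loading pattern through their corresponding auto classes.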
## Resources

- [Text classification task guide](../tasks/sequence_classification)
- [Token classification task guide](../tasks/token_classification)
- [Question answering task guide](../tasks/question_answering)
- [Causal language modeling task guide](../tasks/language_modeling)
- [Masked language modeling task guide](../tasks/masked_language_modeling)
- [Multiple choice task guide](../tasks/multiple_choice)

## XLMRobertaXLConfig

[[autodoc]] XLMRobertaXLConfig

## XLMRobertaXLModel

[[autodoc]] XLMRobertaXLModel
    - forward

## XLMRobertaXLForCausalLM

[[autodoc]] XLMRobertaXLForCausalLM
    - forward

## XLMRobertaXLForMaskedLM

[[autodoc]] XLMRobertaXLForMaskedLM
    - forward

## XLMRobertaXLForSequenceClassification

[[autodoc]] XLMRobertaXLForSequenceClassification
    - forward

## XLMRobertaXLForMultipleChoice

[[autodoc]] XLMRobertaXLForMultipleChoice
    - forward

## XLMRobertaXLForTokenClassification

[[autodoc]] XLMRobertaXLForTokenClassification
    - forward

## XLMRobertaXLForQuestionAnswering

[[autodoc]] XLMRobertaXLForQuestionAnswering
    - forward