<!--Copyright 2020 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.

⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.

-->

# RoBERTa

<div class="flex flex-wrap space-x-1">
<a href="https://huggingface.co/models?filter=roberta">
<img alt="Models" src="https://img.shields.io/badge/All_model_pages-roberta-blueviolet">
</a>
<a href="https://huggingface.co/spaces/docs-demos/roberta-base">
<img alt="Spaces" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue">
</a>
<a href="https://huggingface.co/papers/1907.11692">
<img alt="Paper page" src="https://img.shields.io/badge/Paper%20page-1907.11692-green">
</a>
</div>

## Overview

The RoBERTa model was proposed in [RoBERTa: A Robustly Optimized BERT Pretraining Approach](https://arxiv.org/abs/1907.11692) by Yinhan Liu, [Myle Ott](https://huggingface.co/myleott), Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer
Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov. It is based on Google's BERT model released in 2018.

It builds on BERT and modifies key hyperparameters, removing the next-sentence pretraining objective and training with
much larger mini-batches and learning rates.

The abstract from the paper is the following:

*Language model pretraining has led to significant performance gains but careful comparison between different
approaches is challenging. Training is computationally expensive, often done on private datasets of different sizes,
and, as we will show, hyperparameter choices have significant impact on the final results. We present a replication
study of BERT pretraining (Devlin et al., 2019) that carefully measures the impact of many key hyperparameters and
training data size. We find that BERT was significantly undertrained, and can match or exceed the performance of every
model published after it. Our best model achieves state-of-the-art results on GLUE, RACE and SQuAD. These results
highlight the importance of previously overlooked design choices, and raise questions about the source of recently
reported improvements. We release our models and code.*

This model was contributed by [julien-c](https://huggingface.co/julien-c). The original code can be found [here](https://github.com/pytorch/fairseq/tree/master/examples/roberta).

## Usage tips

- This implementation is the same as [`BertModel`] with a tiny embeddings tweak as well as a setup
  for RoBERTa pretrained models.
- RoBERTa has the same architecture as BERT, but uses a byte-level BPE as a tokenizer (same as GPT-2) and uses a
  different pretraining scheme.
- RoBERTa doesn't have `token_type_ids`, so you don't need to indicate which token belongs to which segment. Just
  separate your segments with the separation token `tokenizer.sep_token` (or `</s>`); see the sketch after this list.
- Same as BERT with better pretraining tricks:

    * dynamic masking: tokens are masked differently at each epoch, whereas BERT does it once and for all
    * sentence packing: several sentences are packed together to reach 512 tokens (so the sentences are in an order that may span several documents)
    * train with larger batches
    * use BPE with bytes as a subunit and not characters (because of unicode characters)
- [CamemBERT](camembert) is a wrapper around RoBERTa. Refer to this page for usage examples.
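
A minimal sketch of the `token_type_ids` behavior described above, assuming the `roberta-base` checkpoint (any RoBERTa checkpoint behaves the same way): the tokenizer joins a pair of segments with `</s></s>` and returns no `token_type_ids`.

```python
from transformers import AutoTokenizer

# Any RoBERTa checkpoint works here; roberta-base is just the smallest common one
tokenizer = AutoTokenizer.from_pretrained("roberta-base")

# Pass two segments: RoBERTa separates them with </s></s> instead of token_type_ids
encoding = tokenizer("Segment one.", "Segment two.")

print(tokenizer.decode(encoding["input_ids"]))
# roughly: <s>Segment one.</s></s>Segment two.</s>
print("token_type_ids" in encoding)
# False
```
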
## Resources

A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with RoBERTa. If you're interested in submitting a resource to be included here, please feel free to open a Pull Request and we'll review it! The resource should ideally demonstrate something new instead of duplicating an existing resource.

<PipelineTag pipeline="text-classification"/>

- A blog on [Getting Started with Sentiment Analysis on Twitter](https://huggingface.co/blog/sentiment-analysis-twitter) using RoBERTa and the [Inference API](https://huggingface.co/inference-api).
- A blog on [Opinion Classification with Kili and Hugging Face AutoTrain](https://huggingface.co/blog/opinion-classification-with-kili) using RoBERTa.
- A notebook on how to [finetune RoBERTa for sentiment analysis](https://colab.research.google.com/github/DhavalTaunk08/NLP_scripts/blob/master/sentiment_analysis_using_roberta.ipynb). 🌎
- [`RobertaForSequenceClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/text-classification) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/text_classification.ipynb).
- [`TFRobertaForSequenceClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/tensorflow/text-classification) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/text_classification-tf.ipynb).
- [`FlaxRobertaForSequenceClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/flax/text-classification) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/text_classification_flax.ipynb).
- [Text classification task guide](../tasks/sequence_classification)
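
As a quick sanity check, a sequence classification sketch with the [`pipeline`] API; the checkpoint below is one example of a RoBERTa model fine-tuned for sentiment analysis, and any fine-tuned RoBERTa sequence classification checkpoint can be substituted.

```python
from transformers import pipeline

# Example checkpoint: a RoBERTa model fine-tuned for Twitter sentiment analysis
classifier = pipeline("text-classification", model="cardiffnlp/twitter-roberta-base-sentiment-latest")

print(classifier("I love the new RoBERTa docs!"))
# e.g. [{'label': 'positive', 'score': 0.98}]
```
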
<PipelineTag pipeline="token-classification"/>
|
|
|
|
- [`RobertaForTokenClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/token-classification) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/token_classification.ipynb).
|
|
- [`TFRobertaForTokenClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/tensorflow/token-classification) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/token_classification-tf.ipynb).
|
|
- [`FlaxRobertaForTokenClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/flax/token-classification).
|
|
- [Token classification](https://huggingface.co/course/chapter7/2?fw=pt) chapter of the 🤗 Hugging Face Course.
|
|
- [Token classification task guide](../tasks/token_classification)
|
|
|
|
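
A token classification sketch; `"path/to/roberta-finetuned-ner"` below is a placeholder rather than a real model id, so substitute your own RoBERTa checkpoint fine-tuned for named entity recognition.

```python
from transformers import pipeline

# Placeholder model id: replace with a RoBERTa checkpoint fine-tuned for NER
ner = pipeline("token-classification", model="path/to/roberta-finetuned-ner", aggregation_strategy="simple")

print(ner("Hugging Face is based in New York City."))
# e.g. [{'entity_group': 'ORG', 'word': 'Hugging Face', ...}, {'entity_group': 'LOC', 'word': 'New York City', ...}]
```
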
<PipelineTag pipeline="fill-mask"/>
|
|
|
|
- A blog on [How to train a new language model from scratch using Transformers and Tokenizers](https://huggingface.co/blog/how-to-train) with RoBERTa.
|
|
- [`RobertaForMaskedLM`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/language-modeling#robertabertdistilbert-and-masked-language-modeling) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/language_modeling.ipynb).
|
|
- [`TFRobertaForMaskedLM`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/tensorflow/language-modeling#run_mlmpy) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/language_modeling-tf.ipynb).
|
|
- [`FlaxRobertaForMaskedLM`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/flax/language-modeling#masked-language-modeling) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/masked_language_modeling_flax.ipynb).
|
|
- [Masked language modeling](https://huggingface.co/course/chapter7/3?fw=pt) chapter of the 🤗 Hugging Face Course.
|
|
- [Masked language modeling task guide](../tasks/masked_language_modeling)
|
|
|
|
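
Masked language modeling works out of the box with the pretrained checkpoint; note that RoBERTa's mask token is `<mask>`, not `[MASK]` as in BERT. A minimal sketch:

```python
from transformers import pipeline

# The pretrained roberta-base checkpoint already has a masked language modeling head
unmasker = pipeline("fill-mask", model="roberta-base")

# Use <mask> (RoBERTa's mask token) in the input text
print(unmasker("The goal of life is <mask>."))
# e.g. [{'token_str': ' happiness', 'score': ...}, ...]
```
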
<PipelineTag pipeline="question-answering"/>

- A blog on [Accelerated Inference with Optimum and Transformers Pipelines](https://huggingface.co/blog/optimum-inference) with RoBERTa for question answering.
- [`RobertaForQuestionAnswering`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/question-answering) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/question_answering.ipynb).
- [`TFRobertaForQuestionAnswering`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/tensorflow/question-answering) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/question_answering-tf.ipynb).
- [`FlaxRobertaForQuestionAnswering`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/flax/question-answering).
- [Question answering](https://huggingface.co/course/chapter7/7?fw=pt) chapter of the 🤗 Hugging Face Course.
- [Question answering task guide](../tasks/question_answering)
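
An extractive question answering sketch; the checkpoint below is one example of a RoBERTa model fine-tuned on SQuAD 2.0, and any fine-tuned RoBERTa question answering checkpoint can be substituted.

```python
from transformers import pipeline

# Example checkpoint: a RoBERTa model fine-tuned on SQuAD 2.0
qa = pipeline("question-answering", model="deepset/roberta-base-squad2")

result = qa(question="Who contributed this model?", context="This model was contributed by julien-c.")
print(result)
# e.g. {'answer': 'julien-c', 'score': ..., 'start': ..., 'end': ...}
```
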
**Multiple choice**

- [`RobertaForMultipleChoice`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/multiple-choice) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/multiple_choice.ipynb).
- [`TFRobertaForMultipleChoice`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/tensorflow/multiple-choice) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/multiple_choice-tf.ipynb).
- [Multiple choice task guide](../tasks/multiple_choice)
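
For a sense of how [`RobertaForMultipleChoice`] is called, here is a minimal PyTorch sketch using the pretrained `roberta-base` backbone; the multiple-choice head is randomly initialized, so real usage would fine-tune it first (for example on SWAG).

```python
import torch
from transformers import AutoTokenizer, RobertaForMultipleChoice

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = RobertaForMultipleChoice.from_pretrained("roberta-base")  # the classification head is untrained here

prompt = "The cat sat on the"
choices = ["mat.", "moon."]

# Encode the prompt against each choice, then add a batch dimension of size 1
inputs = tokenizer([prompt, prompt], choices, return_tensors="pt", padding=True)
inputs = {name: tensor.unsqueeze(0) for name, tensor in inputs.items()}

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (batch_size, num_choices)

print(logits.argmax(dim=-1))
```
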
## RobertaConfig

[[autodoc]] RobertaConfig

## RobertaTokenizer

[[autodoc]] RobertaTokenizer
    - build_inputs_with_special_tokens
    - get_special_tokens_mask
    - create_token_type_ids_from_sequences
    - save_vocabulary

## RobertaTokenizerFast

[[autodoc]] RobertaTokenizerFast
    - build_inputs_with_special_tokens

<frameworkcontent>
<pt>

## RobertaModel

[[autodoc]] RobertaModel
    - forward

## RobertaForCausalLM

[[autodoc]] RobertaForCausalLM
    - forward

## RobertaForMaskedLM

[[autodoc]] RobertaForMaskedLM
    - forward

## RobertaForSequenceClassification

[[autodoc]] RobertaForSequenceClassification
    - forward

## RobertaForMultipleChoice

[[autodoc]] RobertaForMultipleChoice
    - forward

## RobertaForTokenClassification

[[autodoc]] RobertaForTokenClassification
    - forward

## RobertaForQuestionAnswering

[[autodoc]] RobertaForQuestionAnswering
    - forward

</pt>
<tf>

## TFRobertaModel

[[autodoc]] TFRobertaModel
    - call

## TFRobertaForCausalLM

[[autodoc]] TFRobertaForCausalLM
    - call

## TFRobertaForMaskedLM

[[autodoc]] TFRobertaForMaskedLM
    - call

## TFRobertaForSequenceClassification

[[autodoc]] TFRobertaForSequenceClassification
    - call

## TFRobertaForMultipleChoice

[[autodoc]] TFRobertaForMultipleChoice
    - call

## TFRobertaForTokenClassification

[[autodoc]] TFRobertaForTokenClassification
    - call

## TFRobertaForQuestionAnswering

[[autodoc]] TFRobertaForQuestionAnswering
    - call

</tf>
<jax>

## FlaxRobertaModel

[[autodoc]] FlaxRobertaModel
    - __call__

## FlaxRobertaForCausalLM

[[autodoc]] FlaxRobertaForCausalLM
    - __call__

## FlaxRobertaForMaskedLM

[[autodoc]] FlaxRobertaForMaskedLM
    - __call__

## FlaxRobertaForSequenceClassification

[[autodoc]] FlaxRobertaForSequenceClassification
    - __call__

## FlaxRobertaForMultipleChoice

[[autodoc]] FlaxRobertaForMultipleChoice
    - __call__

## FlaxRobertaForTokenClassification

[[autodoc]] FlaxRobertaForTokenClassification
    - __call__

## FlaxRobertaForQuestionAnswering

[[autodoc]] FlaxRobertaForQuestionAnswering
    - __call__

</jax>
</frameworkcontent>