Generation doc (#6470)
* Generation doc

* MBartForConditionalGeneration (#6441)

* add MBartForConditionalGeneration
* style
* rebase and fixes
* add mbart test in TEST_FILES_WITH_NO_COMMON_TESTS
* fix docs
* don't ignore mbart
* doc
* fix mbart fairseq link
* put mbart before bart
* apply doc suggestions

* Use hash to clean the test dirs (#6475)

* Use hash to clean the test dirs
* Use hash to clean the test dirs
* Use hash to clean the test dirs
* fix

* [EncoderDecoder] Add Cross Attention for GPT2 (#6415)

* add cross attention layers for gpt2
* make gpt2 cross attention work
* finish bert2gpt2
* add explicit comments
* remove attention mask since not yet supported
* revert attn mask in pipeline
* Update src/transformers/modeling_gpt2.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/modeling_encoder_decoder.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Sort unique_no_split_tokens to make it deterministic (#6461)

* change unique_no_split_tokens's type to set
* use sorted list instead of set
* style

* Import accuracy_score (#6480)

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Address comments
* Styling
* Generation doc
* Apply suggestions from code review

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Address comments
* Styling

Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: Kevin Canwen Xu <canwenxu@126.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com>
Co-authored-by: gijswijnholds <gijswijnholds@gmail.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
This commit is contained in:
parent b5ba758ba9
commit 895ed8f451
@ -12,7 +12,9 @@ are common among all the models to:
- prune the attention heads of the model.

The other methods that are common to each model are defined in :class:`~transformers.modeling_utils.ModuleUtilsMixin`
(for the PyTorch models) and :class:`~transformers.modeling_tf_utils.TFModuleUtilsMixin` (for the TensorFlow models) or,
for text generation, :class:`~transformers.generation_utils.GenerationMixin` (for the PyTorch models) and
:class:`~transformers.generation_tf_utils.TFGenerationMixin` (for the TensorFlow models).

``PreTrainedModel``
@ -46,4 +48,8 @@ The other methods that are common to each model are defined in :class:`~transfor
Generative models
~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.generation_utils.GenerationMixin
    :members:

.. autoclass:: transformers.generation_tf_utils.TFGenerationMixin
    :members:
@ -91,7 +91,7 @@ class PretrainedConfig(object):
  keep for top-k-filtering that will be used by default in the :obj:`generate` method of the model.
- **top_p** (:obj:`float`, `optional`, defaults to 1) -- Value that will be used by default in the
  :obj:`generate` method of the model for ``top_p``. If set to float < 1, only the most probable tokens
  with probabilities that add up to ``top_p`` or higher are kept for generation.
- **repetition_penalty** (:obj:`float`, `optional`, defaults to 1) -- Parameter for repetition penalty
  that will be used by default in the :obj:`generate` method of the model. 1.0 means no penalty.
- **length_penalty** (:obj:`float`, `optional`, defaults to 1) -- Exponential penalty to the length that
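To make the ``top_p`` semantics above concrete, here is a minimal plain-Python sketch of nucleus filtering. This is an illustration only, not the library's implementation, and the helper name ``top_p_filter`` is made up for this example:

```python
def top_p_filter(probs, top_p):
    """Keep the smallest set of most probable tokens whose cumulative
    probability adds up to top_p or higher, zero out the rest, and
    renormalize the surviving probabilities."""
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    kept, total = set(), 0.0
    for i in order:
        kept.add(i)
        total += probs[i]
        if total >= top_p:  # "add up to ``top_p`` or higher"
            break
    mass = sum(probs[i] for i in kept)
    return [p / mass if i in kept else 0.0 for i, p in enumerate(probs)]
```

With ``probs = [0.5, 0.25, 0.125, 0.125]`` and ``top_p = 0.75``, only the two most probable tokens survive and are renormalized to sum to 1.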
@ -25,10 +25,15 @@ logger = logging.getLogger(__name__)
class TFGenerationMixin:
    """
    A class containing all of the functions supporting generation, to be used as a mixin in
    :class:`~transformers.TFPreTrainedModel`.
    """

    def prepare_inputs_for_generation(self, inputs, **kwargs):
        """
        Implement in subclasses of :class:`~transformers.TFPreTrainedModel` for custom behavior to prepare inputs in
        the generate method.
        """
        return {"inputs": inputs}

    def _use_cache(self, outputs, use_cache):
@ -62,87 +67,83 @@ class TFGenerationMixin:
        decoder_start_token_id=None,
        use_cache=None,
    ):
        r"""
        Generates sequences for models with a language modeling head. The method currently supports greedy decoding,
        beam-search decoding, sampling with temperature, sampling with top-k or nucleus sampling.

        Adapted in part from `Facebook's XLM beam search code
        <https://github.com/facebookresearch/XLM/blob/9e6f6814d17be4fe5b15f2e6c43eb2b2d76daeb4/src/model/transformer.py#L529>`__.

        Apart from :obj:`input_ids` and :obj:`attention_mask`, all the arguments below will default to the value of the
        attribute of the same name inside the :class:`~transformers.PretrainedConfig` of the model. The default values
        indicated are the default values of those configuration attributes.

        Most of these parameters are explained in more detail in `this blog post
        <https://huggingface.co/blog/how-to-generate>`__.

        Parameters:

            input_ids (:obj:`tf.Tensor` of :obj:`dtype=tf.int32` and shape :obj:`(batch_size, sequence_length)`, `optional`):
                The sequence used as a prompt for the generation. If :obj:`None` the method initializes it as an empty
                :obj:`tf.Tensor` of shape :obj:`(1,)`.
            max_length (:obj:`int`, `optional`, defaults to 20):
                The maximum length of the sequence to be generated.
            min_length (:obj:`int`, `optional`, defaults to 10):
                The minimum length of the sequence to be generated.
            do_sample (:obj:`bool`, `optional`, defaults to :obj:`False`):
                Whether or not to use sampling; use greedy decoding otherwise.
            early_stopping (:obj:`bool`, `optional`, defaults to :obj:`False`):
                Whether to stop the beam search when at least ``num_beams`` sentences are finished per batch or not.
            num_beams (:obj:`int`, `optional`, defaults to 1):
                Number of beams for beam search. 1 means no beam search.
            temperature (:obj:`float`, `optional`, defaults to 1.0):
                The value used to modulate the next token probabilities.
            top_k (:obj:`int`, `optional`, defaults to 50):
                The number of highest probability vocabulary tokens to keep for top-k-filtering.
            top_p (:obj:`float`, `optional`, defaults to 1.0):
                If set to float < 1, only the most probable tokens with probabilities that add up to ``top_p`` or
                higher are kept for generation.
            repetition_penalty (:obj:`float`, `optional`, defaults to 1.0):
                The parameter for repetition penalty. 1.0 means no penalty. See `this paper
                <https://arxiv.org/pdf/1909.05858.pdf>`__ for more details.
            pad_token_id (:obj:`int`, `optional`):
                The id of the `padding` token.
            bos_token_id (:obj:`int`, `optional`):
                The id of the `beginning-of-sequence` token.
            eos_token_id (:obj:`int`, `optional`):
                The id of the `end-of-sequence` token.
            length_penalty (:obj:`float`, `optional`, defaults to 1.0):
                Exponential penalty to the length. 1.0 means no penalty.

                Set to values < 1.0 in order to encourage the model to generate shorter sequences, to a value > 1.0 in
                order to encourage the model to produce longer sequences.
            no_repeat_ngram_size (:obj:`int`, `optional`, defaults to 0):
                If set to int > 0, all ngrams of that size can only occur once.
            bad_words_ids (:obj:`List[List[int]]`, `optional`):
                List of token ids that are not allowed to be generated. In order to get the tokens of the words that
                should not appear in the generated text, use :obj:`tokenizer.encode(bad_word, add_prefix_space=True)`.
            num_return_sequences (:obj:`int`, `optional`, defaults to 1):
                The number of independently computed returned sequences for each element in the batch.
            attention_mask (:obj:`tf.Tensor` of :obj:`dtype=tf.int32` and shape :obj:`(batch_size, sequence_length)`, `optional`):
                Mask to avoid performing attention on padding token indices. Mask values are in ``[0, 1]``, 1 for
                tokens that are not masked, and 0 for masked tokens.

                If not provided, will default to a tensor the same shape as :obj:`input_ids` that masks the pad token.

                `What are attention masks? <../glossary.html#attention-mask>`__
            decoder_start_token_id (:obj:`int`, `optional`):
                If an encoder-decoder model starts decoding with a different token than `bos`, the id of that token.
            use_cache (:obj:`bool`, `optional`, defaults to :obj:`True`):
                Whether or not the model should use the past last key/values attentions (if applicable to the model) to
                speed up decoding.
            model_specific_kwargs:
                Additional model specific kwargs will be forwarded to the :obj:`forward` function of the model.

        Return:

            :obj:`tf.Tensor` of :obj:`dtype=tf.int32` and shape :obj:`(batch_size * num_return_sequences, sequence_length)`:
                The generated sequences. The second dimension (sequence_length) is either equal to :obj:`max_length` or
                shorter if all batches finished early due to the :obj:`eos_token_id`.

        Examples::
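As a toy illustration of the greedy branch described above (a sketch under stated assumptions: ``next_logits`` stands in for a real model forward pass, and the only stopping rules shown are the maximum length and ``eos_token_id``):

```python
def greedy_generate(next_logits, prompt, max_length, eos_token_id=None):
    """Toy greedy decoding loop: repeatedly pick the argmax token, and
    stop at max_length or as soon as eos_token_id is produced."""
    seq = list(prompt)
    while len(seq) < max_length:
        logits = next_logits(seq)
        next_token = max(range(len(logits)), key=logits.__getitem__)
        seq.append(next_token)
        if next_token == eos_token_id:
            break
    return seq

# A deterministic stand-in "model" over a 3-token vocabulary that always
# favors (last_token + 1) % 3; below, token 2 plays the role of EOS.
def next_logits(seq):
    favored = (seq[-1] + 1) % 3
    return [1.0 if t == favored else 0.0 for t in range(3)]
```

``greedy_generate(next_logits, [0], 10, eos_token_id=2)`` returns ``[0, 1, 2]``: generation stops early at the EOS token even though ``max_length`` is 10.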
@ -27,13 +27,22 @@ logger = logging.getLogger(__name__)
class GenerationMixin:
    """
    A class containing all of the functions supporting generation, to be used as a mixin in
    :class:`~transformers.PreTrainedModel`.
    """

    def prepare_inputs_for_generation(self, input_ids, **kwargs):
        """
        Implement in subclasses of :class:`~transformers.PreTrainedModel` for custom behavior to prepare inputs in the
        generate method.
        """
        return {"input_ids": input_ids}

    def adjust_logits_during_generation(self, logits, **kwargs):
        """
        Implement in subclasses of :class:`~transformers.PreTrainedModel` for custom behavior to adjust the logits in
        the generate method.
        """
        return logits

    def _use_cache(self, outputs, use_cache):
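The two hooks above exist so model subclasses can customize the generation loop. A hypothetical sketch of the override pattern, with ``ToyModel`` and its cache handling invented for illustration:

```python
class GenerationHooks:
    # Default hooks, mirroring the base-class behavior shown above.
    def prepare_inputs_for_generation(self, input_ids, **kwargs):
        return {"input_ids": input_ids}

    def adjust_logits_during_generation(self, logits, **kwargs):
        return logits


class ToyModel(GenerationHooks):
    # Hypothetical override: when a decoding cache (`past`) exists, only
    # the newest token needs to be passed through the model at each step.
    def prepare_inputs_for_generation(self, input_ids, past=None, **kwargs):
        if past is not None:
            input_ids = input_ids[-1:]
        return {"input_ids": input_ids, "past": past}
```

A real model would do the same thing with tensors instead of lists; the point is only that ``generate`` calls these hooks, so overriding them changes what the model receives at each step.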
@ -45,7 +54,9 @@ class GenerationMixin:
        return True

    def enforce_repetition_penalty_(self, lprobs, batch_size, num_beams, prev_output_tokens, repetition_penalty):
        """
        Enforce the repetition penalty (from the `CTRL paper <https://arxiv.org/abs/1909.05858>`__).
        """
        for i in range(batch_size * num_beams):
            for previous_token in set(prev_output_tokens[i].tolist()):
                # if score < 0 then repetition penalty has to be multiplied to reduce the previous token probability
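The sign-dependent update in the loop above can be sketched in plain Python (an illustration only; the real method operates in place on a ``torch`` tensor of scores, one row per beam):

```python
def apply_repetition_penalty(scores, prev_tokens, penalty):
    """CTRL-style repetition penalty: every previously generated token has
    its score divided by the penalty when positive and multiplied by it
    when negative, making the token less likely either way."""
    for token in set(prev_tokens):
        if scores[token] < 0:
            scores[token] *= penalty
        else:
            scores[token] /= penalty
    return scores
```

With a penalty of 2.0, a positive score is halved while the magnitude of a negative one is doubled, which is why the branch on the sign is needed.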
@ -123,89 +134,83 @@ class GenerationMixin:
        use_cache: Optional[bool] = None,
        **model_specific_kwargs
    ) -> torch.LongTensor:
        r"""
        Generates sequences for models with a language modeling head. The method currently supports greedy decoding,
        beam-search decoding, sampling with temperature, sampling with top-k or nucleus sampling.

        Adapted in part from `Facebook's XLM beam search code
        <https://github.com/facebookresearch/XLM/blob/9e6f6814d17be4fe5b15f2e6c43eb2b2d76daeb4/src/model/transformer.py#L529>`__.

        Apart from :obj:`input_ids` and :obj:`attention_mask`, all the arguments below will default to the value of the
        attribute of the same name inside the :class:`~transformers.PretrainedConfig` of the model. The default values
        indicated are the default values of those configuration attributes.

        Most of these parameters are explained in more detail in `this blog post
        <https://huggingface.co/blog/how-to-generate>`__.

        Parameters:

            input_ids (:obj:`torch.LongTensor` of shape :obj:`(batch_size, sequence_length)`, `optional`):
                The sequence used as a prompt for the generation. If :obj:`None` the method initializes it as an empty
                :obj:`torch.LongTensor` of shape :obj:`(1,)`.
            max_length (:obj:`int`, `optional`, defaults to 20):
                The maximum length of the sequence to be generated.
            min_length (:obj:`int`, `optional`, defaults to 10):
                The minimum length of the sequence to be generated.
            do_sample (:obj:`bool`, `optional`, defaults to :obj:`False`):
                Whether or not to use sampling; use greedy decoding otherwise.
            early_stopping (:obj:`bool`, `optional`, defaults to :obj:`False`):
                Whether to stop the beam search when at least ``num_beams`` sentences are finished per batch or not.
            num_beams (:obj:`int`, `optional`, defaults to 1):
                Number of beams for beam search. 1 means no beam search.
            temperature (:obj:`float`, `optional`, defaults to 1.0):
                The value used to modulate the next token probabilities.
            top_k (:obj:`int`, `optional`, defaults to 50):
                The number of highest probability vocabulary tokens to keep for top-k-filtering.
            top_p (:obj:`float`, `optional`, defaults to 1.0):
                If set to float < 1, only the most probable tokens with probabilities that add up to ``top_p`` or
                higher are kept for generation.
            repetition_penalty (:obj:`float`, `optional`, defaults to 1.0):
                The parameter for repetition penalty. 1.0 means no penalty. See `this paper
                <https://arxiv.org/pdf/1909.05858.pdf>`__ for more details.
            pad_token_id (:obj:`int`, `optional`):
                The id of the `padding` token.
            bos_token_id (:obj:`int`, `optional`):
                The id of the `beginning-of-sequence` token.
            eos_token_id (:obj:`int`, `optional`):
                The id of the `end-of-sequence` token.
            length_penalty (:obj:`float`, `optional`, defaults to 1.0):
                Exponential penalty to the length. 1.0 means no penalty.

                Set to values < 1.0 in order to encourage the model to generate shorter sequences, to a value > 1.0 in
                order to encourage the model to produce longer sequences.
            no_repeat_ngram_size (:obj:`int`, `optional`, defaults to 0):
                If set to int > 0, all ngrams of that size can only occur once.
            bad_words_ids (:obj:`List[List[int]]`, `optional`):
                List of token ids that are not allowed to be generated. In order to get the tokens of the words that
                should not appear in the generated text, use :obj:`tokenizer.encode(bad_word, add_prefix_space=True)`.
            num_return_sequences (:obj:`int`, `optional`, defaults to 1):
                The number of independently computed returned sequences for each element in the batch.
            attention_mask (:obj:`torch.LongTensor` of shape :obj:`(batch_size, sequence_length)`, `optional`):
                Mask to avoid performing attention on padding token indices. Mask values are in ``[0, 1]``, 1 for
                tokens that are not masked, and 0 for masked tokens.

                If not provided, will default to a tensor the same shape as :obj:`input_ids` that masks the pad token.

                `What are attention masks? <../glossary.html#attention-mask>`__
            decoder_start_token_id (:obj:`int`, `optional`):
                If an encoder-decoder model starts decoding with a different token than `bos`, the id of that token.
            use_cache (:obj:`bool`, `optional`, defaults to :obj:`True`):
                Whether or not the model should use the past last key/values attentions (if applicable to the model) to
                speed up decoding.
            model_specific_kwargs:
                Additional model specific kwargs will be forwarded to the :obj:`forward` function of the model.

        Return:

            :obj:`torch.LongTensor` of shape :obj:`(batch_size * num_return_sequences, sequence_length)`:
                The generated sequences. The second dimension (sequence_length) is either equal to :obj:`max_length` or
                shorter if all batches finished early due to the :obj:`eos_token_id`.

        Examples::
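The ``no_repeat_ngram_size`` constraint documented above can be sketched as follows (a standalone illustration, not the library's internal implementation):

```python
def banned_ngram_tokens(generated, n):
    """Return the tokens that would complete an n-gram already present in
    `generated`, i.e. the tokens to ban when no_repeat_ngram_size == n."""
    if n <= 0 or len(generated) < n:
        return set()
    # All n-grams seen so far, plus the (n-1)-token prefix the next token
    # would extend; ban every token that would re-create a seen n-gram.
    seen = {tuple(generated[i:i + n]) for i in range(len(generated) - n + 1)}
    prefix = tuple(generated[len(generated) - n + 1:])
    return {ngram[-1] for ngram in seen if ngram[:-1] == prefix}
```

For example, after ``[1, 2, 3, 1, 2]`` with ``n = 3``, generating token 3 would repeat the trigram ``(1, 2, 3)``, so 3 is the only banned token.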