Utilities for Generation
This page lists all the utility functions used by [~generation.GenerationMixin.generate].
Generate Outputs
The output of [~generation.GenerationMixin.generate] is an instance of a subclass of
[~utils.ModelOutput]. This output is a data structure containing all the information returned
by [~generation.GenerationMixin.generate], but it can also be used as a tuple or a dictionary.
Here's an example:
from transformers import GPT2Tokenizer, GPT2LMHeadModel
tokenizer = GPT2Tokenizer.from_pretrained("openai-community/gpt2")
model = GPT2LMHeadModel.from_pretrained("openai-community/gpt2")
inputs = tokenizer("Hello, my dog is cute and ", return_tensors="pt")
generation_output = model.generate(**inputs, return_dict_in_generate=True, output_scores=True)
The generation_output object is a [~generation.GenerateDecoderOnlyOutput]. As the
documentation of that class below shows, it has the following attributes:
- sequences: the generated sequences of tokens
- scores (optional): the prediction scores of the language modeling head, for each generation step
- hidden_states (optional): the hidden states of the model, for each generation step
- attentions (optional): the attention weights of the model, for each generation step
Here we have the scores since we passed along output_scores=True, but we don't have hidden_states and
attentions because we didn't pass output_hidden_states=True or output_attentions=True.
You can access each attribute as you normally would, and if the model did not return that attribute, you
will get None. Here, for instance, generation_output.scores holds all the generated prediction scores of the
language modeling head, and generation_output.attentions is None.
When using our generation_output object as a tuple, it only keeps the attributes that don't have None
values. Here, for instance, it has two elements, sequences then scores, so generation_output[:2]
will return the tuple (generation_output.sequences, generation_output.scores).
When using our generation_output object as a dictionary, it only keeps the attributes that don't have None
values. Here, for instance, it has two keys, sequences and scores.
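The attribute, tuple, and dictionary views described above can be illustrated without loading a model. The following is a minimal sketch only: SketchGenerateOutput is a hypothetical stand-in written for this example, not the actual [~utils.ModelOutput] implementation, and the toy lists stand in for real tensors.

```python
from dataclasses import dataclass, fields
from typing import Any, Optional, Tuple


@dataclass
class SketchGenerateOutput:
    """Hypothetical stand-in for a generate output; NOT the library's class."""

    sequences: Any = None
    scores: Optional[Tuple[Any, ...]] = None
    hidden_states: Optional[Tuple[Any, ...]] = None
    attentions: Optional[Tuple[Any, ...]] = None

    def to_dict(self) -> dict:
        # Keep only the attributes that were actually returned (not None).
        return {
            f.name: getattr(self, f.name)
            for f in fields(self)
            if getattr(self, f.name) is not None
        }

    def to_tuple(self) -> tuple:
        # Same filtering as the dictionary view, in field order.
        return tuple(self.to_dict().values())

    def __getitem__(self, key):
        # String keys behave like a dictionary; ints and slices like a tuple.
        if isinstance(key, str):
            return self.to_dict()[key]
        return self.to_tuple()[key]


# Toy values in place of tensors: one sequence, scores for two generation steps.
output = SketchGenerateOutput(
    sequences=[[15496, 11, 616]],
    scores=([0.1, 0.9], [0.7, 0.3]),
)

print(output.attentions)       # None: the attribute exists but was not returned
print(list(output.to_dict()))  # ['sequences', 'scores']
print(output[:2] == (output.sequences, output.scores))  # True
```

With a real output object, the number of non-None fields depends on the flags passed to generate and on the installed library version, so it is safer to access fields by name than by position.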
We document here all output types.
PyTorch
autodoc generation.GenerateDecoderOnlyOutput
autodoc generation.GenerateEncoderDecoderOutput
autodoc generation.GenerateBeamDecoderOnlyOutput
autodoc generation.GenerateBeamEncoderDecoderOutput
TensorFlow
autodoc generation.TFGreedySearchEncoderDecoderOutput
autodoc generation.TFGreedySearchDecoderOnlyOutput
autodoc generation.TFSampleEncoderDecoderOutput
autodoc generation.TFSampleDecoderOnlyOutput
autodoc generation.TFBeamSearchEncoderDecoderOutput
autodoc generation.TFBeamSearchDecoderOnlyOutput
autodoc generation.TFBeamSampleEncoderDecoderOutput
autodoc generation.TFBeamSampleDecoderOnlyOutput
autodoc generation.TFContrastiveSearchEncoderDecoderOutput
autodoc generation.TFContrastiveSearchDecoderOnlyOutput
FLAX
autodoc generation.FlaxSampleOutput
autodoc generation.FlaxGreedySearchOutput
autodoc generation.FlaxBeamSearchOutput
LogitsProcessor
A [LogitsProcessor] can be used to modify the prediction scores of a language model head for
generation.
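To make the idea concrete, here is a dependency-free sketch of what "modifying the prediction scores" can look like. It assumes a processor is simply a callable taking the tokens generated so far and the next-token scores, and returning new scores. The helper names (make_temperature, make_top_k, make_no_repeat_last_token, apply_all) are hypothetical and exist only in this example. The classes documented below operate on tensors and have their own signatures.

```python
import math
from typing import Callable, List, Sequence

# Assumed shape of a processor for this sketch:
# (token ids generated so far, next-token scores) -> modified scores.
Processor = Callable[[Sequence[int], List[float]], List[float]]


def make_temperature(temperature: float) -> Processor:
    """Divide every score by a positive temperature."""
    def process(input_ids: Sequence[int], scores: List[float]) -> List[float]:
        return [s / temperature for s in scores]
    return process


def make_top_k(k: int) -> Processor:
    """Keep the k highest scores and mask the rest with -inf (ties may keep more)."""
    def process(input_ids: Sequence[int], scores: List[float]) -> List[float]:
        threshold = sorted(scores, reverse=True)[k - 1]
        return [s if s >= threshold else -math.inf for s in scores]
    return process


def make_no_repeat_last_token() -> Processor:
    """Hypothetical rule: forbid emitting the same token twice in a row."""
    def process(input_ids: Sequence[int], scores: List[float]) -> List[float]:
        out = list(scores)
        if input_ids:
            out[input_ids[-1]] = -math.inf
        return out
    return process


def apply_all(processors: List[Processor], input_ids: Sequence[int],
              scores: List[float]) -> List[float]:
    """Run the processors in order, feeding each one the previous result."""
    for processor in processors:
        scores = processor(input_ids, scores)
    return scores


processors = [make_no_repeat_last_token(), make_temperature(2.0), make_top_k(2)]
new_scores = apply_all(processors, input_ids=[3, 1], scores=[1.0, 4.0, 2.0, 3.0])
print(new_scores)  # [-inf, -inf, 1.0, 1.5]
```

Order matters in such a chain: here token 1 is masked before top-k runs, so the two surviving tokens are 2 and 3 rather than 1 and 3.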
PyTorch
autodoc AlternatingCodebooksLogitsProcessor - call
autodoc ClassifierFreeGuidanceLogitsProcessor - call
autodoc EncoderNoRepeatNGramLogitsProcessor - call
autodoc EncoderRepetitionPenaltyLogitsProcessor - call
autodoc EpsilonLogitsWarper - call
autodoc EtaLogitsWarper - call
autodoc ExponentialDecayLengthPenalty - call
autodoc ForcedBOSTokenLogitsProcessor - call
autodoc ForcedEOSTokenLogitsProcessor - call
autodoc ForceTokensLogitsProcessor - call
autodoc HammingDiversityLogitsProcessor - call
autodoc InfNanRemoveLogitsProcessor - call
autodoc LogitNormalization - call
autodoc LogitsProcessor - call
autodoc LogitsProcessorList - call
autodoc LogitsWarper - call
autodoc MinLengthLogitsProcessor - call
autodoc MinNewTokensLengthLogitsProcessor - call
autodoc MinPLogitsWarper - call
autodoc NoBadWordsLogitsProcessor - call
autodoc NoRepeatNGramLogitsProcessor - call
autodoc PrefixConstrainedLogitsProcessor - call
autodoc RepetitionPenaltyLogitsProcessor - call
autodoc SequenceBiasLogitsProcessor - call
autodoc SuppressTokensAtBeginLogitsProcessor - call
autodoc SuppressTokensLogitsProcessor - call
autodoc TemperatureLogitsWarper - call
autodoc TopKLogitsWarper - call
autodoc TopPLogitsWarper - call
autodoc TypicalLogitsWarper - call
autodoc UnbatchedClassifierFreeGuidanceLogitsProcessor - call
autodoc WhisperTimeStampLogitsProcessor - call
autodoc WatermarkLogitsProcessor - call
TensorFlow
autodoc TFForcedBOSTokenLogitsProcessor - call
autodoc TFForcedEOSTokenLogitsProcessor - call
autodoc TFForceTokensLogitsProcessor - call
autodoc TFLogitsProcessor - call
autodoc TFLogitsProcessorList - call
autodoc TFLogitsWarper - call
autodoc TFMinLengthLogitsProcessor - call
autodoc TFNoBadWordsLogitsProcessor - call
autodoc TFNoRepeatNGramLogitsProcessor - call
autodoc TFRepetitionPenaltyLogitsProcessor - call
autodoc TFSuppressTokensAtBeginLogitsProcessor - call
autodoc TFSuppressTokensLogitsProcessor - call
autodoc TFTemperatureLogitsWarper - call
autodoc TFTopKLogitsWarper - call
autodoc TFTopPLogitsWarper - call
FLAX
autodoc FlaxForcedBOSTokenLogitsProcessor - call
autodoc FlaxForcedEOSTokenLogitsProcessor - call
autodoc FlaxForceTokensLogitsProcessor - call
autodoc FlaxLogitsProcessor - call
autodoc FlaxLogitsProcessorList - call
autodoc FlaxLogitsWarper - call
autodoc FlaxMinLengthLogitsProcessor - call
autodoc FlaxSuppressTokensAtBeginLogitsProcessor - call
autodoc FlaxSuppressTokensLogitsProcessor - call
autodoc FlaxTemperatureLogitsWarper - call
autodoc FlaxTopKLogitsWarper - call
autodoc FlaxTopPLogitsWarper - call
autodoc FlaxWhisperTimeStampLogitsProcessor - call
StoppingCriteria
A [StoppingCriteria] can be used to change when generation stops, beyond reaching the EOS token. Please note that this is exclusively available to our PyTorch implementations.
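The following toy loop sketches how stopping conditions of this kind can be combined. It assumes a criterion is a callable that receives the token ids so far and returns True when generation should stop. The names max_length, stop_on_token, and generate_ids are hypothetical helpers for this example, not the library's API, and the "model" is a simple counter.

```python
from typing import Callable, List, Sequence

# Assumed shape of a criterion for this sketch:
# (token ids so far) -> True when generation should stop.
Criterion = Callable[[Sequence[int]], bool]


def max_length(limit: int) -> Criterion:
    """Stop once the sequence (prompt included) reaches `limit` tokens."""
    return lambda ids: len(ids) >= limit


def stop_on_token(token_id: int) -> Criterion:
    """Stop as soon as the last generated token equals `token_id`."""
    return lambda ids: ids[-1] == token_id


def generate_ids(next_token: Callable[[Sequence[int]], int],
                 prompt: Sequence[int],
                 criteria: List[Criterion]) -> List[int]:
    """Append tokens until any criterion fires."""
    if not criteria:
        raise ValueError("at least one criterion is needed, or the loop never ends")
    ids = list(prompt)
    while True:
        ids.append(next_token(ids))
        # Criteria are checked after every new token; the first True ends the loop.
        if any(criterion(ids) for criterion in criteria):
            return ids


def toy_model(ids: Sequence[int]) -> int:
    """Toy 'model': always emits the previous token plus one, modulo 10."""
    return (ids[-1] + 1) % 10


by_token = generate_ids(toy_model, [5], [max_length(8), stop_on_token(9)])
by_length = generate_ids(toy_model, [5], [max_length(3), stop_on_token(9)])
print(by_token)   # [5, 6, 7, 8, 9]
print(by_length)  # [5, 6, 7]
```

Combining criteria with any() means the earliest condition wins, which is why the same toy model produces two different lengths above.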
autodoc StoppingCriteria - call
autodoc StoppingCriteriaList - call
autodoc MaxLengthCriteria - call
autodoc MaxTimeCriteria - call
autodoc StopStringCriteria - call
autodoc EosTokenCriteria - call
Constraints
A [Constraint] can be used to force the generation to include specific tokens or sequences in the output. Please note that this is exclusively available to our PyTorch implementations.
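As a rough illustration of what forcing a sequence involves, the sketch below tracks how far a stream of generated tokens has progressed through a required phrase. PhraseTracker and its methods are hypothetical names invented for this example. The classes documented below have their own interface, and their restart behaviour may differ from the simplified rule used here.

```python
from typing import List, Optional


class PhraseTracker:
    """Hypothetical helper: tracks progress through a required token sequence."""

    def __init__(self, token_ids: List[int]):
        if not token_ids:
            raise ValueError("the required phrase must contain at least one token")
        self.token_ids = list(token_ids)
        self.position = 0  # number of phrase tokens matched so far

    @property
    def completed(self) -> bool:
        return self.position == len(self.token_ids)

    def next_required(self) -> Optional[int]:
        """Token that would advance the phrase, or None once it is complete."""
        return None if self.completed else self.token_ids[self.position]

    def update(self, token_id: int) -> None:
        """Consume one generated token and advance or restart the match."""
        if self.completed:
            return
        if token_id == self.token_ids[self.position]:
            self.position += 1
        else:
            # Simplified restart: the mismatching token may itself begin the phrase.
            self.position = 1 if token_id == self.token_ids[0] else 0


tracker = PhraseTracker([7, 8, 9])
progress = []
for token in [1, 7, 8, 7, 8, 9]:
    tracker.update(token)
    progress.append(tracker.position)

print(progress)           # [0, 1, 2, 1, 2, 3]
print(tracker.completed)  # True
```

A constrained decoder could consult something like next_required() to favour tokens that advance an unfinished phrase, and refuse to finish a hypothesis until completed is True.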
autodoc Constraint
autodoc PhrasalConstraint
autodoc DisjunctiveConstraint
autodoc ConstraintListState
BeamSearch
autodoc BeamScorer - process - finalize
autodoc BeamSearchScorer - process - finalize
autodoc ConstrainedBeamSearchScorer - process - finalize
Streamers
autodoc TextStreamer
autodoc TextIteratorStreamer
Caches
autodoc Cache - update
autodoc CacheConfig - update
autodoc QuantizedCacheConfig - validate
autodoc DynamicCache - update - get_seq_length - reorder_cache - to_legacy_cache - from_legacy_cache
autodoc QuantizedCache - update - get_seq_length
autodoc QuantoQuantizedCache
autodoc HQQQuantizedCache
autodoc SinkCache - update - get_seq_length - reorder_cache
autodoc StaticCache - update - get_seq_length - reset
Watermark Utils
autodoc WatermarkDetector - call