Utilities for Tokenizers

This page lists the utilities used by the tokenizers, mainly the class [~tokenization_utils_base.PreTrainedTokenizerBase], which implements the methods common to [PreTrainedTokenizer] and [PreTrainedTokenizerFast], and the mixin [~tokenization_utils_base.SpecialTokensMixin].

Most of these are useful only if you are studying the tokenizer code in the library.

PreTrainedTokenizerBase

autodoc tokenization_utils_base.PreTrainedTokenizerBase - __call__ - all

SpecialTokensMixin

autodoc tokenization_utils_base.SpecialTokensMixin

Enums and namedtuples

autodoc tokenization_utils_base.TruncationStrategy

autodoc tokenization_utils_base.CharSpan

autodoc tokenization_utils_base.TokenSpan
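
To make the shape of these small types concrete, here is a minimal sketch that uses standard-library stand-ins rather than the library's own classes. It assumes that CharSpan and TokenSpan are named tuples with `start` and `end` fields treated as half-open ranges, and that TruncationStrategy members carry the string values shown. These are assumptions for illustration, so check the generated reference above for the actual definitions.

```python
from enum import Enum
from typing import NamedTuple


class CharSpan(NamedTuple):
    """Stand-in (assumed shape): character range [start, end) in the original string."""
    start: int
    end: int


class TokenSpan(NamedTuple):
    """Stand-in (assumed shape): token range [start, end) in the encoded sequence."""
    start: int
    end: int


class TruncationStrategy(Enum):
    """Stand-in enum; member names and string values are assumptions."""
    ONLY_FIRST = "only_first"
    ONLY_SECOND = "only_second"
    LONGEST_FIRST = "longest_first"
    DO_NOT_TRUNCATE = "do_not_truncate"


text = "Hello world"

# A character span slices directly into the source string.
span = CharSpan(start=6, end=11)
word = text[span.start:span.end]

# Named tuples unpack like ordinary tuples.
start, end = TokenSpan(start=1, end=3)

# An enum member can be recovered from its string value.
strategy = TruncationStrategy("longest_first")
```

Named tuples keep the spans lightweight and immutable while still giving the fields readable names, and an enum keeps the set of accepted strategy strings closed, so a misspelled value fails at lookup time.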