[docstring] Fix bert generation tokenizer (#26820)
* Remove BertGenerationTokenizer from objects to ignore

  BertGenerationTokenizer is removed from the objects to ignore as a first step toward fixing its docstring.

* Docstrings fix for BertGenerationTokenizer

  The docstring fix for BertGenerationTokenizer is generated with check_docstrings.py.

* Fix docstring for BertGenerationTokenizer

  Added the sep_token type and docstring in BertGenerationTokenizer.
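For context, utils/check_docstrings.py validates each public object's documented arguments against its actual signature. The sketch below only illustrates that general pattern; `check_defaults` and its regex are hypothetical names, not the utility's real implementation.

# A minimal sketch of the kind of consistency check utils/check_docstrings.py
# performs: compare a documented default against the signature's actual default.
# `check_defaults` is a hypothetical name, not the real utility's API.
import inspect
import re


def check_defaults(cls):
    """Return parameter names whose documented default disagrees with the signature."""
    sig = inspect.signature(cls.__init__)
    doc = inspect.getdoc(cls) or ""
    mismatched = []
    for name, param in sig.parameters.items():
        if param.default is inspect.Parameter.empty:
            continue
        # Docstring lines look like: name (`str`, *optional*, defaults to `"<s>"`):
        match = re.search(rf"{re.escape(name)} \(`\w+`, \*optional\*, defaults to `(.+?)`\)", doc)
        if match and match.group(1) != repr(param.default).replace("'", '"'):
            mismatched.append(name)
    return mismatched

With this commit applied, a check along these lines should come back empty for BertGenerationTokenizer.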
parent 12cc123359
commit 5c6b83cb69
src/transformers/models/bert_generation/tokenization_bert_generation.py

@@ -51,15 +51,19 @@ class BertGenerationTokenizer(PreTrainedTokenizer):
         vocab_file (`str`):
             [SentencePiece](https://github.com/google/sentencepiece) file (generally has a *.spm* extension) that
             contains the vocabulary necessary to instantiate a tokenizer.
-        eos_token (`str`, *optional*, defaults to `"</s>"`):
-            The end of sequence token.
         bos_token (`str`, *optional*, defaults to `"<s>"`):
             The begin of sequence token.
+        eos_token (`str`, *optional*, defaults to `"</s>"`):
+            The end of sequence token.
         unk_token (`str`, *optional*, defaults to `"<unk>"`):
             The unknown token. A token that is not in the vocabulary cannot be converted to an ID and is set to be this
             token instead.
         pad_token (`str`, *optional*, defaults to `"<pad>"`):
             The token used for padding, for example when batching sequences of different lengths.
+        sep_token (`str`, *optional*, defaults to `"<::::>"`):
+            The separator token, which is used when building a sequence from multiple sequences, e.g. two sequences for
+            sequence classification or for a text and a question for question answering. It is also used as the last
+            token of a sequence built with special tokens.
         sp_model_kwargs (`dict`, *optional*):
             Will be passed to the `SentencePieceProcessor.__init__()` method. The [Python wrapper for
             SentencePiece](https://github.com/google/sentencepiece/tree/master/python) can be used, among other things,
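As a quick sanity check that the documented defaults now match the constructor, the special tokens can be read straight off an instance. This is a minimal sketch: "spiece.model" is a placeholder path and must point at a real SentencePiece vocabulary file.

from transformers import BertGenerationTokenizer

# "spiece.model" is a placeholder; any valid SentencePiece vocabulary file works.
tokenizer = BertGenerationTokenizer(vocab_file="spiece.model")

# Each default below is exactly what the corrected docstring now documents.
assert tokenizer.bos_token == "<s>"
assert tokenizer.eos_token == "</s>"
assert tokenizer.unk_token == "<unk>"
assert tokenizer.pad_token == "<pad>"
assert tokenizer.sep_token == "<::::>"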
utils/check_docstrings.py

@@ -94,7 +94,6 @@ OBJECTS_TO_IGNORE = [
     "BarthezTokenizerFast",
     "BeitModel",
     "BertConfig",
-    "BertGenerationTokenizer",
     "BertJapaneseTokenizer",
     "BertModel",
     "BertTokenizerFast",
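With "BertGenerationTokenizer" removed from OBJECTS_TO_IGNORE, the checker no longer skips it on subsequent runs. The snippet below is only an illustrative sketch of that ignore-list pattern; `has_docstring` and `check_all` are hypothetical stand-ins, not the utility's real functions.

# Illustrative sketch of an ignore-list gate; not check_docstrings.py's actual code.
OBJECTS_TO_IGNORE = [
    "BarthezTokenizerFast",
    "BeitModel",
    "BertConfig",
    # "BertGenerationTokenizer" no longer appears here, so it is checked again.
    "BertJapaneseTokenizer",
]


def has_docstring(obj):
    """Hypothetical stand-in for the real per-object docstring validation."""
    return bool(getattr(obj, "__doc__", None))


def check_all(objects):
    """Return the names that fail the check, skipping anything in the ignore list."""
    return [
        name
        for name, obj in objects.items()
        if name not in OBJECTS_TO_IGNORE and not has_docstring(obj)
    ]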