Data Collator
Data collators are objects that form a batch from a list of dataset elements. These elements are of
the same type as the elements of train_dataset or eval_dataset.
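As a rough illustration of this contract (a list of dataset elements in, one batch out), the sketch below groups per-example values under each key. The name `simple_collate` is hypothetical, plain Python lists stand in for tensors, and this is not the library's implementation.

```python
from typing import Any, Dict, List


def simple_collate(features: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Group per-example values under each key to form one batch (illustrative only)."""
    if not features:
        return {}
    # Assumption: every element has the same keys as the first one.
    keys = features[0].keys()
    return {key: [example[key] for example in features] for key in keys}


# Two dataset elements of the same type, as they might come from a dataset.
batch = simple_collate(
    [
        {"input_ids": [101, 7592, 102], "label": 0},
        {"input_ids": [101, 2088, 102], "label": 1},
    ]
)
```

A real collator would typically also convert the grouped values to framework tensors, which is omitted here to keep the sketch dependency-free.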
To build batches, data collators may apply some processing (like padding). Some of them (like
[DataCollatorForLanguageModeling]) also apply random data augmentation (like random masking)
to the formed batch.
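The two kinds of processing mentioned above can be sketched in plain Python. This is a simplified illustration under stated assumptions, not how DataCollatorForLanguageModeling is implemented: the function name `pad_and_mask`, the default token ids, and the `seed` parameter are all hypothetical, special tokens are not protected from masking, and every selected token is simply replaced by the mask token. The value -100 follows the common convention for label positions that the loss should ignore.

```python
import random
from typing import Dict, List, Optional

IGNORE_INDEX = -100  # label value conventionally skipped by the loss


def pad_and_mask(
    features: List[Dict[str, List[int]]],
    pad_token_id: int = 0,      # assumed value, for illustration
    mask_token_id: int = 103,   # assumed value, for illustration
    mlm_probability: float = 0.15,
    seed: Optional[int] = None,
) -> Dict[str, List[List[int]]]:
    """Pad to the longest sequence, then randomly mask real tokens."""
    rng = random.Random(seed)
    max_len = max(len(f["input_ids"]) for f in features)
    batch: Dict[str, List[List[int]]] = {
        "input_ids": [],
        "attention_mask": [],
        "labels": [],
    }
    for f in features:
        ids = list(f["input_ids"])
        n_pad = max_len - len(ids)
        # Padding: extend to max_len and mark which positions are real tokens.
        padded = ids + [pad_token_id] * n_pad
        attention = [1] * len(ids) + [0] * n_pad
        # Masking: only real (non-padding) positions are candidates.
        labels = [IGNORE_INDEX] * max_len
        for i in range(len(ids)):
            if rng.random() < mlm_probability:
                labels[i] = padded[i]      # remember the original token
                padded[i] = mask_token_id  # hide it from the model
        batch["input_ids"].append(padded)
        batch["attention_mask"].append(attention)
        batch["labels"].append(labels)
    return batch


examples = [{"input_ids": [5, 6, 7, 8]}, {"input_ids": [9, 10]}]
masked_batch = pad_and_mask(examples, mlm_probability=0.5, seed=0)
```

Because the masking is random, calling the function on the same elements without a fixed `seed` may produce a different batch each time, which is the augmentation effect described above.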
Examples of use can be found in the example scripts or example notebooks.
Default data collator
autodoc data.data_collator.default_data_collator
DefaultDataCollator
autodoc data.data_collator.DefaultDataCollator
DataCollatorWithPadding
autodoc data.data_collator.DataCollatorWithPadding
DataCollatorForTokenClassification
autodoc data.data_collator.DataCollatorForTokenClassification
DataCollatorForSeq2Seq
autodoc data.data_collator.DataCollatorForSeq2Seq
DataCollatorForLanguageModeling
autodoc data.data_collator.DataCollatorForLanguageModeling
    - numpy_mask_tokens
    - tf_mask_tokens
    - torch_mask_tokens
DataCollatorForWholeWordMask
autodoc data.data_collator.DataCollatorForWholeWordMask
    - numpy_mask_tokens
    - tf_mask_tokens
    - torch_mask_tokens
DataCollatorForPermutationLanguageModeling
autodoc data.data_collator.DataCollatorForPermutationLanguageModeling
    - numpy_mask_tokens
    - tf_mask_tokens
    - torch_mask_tokens