* add dataset for albert pretrain
* datacollator for albert pretrain
* naming, comprehension, and file reading changes
* data cleaning is no longer needed after this modification
* delete prints
* fix a bug
* file structure change
* add tests for albert datacollator
* remove random seed
* add back __len__ and __getitem__ methods
* sample file for testing and test code added
* format change for black
* more format change
* Style
* resolve variable assignment issue
* add back wrongly deleted DataCollatorWithPadding in init file
* Style
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
* add datacollator and dataset for next sentence prediction task
* bug fix (number of special tokens & sequence truncation)
* bug fix (+ dict inputs support for data collator)
* add padding for nsp data collator; rename cached files to avoid conflict
* add test for nsp data collator
* Style
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>