Small README changes

Matthew Carrigan 2019-03-20 17:35:17 +00:00
parent 832b2b0058
commit 29a392fbcf
1 changed file with 4 additions and 3 deletions

@@ -51,9 +51,10 @@ by `pregenerate_training_data.py`. Note that you should use the same bert_model
 Also note that max_seq_len does not need to be specified for the `finetune_on_pregenerated.py` script,
 as it is inferred from the training examples.
-There are various options that can be tweaked, but the most important ones are probably `max_seq_len`, which controls
-the length of training examples (in wordpiece tokens) seen by the model, and `--fp16`, which enables fast half-precision
-training on recent GPUs. `max_seq_len` defaults to 128 but can be set as high as 512.
+There are various options that can be tweaked, but they are mostly set to the values from the BERT paper/repo and should
+be left alone. The most relevant ones for the end-user are probably `--max_seq_len`, which controls the length of
+training examples (in wordpiece tokens) seen by the model, and `--fp16`, which enables fast half-precision training on
+recent GPUs. `--max_seq_len` defaults to 128 but can be set as high as 512.
 Higher values may yield stronger language models at the cost of slower and more memory-intensive training
 In addition, if memory usage is an issue, especially when training on a single GPU, reducing `--train_batch_size` from
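
To make the flag discussion above concrete, here is a rough sketch of an invocation. Only `--max_seq_len`, `--fp16` and `--train_batch_size` are taken from the text above; the remaining argument names and values (`--pregenerated_data`, `--bert_model`, `--output_dir`) are assumptions for illustration and should be checked against the script's `--help` output.

```bash
# Sketch only: apart from --max_seq_len, --fp16 and --train_batch_size, the
# argument names below are assumed, not confirmed by this README excerpt.
python finetune_on_pregenerated.py \
    --pregenerated_data training/ \
    --bert_model bert-base-uncased \
    --output_dir finetuned_lm/ \
    --max_seq_len 256 \
    --train_batch_size 16 \
    --fp16
```

Raising `--max_seq_len` towards 512 makes training slower and more memory-hungry, so it is typically paired with a smaller `--train_batch_size`.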