From 29a392fbcfd8bae54bcf732d7a6b508839900458 Mon Sep 17 00:00:00 2001
From: Matthew Carrigan
Date: Wed, 20 Mar 2019 17:35:17 +0000
Subject: [PATCH] Small README changes

---
 examples/lm_finetuning/README.md | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/examples/lm_finetuning/README.md b/examples/lm_finetuning/README.md
index d54007734f..1440a78a3b 100644
--- a/examples/lm_finetuning/README.md
+++ b/examples/lm_finetuning/README.md
@@ -51,9 +51,10 @@ by `pregenerate_training_data.py`. Note that you should use the same bert_model
 Also note that max_seq_len does not need to be specified for the `finetune_on_pregenerated.py` script, as it is
 inferred from the training examples.
 
-There are various options that can be tweaked, but the most important ones are probably `max_seq_len`, which controls
-the length of training examples (in wordpiece tokens) seen by the model, and `--fp16`, which enables fast half-precision
-training on recent GPUs. `max_seq_len` defaults to 128 but can be set as high as 512.
+There are various options that can be tweaked, but they are mostly set to the values from the BERT paper/repo and should
+be left alone. The most relevant ones for the end-user are probably `--max_seq_len`, which controls the length of
+training examples (in wordpiece tokens) seen by the model, and `--fp16`, which enables fast half-precision training on
+recent GPUs. `--max_seq_len` defaults to 128 but can be set as high as 512. Higher values may yield stronger language models at the cost of slower and more memory-intensive training
 
 In addition, if memory usage is an issue, especially when training on a single GPU, reducing `--train_batch_size` from
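
For context, the sketch below shows one way the flags discussed in the hunk above might be combined when running `finetune_on_pregenerated.py`. Only `--fp16`, `--train_batch_size`, and `--max_seq_len` are named in this patch; the remaining arguments (`--pregenerated_data`, `--bert_model`, `--do_lower_case`, `--output_dir`, `--epochs`) are assumed from the surrounding README and may differ, so check the script's argument parser for the authoritative list.

```bash
# Hedged sketch, not part of the patch: fine-tune on pregenerated data with
# half-precision enabled and a reduced batch size for a single-GPU setup.
# --pregenerated_data, --bert_model, --do_lower_case, --output_dir and --epochs
# are assumed from the surrounding README; --fp16 and --train_batch_size are
# the flags discussed in the hunk above. --max_seq_len is deliberately NOT
# passed, since the script infers it from the pregenerated training examples.
python3 finetune_on_pregenerated.py \
  --pregenerated_data training/ \
  --bert_model bert-base-uncased \
  --do_lower_case \
  --output_dir finetuned_lm/ \
  --epochs 3 \
  --train_batch_size 16 \
  --fp16
```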