From 1e00ef681d213938cfafd678b9ec11c786405bbf Mon Sep 17 00:00:00 2001
From: Sam Shleifer
Date: Mon, 27 Jul 2020 18:26:00 -0400
Subject: [PATCH] [s2s] dont document packing because it hurts performance (#6077)

---
 examples/seq2seq/README.md | 12 +-----------
 1 file changed, 1 insertion(+), 11 deletions(-)

diff --git a/examples/seq2seq/README.md b/examples/seq2seq/README.md
index ed24f59394..5029f38361 100644
--- a/examples/seq2seq/README.md
+++ b/examples/seq2seq/README.md
@@ -27,17 +27,7 @@ this should make a directory called `cnn_dm/` with files like `test.source`.
 ```
 
 WMT16 English-Romanian Translation Data:
-
-This dataset comes in two formats. The "packed" version merges short training examples into examples of <200 tokens to increase GPU utilization (and also improves validation performance).
-
-```bash
-cd examples/seq2seq
-wget https://s3.amazonaws.com/datasets.huggingface.co/translation/wmt_en_ro_packed_train_200.tgz
-tar -xzvf wmt_en_ro_packed_200.tgz
-export ENRO_DIR=wmt_en_ro_packed_train_200
-```
-
-The original data can also be downloaded with this command:
+download with this command:
 ```bash
 wget https://s3.amazonaws.com/datasets.huggingface.co/translation/wmt_en_ro.tar.gz
 tar -xzvf wmt_en_ro.tar.gz