diff --git a/examples/distillation/README.md b/examples/distillation/README.md
index 344b5f7d46..7da1ad015b 100644
--- a/examples/distillation/README.md
+++ b/examples/distillation/README.md
@@ -26,12 +26,14 @@ Here are the results on the dev sets of GLUE:
 | :---: | :---: | :---:| :---:| :---:| :---:| :---:| :---:| :---:| :---:| :---: |
 | BERT-base | **77.6** | 48.9 | 84.3 | 88.6 | 89.3 | 89.5 | 71.3 | 91.7 | 91.2 | 43.7 |
 | DistilBERT | **76.8** | 49.1 | 81.8 | 90.2 | 90.2 | 89.2 | 62.9 | 92.7 | 90.7 | 44.4 |
-| :---: | :---: | :---:| :---:| :---:| :---:| :---:| :---:| :---:| :---:| :---: |
-| RoBERTa-base (reported) | **83.2**/**86.4**<sup>2</sup> | 63.6 | 87.6 | 90.2 | 92.8 | 91.9 | 78.7 | 94.8 | 91.2 | 57.7** |
+| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
+| RoBERTa-base (reported) | **83.2**/**86.4**<sup>2</sup> | 63.6 | 87.6 | 90.2 | 92.8 | 91.9 | 78.7 | 94.8 | 91.2 | 57.7<sup>3</sup> |
 | DistilRoBERTa<sup>1</sup> | **79.0**/**82.3**<sup>2</sup> | 59.4 | 83.9 | 86.6 | 90.8 | 89.4 | 67.9 | 92.5 | 88.3 | 52.1 |

-<sup>1</sup> We did not use the MNLI checkpoint for fine-tuning but directy perform transfer learning on the pre-trained DistilRoBERTa.
+<sup>1</sup> We did not use the MNLI checkpoint for fine-tuning but directly performed transfer learning on the pre-trained DistilRoBERTa.
+
 <sup>2</sup> Macro-score computed without WNLI.
+
 <sup>3</sup> We compute this score ourselves for completeness.

 ## Setup
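For reference, the macro-scores in the first column of the table above are unweighted means of the per-task dev scores, and footnote 2 notes that the second RoBERTa figure excludes WNLI. A minimal sketch of that arithmetic, assuming the task columns follow the README's order with WNLI last (the header row sits outside this hunk):

```python
# Reproduce the RoBERTa-base macro-scores from the table above.
# Assumption: scores are listed in the README's column order,
# with WNLI as the last column (consistent with footnotes 2 and 3).
scores = [63.6, 87.6, 90.2, 92.8, 91.9, 78.7, 94.8, 91.2, 57.7]

macro_all = sum(scores) / len(scores)                 # ~83.2, all 9 tasks
macro_no_wnli = sum(scores[:-1]) / (len(scores) - 1)  # ~86.4, WNLI excluded

print(f"{macro_all:.1f} / {macro_no_wnli:.1f}")  # matches 83.2/86.4 up to rounding
```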