Pretrained models

2019-11-26 14:52:42 -05:00 · 668aac45d2
parent 8742baa531
commit 668aac45d2
1 changed files with 33 additions and 0 deletions
--- a/docs/source/pretrained_models.rst
+++ b/docs/source/pretrained_models.rst
@@ -159,5 +159,38 @@ Here is the full list of the currently provided pretrained models together with
 |                   |                                                            | | CamemBERT using the BERT-base architecture                                                                                          |
 |                   |                                                            | (see `details <https://github.com/pytorch/fairseq/tree/master/examples/camembert>`__)                                                 |
 +-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
+| ALBERT            | ``albert-base-v1``                                         | | 12 repeating layers, 128 embedding, 768-hidden, 12-heads, 11M parameters                                                            |
+|                   |                                                            | | ALBERT base model                                                                                                                   |
+|                   |                                                            | (see `details <https://github.com/google-research/google-research/tree/master/albert>`__)                                             |
+|                   +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
+|                   | ``albert-large-v1``                                        | | 24 repeating layers, 128 embedding, 1024-hidden, 16-heads, 17M parameters                                                           |
+|                   |                                                            | | ALBERT large model                                                                                                                  |
+|                   |                                                            | (see `details <https://github.com/google-research/google-research/tree/master/albert>`__)                                             |
+|                   +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
+|                   | ``albert-xlarge-v1``                                       | | 24 repeating layers, 128 embedding, 2048-hidden, 16-heads, 58M parameters                                                           |
+|                   |                                                            | | ALBERT xlarge model                                                                                                                 |
+|                   |                                                            | (see `details <https://github.com/google-research/google-research/tree/master/albert>`__)                                             |
+|                   +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
+|                   | ``albert-xxlarge-v1``                                      | | 12 repeating layers, 128 embedding, 4096-hidden, 64-heads, 223M parameters                                                          |
+|                   |                                                            | | ALBERT xxlarge model                                                                                                                |
+|                   |                                                            | (see `details <https://github.com/google-research/google-research/tree/master/albert>`__)                                             |
+|                   +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
+|                   | ``albert-base-v2``                                         | | 12 repeating layers, 128 embedding, 768-hidden, 12-heads, 11M parameters                                                            |
+|                   |                                                            | | ALBERT base model with no dropout, additional training data and longer training                                                     |
+|                   |                                                            | (see `details <https://github.com/google-research/google-research/tree/master/albert>`__)                                             |
+|                   +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
+|                   | ``albert-large-v2``                                        | | 24 repeating layers, 128 embedding, 1024-hidden, 16-heads, 17M parameters                                                           |
+|                   |                                                            | | ALBERT large model with no dropout, additional training data and longer training                                                    |
+|                   |                                                            | (see `details <https://github.com/google-research/google-research/tree/master/albert>`__)                                             |
+|                   +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
+|                   | ``albert-xlarge-v2``                                       | | 24 repeating layers, 128 embedding, 2048-hidden, 16-heads, 58M parameters                                                           |
+|                   |                                                            | | ALBERT xlarge model with no dropout, additional training data and longer training                                                   |
+|                   |                                                            | (see `details <https://github.com/google-research/google-research/tree/master/albert>`__)                                             |
+|                   +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
+|                   | ``albert-xxlarge-v2``                                      | | 12 repeating layers, 128 embedding, 4096-hidden, 64-heads, 223M parameters                                                          |
+|                   |                                                            | | ALBERT xxlarge model with no dropout, additional training data and longer training                                                  |
+|                   |                                                            | (see `details <https://github.com/google-research/google-research/tree/master/albert>`__)                                             |
++-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
+

 .. <https://huggingface.co/transformers/examples.html>`__