PyTorch implementation of Google AI's BERT

Introduction

This is a PyTorch implementation of the TensorFlow code released by Google AI with the paper BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding.

Converting the TensorFlow pre-trained models to PyTorch

You can convert the pre-trained weights released by Google AI by calling the script convert_tf_checkpoint_to_pytorch.py. It takes a TensorFlow checkpoint (bert_model.ckpt) containing the pre-trained weights and converts it to a .bin file that can be loaded with PyTorch.

TensorFlow pre-trained models can be found in the original TensorFlow code. We give an example with the BERT-Base Uncased model:

export BERT_BASE_DIR=/path/to/bert/uncased_L-12_H-768_A-12
export BERT_PYTORCH_DIR=/path/to/pytorch/bert/uncased_L-12_H-768_A-12

python convert_tf_checkpoint_to_pytorch.py \
  --tf_checkpoint_path=$BERT_BASE_DIR/bert_model.ckpt \
  --bert_config_file=$BERT_BASE_DIR/bert_config.json \
  --pytorch_dump_path=$BERT_PYTORCH_DIR/pytorch_model.bin
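
The resulting file can then be loaded like any PyTorch weights file. A minimal sketch of inspecting it (this assumes the converted file is a standard state dict saved with torch.save, and the parameter slice is just for illustration):

import torch

# Load the converted weights on CPU; the file is assumed to be a standard
# state dict saved with torch.save by the conversion script.
state_dict = torch.load("pytorch_model.bin", map_location="cpu")

# Print a few parameter names and shapes to sanity-check the conversion.
for name, tensor in list(state_dict.items())[:5]:
    print(name, tuple(tensor.shape))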

Fine-tuning with BERT: running the examples

We showcase the same examples as the original implementation: fine-tuning on the MRPC classification corpus and on the SQuAD question answering dataset.

Before running these examples you should download the GLUE data by running this script and unpack it to some directory $GLUE_DIR. Please also download the BERT-Base checkpoint, unzip it to some directory $BERT_BASE_DIR, and convert it to its PyTorch version as explained in the previous section.

This example code fine-tunes BERT-Base on the Microsoft Research Paraphrase Corpus (MRPC) and runs in less than 10 minutes on a single K80 GPU.

export GLUE_DIR=/path/to/glue

python run_classifier_pytorch.py \
  --task_name MRPC \
  --do_train \
  --do_eval \
  --do_lower_case \
  --data_dir $GLUE_DIR/MRPC/ \
  --vocab_file $BERT_BASE_DIR/vocab.txt \
  --bert_config_file $BERT_BASE_DIR/bert_config.json \
  --init_checkpoint $BERT_PYTORCH_DIR/pytorch_model.bin \
  --max_seq_length 128 \
  --train_batch_size 32 \
  --learning_rate 2e-5 \
  --num_train_epochs 3.0 \
  --output_dir /tmp/mrpc_output_pytorch/
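
After training, the evaluation metrics are written to --output_dir. A hedged sketch of reading them back (the filename eval_results.txt is an assumption carried over from the original TensorFlow implementation; check the output directory for the actual file):

import os

output_dir = "/tmp/mrpc_output_pytorch/"
# Assumed filename for the evaluation summary written by the run.
with open(os.path.join(output_dir, "eval_results.txt")) as f:
    print(f.read())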

The next example fine-tunes BERT-Base on the SQuAD question answering task.

The data for SQuAD can be downloaded with the following links and should be saved in a $SQUAD_DIR directory.

export SQUAD_DIR=/path/to/SQUAD

python run_squad_pytorch.py \
  --vocab_file=$BERT_BASE_DIR/vocab.txt \
  --bert_config_file=$BERT_BASE_DIR/bert_config.json \
  --init_checkpoint=$BERT_PYTORCH_DIR/pytorch_model.bin \
  --do_train \
  --train_file=$SQUAD_DIR/train-v1.1.json \
  --do_predict \
  --predict_file=$SQUAD_DIR/dev-v1.1.json \
  --train_batch_size=12 \
  --learning_rate=5e-5 \
  --num_train_epochs=2.0 \
  --max_seq_length=384 \
  --doc_stride=128 \
  --output_dir=../debug_squad/
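
With --do_predict, the predicted answers are written to --output_dir. A hedged sketch of inspecting them (the filename predictions.json is an assumption carried over from the original TensorFlow implementation):

import json

# Assumed filename for the prediction file written by the run.
with open("../debug_squad/predictions.json") as f:
    predictions = json.load(f)  # maps a SQuAD question id to the predicted answer text

for qid, answer in list(predictions.items())[:3]:
    print(qid, "->", answer)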

Comparing TensorFlow and PyTorch models

We also include a small Notebook that we used to verify that the conversion of the weights to PyTorch is consistent with the original TensorFlow weights. Please follow the instructions in the Notebook to run it.
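
The idea behind the check is simple: run both models on the same input, dump their activations, and compare them numerically. A minimal sketch of that comparison (the .npy filenames are hypothetical; use whatever dump format you extracted the activations into):

import numpy as np

# Hypothetical dumps: per-layer activations of each model for the same input.
tf_hidden = np.load("tf_hidden_states.npy")
pt_hidden = np.load("pt_hidden_states.npy")

# The two implementations should agree up to floating-point noise.
max_abs_diff = np.max(np.abs(tf_hidden - pt_hidden))
print("max absolute difference: %.2e" % max_abs_diff)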

Note on pre-training

The original TensorFlow code also releases two scripts for pre-training BERT: create_pretraining_data.py and run_pretraining.py. As the authors note, pre-training BERT is particularly expensive and requires a TPU to run in a reasonable amount of time (see here).

We have decided not to port these scripts for now and to wait for TPU support in PyTorch (see the recent official announcement).

Requirements

The main dependencies of this code are:

  • PyTorch (>= 0.4.0)
  • tqdm
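
A quick way to check that the environment is ready (a sketch; the version floor comes from the list above):

import torch
import tqdm

print("torch:", torch.__version__)  # should be >= 0.4.0
print("tqdm:", tqdm.__version__)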