## PyTorch RoBERTa to ONNX

This notebook documents how to export a PyTorch NLP model to ONNX format and then use it to make predictions with ONNX Runtime.

The model uses the `simpletransformers` library, a Python wrapper around the `transformers` library, which provides PyTorch NLP transformer architectures and weights.

In [1]:
import torch
import numpy as np
from simpletransformers.model import TransformerModel
from transformers import RobertaForSequenceClassification, RobertaTokenizer
import onnx
import onnxruntime

## Step 1: Load pretrained PyTorch model

Download the model weights from https://storage.googleapis.com/seldon-models/pytorch/moviesentiment_roberta/pytorch_model.bin

In [2]:
model = TransformerModel('roberta', 'roberta-base', args=({'fp16': False}))

In [3]:
model.model.load_state_dict(torch.load('pytorch_model.bin'))



## Step 2: Export as ONNX

PyTorch supports exporting to ONNX; you only need to provide a valid example input tensor for the model.

In [4]:
tokenizer = RobertaTokenizer.from_pretrained('roberta-base')
input_ids = torch.tensor(tokenizer.encode("This film is so bad", add_special_tokens=True)).unsqueeze(0) # Batch size 1

In [5]:
input_ids

tensor([[ 0, 713, 822, 16, 98, 1099, 2]])

Export the model to ONNX. We declare the batch dimension and the sequence length as dynamic axes because sentences come in various lengths.

In [6]:
torch.onnx.export(model.model,
                  (input_ids,),
                  "roberta.onnx",
                  input_names=['input'],
                  output_names=['output'],
                  dynamic_axes={'input': {0: 'batch_size',
                                          1: 'sentence_length'},
                                'output': {0: 'batch_size'}})
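
As a rough illustration of what the `dynamic_axes` argument in the export call above implies, the sketch below checks candidate shapes against the shapes seen at export time: axes declared as dynamic may take any size, while every other axis is expected to keep its traced size. The helper `shape_is_compatible` is hypothetical and written only for this illustration; it is not part of PyTorch or ONNX Runtime, which perform their own validation.

```python
def shape_is_compatible(traced_shape, dynamic_axes, candidate_shape):
    """Return True if candidate_shape differs from traced_shape only on dynamic axes."""
    if len(candidate_shape) != len(traced_shape):
        return False  # the number of dimensions is fixed at export time
    return all(
        axis in dynamic_axes or size == traced
        for axis, (traced, size) in enumerate(zip(traced_shape, candidate_shape))
    )

# Same axis declarations as in the export call above.
dynamic_axes = {
    'input': {0: 'batch_size', 1: 'sentence_length'},
    'output': {0: 'batch_size'},
}

# The example input had shape (1, 7): one sentence of seven token ids.
input_ok = shape_is_compatible((1, 7), dynamic_axes['input'], (4, 12))
# The output had shape (1, 2): two logits per sentence, so only the batch axis may vary.
output_ok = shape_is_compatible((1, 2), dynamic_axes['output'], (4, 2))
output_bad = shape_is_compatible((1, 2), dynamic_axes['output'], (4, 3))
print(input_ok, output_ok, output_bad)
```

Without the `dynamic_axes` entry for the input, the exported graph would be tied to the `(1, 7)` shape of the example sentence.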



## Step 3: Test predictions are the same using ONNX runtime

In [7]:
onnx_model = onnx.load("roberta.onnx")

In [8]:
# Validate the exported model. This may crash the IPython kernel if it runs while
# the PyTorch model is still in memory, so the call is left commented out.
# onnx.checker.check_model(onnx_model)

In [9]:
import onnxruntime

ort_session = onnxruntime.InferenceSession("roberta.onnx")

In [10]:
def to_numpy(tensor):
    return tensor.detach().cpu().numpy() if tensor.requires_grad else tensor.cpu().numpy()

In [11]:
input_ids = torch.tensor(tokenizer.encode("This film is so bad", add_special_tokens=True)).unsqueeze(0) # Batch size 1

In [12]:
# compute ONNX Runtime output prediction
ort_inputs = {ort_session.get_inputs()[0].name: to_numpy(input_ids)}
ort_out = ort_session.run(None, ort_inputs)

In [13]:
out = model.model(input_ids)

In [14]:
out, ort_out

((tensor([[ 2.3067, -2.6440]], grad_fn=),),
 [array([[ 2.3066945, -2.6439788]], dtype=float32)])
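
Both outputs are raw logits, one per class. A minimal sketch of turning them into probabilities with a softmax follows, using only the standard library and the values printed above. Which index corresponds to which sentiment label is not stated in this notebook, so the sketch reports only the index of the most likely class.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)  # subtract the maximum to avoid overflow in exp
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.3067, -2.6440]  # values copied from the output above
probs = softmax(logits)
predicted_index = max(range(len(probs)), key=probs.__getitem__)
print(probs, predicted_index)
```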

In [15]:
np.testing.assert_allclose(to_numpy(out[0]), ort_out[0], rtol=1e-03, atol=1e-05)
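
`assert_allclose` passes when `|actual - desired| <= atol + rtol * |desired|` holds for every element. A dependency-free sketch of that check, applied to the values shown above, could look like this (the helper name `within_tolerance` is made up for this illustration):

```python
def within_tolerance(actual, desired, rtol=1e-03, atol=1e-05):
    """Element-wise check using the formula documented for np.testing.assert_allclose."""
    return all(
        abs(a - d) <= atol + rtol * abs(d)
        for a, d in zip(actual, desired)
    )

pytorch_out = [2.3067, -2.6440]      # PyTorch logits as printed above (rounded)
onnx_out = [2.3066945, -2.6439788]   # ONNX Runtime logits as printed above

# Differences in the sixth decimal place are well inside the tolerance.
print(within_tolerance(pytorch_out, onnx_out))
# A gap of roughly 0.09 on the first logit is not.
print(within_tolerance([2.4, -2.6440], onnx_out))
```

With `rtol=1e-03`, the allowed deviation scales with the magnitude of the expected value, which is why small float32 rounding differences between the two runtimes do not trip the assertion.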