Model card for Transformer_de_en_multi30K
Model Description
This project contains my work on building a transformer from scratch for German-to-English translation.
It builds on the pytorch-original-transformer project to understand the inner workings of the transformer and how to build one from scratch.
Alongside the implementation, we refer to the original paper to study transformers.
Model Details
The model and its training run use the following parameters; the architecture parameters follow the notation of the paper.
'dk': key dimensions -> 32,
'dv': value dimensions -> 32,
'h': Number of parallel attention heads -> 8,
'src_vocab_size': source vocabulary size (German) -> 8500,
'target_vocab_size': target vocabulary size (English) -> 6500,
'src_pad_idx': Source pad index -> 2,
'target_pad_idx': Target pad index -> 2,
'num_encoders': Number of encoder modules -> 3,
'num_decoders': Number of decoder modules -> 3,
'dim_multiplier': Dimension multiplier for inner dimensions in pointwise FFN (dff = dk*h*dim_multiplier) -> 4,
'pdropout': Dropout probability in the network -> 0.1,
'lr': learning rate used to train the model -> 0.0003,
'N_EPOCHS': Number of Epochs -> 50,
'CLIP': 1,
'patience': 5
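The hyperparameters above can be collected into a config dictionary of the kind the usage snippet reads from params.json. The sketch below assumes the JSON keys match the names listed above and that the model width is d_model = dk * h, as in the original paper; this card only states the formula for dff, so treat `derived_dims` as an illustrative helper rather than part of the repository.

```python
import json

# Hyperparameters as listed in this model card.
config = {
    "dk": 32,
    "dv": 32,
    "h": 8,
    "src_vocab_size": 8500,
    "target_vocab_size": 6500,
    "src_pad_idx": 2,
    "target_pad_idx": 2,
    "num_encoders": 3,
    "num_decoders": 3,
    "dim_multiplier": 4,
    "pdropout": 0.1,
    "lr": 0.0003,
    "N_EPOCHS": 50,
    "CLIP": 1,
    "patience": 5,
}


def derived_dims(cfg):
    """Return (d_model, d_ff) implied by the config.

    Assumption: d_model = dk * h, following the original paper.
    d_ff = dk * h * dim_multiplier is the formula stated in this card.
    """
    d_model = cfg["dk"] * cfg["h"]
    d_ff = d_model * cfg["dim_multiplier"]
    return d_model, d_ff


# Round-trip through JSON text, the format the usage snippet loads.
restored = json.loads(json.dumps(config, indent=2))
d_model, d_ff = derived_dims(restored)
print(f"d_model={d_model}, d_ff={d_ff}")  # d_model=256, d_ff=1024
```

With 8 heads of 32 dimensions each, the attention layers operate on 256-dimensional vectors and the pointwise feed-forward layer expands them to 1024.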
We train the model with the Adam optimizer and CrossEntropyLoss.
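A single training step with this setup might look like the sketch below. The stand-in model (`TinyModel`), the random batch, the reading of `CLIP` as a maximum gradient norm, and the use of the target pad index as `ignore_index` are all assumptions for illustration; the actual training loop lives in the linked source code.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

TARGET_VOCAB_SIZE = 6500  # from the model card
TARGET_PAD_IDX = 2        # from the model card
LR = 0.0003               # from the model card
CLIP = 1.0                # assumed to be a max gradient norm


class TinyModel(nn.Module):
    """Hypothetical stand-in for the transformer: token ids -> vocab logits."""

    def __init__(self, vocab_size, d_model=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.proj = nn.Linear(d_model, vocab_size)

    def forward(self, tokens):
        return self.proj(self.embed(tokens))


model = TinyModel(TARGET_VOCAB_SIZE)
optimizer = torch.optim.Adam(model.parameters(), lr=LR)
# Assumption: padded target positions are excluded from the loss.
criterion = nn.CrossEntropyLoss(ignore_index=TARGET_PAD_IDX)


def train_step(inputs, targets):
    """Run one optimisation step and return the scalar loss."""
    model.train()
    optimizer.zero_grad()
    logits = model(inputs)  # shape: (batch, seq_len, vocab)
    loss = criterion(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
    loss.backward()
    # Rescale gradients so their total norm does not exceed CLIP.
    torch.nn.utils.clip_grad_norm_(model.parameters(), CLIP)
    optimizer.step()
    return loss.item()


# Random token ids stand in for a real batch.
inputs = torch.randint(3, TARGET_VOCAB_SIZE, (4, 10))
targets = torch.randint(3, TARGET_VOCAB_SIZE, (4, 10))
targets[:, -2:] = TARGET_PAD_IDX  # pretend the last two positions are padding
loss_value = train_step(inputs, targets)
print(f"loss: {loss_value:.3f}")
```

Passing the pad index as `ignore_index` keeps padded positions from contributing to the loss, which matters when sentences in a batch have different lengths.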
We evaluated the model on 1,000 held-out test examples and observed a BLEU score of 30.8.
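The card does not say which BLEU implementation produced this score. For reference, a minimal corpus-level BLEU (up to 4-grams, uniform weights, one reference per sentence, no smoothing) can be sketched as follows. The function name `corpus_bleu` and the toy sentences are illustrative, and scores from this sketch may differ from library implementations because of tokenization and smoothing choices.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU in [0, 1] with one reference per hypothesis."""
    matches = [0] * max_n  # clipped n-gram matches per order
    totals = [0] * max_n   # hypothesis n-gram counts per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp, n)
            ref_counts = ngrams(ref, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum((hyp_counts & ref_counts).values())
            totals[n - 1] += sum(hyp_counts.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0  # without smoothing, any zero precision gives BLEU 0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty punishes hypotheses shorter than the references.
    brevity = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity * math.exp(log_precision)


refs = [
    "a man is riding a bike down the street".split(),
    "two dogs play in the snow".split(),
]
hyps = [
    "a man is riding a bike on the street".split(),
    "two dogs play in the snow".split(),
]
print(f"BLEU: {corpus_bleu(hyps, refs):.3f}")
```

The function returns a value between 0 and 1; multiply by 100 to compare with scores reported on a 0 to 100 scale.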
Usage
Clone the repo, then use the following code snippet to load the transformer model:
# torch packages
import torch
from model.transformer import Transformer
import json

if __name__ == "__main__":
    # The following parameters are for the Multi30K dataset

    # Load config containing model input parameters
    with open('params.json') as json_data:
        config = json.load(json_data)
        print(config)

    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

    # Instantiate model
    model = Transformer(
        config["dk"],
        config["dv"],
        config["h"],
        config["src_vocab_size"],
        config["target_vocab_size"],
        config["num_encoders"],
        config["num_decoders"],
        config["dim_multiplier"],
        config["pdropout"],
        device=device)

    # Load model weights
    model.load_state_dict(torch.load('pytorch_transformer_model.pt',
                                     map_location=device))
    print(model)
Source code
The source code used to train the model is available in this GitHub repository.
Resources
The code in this project is derived from pytorch-original-transformer:
@misc{Gordić2020PyTorchOriginalTransformer,
author = {Gordić, Aleksa},
title = {pytorch-original-transformer},
year = {2020},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/gordicaleksa/pytorch-original-transformer}},
}
and using the following blog