Model card for Transformer_de_en_multi30K

Model Description

This project contains my work on building a transformer from scratch for German-to-English translation.
It builds on the pytorch-original-transformer project to understand the inner workings of the transformer and how to build one from scratch. Alongside the implementation, we refer to the original paper to study transformers.

Model Details

This model takes the following arguments as represented in the paper.

'dk': key dimensions -> 32,
'dv': value dimensions -> 32,
'h': Number of parallel attention heads -> 8,
'src_vocab_size': source vocabulary size (German) -> 8500,
'target_vocab_size': target vocabulary size (English) -> 6500,
'src_pad_idx': Source pad index -> 2,
'target_pad_idx': Target pad index -> 2,
'num_encoders': Number of encoder modules -> 3,
'num_decoders': Number of decoder modules -> 3,
'dim_multiplier': Dimension multiplier for inner dimensions in pointwise FFN (dff = dk*h*dim_multiplier) -> 4,
'pdropout': Dropout probability in the network -> 0.1,
'lr': learning rate used to train the model -> 0.0003,
'N_EPOCHS': Number of Epochs -> 50,
'CLIP': 1,
'patience': 5
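
As a quick check on how these values fit together, the sketch below derives the layer widths they imply. The feed-forward formula comes from the list above. The model dimension (d_model = dk * h) is an assumption based on the original paper, not something this card states explicitly.

```python
# Values taken from the parameter list above.
config = {"dk": 32, "dv": 32, "h": 8, "dim_multiplier": 4}

# Assumption: the model dimension is the width of all heads concatenated
# (d_model = dk * h), as in the original paper.
d_model = config["dk"] * config["h"]

# Inner dimension of the pointwise feed-forward network, using the formula
# given in the list: dff = dk * h * dim_multiplier.
d_ff = config["dk"] * config["h"] * config["dim_multiplier"]

print(f"d_model = {d_model}, d_ff = {d_ff}")  # d_model = 256, d_ff = 1024
```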

We use the Adam optimizer with CrossEntropyLoss to train the model.
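
A single training step with this setup might look roughly like the sketch below. It is not the training code from the repository. The tiny stand-in model is hypothetical, and two details are assumptions: that padding positions are excluded from the loss via `ignore_index`, and that `CLIP` is a maximum gradient norm.

```python
import torch
import torch.nn as nn

# Values taken from the Model Details list.
TARGET_VOCAB_SIZE = 6500
TARGET_PAD_IDX = 2
LR = 0.0003
CLIP = 1

torch.manual_seed(0)

# Hypothetical stand-in for the Transformer: maps token ids to vocabulary logits.
model = nn.Sequential(
    nn.Embedding(TARGET_VOCAB_SIZE, 64),
    nn.Linear(64, TARGET_VOCAB_SIZE),
)

optimizer = torch.optim.Adam(model.parameters(), lr=LR)
# Assumption: padding positions do not contribute to the loss.
criterion = nn.CrossEntropyLoss(ignore_index=TARGET_PAD_IDX)


def train_step(inputs, targets):
    """Run one optimization step and return the batch loss as a float."""
    model.train()
    optimizer.zero_grad()
    logits = model(inputs)  # shape: (batch, seq_len, vocab)
    # Flatten to (batch * seq_len, vocab) and (batch * seq_len,) for the loss.
    loss = criterion(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
    loss.backward()
    # Assumption: CLIP is used as the maximum gradient norm.
    torch.nn.utils.clip_grad_norm_(model.parameters(), CLIP)
    optimizer.step()
    return loss.item()


# Random token ids standing in for a real batch.
inputs = torch.randint(3, TARGET_VOCAB_SIZE, (4, 10))
targets = torch.randint(3, TARGET_VOCAB_SIZE, (4, 10))
targets[:, -2:] = TARGET_PAD_IDX  # pretend the last two positions are padding
loss_value = train_step(inputs, targets)
```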

We evaluated the model on 1,000 held-out test examples and observed a BLEU score of 30.8.
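
The card does not say which BLEU implementation or tokenization produced this figure. For readers unfamiliar with the metric, here is a minimal corpus-level BLEU sketch. It assumes one reference per hypothesis, pre-tokenized input, no smoothing, and a 0-100 scale. The function names are illustrative and are not taken from the repository.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(references, hypotheses, max_n=4):
    """Corpus-level BLEU on a 0-100 scale, one reference per hypothesis."""
    matched = [0] * max_n
    total = [0] * max_n
    ref_len = hyp_len = 0
    for ref, hyp in zip(references, hypotheses):
        ref_len += len(ref)
        hyp_len += len(hyp)
        for n in range(1, max_n + 1):
            hyp_counts = ngram_counts(hyp, n)
            ref_counts = ngram_counts(ref, n)
            # Clipped counts: a hypothesis n-gram is credited at most as
            # often as it occurs in the reference.
            matched[n - 1] += sum((hyp_counts & ref_counts).values())
            total[n - 1] += sum(hyp_counts.values())
    # Without smoothing, any n-gram order with zero matches gives BLEU 0.
    if hyp_len == 0 or min(matched) == 0:
        return 0.0
    # Geometric mean of the n-gram precisions.
    log_precision = sum(math.log(m / t) for m, t in zip(matched, total)) / max_n
    # Brevity penalty for hypotheses shorter than the references.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)


reference = "a man is riding a bike".split()
hypothesis = "a man is riding a horse".split()
print(round(corpus_bleu([reference], [hypothesis]), 1))
```

An exact match scores 100, and any difference in wording or length lowers the score.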

Usage

Clone the repo, then use the following code snippet to load the transformer model:

# torch packages
import torch
from model.transformer import Transformer
import json

if __name__ == "__main__":
    # The following parameters are for the Multi30K dataset
    # Load config containing model input parameters
    with open('params.json') as json_data:
        config = json.load(json_data)
    print(config)

    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    # Instantiate model
    model = Transformer(
                    config["dk"], 
                    config["dv"], 
                    config["h"],
                    config["src_vocab_size"],
                    config["target_vocab_size"],
                    config["num_encoders"],
                    config["num_decoders"],
                    config["dim_multiplier"], 
                    config["pdropout"],
                    device = device)
    # Load model weights
    model.load_state_dict(torch.load('pytorch_transformer_model.pt', 
                                     map_location=device))
    print(model)
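
The loading snippet above reads its arguments from params.json. If that file is missing from your checkout, the sketch below shows one way to recreate it from the values under Model Details. It assumes the file is a flat JSON object whose keys match the names listed there. The helper names `write_params` and `check_params` are hypothetical.

```python
import json
from pathlib import Path

# Values copied from the Model Details list.
config = {
    "dk": 32, "dv": 32, "h": 8,
    "src_vocab_size": 8500, "target_vocab_size": 6500,
    "src_pad_idx": 2, "target_pad_idx": 2,
    "num_encoders": 3, "num_decoders": 3,
    "dim_multiplier": 4, "pdropout": 0.1,
    "lr": 0.0003, "N_EPOCHS": 50, "CLIP": 1, "patience": 5,
}

# Keys the loading snippet passes to the Transformer constructor.
REQUIRED_KEYS = [
    "dk", "dv", "h", "src_vocab_size", "target_vocab_size",
    "num_encoders", "num_decoders", "dim_multiplier", "pdropout",
]


def write_params(path="params.json"):
    """Write the configuration to disk as JSON and return the path."""
    path = Path(path)
    path.write_text(json.dumps(config, indent=2))
    return path


def check_params(path="params.json"):
    """Return the required keys that are missing from the file."""
    loaded = json.loads(Path(path).read_text())
    return [key for key in REQUIRED_KEYS if key not in loaded]
```

Calling `write_params()` followed by `check_params()` should return an empty list before you run the loading snippet.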

Source code

The source code used to train the model is linked in this GitHub repository.

Resources

The code in this project is derived from pytorch-original-transformer

@misc{Gordić2020PyTorchOriginalTransformer,
  author = {Gordić, Aleksa},
  title = {pytorch-original-transformer},
  year = {2020},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/gordicaleksa/pytorch-original-transformer}},
}

and using the following blog