---
license: apache-2.0
datasets:
- wmt/wmt14
language:
- de
- en
pipeline_tag: text2text-generation
---


This is a custom Hugging Face port of the [PyTorch implementation of the original transformer](https://github.com/ubaada/scratch-transformer) model from 2017, introduced in the paper "[Attention Is All You Need](https://proceedings.neurips.cc/paper_files/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf)". This is the 65M-parameter base model, trained for English-to-German translation.

## Usage

```python
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained("ubaada/original-transformer", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("ubaada/original-transformer")

text = 'This is my cat'
inputs = tokenizer(text, return_tensors="pt", add_special_tokens=True, truncation=True, max_length=100)
output = model.generate(**inputs)
print(tokenizer.decode(output[0], skip_special_tokens=True, clean_up_tokenization_spaces=True))
# Output: ' Das ist meine Katze.'
```

Remember to pass `trust_remote_code=True`, since this repository uses a custom modeling file.

## Training

| Parameter            | Value                                                                         |
|----------------------|-------------------------------------------------------------------------------|
| Dataset              | WMT14 de-en                                                                   |
| Translation pairs    | 4.5M (135M tokens total)                                                      |
| Epochs               | 24                                                                            |
| Batch size           | 16                                                                            |
| Accumulation steps   | 8                                                                             |
| Effective batch size | 128 (16 * 8)                                                                  |
| Training script      | [train.py](https://github.com/ubaada/scratch-transformer/blob/main/train.py)  |
| Optimiser            | Adam (learning rate = 0.0001)                                                 |
| Loss type            | Cross entropy                                                                 |
| Final test loss      | 1.87                                                                          |
| GPU                  | RTX 4070 (12 GB)                                                              |
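The effective batch size of 128 comes from accumulating gradients over 8 micro-batches of 16 pairs before each optimiser step. Below is a minimal sketch of that pattern in PyTorch; the data loader, the `model(src, tgt_in)` call signature, and variable names are illustrative assumptions, not the actual [train.py](https://github.com/ubaada/scratch-transformer/blob/main/train.py).

```python
import torch

accumulation_steps = 8  # 16-example micro-batches x 8 = effective batch of 128

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = torch.nn.CrossEntropyLoss()

optimizer.zero_grad()
for step, (src, tgt_in, tgt_out) in enumerate(train_loader):  # hypothetical DataLoader of tokenised pairs
    logits = model(src, tgt_in)                                # assumed signature; shape (batch, seq_len, vocab)
    loss = criterion(logits.reshape(-1, logits.size(-1)), tgt_out.reshape(-1))
    (loss / accumulation_steps).backward()                     # scale so the accumulated gradient is an average
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()                                       # one optimiser update per 128 examples
        optimizer.zero_grad()
```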

## Results