---
license: apache-2.0
datasets:
  - wmt/wmt14
language:
  - de
  - en
pipeline_tag: text2text-generation
---


This is a custom Hugging Face port of a PyTorch implementation of the original Transformer model, introduced in the 2017 paper "Attention Is All You Need". It is the 65M-parameter base model, trained for English-to-German translation.
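The arithmetic below is a rough sanity check on the "65M parameters" figure. It counts the weights of a base-size Transformer with d_model = 512, d_ff = 2048, and 6 encoder plus 6 decoder layers. The 8 attention heads split d_model and do not change the count. The helper functions are hypothetical and not taken from this repository. The sketch also makes several assumptions:

- a shared source/target vocabulary of about 37,000 BPE tokens, the size the paper reports for WMT14 EN-DE;
- one embedding matrix tied across the encoder, the decoder and the output layer;
- post-LN layers with biased linear projections;
- parameter-free sinusoidal positional encodings.

The vocabulary size and weight tying used by this particular checkpoint are not stated here, so its exact total may differ.

```python
# Hypothetical helpers (not from this repo): parameter count of a post-LN
# encoder-decoder Transformer with biased linear layers, tied embeddings,
# and sinusoidal (parameter-free) positional encodings.

def attention_params(d_model: int) -> int:
    # Q, K, V and output projections: each d_model x d_model plus a bias
    return 4 * (d_model * d_model + d_model)


def ffn_params(d_model: int, d_ff: int) -> int:
    # Two linear layers d_model -> d_ff -> d_model, each with a bias
    return (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)


def layer_norm_params(d_model: int) -> int:
    # Gain and bias vectors
    return 2 * d_model


def transformer_params(vocab: int, d_model: int = 512, d_ff: int = 2048, n_layers: int = 6) -> int:
    # Encoder layer: self-attention + feed-forward, each followed by a layer norm
    encoder_layer = attention_params(d_model) + ffn_params(d_model, d_ff) + 2 * layer_norm_params(d_model)
    # Decoder layer adds a cross-attention sub-layer and its layer norm
    decoder_layer = 2 * attention_params(d_model) + ffn_params(d_model, d_ff) + 3 * layer_norm_params(d_model)
    # Assumed: one embedding matrix shared by encoder, decoder and output softmax
    embeddings = vocab * d_model
    return n_layers * (encoder_layer + decoder_layer) + embeddings


total = transformer_params(vocab=37_000)  # assumed vocabulary size
print(f"{total:,} parameters (~{total / 1e6:.1f}M)")  # prints: 63,082,496 parameters (~63.1M)
```

Under these assumptions the embedding matrix alone accounts for about 19M parameters. The total is therefore quite sensitive to the tokenizer's vocabulary size.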

Usage:

```python
from transformers import AutoModel, AutoTokenizer

# trust_remote_code=True lets Transformers load the custom modeling file from the repo
model = AutoModel.from_pretrained("ubaada/original-transformer", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("ubaada/original-transformer")

text = 'This is my cat'
inputs = tokenizer(text, return_tensors="pt", add_special_tokens=True, truncation=True, max_length=100)
output = model.generate(**inputs)
tokenizer.decode(output[0], skip_special_tokens=True, clean_up_tokenization_spaces=True)
# Output: ' Das ist meine Katze.'
```

Note: `trust_remote_code=True` is required because the model is defined in a custom modeling file, not in a built-in Transformers class.

Training:

| Parameter | Value |
|---|---|
| Dataset | WMT14-de-en |
| Translation pairs | 4.5M (135M tokens total) |
| Epochs | 24 |
| Batch size | 16 |
| Gradient accumulation steps | 8 |
| Effective batch size | 128 (16 × 8) |
| Training script | train.py |
| Optimiser | Adam (learning rate = 0.0001) |
| Loss type | Cross-entropy |
| Final test loss | 1.87 |
| GPU | RTX 4070 (12 GB) |
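The effective batch size of 128 comes from gradient accumulation. Gradients from 8 micro-batches of 16 examples are combined before a single optimiser step, so each update reflects 128 examples. The sketch below checks that equivalence on a toy problem. It is not taken from train.py. It uses a hypothetical one-weight linear model with a mean-squared-error loss, rather than the Transformer with cross-entropy, so that gradients can be written by hand in plain Python. Each micro-batch gradient is scaled by 1/8, which matches the common convention of dividing the loss by the number of accumulation steps before calling `backward()` in PyTorch. Whether train.py scales the loss this way is not stated here. The final update uses plain SGD instead of Adam for brevity.

```python
import random

# Batch settings from the training table; the model and data are toy stand-ins.
MICRO_BATCH = 16                             # "Batch size"
ACCUM_STEPS = 8                              # gradient accumulation steps
EFFECTIVE_BATCH = MICRO_BATCH * ACCUM_STEPS  # 128


def mean_grad(w: float, batch: list[tuple[float, float]]) -> float:
    """d/dw of the mean squared error of the toy model y_hat = w * x over one batch."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_grad(w: float, batch: list[tuple[float, float]]) -> float:
    """Sum micro-batch gradients, each scaled by 1 / ACCUM_STEPS."""
    total = 0.0
    for step in range(ACCUM_STEPS):
        micro = batch[step * MICRO_BATCH:(step + 1) * MICRO_BATCH]
        total += mean_grad(w, micro) / ACCUM_STEPS
    return total


rng = random.Random(0)
xs = [rng.uniform(-1.0, 1.0) for _ in range(EFFECTIVE_BATCH)]
data = [(x, 3.0 * x + rng.gauss(0.0, 0.1)) for x in xs]

w = 0.5
full = mean_grad(w, data)          # one batch of 128 examples
accum = accumulated_grad(w, data)  # 8 micro-batches of 16 examples
print(f"full-batch grad: {full:.6f}, accumulated grad: {accum:.6f}")

# One optimiser step per effective batch (plain SGD here, not Adam).
learning_rate = 1e-4
w -= learning_rate * accum
```

Every micro-batch here has the same size, so the two gradients agree up to floating-point rounding. With a per-token mean loss and micro-batches of different token counts, equal 1/8 scaling is only an approximation of the full-batch gradient.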

Results: