license: apache-2.0

base model: https://huggingface.co/microsoft/mdeberta-v3-base

dataset: https://github.com/ramybaly/Article-Bias-Prediction

training parameters:

  • devices: 2xH100
  • batch_size: 100
  • epochs: 5
  • dropout: 0.05
  • max_length: 512
  • learning_rate: 3e-5
  • warmup_steps: 100
  • random_state: 239
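
For reproducibility, the hyperparameters above can be gathered into a single configuration object. The following is a minimal sketch: the `TrainingConfig` class and its field names are hypothetical and only the values come from the list above. The list does not say whether `batch_size` is per device or global, so the sketch stores it exactly as given.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class TrainingConfig:
    """Hypothetical container; only the values are taken from the list above."""

    devices: str = "2xH100"
    batch_size: int = 100  # per-device vs. global is not stated in the source
    epochs: int = 5
    dropout: float = 0.05
    max_length: int = 512
    learning_rate: float = 3e-5
    warmup_steps: int = 100
    random_state: int = 239


config = TrainingConfig()
# Dump the settings as a plain dict, e.g. for logging next to a checkpoint.
print(asdict(config))
```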

training methodology:

  • clean the dataset according to a specific rule set and use the random split provided with the dataset
  • train on the train split and evaluate on the validation split after each epoch
  • evaluate on the test split only with the model that achieved the lowest validation loss
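
The selection procedure above can be sketched as a small loop. This is an illustration under stated assumptions, not the training script used for this model: `train_with_model_selection`, `train_one_epoch`, and `validation_loss` are hypothetical names, and the loss values in the demo are placeholders, not measured results.

```python
from typing import Callable, Tuple


def train_with_model_selection(
    train_one_epoch: Callable[[int], None],
    validation_loss: Callable[[int], float],
    epochs: int,
) -> Tuple[int, float]:
    """Train for `epochs` epochs and return (best_epoch, best_validation_loss)."""
    if epochs < 1:
        raise ValueError("epochs must be at least 1")
    best_epoch, best_loss = 0, float("inf")
    for epoch in range(1, epochs + 1):
        train_one_epoch(epoch)
        loss = validation_loss(epoch)
        if loss < best_loss:
            # A real run would also save a checkpoint at this point.
            best_epoch, best_loss = epoch, loss
    return best_epoch, best_loss


# Demo with stand-in callables; the losses below are placeholders.
placeholder_losses = {1: 0.5, 2: 0.3, 3: 0.4}
result = train_with_model_selection(
    train_one_epoch=lambda epoch: None,
    validation_loss=lambda epoch: placeholder_losses[epoch],
    epochs=3,
)
print(result)
```

Only the checkpoint from the returned epoch would then be evaluated on the test split, so the test data never influences which model is chosen.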

result summary:

  • across the five training epochs, the model from the second epoch achieved the lowest validation loss, 0.2573
  • on the test split, the second-epoch model achieved an F1 score of 0.9184 and a test loss of 0.2904

usage:

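from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline

# load the fine-tuned classifier and its tokenizer from the Hugging Face Hub
model = AutoModelForSequenceClassification.from_pretrained("premsa/political-bias-prediction-allsides-mDeBERTa")
tokenizer = AutoTokenizer.from_pretrained("premsa/political-bias-prediction-allsides-mDeBERTa")
nlp = pipeline("text-classification", model=model, tokenizer=tokenizer)
# German input, in English: "the masses are controlled by the media."
print(nlp("die massen werden von den medien kontrolliert."))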
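
By default the pipeline call above reports only the top label and its score. To inspect the full distribution over classes, the raw logits can be passed through a softmax. The sketch below shows that step in plain Python. The helper names, the label mapping (`left`, `center`, `right`), and the logits are illustrative assumptions; the actual mapping should be read from `model.config.id2label`.

```python
import math
from typing import Dict, List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def rank_labels(
    logits: Sequence[float], id2label: Dict[int, str]
) -> List[Tuple[str, float]]:
    """Return (label, probability) pairs sorted from most to least likely."""
    probs = softmax(logits)
    pairs = [(id2label[i], p) for i, p in enumerate(probs)]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Assumed label mapping and placeholder logits, for illustration only.
example_id2label = {0: "left", 1: "center", 2: "right"}
example_logits = [0.2, 1.5, -0.7]
ranked = rank_labels(example_logits, example_id2label)
for label, prob in ranked:
    print(f"{label}: {prob:.3f}")
```

With the real model, the logits could be obtained with `model(**tokenizer(text, return_tensors="pt", truncation=True, max_length=512)).logits[0].tolist()`, using the same `max_length` as in training.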