# distilroberta-mbfc-bias
This model is a fine-tuned version of distilroberta-base on the Proppy corpus, using political bias ratings from mediabiasfactcheck.com as labels.
It achieves the following results on the evaluation set:
- Loss: 1.4130
- Acc: 0.6348
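A minimal usage sketch with the transformers `pipeline` API, assuming the model is published on the Hugging Face Hub (the repo id below is a placeholder; substitute the model's actual Hub path):

```python
from transformers import pipeline

# Placeholder repo id -- substitute the actual Hub path of this model.
MODEL_ID = "distilroberta-mbfc-bias"

# The text-classification pipeline returns the highest-scoring of the
# seven bias labels listed in the training data section below.
classifier = pipeline("text-classification", model=MODEL_ID)

print(classifier("The senator's proposal drew sharp criticism from both parties."))
# e.g. [{'label': 'leftcenter', 'score': 0.87}]  (illustrative output only)
```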
## Training and evaluation data
The training data comes from the Proppy corpus. Each article is labeled with the political bias of its source publication, as scored by mediabiasfactcheck.com. See *Proppy: Organizing the News Based on Their Propagandistic Content* for details.
To create a more balanced training set, the most common labels are downsampled to at most 2000 articles each (a sketch of this step follows the table). The resulting label distribution in the training data is as follows:
| Label | Count |
|:---|---:|
| extremeright | 689 |
| leastbiased | 2000 |
| left | 783 |
| leftcenter | 2000 |
| right | 1260 |
| rightcenter | 1418 |
| unknown | 2000 |
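The downsampling code is not part of this card; a minimal sketch with pandas, assuming the corpus is loaded as a DataFrame with `text` and `label` columns (both column names are assumptions), could look like this:

```python
import pandas as pd

MAX_PER_LABEL = 2000  # cap applied to the most common labels

def downsample(df: pd.DataFrame, seed: int = 12345) -> pd.DataFrame:
    """Cap every label at MAX_PER_LABEL articles, sampling without replacement."""
    return (
        df.groupby("label", group_keys=False)
          .apply(lambda g: g.sample(n=min(len(g), MAX_PER_LABEL), random_state=seed))
          .reset_index(drop=True)
    )

# df = pd.read_csv("proppy_train.csv")  # assumed file name and columns: text, label
# train_df = downsample(df)
# print(train_df["label"].value_counts())
```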
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training (mapped to a `TrainingArguments` sketch after this list):
- learning_rate: 3e-05
- train_batch_size: 32
- eval_batch_size: 32
- seed: 12345
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 16
- num_epochs: 20
- mixed_precision_training: Native AMP
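These settings correspond roughly to the following transformers `TrainingArguments`; this is a sketch under the assumption that the standard `Trainer` was used (the Adam betas and epsilon above are the `Trainer` defaults), not the actual training script:

```python
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

training_args = TrainingArguments(
    output_dir="distilroberta-mbfc-bias",
    learning_rate=3e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    seed=12345,
    lr_scheduler_type="linear",
    warmup_steps=16,
    num_train_epochs=20,
    fp16=True,                    # Native AMP mixed-precision training
    evaluation_strategy="epoch",  # assumption: per-epoch eval, matching the results table
)

tokenizer = AutoTokenizer.from_pretrained("distilroberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilroberta-base",
    num_labels=7,  # the seven bias labels listed above
)

# Dataset loading is omitted; train_ds and eval_ds are assumed to be
# tokenized datasets built from the downsampled Proppy corpus.
# trainer = Trainer(model=model, args=training_args,
#                   train_dataset=train_ds, eval_dataset=eval_ds)
# trainer.train()
```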
### Training results
| Training Loss | Epoch | Step | Validation Loss | Acc |
|:-------------:|:-----:|:----:|:---------------:|:------:|
| 0.9493 | 1.0 | 514 | 1.2765 | 0.4730 |
| 0.7376 | 2.0 | 1028 | 1.0003 | 0.5812 |
| 0.6702 | 3.0 | 1542 | 1.1294 | 0.5631 |
| 0.6161 | 4.0 | 2056 | 1.0439 | 0.6058 |
| 0.4934 | 5.0 | 2570 | 1.1196 | 0.6028 |
| 0.4558 | 6.0 | 3084 | 1.0993 | 0.5977 |
| 0.4717 | 7.0 | 3598 | 1.0308 | 0.6373 |
| 0.3961 | 8.0 | 4112 | 1.1291 | 0.6234 |
| 0.3829 | 9.0 | 4626 | 1.1554 | 0.6316 |
| 0.3442 | 10.0 | 5140 | 1.1548 | 0.6465 |
| 0.2505 | 11.0 | 5654 | 1.3605 | 0.6169 |
| 0.2105 | 12.0 | 6168 | 1.3310 | 0.6297 |
| 0.262 | 13.0 | 6682 | 1.2706 | 0.6383 |
| 0.2031 | 14.0 | 7196 | 1.3658 | 0.6378 |
| 0.2021 | 15.0 | 7710 | 1.4130 | 0.6348 |
### Framework versions
- Transformers 4.11.2
- Pytorch 1.7.1
- Datasets 1.11.0
- Tokenizers 0.10.3