# llama2-7bb-tweet-summarization-gradnorm-0.3-warmupratio-0.05
This model is a fine-tuned version of NousResearch/Llama-2-7b-hf on the dialogstudio dataset. It achieves the following results on the evaluation set:
- Loss: 2.8360
- Rouge Scores: {'rouge1': 93.92075452911438, 'rouge2': 78.28015883656892, 'rougeL': 64.88738306318788, 'rougeLsum': 93.91572652306441}
- Bleu Scores: [0.9489359800839542, 0.9362845242017266, 0.908851614503138, 0.877219164400539]
- Gen Len: 463.0182
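
Since PEFT appears in the framework versions below, this checkpoint is most likely a PEFT (LoRA) adapter rather than full model weights, so inference requires loading the base model and attaching the adapter. The snippet below is a minimal sketch, not documented usage from the author: the prompt template is an assumption, since the dialogstudio preprocessing used during fine-tuning is not described here.

```python
# Minimal inference sketch: load the Llama-2 base model, attach the PEFT
# adapter, and generate a summary. The prompt format below is an assumption.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "NousResearch/Llama-2-7b-hf"
adapter_id = "DrishtiSharma/llama2-7bb-tweet-summarization-gradnorm-0.3-warmupratio-0.05"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)  # attach the adapter weights
model.eval()

prompt = "Summarize the following conversation:\n\n<dialogue here>\n\nSummary:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```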
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a `TrainingArguments` sketch mapping them follows the list):
- learning_rate: 0.0001
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 7
- mixed_precision_training: Native AMP
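
As a rough guide, these settings map onto Hugging Face `TrainingArguments` as sketched below. This is a hypothetical reconstruction, not the author's training script: `max_grad_norm=0.3` is inferred from the `gradnorm-0.3` suffix in the model name, and the gradient-accumulation, logging, and LoRA/PEFT configuration are omitted because they are not documented here.

```python
# Hypothetical reconstruction of the training configuration from the
# hyperparameters listed above; not the author's actual script.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama2-7bb-tweet-summarization",
    learning_rate=1e-4,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    seed=42,
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
    num_train_epochs=7,
    fp16=True,          # "Native AMP" mixed-precision training
    max_grad_norm=0.3,  # assumption: inferred from the model name
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```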
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rouge Scores | Bleu Scores | Gen Len |
|---|---|---|---|---|---|---|
| 1.9797 | 1.0 | 220 | 1.8468 | {'rouge1': 92.58753211785933, 'rouge2': 78.56005630365634, 'rougeL': 67.90431046147724, 'rougeLsum': 92.60062258669173} | [0.9044331868078556, 0.8935689766031644, 0.8701723917264629, 0.8432837507770929] | 463.0182 |
| 1.6875 | 2.0 | 440 | 1.8317 | {'rouge1': 93.57745806777827, 'rouge2': 79.20734399292829, 'rougeL': 68.03949913123978, 'rougeLsum': 93.56573169703795} | [0.9260232753232301, 0.9151369981183058, 0.8909296719649512, 0.8630015430201563] | 463.0182 |
| 1.3609 | 3.0 | 660 | 1.9440 | {'rouge1': 93.64149561116312, 'rouge2': 78.9369863604149, 'rougeL': 67.28929677118091, 'rougeLsum': 93.6354094969574} | [0.933089623937576, 0.9213474707086045, 0.8961256783117583, 0.8671119660431741] | 463.0182 |
| 0.9973 | 4.0 | 880 | 2.1479 | {'rouge1': 93.77098210043772, 'rouge2': 78.72676191106424, 'rougeL': 66.61685782420736, 'rougeLsum': 93.77132525696588} | [0.9407524092990022, 0.9287706231907287, 0.9027163186452807, 0.8726978389893866] | 463.0182 |
| 0.6828 | 5.0 | 1100 | 2.3624 | {'rouge1': 93.76850681087201, 'rouge2': 78.54959646542315, 'rougeL': 65.96739684743356, 'rougeLsum': 93.76918986163282} | [0.9447432833130077, 0.9323421216849288, 0.9057018192399795, 0.8750402029132044] | 463.0182 |
| 0.4662 | 6.0 | 1320 | 2.6675 | {'rouge1': 93.85846920408349, 'rouge2': 78.2490547871314, 'rougeL': 65.29853641567857, 'rougeLsum': 93.85718380561036} | [0.9469562557636547, 0.9345062357610694, 0.9072329702581828, 0.8757492563086158] | 463.0182 |
| 0.3594 | 7.0 | 1540 | 2.8360 | {'rouge1': 93.92075452911438, 'rouge2': 78.28015883656892, 'rougeL': 64.88738306318788, 'rougeLsum': 93.91572652306441} | [0.9489359800839542, 0.9362845242017266, 0.908851614503138, 0.877219164400539] | 463.0182 |

Note that validation loss bottoms out at epoch 2 (1.8317) and rises steadily afterwards while training loss keeps falling, a typical overfitting pattern; if validation loss is the selection criterion, the epoch-2 checkpoint may be preferable to the final one reported above.
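
For reference, metrics like these are typically produced with the `evaluate` library. The sketch below is an assumption about how the numbers could be computed, not the author's evaluation code: the ROUGE values appear to be scaled to 0–100, and the four BLEU numbers resemble the per-n-gram precisions (1- through 4-gram) returned by the BLEU metric.

```python
# Hypothetical metric computation with the `evaluate` library; the exact
# evaluation script for this model is not documented.
import evaluate

rouge = evaluate.load("rouge")
bleu = evaluate.load("bleu")

predictions = ["model-generated summary ..."]  # placeholder outputs
references = ["reference summary ..."]         # placeholder targets

rouge_scores = rouge.compute(predictions=predictions, references=references)
rouge_scores = {k: v * 100 for k, v in rouge_scores.items()}  # 0-100 scale, as above

bleu_result = bleu.compute(predictions=predictions, references=references)
print(rouge_scores)
print(bleu_result["precisions"])  # four n-gram precisions, as in the table
```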
### Framework versions
- PEFT 0.8.2.dev0
- Transformers 4.38.0.dev0
- Pytorch 2.1.0+cu121
- Datasets 2.16.2.dev0
- Tokenizers 0.15.1