# llama2-7bb-tweet-summarization-gradnorm-0.3-warmupratio-0.05
This model is a fine-tuned version of NousResearch/Llama-2-7b-hf on the dialogstudio dataset. It achieves the following results on the evaluation set:
- Loss: 2.8360
- Rouge Scores: {'rouge1': 93.92075452911438, 'rouge2': 78.28015883656892, 'rougeL': 64.88738306318788, 'rougeLsum': 93.91572652306441}
- Bleu Scores: [0.9489359800839542, 0.9362845242017266, 0.908851614503138, 0.877219164400539]
- Gen Len: 463.0182
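
Since PEFT appears in the framework versions below, this checkpoint is most likely a PEFT (LoRA) adapter rather than full model weights, so inference requires loading the base model and attaching the adapter. The snippet below is a minimal sketch, not documented usage from the author: the prompt template is an assumption, since the dialogstudio preprocessing used during fine-tuning is not described here.

```python
# Minimal inference sketch: load the Llama-2 base model, attach the PEFT
# adapter, and generate a summary. The prompt format below is an assumption.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "NousResearch/Llama-2-7b-hf"
adapter_id = "DrishtiSharma/llama2-7bb-tweet-summarization-gradnorm-0.3-warmupratio-0.05"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)  # attach the adapter weights
model.eval()

prompt = "Summarize the following conversation:\n\n<dialogue here>\n\nSummary:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```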
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a `TrainingArguments` sketch mapping them follows the list):
- learning_rate: 0.0001
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 7
- mixed_precision_training: Native AMP
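
As a rough guide, these settings map onto Hugging Face `TrainingArguments` as sketched below. This is a hypothetical reconstruction, not the author's training script: `max_grad_norm=0.3` is inferred from the `gradnorm-0.3` suffix in the model name, and the gradient-accumulation, logging, and LoRA/PEFT configuration are omitted because they are not documented here.

```python
# Hypothetical reconstruction of the training configuration from the
# hyperparameters listed above; not the author's actual script.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama2-7bb-tweet-summarization",
    learning_rate=1e-4,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    seed=42,
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
    num_train_epochs=7,
    fp16=True,          # "Native AMP" mixed-precision training
    max_grad_norm=0.3,  # assumption: inferred from the model name
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```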
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rouge Scores | Bleu Scores | Gen Len |
|---|---|---|---|---|---|---|
| 1.9797 | 1.0 | 220 | 1.8468 | {'rouge1': 92.58753211785933, 'rouge2': 78.56005630365634, 'rougeL': 67.90431046147724, 'rougeLsum': 92.60062258669173} | [0.9044331868078556, 0.8935689766031644, 0.8701723917264629, 0.8432837507770929] | 463.0182 |
| 1.6875 | 2.0 | 440 | 1.8317 | {'rouge1': 93.57745806777827, 'rouge2': 79.20734399292829, 'rougeL': 68.03949913123978, 'rougeLsum': 93.56573169703795} | [0.9260232753232301, 0.9151369981183058, 0.8909296719649512, 0.8630015430201563] | 463.0182 |
| 1.3609 | 3.0 | 660 | 1.9440 | {'rouge1': 93.64149561116312, 'rouge2': 78.9369863604149, 'rougeL': 67.28929677118091, 'rougeLsum': 93.6354094969574} | [0.933089623937576, 0.9213474707086045, 0.8961256783117583, 0.8671119660431741] | 463.0182 |
| 0.9973 | 4.0 | 880 | 2.1479 | {'rouge1': 93.77098210043772, 'rouge2': 78.72676191106424, 'rougeL': 66.61685782420736, 'rougeLsum': 93.77132525696588} | [0.9407524092990022, 0.9287706231907287, 0.9027163186452807, 0.8726978389893866] | 463.0182 |
| 0.6828 | 5.0 | 1100 | 2.3624 | {'rouge1': 93.76850681087201, 'rouge2': 78.54959646542315, 'rougeL': 65.96739684743356, 'rougeLsum': 93.76918986163282} | [0.9447432833130077, 0.9323421216849288, 0.9057018192399795, 0.8750402029132044] | 463.0182 |
| 0.4662 | 6.0 | 1320 | 2.6675 | {'rouge1': 93.85846920408349, 'rouge2': 78.2490547871314, 'rougeL': 65.29853641567857, 'rougeLsum': 93.85718380561036} | [0.9469562557636547, 0.9345062357610694, 0.9072329702581828, 0.8757492563086158] | 463.0182 |
| 0.3594 | 7.0 | 1540 | 2.8360 | {'rouge1': 93.92075452911438, 'rouge2': 78.28015883656892, 'rougeL': 64.88738306318788, 'rougeLsum': 93.91572652306441} | [0.9489359800839542, 0.9362845242017266, 0.908851614503138, 0.877219164400539] | 463.0182 |

Note that validation loss bottoms out at epoch 2 (1.8317) and rises steadily afterwards while training loss keeps falling, a typical overfitting pattern; if validation loss is the selection criterion, the epoch-2 checkpoint may be preferable to the final one reported above.
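
For reference, metrics like these are typically produced with the `evaluate` library. The sketch below is an assumption about how the numbers could be computed, not the author's evaluation code: the ROUGE values appear to be scaled to 0–100, and the four BLEU numbers resemble the per-n-gram precisions (1- through 4-gram) returned by the BLEU metric.

```python
# Hypothetical metric computation with the `evaluate` library; the exact
# evaluation script for this model is not documented.
import evaluate

rouge = evaluate.load("rouge")
bleu = evaluate.load("bleu")

predictions = ["model-generated summary ..."]  # placeholder outputs
references = ["reference summary ..."]         # placeholder targets

rouge_scores = rouge.compute(predictions=predictions, references=references)
rouge_scores = {k: v * 100 for k, v in rouge_scores.items()}  # 0-100 scale, as above

bleu_result = bleu.compute(predictions=predictions, references=references)
print(rouge_scores)
print(bleu_result["precisions"])  # four n-gram precisions, as in the table
```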
### Framework versions
- PEFT 0.8.2.dev0
- Transformers 4.38.0.dev0
- Pytorch 2.1.0+cu121
- Datasets 2.16.2.dev0
- Tokenizers 0.15.1