---
license: apache-2.0
base_model: google/flan-t5-small
tags:
  - generated_from_trainer
metrics:
  - rouge
model-index:
  - name: flan-t5-small-query-expansion-merged-lr-2e-4-ep-30
    results: []
---

# flan-t5-small-query-expansion-merged-lr-2e-4-ep-30

This model is a fine-tuned version of [google/flan-t5-small](https://huggingface.co/google/flan-t5-small) on an unspecified dataset. It achieves the following results on the evaluation set:

- Loss: 0.0687
- Rouge1: 88.0058
- Rouge2: 86.0177
- RougeL: 87.4622
- RougeLsum: 87.8743
- Gen Len: 18.3001
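The ROUGE scores above are n-gram overlap F-measures between the generated and reference query expansions (the reported numbers come from the `rouge_score` package, which additionally applies Porter stemming). A minimal sketch of ROUGE-1 F1 with plain whitespace tokenization and no stemming:

```python
from collections import Counter

def rouge1_f1(prediction: str, reference: str) -> float:
    """ROUGE-1 F1: unigram overlap between prediction and reference."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    # Clipped overlap: each reference unigram can be matched at most once.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

score = rouge1_f1("best budget laptops 2024", "best cheap laptops 2024")
print(round(score, 2))  # 3 of 4 unigrams overlap -> 0.75
```

ROUGE-2 is the same computation over bigrams, and ROUGE-L/ROUGE-Lsum score the longest common subsequence instead of fixed n-grams.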

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 0.0002
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 30
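With `warmup_ratio: 0.05` over 30 epochs × 3,377 steps/epoch = 101,310 optimizer steps, the learning rate ramps linearly up to 2e-4 over roughly the first 5,065 steps and then decays along a cosine curve. A minimal sketch of that schedule shape (it mirrors transformers' `get_cosine_schedule_with_warmup`, though the library's exact rounding may differ):

```python
import math

def cosine_lr(step: int, total_steps: int = 101_310,
              peak_lr: float = 2e-4, warmup_ratio: float = 0.05) -> float:
    """Linear warmup to peak_lr, then cosine decay toward 0."""
    warmup_steps = int(total_steps * warmup_ratio)  # 5,065 steps here
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

print(cosine_lr(5_065))    # end of warmup: peak of 2e-4
print(cosine_lr(101_310))  # end of training: decayed to ~0
```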

### Training results

| Training Loss | Epoch | Step   | Validation Loss | Rouge1  | Rouge2  | RougeL  | RougeLsum | Gen Len |
|:-------------:|:-----:|:------:|:---------------:|:-------:|:-------:|:-------:|:---------:|:-------:|
| 0.5903        | 1.0   | 3377   | 0.6694          | 64.0967 | 45.4218 | 57.4907 | 61.0826   | 18.3315 |
| 0.6513        | 2.0   | 6754   | 0.5487          | 66.0187 | 48.1106 | 59.9204 | 63.1815   | 18.2596 |
| 0.8114        | 3.0   | 10131  | 0.4678          | 68.4505 | 52.0787 | 63.0805 | 66.1556   | 18.2296 |
| 0.4854        | 4.0   | 13508  | 0.3981          | 69.9674 | 54.7741 | 64.9602 | 67.9319   | 18.2352 |
| 0.5574        | 5.0   | 16885  | 0.3512          | 71.8602 | 57.3778 | 67.222  | 69.9169   | 18.1424 |
| 0.5343        | 6.0   | 20262  | 0.3047          | 72.9426 | 59.4383 | 68.6726 | 71.1382   | 18.0677 |
| 0.5003        | 7.0   | 23639  | 0.2670          | 74.6434 | 62.3906 | 71.017  | 73.1537   | 18.2826 |
| 0.4381        | 8.0   | 27016  | 0.2366          | 75.5879 | 63.5581 | 71.9563 | 74.0976   | 18.2247 |
| 0.4298        | 9.0   | 30393  | 0.2065          | 77.1535 | 66.3128 | 74.04   | 75.8557   | 18.1933 |
| 0.3524        | 10.0  | 33770  | 0.1877          | 78.2066 | 68.6445 | 75.2292 | 77.1067   | 18.2107 |
| 0.3374        | 11.0  | 37147  | 0.1650          | 79.2401 | 70.0953 | 76.392  | 78.1788   | 18.1982 |
| 0.2578        | 12.0  | 40524  | 0.1424          | 80.3072 | 72.4561 | 78.1583 | 79.5657   | 18.2659 |
| 0.2468        | 13.0  | 43901  | 0.1280          | 81.8419 | 75.2485 | 80.0524 | 81.2131   | 18.2401 |
| 0.2079        | 14.0  | 47278  | 0.1147          | 82.7505 | 76.7955 | 81.1375 | 82.1929   | 18.2212 |
| 0.1632        | 15.0  | 50655  | 0.1023          | 83.7303 | 78.3819 | 82.1019 | 83.1492   | 18.2826 |
| 0.1669        | 16.0  | 54032  | 0.0945          | 84.5118 | 79.8875 | 83.2797 | 84.1232   | 18.2561 |
| 0.1974        | 17.0  | 57409  | 0.0886          | 85.5067 | 81.4914 | 84.3091 | 85.178    | 18.2840 |
| 0.1461        | 18.0  | 60786  | 0.0829          | 85.9375 | 82.3743 | 85.025  | 85.6625   | 18.2805 |
| 0.1262        | 19.0  | 64163  | 0.0797          | 86.3679 | 83.1603 | 85.507  | 86.0875   | 18.2722 |
| 0.0982        | 20.0  | 67540  | 0.0759          | 87.215  | 84.5141 | 86.4955 | 86.9934   | 18.2770 |
| 0.087         | 21.0  | 70917  | 0.0726          | 87.2046 | 84.548  | 86.4369 | 86.9678   | 18.2924 |
| 0.0914        | 22.0  | 74294  | 0.0715          | 87.7024 | 85.3997 | 86.9993 | 87.4716   | 18.2882 |
| 0.0945        | 23.0  | 77671  | 0.0703          | 87.8468 | 85.7513 | 87.2094 | 87.6558   | 18.2896 |
| 0.0586        | 24.0  | 81048  | 0.0698          | 87.883  | 85.8184 | 87.3243 | 87.692    | 18.2882 |
| 0.062         | 25.0  | 84425  | 0.0689          | 87.9345 | 85.9142 | 87.3693 | 87.7799   | 18.2875 |
| 0.0758        | 26.0  | 87802  | 0.0687          | 87.9042 | 85.8727 | 87.3166 | 87.7249   | 18.2903 |
| 0.0771        | 27.0  | 91179  | 0.0686          | 87.989  | 86.0401 | 87.4768 | 87.8379   | 18.2882 |
| 0.0744        | 28.0  | 94556  | 0.0687          | 88.0227 | 86.0604 | 87.4917 | 87.8992   | 18.3001 |
| 0.0419        | 29.0  | 97933  | 0.0687          | 88.0058 | 86.0177 | 87.4622 | 87.8743   | 18.3001 |
| 0.0615        | 30.0  | 101310 | 0.0687          | 88.0058 | 86.0177 | 87.4622 | 87.8743   | 18.3001 |
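Validation loss bottoms out at 0.0686 at epoch 27 and stays flat at 0.0687 afterwards, so the run has effectively converged well before epoch 30. A quick sketch of best-checkpoint selection on validation loss (only the last five epochs of the table are included, for brevity):

```python
# (epoch, validation_loss) pairs from the final rows of the table above
history = [(26, 0.0687), (27, 0.0686), (28, 0.0687), (29, 0.0687), (30, 0.0687)]

# Pick the checkpoint with the lowest validation loss.
best_epoch, best_loss = min(history, key=lambda pair: pair[1])
print(best_epoch, best_loss)  # epoch 27, loss 0.0686
```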

### Framework versions

- Transformers 4.38.2
- PyTorch 2.2.0+cu121
- Datasets 2.18.0
- Tokenizers 0.15.2