distily_bench_obj_cross_v2.12b_gpt2

This student model was distilled from the teacher model gpt2. The training dataset is unspecified.

The Distily library was used for this distillation.

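It achieves the following results on the evaluation set. These values match the step 9000 row of the Eval-Phase Metrics table, not the final step (59400), where the loss is 1.2536: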

  • eval_enwikippl: 2720.0
  • eval_frwikippl: 32256.0
  • eval_zhwikippl: 296960.0
  • eval_tinystoriesppl: 1392.0
  • eval_loss: 2.8924
  • eval_runtime: 12.4707
  • eval_samples_per_second: 48.113
  • eval_steps_per_second: 12.028
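The throughput figures above can be cross-checked with a little arithmetic. The sketch below assumes the usual convention that each rate is a total count divided by eval_runtime; the variable names are illustrative and not part of Distily.

```python
# Values copied from the evaluation results listed above.
eval_runtime = 12.4707          # seconds
samples_per_second = 48.113
steps_per_second = 12.028

# Assumption: rate = total / runtime, so rate * runtime recovers the totals.
n_samples = round(samples_per_second * eval_runtime)
n_steps = round(steps_per_second * eval_runtime)

# Samples per step should equal the evaluation batch size.
implied_batch_size = n_samples / n_steps

print(n_samples, n_steps, implied_batch_size)
```

Under that assumption the implied batch size comes out at 4, which agrees with the eval_batch_size reported in the training hyperparameters.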

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • distillation_objective: DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl, layer_mapper=None, projector=None), hs_loss_component=LossComponent(label=hs, weight=0, loss_fn=None, layer_mapper=None, projector=None), attn_loss_component=LossComponent(label=attn, weight=0, loss_fn=None, layer_mapper=None, projector=None))
  • train_embeddings: True
  • learning_rate: 1e-05
  • train_batch_size: 1
  • eval_batch_size: 4
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.5
  • num_epochs: 1.0
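With the hs and attn components weighted 0, only the logits component (loss_fn=kl, weight=1) contributes to the objective. The following is a minimal pure-Python sketch of a KL divergence between teacher and student next-token distributions for a single position. It is not Distily's implementation: the function names, the KL(teacher || student) direction, and the toy logits are assumptions made for illustration.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum_exp for x in logits]

def kl_logits_loss(teacher_logits, student_logits):
    """KL(teacher || student) for one token position (direction is an assumption)."""
    log_p = log_softmax(teacher_logits)
    log_q = log_softmax(student_logits)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))

# Toy logits over a 3-token vocabulary (hypothetical values).
teacher_logits = [2.0, 0.5, -1.0]
student_logits = [1.0, 1.0, 0.0]

loss_same = kl_logits_loss(teacher_logits, teacher_logits)  # identical distributions
loss_diff = kl_logits_loss(teacher_logits, student_logits)  # mismatched distributions

# Weighted sum mirroring the listed component weights (logits=1, hs=0, attn=0).
total_loss = 1 * loss_diff + 0 * 0.0 + 0 * 0.0
print(loss_same, loss_diff, total_loss)
```

The loss is zero when the student reproduces the teacher's distribution and positive otherwise, which is the signal the student is trained to reduce.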

Resource Usage

Peak GPU Memory: 7.9381 GB

Eval-Phase Metrics

| step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | tinystoriesppl | zhwikippl |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| teacher eval | | 43.75 | 61.75 | | | | | 11.8125 | 19.125 |
| 0 | 0 | 1821066133504.0 | 158329674399744.0 | 19.3254 | 12.5492 | 47.812 | 11.953 | 12079595520.0 | 98956046499840.0 |
| 1500 | 0.0253 | 46439333888.0 | 5153960755200.0 | 13.9821 | 12.5442 | 47.831 | 11.958 | 285212672.0 | 10445360463872.0 |
| 3000 | 0.0505 | 2179072.0 | 66060288.0 | 7.7394 | 12.573 | 47.721 | 11.93 | 158720.0 | 209715200.0 |
| 4500 | 0.0758 | 95744.0 | 2523136.0 | 5.2142 | 12.6045 | 47.602 | 11.901 | 17920.0 | 6029312.0 |
| 6000 | 0.1010 | 10816.0 | 158720.0 | 4.0370 | 12.5895 | 47.659 | 11.915 | 5760.0 | 671744.0 |
| 7500 | 0.1263 | 4448.0 | 55040.0 | 3.3192 | 12.5498 | 47.809 | 11.952 | 2720.0 | 296960.0 |
| 9000 | 0.1515 | 2720.0 | 32256.0 | 2.8924 | 12.4707 | 48.113 | 12.028 | 1392.0 | 296960.0 |
| 10500 | 0.1768 | 1960.0 | 20608.0 | 2.6753 | 12.5367 | 47.859 | 11.965 | 992.0 | 278528.0 |
| 12000 | 0.2020 | 864.0 | 4896.0 | 2.2104 | 12.4794 | 48.079 | 12.02 | 544.0 | 85504.0 |
| 13500 | 0.2273 | 564.0 | 4672.0 | 1.9660 | 12.4591 | 48.158 | 12.039 | 382.0 | 2304.0 |
| 15000 | 0.2525 | 452.0 | 2816.0 | 1.8089 | 12.5819 | 47.687 | 11.922 | 316.0 | 788.0 |
| 16500 | 0.2778 | 398.0 | 2160.0 | 1.7757 | 12.581 | 47.691 | 11.923 | 304.0 | 548.0 |
| 18000 | 0.3030 | 374.0 | 1944.0 | 1.6982 | 12.5631 | 47.759 | 11.94 | 296.0 | 478.0 |
| 19500 | 0.3283 | 358.0 | 1488.0 | 1.6521 | 12.6042 | 47.603 | 11.901 | 274.0 | 444.0 |
| 21000 | 0.3535 | 352.0 | 1544.0 | 1.6516 | 12.5472 | 47.819 | 11.955 | 268.0 | 466.0 |
| 22500 | 0.3788 | 336.0 | 1464.0 | 1.6172 | 12.5526 | 47.799 | 11.95 | 266.0 | 386.0 |
| 24000 | 0.4040 | 326.0 | 1280.0 | 1.5683 | 12.5056 | 47.979 | 11.995 | 242.0 | 248.0 |
| 25500 | 0.4293 | 298.0 | 1216.0 | 1.5292 | 12.5815 | 47.689 | 11.922 | 244.0 | 255.0 |
| 27000 | 0.4545 | 290.0 | 1072.0 | 1.4859 | 12.5923 | 47.648 | 11.912 | 236.0 | 236.0 |
| 28500 | 0.4798 | 276.0 | 1144.0 | 1.4542 | 12.5108 | 47.959 | 11.99 | 228.0 | 244.0 |
| 30000 | 0.5051 | 276.0 | 1200.0 | 1.4598 | 12.5421 | 47.839 | 11.96 | 204.0 | 258.0 |
| 31500 | 0.5303 | 270.0 | 1112.0 | 1.4433 | 12.5006 | 47.998 | 11.999 | 212.0 | 205.0 |
| 33000 | 0.5556 | 272.0 | 1040.0 | 1.4221 | 12.5626 | 47.761 | 11.94 | 209.0 | 236.0 |
| 34500 | 0.5808 | 252.0 | 1176.0 | 1.4007 | 12.5775 | 47.704 | 11.926 | 202.0 | 222.0 |
| 36000 | 0.6061 | 248.0 | 976.0 | 1.3998 | 12.5397 | 47.848 | 11.962 | 207.0 | 266.0 |
| 37500 | 0.6313 | 226.0 | 836.0 | 1.3400 | 12.6024 | 47.61 | 11.902 | 183.0 | 260.0 |
| 39000 | 0.6566 | 213.0 | 852.0 | 1.2991 | 12.6581 | 47.4 | 11.85 | 172.0 | 182.0 |
| 40500 | 0.6818 | 208.0 | 932.0 | 1.2862 | 12.5163 | 47.937 | 11.984 | 170.0 | 163.0 |
| 42000 | 0.7071 | 206.0 | 788.0 | 1.2804 | 12.6037 | 47.605 | 11.901 | 172.0 | 159.0 |
| 43500 | 0.7323 | 204.0 | 824.0 | 1.2747 | 12.5859 | 47.672 | 11.918 | 165.0 | 163.0 |
| 45000 | 0.7576 | 201.0 | 848.0 | 1.2704 | 12.722 | 47.162 | 11.791 | 165.0 | 153.0 |
| 46500 | 0.7828 | 203.0 | 760.0 | 1.2726 | 12.5879 | 47.665 | 11.916 | 169.0 | 156.0 |
| 48000 | 0.8081 | 205.0 | 820.0 | 1.2693 | 12.5698 | 47.734 | 11.933 | 170.0 | 165.0 |
| 49500 | 0.8333 | 199.0 | 792.0 | 1.2608 | 12.5756 | 47.712 | 11.928 | 166.0 | 165.0 |
| 51000 | 0.8586 | 198.0 | 768.0 | 1.2563 | 12.5984 | 47.625 | 11.906 | 167.0 | 160.0 |
| 52500 | 0.8838 | 197.0 | 788.0 | 1.2558 | 12.5705 | 47.731 | 11.933 | 164.0 | 159.0 |
| 54000 | 0.9091 | 197.0 | 776.0 | 1.2553 | 12.6019 | 47.612 | 11.903 | 166.0 | 166.0 |
| 55500 | 0.9343 | 197.0 | 784.0 | 1.2540 | 12.6329 | 47.495 | 11.874 | 165.0 | 163.0 |
| 57000 | 0.9596 | 197.0 | 776.0 | 1.2534 | 12.5525 | 47.799 | 11.95 | 165.0 | 161.0 |
| 58500 | 0.9848 | 196.0 | 780.0 | 1.2539 | 12.5854 | 47.674 | 11.919 | 165.0 | 161.0 |
| 59400 | 1.0 | 196.0 | 780.0 | 1.2536 | 12.5194 | 47.925 | 11.981 | 165.0 | 161.0 |
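To put the final row in context, the student's perplexities can be compared with the teacher eval row. This is a small sketch that assumes the four teacher values correspond, in header order, to the four perplexity columns (enwikippl, frwikippl, tinystoriesppl, zhwikippl); the dictionary names are illustrative.

```python
# Perplexities copied from the table: the teacher eval row and the final step (59400).
# Assumption: the four teacher values map to the perplexity columns in header order.
teacher = {"enwikippl": 43.75, "frwikippl": 61.75,
           "tinystoriesppl": 11.8125, "zhwikippl": 19.125}
student = {"enwikippl": 196.0, "frwikippl": 780.0,
           "tinystoriesppl": 165.0, "zhwikippl": 161.0}

# Ratio > 1 means the student is worse (higher perplexity) than the teacher.
ratios = {name: student[name] / teacher[name] for name in teacher}

for name, ratio in ratios.items():
    print(f"{name}: student perplexity is {ratio:.2f}x the teacher's")
```

Under that column assumption, the student remains several times above the teacher's perplexity on every evaluation set after one epoch.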

Framework versions

  • Distily 0.2.0
  • Transformers 4.44.0
  • Pytorch 2.3.0
  • Datasets 2.21.0

Model size: 124M params (Safetensors)
Tensor type: BF16