
This is a distillation experiment using SmolLM2-1.7B as the teacher and SmolLM2-360M as the student model.
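The card does not spell out the training recipe, so the snippet below is only a minimal sketch of a standard logit-distillation objective: a temperature-scaled KL term between the teacher's and student's token distributions, blended with the usual next-token cross-entropy. The `temperature`, `alpha`, and the `distillation_loss` helper are illustrative assumptions, not the actual setup. This works directly here because both SmolLM2 models share the same tokenizer and vocabulary, so their logits align token for token.

```python
# Minimal sketch of a standard logit-distillation objective.
# Assumed recipe: temperature, alpha, and the helper below are illustrative.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

teacher = AutoModelForCausalLM.from_pretrained("HuggingFaceTB/SmolLM2-1.7B").eval()
student = AutoModelForCausalLM.from_pretrained("HuggingFaceTB/SmolLM2-360M")
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM2-360M")

def distillation_loss(batch, temperature=2.0, alpha=0.5):
    """Blend temperature-scaled KL(teacher || student) with next-token CE."""
    with torch.no_grad():  # the teacher only provides soft targets
        t_logits = teacher(**batch).logits
    out = student(**batch, labels=batch["input_ids"])  # out.loss is the CE term
    kl = F.kl_div(
        F.log_softmax(out.logits / temperature, dim=-1),
        F.softmax(t_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature**2  # T^2 keeps soft-target gradients on the CE scale
    return alpha * kl + (1.0 - alpha) * out.loss

batch = tokenizer(["Distillation transfers the teacher's logits."], return_tensors="pt")
loss = distillation_loss(batch)
loss.backward()
```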

It slightly improves on the base model's performance on the following tasks (WIP):

| Task | HuggingFaceTB/SmolLM2-360M | aloobun/d-SmolLM2-360M |
|---|---|---|
| leaderboard_bbh_causal_judgement | 0.4545 | 0.4652 |
| leaderboard_bbh_geometric_shapes | 0.1680 | 0.2040 |
| leaderboard_bbh_movie_recommendation | 0.2120 | 0.2440 |
| leaderboard_bbh_penguins_in_a_table | 0.2055 | 0.2123 |
| leaderboard_bbh_reasoning_about_colored_objects | 0.1160 | 0.1320 |
| leaderboard_bbh_ruin_names | 0.2360 | 0.2480 |
| leaderboard_bbh_salient_translation_error_detection | 0.1480 | 0.2120 |
| leaderboard_bbh_snarks | 0.5169 | 0.5281 |
| leaderboard_bbh_temporal_sequences | 0.2720 | 0.2800 |
| leaderboard_musr_murder_mysteries | 0.5040 | 0.5160 |

Well, it didn't work as well as I'd hoped; I'll try again.

## Eval Results: aloobun/d-SmolLM2-360M (WIP)
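The tables below use lm-evaluation-harness leaderboard task groups, so the numbers should be reproducible with something along these lines via the harness's Python API (the model args and batch size are illustrative assumptions):

```python
import lm_eval

# Run the leaderboard task groups reported below against this checkpoint.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=aloobun/d-SmolLM2-360M,dtype=bfloat16",
    tasks=[
        "leaderboard_gpqa", "leaderboard_musr", "leaderboard_bbh",
        "leaderboard_mmlu_pro", "leaderboard_ifeval", "leaderboard_math_hard",
    ],
    batch_size=8,
)
for task, metrics in results["results"].items():
    print(task, metrics)
```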

### GPQA

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| leaderboard_gpqa | N/A | | | | | |
| - leaderboard_gpqa_diamond | 1 | none | 0 | acc_norm | 0.2071 | ± 0.0289 |
| - leaderboard_gpqa_extended | 1 | none | 0 | acc_norm | 0.2308 | ± 0.0180 |
| - leaderboard_gpqa_main | 1 | none | 0 | acc_norm | 0.2679 | ± 0.0209 |

### MUSR

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| leaderboard_musr | N/A | | | | | |
| - leaderboard_musr_murder_mysteries | 1 | none | 0 | acc_norm | 0.5160 | ± 0.0317 |
| - leaderboard_musr_object_placements | 1 | none | 0 | acc_norm | 0.2383 | ± 0.0267 |
| - leaderboard_musr_team_allocation | 1 | none | 0 | acc_norm | 0.4400 | ± 0.0315 |

### BBH

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| leaderboard_bbh | N/A | | | | | |
| - leaderboard_bbh_boolean_expressions | 1 | none | 3 | acc_norm | 0.5480 | ± 0.0315 |
| - leaderboard_bbh_causal_judgement | 1 | none | 3 | acc_norm | 0.4652 | ± 0.0366 |
| - leaderboard_bbh_date_understanding | 1 | none | 3 | acc_norm | 0.1560 | ± 0.0230 |
| - leaderboard_bbh_disambiguation_qa | 1 | none | 3 | acc_norm | 0.3120 | ± 0.0294 |
| - leaderboard_bbh_formal_fallacies | 1 | none | 3 | acc_norm | 0.5240 | ± 0.0316 |
| - leaderboard_bbh_geometric_shapes | 1 | none | 3 | acc_norm | 0.2040 | ± 0.0255 |
| - leaderboard_bbh_hyperbaton | 1 | none | 3 | acc_norm | 0.5000 | ± 0.0317 |
| - leaderboard_bbh_logical_deduction_five_objects | 1 | none | 3 | acc_norm | 0.2240 | ± 0.0264 |
| - leaderboard_bbh_logical_deduction_seven_objects | 1 | none | 3 | acc_norm | 0.1440 | ± 0.0222 |
| - leaderboard_bbh_logical_deduction_three_objects | 1 | none | 3 | acc_norm | 0.3320 | ± 0.0298 |
| - leaderboard_bbh_movie_recommendation | 1 | none | 3 | acc_norm | 0.2440 | ± 0.0272 |
| - leaderboard_bbh_navigate | 1 | none | 3 | acc_norm | 0.5800 | ± 0.0313 |
| - leaderboard_bbh_object_counting | 1 | none | 3 | acc_norm | 0.2080 | ± 0.0257 |
| - leaderboard_bbh_penguins_in_a_table | 1 | none | 3 | acc_norm | 0.2123 | ± 0.0340 |
| - leaderboard_bbh_reasoning_about_colored_objects | 1 | none | 3 | acc_norm | 0.1320 | ± 0.0215 |
| - leaderboard_bbh_ruin_names | 1 | none | 3 | acc_norm | 0.2480 | ± 0.0274 |
| - leaderboard_bbh_salient_translation_error_detection | 1 | none | 3 | acc_norm | 0.2120 | ± 0.0259 |
| - leaderboard_bbh_snarks | 1 | none | 3 | acc_norm | 0.5281 | ± 0.0375 |
| - leaderboard_bbh_sports_understanding | 1 | none | 3 | acc_norm | 0.4600 | ± 0.0316 |
| - leaderboard_bbh_temporal_sequences | 1 | none | 3 | acc_norm | 0.2800 | ± 0.0285 |
| - leaderboard_bbh_tracking_shuffled_objects_five_objects | 1 | none | 3 | acc_norm | 0.1720 | ± 0.0239 |
| - leaderboard_bbh_tracking_shuffled_objects_seven_objects | 1 | none | 3 | acc_norm | 0.1440 | ± 0.0222 |
| - leaderboard_bbh_tracking_shuffled_objects_three_objects | 1 | none | 3 | acc_norm | 0.3000 | ± 0.0290 |
| - leaderboard_bbh_web_of_lies | 1 | none | 3 | acc_norm | 0.5480 | ± 0.0315 |

### MMLU_PRO

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| leaderboard_mmlu_pro | 0.1 | none | 5 | acc | 0.1173 | ± 0.0029 |

### IFEVAL

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| leaderboard_ifeval | 3 | none | 0 | inst_level_loose_acc | 0.2866 | N/A |
| | | none | 0 | inst_level_strict_acc | 0.2770 | N/A |
| | | none | 0 | prompt_level_loose_acc | 0.1497 | ± 0.0154 |
| | | none | 0 | prompt_level_strict_acc | 0.1423 | ± 0.0150 |

### MATH HARD

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| leaderboard_math_hard | N/A | | | | | |
| - leaderboard_math_algebra_hard | 2 | none | 4 | exact_match | 0.0033 | ± 0.0033 |
| - leaderboard_math_counting_and_prob_hard | 2 | none | 4 | exact_match | 0.0081 | ± 0.0081 |
| - leaderboard_math_geometry_hard | 2 | none | 4 | exact_match | 0.0000 | ± 0.0000 |
| - leaderboard_math_intermediate_algebra_hard | 2 | none | 4 | exact_match | 0.0000 | ± 0.0000 |
| - leaderboard_math_num_theory_hard | 2 | none | 4 | exact_match | 0.0065 | ± 0.0065 |
| - leaderboard_math_prealgebra_hard | 2 | none | 4 | exact_match | 0.0104 | ± 0.0073 |
| - leaderboard_math_precalculus_hard | 2 | none | 4 | exact_match | 0.0000 | ± 0.0000 |
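
For quick local testing, the checkpoint loads with the standard transformers API; a minimal sketch (the prompt and generation settings are illustrative):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "aloobun/d-SmolLM2-360M"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```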