Tags: Safetensors, Russian, mistral
Commit 47fd3a6 by IlyaGusev
1 parent: 08b2776

Update README.md

  1. README.md +5 -28
README.md CHANGED
@@ -87,35 +87,12 @@ v1:
  - SFT model config: [saiga_nemo_12b_sft_m9.json](https://github.com/IlyaGusev/saiga/blob/main/configs/models/saiga_nemo_12b_sft_m9.json)
  - SimPO dataset config: [pref_d31.json](https://github.com/IlyaGusev/saiga/blob/main/configs/datasets/pref_d31.json)
  - SimPO model config: [saiga_nemo_12b_simpo_m19.json](https://github.com/IlyaGusev/saiga/blob/main/configs/models/saiga_nemo_12b_simpo_m19.json)
- - SFT wandb: [link](https://wandb.ai/ilyagusev/rulm_self_instruct/runs/2ympfu9y)
- - SimPO wandb: [link](https://wandb.ai/ilyagusev/rulm_self_instruct/runs/9zn4825e)
+ - SFT wandb: [link](https://wandb.ai/ilyagusev/rulm_self_instruct/runs/e74ozfzh)
+ - SimPO wandb: [link](https://wandb.ai/ilyagusev/rulm_self_instruct/runs/b094iiej)
 
 
  ## Evaluation
 
- * Dataset: https://github.com/IlyaGusev/rulm/blob/master/self_instruct/data/tasks.jsonl
- * Framework: https://github.com/tatsu-lab/alpaca_eval
- * Evaluator: alpaca_eval_cot_gpt4_turbo_fn
-
- Pivot: chatgpt_3_5_turbo
- | model | length_controlled_winrate | win_rate | standard_error | avg_length |
- |-----|-----|-----|-----|-----|
- |chatgpt_4_turbo | 76.04 | 90.00 |1.46 | 1270 |
- |chatgpt_3_5_turbo | 50.00 | 50.00 | 0.00 | 536 |
- |saiga_llama3_8b, v6 | 49.33 | 68.31 | 2.26 | 1262 |
- |sfr-iter-dpo | 49.11 | 74.94 | 2.13 | 1215 |
- |suzume | 49.05 | 71.57 | 2.20 | 1325 |
- |saiga_llama3_8b, v7| 48.95 | 69.40 | 2.25 | 1266 |
- |saiga_llama3_8b, v5 | 47.13 | 66.18 | 2.31 | 1194 |
- |saiga_llama3_8b, v4 | 43.64 | 65.90 | 2.31 | 1200 |
- |saiga_llama3_8b, v3 | 36.97 | 61.08 | 2.38 | 1162 |
- |saiga_llama3_8b, v2 | 33.07 | 48.19 | 2.45 | 1166 |
- |saiga_mistral_7b | 23.38 | 35.99 | 2.34 | 949 |
-
- Pivot: sfr
- | model | length_controlled_winrate | win_rate | standard_error | avg_length |
- |-----|-----|-----|-----|-----|
- | sfr | 50.00 | 50.00 | 0.00 | 1215 |
- | saiga_llama3_8b, v7 | 48.95 | 49.16 | 2.46 | 1266 |
- | saiga_llama3_8b, v6 | 46.91 | 47.23 | 2.45 | 1262 |
- | suzume_8b | 43.69 | 48.19 | 2.46 | 1325 |
+ RuArenaHard:
+
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/5fc2346dea82dd667bb0ffbc/-uG--3Wu9oUi9_bC_ZFP4.png)