Update README.md
README.md
CHANGED
@@ -87,35 +87,12 @@ v1:
 - SFT model config: [saiga_nemo_12b_sft_m9.json](https://github.com/IlyaGusev/saiga/blob/main/configs/models/saiga_nemo_12b_sft_m9.json)
 - SimPO dataset config: [pref_d31.json](https://github.com/IlyaGusev/saiga/blob/main/configs/datasets/pref_d31.json)
 - SimPO model config: [saiga_nemo_12b_simpo_m19.json](https://github.com/IlyaGusev/saiga/blob/main/configs/models/saiga_nemo_12b_simpo_m19.json)
-- SFT wandb: [link](https://wandb.ai/ilyagusev/rulm_self_instruct/runs/
-- SimPO wandb: [link](https://wandb.ai/ilyagusev/rulm_self_instruct/runs/
+- SFT wandb: [link](https://wandb.ai/ilyagusev/rulm_self_instruct/runs/e74ozfzh)
+- SimPO wandb: [link](https://wandb.ai/ilyagusev/rulm_self_instruct/runs/b094iiej)
 
 
 ## Evaluation
 
-
-
-
-
-Pivot: chatgpt_3_5_turbo
-| model | length_controlled_winrate | win_rate | standard_error | avg_length |
-|-----|-----|-----|-----|-----|
-|chatgpt_4_turbo | 76.04 | 90.00 |1.46 | 1270 |
-|chatgpt_3_5_turbo | 50.00 | 50.00 | 0.00 | 536 |
-|saiga_llama3_8b, v6 | 49.33 | 68.31 | 2.26 | 1262 |
-|sfr-iter-dpo | 49.11 | 74.94 | 2.13 | 1215 |
-|suzume | 49.05 | 71.57 | 2.20 | 1325 |
-|saiga_llama3_8b, v7| 48.95 | 69.40 | 2.25 | 1266 |
-|saiga_llama3_8b, v5 | 47.13 | 66.18 | 2.31 | 1194 |
-|saiga_llama3_8b, v4 | 43.64 | 65.90 | 2.31 | 1200 |
-|saiga_llama3_8b, v3 | 36.97 | 61.08 | 2.38 | 1162 |
-|saiga_llama3_8b, v2 | 33.07 | 48.19 | 2.45 | 1166 |
-|saiga_mistral_7b | 23.38 | 35.99 | 2.34 | 949 |
-
-Pivot: sfr
-| model | length_controlled_winrate | win_rate | standard_error | avg_length |
-|-----|-----|-----|-----|-----|
-| sfr | 50.00 | 50.00 | 0.00 | 1215 |
-| saiga_llama3_8b, v7 | 48.95 | 49.16 | 2.46 | 1266 |
-| saiga_llama3_8b, v6 | 46.91 | 47.23 | 2.45 | 1262 |
-| suzume_8b | 43.69 | 48.19 | 2.46 | 1325 |
+RuArenaHard:
+
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/5fc2346dea82dd667bb0ffbc/-uG--3Wu9oUi9_bC_ZFP4.png)