Update README.md
README.md
CHANGED
@@ -26,7 +26,7 @@ Only LoRA adapter for base model can be found here (https://huggingface.co/radm/
 ## Uses
 
 <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
-
+Use the repository (https://github.com/radaevm/arena-hard-local) to evaluate with a local judge model.
 
 ## Training Details
 
@@ -53,7 +53,31 @@ Datasets:
 
 ### Results
 
-
+
+#### Llama-3-70B-Instruct-GPTQ as judge:
+```console
+Llama-3-Instruct-8B-SimPO | score: 78.3 | 95% CI: (-1.5, 1.2) | average #tokens: 545
+SELM-Llama-3-8B-Instruct-iter-3 | score: 72.8 | 95% CI: (-2.1, 1.4) | average #tokens: 606
+Meta-Llama-3-8B-Instruct-f16 | score: 65.3 | 95% CI: (-1.8, 2.1) | average #tokens: 560
+suzume-llama-3-8B-multilingual-orpo-borda-half | score: 63.5 | 95% CI: (-1.6, 2.1) | average #tokens: 978
+Phi-3-medium-128k-instruct | score: 50.0 | 95% CI: (0.0, 0.0) | average #tokens: 801
+suzume-llama-3-8B-multilingual | score: 48.1 | 95% CI: (-2.2, 1.8) | average #tokens: 767
+aya-23-8B | score: 48.0 | 95% CI: (-2.0, 2.1) | average #tokens: 834
+Vikhr-7B-instruct_0.5 | score: 19.6 | 95% CI: (-1.3, 1.5) | average #tokens: 794
+alpindale_gemma-2b-it | score: 11.2 | 95% CI: (-1.0, 0.8) | average #tokens: 425
+```
+#### Llama-3-70B-Instruct-AH-AWQ as judge:
+```console
+Llama-3-Instruct-8B-SimPO | score: 83.8 | 95% CI: (-1.4, 1.3) | average #tokens: 545
+SELM-Llama-3-8B-Instruct-iter-3 | score: 78.8 | 95% CI: (-1.7, 1.9) | average #tokens: 606
+suzume-llama-3-8B-multilingual-orpo-borda-half | score: 71.8 | 95% CI: (-1.7, 2.4) | average #tokens: 978
+Meta-Llama-3-8B-Instruct-f16 | score: 69.8 | 95% CI: (-1.9, 1.7) | average #tokens: 560
+suzume-llama-3-8B-multilingual | score: 54.0 | 95% CI: (-2.1, 2.1) | average #tokens: 767
+aya-23-8B | score: 50.4 | 95% CI: (-1.7, 1.7) | average #tokens: 834
+Phi-3-medium-128k-instruct | score: 50.0 | 95% CI: (0.0, 0.0) | average #tokens: 801
+Vikhr-7B-instruct_0.5 | score: 14.2 | 95% CI: (-1.3, 1.0) | average #tokens: 794
+alpindale_gemma-2b-it | score: 7.9 | 95% CI: (-0.9, 0.8) | average #tokens: 425
+```
 
 ## Hardware
 
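
The two leaderboards added in this change use the same row format (`model | score: X | 95% CI: (lo, hi) | average #tokens: N`), so the effect of swapping the judge can be checked programmatically. Below is a minimal sketch, assuming every row follows that format exactly. The helper names `parse_row`, `ranks`, and `rank_shift` are illustrative and are not part of arena-hard-local. Only three rows per judge are copied in to keep the example short.

```python
import re

# One leaderboard row, as printed in the console blocks of this diff.
ROW = re.compile(
    r"^(?P<model>\S+)\s*\|\s*score:\s*(?P<score>[\d.]+)\s*"
    r"\|\s*95% CI:\s*\((?P<lo>-?[\d.]+),\s*(?P<hi>-?[\d.]+)\)\s*"
    r"\|\s*average #tokens:\s*(?P<tokens>\d+)\s*$"
)


def parse_row(line):
    """Parse one row into a dict; raise ValueError on a malformed line."""
    m = ROW.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognized leaderboard row: {line!r}")
    return {
        "model": m["model"],
        "score": float(m["score"]),
        "ci": (float(m["lo"]), float(m["hi"])),
        "tokens": int(m["tokens"]),
    }


def ranks(lines):
    """Map model name to its 1-based rank by descending score."""
    rows = sorted((parse_row(l) for l in lines), key=lambda r: -r["score"])
    return {r["model"]: i for i, r in enumerate(rows, start=1)}


def rank_shift(lines_a, lines_b):
    """Rank under judge A minus rank under judge B, for models in both.

    A positive value means the model places higher under judge B.
    """
    a, b = ranks(lines_a), ranks(lines_b)
    return {m: a[m] - b[m] for m in a if m in b}


# Three rows copied from each block above (GPTQ judge, then AH-AWQ judge).
GPTQ = [
    "Meta-Llama-3-8B-Instruct-f16 | score: 65.3 | 95% CI: (-1.8, 2.1) | average #tokens: 560",
    "suzume-llama-3-8B-multilingual-orpo-borda-half | score: 63.5 | 95% CI: (-1.6, 2.1) | average #tokens: 978",
    "Phi-3-medium-128k-instruct | score: 50.0 | 95% CI: (0.0, 0.0) | average #tokens: 801",
]
AWQ = [
    "suzume-llama-3-8B-multilingual-orpo-borda-half | score: 71.8 | 95% CI: (-1.7, 2.4) | average #tokens: 978",
    "Meta-Llama-3-8B-Instruct-f16 | score: 69.8 | 95% CI: (-1.9, 1.7) | average #tokens: 560",
    "Phi-3-medium-128k-instruct | score: 50.0 | 95% CI: (0.0, 0.0) | average #tokens: 801",
]

if __name__ == "__main__":
    for model, shift in rank_shift(GPTQ, AWQ).items():
        print(f"{model}: {shift:+d}")
```

Within this three-row subset, the two Llama-3-8B variants swap places between judges while the Phi-3 baseline stays fixed at 50.0. Their score gaps (1.8 and 2.0 points) are about the size of the reported confidence intervals, so the swap should be read with caution.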