radm committed on
Commit
bbaf548
1 Parent(s): 91e1e02

Update README.md

Files changed (1)
  1. README.md +26 -2
README.md CHANGED
@@ -26,7 +26,7 @@ Only LoRA adapter for base model can be found here (https://huggingface.co/radm/
  ## Uses
 
  <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
- [More Information Needed]
 
  ## Training Details
 
@@ -53,7 +53,31 @@ Datasets:
 
  ### Results
 
- [More Information Needed]
 
  ## Hardware
 
 
  ## Uses
 
  <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+ Use the repository (https://github.com/radaevm/arena-hard-local) to evaluate with a local judge model.
 
  ## Training Details
 
 
53
 
54
  ### Results
55
 
56
+
57
+ #### Llama-3-70B-Instruct-GPTQ as judge:
58
+ ```console
59
+ Llama-3-Instruct-8B-SimPO | score: 78.3 | 95% CI: (-1.5, 1.2) | average #tokens: 545
60
+ SELM-Llama-3-8B-Instruct-iter-3 | score: 72.8 | 95% CI: (-2.1, 1.4) | average #tokens: 606
61
+ Meta-Llama-3-8B-Instruct-f16 | score: 65.3 | 95% CI: (-1.8, 2.1) | average #tokens: 560
62
+ suzume-llama-3-8B-multilingual-orpo-borda-half | score: 63.5 | 95% CI: (-1.6, 2.1) | average #tokens: 978
63
+ Phi-3-medium-128k-instruct | score: 50.0 | 95% CI: (0.0, 0.0) | average #tokens: 801
64
+ suzume-llama-3-8B-multilingual | score: 48.1 | 95% CI: (-2.2, 1.8) | average #tokens: 767
65
+ aya-23-8B | score: 48.0 | 95% CI: (-2.0, 2.1) | average #tokens: 834
66
+ Vikhr-7B-instruct_0.5 | score: 19.6 | 95% CI: (-1.3, 1.5) | average #tokens: 794
67
+ alpindale_gemma-2b-it | score: 11.2 | 95% CI: (-1.0, 0.8) | average #tokens: 425
68
+ ```
69
+ #### Llama-3-70B-Instruct-AH-AWQ as judge:
70
+ ```console
71
+ Llama-3-Instruct-8B-SimPO | score: 83.8 | 95% CI: (-1.4, 1.3) | average #tokens: 545
72
+ SELM-Llama-3-8B-Instruct-iter-3 | score: 78.8 | 95% CI: (-1.7, 1.9) | average #tokens: 606
73
+ suzume-llama-3-8B-multilingual-orpo-borda-half | score: 71.8 | 95% CI: (-1.7, 2.4) | average #tokens: 978
74
+ Meta-Llama-3-8B-Instruct-f16 | score: 69.8 | 95% CI: (-1.9, 1.7) | average #tokens: 560
75
+ suzume-llama-3-8B-multilingual | score: 54.0 | 95% CI: (-2.1, 2.1) | average #tokens: 767
76
+ aya-23-8B | score: 50.4 | 95% CI: (-1.7, 1.7) | average #tokens: 834
77
+ Phi-3-medium-128k-instruct | score: 50.0 | 95% CI: (0.0, 0.0) | average #tokens: 801
78
+ Vikhr-7B-instruct_0.5 | score: 14.2 | 95% CI: (-1.3, 1.0) | average #tokens: 794
79
+ alpindale_gemma-2b-it | score: 7.9 | 95% CI: (-0.9, 0.8) | average #tokens: 425
80
+ ```
81
 
82
  ## Hardware
83