TobDeBer committed dc91234 (1 parent: 88dec36)

Update README.md

Files changed (1): README.md (+134 -3)
README.md CHANGED
@@ -1,3 +1,134 @@
- ---
- license: apache-2.0
- ---
+ ---
+ pipeline_tag: text-generation
+ inference: false
+ license: apache-2.0
+ model-index:
+ - name: ibm/PowerMoE-3b
+   results:
+   - task:
+       type: text-generation
+     dataset:
+       type: lm-eval-harness
+       name: ARC
+     metrics:
+     - name: accuracy-norm
+       type: accuracy-norm
+       value: 58.1
+       verified: false
+   - task:
+       type: text-generation
+     dataset:
+       type: lm-eval-harness
+       name: BoolQ
+     metrics:
+     - name: accuracy
+       type: accuracy
+       value: 65
+       verified: false
+   - task:
+       type: text-generation
+     dataset:
+       type: lm-eval-harness
+       name: Hellaswag
+     metrics:
+     - name: accuracy-norm
+       type: accuracy-norm
+       value: 71.5
+       verified: false
+   - task:
+       type: text-generation
+     dataset:
+       type: lm-eval-harness
+       name: OpenBookQA
+     metrics:
+     - name: accuracy-norm
+       type: accuracy-norm
+       value: 41
+       verified: false
+   - task:
+       type: text-generation
+     dataset:
+       type: lm-eval-harness
+       name: PIQA
+     metrics:
+     - name: accuracy-norm
+       type: accuracy-norm
+       value: 79.1
+       verified: false
+   - task:
+       type: text-generation
+     dataset:
+       type: lm-eval-harness
+       name: Winogrande
+     metrics:
+     - name: accuracy-norm
+       type: accuracy-norm
+       value: 65
+       verified: false
+   - task:
+       type: text-generation
+     dataset:
+       type: lm-eval-harness
+       name: MMLU (5 shot)
+     metrics:
+     - name: accuracy
+       type: accuracy
+       value: 42.8
+       verified: false
+   - task:
+       type: text-generation
+     dataset:
+       type: lm-eval-harness
+       name: GSM8k (5 shot)
+     metrics:
+     - name: accuracy
+       type: accuracy
+       value: 25.9
+       verified: false
+   - task:
+       type: text-generation
+     dataset:
+       type: lm-eval-harness
+       name: math (4 shot)
+     metrics:
+     - name: accuracy
+       type: accuracy
+       value: 14.8
+       verified: false
+   - task:
+       type: text-generation
+     dataset:
+       type: bigcode-eval
+       name: humaneval
+     metrics:
+     - name: pass@1
+       type: pass@1
+       value: 20.1
+       verified: false
+   - task:
+       type: text-generation
+     dataset:
+       type: bigcode-eval
+       name: MBPP
+     metrics:
+     - name: pass@1
+       type: pass@1
+       value: 32.4
+       verified: false
+ base_model:
+ - ibm/PowerMoE-3b
+ ---
+
+ ## Model Summary
+ PowerMoE-3B is a 3B-parameter sparse Mixture-of-Experts (sMoE) language model trained with the Power learning rate scheduler. It sparsely activates 800M parameters for each token and is trained on a mix of open-source and proprietary datasets. PowerMoE-3B shows promising results compared to dense models with twice the active parameters across various benchmarks, including natural language multiple-choice tasks, code generation, and math reasoning.
+ Paper: https://arxiv.org/abs/2408.13359
+
+ This is a GGUF-quantized version of the model.
+
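+ For example, the GGUF file can be fetched with the huggingface_hub CLI (the repository id below is a placeholder; substitute this repo's actual id):
+
+ ```shell
+ # Download only the quantized model file from the Hub
+ # <repo-id> is a placeholder for this repository's id
+ huggingface-cli download <repo-id> PowerMoE4x800M_q3km.gguf --local-dir .
+ ```
+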
+ ## Usage
+ Running this model requires a recent build of llama.cpp.
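+
+ llama.cpp can be built from source roughly as follows (a minimal sketch assuming git and a CMake toolchain are installed; see the llama.cpp repository for the authoritative build instructions):
+
+ ```shell
+ # Clone and build llama.cpp (CPU-only build; see the repo docs for GPU backends)
+ git clone https://github.com/ggerganov/llama.cpp
+ cd llama.cpp
+ cmake -B build
+ cmake --build build --config Release
+ # The llama-cli and llama-server binaries end up in build/bin/
+ ```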
+
+ ### Generation
+ A simple example of running the PowerMoE GGUF with the llama.cpp CLI:
+
+ ```shell
+ ./llama-cli -m PowerMoE4x800M_q3km.gguf -p "How about a snack?"
+ ```
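+
+ As a further sketch, the same GGUF can also be served over HTTP with llama.cpp's built-in server and queried from another shell (the port, prompt, and token count below are illustrative):
+
+ ```shell
+ # Start the llama.cpp HTTP server on port 8080
+ ./llama-server -m PowerMoE4x800M_q3km.gguf --port 8080
+
+ # Query the completion endpoint (illustrative prompt and n_predict)
+ curl http://localhost:8080/completion \
+   -H "Content-Type: application/json" \
+   -d '{"prompt": "How about a snack?", "n_predict": 64}'
+ ```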