TobDeBer/PowerMoe-3b-GGUF

TobDeBer committed on Sep 17

Commit dc91234 • 1 Parent(s): 88dec36

Update README.md

Files changed (1)

README.md +134 -3

@@ -1,3 +1,134 @@
----
-license: apache-2.0
----

+---
+pipeline_tag: text-generation
+inference: false
+license: apache-2.0
+model-index:
+- name: ibm/PowerMoE-3b
+  results:
+  - task:
+      type: text-generation
+    dataset:
+      type: lm-eval-harness
+      name: ARC
+    metrics:
+    - name: accuracy-norm
+      type: accuracy-norm
+      value: 58.1
+      verified: false
+  - task:
+      type: text-generation
+    dataset:
+      type: lm-eval-harness
+      name: BoolQ
+    metrics:
+    - name: accuracy
+      type: accuracy
+      value: 65
+      verified: false
+  - task:
+      type: text-generation
+    dataset:
+      type: lm-eval-harness
+      name: Hellaswag
+    metrics:
+    - name: accuracy-norm
+      type: accuracy-norm
+      value: 71.5
+      verified: false
+  - task:
+      type: text-generation
+    dataset:
+      type: lm-eval-harness
+      name: OpenBookQA
+    metrics:
+    - name: accuracy-norm
+      type: accuracy-norm
+      value: 41
+      verified: false
+  - task:
+      type: text-generation
+    dataset:
+      type: lm-eval-harness
+      name: PIQA
+    metrics:
+    - name: accuracy-norm
+      type: accuracy-norm
+      value: 79.1
+      verified: false
+  - task:
+      type: text-generation
+    dataset:
+      type: lm-eval-harness
+      name: Winogrande
+    metrics:
+    - name: accuracy-norm
+      type: accuracy-norm
+      value: 65
+      verified: false
+  - task:
+      type: text-generation
+    dataset:
+      type: lm-eval-harness
+      name: MMLU (5 shot)
+    metrics:
+    - name: accuracy
+      type: accuracy
+      value: 42.8
+      verified: false
+  - task:
+      type: text-generation
+    dataset:
+      type: lm-eval-harness
+      name: GSM8k (5 shot)
+    metrics:
+    - name: accuracy
+      type: accuracy
+      value: 25.9
+      verified: false
+  - task:
+      type: text-generation
+    dataset:
+      type: lm-eval-harness
+      name: math (4 shot)
+    metrics:
+    - name: accuracy
+      type: accuracy
+      value: 14.8
+      verified: false
+  - task:
+      type: text-generation
+    dataset:
+      type: bigcode-eval
+      name: humaneval
+    metrics:
+    - name: pass@1
+      type: pass@1
+      value: 20.1
+      verified: false
+  - task:
+      type: text-generation
+    dataset:
+      type: bigcode-eval
+      name: MBPP
+    metrics:
+    - name: pass@1
+      type: pass@1
+      value: 32.4
+      verified: false
+base_model:
+- ibm/PowerMoE-3b
+---
+## Model Summary
+PowerMoE-3B is a 3B-parameter sparse Mixture-of-Experts (sMoE) language model trained with the Power learning rate scheduler. It sparsely activates 800M parameters per token and is trained on a mix of open-source and proprietary datasets. Across benchmarks covering natural-language multiple-choice tasks, code generation, and math reasoning, PowerMoE-3B has shown promising results compared to dense models with 2x the active parameters.
+Paper: https://arxiv.org/abs/2408.13359
+This repository provides a GGUF-quantized version of the model.
+## Usage
+Running the model requires a recent build of llama.cpp.
+### Generation
+A simple example of running the PowerMoE GGUF file with llama-cli:
+./llama-cli -m PowerMoE4x800M_q3km.gguf -p "How about a snack?"
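
The one-line command in the README relies on llama-cli defaults. As a minimal sketch (not part of the commit), the same call can be wrapped in a small script that sets a few common generation options explicitly. The values for `-n` (tokens to generate), `-c` (context size), and `--temp` (sampling temperature) are illustrative assumptions, as are the `LLAMA_CLI`, `N_PREDICT`, `CTX`, and `TEMP` variable names and the assumption that the binary and model file sit in the current directory.

```shell
#!/bin/sh
# Sketch only: paths and option values below are assumptions, not from the README.
MODEL="${MODEL:-PowerMoE4x800M_q3km.gguf}"   # model file name taken from the README example
PROMPT="${PROMPT:-How about a snack?}"       # prompt taken from the README example
LLAMA_CLI="${LLAMA_CLI:-./llama-cli}"        # assumed location of the llama.cpp binary
N_PREDICT=128   # illustrative: maximum number of tokens to generate
CTX=2048        # illustrative: context window size
TEMP=0.7        # illustrative: sampling temperature

if [ -x "$LLAMA_CLI" ] && [ -f "$MODEL" ]; then
  # Both the binary and the model are present: run the generation.
  "$LLAMA_CLI" -m "$MODEL" -p "$PROMPT" -n "$N_PREDICT" -c "$CTX" --temp "$TEMP"
else
  # Dry run: print the command instead of failing when a file is missing.
  echo "dry run: $LLAMA_CLI -m $MODEL -p \"$PROMPT\" -n $N_PREDICT -c $CTX --temp $TEMP"
fi
```

Setting `MODEL`, `PROMPT`, or `LLAMA_CLI` in the environment overrides the defaults, so the script can point at a different quantization or build directory without editing it.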