health360/Healix-1.1B-V1-Chat-dDPO


krvhrv

leaderboard-pr-bot committed on Jun 13

Commit 3bf4de9 (1 parent: 07dd053)

Adding Evaluation Results (#1)


- Adding Evaluation Results (4387d29abc760c27658bd45eb761285b279a2e07)

Co-authored-by: Open LLM Leaderboard PR Bot <[email protected]>

Files changed (1)

README.md +120 -4


@@ -1,14 +1,117 @@
 ---
-datasets:
-- krvhrv/Healix-Medical-Shot
 language:
 - en
 tags:
 - medical
 - biology
 - chemistry
 - text-generation-inference
-license: apache-2.0
 ---
 # Healix 1.1B Model Card
@@ -31,4 +134,17 @@ Users are urged to use Healix 1.1B responsibly, considering the ethical implicat
 Details on the development, training, and evaluation of Healix 1.1B will be available in our forthcoming publications, offering insights into its creation and the advancements it brings to medical informatics.
 ### Input Format
-Use the Alpaca model format.

 ---
 language:
 - en
+license: apache-2.0
 tags:
 - medical
 - biology
 - chemistry
 - text-generation-inference
+datasets:
+- krvhrv/Healix-Medical-Shot
+model-index:
+- name: Healix-1.1B-V1-Chat-dDPO
+  results:
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: AI2 Reasoning Challenge (25-Shot)
+      type: ai2_arc
+      config: ARC-Challenge
+      split: test
+      args:
+        num_few_shot: 25
+    metrics:
+    - type: acc_norm
+      value: 30.55
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=health360/Healix-1.1B-V1-Chat-dDPO
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: HellaSwag (10-Shot)
+      type: hellaswag
+      split: validation
+      args:
+        num_few_shot: 10
+    metrics:
+    - type: acc_norm
+      value: 44.78
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=health360/Healix-1.1B-V1-Chat-dDPO
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MMLU (5-Shot)
+      type: cais/mmlu
+      config: all
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 24.64
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=health360/Healix-1.1B-V1-Chat-dDPO
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: TruthfulQA (0-shot)
+      type: truthful_qa
+      config: multiple_choice
+      split: validation
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: mc2
+      value: 41.55
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=health360/Healix-1.1B-V1-Chat-dDPO
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: Winogrande (5-shot)
+      type: winogrande
+      config: winogrande_xl
+      split: validation
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 56.51
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=health360/Healix-1.1B-V1-Chat-dDPO
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: GSM8k (5-shot)
+      type: gsm8k
+      config: main
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 0.0
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=health360/Healix-1.1B-V1-Chat-dDPO
+      name: Open LLM Leaderboard
 ---
 # Healix 1.1B Model Card
 Details on the development, training, and evaluation of Healix 1.1B will be available in our forthcoming publications, offering insights into its creation and the advancements it brings to medical informatics.
 ### Input Format
+Use the Alpaca model format.
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_health360__Healix-1.1B-V1-Chat-dDPO)
+|             Metric              |Value|
+|---------------------------------|----:|
+|Avg.                             |33.00|
+|AI2 Reasoning Challenge (25-Shot)|30.55|
+|HellaSwag (10-Shot)              |44.78|
+|MMLU (5-Shot)                    |24.64|
+|TruthfulQA (0-shot)              |41.55|
+|Winogrande (5-shot)              |56.51|
+|GSM8k (5-shot)                   | 0.00|
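
As a quick sanity check on the `Avg.` row, the sketch below recomputes an unweighted mean of the six benchmark scores listed in the table. It assumes the leaderboard average is a plain arithmetic mean over these six tasks; the `scores` dictionary and the `mean_score` helper are illustrative names, not part of the commit. Because the table values are already rounded to two decimals, the recomputed mean may differ from the reported 33.00 in the third decimal place.

```python
# Scores copied from the results table in this commit (two-decimal values).
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 30.55,
    "HellaSwag (10-Shot)": 44.78,
    "MMLU (5-Shot)": 24.64,
    "TruthfulQA (0-shot)": 41.55,
    "Winogrande (5-shot)": 56.51,
    "GSM8k (5-shot)": 0.00,
}


def mean_score(values):
    """Return the unweighted arithmetic mean of the given scores."""
    values = list(values)
    return sum(values) / len(values)


# Assumption: the leaderboard "Avg." is a simple mean over the six tasks.
average = mean_score(scores.values())
print(f"Recomputed average: {average:.3f} (table reports 33.00)")
```

The GSM8k score of 0.00 counts as a full sixth of the mean here, so it pulls the average well below the other five scores.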