Commit 0ff6652 (0 parents) by akjindal53244

Add model files and update README
.gitattributes ADDED
@@ -0,0 +1,35 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
Llama-3.1-Storm-8B.Q4_K_M.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9132a94ae3441cd18132e94222d8e6b12d5f30627cfc0c46a27aa50551b49fa3
+ size 4920734496
Llama-3.1-Storm-8B.Q5_K_M.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9a817f5faf2cdee455563913882ba2d63b74c1ba6317b8441341faae1a9f458b
+ size 5732987680
Llama-3.1-Storm-8B.Q6_K.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:58bced62244245319393fb2992a0d4ad57a39999d20dd126ec14cd356fdee493
+ size 6596006688
Llama-3.1-Storm-8B.Q8_0.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d7e2c522af01158c9a427b350c55119d96517462dcfb34a71ccdd0dca6f07705
+ size 8540771104
README.md ADDED
@@ -0,0 +1,153 @@
+ ---
+ language:
+ - en
+ - de
+ - fr
+ - it
+ - pt
+ - hi
+ - es
+ - th
+ pipeline_tag: text-generation
+ tags:
+ - llama-3.1
+ - conversational
+ - instruction following
+ - reasoning
+ - function calling
+ license: llama3.1
+ base_model: akjindal53244/Llama-3.1-Storm-8B
+ ---
+
+ ![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/64c75c1237333ccfef30a602/tmOlbERGKP7JSODa6T06J.jpeg)
+
+ Authors: [Ashvini Kumar Jindal](https://www.linkedin.com/in/ashvini-jindal-26653262/), [Pawan Kumar Rajpoot](https://www.linkedin.com/in/pawanrajpoot/), [Ankur Parikh](https://www.linkedin.com/in/ankurnlpexpert/), [Akshita Sukhlecha](https://www.linkedin.com/in/akshita-sukhlecha/)
+
+ **🤗 Hugging Face Announcement Blog**: https://huggingface.co/blog/akjindal53244/llama31-storm8b
+
+ <br>
+
+ # Llama-3.1-Storm-8B-GGUF
+ **This is the GGUF quantized version of [Llama-3.1-Storm-8B](https://huggingface.co/akjindal53244/Llama-3.1-Storm-8B), for use with [llama.cpp](https://github.com/ggerganov/llama.cpp). The BF16 model is available [here](https://huggingface.co/akjindal53244/Llama-3.1-Storm-8B).**
+
+ ## TL;DR
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/64c75c1237333ccfef30a602/mDtDeiHwnBupw1k_n99Lf.png)
+
+ We present [**Llama-3.1-Storm-8B**](https://huggingface.co/akjindal53244/Llama-3.1-Storm-8B), a model that significantly outperforms Meta AI's [Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct) and [Hermes-3-Llama-3.1-8B](https://huggingface.co/NousResearch/Hermes-3-Llama-3.1-8B) across diverse benchmarks, as shown in the performance comparison plot in the next section. Our approach consists of three key steps:
+ 1. **Self-curation**: We applied two self-curation methods to select approximately 1 million high-quality examples from a pool of ~2.8 million open-source examples. **Our curation criteria focused on educational value and difficulty level, using the same SLM for annotation instead of larger models (e.g., 70B, 405B).**
+ 2. **Targeted fine-tuning**: We performed [Spectrum](https://arxiv.org/abs/2406.06623)-based targeted fine-tuning of the Llama-3.1-8B-Instruct model. The Spectrum method accelerates training by selectively training layer modules chosen by their signal-to-noise ratio (SNR) and freezing the remaining modules; in our work, 50% of the layers are frozen.
+ 3. **Model merging**: We merged our fine-tuned model with the [Llama-Spark](https://huggingface.co/arcee-ai/Llama-Spark) model using the [SLERP](https://huggingface.co/blog/mlabonne/merge-models#1-slerp) method, which produces a blended model whose characteristics are smoothly interpolated from both parents. [Llama-3.1-Storm-8B](https://huggingface.co/akjindal53244/Llama-3.1-Storm-8B) improves on Llama-3.1-8B-Instruct across 10 diverse benchmarks covering instruction following, knowledge-driven QA, reasoning, truthful answer generation, and function calling.
+
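The spherical interpolation in step 3 can be illustrated with a toy sketch. This is our own minimal illustration of SLERP on a single flattened weight tensor, not the actual merge configuration (real merges, e.g. via MergeKit, operate per-tensor over full checkpoints, often with layer-dependent interpolation factors):

```python
import numpy as np

def slerp(w_a: np.ndarray, w_b: np.ndarray, t: float, eps: float = 1e-8) -> np.ndarray:
    """Spherical linear interpolation between two weight tensors at factor t in [0, 1]."""
    a = w_a.ravel().astype(np.float64)
    b = w_b.ravel().astype(np.float64)
    # Angle between the two weight vectors on the unit hypersphere.
    cos_theta = np.clip(np.dot(a / np.linalg.norm(a), b / np.linalg.norm(b)), -1.0, 1.0)
    theta = np.arccos(cos_theta)
    if theta < eps:
        # Nearly parallel vectors: fall back to plain linear interpolation.
        return (1 - t) * w_a + t * w_b
    s = np.sin(theta)
    mixed = (np.sin((1 - t) * theta) / s) * a + (np.sin(t * theta) / s) * b
    return mixed.reshape(w_a.shape)

# Toy example: blend two 2x2 "weight matrices" halfway.
wa = np.array([[1.0, 0.0], [0.0, 1.0]])
wb = np.array([[0.0, 1.0], [1.0, 0.0]])
print(slerp(wa, wb, 0.5))
```

Unlike plain weight averaging, SLERP follows the great-circle path between the two weight vectors, preserving their norm-direction structure as it interpolates.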
+ ## 🏆 Introducing Llama-3.1-Storm-8B
+ [**Llama-3.1-Storm-8B**](https://huggingface.co/akjindal53244/Llama-3.1-Storm-8B) builds upon the foundation of Llama-3.1-8B-Instruct, aiming to enhance both conversational and function-calling capabilities within the 8B parameter model class.
+
+ As shown in the left subplot of the figure above, [**Llama-3.1-Storm-8B**](https://huggingface.co/akjindal53244/Llama-3.1-Storm-8B) improves on Meta-Llama-3.1-8B-Instruct across various benchmarks: instruction following ([IFEval](https://arxiv.org/abs/2311.07911)), knowledge-driven QA ([GPQA](https://arxiv.org/abs/2311.12022), [MMLU-Pro](https://arxiv.org/pdf/2406.01574)), reasoning ([ARC-C](https://arxiv.org/abs/1803.05457), [MuSR](https://arxiv.org/abs/2310.16049), [BBH](https://arxiv.org/pdf/2210.09261)), reduced hallucinations ([TruthfulQA](https://arxiv.org/abs/2109.07958)), and function calling ([BFCL](https://huggingface.co/datasets/gorilla-llm/Berkeley-Function-Calling-Leaderboard)). This improvement is particularly significant for AI developers and enthusiasts who work with limited computational resources.
+
+ We also benchmarked our model against the recently published [Hermes-3-Llama-3.1-8B](https://huggingface.co/NousResearch/Hermes-3-Llama-3.1-8B), likewise built on top of the Llama-3.1-8B-Instruct model. As shown in the right subplot of the figure above, **Llama-3.1-Storm-8B outperforms Hermes-3-Llama-3.1-8B on 7 out of 9 benchmarks**; Hermes-3-Llama-3.1-8B surpasses Llama-3.1-Storm-8B on the MuSR benchmark, and the two models perform comparably on the BBH benchmark.
+
+
+ ## Llama-3.1-Storm-8B Model Strengths
+ Llama-3.1-Storm-8B is a powerful generalist model useful for diverse applications. We invite the AI community to explore [Llama-3.1-Storm-8B](https://huggingface.co/collections/akjindal53244/storm-66ba6c96b7e24ecb592787a9) and look forward to seeing how it will be utilized in various projects and applications.
+
+ <table>
+   <tr>
+     <td><strong>Model Strength</strong></td>
+     <td><strong>Relevant Benchmarks</strong></td>
+   </tr>
+   <tr>
+     <td>🎯 Improved Instruction Following</td>
+     <td>IFEval Strict (+3.93%)</td>
+   </tr>
+   <tr>
+     <td>🌐 Enhanced Knowledge-Driven Question Answering</td>
+     <td>GPQA (+7.21%), MMLU-Pro (+0.55%), AGIEval (+3.77%)</td>
+   </tr>
+   <tr>
+     <td>🧠 Better Reasoning</td>
+     <td>ARC-C (+3.92%), MuSR (+2.77%), BBH (+1.67%), AGIEval (+3.77%)</td>
+   </tr>
+   <tr>
+     <td>🤖 Superior Agentic Capabilities</td>
+     <td>BFCL: Overall Acc (+7.92%), BFCL: AST Summary (+12.32%)</td>
+   </tr>
+   <tr>
+     <td>🚫 Reduced Hallucinations</td>
+     <td>TruthfulQA (+9%)</td>
+   </tr>
+ </table>
+
+ **Note**: All improvements are absolute gains over Meta-Llama-3.1-8B-Instruct.
+
+
+ ## Llama-3.1-Storm-8B Models
+ 1. `BF16`: [Llama-3.1-Storm-8B](https://huggingface.co/akjindal53244/Llama-3.1-Storm-8B)
+ 2. ⚡ `FP8`: [Llama-3.1-Storm-8B-FP8-Dynamic](https://huggingface.co/akjindal53244/Llama-3.1-Storm-8B-FP8-Dynamic)
+ 3. ⚡ `GGUF`: [Llama-3.1-Storm-8B-GGUF](https://huggingface.co/akjindal53244/Llama-3.1-Storm-8B-GGUF)
+
+ ## 💻 How to Use the GGUF Model
+
+ ```bash
+ pip install llama-cpp-python
+ ```
+
+ ```python
+ from huggingface_hub import hf_hub_download
+ from llama_cpp import Llama
+
+ ## Download the GGUF model
+ model_name = "akjindal53244/Llama-3.1-Storm-8B-GGUF"
+ model_file = "Llama-3.1-Storm-8B.Q8_0.gguf"  # the 8-bit quant used in this example; other quantization levels are available in the model repo if preferred
+ model_path = hf_hub_download(model_name, filename=model_file)
+
+ ## Instantiate model from downloaded file
+ llm = Llama(
+     model_path=model_path,
+     n_ctx=16000,     # Context length to use
+     n_threads=32,    # Number of CPU threads to use
+     n_gpu_layers=0   # Number of model layers to offload to GPU
+ )
+
+ generation_kwargs = {
+     "max_tokens": 200,
+     "stop": ["<|eot_id|>"],
+     "echo": False,   # Echo the prompt in the output
+     "top_k": 1       # Set this value > 1 for sampling-based decoding
+ }
+
+ prompt = "What is 2+2?"
+ res = llm(prompt, **generation_kwargs)
+ print(res["choices"][0]["text"])
+ ```
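The example above sends a raw string straight to the model. Since Llama-3.1-Storm-8B is instruct-tuned, chat-style prompts should follow the Llama 3.1 chat template; here is a minimal single-turn formatting sketch (our own helper for illustration, not part of the llama-cpp-python API):

```python
def build_llama31_prompt(system: str, user: str) -> str:
    """Format a single-turn conversation using the Llama 3.1 chat template."""
    return (
        "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
        f"{system}<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user}<|eot_id|>"
        # The trailing assistant header cues the model to generate its reply.
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

prompt = build_llama31_prompt("You are a helpful assistant.", "What is 2+2?")
print(prompt)
```

The formatted string can be passed as `prompt` to `llm(...)` above; alternatively, `llm.create_chat_completion(messages=[...])` applies the chat template embedded in the GGUF file automatically.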
+
+
+ ## Alignment Note
+ While **Llama-3.1-Storm-8B** did not undergo an explicit model alignment process, it may still retain some alignment properties inherited from the Meta-Llama-3.1-8B-Instruct model.
+
+ ## Cite Our Work
+ ```
+ @misc{ashvini_kumar_jindal_2024,
+   author    = {Ashvini Kumar Jindal and Pawan Kumar Rajpoot and Ankur Parikh and Akshita Sukhlecha},
+   title     = {Llama-3.1-Storm-8B},
+   year      = 2024,
+   url       = {https://huggingface.co/akjindal53244/Llama-3.1-Storm-8B},
+   doi       = {10.57967/hf/2902},
+   publisher = {Hugging Face}
+ }
+ ```
+
+ ## Support Our Work
+ With three team members spanning three different time zones, we have won the [NeurIPS LLM Efficiency Challenge 2023](https://llm-efficiency-challenge.github.io/) and four other competitions in the finance and Arabic LLM space. We have also published a [SOTA mathematical reasoning model](https://huggingface.co/akjindal53244/Arithmo-Mistral-7B).
+
+ **Llama-3.1-Storm-8B** is our most valuable contribution so far to the open-source community. We are committed to developing efficient generalist LLMs. **We're seeking both computational resources and innovative collaborators to drive this initiative forward.**
config.json ADDED
@@ -0,0 +1,3 @@
+ {
+   "model_type": "llama"
+ }