RichardErkhov committed on
Commit
5341d89
1 Parent(s): 5715c92

uploaded readme

Files changed (1)
  1. README.md +292 -0
README.md ADDED
@@ -0,0 +1,292 @@
1
+ Quantization made by Richard Erkhov.
2
+
3
+ [Github](https://github.com/RichardErkhov)
4
+
5
+ [Discord](https://discord.gg/pvy7H8DZMG)
6
+
7
+ [Request more models](https://github.com/RichardErkhov/quant_request)
8
+
9
+
10
+ stablelm-3b-4e1t - GGUF
11
+ - Model creator: https://huggingface.co/stabilityai/
12
+ - Original model: https://huggingface.co/stabilityai/stablelm-3b-4e1t/
13
+
14
+
15
+ | Name | Quant method | Size |
16
+ | ---- | ---- | ---- |
17
+ | [stablelm-3b-4e1t.Q2_K.gguf](https://huggingface.co/RichardErkhov/stabilityai_-_stablelm-3b-4e1t-gguf/blob/main/stablelm-3b-4e1t.Q2_K.gguf) | Q2_K | 1.01GB |
18
+ | [stablelm-3b-4e1t.IQ3_XS.gguf](https://huggingface.co/RichardErkhov/stabilityai_-_stablelm-3b-4e1t-gguf/blob/main/stablelm-3b-4e1t.IQ3_XS.gguf) | IQ3_XS | 1.11GB |
19
+ | [stablelm-3b-4e1t.IQ3_S.gguf](https://huggingface.co/RichardErkhov/stabilityai_-_stablelm-3b-4e1t-gguf/blob/main/stablelm-3b-4e1t.IQ3_S.gguf) | IQ3_S | 1.17GB |
20
+ | [stablelm-3b-4e1t.Q3_K_S.gguf](https://huggingface.co/RichardErkhov/stabilityai_-_stablelm-3b-4e1t-gguf/blob/main/stablelm-3b-4e1t.Q3_K_S.gguf) | Q3_K_S | 1.17GB |
21
+ | [stablelm-3b-4e1t.IQ3_M.gguf](https://huggingface.co/RichardErkhov/stabilityai_-_stablelm-3b-4e1t-gguf/blob/main/stablelm-3b-4e1t.IQ3_M.gguf) | IQ3_M | 1.23GB |
22
+ | [stablelm-3b-4e1t.Q3_K.gguf](https://huggingface.co/RichardErkhov/stabilityai_-_stablelm-3b-4e1t-gguf/blob/main/stablelm-3b-4e1t.Q3_K.gguf) | Q3_K | 1.3GB |
23
+ | [stablelm-3b-4e1t.Q3_K_M.gguf](https://huggingface.co/RichardErkhov/stabilityai_-_stablelm-3b-4e1t-gguf/blob/main/stablelm-3b-4e1t.Q3_K_M.gguf) | Q3_K_M | 1.3GB |
24
+ | [stablelm-3b-4e1t.Q3_K_L.gguf](https://huggingface.co/RichardErkhov/stabilityai_-_stablelm-3b-4e1t-gguf/blob/main/stablelm-3b-4e1t.Q3_K_L.gguf) | Q3_K_L | 1.4GB |
25
+ | [stablelm-3b-4e1t.IQ4_XS.gguf](https://huggingface.co/RichardErkhov/stabilityai_-_stablelm-3b-4e1t-gguf/blob/main/stablelm-3b-4e1t.IQ4_XS.gguf) | IQ4_XS | 1.43GB |
26
+ | [stablelm-3b-4e1t.Q4_0.gguf](https://huggingface.co/RichardErkhov/stabilityai_-_stablelm-3b-4e1t-gguf/blob/main/stablelm-3b-4e1t.Q4_0.gguf) | Q4_0 | 1.5GB |
27
+ | [stablelm-3b-4e1t.IQ4_NL.gguf](https://huggingface.co/RichardErkhov/stabilityai_-_stablelm-3b-4e1t-gguf/blob/main/stablelm-3b-4e1t.IQ4_NL.gguf) | IQ4_NL | 1.51GB |
28
+ | [stablelm-3b-4e1t.Q4_K_S.gguf](https://huggingface.co/RichardErkhov/stabilityai_-_stablelm-3b-4e1t-gguf/blob/main/stablelm-3b-4e1t.Q4_K_S.gguf) | Q4_K_S | 1.51GB |
29
+ | [stablelm-3b-4e1t.Q4_K.gguf](https://huggingface.co/RichardErkhov/stabilityai_-_stablelm-3b-4e1t-gguf/blob/main/stablelm-3b-4e1t.Q4_K.gguf) | Q4_K | 1.59GB |
30
+ | [stablelm-3b-4e1t.Q4_K_M.gguf](https://huggingface.co/RichardErkhov/stabilityai_-_stablelm-3b-4e1t-gguf/blob/main/stablelm-3b-4e1t.Q4_K_M.gguf) | Q4_K_M | 1.59GB |
31
+ | [stablelm-3b-4e1t.Q4_1.gguf](https://huggingface.co/RichardErkhov/stabilityai_-_stablelm-3b-4e1t-gguf/blob/main/stablelm-3b-4e1t.Q4_1.gguf) | Q4_1 | 1.65GB |
32
+ | [stablelm-3b-4e1t.Q5_0.gguf](https://huggingface.co/RichardErkhov/stabilityai_-_stablelm-3b-4e1t-gguf/blob/main/stablelm-3b-4e1t.Q5_0.gguf) | Q5_0 | 1.81GB |
33
+ | [stablelm-3b-4e1t.Q5_K_S.gguf](https://huggingface.co/RichardErkhov/stabilityai_-_stablelm-3b-4e1t-gguf/blob/main/stablelm-3b-4e1t.Q5_K_S.gguf) | Q5_K_S | 1.81GB |
34
+ | [stablelm-3b-4e1t.Q5_K.gguf](https://huggingface.co/RichardErkhov/stabilityai_-_stablelm-3b-4e1t-gguf/blob/main/stablelm-3b-4e1t.Q5_K.gguf) | Q5_K | 1.86GB |
35
+ | [stablelm-3b-4e1t.Q5_K_M.gguf](https://huggingface.co/RichardErkhov/stabilityai_-_stablelm-3b-4e1t-gguf/blob/main/stablelm-3b-4e1t.Q5_K_M.gguf) | Q5_K_M | 1.86GB |
36
+ | [stablelm-3b-4e1t.Q5_1.gguf](https://huggingface.co/RichardErkhov/stabilityai_-_stablelm-3b-4e1t-gguf/blob/main/stablelm-3b-4e1t.Q5_1.gguf) | Q5_1 | 1.96GB |
37
+ | [stablelm-3b-4e1t.Q6_K.gguf](https://huggingface.co/RichardErkhov/stabilityai_-_stablelm-3b-4e1t-gguf/blob/main/stablelm-3b-4e1t.Q6_K.gguf) | Q6_K | 2.14GB |
38
+ | [stablelm-3b-4e1t.Q8_0.gguf](https://huggingface.co/RichardErkhov/stabilityai_-_stablelm-3b-4e1t-gguf/blob/main/stablelm-3b-4e1t.Q8_0.gguf) | Q8_0 | 2.77GB |
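As a rough sketch of how one of the files above might be fetched and run locally: the snippet below assumes the `huggingface_hub` and `llama-cpp-python` packages are installed (neither is mentioned in this README), and `gguf_filename` and `generate_from_gguf` are hypothetical helper names. The repository ID and the file-naming pattern are taken from the links in the table; the sampling settings mirror the usage example in the original model description.

```python
REPO_ID = "RichardErkhov/stabilityai_-_stablelm-3b-4e1t-gguf"  # from the table links


def gguf_filename(quant: str) -> str:
    """Build a file name following the pattern used in the table above."""
    return f"stablelm-3b-4e1t.{quant}.gguf"


def generate_from_gguf(quant: str = "Q4_K_M",
                       prompt: str = "The weather is always wonderful") -> str:
    """Download one quantized file and complete a prompt (hypothetical helper)."""
    # Imported lazily so gguf_filename works without these packages installed.
    from huggingface_hub import hf_hub_download
    from llama_cpp import Llama

    model_path = hf_hub_download(repo_id=REPO_ID, filename=gguf_filename(quant))
    # 4096 is the sequence length listed in the model architecture table.
    llm = Llama(model_path=model_path, n_ctx=4096)
    result = llm(prompt, max_tokens=64, temperature=0.75, top_p=0.95)
    return result["choices"][0]["text"]
```

Calling `generate_from_gguf("Q8_0")` would download the largest file in the table; smaller quants trade output quality for a lower memory footprint.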
39
+
40
+
41
+
42
+
43
+ Original model description:
44
+ ---
45
+ language:
46
+ - en
47
+ license: cc-by-sa-4.0
48
+ tags:
49
+ - causal-lm
50
+ datasets:
51
+ - tiiuae/falcon-refinedweb
52
+ - togethercomputer/RedPajama-Data-1T
53
+ - CarperAI/pilev2-dev
54
+ - bigcode/starcoderdata
55
+ - allenai/peS2o
56
+ model-index:
57
+ - name: stablelm-3b-4e1t
58
+   results:
59
+   - task:
60
+       type: text-generation
61
+       name: Text Generation
62
+     dataset:
63
+       name: AI2 Reasoning Challenge (25-Shot)
64
+       type: ai2_arc
65
+       config: ARC-Challenge
66
+       split: test
67
+       args:
68
+         num_few_shot: 25
69
+     metrics:
70
+     - type: acc_norm
71
+       value: 46.59
72
+       name: normalized accuracy
73
+     source:
74
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=stabilityai/stablelm-3b-4e1t
75
+       name: Open LLM Leaderboard
76
+   - task:
77
+       type: text-generation
78
+       name: Text Generation
79
+     dataset:
80
+       name: HellaSwag (10-Shot)
81
+       type: hellaswag
82
+       split: validation
83
+       args:
84
+         num_few_shot: 10
85
+     metrics:
86
+     - type: acc_norm
87
+       value: 75.94
88
+       name: normalized accuracy
89
+     source:
90
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=stabilityai/stablelm-3b-4e1t
91
+       name: Open LLM Leaderboard
92
+   - task:
93
+       type: text-generation
94
+       name: Text Generation
95
+     dataset:
96
+       name: MMLU (5-Shot)
97
+       type: cais/mmlu
98
+       config: all
99
+       split: test
100
+       args:
101
+         num_few_shot: 5
102
+     metrics:
103
+     - type: acc
104
+       value: 45.23
105
+       name: accuracy
106
+     source:
107
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=stabilityai/stablelm-3b-4e1t
108
+       name: Open LLM Leaderboard
109
+   - task:
110
+       type: text-generation
111
+       name: Text Generation
112
+     dataset:
113
+       name: TruthfulQA (0-shot)
114
+       type: truthful_qa
115
+       config: multiple_choice
116
+       split: validation
117
+       args:
118
+         num_few_shot: 0
119
+     metrics:
120
+     - type: mc2
121
+       value: 37.2
122
+     source:
123
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=stabilityai/stablelm-3b-4e1t
124
+       name: Open LLM Leaderboard
125
+   - task:
126
+       type: text-generation
127
+       name: Text Generation
128
+     dataset:
129
+       name: Winogrande (5-shot)
130
+       type: winogrande
131
+       config: winogrande_xl
132
+       split: validation
133
+       args:
134
+         num_few_shot: 5
135
+     metrics:
136
+     - type: acc
137
+       value: 71.19
138
+       name: accuracy
139
+     source:
140
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=stabilityai/stablelm-3b-4e1t
141
+       name: Open LLM Leaderboard
142
+   - task:
143
+       type: text-generation
144
+       name: Text Generation
145
+     dataset:
146
+       name: GSM8k (5-shot)
147
+       type: gsm8k
148
+       config: main
149
+       split: test
150
+       args:
151
+         num_few_shot: 5
152
+     metrics:
153
+     - type: acc
154
+       value: 3.34
155
+       name: accuracy
156
+     source:
157
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=stabilityai/stablelm-3b-4e1t
158
+       name: Open LLM Leaderboard
159
+ ---
160
+ # `StableLM-3B-4E1T`
161
+
162
+ ## Model Description
163
+
164
+ `StableLM-3B-4E1T` is a 3 billion parameter decoder-only language model pre-trained on 1 trillion tokens of diverse English and code datasets for 4 epochs.
165
+
166
+ ## Usage
167
+
168
+ Get started generating text with `StableLM-3B-4E1T` by using the following code snippet:
169
+
170
+ ```python
171
+ from transformers import AutoModelForCausalLM, AutoTokenizer
172
+ tokenizer = AutoTokenizer.from_pretrained("stabilityai/stablelm-3b-4e1t")
173
+ model = AutoModelForCausalLM.from_pretrained(
174
+     "stabilityai/stablelm-3b-4e1t",
175
+     torch_dtype="auto",
176
+ )
177
+ model.cuda()
178
+ inputs = tokenizer("The weather is always wonderful", return_tensors="pt").to(model.device)
179
+ tokens = model.generate(
180
+     **inputs,
181
+     max_new_tokens=64,
182
+     temperature=0.75,
183
+     top_p=0.95,
184
+     do_sample=True,
185
+ )
186
+ print(tokenizer.decode(tokens[0], skip_special_tokens=True))
187
+ ```
188
+
189
+ ### Run with Flash Attention 2 ⚡️
190
+
191
+ <details>
192
+ <summary> Click to expand </summary>
193
+
194
+ ```python
195
+ from transformers import AutoModelForCausalLM, AutoTokenizer
196
+ tokenizer = AutoTokenizer.from_pretrained("stabilityai/stablelm-3b-4e1t")
197
+ model = AutoModelForCausalLM.from_pretrained(
198
+     "stabilityai/stablelm-3b-4e1t",
199
+     torch_dtype="auto",
200
+     attn_implementation="flash_attention_2",
201
+ )
202
+ model.cuda()
203
+ inputs = tokenizer("The weather is always wonderful", return_tensors="pt").to(model.device)
204
+ tokens = model.generate(
205
+     **inputs,
206
+     max_new_tokens=64,
207
+     temperature=0.75,
208
+     top_p=0.95,
209
+     do_sample=True,
210
+ )
211
+ print(tokenizer.decode(tokens[0], skip_special_tokens=True))
212
+ ```
213
+
214
+ </details>
215
+
216
+
217
+ ## Model Details
218
+
219
+ * **Developed by**: [Stability AI](https://stability.ai/)
220
+ * **Model type**: `StableLM-3B-4E1T` models are auto-regressive language models based on the transformer decoder architecture.
221
+ * **Language(s)**: English
222
+ * **Library**: [GPT-NeoX](https://github.com/EleutherAI/gpt-neox)
223
+ * **License**: Model checkpoints are licensed under the Creative Commons license ([CC BY-SA-4.0](https://creativecommons.org/licenses/by-sa/4.0/)). Under this license, you must give [credit](https://creativecommons.org/licenses/by/4.0/#) to Stability AI, provide a link to the license, and [indicate if changes were made](https://creativecommons.org/licenses/by/4.0/#). You may do so in any reasonable manner, but not in any way that suggests that Stability AI endorses you or your use.
224
+ * **Contact**: For questions and comments about the model, please email `[email protected]`
225
+
226
+ ### Model Architecture
227
+
228
+ The model is a decoder-only transformer similar to the LLaMA ([Touvron et al., 2023](https://arxiv.org/abs/2307.09288)) architecture with the following modifications:
229
+
230
+ | Parameters | Hidden Size | Layers | Heads | Sequence Length |
231
+ |----------------|-------------|--------|-------|-----------------|
232
+ | 2,795,443,200 | 2560 | 32 | 32 | 4096 |
233
+
234
+ * **Position Embeddings**: Rotary Position Embeddings ([Su et al., 2021](https://arxiv.org/abs/2104.09864)) applied to the first 25% of head embedding dimensions for improved throughput following [Black et al. (2022)](https://arxiv.org/pdf/2204.06745.pdf).
235
+ * **Normalization**: LayerNorm ([Ba et al., 2016](https://arxiv.org/abs/1607.06450)) with learned bias terms as opposed to RMSNorm ([Zhang & Sennrich, 2019](https://arxiv.org/abs/1910.07467)).
236
+ * **Tokenizer**: GPT-NeoX ([Black et al., 2022](https://arxiv.org/abs/2204.06745)).
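To make the partial rotary embedding concrete, the arithmetic below derives the per-head dimensions from the table and the 25% figure stated above. This is only a back-of-the-envelope sketch; the variable names are illustrative and do not come from the model's configuration files.

```python
HIDDEN_SIZE = 2560       # from the architecture table
NUM_HEADS = 32           # from the architecture table
ROTARY_FRACTION = 0.25   # "first 25% of head embedding dimensions"

head_dim = HIDDEN_SIZE // NUM_HEADS            # embedding size of a single attention head
rotary_dims = int(head_dim * ROTARY_FRACTION)  # leading dims that receive rotary embeddings
passthrough_dims = head_dim - rotary_dims      # remaining dims left unrotated

print(head_dim, rotary_dims, passthrough_dims)  # 80 20 60
```

Under these numbers, only the first 20 of each head's 80 dimensions carry positional rotation.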
237
+
238
+ ## Training
239
+
240
+ For complete dataset and training details, please see the [StableLM-3B-4E1T Technical Report](https://stability.wandb.io/stability-llm/stable-lm/reports/StableLM-3B-4E1T--VmlldzoyMjU4?accessToken=u3zujipenkx5g7rtcj9qojjgxpconyjktjkli2po09nffrffdhhchq045vp0wyfo).
241
+
242
+ ### Training Dataset
243
+
244
+ The dataset comprises a filtered mixture of open-source large-scale datasets available on the [HuggingFace Hub](https://huggingface.co/datasets): Falcon RefinedWeb extract ([Penedo et al., 2023](https://huggingface.co/datasets/tiiuae/falcon-refinedweb)), RedPajama-Data ([Together Computer, 2023](https://github.com/togethercomputer/RedPajama-Data)) and The Pile ([Gao et al., 2020](https://arxiv.org/abs/2101.00027)) both without the *Books3* subset, and StarCoder ([Li et al., 2023](https://arxiv.org/abs/2305.06161)).
245
+
246
+ * Given the large amount of web data, we recommend fine-tuning the base StableLM-3B-4E1T for your downstream tasks.
247
+
248
+ ### Training Procedure
249
+
250
+ The model is pre-trained on the aforementioned datasets in `bfloat16` precision, optimized with AdamW, and trained using the NeoX tokenizer with a vocabulary size of 50,257. We outline the complete hyperparameter choices in the project's [GitHub repository - config](https://github.com/Stability-AI/StableLM/blob/main/configs/stablelm-3b-4e1t.yml).
251
+
252
+ ### Training Infrastructure
253
+
254
+ * **Hardware**: `StableLM-3B-4E1T` was trained on the Stability AI cluster across 256 NVIDIA A100 40GB GPUs (AWS P4d instances). Training began on August 23, 2023, and took approximately 30 days to complete.
255
+
256
+ * **Software**: We use a fork of `gpt-neox` ([EleutherAI, 2021](https://github.com/EleutherAI/gpt-neox)), train under 2D parallelism (Data and Tensor Parallel) with ZeRO-1 ([Rajbhandari et al., 2019](https://arxiv.org/abs/1910.02054v3)), and rely on flash-attention as well as SwiGLU and Rotary Embedding kernels from FlashAttention-2 ([Dao et al., 2023](https://tridao.me/publications/flash2/flash2.pdf)).
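For a sense of scale, the hardware figures above can be turned into a rough compute estimate. The sketch below assumes the GPUs ran continuously for the full 30 days and that "1 trillion tokens ... for 4 epochs" means about 4 trillion tokens were processed in total; both are simplifying assumptions, so the results are order-of-magnitude figures, not reported measurements.

```python
NUM_GPUS = 256                  # NVIDIA A100 40GB GPUs, from the hardware note
TRAINING_DAYS = 30              # "approximately 30 days"
TOKENS_PER_EPOCH = 1_000_000_000_000  # 1 trillion tokens, from the model description
EPOCHS = 4

# Assumption: every GPU was busy for the entire training window.
gpu_hours = NUM_GPUS * TRAINING_DAYS * 24

# Assumption: total tokens processed = tokens per epoch * number of epochs.
total_tokens = TOKENS_PER_EPOCH * EPOCHS
tokens_per_gpu_second = total_tokens / (gpu_hours * 3600)

print(f"GPU-hours: {gpu_hours:,}")
print(f"Tokens per GPU per second: {tokens_per_gpu_second:,.0f}")
```

Any downtime or restarts during training would lower the true GPU-hour count and raise the implied per-GPU throughput.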
257
+
258
+ ## Use and Limitations
259
+
260
+ ### Intended Use
261
+
262
+ The model is intended to be used as a foundational base model for application-specific fine-tuning. Developers must evaluate and fine-tune the model for safe performance in downstream applications.
263
+
264
+ ### Limitations and Bias
265
+
266
+ As a base model, this model may exhibit unreliable, unsafe, or other undesirable behaviors that must be corrected through evaluation and fine-tuning prior to deployment. The pre-training dataset may have contained offensive or inappropriate content, even after applying data cleansing filters, which can be reflected in the model-generated text. We recommend that users exercise caution when using these models in production systems. Do not use the models if they are unsuitable for your application, or for any applications that may cause deliberate or unintentional harm to others.
267
+
268
+ ## How to Cite
269
+
270
+ ```bibtex
271
+ @misc{StableLM-3B-4E1T,
272
+ url={https://huggingface.co/stabilityai/stablelm-3b-4e1t},
273
+ title={StableLM 3B 4E1T},
274
+ author={Tow, Jonathan and Bellagente, Marco and Mahan, Dakota and Riquelme, Carlos}
275
+ }
276
+ ```
277
+
278
+ # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
279
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_stabilityai__stablelm-3b-4e1t).
280
+
281
+ | Metric |Value|
282
+ |---------------------------------|----:|
283
+ |Avg. |46.58|
284
+ |AI2 Reasoning Challenge (25-Shot)|46.59|
285
+ |HellaSwag (10-Shot) |75.94|
286
+ |MMLU (5-Shot) |45.23|
287
+ |TruthfulQA (0-shot) |37.20|
288
+ |Winogrande (5-shot) |71.19|
289
+ |GSM8k (5-shot) | 3.34|
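The "Avg." row appears to be the unweighted mean of the six benchmark scores. A quick check, assuming that is how it was computed (the dictionary keys are just labels copied from the table):

```python
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 46.59,
    "HellaSwag (10-Shot)": 75.94,
    "MMLU (5-Shot)": 45.23,
    "TruthfulQA (0-shot)": 37.20,
    "Winogrande (5-shot)": 71.19,
    "GSM8k (5-shot)": 3.34,
}

# Unweighted arithmetic mean across all six benchmarks.
average = sum(scores.values()) / len(scores)
print(f"{average:.2f}")  # 46.58
```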
290
+
291
+
292
+