---
license: apache-2.0
---
<style>
  table {
    border-collapse: collapse;
    width: 100%;
    margin-bottom: 20px;
  }
  th, td {
    border: 1px solid #ddd;
    padding: 8px;
    text-align: center;
  }
  .best {
    font-weight: bold;
    text-decoration: underline;
  }
</style>

<div style="text-align: center; margin: 20px auto; padding: 20px; border: 3px solid #ddd; border-radius: 10px;">
  <h2 style="margin-bottom: 4px; margin-top: 0px;">OuteAI</h2>
  <a href="https://www.outeai.com/" target="_blank" style="margin-right: 10px;">🌎 OuteAI.com</a>
  <a href="https://discord.gg/vyBM87kAmf" target="_blank" style="margin-right: 10px;">🤝 Join our Discord</a>
  <a href="https://x.com/OuteAI" target="_blank">𝕏 @OuteAI</a>
</div>

## Introduction
We're excited to introduce our latest model, the Lite Oute 2 Mamba2Attn 250M. <br>
This is our third-generation model, featuring the new Mamba2 architecture with attention layers. <br>
If you're interested in the technical details covering the training process, architecture, and performance, <a href="https://outeai.com/blog/lite-oute-2-mamba2attn" target="_blank">read the full blog post here</a>.

## Model Variants
- [Lite-Oute-2-Mamba2Attn-Instruct](https://huggingface.co/OuteAI/Lite-Oute-2-Mamba2Attn-Instruct)
- [Lite-Oute-2-Mamba2Attn-Base](https://huggingface.co/OuteAI/Lite-Oute-2-Mamba2Attn-Base)

## Training Details
The model was pre-trained on 30 billion tokens using a balanced mixture of datasets:
- **50% dclm-baseline-1.0**
- **50% fineweb-edu**

Base model training was conducted on single NVIDIA 4090 and NVIDIA H100 GPUs, with the following key parameters:
- **Max learning rate:** 4e-4
- **Min learning rate:** 1e-4
- **Block size:** 4096
- **Token batches:** ~100k tokens
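
For intuition, here is a minimal sketch of how the two learning-rate endpoints could fit together as a schedule. The cosine shape and the step count are our assumptions for illustration; the card only states the maximum and minimum values.

```python
import math

# Hypothetical cosine decay between the stated endpoints (4e-4 -> 1e-4).
# The schedule *shape* is an assumption; only the endpoints come from the card.
MAX_LR, MIN_LR = 4e-4, 1e-4

def lr_at(step: int, total_steps: int) -> float:
    progress = step / total_steps  # fraction of training completed
    return MIN_LR + 0.5 * (MAX_LR - MIN_LR) * (1 + math.cos(math.pi * progress))

print(lr_at(0, 10_000))       # 4e-4 at the start
print(lr_at(10_000, 10_000))  # 1e-4 at the end
```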

For instruction training, we first trained the model with supervised fine-tuning (SFT), then further refined it using Direct Preference Optimization (DPO).
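
As a rough illustration of that two-stage recipe, here is a minimal sketch using the TRL library. The dataset names are placeholders and most hyperparameters are omitted: the card does not disclose the actual instruction data or settings, and exact TRL argument names vary between releases.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer, SFTConfig, SFTTrainer

model_id = "OuteAI/Lite-Oute-2-Mamba2Attn-Base"
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Stage 1: supervised fine-tuning (SFT) on instruction/chat examples.
sft_trainer = SFTTrainer(
    model=model,
    train_dataset=load_dataset("your/sft-dataset", split="train"),  # placeholder
    args=SFTConfig(output_dir="sft-checkpoint"),
    processing_class=tokenizer,  # `tokenizer=` in older TRL releases
)
sft_trainer.train()

# Stage 2: preference refinement with DPO on (prompt, chosen, rejected) pairs.
dpo_trainer = DPOTrainer(
    model=sft_trainer.model,
    train_dataset=load_dataset("your/preference-dataset", split="train"),  # placeholder
    args=DPOConfig(output_dir="dpo-checkpoint"),
    processing_class=tokenizer,
)
dpo_trainer.train()
```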

## Benchmark Results
<table>
  <tr>
    <th>Benchmark</th>
    <th>Lite-Oute-2-Mamba2Attn-250M-Instruct</th>
    <th>Lite-Oute-1-300M-Instruct</th>
    <th>Lite-Mistral-150M-v2-Instruct</th>
  </tr>
  <tr>
    <td>ARC-C (0-shot)</td>
    <td class="best">26.71</td>
    <td>26.02</td>
    <td>-</td>
  </tr>
  <tr>
    <td>ARC-E (0-shot)</td>
    <td class="best">53.70</td>
    <td>49.79</td>
    <td>39.52</td>
  </tr>
  <tr>
    <td>HellaSWAG (0-shot)</td>
    <td class="best">38.19</td>
    <td>34.50</td>
    <td>31.01</td>
  </tr>
  <tr>
    <td>MMLU (0-shot)</td>
    <td>25.13</td>
    <td>24.00</td>
    <td class="best">25.28</td>
  </tr>
  <tr>
    <td>OpenBookQA (0-shot)</td>
    <td class="best">32.20</td>
    <td class="best">32.20</td>
    <td>28.40</td>
  </tr>
  <tr>
    <td>PIQA (0-shot)</td>
    <td class="best">66.59</td>
    <td>65.40</td>
    <td>60.23</td>
  </tr>
  <tr>
    <td>Winogrande (0-shot)</td>
    <td>53.28</td>
    <td class="best">53.75</td>
    <td>51.78</td>
  </tr>
  <tr>
    <td>ARC-C (5-shot)</td>
    <td class="best">27.82</td>
    <td>26.37</td>
    <td>-</td>
  </tr>
  <tr>
    <td>ARC-E (5-shot)</td>
    <td class="best">57.28</td>
    <td>51.43</td>
    <td>46.30</td>
  </tr>
  <tr>
    <td>HellaSWAG (5-shot)</td>
    <td class="best">37.71</td>
    <td>34.93</td>
    <td>30.73</td>
  </tr>
  <tr>
    <td>MMLU (5-shot)</td>
    <td>25.81</td>
    <td class="best">25.87</td>
    <td>-</td>
  </tr>
  <tr>
    <td>OpenBookQA (5-shot)</td>
    <td class="best">31.80</td>
    <td>31.40</td>
    <td>28.00</td>
  </tr>
  <tr>
    <td>PIQA (5-shot)</td>
    <td class="best">66.38</td>
    <td>65.07</td>
    <td>62.68</td>
  </tr>
  <tr>
    <td>Winogrande (5-shot)</td>
    <td>51.14</td>
    <td class="best">52.01</td>
    <td>51.30</td>
  </tr>
</table>
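
The card does not state which harness produced these numbers. Assuming EleutherAI's lm-evaluation-harness, a reproduction attempt might look like the following; the task names and flags are our guesses, not confirmed by the card:

```bash
pip install lm-eval
lm_eval --model hf \
  --model_args pretrained=OuteAI/Lite-Oute-2-Mamba2Attn-Instruct,trust_remote_code=True \
  --tasks arc_challenge,arc_easy,hellaswag,mmlu,openbookqa,piqa,winogrande \
  --num_fewshot 0 \
  --batch_size 8
```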

## Interfacing with the Instruct Model
> [!IMPORTANT]
> To ensure optimal performance, please use the following template when interacting with the model:

```
<s>system\n
{System message}</s>
<s>user\n
{Your question or message}</s>
<s>assistant\n
{The model's response}</s>
```

When prompting the model for a response, your input should be structured as follows:

```
<s>system\n
{System message}</s>
<s>user\n
{Your question or message}</s>
<s>assistant\n
```

This format ensures the best performance, as the model was fine-tuned using this specific template.

### Recommended Settings
- **Temperature:** 0.1 - 0.4
- **Repetition Penalty:** 1.1 - 1.14

If you're using a Hugging Face tokenizer, make sure to use `tokenizer.apply_chat_template` or adjust your interface to match the template above.
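
If you're not using the Hugging Face tokenizer, the prompt can also be assembled by hand. The helper below is our illustration, not part of the card; verify the exact whitespace against the tokenizer's chat template before relying on it.

```python
# Hypothetical helper mirroring the template above (our illustration).
# Check exact newlines against tokenizer.apply_chat_template output.
def build_prompt(system_message: str, user_message: str) -> str:
    return (
        f"<s>system\n{system_message}</s>\n"
        f"<s>user\n{user_message}</s>\n"
        "<s>assistant\n"
    )

print(build_prompt("You are a helpful assistant.", "Who wrote the play Romeo and Juliet?"))
```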

## Let's look at some real-world examples

```
<s>user
What are three key benefits of electric vehicles compared to traditional gasoline-powered cars?</s>
<s>assistant
Electric vehicles have several advantages over gasoline-powered cars. Here are three key benefits:
1. Efficient: Electric vehicles use less fuel and emit fewer emissions than gasoline-powered cars, which can save consumers money on their monthly bills.
2. Fuel economy: Electric vehicles produce more power per mile driven, which means they consume less fuel overall. This helps reduce the need for refueling stations and reduces carbon emissions from transportation.
3. Flexibility: Electric vehicles can be adapted to different driving conditions, such as snowy or rainy days, and can accommodate different types of terrain. This makes them easier to operate in urban areas with limited parking spaces.
Overall, electric vehicles offer a range of benefits that make them an attractive choice for many drivers. They provide a cost-effective alternative to gasoline-powered cars, reducing the need for frequent refueling stations, and making it easier to drive long distances without having to worry about running out of gas.</s>
```

```
<s>user
Who wrote the play "Romeo and Juliet"?</s>
<s>assistant
The play "Romeo and Juliet" by William Shakespeare is a classic example of a tragic love story. It tells the story of Romeo and Juliet, a young couple who fall in love with each other but ultimately lose their lives due to their respective flaws. The play was written during the Renaissance period and it has been performed around the world since then.</s>
```

## Usage with Hugging Face transformers
Model weights were converted to be Hugging Face compatible, with custom modeling files included due to the lack of official support for Mamba2 attention layers.

To speed up inference, we recommend installing mamba-ssm and flash attention 2.

mamba-ssm:
```bash
pip install "causal-conv1d>=1.4.0"
pip install mamba-ssm
```

flash attention 2:
```bash
pip install flash-attn --no-build-isolation
```
212
+ ```python
213
+ import torch
214
+ from transformers import AutoModelForCausalLM, AutoTokenizer
215
+ device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
216
+ model = AutoModelForCausalLM.from_pretrained(
217
+ "OuteAI/Lite-Oute-2-Mamba2Attn-Instruct",
218
+ # To allow custom modeling files
219
+ trust_remote_code=True,
220
+
221
+ # If you have installed flash attention 2
222
+ # attn_implementation="flash_attention_2",
223
+ )
224
+ model.to(device)
225
+ tokenizer = AutoTokenizer.from_pretrained("OuteAI/Lite-Oute-2-Mamba2Attn-Instruct")
226
+
227
+ def generate_response(message: str, temperature: float = 0.1, repetition_penalty: float = 1.12) -> str:
228
+ # Apply the chat template and convert to PyTorch tensors
229
+ messages = [
230
+ {"role": "system", "content": "You are an AI assistant. You will be given a task. You must generate a detailed answer."},
231
+ {"role": "user", "content": message}
232
+ ]
233
+ input_ids = tokenizer.apply_chat_template(
234
+ messages, add_generation_prompt=True, return_tensors="pt"
235
+ ).to(device)
236
+ # Generate the response
237
+ output = model.generate(
238
+ input_ids,
239
+ max_length=512,
240
+ temperature=temperature,
241
+ repetition_penalty=repetition_penalty,
242
+ do_sample=True
243
+ )
244
+ # Decode the generated output
245
+ generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
246
+ return generated_text
247
+ message = "Who wrote the play Romeo and Juliet?"
248
+ response = generate_response(message)
249
+ print(response)
250
+ ```
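
As an optional addition (our suggestion, not from the original card), `TextStreamer` from transformers prints tokens to stdout as they are generated, which is convenient for interactive use:

```python
from transformers import TextStreamer

# Continues from the example above (model, tokenizer, device already defined).
input_ids = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Who wrote the play Romeo and Juliet?"}],
    add_generation_prompt=True, return_tensors="pt",
).to(device)

streamer = TextStreamer(tokenizer, skip_prompt=True)  # print new tokens as they arrive
model.generate(
    input_ids,
    max_length=512,
    do_sample=True,
    temperature=0.1,
    repetition_penalty=1.12,
    streamer=streamer,
)
```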

## Fine-Tuning for Specific Tasks
For optimal task-specific performance, it is recommended to fine-tune the model on datasets relevant to the intended use case. <br>
Fine-tuning can significantly improve the model's accuracy, relevance, and depth of knowledge in specialized domains or for particular types of tasks. <br>
This process allows the model to adapt its pre-trained knowledge to the nuances and requirements of specific applications, potentially mitigating some of the weaknesses observed in general use.

## Conclusion
The Lite-Oute-2-Mamba2Attn-250M-Instruct model shows significant improvements over previous versions, particularly in zero-shot and few-shot learning tasks. <br>
Despite its smaller size, it outperforms older models in most benchmarks, demonstrating better common-sense reasoning, language understanding, and general knowledge. <br>
While it can provide coherent responses to various queries, it has limitations due to its small size (250 million parameters). <br>
Users should be aware that it may produce incorrect outputs, and it should not be relied upon for critical decisions without verification. <br>
It's best suited for lightweight applications or as a basis for further development.

## Disclaimer
By using this model, you acknowledge that you understand and assume the risks associated with its use. You are solely responsible for ensuring compliance with all applicable laws and regulations. We disclaim any liability for problems arising from the use of this open-source model, including but not limited to direct, indirect, incidental, consequential, or punitive damages. We make no warranties, express or implied, regarding the model's performance, accuracy, or fitness for a particular purpose. Your use of this model is at your own risk, and you agree to hold harmless and indemnify us, our affiliates, and our contributors from any claims, damages, or expenses arising from your use of the model.