---
tags:
- trl
---
# Fireball-Mistral-Nemo-12B-Philos

Supervised fine-tuned on a dataset of philosophy, math, coding, and languages.
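
A minimal usage sketch with Hugging Face `transformers` is shown below. The repo id and the use of the tokenizer's chat template are assumptions inferred from the model name and its Mistral-Nemo base, not tested against this checkpoint.

```python
# Minimal sketch, assuming the repo id "EpistemeAI/Fireball-Mistral-Nemo-12B-Philos"
# (inferred from the model name) and a standard Mistral-Nemo chat template.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "EpistemeAI/Fireball-Mistral-Nemo-12B-Philos"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # 12B parameters; bf16 halves memory vs fp32
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Summarize Kant's categorical imperative in two sentences."}
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```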

# Original Model Card

# Model Card for Mistral-Nemo-Instruct-2407

The Mistral-Nemo-Instruct-2407 Large Language Model (LLM) is an instruct fine-tuned version of [Mistral-Nemo-Base-2407](https://huggingface.co/mistralai/Mistral-Nemo-Base-2407). Trained jointly by Mistral AI and NVIDIA, it significantly outperforms existing models of smaller or similar size.

For more details about this model, please refer to our release [blog post](https://mistral.ai/news/mistral-nemo/).

## Key features

- Released under the **Apache 2 License**
- Pre-trained and instructed versions
- Trained with a **128k context window**
- Trained on a large proportion of **multilingual and code data**
- Drop-in replacement of Mistral 7B

## Model Architecture

Mistral Nemo is a transformer model with the following architecture choices:

- **Layers:** 40
- **Dim:** 5,120
- **Head dim:** 128
- **Hidden dim:** 14,336
- **Activation Function:** SwiGLU
- **Number of heads:** 32
- **Number of kv-heads:** 8 (GQA)
- **Vocabulary size:** 2^17 = 131,072 (~128k)
- **Rotary embeddings:** theta = 1M
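
For illustration, here is how those choices map onto a `transformers` configuration. This is a sketch assuming `MistralConfig` (the class Mistral-Nemo checkpoints load with in `transformers`), not the shipped `config.json`, and an explicit `head_dim` argument requires a recent `transformers` release.

```python
# Sketch: the architecture bullets above expressed as a transformers MistralConfig.
# Illustrative only; assumes MistralConfig and a transformers version that accepts
# an explicit head_dim (here it is not hidden_size / num_attention_heads).
from transformers import MistralConfig

config = MistralConfig(
    num_hidden_layers=40,     # Layers
    hidden_size=5120,         # Dim
    head_dim=128,             # Head dim (note: 5120 / 32 = 160, so it is set explicitly)
    intermediate_size=14336,  # Hidden dim of the SwiGLU feed-forward block
    hidden_act="silu",        # the gated SiLU used by SwiGLU
    num_attention_heads=32,   # Number of heads
    num_key_value_heads=8,    # Number of kv-heads (grouped-query attention)
    vocab_size=131072,        # 2**17
    rope_theta=1_000_000.0,   # Rotary embeddings, theta = 1M
)
```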

# Uploaded model

- **Developed by:** EpistemeAI