Update README.md
README.md (CHANGED)
---
license: apache-2.0
tags:
- nvidia
---

## Mistral-NeMo-12B-Base

[![Model architecture](https://img.shields.io/badge/Model%20Arch-Transformer%20Decoder-green)](#model-architecture)[![Model size](https://img.shields.io/badge/Params-12B-green)](#model-architecture)[![Language](https://img.shields.io/badge/Language-Multilingual-green)](#datasets)

### Model Overview:

Mistral-NeMo-12B-Base is a Large Language Model (LLM) composed of 12B parameters, trained jointly by NVIDIA and Mistral AI. It significantly outperforms existing models that are smaller or similar in size.

**Key features**
- Released under the Apache 2 License
- Trained with a 128k context window
- Trained on a large proportion of multilingual and code data

### Intended use

Mistral-NeMo-12B-Base is a completion model intended for use in over 80 programming languages and designed for global, multilingual applications. It is fast, trained on function calling, has a large context window, and is particularly strong in English, French, German, Spanish, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, and Hindi. It is compatible with the [NVIDIA NeMo Framework](https://docs.nvidia.com/nemo-framework/index.html). For best performance on a given task, users are encouraged to customize the model using the NeMo Framework suite of customization tools, including Parameter-Efficient Fine-Tuning (P-tuning, Adapters, LoRA, and more) and Model Alignment (SFT, SteerLM, RLHF, and more) using [NeMo-Aligner](https://github.com/NVIDIA/NeMo-Aligner). Refer to the [documentation](https://docs.nvidia.com/nemo-framework/user-guide/latest/llms/nemotron/index.html) for examples.
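
As a quick-start illustration only (the card itself includes no usage code), the sketch below loads the base model for plain text completion with Hugging Face `transformers`; the checkpoint id `nvidia/Mistral-NeMo-12B-Base`, the bf16 dtype, and `device_map="auto"` are assumptions, not official instructions.

```python
# Minimal completion sketch, assuming the checkpoint is published as
# "nvidia/Mistral-NeMo-12B-Base" and a transformers release with
# Mistral-NeMo support is installed. Not taken from the model card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nvidia/Mistral-NeMo-12B-Base"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # 12B params: bf16 roughly halves memory vs fp32
    device_map="auto",
)

# This is a base/completion model, so prompt with raw text (no chat template).
prompt = "Translate to French: 'The weather is nice today.' ->"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Customizing the model with the NeMo Framework tools mentioned above would follow that framework's own recipes rather than this transformers path.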

**Model Developer:** [NVIDIA](https://www.nvidia.com/en-us/) and [MistralAI](https://mistral.ai/)

**Model Dates:** Mistral-NeMo-12B-Base was trained between 2023 and July 2024.

### Model Architecture:

Mistral-NeMo-12B-Base is a transformer model, with the following architecture choices:

- Layers: 40
- Dim: 5,120
- Number of kv-heads: 8 (GQA)
- Rotary embeddings (theta = 1M)
- Vocabulary size: 2**17 ~= 128k

**Architecture Type:** Transformer Decoder (auto-regressive language model)
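
To make the arithmetic in the list above explicit (for example, that a 2**17 vocabulary is 131,072 tokens, i.e. roughly 128k), here is a small sketch summarising the stated hyperparameters; the class and field names are illustrative and do not come from any official configuration file.

```python
# Summary of the architecture choices listed above; values not stated in the
# card (head dim, hidden dim, activation, number of query heads) are left out.
from dataclasses import dataclass

@dataclass(frozen=True)
class MistralNemoBaseArch:  # hypothetical name, for illustration only
    num_layers: int = 40
    model_dim: int = 5_120
    num_kv_heads: int = 8            # grouped-query attention (GQA)
    rope_theta: float = 1_000_000.0  # rotary embedding base
    vocab_size: int = 2**17          # 131,072 tokens, i.e. ~128k

arch = MistralNemoBaseArch()
assert arch.vocab_size == 131_072
print(arch)
```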

### Evaluation Results

**Main Benchmarks**
- HellaSwag (0-shot): 83.5%
- Winogrande (0-shot): 76.8%
- OpenBookQA (0-shot): 60.6%
- TriviaQA (5-shot): 73.8%
- NaturalQuestions (5-shot): 31.2%

**Multilingual Benchmarks**

Multilingual MMLU in 5-shot setting:
- French: 62.3%
- German: 62.7%
- Spanish: 64.6%
- Portuguese: 63.3%
- Russian: 59.2%
- Chinese: 59.0%
- Japanese: 59.0%
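
The card does not say which harness produced the scores above. Purely as an illustration of how 0-shot and 5-shot numbers like these are typically gathered, the sketch below uses EleutherAI's lm-evaluation-harness; the task names, harness version, and checkpoint id are assumptions.

```python
# Illustrative evaluation sketch with lm-evaluation-harness (v0.4+ assumed);
# not the procedure behind the reported numbers.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=nvidia/Mistral-NeMo-12B-Base,dtype=bfloat16",
    tasks=["hellaswag", "winogrande", "openbookqa"],  # reported as 0-shot above
    num_fewshot=0,
    batch_size=8,
)
print(results["results"])

# The 5-shot rows (TriviaQA, NaturalQuestions) would be a second run with
# num_fewshot=5 and the corresponding task names (e.g. "triviaqa", "nq_open").
```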