shrimai19 committed
Commit f96a64a
1 Parent(s): 9a02a14

Update README.md

Files changed (1)
  1. README.md +29 -13
README.md CHANGED
@@ -1,4 +1,16 @@
- Mistral-NeMo is a Large Language Model (LLM) composed of 12B parameters, trained jointly by Mistral AI and NVIDIA. It significantly outperforms existing models smaller or similar in size.

**Key features**
- Released under the Apache 2 License
@@ -6,14 +18,17 @@ Mistral-NeMo is a Large Language Model (LLM) composed of 12B parameters, trained
- Trained with a 128k context window
- Trained on a large proportion of multilingual and code data

- ---
- license: apache-2.0
- ---

- ---
- Model Architecture

- Mistral-NeMo is a transformer model, with the following architecture choices:

- Layers: 40
- Dim: 5,120
@@ -24,11 +39,12 @@ Mistral-NeMo is a transformer model, with the following architecture choices:
- Number of kv-heads: 8 (GQA)
- Rotary embeddings (theta = 1M)
- Vocabulary size: 2**17 ~= 128k

- ---
-
- Main benchmarks

- HellaSwag (0-shot): 83.5%
- Winogrande (0-shot): 76.8%
- OpenBookQA (0-shot): 60.6%
@@ -38,9 +54,9 @@ Main benchmarks
- TriviaQA (5-shot): 73.8%
- NaturalQuestions (5-shot): 31.2%

- Multilingual benchmarks

- MMLU
- French: 62.3%
- German: 62.7%
- Spanish: 64.6%
@@ -48,4 +64,4 @@ MMLU
- Portuguese: 63.3%
- Russian: 59.2%
- Chinese: 59.0%
- -Japanese: 59.0%
 
+ ---
+ license: apache-2.0
+ tags:
+ - nvidia
+ ---
+
+ ## Mistral-NeMo-12B-Base
+
+ [![Model architecture](https://img.shields.io/badge/Model%20Arch-Transformer%20Decoder-green)](#model-architecture)[![Model size](https://img.shields.io/badge/Params-12B-green)](#model-architecture)[![Language](https://img.shields.io/badge/Language-Multilingual-green)](#datasets)
+
+ ### Model Overview:
+
+ Mistral-NeMo-12B-Base is a Large Language Model (LLM) composed of 12B parameters, trained jointly by NVIDIA and Mistral AI. It significantly outperforms existing models smaller or similar in size.

**Key features**
- Released under the Apache 2 License
- Trained with a 128k context window
- Trained on a large proportion of multilingual and code data
+ ### Intended use
+
+ Mistral-NeMo-12B-Base is a completion model intended for use in 80+ programming languages and designed for global, multilingual applications. It is fast, trained on function-calling, has a large context window, and is particularly strong in English, French, German, Spanish, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, and Hindi. It is compatible with the [NVIDIA NeMo Framework](https://docs.nvidia.com/nemo-framework/index.html). For best performance on a given task, users are encouraged to customize the model using the NeMo Framework suite of customization tools, including Parameter-Efficient Fine-Tuning (P-tuning, Adapters, LoRA, and more) and Model Alignment (SFT, SteerLM, RLHF, and more) with [NeMo-Aligner](https://github.com/NVIDIA/NeMo-Aligner). Refer to the [documentation](https://docs.nvidia.com/nemo-framework/user-guide/latest/llms/nemotron/index.html) for examples.
+
+ **Model Developer:** [NVIDIA](https://www.nvidia.com/en-us/) and [MistralAI](https://mistral.ai/)
+
+ **Model Dates:** Mistral-NeMo-12B-Base was trained between 2023 and July 2024.
+
+ ### Model Architecture:
+
+ Mistral-NeMo-12B-Base is a transformer model, with the following architecture choices:

- Layers: 40
- Dim: 5,120
- Number of kv-heads: 8 (GQA)
- Rotary embeddings (theta = 1M)
- Vocabulary size: 2**17 ~= 128k
+
+ **Architecture Type:** Transformer Decoder (auto-regressive language model)

+ ### Evaluation Results

+ **Main Benchmarks**
- HellaSwag (0-shot): 83.5%
- Winogrande (0-shot): 76.8%
- OpenBookQA (0-shot): 60.6%
- TriviaQA (5-shot): 73.8%
- NaturalQuestions (5-shot): 31.2%

+ **Multilingual benchmarks**

+ Multilingual MMLU in 5-shot setting:
- French: 62.3%
- German: 62.7%
- Spanish: 64.6%
- Portuguese: 63.3%
- Russian: 59.2%
- Chinese: 59.0%
+ - Japanese: 59.0%
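
The "Intended use" section added above names the NVIDIA NeMo Framework as the supported customization path but does not show how to run plain completions. Below is a minimal sketch using Hugging Face `transformers` instead; the repo id `nvidia/Mistral-NeMo-12B-Base` and the existence of a transformers-loadable checkpoint under it are assumptions on my part, not something stated in the diff, so treat this as an illustration rather than the documented workflow.

```python
# Hypothetical completion example: assumes a transformers-compatible checkpoint
# is published under this repo id. The card itself only guarantees NeMo Framework
# compatibility, so this is a sketch, not the official loading path.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nvidia/Mistral-NeMo-12B-Base"  # assumed repo id, not stated in the diff

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",  # load in the checkpoint's native precision
    device_map="auto",   # requires `accelerate`; shards across available devices
)

# Base (completion) model: plain text in, continuation out -- no chat template.
prompt = "Mistral-NeMo is a 12B-parameter transformer trained with a 128k context window, which means"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

Being a base (completion) model, it simply continues the prompt; instruction-style prompting is out of scope for this card.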
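The "Model Architecture" bullets (in both the old and the new text) list layers, model width, kv-heads, rotary theta, and vocabulary size, but not enough to see where the 12B figure comes from. The sketch below fills in assumed values for the head count, head dimension, and feed-forward width (none of which appear in the diff excerpt) to check that the listed choices are consistent with roughly 12B parameters, and to show what GQA with 8 kv-heads implies for the KV cache at the 128k context window.

```python
# Rough parameter and KV-cache estimate for the listed architecture.
# Values marked "assumed" are NOT in the card excerpt above; they are plausible
# fill-ins so the arithmetic is concrete.
n_layers   = 40         # "Layers: 40"
d_model    = 5_120      # "Dim: 5,120"
n_kv_heads = 8          # "Number of kv-heads: 8 (GQA)"
vocab      = 2**17      # "Vocabulary size: 2**17 ~= 128k" (= 131,072)

n_heads    = 32         # assumed
head_dim   = 128        # assumed
d_ff       = 14_336     # assumed gated-MLP width

# Attention: Q and O project to n_heads*head_dim; K and V to n_kv_heads*head_dim (GQA).
attn = d_model * n_heads * head_dim * 2 + d_model * n_kv_heads * head_dim * 2
# Gated MLP: gate, up, and down projections.
mlp = 3 * d_model * d_ff
per_layer = attn + mlp

embeddings = vocab * d_model   # input embeddings
lm_head    = vocab * d_model   # assumed untied output projection

total = n_layers * per_layer + embeddings + lm_head
print(f"~{total / 1e9:.1f}B parameters")  # ~12.2B, consistent with the Params-12B badge

# KV cache at the full 128k context in bf16 (2 bytes), K and V per layer:
seq_len = 128 * 1024
kv_bytes = 2 * n_layers * n_kv_heads * head_dim * seq_len * 2
print(f"KV cache @ 128k context: ~{kv_bytes / 2**30:.0f} GiB")  # ~20 GiB, kept small by GQA
```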