Update README.md - Add Model Details

#8
by Citaman
Files changed (1)
  1. README.md +56 -2
README.md CHANGED
@@ -2,8 +2,62 @@
  license: apache-2.0
  ---
  # Grok-1

- This repository contains the weights of the Grok-1 open-weights model.
+ _This repository contains the weights of the Grok-1 open-weights model._
+
+ ╔══════════════════════════╗
+ β•‘                 _______  β•‘
+ β•‘            /\   |_   _|  β•‘
+ β•‘  __  __   /  \    | |    β•‘
+ β•‘  \ \/ /  / /\ \   | |    β•‘
+ β•‘   >  <  / ____ \ _| |_   β•‘
+ β•‘  /_/\_\/_/    \_\_____|  β•‘
+ β•‘                          β•‘
+ β•‘  Understand the Universe β•‘
+ β•‘      [https://x.ai]      β•‘
+ β•šβ•β•β•β•β•β•β•β•β•β•β•β•β•—β•”β•β•β•β•β•β•β•β•β•β•β•β•β•
+    β•”β•β•β•β•β•β•β•β•β•β•β•šβ•β•β•β•β•β•β•β•β•β•β•—
+    β•‘  xAI Grok-1 (314B)  β•‘
+    β•šβ•β•β•β•β•β•β•β•β•β•—β•”β•β•β•β•β•β•β•β•β•β•β•
+ β•”β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•šβ•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•—
+ β•‘ 314B parameter Mixture of Experts model    β•‘
+ β•‘ - Base model (not finetuned)               β•‘
+ β•‘ - 8 experts (2 active)                     β•‘
+ β•‘ - 86B active parameters                    β•‘
+ β•‘ - Apache 2.0 license                       β•‘
+ β•‘ - Code: https://github.com/xai-org/grok-1  β•‘
+ β•‘ - Happy coding!                            β•‘
+ β•šβ•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•
+
+
+ ## Model Configuration Details
+
+ **Vocabulary Size**: 131,072
+
+ **Special Tokens**:
+ - Pad Token: 0
+ - End of Sequence Token: 2
+
+ **Sequence Length**: 8192
+
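A quick way to sanity-check the vocabulary and special-token values above, assuming the `sentencepiece` package and the `tokenizer.model` file shipped in this repo (the IDs are only meaningful if the tokenizer actually defines them):

```python
import sentencepiece as spm

# Load the SentencePiece model referenced in the inference configuration below.
sp = spm.SentencePieceProcessor(model_file="./tokenizer.model")

print(sp.vocab_size())  # expected: 131072
print(sp.pad_id())      # expected: 0  (pad token)
print(sp.eos_id())      # expected: 2  (end-of-sequence token)
```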
+ ### **Model Architecture**: MoE
+ - **Embedding Size**: 6,144 (48 query heads Γ— 128 key size)
+ - **Layers**: 64
+ - **Experts**: 8
+ - **Selected Experts**: 2
+ - **Widening Factor**: 8
+ - **Key Size**: 128
+ - **Query Heads**: 48
+ - **Key-Value Heads**: 8
+ - **Activation Sharding**: along the data and model mesh axes
+
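The architecture block above is easier to scan as a single structure. A minimal sketch collecting the listed values (a hypothetical container for this PR, not the config classes from the grok-1 code repo):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GrokConfig:
    # Tokenizer / sequence settings
    vocab_size: int = 131_072
    pad_token: int = 0
    eos_token: int = 2
    sequence_len: int = 8_192
    # MoE transformer settings
    embedding_size: int = 6_144    # = 48 query heads * 128 key size
    num_layers: int = 64
    num_experts: int = 8
    num_selected_experts: int = 2  # 2 of 8 experts active per token
    widening_factor: int = 8
    key_size: int = 128
    num_q_heads: int = 48
    num_kv_heads: int = 8

cfg = GrokConfig()
assert cfg.embedding_size == cfg.num_q_heads * cfg.key_size
```

Routing only 2 of the 8 experts per token is what brings the 314B total parameters down to roughly 86B active parameters per forward pass.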
+ ### **Inference Configuration**:
+ - Batch Size per Device: 0.125 (one sequence per step across the 8 local devices)
+ - Tokenizer: `./tokenizer.model`
+ - Local Mesh: 1x8
+ - Between Hosts: 1x1
+
+
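The fractional batch size reads as a global batch of one sequence sharded over the local mesh; a back-of-the-envelope check (plain arithmetic, no framework assumed):

```python
# Mesh topology from the configuration above: 1x8 devices on one host,
# 1x1 between hosts -> 8 devices total.
local_mesh = (1, 8)
between_hosts = (1, 1)
num_devices = (local_mesh[0] * local_mesh[1]
               * between_hosts[0] * between_hosts[1])

batch_per_device = 0.125
global_batch = batch_per_device * num_devices
print(num_devices, global_batch)  # 8 1.0 -> one sequence per step
```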
+ ## Inference Details

  Make sure to download the `int8` checkpoint to the `checkpoints` directory and run

@@ -18,4 +72,4 @@ You should be seeing output from the language model.

  Due to the large size of the model (314B parameters), a multi-GPU machine is required to test the model with the example code.

- p.s. we're hiring: https://x.ai/career
+ **p.s. we're hiring: https://x.ai/career**
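For the inference steps, one way to fetch the `int8` checkpoint into `checkpoints` is via `huggingface_hub`; the `ckpt-0/*` pattern is an assumption about this repo's file layout, so adjust it to whatever the checkpoint folder is actually called:

```python
from huggingface_hub import snapshot_download

# Download only the checkpoint files from this repo into ./checkpoints.
snapshot_download(
    repo_id="xai-org/grok-1",
    allow_patterns=["ckpt-0/*"],  # assumed checkpoint folder name
    local_dir="checkpoints",
)
```

After the download, the code repo (https://github.com/xai-org/grok-1) installs its requirements with `pip install -r requirements.txt` and runs the example with `python run.py`.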