### Micro Mistral

This is a small Mistral model with 6 layers.

It is similar to the smol llama variants in that it uses grouped-query attention (GQA) and tied embeddings.
The difference is that it follows the Mistral-style architecture, which adds sliding window attention on top of GQA.
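
To make the sliding window idea concrete, here is a minimal sketch of a causal sliding-window attention mask in plain Python. The function name `sliding_window_causal_mask` and the values `seq_len=6` and `window=3` are illustrative assumptions, not taken from this model's config. The convention assumed here is that each token sees itself plus the previous `window - 1` tokens; real implementations may differ by one position.

```python
def sliding_window_causal_mask(seq_len: int, window: int) -> list[list[bool]]:
    """Return mask[i][j] == True when query position i may attend to key position j.

    Assumed convention: a token attends to itself and to at most
    `window - 1` earlier tokens, and never to future tokens.
    """
    return [
        [j <= i and i - j < window for j in range(seq_len)]
        for i in range(seq_len)
    ]


# Hypothetical sizes chosen only to keep the printout small.
mask = sliding_window_causal_mask(seq_len=6, window=3)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
# x.....
# xx....
# xxx...
# .xxx..
# ..xxx.
# ...xxx
```

Each row has at most `window` allowed positions, so the attention cost per token stays bounded as the sequence grows.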

Combining GQA with tied embeddings yields an efficient 0.5B-parameter model on the Mistral architecture, which is supported in downstream applications.
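
As a rough illustration of why GQA and tied embeddings save parameters, the sketch below counts the weights of a Mistral-style decoder (no biases, three MLP projections, RMSNorm weights). Only the 6-layer depth comes from this README. Every other number in `TinyMistralShape` is a hypothetical placeholder, so the totals it prints are not this model's actual size.

```python
from dataclasses import dataclass, replace


@dataclass
class TinyMistralShape:
    # Hypothetical placeholder values, NOT this model's real config
    # (only num_layers=6 is stated in the README).
    vocab_size: int = 32000
    hidden_size: int = 1024
    intermediate_size: int = 4096
    num_layers: int = 6
    num_heads: int = 16
    num_kv_heads: int = 4  # fewer KV heads than query heads = GQA
    tie_embeddings: bool = True


def count_params(s: TinyMistralShape) -> int:
    """Count weights for an assumed Mistral-style decoder without biases."""
    head_dim = s.hidden_size // s.num_heads
    kv_dim = s.num_kv_heads * head_dim
    # q_proj and o_proj are hidden x hidden; k_proj and v_proj are hidden x kv_dim.
    attn = 2 * s.hidden_size * s.hidden_size + 2 * s.hidden_size * kv_dim
    # gate_proj, up_proj, and down_proj.
    mlp = 3 * s.hidden_size * s.intermediate_size
    # Two RMSNorm weight vectors per layer.
    norms = 2 * s.hidden_size
    embed = s.vocab_size * s.hidden_size
    # With tied embeddings the output head reuses the input embedding matrix.
    lm_head = 0 if s.tie_embeddings else embed
    final_norm = s.hidden_size
    return s.num_layers * (attn + mlp + norms) + embed + lm_head + final_norm


base = TinyMistralShape()
untied = replace(base, tie_embeddings=False)
full_mha = replace(base, num_kv_heads=base.num_heads)

print("GQA + tied embeddings:", count_params(base))
print("saved by tying embeddings:", count_params(untied) - count_params(base))
print("saved by GQA vs. full multi-head:", count_params(full_mha) - count_params(base))
```

Tying removes one full `vocab_size x hidden_size` matrix, which matters most in small models where the embedding table is a large share of the total.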

#### Dataset

- Minipile
- Instruct
- Math
- OpenOrca
- Synthetic Data

TODO: Complete Dataset section