---
license: openrail
datasets:
- JeanKaddour/minipile
- Open-Orca/OpenOrca
language:
- en
---
Micro Mistral

This is a small Mistral model with 6 layers.

It is similar to the smol llama variants, which use grouped-query attention (GQA) and tied embeddings, except that it uses a Mistral-style architecture with GQA and sliding window attention.
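
To illustrate what sliding window attention restricts, here is a minimal sketch of a causal mask in which each position only sees the most recent few positions. The function name, the sequence length, and the window size are hypothetical and chosen for illustration; this sketch also assumes the window counts the current token, which may differ from the convention a given implementation uses.

```python
def sliding_window_causal_mask(seq_len: int, window: int) -> list[list[bool]]:
    """Return mask[i][j] == True when query position i may attend to key position j.

    Assumption: the window includes the current token, so position i sees
    positions i - window + 1 through i (clipped at 0).
    """
    return [
        [j <= i and i - j < window for j in range(seq_len)]
        for i in range(seq_len)
    ]


# Hypothetical sizes, small enough to read the pattern by eye.
mask = sliding_window_causal_mask(seq_len=6, window=3)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

Each printed row is one query position; the band of `x` marks stays at most `window` wide, so the attention cost per token stops growing once the sequence is longer than the window.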

This architecture combines GQA and tied embeddings to create an efficient 0.5B model that uses the Mistral architecture, which is supported in downstream applications.
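
As a rough sketch of why GQA and tied embeddings save parameters, the snippet below counts the weights of a Mistral-style decoder (no biases, gated MLP, RMSNorm). Only the layer count of 6 comes from this card; every other dimension, the class name `TinyMistralShape`, and the function `count_parameters` are hypothetical placeholders, so the totals are not expected to match this model.

```python
from dataclasses import dataclass, replace


@dataclass
class TinyMistralShape:
    # Hypothetical dimensions for illustration; only num_layers comes from the card.
    vocab_size: int = 32_000
    hidden_size: int = 2_048
    intermediate_size: int = 5_632
    num_layers: int = 6
    num_heads: int = 32
    num_kv_heads: int = 8  # fewer KV heads than query heads means GQA
    tie_embeddings: bool = True


def count_parameters(s: TinyMistralShape) -> int:
    head_dim = s.hidden_size // s.num_heads
    kv_dim = s.num_kv_heads * head_dim
    # q_proj and o_proj are hidden x hidden; k_proj and v_proj shrink with GQA.
    attn = 2 * s.hidden_size * s.hidden_size + 2 * s.hidden_size * kv_dim
    # Gated MLP: gate, up, and down projections.
    mlp = 3 * s.hidden_size * s.intermediate_size
    # Two RMSNorm weight vectors per layer.
    norms = 2 * s.hidden_size
    embed = s.vocab_size * s.hidden_size
    # With tied embeddings the output head reuses the input embedding matrix.
    lm_head = 0 if s.tie_embeddings else embed
    final_norm = s.hidden_size
    return s.num_layers * (attn + mlp + norms) + embed + lm_head + final_norm


base = TinyMistralShape()
untied = replace(base, tie_embeddings=False)
full_mha = replace(base, num_kv_heads=base.num_heads)

print("GQA + tied embeddings:", count_parameters(base))
print("GQA + untied embeddings:", count_parameters(untied))
print("MHA + tied embeddings:", count_parameters(full_mha))
```

Under these assumed dimensions, untying the embeddings adds one full `vocab_size * hidden_size` matrix, and switching from GQA back to full multi-head attention enlarges the key and value projections in every layer.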

Dataset

Minipile Instruct Math OpenOrca Synthetic Data

TODO: Complete Dataset section