---
license: apache-2.0
---

<p style="font-size:20px;" align="center">
<div style="width: 100%; height: 300px; overflow: hidden; border-radius: 15px; margin: auto; position: relative;">
<img
  src="https://cdn.shuttleai.app/thumbnail.png"
  alt="ShuttleAI Thumbnail"
  style="width: 100%; height: auto; display: block; margin: auto; position: absolute; top: 50%; left: 50%; transform: translate(-50%, -50%); object-fit: cover;">
</div>

<p align="center">
💻 <a href="https://shuttleai.app/" target="_blank">Use via API</a>
</p>

## Shuttle-2.5-mini [2024/07/26]

We are excited to introduce Shuttle-2.5-mini, our next-generation, state-of-the-art language model designed to excel at complex chat, multilingual communication, reasoning, and agent tasks.

- **Shuttle-2.5-mini** is a fine-tuned version of [Mistral-Nemo-Base-2407](https://huggingface.co/mistralai/Mistral-Nemo-Base-2407) that emulates the writing style of Claude 3 models and was trained extensively on role-playing data.

## Model Details

* **Model Name**: Shuttle-2.5-mini
* **Developed by**: ShuttleAI Inc.
* **Base Model**: [Mistral-Nemo-Base-2407](https://huggingface.co/mistralai/Mistral-Nemo-Base-2407)
* **Parameters**: 12B
* **Language(s)**: Multilingual
* **Repository**: [https://huggingface.co/shuttleai](https://huggingface.co/shuttleai)
* **Fine-Tuned Model**: [https://huggingface.co/shuttleai/shuttle-2.5-mini](https://huggingface.co/shuttleai/shuttle-2.5-mini)
* **Paper**: Shuttle-2.5-mini (upcoming)
* **License**: Apache 2.0

## Base Model Architecture

**Mistral Nemo** is a transformer model with the following architecture choices:

- **Layers**: 40
- **Dimension**: 5,120
- **Head Dimension**: 128
- **Hidden Dimension**: 14,336
- **Activation Function**: SwiGLU
- **Number of Heads**: 32
- **Number of KV Heads**: 8 (GQA)
- **Vocabulary Size**: 2^17 = 131,072 (approximately 128k)
- **Rotary Embeddings**: theta = 1M

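As a sanity check, the dimensions above can be turned into a rough parameter count. This is a sketch under assumptions the card does not state: no bias terms, one RMSNorm weight vector per norm, and untied input and output embeddings. `NemoConfig`, `estimate_params`, and `gqa_group_size` are hypothetical helpers written for illustration, not part of any library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NemoConfig:
    """Architecture values copied from the list above."""
    n_layers: int = 40
    dim: int = 5_120
    head_dim: int = 128
    hidden_dim: int = 14_336
    n_heads: int = 32
    n_kv_heads: int = 8
    vocab_size: int = 2**17


def gqa_group_size(cfg: NemoConfig) -> int:
    # Number of query heads that share each key/value head under GQA.
    return cfg.n_heads // cfg.n_kv_heads


def estimate_params(cfg: NemoConfig, tied_embeddings: bool = False) -> int:
    """Rough parameter count.

    Assumptions (not stated in the model card): no bias terms, one RMSNorm
    weight vector per norm, untied input/output embeddings by default.
    """
    q_width = cfg.n_heads * cfg.head_dim      # 4,096: narrower than dim (5,120)
    kv_width = cfg.n_kv_heads * cfg.head_dim  # 1,024
    # Wq and Wo are dim x q_width; Wk and Wv are dim x kv_width.
    attn = 2 * cfg.dim * q_width + 2 * cfg.dim * kv_width
    # SwiGLU MLP has three matrices: gate, up, and down projections.
    mlp = 3 * cfg.dim * cfg.hidden_dim
    norms = 2 * cfg.dim                       # pre-attention and pre-MLP norms
    per_layer = attn + mlp + norms
    embeddings = cfg.vocab_size * cfg.dim * (1 if tied_embeddings else 2)
    final_norm = cfg.dim
    return cfg.n_layers * per_layer + embeddings + final_norm


cfg = NemoConfig()
print(gqa_group_size(cfg))                   # 4
print(f"{estimate_params(cfg) / 1e9:.1f}B")  # 12.2B
```

Under these assumptions the estimate comes to about 12.2B parameters. It also makes two details of the list explicit: with 32 heads of dimension 128, the query projection is 4,096 wide rather than matching the 5,120 model dimension, and each of the 8 KV heads is shared by 4 query heads.
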
### Key Features

- Released under the Apache 2.0 License
- Trained with a 128k context window
- Pretrained on a large proportion of multilingual and code data
- Fine-tuned to emulate the prose quality of Claude 3 models, with extensive training on role-play data

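The 128k context window interacts with the grouped-query attention listed in the architecture: only the 8 KV heads need to be cached per layer. Below is a back-of-the-envelope sketch of KV-cache size, assuming 16-bit cache entries, batch size 1, and no cache quantization or paging. `kv_cache_bytes` is a hypothetical helper, not an API of any inference library.

```python
def kv_cache_bytes(
    n_tokens: int,
    n_layers: int = 40,
    n_kv_heads: int = 8,
    head_dim: int = 128,
    bytes_per_value: int = 2,  # assumption: 16-bit (bf16/fp16) cache entries
    batch_size: int = 1,
) -> int:
    """Bytes needed for the key/value cache.

    The leading factor 2 counts one key and one value vector per KV head,
    per layer, per token. Defaults mirror the architecture list.
    """
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_value
    return batch_size * n_tokens * per_token


full_context = 128 * 1024  # 128k = 131,072 tokens
print(kv_cache_bytes(1) // 1024, "KiB per token")                   # 160 KiB per token
print(kv_cache_bytes(full_context) / 2**30, "GiB with GQA")          # 20.0 GiB with GQA
print(kv_cache_bytes(full_context, n_kv_heads=32) / 2**30, "GiB without GQA")  # 80.0 GiB without GQA
```

Under these assumptions a full 128k-token context needs about 20 GiB of cache, a quarter of what 32 independent KV heads would require. Model weights and activations come on top of this figure.
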
## Fine-Tuning Details

- **Training Setup**: Trained on 4x A100 GPUs for 2 epochs, totaling 24 hours.

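In compute terms, and assuming all four GPUs ran for the full 24 hours with both epochs taking roughly equal time, the budget works out to:

$$
4\ \text{GPUs} \times 24\ \text{h} = 96\ \text{GPU-hours}, \qquad \frac{96\ \text{GPU-hours}}{2\ \text{epochs}} = 48\ \text{GPU-hours per epoch}
$$
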
## Prompting

Shuttle-2.5-mini uses ChatML as its prompt format:

```
<|im_start|>system
You are a pirate! Yardy harr harr!<|im_end|>
<|im_start|>user
Where are you currently?<|im_end|>
<|im_start|>assistant
Look ahoy ye scallywag! We're on the high seas!<|im_end|>
```
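
When running the model directly rather than through the API, the prompt has to be assembled in this layout. A minimal sketch in Python, assuming the standard ChatML convention: each turn is wrapped in `<|im_start|>{role}` and `<|im_end|>`, with a newline after the role, and an open assistant header requests the next reply. `to_chatml` is a hypothetical helper; if the model repository ships a chat template, the tokenizer's `apply_chat_template` method in `transformers` is the safer choice.

```python
from typing import Dict, List

# Each message looks like {"role": "system" | "user" | "assistant", "content": "..."}.
Message = Dict[str, str]


def to_chatml(messages: List[Message], add_generation_prompt: bool = True) -> str:
    """Render chat messages in the ChatML layout shown above.

    With add_generation_prompt=True an open assistant header is appended so
    the model writes the next assistant turn; generation should then be
    stopped at the <|im_end|> token.
    """
    parts = []
    for msg in messages:
        role, content = msg["role"], msg["content"]
        if role not in {"system", "user", "assistant"}:
            raise ValueError(f"unsupported role: {role!r}")
        parts.append(f"<|im_start|>{role}\n{content}<|im_end|>\n")
    if add_generation_prompt:
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = to_chatml([
    {"role": "system", "content": "You are a pirate! Yardy harr harr!"},
    {"role": "user", "content": "Where are you currently?"},
])
print(prompt)
```

The printed prompt matches the example above up to the assistant header, leaving the assistant's reply for the model to generate.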