---
license: apache-2.0
---
<p style="font-size:20px;" align="center">
<div style="width: 100%; height: 300px; overflow: hidden; border-radius: 15px; margin: auto; position: relative;">
<img
src="https://cdn.shuttleai.app/thumbnail.png"
alt="ShuttleAI Thumbnail"
style="width: 100%; height: auto; display: block; margin: auto; position: absolute; top: 50%; left: 50%; transform: translate(-50%, -50%); object-fit: cover;">
</div>
<p align="center">
💻 <a href="https://shuttleai.app/" target="_blank">Use via API</a>
</p>
## Shuttle-2.5-mini (beta) [2024/07/27]
We are excited to introduce Shuttle-2.5-mini, our next-generation state-of-the-art language model designed to excel in complex chat, multilingual communication, reasoning, and agent tasks.
- **Shuttle-2.5-mini** is a fine-tuned version of [Mistral-Nemo-Base-2407](https://huggingface.co/mistralai/Mistral-Nemo-Base-2407), emulating the writing style of Claude 3 models and thoroughly trained on role-playing data.
## Model Details
* **Model Name**: Shuttle-2.5-mini
* **Developed by**: ShuttleAI Inc.
* **Base Model**: [Mistral-Nemo-Base-2407](https://huggingface.co/mistralai/Mistral-Nemo-Base-2407)
* **Parameters**: 12B
* **Language(s)**: Multilingual
* **Repository**: [https://huggingface.co/shuttleai](https://huggingface.co/shuttleai)
* **Fine-Tuned Model**: [https://huggingface.co/shuttleai/shuttle-2.5-mini](https://huggingface.co/shuttleai/shuttle-2.5-mini)
* **Paper**: Shuttle-2.5-mini (Upcoming)
* **License**: Apache 2.0
## Base Model Architecture
**Mistral Nemo** is a transformer model with the following architecture choices:
- **Layers**: 40
- **Dimension**: 5,120
- **Head Dimension**: 128
- **Hidden Dimension**: 14,336
- **Activation Function**: SwiGLU
- **Number of Heads**: 32
- **Number of kv-heads**: 8 (GQA)
- **Vocabulary Size**: 2^17 (approximately 128k)
- **Rotary Embeddings**: Theta = 1M
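
As a sanity check on the figures above, the parameter count can be estimated directly from the listed dimensions. This is a rough sketch, not an official accounting: it assumes no bias terms, untied input and output embeddings, two RMSNorm weight vectors per layer plus one final norm, and a feed-forward hidden dimension of 14,336 (the value in the base model's published configuration). The function name is illustrative only.

```python
# Rough parameter count for a Mistral-Nemo-style decoder.
# Assumptions (not stated in this README): no bias terms, untied input/output
# embeddings, two RMSNorm weight vectors per layer plus one final norm, and a
# hidden (feed-forward) dimension of 14,336.

LAYERS = 40
DIM = 5120
HEAD_DIM = 128
HIDDEN_DIM = 14336
N_HEADS = 32
N_KV_HEADS = 8
VOCAB_SIZE = 2 ** 17


def count_parameters() -> int:
    # Attention: Q and O use all heads; K and V use only the kv-heads (GQA).
    q_proj = DIM * N_HEADS * HEAD_DIM
    k_proj = DIM * N_KV_HEADS * HEAD_DIM
    v_proj = DIM * N_KV_HEADS * HEAD_DIM
    o_proj = N_HEADS * HEAD_DIM * DIM
    attention = q_proj + k_proj + v_proj + o_proj

    # SwiGLU feed-forward: gate, up, and down projections.
    feed_forward = 3 * DIM * HIDDEN_DIM

    norms = 2 * DIM
    per_layer = attention + feed_forward + norms

    embeddings = 2 * VOCAB_SIZE * DIM  # input embedding + output head
    final_norm = DIM
    return LAYERS * per_layer + embeddings + final_norm


total = count_parameters()
print(f"{total:,} parameters (~{total / 1e9:.1f}B)")
# prints: 12,247,782,400 parameters (~12.2B)
```

Under these assumptions the feed-forward blocks account for roughly 8.8B of the total, and the two embedding matrices for about 1.3B.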
### Key Features
- Released under the Apache 2.0 License
- Trained with a 128k context window
- Pretrained on a large proportion of multilingual and code data
- Fine-tuned to emulate the prose quality of Claude 3 models and trained extensively on role-play data
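
The 128k context window interacts with the grouped-query attention (GQA) setting listed under Base Model Architecture: the key/value cache grows linearly with sequence length, and using 8 kv-heads instead of 32 shrinks it by a factor of four. The sketch below estimates the cache size for a single full-length sequence. It assumes 16-bit cache entries and reads "128k" as 131,072 tokens; neither is stated in this README, and `kv_cache_bytes` is a hypothetical helper.

```python
# Back-of-the-envelope KV-cache size for one sequence.
# Assumptions: 16-bit cache entries (2 bytes) and "128k" read as 131,072 tokens.

LAYERS = 40
HEAD_DIM = 128
N_HEADS = 32
N_KV_HEADS = 8


def kv_cache_bytes(seq_len: int, kv_heads: int, bytes_per_value: int = 2) -> int:
    # One key vector and one value vector per layer, per kv-head, per token.
    per_token = 2 * LAYERS * kv_heads * HEAD_DIM * bytes_per_value
    return per_token * seq_len


context = 128 * 1024
gqa = kv_cache_bytes(context, N_KV_HEADS)
mha = kv_cache_bytes(context, N_HEADS)  # hypothetical: every head keeps its own K/V

print(f"GQA (8 kv-heads): {gqa / 2**30:.0f} GiB")
print(f"Full multi-head (32 heads): {mha / 2**30:.0f} GiB")
# prints: GQA (8 kv-heads): 20 GiB
# prints: Full multi-head (32 heads): 80 GiB
```

These numbers cover the cache only. Model weights and activations add to the total, so actual memory use depends on the serving stack and precision.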
## Fine-Tuning Details
- **Training Setup**: Trained on 200 million tokens for 24 hours across 2 epochs using 4 A100 PCIe GPUs.
## Prompting
Shuttle-2.5-mini uses ChatML as its prompting format:
```
<|im_start|>system
You are a pirate! Yardy harr harr!<|im_end|>
<|im_start|>user
Where are you currently?<|im_end|>
<|im_start|>assistant
Look ahoy ye scallywag! We're on the high seas!<|im_end|>
```
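
The format above can be assembled programmatically. The following is a minimal sketch, assuming each turn is terminated by `<|im_end|>` followed by a newline and that the prompt ends with an open assistant turn for the model to complete. `build_chatml_prompt` is a hypothetical helper, not part of any ShuttleAI library.

```python
from typing import Dict, List


def build_chatml_prompt(
    messages: List[Dict[str, str]], add_generation_prompt: bool = True
) -> str:
    """Render a list of {"role", "content"} dicts as a ChatML prompt string."""
    parts = []
    for message in messages:
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Leave the assistant turn open so the model writes the reply.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are a pirate! Yardy harr harr!"},
    {"role": "user", "content": "Where are you currently?"},
]
prompt = build_chatml_prompt(messages)
print(prompt)
```

If the model repository ships a chat template with its tokenizer, that template is likely the safer choice, since it encodes the exact whitespace and special tokens used during fine-tuning.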