---
license: apache-2.0
---

<p style="font-size:20px;" align="center">
<div style="width: 100%; height: 300px; overflow: hidden; border-radius: 15px; margin: auto; position: relative;">
    <img 
        src="https://cdn.shuttleai.app/thumbnail.png" 
        alt="ShuttleAI Thumbnail" 
        style="width: 100%; height: auto; display: block; margin: auto; position: absolute; top: 50%; left: 50%; transform: translate(-50%, -50%); object-fit: cover;">
</div>

<p align="center">
    💻 <a href="https://shuttleai.app/" target="_blank">Use via API</a>
</p>

## Shuttle-2.5-mini (beta) [2024/07/27]

We are excited to introduce Shuttle-2.5-mini, our next-generation state-of-the-art language model designed to excel in complex chat, multilingual communication, reasoning, and agent tasks.

- **Shuttle-2.5-mini** is a fine-tuned version of [Mistral-Nemo-Base-2407](https://huggingface.co/mistralai/Mistral-Nemo-Base-2407), tuned to emulate the writing style of Claude 3 models and trained extensively on role-play data.

## Model Details

* **Model Name**: Shuttle-2.5-mini
* **Developed by**: ShuttleAI Inc.
* **Base Model**: [Mistral-Nemo-Base-2407](https://huggingface.co/mistralai/Mistral-Nemo-Base-2407)
* **Parameters**: 12B
* **Language(s)**: Multilingual
* **Repository**: [https://huggingface.co/shuttleai](https://huggingface.co/shuttleai)
* **Fine-Tuned Model**: [https://huggingface.co/shuttleai/shuttle-2.5-mini](https://huggingface.co/shuttleai/shuttle-2.5-mini)
* **Paper**: Shuttle-2.5-mini (Upcoming)
* **License**: Apache 2.0

## Base Model Architecture

**Mistral Nemo** is a transformer model with the following architecture choices:

- **Layers**: 40
- **Dimension**: 5,120
- **Head Dimension**: 128
- **Hidden Dimension**: 14,336
- **Activation Function**: SwiGLU
- **Number of Heads**: 32
- **Number of kv-heads**: 8 (GQA)
- **Vocabulary Size**: 2^17 (approximately 128k)
- **Rotary Embeddings**: Theta = 1M
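
As a sanity check, the figures above can be combined into a rough parameter count. The sketch below is an estimate under stated assumptions, not an official breakdown: it assumes a Llama-style decoder with no bias terms, two norm weight vectors per layer plus a final norm, three SwiGLU projection matrices, and untied input and output embeddings. It also takes the feed-forward width as 14,336, the `intermediate_size` in the base model's published configuration. The function and argument names are illustrative only.

```python
def estimate_params(layers, dim, head_dim, hidden_dim, n_heads, n_kv_heads, vocab_size):
    """Rough parameter count for a Llama-style decoder with GQA and SwiGLU."""
    # Attention: Q and O projections span all heads; K and V span only the kv-heads.
    q_proj = dim * n_heads * head_dim
    kv_proj = 2 * dim * n_kv_heads * head_dim
    o_proj = n_heads * head_dim * dim
    attention = q_proj + kv_proj + o_proj

    # SwiGLU feed-forward block: gate, up, and down projections.
    mlp = 3 * dim * hidden_dim

    # Assumption: two norm weight vectors per layer (pre-attention, pre-MLP).
    norms = 2 * dim
    per_layer = attention + mlp + norms

    # Assumption: input embedding and output head are separate (untied) matrices.
    embeddings = 2 * vocab_size * dim
    final_norm = dim
    return layers * per_layer + embeddings + final_norm


total = estimate_params(
    layers=40,
    dim=5120,
    head_dim=128,
    hidden_dim=14336,  # assumption: intermediate_size from the base model config
    n_heads=32,
    n_kv_heads=8,
    vocab_size=2**17,
)
print(f"{total:,} parameters (~{total / 1e9:.1f}B)")
```

Under these assumptions the estimate comes to roughly 12.2 billion parameters, with the feed-forward blocks accounting for most of the total.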

### Key Features

- Released under the Apache 2.0 License
- Trained with a 128k context window
- Pretrained on a large proportion of multilingual and code data
- Fine-tuned to emulate the prose quality of Claude 3 models, with extensive training on role-play data
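
To give a sense of what the 128k context window costs at inference time, the sketch below estimates key-value cache size from the layer count, kv-head count, and head dimension listed in the architecture section. It assumes 16-bit cache values, a single sequence, and that "128k" means 131,072 tokens; `kv_cache_bytes` is a hypothetical helper, and the 32-kv-head figure is a hypothetical comparison without GQA, not a real configuration of this model.

```python
def kv_cache_bytes(seq_len, layers=40, n_kv_heads=8, head_dim=128, bytes_per_value=2):
    """Bytes needed to cache keys and values for one sequence."""
    # Factor of 2: one key vector and one value vector per kv-head, per layer, per token.
    per_token = 2 * layers * n_kv_heads * head_dim * bytes_per_value
    return seq_len * per_token


context = 128 * 1024  # assumption: "128k" means 131,072 tokens

gqa = kv_cache_bytes(context)                 # 8 kv-heads, as listed above
mha = kv_cache_bytes(context, n_kv_heads=32)  # hypothetical: one kv-head per query head

print(f"GQA (8 kv-heads):  {gqa / 2**30:.0f} GiB")
print(f"MHA (32 kv-heads): {mha / 2**30:.0f} GiB")
```

With 8 kv-heads instead of 32, the cache for a full-length sequence is a quarter of the size, which is the main practical benefit of GQA at long context.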

## Fine-Tuning Details

- **Training Setup**: Trained for 2 epochs on 200 million tokens, taking 24 hours on 4 A100 PCIe GPUs.
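
The training figures imply a rough throughput, sketched below. The source does not say whether 200 million tokens is the dataset size (seen twice over 2 epochs) or the total processed across both epochs, so both readings are computed; the helper name is hypothetical and the results are back-of-the-envelope estimates, not measured numbers.

```python
def tokens_per_gpu_second(tokens_processed, hours, n_gpus):
    """Average tokens processed per GPU per second over the whole run."""
    return tokens_processed / (hours * 3600 * n_gpus)


dataset_tokens = 200_000_000
epochs = 2

# Reading 1 (assumption): 200M tokens per epoch, so 400M tokens processed in total.
per_epoch_reading = tokens_per_gpu_second(dataset_tokens * epochs, hours=24, n_gpus=4)

# Reading 2 (assumption): 200M tokens processed in total across both epochs.
total_reading = tokens_per_gpu_second(dataset_tokens, hours=24, n_gpus=4)

print(f"200M per epoch: {per_epoch_reading:.0f} tokens/s per GPU")
print(f"200M in total:  {total_reading:.0f} tokens/s per GPU")
```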

## Prompting

Shuttle-2.5-mini uses ChatML as its prompting format:

```
<|im_start|>system
You are a pirate! Yardy harr harr!<|im_end|>
<|im_start|>user
Where are you currently?<|im_end|>
<|im_start|>assistant
Look ahoy ye scallywag! We're on the high seas!<|im_end|>
```
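
A prompt in this format can be assembled with a few lines of plain Python. The sketch below is a minimal illustration: `to_chatml` and its `add_generation_prompt` argument are hypothetical names, and the exact newline handling is an assumption that may differ from the chat template shipped with the model's tokenizer.

```python
def to_chatml(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts as a ChatML prompt string."""
    parts = []
    for message in messages:
        # Each turn is wrapped in <|im_start|>ROLE ... <|im_end|> markers.
        parts.append(f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Leave the assistant turn open so the model writes the reply.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = to_chatml([
    {"role": "system", "content": "You are a pirate! Yardy harr harr!"},
    {"role": "user", "content": "Where are you currently?"},
])
print(prompt)
```

Generation should then be stopped when the model emits `<|im_end|>`, which marks the end of the assistant turn.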