---
license: apache-2.0
---


## Shuttle-2.5-mini

[2024/07/26] We are excited to introduce Shuttle-2.5-mini, our next-generation state-of-the-art language model, designed to excel at complex chat, multilingual communication, reasoning, and agent tasks.

- **Shuttle-2.5-mini** is a fine-tuned version of [Mistral-Nemo-Base-2407](https://huggingface.co/mistralai/Mistral-Nemo-Base-2407), emulating the writing style of Claude 3 models and thoroughly trained on role-playing data.

## Model Details

* **Model Name**: Shuttle-2.5-mini
* **Developed by**: ShuttleAI Inc.
* **Base Model**: [Mistral-Nemo-Base-2407](https://huggingface.co/mistralai/Mistral-Nemo-Base-2407)
* **Parameters**: 12B
* **Language(s)**: Multilingual
* **Repository**: [https://huggingface.co/shuttleai](https://huggingface.co/shuttleai)
* **Fine-Tuned Model**: [https://huggingface.co/shuttleai/shuttle-2.5-mini](https://huggingface.co/shuttleai/shuttle-2.5-mini)
* **Paper**: Shuttle-2.5-mini (upcoming)
* **License**: Apache 2.0

## Base Model Architecture

**Mistral Nemo** is a transformer model with the following architecture choices:

- **Layers**: 40
- **Dimension**: 5,120
- **Head Dimension**: 128
- **Hidden Dimension**: 14,336
- **Activation Function**: SwiGLU
- **Number of Heads**: 32
- **Number of KV Heads**: 8 (GQA)
- **Vocabulary Size**: 2^17 (approximately 128k)
- **Rotary Embeddings**: theta = 1M

### Key Features

- Released under the Apache 2.0 License
- Trained with a 128k context window
- Pretrained on a large proportion of multilingual and code data
- Fine-tuned to emulate the prose quality of Claude 3 models, with extensive training on role-play data

## Fine-Tuning Details

- **Training Setup**: Trained on 4x A100 GPUs for 2 epochs, totaling 24 hours.

## Prompting

Shuttle-2.5-mini uses ChatML as its prompting format:

```
<|im_start|>system
You are a pirate! Yardy harr harr!<|im_end|>
<|im_start|>user
Where are you currently?<|im_end|>
<|im_start|>assistant
Look ahoy ye scallywag! We're on the high seas!<|im_end|>
```
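The ChatML format shown above can be assembled programmatically. The helper below is an illustrative sketch, not part of ShuttleAI's tooling; in practice `tokenizer.apply_chat_template` from `transformers` renders the same structure for you:

```python
def to_chatml(messages, add_generation_prompt=True):
    """Render a list of {'role', 'content'} dicts as a ChatML prompt string.

    Illustrative helper only -- prefer the tokenizer's built-in chat template.
    """
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>" for m in messages]
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "\n".join(parts)

messages = [
    {"role": "system", "content": "You are a pirate! Yardy harr harr!"},
    {"role": "user", "content": "Where are you currently?"},
]
prompt = to_chatml(messages)
print(prompt)
```

Note that the trailing `<|im_start|>assistant` turn is left open: the model's generated reply fills it and is terminated by `<|im_end|>`, which is the natural stop sequence to set during inference.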
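As a rough sanity check on the architecture table above, the parameter count can be estimated from the listed dimensions. This is a back-of-the-envelope sketch under my own assumptions (untied input/output embeddings; norm parameters ignored as negligible), not an official figure:

```python
# Dimensions from the Mistral Nemo architecture table.
layers, dim, head_dim, hidden = 40, 5120, 128, 14336
n_heads, n_kv_heads, vocab = 32, 8, 2**17

# Attention: full-width query/output projections, narrower key/value
# projections because of grouped-query attention (8 KV heads).
attn = dim * n_heads * head_dim            # q_proj
attn += 2 * dim * n_kv_heads * head_dim    # k_proj + v_proj (GQA)
attn += n_heads * head_dim * dim           # o_proj

# SwiGLU MLP: gate, up, and down projections.
mlp = 3 * dim * hidden

# Assumption: untied input embedding and LM head.
embeddings = 2 * vocab * dim

total = layers * (attn + mlp) + embeddings
print(f"~{total / 1e9:.1f}B parameters")  # → ~12.2B parameters
```

The estimate lands close to the 12B figure reported for Mistral Nemo, with the SwiGLU MLP accounting for the large majority of per-layer parameters.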