license: apache-2.0
Shuttle-2.5-mini [2024/07/26]
We are excited to introduce Shuttle-2.5-mini, our next-generation state-of-the-art language model designed to excel in complex chat, multilingual communication, reasoning, and agent tasks.
- Shuttle-2.5-mini is a fine-tuned version of Mistral-Nemo-Base-2407, emulating the writing style of Claude 3 models and thoroughly trained on role-playing data.
Model Details
- Model Name: Shuttle-2.5-mini
- Developed by: ShuttleAI Inc.
- Base Model: Mistral-Nemo-Base-2407
- Parameters: 12B
- Language(s): Multilingual
- Repository: https://huggingface.co/shuttleai
- Fine-Tuned Model: https://huggingface.co/shuttleai/shuttle-2.5-mini
- Paper: Shuttle-2.5-mini (Upcoming)
- License: Apache 2.0
Base Model Architecture
Mistral Nemo is a transformer model with the following architecture choices:
- Layers: 40
- Dimension: 5,120
- Head Dimension: 128
- Hidden Dimension: 14,336
- Activation Function: SwiGLU
- Number of Heads: 32
- Number of kv-heads: 8 (GQA)
- Vocabulary Size: 2^17 (approximately 128k)
- Rotary Embeddings: Theta = 1M
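As a rough sanity check on the figures above, the parameter count can be estimated directly from the architecture. The sketch below is not taken from the Shuttle or Mistral code. It assumes a standard decoder-only layout with no bias terms, two RMSNorm weight vectors per layer plus a final norm, untied input and output embeddings, and a hidden dimension of 14,336 as published for Mistral Nemo. The names `NemoConfig` and `count_parameters` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NemoConfig:
    """Architecture values from the list above (hidden_dim assumed to be 14,336)."""
    layers: int = 40
    dim: int = 5120
    head_dim: int = 128
    hidden_dim: int = 14336
    n_heads: int = 32
    n_kv_heads: int = 8
    vocab_size: int = 2 ** 17


def count_parameters(cfg: NemoConfig) -> int:
    """Estimate total weights, assuming no biases and untied embeddings."""
    q_width = cfg.n_heads * cfg.head_dim       # 4096, which differs from dim (5120)
    kv_width = cfg.n_kv_heads * cfg.head_dim   # 1024, shared across query groups (GQA)

    # Attention: query and output projections plus key and value projections.
    attention = 2 * cfg.dim * q_width + 2 * cfg.dim * kv_width
    # SwiGLU feed-forward: gate, up, and down projections.
    feed_forward = 3 * cfg.dim * cfg.hidden_dim
    # Two normalization weight vectors per layer (assumption).
    norms = 2 * cfg.dim

    per_layer = attention + feed_forward + norms
    # Input embedding and a separate output head (assumption: untied).
    embeddings = 2 * cfg.vocab_size * cfg.dim
    final_norm = cfg.dim
    return cfg.layers * per_layer + embeddings + final_norm


total = count_parameters(NemoConfig())
print(f"{total:,} parameters (~{total / 1e9:.1f}B)")
```

Under these assumptions the estimate comes to roughly 12.2 billion weights, with the feed-forward projections accounting for most of each layer.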
Key Features
- Released under the Apache 2 License
- Trained with a 128k context window
- Pretrained on a large proportion of multilingual and code data
- Fine-tuned to emulate the prose quality of Claude 3 models and trained extensively on role-play data
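To give a sense of what a 128k context window costs at inference time, here is a minimal sketch that estimates key/value cache size from the base architecture (40 layers, 8 kv-heads, head dimension 128). It assumes a 2-byte cache dtype such as fp16 or bf16, a batch size of 1, and that "128k" means 131,072 tokens. The helper name `kv_cache_bytes` is hypothetical, and real inference servers may use paging or quantization that changes these numbers.

```python
def kv_cache_bytes(
    seq_len: int,
    layers: int = 40,
    n_kv_heads: int = 8,
    head_dim: int = 128,
    bytes_per_value: int = 2,  # assumption: fp16/bf16 cache
) -> int:
    """Bytes needed to cache keys and values for one sequence."""
    # Factor of 2 covers one key tensor and one value tensor per layer.
    return 2 * layers * n_kv_heads * head_dim * bytes_per_value * seq_len


GIB = 2 ** 30
full_context = 128 * 1024  # assumption: 128k = 131,072 tokens

with_gqa = kv_cache_bytes(full_context)
without_gqa = kv_cache_bytes(full_context, n_kv_heads=32)  # as if every head kept its own K/V

print(f"GQA (8 kv-heads):   {with_gqa / GIB:.0f} GiB")
print(f"No GQA (32 heads):  {without_gqa / GIB:.0f} GiB")
```

With 8 kv-heads instead of 32, the cache for a full-length sequence is a quarter of what it would otherwise be, which is the main practical benefit of GQA at long context lengths.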
Fine-Tuning Details
- Training Setup: Trained on 4x A100 GPUs for 2 epochs, totaling 24 hours.
Prompting
Shuttle-2.5-mini uses ChatML as its prompting format:
<|im_start|>system
You are a pirate! Yardy harr harr!<|im_end|>
<|im_start|>user
Where are you currently?<|im_end|>
<|im_start|>assistant
Look ahoy ye scallywag! We're on the high seas!<|im_end|>
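The format above can be assembled programmatically. The following is a minimal sketch with no library dependencies; the function name `format_chatml` and the `add_generation_prompt` flag are hypothetical helpers for illustration, not part of any Shuttle API. It assumes each turn ends with a newline after `<|im_end|>` and that generation is cued by an open assistant turn. If the model's tokenizer ships its own chat template, that template should be treated as authoritative.

```python
def format_chatml(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts as a ChatML prompt string."""
    parts = []
    for message in messages:
        # Each turn: start token, role, newline, content, end token.
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Leave an open assistant turn for the model to complete.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = format_chatml(
    [
        {"role": "system", "content": "You are a pirate!"},
        {"role": "user", "content": "Where are you currently?"},
    ]
)
print(prompt)
```

When generating, `<|im_end|>` would typically be used as the stop sequence so that the model's reply ends at the close of the assistant turn.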