
ShuttleAI Thumbnail

πŸ’» Use via API

Shuttle-2.5-mini (beta) [2024/07/27]

We are excited to introduce Shuttle-2.5-mini, our next-generation state-of-the-art language model designed to excel in complex chat, multilingual communication, reasoning, and agent tasks.

  • Shuttle-2.5-mini is a fine-tuned version of Mistral-Nemo-Base-2407, emulating the writing style of Claude 3 models and thoroughly trained on role-playing data.

Model Details

Base Model Architecture

Mistral Nemo is a transformer model with the following architecture choices:

  • Layers: 40
  • Dimension: 5,120
  • Head Dimension: 128
  • Hidden Dimension: 14,336
  • Activation Function: SwiGLU
  • Number of Heads: 32
  • Number of kv-heads: 8 (GQA)
  • Vocabulary Size: 2^17 (approximately 128k)
  • Rotary Embeddings: Theta = 1M
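
For reference, these numbers line up with the fields exposed by the Hugging Face config. The snippet below is a quick sanity-check sketch; the field names assume the stock MistralConfig that Mistral-Nemo-Base-2407 ships with (head_dim in particular requires a recent transformers release).

```python
from transformers import AutoConfig

# Sketch: map the architecture numbers above onto the config fields
# exposed by transformers (assumes the standard MistralConfig used by
# Mistral-Nemo-Base-2407).
config = AutoConfig.from_pretrained("shuttleai/shuttle-2.5-mini")

print(config.num_hidden_layers)    # 40      -> Layers
print(config.hidden_size)          # 5120    -> Dimension
print(config.head_dim)             # 128     -> Head Dimension
print(config.intermediate_size)    # 14336   -> Hidden Dimension
print(config.num_attention_heads)  # 32      -> Number of Heads
print(config.num_key_value_heads)  # 8       -> Number of kv-heads (GQA)
print(config.vocab_size)           # 131072  -> 2^17
print(config.rope_theta)           # 1000000 -> Theta = 1M
```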

Key Features

  • Released under the Apache 2 License
  • Trained with a 128k context window
  • Pretrained on a large proportion of multilingual and code data
  • Fine-tuned to emulate the prose quality of Claude 3 models and trained extensively on role-play data

Fine-Tuning Details

  • Training Setup: Trained on 200 million tokens for 24 hours across 2 epochs using 4 A100 PCIe GPUs.

Prompting

Shuttle-2.5-mini uses ChatML as its prompting format:

```
<|im_start|>system
You are a pirate! Yardy harr harr!<|im_end|>
<|im_start|>user
Where are you currently?<|im_end|>
<|im_start|>assistant
Look ahoy ye scallywag! We're on the high seas!<|im_end|>
```
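
In practice you rarely need to write this template by hand. The snippet below is a minimal local-inference sketch with Hugging Face transformers, assuming the tokenizer's built-in chat template matches the ChatML format shown above; the sampling settings are illustrative, not official recommendations.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "shuttleai/shuttle-2.5-mini"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # weights are published in BF16
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a pirate! Yardy harr harr!"},
    {"role": "user", "content": "Where are you currently?"},
]

# apply_chat_template renders the ChatML prompt shown above and appends
# the assistant header so the model continues from it.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```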