💻 Use via API

shuttle-2.5-mini-GPTQ-Int4 [2024/07/26]

We are excited to introduce Shuttle-2.5-mini, our next-generation state-of-the-art language model designed to excel in complex chat, multilingual communication, reasoning, and agent tasks.

  • Shuttle-2.5-mini is a fine-tuned version of Mistral-Nemo-Base-2407, emulating the writing style of Claude 3 models and thoroughly trained on role-playing data.

Model Details

Base Model Architecture

Mistral Nemo is a transformer model with the following architecture choices:

  • Layers: 40
  • Dimension: 5,120
  • Head Dimension: 128
  • Hidden Dimension: 14,336
  • Activation Function: SwiGLU
  • Number of Heads: 32
  • Number of kv-heads: 8 (GQA)
  • Vocabulary Size: 2^17 (approximately 128k)
  • Rotary Embeddings: Theta = 1M
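
The figures in the list above also determine how much memory the key/value cache needs at inference time. The following is a minimal sketch, assuming an FP16 cache (2 bytes per value), no cache quantization, and a 128k-token window taken as 131,072 tokens. The `NemoConfig` class and the helper names are illustrative and not part of any library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NemoConfig:
    # Values copied from the architecture list in this card.
    layers: int = 40
    head_dim: int = 128
    n_heads: int = 32
    n_kv_heads: int = 8


def queries_per_kv_head(cfg: NemoConfig) -> int:
    """Number of query heads that share one key/value head under GQA."""
    return cfg.n_heads // cfg.n_kv_heads


def kv_cache_bytes_per_token(cfg: NemoConfig, bytes_per_value: int = 2) -> int:
    """Bytes of K and V stored per token, summed over all layers.

    bytes_per_value=2 assumes an FP16 cache (an assumption, not stated in the card).
    """
    # Factor 2 accounts for storing both keys and values.
    return 2 * cfg.layers * cfg.n_kv_heads * cfg.head_dim * bytes_per_value


cfg = NemoConfig()
per_token = kv_cache_bytes_per_token(cfg)
full_context_tokens = 128 * 1024  # assumed size of the 128k window

print(queries_per_kv_head(cfg))                    # 4
print(per_token)                                   # 163840 bytes (160 KiB)
print(per_token * full_context_tokens / 2 ** 30)   # 20.0 (GiB)
```

With 8 kv-heads instead of 32, the cache is a quarter of what full multi-head attention would need at the same head dimension.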

Key Features

  • Released under the Apache 2 License
  • Trained with a 128k context window
  • Pretrained on a large proportion of multilingual and code data
  • Fine-tuned to emulate the prose quality of Claude 3 models, with extensive training on role-play data

Fine-Tuning Details

  • Training Setup: Trained on 4x A100 GPUs for 2 epochs, totaling 24 hours.

Prompting

Shuttle-2.5-mini uses ChatML as its prompting format:

<|im_start|>system
You are a pirate! Yardy harr harr!<|im_end|>
<|im_start|>user
Where are you currently?<|im_end|>
<|im_start|>assistant
Look ahoy ye scallywag! We're on the high seas!<|im_end|>
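
A prompt in this format can be assembled from a list of messages. The following is a minimal sketch in plain Python, assuming each message is a dict with `role` and `content` keys. The `format_chatml` helper is hypothetical and not part of the model's tooling. In practice a tokenizer's built-in chat template, where one is provided, would do the same job.

```python
from typing import Dict, List


def format_chatml(messages: List[Dict[str, str]],
                  add_generation_prompt: bool = True) -> str:
    """Render messages as a ChatML string.

    Each turn becomes: <|im_start|>{role}\n{content}<|im_end|>\n
    """
    parts = []
    for message in messages:
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Leave the assistant turn open so the model writes the reply.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = format_chatml([
    {"role": "system", "content": "You are a pirate! Yardy harr harr!"},
    {"role": "user", "content": "Where are you currently?"},
])
print(prompt)
```

Generation would normally be stopped at the `<|im_end|>` token, which closes the assistant turn.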

Model size: 2.81B params (Safetensors; tensor types: I32, FP16)