Create README.md
README.md
ADDED
@@ -0,0 +1,71 @@
---
license: apache-2.0
---

<p style="font-size:20px;" align="center">
<div style="width: 100%; height: 300px; overflow: hidden; border-radius: 15px; margin: auto; position: relative;">
<img
src="https://cdn.shuttleai.app/thumbnail.png"
alt="ShuttleAI Thumbnail"
style="width: 100%; height: auto; display: block; margin: auto; position: absolute; top: 50%; left: 50%; transform: translate(-50%, -50%); object-fit: cover;">
</div>

<p align="center">
💻 <a href="https://shuttleai.app/" target="_blank">Use via API</a>
</p>

## shuttle-2.5-mini-GPTQ-Int4 [2024/07/26]

We are excited to introduce Shuttle-2.5-mini, our next-generation state-of-the-art language model designed to excel in complex chat, multilingual communication, reasoning, and agent tasks.

- **Shuttle-2.5-mini** is a fine-tuned version of [Mistral-Nemo-Base-2407](https://huggingface.co/mistralai/Mistral-Nemo-Base-2407), tuned to emulate the writing style of Claude 3 models and trained extensively on role-playing data.

## Model Details

* **Model Name**: Shuttle-2.5-mini
* **Developed by**: ShuttleAI Inc.
* **Base Model**: [Mistral-Nemo-Base-2407](https://huggingface.co/mistralai/Mistral-Nemo-Base-2407)
* **Parameters**: 12B
* **Language(s)**: Multilingual
* **Repository**: [https://huggingface.co/shuttleai](https://huggingface.co/shuttleai)
* **Fine-Tuned Model**: [https://huggingface.co/shuttleai/shuttle-2.5-mini](https://huggingface.co/shuttleai/shuttle-2.5-mini)
* **Paper**: Shuttle-2.5-mini (Upcoming)
* **License**: Apache 2.0

## Base Model Architecture

**Mistral Nemo** is a transformer model with the following architecture choices:

- **Layers**: 40
- **Dimension**: 5,120
- **Head Dimension**: 128
- **Hidden Dimension**: 14,336
- **Activation Function**: SwiGLU
- **Number of Heads**: 32
- **Number of kv-heads**: 8 (GQA)
- **Vocabulary Size**: 2^17 = 131,072 (approximately 128k)
- **Rotary Embeddings**: Theta = 1M
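
As a rough illustration of what these numbers imply, the sketch below derives the GQA group size and an estimated KV-cache footprint from the values listed above. The 2-bytes-per-value figure (an fp16/bf16 cache) and the helper names are assumptions made for this example, not something this model card specifies; actual memory use depends on the inference runtime.

```python
# Architecture values copied from the list above.
LAYERS = 40
NUM_HEADS = 32
NUM_KV_HEADS = 8
HEAD_DIM = 128
VOCAB_SIZE = 2 ** 17


def gqa_group_size(num_heads: int, num_kv_heads: int) -> int:
    """Number of query heads that share one key/value head under GQA."""
    return num_heads // num_kv_heads


def kv_cache_bytes_per_token(layers: int, num_kv_heads: int, head_dim: int,
                             bytes_per_value: int = 2) -> int:
    """Estimated KV-cache size for one token.

    The leading factor of 2 covers keys and values. bytes_per_value=2 assumes
    an fp16/bf16 cache (an assumption, not stated in the model card).
    """
    return 2 * layers * num_kv_heads * head_dim * bytes_per_value


group = gqa_group_size(NUM_HEADS, NUM_KV_HEADS)
per_token = kv_cache_bytes_per_token(LAYERS, NUM_KV_HEADS, HEAD_DIM)
full_context = per_token * 128 * 1024  # 128k-token context window

print(f"query heads per kv-head: {group}")
print(f"KV cache per token: {per_token / 1024:.0f} KiB")
print(f"KV cache at 128k tokens: {full_context / 1024**3:.0f} GiB")
print(f"vocabulary size: {VOCAB_SIZE}")
```

Under these assumptions, using 8 kv-heads instead of 32 makes the cache a quarter of the size it would be with one key/value head per query head.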

### Key Features

- Released under the Apache 2.0 License
- Trained with a 128k context window
- Pretrained on a large proportion of multilingual and code data
- Fine-tuned to emulate the prose quality of Claude 3 models, with extensive training on role-play data

## Fine-Tuning Details

- **Training Setup**: Trained on 4x A100 GPUs for 2 epochs, totaling 24 hours.

## Prompting

Shuttle-2.5-mini uses ChatML as its prompting format:

```
<|im_start|>system
You are a pirate! Yardy harr harr!<|im_end|>
<|im_start|>user
Where are you currently?<|im_end|>
<|im_start|>assistant
Look ahoy ye scallywag! We're on the high seas!<|im_end|>
```
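
The layout above can be produced programmatically. The following is a minimal sketch of a ChatML formatter, assuming each message is a dict with `role` and `content` keys. The function name `format_chatml` and its `add_generation_prompt` flag are hypothetical names chosen for this example; the tokenizer shipped with the model may provide its own chat template.

```python
def format_chatml(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts as a ChatML prompt string.

    Hypothetical helper for illustration; not part of any ShuttleAI library.
    """
    parts = []
    for message in messages:
        # Each turn is wrapped in <|im_start|>role ... <|im_end|>.
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Leave the assistant turn open so the model writes the reply.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = format_chatml([
    {"role": "system", "content": "You are a pirate! Yardy harr harr!"},
    {"role": "user", "content": "Where are you currently?"},
])
print(prompt)
```

When generating, stopping on `<|im_end|>` keeps the reply to a single assistant turn.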