xtristan committed on
Commit
31dd7f5
1 Parent(s): 264c792

Create README.md

  1. README.md +71 -0
README.md ADDED
@@ -0,0 +1,71 @@
+ ---
+ license: apache-2.0
+ ---
+
+ <p style="font-size:20px;" align="center">
+ <div style="width: 100%; height: 300px; overflow: hidden; border-radius: 15px; margin: auto; position: relative;">
+ <img
+ src="https://cdn.shuttleai.app/thumbnail.png"
+ alt="ShuttleAI Thumbnail"
+ style="width: 100%; height: auto; display: block; margin: auto; position: absolute; top: 50%; left: 50%; transform: translate(-50%, -50%); object-fit: cover;">
+ </div>
+
+ <p align="center">
+ 💻 <a href="https://shuttleai.app/" target="_blank">Use via API</a>
+ </p>
+
+ ## shuttle-2.5-mini-GPTQ-Int4 [2024/07/26]
+
+ We are excited to introduce Shuttle-2.5-mini, our next-generation state-of-the-art language model designed to excel in complex chat, multilingual communication, reasoning, and agent tasks.
+
+ - **Shuttle-2.5-mini** is a fine-tuned version of [Mistral-Nemo-Base-2407](https://huggingface.co/mistralai/Mistral-Nemo-Base-2407), emulating the writing style of Claude 3 models and thoroughly trained on role-playing data.
+
+ ## Model Details
+
+ * **Model Name**: Shuttle-2.5-mini
+ * **Developed by**: ShuttleAI Inc.
+ * **Base Model**: [Mistral-Nemo-Base-2407](https://huggingface.co/mistralai/Mistral-Nemo-Base-2407)
+ * **Parameters**: 12B
+ * **Language(s)**: Multilingual
+ * **Repository**: [https://huggingface.co/shuttleai](https://huggingface.co/shuttleai)
+ * **Fine-Tuned Model**: [https://huggingface.co/shuttleai/shuttle-2.5-mini](https://huggingface.co/shuttleai/shuttle-2.5-mini)
+ * **Paper**: Shuttle-2.5-mini (Upcoming)
+ * **License**: Apache 2.0
+
+ ## Base Model Architecture
+
+ **Mistral Nemo** is a transformer model with the following architecture choices:
+
+ - **Layers**: 40
+ - **Dimension**: 5,120
+ - **Head Dimension**: 128
+ - **Hidden Dimension**: 14,336
+ - **Activation Function**: SwiGLU
+ - **Number of Heads**: 32
+ - **Number of kv-heads**: 8 (GQA)
+ - **Vocabulary Size**: 2^17 (approximately 128k)
+ - **Rotary Embeddings**: Theta = 1M
+
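As a rough sanity check on the architecture figures above, the parameter count can be estimated from the listed dimensions. The sketch below assumes a hidden (feed-forward) dimension of 14,336 as in the base model's published configuration, untied input and output embeddings, no bias terms, and two normalization weight vectors per layer. None of these assumptions are stated in this README, and `count_params` is a hypothetical helper written only for illustration.

```python
def count_params(layers=40, dim=5120, head_dim=128, hidden_dim=14336,
                 n_heads=32, n_kv_heads=8, vocab_size=2**17):
    """Estimate the parameter count of a GQA + SwiGLU decoder-only transformer."""
    # Attention: query and output projections use all heads,
    # key and value projections use only the (fewer) kv-heads.
    q_proj = dim * n_heads * head_dim
    kv_proj = 2 * dim * n_kv_heads * head_dim
    o_proj = n_heads * head_dim * dim
    attention = q_proj + kv_proj + o_proj

    # SwiGLU feed-forward: gate, up, and down projections.
    mlp = 3 * dim * hidden_dim

    # Assumption: two normalization weight vectors per layer, no biases.
    norms = 2 * dim
    per_layer = attention + mlp + norms

    # Assumption: input embedding and output head are separate matrices.
    embeddings = 2 * vocab_size * dim
    final_norm = dim
    return layers * per_layer + embeddings + final_norm


total = count_params()
print(f"{total:,} parameters (~{total / 1e9:.1f}B)")
```

Under these assumptions the estimate comes to roughly 12.2B parameters, most of which sit in the feed-forward projections.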
+ ### Key Features
+
+ - Released under the Apache 2.0 License
+ - Trained with a 128k context window
+ - Pretrained on a large proportion of multilingual and code data
+ - Fine-tuned to emulate the prose quality of Claude 3 models, with extensive training on role-play data
+
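To give a sense of what a 128k context window means in practice with 8 kv-heads, the following sketch estimates the key/value cache size for a single sequence. It assumes a 16-bit cache (2 bytes per value) and takes 128k to mean 131,072 tokens; both are assumptions rather than details from this README, and `kv_cache_bytes` is a hypothetical helper.

```python
def kv_cache_bytes(n_tokens, layers=40, n_kv_heads=8, head_dim=128,
                   bytes_per_value=2):
    """Bytes needed to cache keys and values for one sequence."""
    # Factor of 2: one tensor for keys and one for values in every layer.
    return 2 * layers * n_kv_heads * head_dim * bytes_per_value * n_tokens


context = 128 * 1024  # assumption: "128k" means 131,072 tokens

gqa = kv_cache_bytes(context)                 # 8 kv-heads (GQA)
mha = kv_cache_bytes(context, n_kv_heads=32)  # hypothetical: one kv-head per query head

print(f"GQA (8 kv-heads):  {gqa / 2**30:.0f} GiB")
print(f"MHA (32 kv-heads): {mha / 2**30:.0f} GiB")
```

With grouped-query attention the cache is a quarter of the size it would be if every query head had its own key/value head, which is the main reason long contexts remain practical at this model size.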
+ ## Fine-Tuning Details
+
+ - **Training Setup**: Trained on 4x A100 GPUs for 2 epochs, totaling 24 hours.
+
+ ## Prompting
+
+ Shuttle-2.5-mini uses ChatML as its prompting format:
+
+ ```
+ <|im_start|>system
+ You are a pirate! Yardy harr harr!<|im_end|>
+ <|im_start|>user
+ Where are you currently!<|im_end|>
+ <|im_start|>assistant
+ Look ahoy ye scallywag! We're on the high seas!<|im_end|>
+ ```
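
The ChatML layout above can be produced programmatically. The following is a minimal sketch in plain Python; `format_chatml` and its `add_generation_prompt` parameter are hypothetical names chosen for illustration, and the exact whitespace the model expects is assumed from the example rather than confirmed.

```python
def format_chatml(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts as a ChatML prompt string."""
    parts = []
    for message in messages:
        # Each turn: start marker, role, newline, content, end marker.
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Leave an open assistant turn for the model to complete.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = format_chatml([
    {"role": "system", "content": "You are a pirate! Yardy harr harr!"},
    {"role": "user", "content": "Where are you currently!"},
])
print(prompt)
```

When generating, `<|im_end|>` would typically be used as the stop sequence so that the model's reply ends at the close of the assistant turn.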