---
license: mit
datasets:
- iamplus/LLama2-SFT-Data
- iamplus/Open_Platypus_Orca
- iamplus/Orca
- iamplus/Conversational_Data
---


**Description :**

This model is trained on a mix of Orca data and open-source plus closed multi-turn conversation data, yielding a stronger reasoning model that can also hold multi-turn conversations.

The dataset split, the prompt format, and the training parameters are given below.

**Prompt Description :**

The prompt template for the first turn looks like this:
```
<s>[INST] <<SYS>>
{{ system_prompt }}
<</SYS>>

{{ user_message }} [/INST]
```

The prompt template for the multi-turn conversation looks like this:
```
<s>[INST] <<SYS>>
{{ system_prompt }}
<</SYS>>

{{ user_msg_1 }} [/INST] {{ model_answer_1 }} </s><s>[INST] {{ user_msg_2 }} [/INST]
```

This model follows Meta's official chat model prompt format. Please refer to https://huggingface.co/blog/llama2#how-to-prompt-llama-2 for how to prompt the model in single- and multi-turn conversations.

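For illustration only, here is a minimal Python sketch (not part of the original card) that assembles this template for single- and multi-turn conversations; the helper name and the example messages are ours:

```python
def build_llama2_prompt(system_prompt, turns):
    """Assemble a Llama-2 chat prompt from a system prompt and a list of
    (user_message, model_answer) turns; the last answer may be None when
    the model's reply is still to be generated."""
    prompt = f"<s>[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n"
    for i, (user_msg, model_answer) in enumerate(turns):
        # Turns after the first open a fresh <s>[INST] ... [/INST] segment.
        prompt += user_msg if i == 0 else f"<s>[INST] {user_msg}"
        prompt += " [/INST]"
        if model_answer is not None:
            prompt += f" {model_answer} </s>"
    return prompt


# First turn (matches the first template above):
print(build_llama2_prompt("You are a helpful assistant.",
                          [("What is the capital of France?", None)]))

# Multi-turn (matches the second template above):
print(build_llama2_prompt("You are a helpful assistant.",
                          [("Hi!", "Hello! How can I help?"),
                           ("Tell me a joke.", None)]))
```

The resulting string can then be tokenized and passed to `generate` as with any Llama-2 chat checkpoint.
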
**Base model :** meta-llama/Llama-2-70b-hf

**Data :**
1. 1M Orca data (GPT-4 Orca data - OpenOrca)
2. 1.7M chat data (includes OpenAssistant chat data, UltraChat, and many more open-source chat datasets)
3. 30k OpenPlatypus data

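As a rough, hypothetical sketch of how the sources listed in the metadata above could be pulled in with the `datasets` library (the split name is an assumption; the actual preprocessing used for this model is not described in the card):

```python
from datasets import load_dataset

# Hypothetical loading loop over the dataset repos listed in the card metadata;
# the "train" split name is an assumption.
for repo in ["iamplus/Orca",
             "iamplus/Conversational_Data",
             "iamplus/Open_Platypus_Orca"]:
    ds = load_dataset(repo, split="train")
    print(repo, len(ds))
```
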
**Training Params :**
```
Number of Epochs : 2
Batch Size : 64
Sequence Length : 4096
Learning Rate : 2e-5 (Cosine)
Weight Decay : 0.1
Gradient Clipping : 1.0
Gamma : 0.85
beta_1 : 0.9
beta_2 : 0.95
eps : 1e-5
Precision : bf16
Optimizer : Any Precision AdamW Optimizer
```
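
As a loose sketch (ours, not the original training code) of how these hyperparameters map onto a standard PyTorch setup: AnyPrecision AdamW is approximated with `torch.optim.AdamW`, the step count and the dummy model/loss are placeholders, and bf16 autocast and the `Gamma : 0.85` setting are not modeled here.

```python
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import CosineAnnealingLR

model = torch.nn.Linear(8, 8)  # placeholder stand-in for the Llama-2 70B model

# Optimizer hyperparameters copied from the table above.
optimizer = AdamW(
    model.parameters(),
    lr=2e-5,
    betas=(0.9, 0.95),
    eps=1e-5,
    weight_decay=0.1,
)

total_steps = 1000  # placeholder; the real value depends on data size, epochs, batch size
scheduler = CosineAnnealingLR(optimizer, T_max=total_steps)  # cosine learning-rate decay

for step in range(total_steps):
    loss = model(torch.randn(4, 8)).pow(2).mean()  # dummy loss for illustration
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)  # Gradient Clipping : 1.0
    optimizer.step()
    scheduler.step()
    optimizer.zero_grad()
```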