---
license: mit
datasets:
- iamplus/LLama2-SFT-Data
- iamplus/Open_Platypus_Orca
- iamplus/Orca
- iamplus/Conversational_Data
---
**Description :**

This model is trained on a mix of Orca data and open-source and closed-source multi-turn conversation data to produce a stronger reasoning model that can also hold multi-turn conversations.

The dataset split, prompt format, and training parameters are described below.
**Prompt Description :**

The prompt template for the first turn looks like this:
```
<s>[INST] <<SYS>>
{{ system_prompt }}
<</SYS>>

{{ user_message }} [/INST]
```
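For illustration, here is a minimal sketch of filling this first-turn template in Python; `build_prompt` is a hypothetical helper, not part of this repo:

```python
# Hypothetical helper that fills the first-turn template above.
def build_prompt(system_prompt: str, user_message: str) -> str:
    # The leading <s> is normally added by the tokenizer as the BOS token;
    # it is written out here only to mirror the template literally.
    return (
        "<s>[INST] <<SYS>>\n"
        f"{system_prompt}\n"
        "<</SYS>>\n\n"
        f"{user_message} [/INST]"
    )

print(build_prompt("You are a helpful assistant.", "Summarize the plot of Hamlet."))
```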
The prompt template for multi-turn conversations looks like this:
```
<s>[INST] <<SYS>>
{{ system_prompt }}
<</SYS>>

{{ user_msg_1 }} [/INST] {{ model_answer_1 }} </s><s>[INST] {{ user_msg_2 }} [/INST]
```
This model follows Meta's official Llama 2 chat prompt format. Please refer to https://huggingface.co/blog/llama2#how-to-prompt-llama-2 for how to prompt the model in single-turn and multi-turn conversations.
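As a usage illustration, here is a hedged sketch of assembling a multi-turn prompt and generating with `transformers`. The repo id `your-org/your-model` is a placeholder for this model's Hugging Face id, the conversation history is invented, and hardware setup (sharding, quantization) is omitted:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/your-model"  # placeholder for this model's repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

def build_chat(system_prompt, history, user_msg):
    """Assemble the multi-turn prompt shown above.

    `history` is a list of (user_msg, model_answer) pairs from earlier turns.
    The leading <s> is omitted because the tokenizer adds the BOS token.
    """
    text = f"[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n"
    for past_user, past_answer in history:
        text += f"{past_user} [/INST] {past_answer} </s><s>[INST] "
    return text + f"{user_msg} [/INST]"

prompt = build_chat(
    "You are a helpful assistant.",
    [("What is the capital of France?", "The capital of France is Paris.")],
    "Roughly how many people live there?",
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
# Print only the newly generated tokens.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```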
**Base model :** meta-llama/Llama-2-70b-hf
**Data :**

1. 1M Orca data (GPT-4 Orca data from OpenOrca)
2. 1.7M chat data (includes OpenAssistant chat data, UltraChat, and many more open-source chat datasets)
3. 30k OpenPlatypus data
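If you have access to the dataset repos listed in the metadata, they can be inspected with the `datasets` library. This is only a sketch: the repos may be gated or private, and the `train` split name is assumed.

```python
from datasets import load_dataset

# Split name "train" is assumed; adjust if the repos use a different layout.
orca = load_dataset("iamplus/Orca", split="train")
platypus = load_dataset("iamplus/Open_Platypus_Orca", split="train")
chat = load_dataset("iamplus/Conversational_Data", split="train")

print(orca)
print(platypus)
print(chat)
```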
**Training Params :**

```
Number of Epochs : 1
Batch Size : 64
Sequence Length : 4096
Learning Rate : 2e-5 (Cosine)
Weight Decay : 0.1
Gradient Clipping : 1.0
Gamma : 0.85
beta_1 : 0.9
beta_2 : 0.95
eps : 1e-5
Precision : bf16
Optimizer : Any Precision AdamW Optimizer
```
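For reference, below is a minimal sketch that maps these hyperparameters onto standard PyTorch APIs. The actual run used an AnyPrecision AdamW optimizer and the 70B base model; here `torch.optim.AdamW` and a tiny placeholder module stand in, the step count is invented, and the listed `Gamma` (whose role in the schedule isn't specified above) is not reproduced.

```python
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import CosineAnnealingLR

# Tiny placeholder module standing in for meta-llama/Llama-2-70b-hf in bf16.
model = torch.nn.Linear(128, 128, dtype=torch.bfloat16)

optimizer = AdamW(
    model.parameters(),
    lr=2e-5,            # peak learning rate
    betas=(0.9, 0.95),  # beta_1, beta_2
    eps=1e-5,
    weight_decay=0.1,
)

total_steps = 100  # assumed; in practice roughly dataset_size / 64 for the single epoch
scheduler = CosineAnnealingLR(optimizer, T_max=total_steps)  # cosine learning-rate decay

for step in range(total_steps):
    batch = torch.randn(64, 128, dtype=torch.bfloat16)  # batch size 64 (dummy data)
    loss = model(batch).float().pow(2).mean()            # dummy loss
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # gradient clipping at 1.0
    optimizer.step()
    scheduler.step()
    optimizer.zero_grad()
```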