hamishivi committed on
Commit
5b6d4e3
1 Parent(s): 4c117dc

Create README.md

  1. README.md +132 -0
README.md ADDED
@@ -0,0 +1,132 @@
1
+ ---
2
+ license: apache-2.0
3
+ language:
4
+ - en
5
+ pipeline_tag: text-generation
6
+ base_model:
7
+ - allenai/OLMo-2-1124-13B-DPO
8
+ library_name: transformers
9
+ datasets:
10
+ - allenai/RLVR-GSM-MATH-IF-Mixed-Constraints
11
+ ---
12
+
13
+ <img src="https://allenai.org/olmo/olmo-7b-animation.gif" alt="OLMo Logo" width="800" style="margin-left:auto; margin-right:auto; display:block"/>
14
+
15
+ # OLMo-2-1124-13B-Instruct
16
+
17
+ OLMo-2 13B Instruct November 2024 is a finetuned variant of the [OLMo-2 13B November 2024](https://huggingface.co/allenai/OLMo2-13B-1124) model. It has undergone supervised finetuning on the [Tülu 3 dataset](https://huggingface.co/datasets/allenai/tulu-3-sft-mixture), further DPO training on [this dataset](https://huggingface.co/datasets/allenai/olmo-2-1124-13b-preference-mix), and finally RLVR training using [this data](https://huggingface.co/datasets/allenai/RLVR-GSM-MATH-IF-Mixed-Constraints).
18
+ Tülu 3 is designed for state-of-the-art performance on a diverse range of tasks in addition to chat, such as MATH, GSM8K, and IFEval.
19
+ Check out [the OLMo-2 paper](https://TODO) or the [Tülu 3 paper](https://arxiv.org/abs/2411.15124) for more details!
20
+
21
+ OLMo is a series of **O**pen **L**anguage **Mo**dels designed to enable the science of language models.
22
+ These models are trained on the Dolma dataset. We are releasing all code, checkpoints, logs (coming soon), and associated training details.
23
+ The core models released in this batch include the following:
24
+
25
+
26
+ | **Stage** | **OLMo-2 7B** | **OLMo-2 13B** |
27
+ |----------------------|----------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------|
28
+ | **Base Model** | [allenai/OLMo2-7B-1124](https://huggingface.co/allenai/OLMo2-7B-1124) | [allenai/OLMo-2-13B-1124](https://huggingface.co/allenai/OLMo-2-13B-1124) |
29
+ | **SFT** | [allenai/OLMo-2-1124-7B-SFT](https://huggingface.co/allenai/OLMo-2-1124-7B-SFT) | [allenai/OLMo-2-1124-13B-SFT](https://huggingface.co/allenai/OLMo-2-1124-13B-SFT) |
30
+ | **DPO** | [allenai/OLMo-2-1124-7B-DPO](https://huggingface.co/allenai/OLMo-2-1124-7B-DPO) | [allenai/OLMo-2-1124-13B-DPO](https://huggingface.co/allenai/OLMo-2-1124-13B-DPO) |
31
+ | **Final Models (RLVR)** | [allenai/OLMo-2-1124-7B-Instruct](https://huggingface.co/allenai/OLMo-2-1124-7B-Instruct) | [allenai/OLMo-2-1124-13B-Instruct](https://huggingface.co/allenai/OLMo-2-1124-13B-Instruct) |
32
+ | **Reward Model (RM)**| [allenai/OLMo-2-1124-7B-RM](https://huggingface.co/allenai/OLMo-2-1124-7B-RM) | (Same as 7B) |
33
+
34
+
35
+
36
+ ## Model description
37
+
38
+ - **Model type:** A model trained on a mix of publicly available, synthetic and human-created datasets.
39
+ - **Language(s) (NLP):** Primarily English
40
+ - **License:** Apache 2.0
41
+ - **Finetuned from model:** allenai/OLMo-2-1124-13B-DPO
42
+
43
+ ### Model Sources
44
+
45
+ - **Project Page:** https://allenai.org/olmo
46
+ - **Repositories:**
47
+ - Core repo (training, inference, fine-tuning etc.): https://github.com/allenai/OLMo
48
+ - Evaluation code: https://github.com/allenai/olmes
49
+ - Further fine-tuning code: https://github.com/allenai/open-instruct
50
+ - **Paper:** Coming soon! TODO
51
+ - **Demo:** https://playground.allenai.org/
52
+
53
+ ## Using the model
54
+
55
+ ### Loading with HuggingFace
56
+
57
+ To load the model with HuggingFace, use the following snippet:
58
+ ```
59
+ from transformers import AutoModelForCausalLM
60
+
61
+ olmo_model = AutoModelForCausalLM.from_pretrained("allenai/OLMo-2-1124-13B-Instruct")
62
+ ```
63
+
64
+ ### Chat template
65
+
66
+ The chat template for our models is formatted as:
67
+ ```
68
+ <|endoftext|><|user|>\nHow are you doing?\n<|assistant|>\nI'm just a computer program, so I don't have feelings, but I'm functioning as expected. How can I assist you today?<|endoftext|>
69
+ ```
70
+ Or with new lines expanded:
71
+ ```
72
+ <|endoftext|><|user|>
73
+ How are you doing?
74
+ <|assistant|>
75
+ I'm just a computer program, so I don't have feelings, but I'm functioning as expected. How can I assist you today?<|endoftext|>
76
+ ```
77
+ The template is also embedded in the tokenizer, so `tokenizer.apply_chat_template` applies it for you.
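As a rough illustration of the format above, the following sketch builds the same string by hand for a single exchange. The helper name `format_single_turn` is hypothetical, and stopping the prompt right after `<|assistant|>\n` for generation is an assumption based on the example shown; the template embedded in the tokenizer remains the source of truth.

```python
# Special token exactly as it appears in the template shown above.
EOS = "<|endoftext|>"


def format_single_turn(user_text, assistant_text=None):
    """Render one user turn, and optionally one assistant reply, by hand."""
    prompt = EOS + "<|user|>\n" + user_text + "\n<|assistant|>\n"
    if assistant_text is None:
        # Assumption: for generation, the prompt ends right after the assistant tag.
        return prompt
    return prompt + assistant_text + EOS


print(format_single_turn("How are you doing?"))
```

In real use, prefer `tokenizer.apply_chat_template`, which also covers cases this sketch does not attempt, such as multi-turn conversations.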
78
+
79
+ ### System prompt
80
+
81
+ In Ai2 demos, we use this system prompt by default:
82
+ ```
83
+ You are OLMo 2, a helpful and harmless AI Assistant built by the Allen Institute for AI.
84
+ ```
85
+ The model has not been trained with a specific system prompt in mind.
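A minimal sketch of how the default system prompt could be supplied as a system message. The helper `build_messages` is hypothetical, and how the embedded chat template renders a `system` role is not described here, so treat that part as an assumption and inspect the rendered string before relying on it.

```python
DEFAULT_SYSTEM_PROMPT = (
    "You are OLMo 2, a helpful and harmless AI Assistant "
    "built by the Allen Institute for AI."
)


def build_messages(user_text, system_prompt=DEFAULT_SYSTEM_PROMPT):
    """Build a chat message list, optionally led by a system message."""
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": user_text})
    return messages


messages = build_messages("How are you doing?")
# The list can then be passed to the tokenizer, for example:
# tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
```

Passing `system_prompt=None` drops the system message, which is a reasonable option given that the model was not trained with a specific system prompt in mind.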
86
+
87
+ ### Bias, Risks, and Limitations
88
+
89
+ The OLMo-2 models have limited safety training and, unlike ChatGPT, are not deployed with automatic in-the-loop filtering of responses, so they can produce problematic outputs (especially when prompted to do so).
90
+ See the Falcon 180B model card for an example of this.
91
+
92
+
93
+ ## Performance
94
+
95
+ TODO
96
+
97
+ ## Hyperparameters
98
+
99
+ Checkpoint run name: `ppo_olmo_13b_25_rm_best_gsm_math_if_beta_0.03_lr_4e-7_25218__1__1732572469_step_360`
100
+
101
+ PPO settings for RLVR:
102
+ - **Learning Rate**: 4 × 10⁻⁷
103
+ - **Discount Factor (gamma)**: 1.0
104
+ - **Generalized Advantage Estimation (lambda)**: 0.95
105
+ - **Mini-batches (N_mb)**: 1
106
+ - **PPO Update Iterations (K)**: 4
107
+ - **PPO's Clipping Coefficient (epsilon)**: 0.2
108
+ - **Value Function Coefficient (c1)**: 0.1
109
+ - **Gradient Norm Threshold**: 1.0
110
+ - **Learning Rate Schedule**: Linear
111
+ - **Generation Temperature**: 1.0
112
+ - **Batch Size (effective)**: 512
113
+ - **Max Token Length**: 2,048
114
+ - **Max Prompt Token Length**: 2,048
115
+ - **Penalty Reward Value for Responses without an EOS Token**: -10.0
116
+ - **Response Length**: 2,048
117
+ - **Total Episodes**: 100,000 (this checkpoint is training step 360)
118
+ - **KL penalty coefficient (beta)**: 0.03
119
+ - **Warm up ratio (omega)**: 0.0
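To make a few of these settings concrete, here is a minimal sketch of generalized advantage estimation (with gamma = 1.0 and lambda = 0.95) and the PPO clipped surrogate (with epsilon = 0.2) in plain Python. These are the textbook formulations, not code taken from the actual training run; the function names and the toy rewards and values are hypothetical.

```python
def gae_advantages(rewards, values, gamma=1.0, lam=0.95):
    """Textbook GAE for one episode; values[t] is the value estimate at step t."""
    advantages = [0.0] * len(rewards)
    next_value = 0.0  # the value after the final step is taken to be zero
    running = 0.0
    for t in reversed(range(len(rewards))):
        # One-step TD error, then the exponentially weighted backward sum.
        delta = rewards[t] + gamma * next_value - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
        next_value = values[t]
    return advantages


def clipped_surrogate(ratio, advantage, epsilon=0.2):
    """PPO clipped objective for a single sample (to be maximized)."""
    clipped_ratio = max(1.0 - epsilon, min(ratio, 1.0 + epsilon))
    return min(ratio * advantage, clipped_ratio * advantage)


# Toy episode: a single reward arrives at the last step.
advantages = gae_advantages(rewards=[0.0, 0.0, 1.0], values=[0.0, 0.0, 0.0])
objective = clipped_surrogate(ratio=1.5, advantage=advantages[-1])
```

With gamma = 1.0 the reward itself is not discounted over the response; only lambda decays how far each TD error propagates back to earlier steps.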
120
+
121
+ ## License and use
122
+
123
+ OLMo-2 is licensed under the Apache 2.0 license.
124
+ OLMo-2 is intended for research and educational use.
125
+ For more information, please see our [Responsible Use Guidelines](https://allenai.org/responsible-use).
126
+
127
+ ## Citation
128
+
129
+ If OLMo-2 or any of the related materials were helpful to your work, please cite:
130
+ ```
131
+ TODO
132
+ ```