+---
+license: apache-2.0
+language:
+- en
+pipeline_tag: text-generation
+base_model:
+- allenai/OLMo-2-1124-13B-DPO
+library_name: transformers
+datasets:
+- allenai/RLVR-GSM-MATH-IF-Mixed-Constraints
+---
+<img src="https://allenai.org/olmo/olmo-7b-animation.gif" alt="OLMo Logo" width="800" style="margin-left:auto; margin-right:auto; display:block"/>
+# OLMo-2-1124-13B-Instruct
+OLMo-2 13B Instruct November 2024 is a finetuned variant of the [OLMo-2 13B November 2024](https://huggingface.co/allenai/OLMo2-13B-1124) model. It underwent supervised finetuning on the [Tülu 3 dataset](https://huggingface.co/datasets/allenai/tulu-3-sft-mixture), followed by DPO training on [this dataset](https://huggingface.co/datasets/allenai/olmo-2-1124-13b-preference-mix), and finally RLVR training using [this data](https://huggingface.co/datasets/allenai/RLVR-GSM-MATH-IF-Mixed-Constraints).
+Tülu 3 is designed for state-of-the-art performance on a diverse range of tasks beyond chat, such as MATH, GSM8K, and IFEval.
+Check out [the OLMo-2 paper](https://TODO) or [Tülu 3 paper](https://arxiv.org/abs/2411.15124) for more details!
+OLMo is a series of **O**pen **L**anguage **Mo**dels designed to enable the science of language models.
+These models are trained on the Dolma dataset. We are releasing all code, checkpoints, logs (coming soon), and associated training details.
+The core models released in this batch include the following:
+| **Stage**           | **OLMo-2 7B**                                                                                          | **OLMo-2 13B**                                                                                        |
+|----------------------|----------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------|
+| **Base Model**       | [allenai/OLMo2-7B-1124](https://huggingface.co/allenai/OLMo2-7B-1124)                                | [allenai/OLMo-2-13B-1124](https://huggingface.co/allenai/OLMo-2-13B-1124)                             |
+| **SFT**              | [allenai/OLMo-2-1124-7B-SFT](https://huggingface.co/allenai/OLMo-2-1124-7B-SFT)                | [allenai/OLMo-2-1124-13B-SFT](https://huggingface.co/allenai/OLMo-2-1124-13B-SFT)              |
+| **DPO**              | [allenai/OLMo-2-1124-7B-DPO](https://huggingface.co/allenai/OLMo-2-1124-7B-DPO)                | [allenai/OLMo-2-1124-13B-DPO](https://huggingface.co/allenai/OLMo-2-1124-13B-DPO)              |
+| **Final Models (RLVR)** | [allenai/OLMo-2-1124-7B-Instruct](https://huggingface.co/allenai/OLMo-2-1124-7B-Instruct)                        | [allenai/OLMo-2-1124-13B-Instruct](https://huggingface.co/allenai/OLMo-2-1124-13B-Instruct)                      |
+| **Reward Model (RM)**| [allenai/OLMo-2-1124-7B-RM](https://huggingface.co/allenai/OLMo-2-1124-7B-RM)                                                     | (Same as 7B)                                                     |
+## Model description
+- **Model type:** A model trained on a mix of publicly available, synthetic and human-created datasets.
+- **Language(s) (NLP):** Primarily English
+- **License:** Apache 2.0
+- **Finetuned from model:** allenai/OLMo-2-1124-13B-DPO
+### Model Sources
+- **Project Page:** https://allenai.org/olmo
+- **Repositories:**
+    - Core repo (training, inference, fine-tuning etc.): https://github.com/allenai/OLMo
+    - Evaluation code: https://github.com/allenai/olmes
+    - Further fine-tuning code: https://github.com/allenai/open-instruct
+- **Paper:** Coming soon! TODO
+- **Demo:** https://playground.allenai.org/
+## Using the model
+### Loading with HuggingFace
+To load the model with HuggingFace, use the following snippet:
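+```
+from transformers import AutoModelForCausalLM, AutoTokenizer
+# Load the instruct model and its tokenizer (the tokenizer carries the chat template).
+olmo_model = AutoModelForCausalLM.from_pretrained("allenai/OLMo-2-1124-13B-Instruct")
+tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo-2-1124-13B-Instruct")
+```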
+### Chat template
+The chat template for our models is formatted as:
+```
+<|endoftext|><|user|>\nHow are you doing?\n<|assistant|>\nI'm just a computer program, so I don't have feelings, but I'm functioning as expected. How can I assist you today?<|endoftext|>
+```
+Or, with newlines expanded:
+```
+<|endoftext|><|user|>
+How are you doing?
+<|assistant|>
+I'm just a computer program, so I don't have feelings, but I'm functioning as expected. How can I assist you today?<|endoftext|>
+```
+The template is also embedded in the tokenizer, so `tokenizer.apply_chat_template` applies it for you.
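To make the format concrete, here is a minimal sketch that rebuilds the documented string by hand, without downloading the tokenizer. The function name `render_chat` and its `add_generation_prompt` argument are illustrative assumptions, and the newline placed between an assistant turn and a following turn is also an assumption, since the example above only shows a single exchange. For real use, prefer `tokenizer.apply_chat_template`.

```python
# Sketch only: mirrors the documented single-turn format shown above.
# BOS and EOS are both "<|endoftext|>" in that example.
BOS = "<|endoftext|>"
EOS = "<|endoftext|>"


def render_chat(messages, add_generation_prompt=False):
    """Render a list of {"role", "content"} dicts into the documented format.

    `render_chat` is a hypothetical helper, not part of the released tokenizer.
    """
    text = BOS
    for i, message in enumerate(messages):
        role, content = message["role"], message["content"]
        is_last = i == len(messages) - 1
        if role == "user":
            text += "<|user|>\n" + content + "\n"
        elif role == "assistant":
            # Assistant turns end with EOS. The trailing newline before a
            # following turn is an assumption (the card shows one turn only).
            text += "<|assistant|>\n" + content + EOS + ("" if is_last else "\n")
        else:
            raise ValueError(f"role not covered by this sketch: {role!r}")
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        text += "<|assistant|>\n"
    return text


prompt = render_chat(
    [{"role": "user", "content": "How are you doing?"}],
    add_generation_prompt=True,
)
print(repr(prompt))
```

Comparing the output of such a helper with `tokenizer.apply_chat_template(messages, tokenize=False)` is a quick way to check that a custom serving stack formats prompts the way the model expects.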
+### System prompt
+In Ai2 demos, we use this system prompt by default:
+```
+You are OLMo 2, a helpful and harmless AI Assistant built by the Allen Institute for AI.
+```
+The model has not been trained with a specific system prompt in mind.
+### Bias, Risks, and Limitations
+The OLMo-2 models have limited safety training and, unlike ChatGPT, are not deployed with in-the-loop filtering of responses, so they can produce problematic outputs (especially when prompted to do so).
+See the Falcon 180B model card for an example of this.
+## Performance
+TODO
+## Hyperparameters
+Training run: `ppo_olmo_13b_25_rm_best_gsm_math_if_beta_0.03_lr_4e-7_25218__1__1732572469_step_360`
+PPO settings for RLVR:
+- **Learning Rate**: 4 × 10⁻⁷
+- **Discount Factor (gamma)**: 1.0
+- **Generalized Advantage Estimation (lambda)**: 0.95
+- **Mini-batches (N_mb)**: 1
+- **PPO Update Iterations (K)**: 4
+- **PPO's Clipping Coefficient (epsilon)**: 0.2
+- **Value Function Coefficient (c1)**: 0.1
+- **Gradient Norm Threshold**: 1.0
+- **Learning Rate Schedule**: Linear
+- **Generation Temperature**: 1.0
+- **Batch Size (effective)**: 512
+- **Max Token Length**: 2,048
+- **Max Prompt Token Length**: 2,048
+- **Penalty Reward Value for Responses without an EOS Token**: -10.0
+- **Response Length**: 2,048
+- **Total Episodes**: 100,000 (this checkpoint is training step 360)
+- **KL penalty coefficient (beta)**: 0.03
+- **Warm up ratio (omega)**: 0.0
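To illustrate how a few of these settings interact, the sketch below plugs gamma = 1.0, lambda = 0.95, and epsilon = 0.2 into the textbook formulas for Generalized Advantage Estimation and PPO's clipped surrogate objective. It is a scalar toy example under stated assumptions, not the training code used for this model (see the open-instruct repository for that); the function names and the sample rewards and values are hypothetical.

```python
# Values taken from the hyperparameter list above.
GAMMA = 1.0      # discount factor
LAM = 0.95       # GAE lambda
CLIP_EPS = 0.2   # PPO clipping coefficient


def gae_advantages(rewards, values, gamma=GAMMA, lam=LAM):
    """Textbook GAE over one episode.

    `values` holds one value estimate per step; the value after the final
    step is treated as 0 (episode end). Hypothetical helper for illustration.
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        next_value = values[t + 1] if t + 1 < len(values) else 0.0
        delta = rewards[t] + gamma * next_value - values[t]  # TD residual
        running = delta + gamma * lam * running              # discounted sum of residuals
        advantages[t] = running
    return advantages


def clipped_surrogate(ratio, advantage, eps=CLIP_EPS):
    """PPO clipped objective for one sample (to be maximized)."""
    clipped_ratio = max(1.0 - eps, min(1.0 + eps, ratio))
    return min(ratio * advantage, clipped_ratio * advantage)


# Toy episode: reward arrives only at the last step, value estimates are zero.
advantages = gae_advantages(rewards=[0.0, 0.0, 1.0], values=[0.0, 0.0, 0.0])
print(advantages)
print(clipped_surrogate(ratio=1.5, advantage=1.0))
```

With gamma = 1.0 there is no discounting of future reward, so the decay of credit toward earlier steps in this toy episode comes entirely from lambda.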
+## License and use
+OLMo-2 is licensed under the Apache 2.0 license.
+OLMo-2 is intended for research and educational use.
+For more information, please see our [Responsible Use Guidelines](https://allenai.org/responsible-use).
+## Citation
+If OLMo-2 or any of the related materials were helpful to your work, please cite:
+```
+TODO
+```