## Model description
This repository contains over 500 model checkpoints for the paper *Loss-to-Loss Prediction: Scaling Laws for All Datasets*. The models range in size from 20M to 3.3B parameters, with FLOP budgets from 2e17 to 1e21 FLOPs, across 6 different pretraining datasets.
Each subdirectory name contains four parameters that identify the model in that subdirectory:

- Dataset: one of `fineweb-100b`, `fineweb-edu-100b`, `proof-pile-2`, `slimpajama-chunk1`, `smollm-corpus`, or `starcoder`
- N: the number of model parameters
- D: the number of training tokens
- C: the number of training FLOPs
For example, a model trained on `starcoder` with 1.1e08 parameters on 3.0e08 tokens for a total of 2.0e17 FLOPs would have the name `L2L_starcoder_N1.1e08_D3.0e08_C2.0e17/`.
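If you need to select checkpoints programmatically, the naming scheme is easy to parse. The snippet below is a minimal sketch, not part of the official repository: the `parse_checkpoint_name` helper is hypothetical, and the C ≈ 6·N·D check is the standard approximation for decoder-only training FLOPs, which is consistent with the example name above.

```python
import re

def parse_checkpoint_name(name: str) -> dict:
    """Parse a checkpoint directory name like
    'L2L_starcoder_N1.1e08_D3.0e08_C2.0e17' into its components.
    (Hypothetical helper for illustration; not part of the repository.)
    """
    match = re.match(
        r"L2L_(?P<dataset>.+)_N(?P<N>[\d.e+]+)_D(?P<D>[\d.e+]+)_C(?P<C>[\d.e+]+)/?$",
        name,
    )
    if match is None:
        raise ValueError(f"Unrecognized checkpoint name: {name}")
    parts = match.groupdict()
    return {
        "dataset": parts["dataset"],
        "N": float(parts["N"]),  # model parameters
        "D": float(parts["D"]),  # training tokens
        "C": float(parts["C"]),  # training FLOPs
    }

info = parse_checkpoint_name("L2L_starcoder_N1.1e08_D3.0e08_C2.0e17")
# The budgets follow the usual C ~ 6 * N * D approximation:
# 6 * 1.1e08 * 3.0e08 ~ 2.0e17
print(info)
```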
Full training details for the models can be found in the training repository or paper.
## How to load a model
First, follow the instructions in the training repository to install our fork of the OLMo package.
With this installed, you can then use the `huggingface_hub` and `transformers` packages to load a model with the following snippet:
```python
from huggingface_hub import snapshot_download
from olmo.model import HFMixinOLMo

tmp_dir = "tmp"
model_name = "L2L_starcoder_N1.1e08_D3.0e08_C2.0e17"

# Download only the files for this checkpoint from the Hub
snapshot_download(
    repo_id="KempnerInstituteAI/loss-to-loss",
    allow_patterns=f"{model_name}/*",
    local_dir=tmp_dir,
)

# Load the checkpoint with the OLMo fork's HF-compatible mixin
model = HFMixinOLMo.from_pretrained(f"{tmp_dir}/{model_name}")
```
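Once loaded, the checkpoint can be used like a standard PyTorch causal language model. The snippet below is a minimal sanity-check sketch, assuming the model accepts a batch of token IDs and returns an output object with `.logits`, as the underlying OLMo model does; consult the training repository for the exact tokenizer and generation utilities.

```python
import torch

model.eval()
dummy_input = torch.randint(0, 100, (1, 16))  # (batch, sequence) of token IDs
with torch.no_grad():
    output = model(dummy_input)

# Expected shape: (batch, sequence, vocab_size)
print(output.logits.shape)
```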
## Citation
If you use these models in your research, please cite this paper:
```bibtex
@article{brandfonbrener2024loss,
  title={Loss-to-Loss Prediction: Scaling Laws for All Datasets},
  author={Brandfonbrener, David and Anand, Nikhil and Vyas, Nikhil and Malach, Eran and Kakade, Sham},
  journal={arXiv preprint arXiv:2411.12925},
  year={2024}
}
```
## License
These models are licensed under Apache 2.0 and are intended for research and educational use.