KempnerInstitute
committed on
Commit baac830 • 1 Parent(s): 1fa49a6
Update README with paper info
README.md
CHANGED
@@ -1,6 +1,6 @@
 # Model description
 
-This repository contains over 500 model checkpoints ranging in size from 20M parameters up to 3.3B parameters and FLOP budgets from 2e17 to 1e21 FLOPs across 6 different pretraining datasets.
+This repository contains over 500 model checkpoints for the paper [Loss-to-Loss Prediction: Scaling Laws for All Datasets](https://arxiv.org/abs/2411.12925), with models ranging in size from 20M parameters up to 3.3B parameters and FLOP budgets from 2e17 to 1e21 FLOPs across 6 different pretraining datasets.
 
 Each subdirectory name contains four different parameters to identify the model in that subdirectory:
 
@@ -40,7 +40,12 @@ model = HFMixinOLMo.from_pretrained(f"{tmp_dir}/{model_name}")
 If you use these models in your research, please cite this paper:
 
 ```bibtex
-
+@article{brandfonbrener2024loss,
+title={Loss-to-Loss Prediction: Scaling Laws for All Datasets},
+author={Brandfonbrener, David and Anand, Nikhil and Vyas, Nikhil and Malach, Eran and Kakade, Sham},
+journal={arXiv preprint arXiv:2411.12925},
+year={2024}
+}
 ```
 
 # License