Models Trained with Random Mixture
This is a collection of 64 language models, each with approximately 1B parameters, trained on different random mixtures of data. The project aims to validate that the RegMix approach (https://huggingface.co/papers/2407.01492) generalizes from small-scale (e.g., 1M-parameter) to large-scale (e.g., 1B-parameter) models.
Key Features
- Model Size: 64 separate models, each with ~1B parameters
- Training Data: Random data mixtures sampled over the RegMix-Data dataset
- Purpose: To validate the effectiveness of RegMix at identifying high-performing data mixtures
Dataset
The models were trained on the RegMix-Data dataset, which is derived from The Pile and split into per-domain subsets.
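If you want to inspect the training corpus itself, it can be loaded with the `datasets` library. A minimal sketch, assuming the dataset is hosted on the Hub under `sail/regmix-data` (check the dataset card for the exact repository id and the available domain configurations):

```python
from datasets import load_dataset

# Assumption: RegMix-Data lives at "sail/regmix-data" on the Hub.
# Streaming avoids downloading the full corpus just to peek at it.
ds = load_dataset("sail/regmix-data", split="train", streaming=True)
print(next(iter(ds)))
```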
Training Hyperparameters
| Hyperparameter | Value |
|---|---|
| Batch Size | 1M tokens |
| Learning Rate | 4e-4 |
| Minimum Learning Rate | 1e-5 |
| Learning Rate Schedule | Cosine |
| Warmup Ratio | 4% |
| Total Tokens | 25B |
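The schedule can be written down explicitly. The sketch below is illustrative rather than the actual training code: it assumes linear warmup over the first 4% of steps followed by cosine decay from the peak to the minimum learning rate, which matches the table but is our reading of it.

```python
import math

def cosine_lr(step: int, total_steps: int,
              max_lr: float = 4e-4, min_lr: float = 1e-5,
              warmup_ratio: float = 0.04) -> float:
    """Cosine decay with linear warmup, using the values from the table above."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear warmup from 0 to max_lr over the first 4% of steps (assumed).
        return max_lr * step / max(1, warmup_steps)
    # Cosine decay from max_lr down to min_lr over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * progress))

# With a 1M-token batch and 25B total tokens, training runs for ~25,000 steps.
total_steps = 25_000
print(cosine_lr(0, total_steps), cosine_lr(1_000, total_steps), cosine_lr(total_steps, total_steps))
```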
How to Load a Model
You can load any model using the corresponding branch with the Hugging Face Transformers library:
```python
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained("sail/data-mixture-random-1b", revision="model-index-1")
tokenizer = AutoTokenizer.from_pretrained("sail/data-mixture-random-1b", revision="model-index-1")
```
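For text generation, the same checkpoints can also be loaded with `AutoModelForCausalLM`. This is a minimal sketch under the assumption that the checkpoints are decoder-only causal LMs; the snippet above only shows `AutoModel`:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "sail/data-mixture-random-1b"
revision = "model-index-1"

# Assumption: the checkpoints are causal LMs, so they can be
# driven through the standard generate() API.
tokenizer = AutoTokenizer.from_pretrained(repo, revision=revision)
model = AutoModelForCausalLM.from_pretrained(repo, revision=revision)

inputs = tokenizer("The Pile is a dataset of", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```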
Data Mixture
The specific data mixture used to train each 1B model can be found in the `train_config.yaml` file in each corresponding model branch.
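To inspect a mixture programmatically, one option is to fetch the file with `huggingface_hub` and parse it with PyYAML. A small sketch (the YAML schema is not documented here, so the script just prints the parsed mapping):

```python
import yaml
from huggingface_hub import hf_hub_download

# Download train_config.yaml from the branch of model 1.
path = hf_hub_download(
    repo_id="sail/data-mixture-random-1b",
    filename="train_config.yaml",
    revision="model-index-1",
)

with open(path) as f:
    config = yaml.safe_load(f)

# The exact schema isn't documented here; print it to explore.
print(config)
```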
Model Variants
To access a different model variant, change the `revision` parameter in the `from_pretrained` call to the desired model index (e.g., "model-index-2", "model-index-3"); the maximum index is 64.
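A sketch of sweeping all 64 branches (loading 1B-parameter models sequentially is bandwidth- and memory-heavy, so in practice you would evaluate and release each one before moving on):

```python
from transformers import AutoModel

repo = "sail/data-mixture-random-1b"

# Branches are named model-index-1 through model-index-64.
for i in range(1, 65):
    model = AutoModel.from_pretrained(repo, revision=f"model-index-{i}")
    # ... evaluate or inspect the model here ...
    del model  # free memory before loading the next variant
```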
Usage Notes
- These models are primarily intended for research purposes.
- Performance may vary depending on the specific task and domain.
Citation
If you use these models in your research, please cite the RegMix paper:
```bibtex
@article{liu2024regmix,
  title={RegMix: Data Mixture as Regression for Language Model Pre-training},
  author={Liu, Qian and Zheng, Xiaosen and Muennighoff, Niklas and Zeng, Guangtao and Dou, Longxu and Pang, Tianyu and Jiang, Jing and Lin, Min},
  journal={arXiv preprint arXiv:2407.01492},
  year={2024}
}
```
For more information about the RegMix methodology and its applications, please refer to the original paper.
Performance
We evaluated each model with lm-evaluation-harness. The performance metric for each task is the average of the 0-shot through 5-shot `acc_norm` (normalized accuracy) scores when the harness reports them, and `acc` (accuracy) scores otherwise.
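Concretely, each table cell is a simple mean over the six shot settings. A minimal sketch with made-up per-shot scores (not real results):

```python
# Hypothetical per-shot scores for one task, 0-shot through 5-shot.
# acc_norm is used when the harness reports it, otherwise acc.
shot_scores = [39.1, 40.2, 40.9, 41.0, 40.6, 41.6]

reported = sum(shot_scores) / len(shot_scores)
print(f"{reported:.2f}")  # -> 40.57, one cell in the tables below
```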
Table 1: Model Index 1-8
| Task | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 | Model 7 | Model 8 |
|------|---------|---------|---------|---------|---------|---------|---------|---------|
| Social IQA | 33.27 | 33.33 | 33.62 | 33.53 | 33.49 | 33.56 | 33.62 | 33.55 |
| HellaSwag | 40.58 | 36.86 | 40.58 | 36.06 | 40.07 | 37.85 | 37.93 | 39.59 |
| PiQA | 67.29 | 65.14 | 67.97 | 64.66 | 67.03 | 65.36 | 66.00 | 66.55 |
| OpenBookQA | 28.63 | 27.87 | 29.33 | 29.10 | 29.23 | 28.33 | 29.13 | 28.73 |
| Lambada | 29.17 | 26.86 | 31.55 | 27.11 | 29.16 | 28.92 | 31.53 | 30.92 |
| SciQ | 80.68 | 79.98 | 81.05 | 80.80 | 82.40 | 79.88 | 78.67 | 79.70 |
| COPA | 70.50 | 63.83 | 69.17 | 65.00 | 67.50 | 66.00 | 66.67 | 68.67 |
| RACE | 29.47 | 30.00 | 32.11 | 28.82 | 31.13 | 30.06 | 29.90 | 30.75 |
| ARC Easy | 50.03 | 48.72 | 50.01 | 46.64 | 51.06 | 47.46 | 46.75 | 48.39 |
| LogiQA | 23.76 | 24.17 | 25.29 | 25.29 | 24.55 | 25.96 | 25.45 | 26.32 |
| QQP | 55.71 | 55.90 | 54.84 | 56.52 | 54.01 | 56.34 | 52.35 | 54.20 |
| WinoGrande | 51.54 | 51.59 | 51.39 | 50.91 | 53.13 | 52.26 | 51.26 | 51.45 |
| MultiRC | 52.65 | 53.39 | 51.89 | 50.92 | 49.03 | 53.09 | 53.64 | 50.23 |
| Average | 47.18 | 45.97 | 47.60 | 45.80 | 47.06 | 46.54 | 46.38 | 46.85 |
Table 2: Model Index 9-16
| Task | Model 9 | Model 10 | Model 11 | Model 12 | Model 13 | Model 14 | Model 15 | Model 16 |
|------|---------|----------|----------|----------|----------|----------|----------|----------|
| Social IQA | 33.43 | 33.21 | 33.31 | 33.17 | 33.28 | 32.43 | 33.57 | 33.70 |
| HellaSwag | 40.05 | 35.89 | 39.55 | 39.89 | 38.63 | 36.18 | 39.52 | 35.94 |
| PiQA | 66.60 | 64.74 | 66.29 | 66.27 | 66.90 | 64.05 | 66.70 | 64.51 |
| OpenBookQA | 28.87 | 26.60 | 29.33 | 28.73 | 29.40 | 27.87 | 29.67 | 27.83 |
| Lambada | 31.39 | 27.37 | 30.32 | 30.31 | 31.38 | 26.25 | 29.86 | 26.95 |
| SciQ | 81.10 | 79.12 | 79.97 | 82.85 | 79.42 | 81.40 | 81.38 | 81.23 |
| COPA | 67.00 | 64.50 | 66.83 | 69.50 | 67.33 | 65.83 | 69.50 | 66.33 |
| RACE | 30.57 | 29.63 | 30.49 | 30.85 | 30.35 | 28.66 | 31.21 | 29.57 |
| ARC Easy | 50.66 | 47.74 | 47.47 | 50.18 | 49.92 | 49.52 | 50.73 | 48.65 |
| LogiQA | 23.60 | 25.65 | 26.37 | 23.81 | 25.58 | 26.29 | 25.86 | 25.12 |
| QQP | 54.89 | 54.79 | 54.20 | 55.23 | 53.69 | 57.09 | 53.95 | 54.24 |
| WinoGrande | 50.83 | 51.84 | 51.05 | 51.83 | 52.12 | 52.00 | 51.01 | 51.82 |
| MultiRC | 54.18 | 54.48 | 50.17 | 52.12 | 51.42 | 52.69 | 51.87 | 53.48 |
| Average | 47.17 | 45.81 | 46.57 | 47.29 | 46.88 | 46.17 | 47.30 | 46.11 |
Table 3: Model Index 17-24
| Task | Model 17 | Model 18 | Model 19 | Model 20 | Model 21 | Model 22 | Model 23 | Model 24 |
|------|----------|----------|----------|----------|----------|----------|----------|----------|
| Social IQA | 33.89 | 33.31 | 33.53 | 33.38 | 33.75 | 33.24 | 33.56 | 33.71 |
| HellaSwag | 38.68 | 39.90 | 34.67 | 37.12 | 37.44 | 36.07 | 42.15 | 34.67 |
| PiQA | 66.83 | 67.39 | 63.33 | 64.83 | 65.00 | 63.68 | 67.80 | 62.99 |
| OpenBookQA | 28.13 | 30.67 | 28.03 | 29.40 | 27.67 | 27.77 | 29.37 | 25.83 |
| Lambada | 28.78 | 28.56 | 24.13 | 29.41 | 27.67 | 28.03 | 33.47 | 24.04 |
| SciQ | 79.60 | 78.83 | 77.42 | 78.98 | 78.95 | 78.72 | 81.83 | 79.12 |
| COPA | 65.17 | 68.17 | 65.33 | 67.33 | 67.67 | 62.67 | 69.83 | 65.83 |
| RACE | 28.74 | 30.03 | 29.76 | 29.49 | 30.77 | 29.76 | 31.21 | 27.91 |
| ARC Easy | 48.86 | 49.42 | 47.90 | 48.30 | 47.88 | 46.68 | 50.92 | 45.24 |
| LogiQA | 25.91 | 26.34 | 26.24 | 25.76 | 26.11 | 26.24 | 24.17 | 25.91 |
| QQP | 53.35 | 53.18 | 50.61 | 51.49 | 54.27 | 54.99 | 52.77 | 55.19 |
| WinoGrande | 52.54 | 51.17 | 52.01 | 51.09 | 52.13 | 52.03 | 52.50 | 50.28 |
| MultiRC | 51.49 | 52.45 | 55.40 | 54.87 | 51.73 | 49.49 | 50.61 | 50.29 |
| Average | 46.30 | 46.88 | 45.26 | 46.27 | 46.23 | 45.34 | 47.71 | 44.69 |
Table 4: Model Index 25-32
| Task | Model 25 | Model 26 | Model 27 | Model 28 | Model 29 | Model 30 | Model 31 | Model 32 |
|------|----------|----------|----------|----------|----------|----------|----------|----------|
| Social IQA | 33.51 | 33.40 | 33.59 | 33.52 | 33.53 | 33.49 | 33.16 | 33.56 |
| HellaSwag | 36.75 | 36.97 | 40.81 | 38.25 | 40.28 | 35.71 | 37.37 | 37.39 |
| PiQA | 64.09 | 64.74 | 67.97 | 66.15 | 66.88 | 63.84 | 64.47 | 65.05 |
| OpenBookQA | 29.47 | 28.70 | 29.57 | 29.77 | 29.50 | 29.13 | 29.47 | 28.00 |
| Lambada | 26.69 | 33.00 | 31.60 | 33.08 | 31.49 | 27.69 | 26.99 | 29.54 |
| SciQ | 80.03 | 79.17 | 80.12 | 80.22 | 81.92 | 78.23 | 77.42 | 80.87 |
| COPA | 67.67 | 65.50 | 69.00 | 65.67 | 68.33 | 63.33 | 64.67 | 67.17 |
| RACE | 30.05 | 30.19 | 30.96 | 30.37 | 30.08 | 29.62 | 30.13 | 29.92 |
| ARC Easy | 47.50 | 46.90 | 50.26 | 48.57 | 50.55 | 46.96 | 48.77 | 48.79 |
| LogiQA | 27.24 | 25.55 | 25.86 | 24.37 | 25.32 | 25.12 | 26.40 | 24.30 |
| QQP | 49.68 | 55.43 | 50.94 | 50.91 | 51.99 | 53.53 | 49.53 | 51.36 |
| WinoGrande | 51.68 | 52.12 | 51.93 | 51.50 | 52.32 | 51.67 | 52.13 | 52.63 |
| MultiRC | 51.24 | 51.91 | 50.33 | 52.42 | 52.52 | 54.04 | 52.05 | 53.04 |
| Average | 45.82 | 46.43 | 47.15 | 46.52 | 47.29 | 45.57 | 45.58 | 46.28 |
Table 5: Model Index 33-40
| Task | Model 33 | Model 34 | Model 35 | Model 36 | Model 37 | Model 38 | Model 39 | Model 40 |
|------|----------|----------|----------|----------|----------|----------|----------|----------|
| Social IQA | 33.48 | 33.28 | 33.35 | 33.29 | 33.63 | 33.61 | 33.21 | 33.61 |
| HellaSwag | 38.00 | 40.18 | 43.37 | 37.69 | 32.96 | 32.98 | 37.31 | 37.79 |
| PiQA | 65.30 | 66.68 | 69.04 | 66.46 | 62.25 | 60.17 | 65.24 | 65.32 |
| OpenBookQA | 29.43 | 30.37 | 30.43 | 27.63 | 26.43 | 26.83 | 27.97 | 28.70 |
| Lambada | 26.59 | 31.46 | 31.71 | 30.21 | 18.92 | 20.29 | 28.10 | 28.58 |
| SciQ | 79.82 | 80.58 | 82.13 | 80.83 | 76.73 | 77.90 | 79.12 | 79.60 |
| COPA | 64.33 | 69.33 | 67.00 | 67.83 | 61.50 | 62.67 | 64.67 | 66.00 |
| RACE | 30.03 | 30.16 | 32.47 | 30.49 | 29.27 | 28.12 | 30.11 | 30.21 |
| ARC Easy | 48.86 | 49.88 | 52.22 | 48.32 | 44.86 | 45.54 | 48.15 | 48.86 |
| LogiQA | 25.91 | 24.30 | 23.35 | 24.96 | 26.19 | 27.68 | 25.47 | 25.37 |
| QQP | 56.06 | 56.56 | 52.57 | 56.70 | 52.54 | 48.04 | 49.81 | 57.12 |
| WinoGrande | 50.92 | 50.97 | 52.39 | 52.70 | 52.30 | 51.68 | 51.42 | 52.80 |
| MultiRC | 53.09 | 49.97 | 52.18 | 49.05 | 53.78 | 52.27 | 51.45 | 55.68 |
| Average | 46.29 | 47.21 | 47.86 | 46.63 | 43.95 | 43.67 | 45.54 | 46.90 |
Table 6: Model Index 41-48
| Task | Model 41 | Model 42 | Model 43 | Model 44 | Model 45 | Model 46 | Model 47 | Model 48 |
|------|----------|----------|----------|----------|----------|----------|----------|----------|
| Social IQA | 33.49 | 33.43 | 33.07 | 33.28 | 33.44 | 33.08 | 33.78 | 33.17 |
| HellaSwag | 34.51 | 37.59 | 42.69 | 37.37 | 38.31 | 38.30 | 39.67 | 41.07 |
| PiQA | 62.24 | 65.58 | 68.05 | 66.62 | 66.54 | 65.52 | 66.98 | 67.21 |
| OpenBookQA | 27.10 | 28.77 | 28.90 | 28.07 | 28.07 | 27.60 | 31.17 | 29.73 |
| Lambada | 22.78 | 26.99 | 31.34 | 29.51 | 27.87 | 29.47 | 30.34 | 32.71 |
| SciQ | 77.78 | 80.25 | 79.47 | 80.25 | 80.70 | 79.72 | 81.35 | 81.77 |
| COPA | 64.00 | 66.33 | 67.00 | 67.00 | 67.33 | 68.33 | 67.17 | 67.67 |
| RACE | 28.33 | 28.82 | 30.78 | 30.80 | 30.08 | 30.24 | 30.24 | 30.67 |
| ARC Easy | 45.48 | 48.64 | 51.49 | 46.99 | 48.79 | 48.05 | 49.58 | 49.49 |
| LogiQA | 24.83 | 24.96 | 24.76 | 23.25 | 26.06 | 25.55 | 24.32 | 24.68 |
| QQP | 50.27 | 54.73 | 53.96 | 57.00 | 53.73 | 51.19 | 57.52 | 56.91 |
| WinoGrande | 51.79 | 51.63 | 51.32 | 50.76 | 53.18 | 52.45 | 50.72 | 52.24 |
| MultiRC | 54.03 | 53.96 | 48.91 | 50.74 | 53.01 | 50.89 | 47.63 | 53.84 |
| Average | 44.35 | 46.28 | 47.06 | 46.28 | 46.70 | 46.18 | 46.96 | 47.78 |
Table 7: Model Index 49-56
| Task | Model 49 | Model 50 | Model 51 | Model 52 | Model 53 | Model 54 | Model 55 | Model 56 |
|------|----------|----------|----------|----------|----------|----------|----------|----------|
| Social IQA | 33.53 | 33.74 | 33.37 | 33.41 | 32.96 | 33.88 | 33.75 | 33.79 |
| HellaSwag | 39.09 | 35.65 | 38.68 | 36.07 | 37.68 | 38.53 | 35.40 | 40.50 |
| PiQA | 66.81 | 64.58 | 65.68 | 63.99 | 65.85 | 65.76 | 64.51 | 66.89 |
| OpenBookQA | 29.13 | 27.57 | 28.27 | 29.10 | 29.43 | 28.73 | 28.30 | 29.87 |
| Lambada | 30.23 | 26.19 | 30.29 | 30.84 | 29.76 | 29.03 | 28.63 | 30.74 |
| SciQ | 79.90 | 80.83 | 78.40 | 80.03 | 81.38 | 80.92 | 77.75 | 82.07 |
| COPA | 68.17 | 61.83 | 67.00 | 66.00 | 66.17 | 63.17 | 66.33 | 64.00 |
| RACE | 31.42 | 29.35 | 30.41 | 31.08 | 30.77 | 29.73 | 30.80 | 31.42 |
| ARC Easy | 49.54 | 47.71 | 49.02 | 47.64 | 48.38 | 49.36 | 46.96 | 51.22 |
| LogiQA | 24.99 | 24.58 | 25.32 | 24.91 | 25.17 | 26.22 | 24.63 | 24.91 |
| QQP | 54.06 | 56.48 | 50.96 | 56.62 | 56.45 | 53.86 | 53.85 | 53.26 |
| WinoGrande | 50.51 | 50.26 | 51.83 | 51.33 | 52.18 | 51.89 | 51.59 | 50.50 |
| MultiRC | 50.25 | 54.37 | 50.94 | 52.38 | 51.21 | 55.34 | 54.52 | 50.50 |
| Average | 46.74 | 45.63 | 46.17 | 46.42 | 46.72 | 46.65 | 45.92 | 46.90 |
Table 8: Model Index 57-64
| Task | Model 57 | Model 58 | Model 59 | Model 60 | Model 61 | Model 62 | Model 63 | Model 64 |
|------|----------|----------|----------|----------|----------|----------|----------|----------|
| Social IQA | 33.24 | 33.30 | 33.56 | 33.54 | 33.42 | 33.84 | 33.32 | 33.55 |
| HellaSwag | 41.74 | 39.63 | 35.36 | 38.83 | 38.53 | 36.46 | 38.80 | 36.43 |
| PiQA | 68.07 | 67.31 | 64.44 | 66.38 | 66.50 | 64.74 | 66.54 | 64.87 |
| OpenBookQA | 29.20 | 29.50 | 28.10 | 27.97 | 27.83 | 27.37 | 28.83 | 27.87 |
| Lambada | 31.79 | 31.11 | 27.32 | 30.17 | 28.75 | 26.22 | 30.38 | 26.25 |
| SciQ | 80.42 | 79.83 | 80.85 | 79.60 | 78.93 | 80.05 | 79.50 | 78.65 |
| COPA | 66.17 | 69.00 | 64.00 | 64.83 | 67.00 | 64.00 | 66.00 | 66.83 |
| RACE | 31.39 | 29.82 | 29.67 | 30.08 | 29.98 | 29.46 | 30.37 | 29.19 |
| ARC Easy | 51.14 | 49.24 | 47.13 | 47.88 | 48.20 | 47.09 | 49.09 | 46.90 |
| LogiQA | 25.19 | 25.93 | 23.68 | 25.17 | 25.70 | 25.52 | 26.50 | 26.65 |
| QQP | 55.37 | 54.46 | 52.73 | 53.17 | 59.65 | 58.15 | 57.50 | 55.31 |
| WinoGrande | 53.21 | 51.46 | 50.83 | 52.16 | 52.37 | 51.41 | 51.63 | 51.85 |
| MultiRC | 53.58 | 52.31 | 52.22 | 53.03 | 50.41 | 52.17 | 52.27 | 51.50 |
| Average | 47.73 | 47.15 | 45.38 | 46.37 | 46.71 | 45.88 | 46.98 | 45.84 |