---
license: mit
datasets:
- sail/regmix-data
- sail/regmix-data-sample
language:
- en
---

# Models Trained with Random Mixture

This is a collection of 64 language models, each with approximately 1B parameters, trained on different random mixtures of data. The project aims to validate that the RegMix approach (https://huggingface.co/papers/2407.01492) generalizes from small-scale (e.g., 1M-parameter) to large-scale (e.g., 1B-parameter) models.

## Key Features

- **Model Size**: 64 separate models, each with ~1B parameters
- **Training Data**: Random data mixtures sampled from the [RegMix-Data](https://huggingface.co/datasets/sail/regmix-data) dataset
- **Purpose**: To validate the effectiveness of RegMix in identifying high-performing data mixtures

## Dataset

The models were trained on the [RegMix-Data](https://huggingface.co/datasets/sail/regmix-data) dataset, which is derived from The Pile and split into separate domains.

## Training Hyperparameters

| Hyperparameter | Value |
|:---------------|:------|
| Batch Size | 1M tokens |
| Learning Rate | 4e-4 |
| Minimum Learning Rate | 1e-5 |
| Learning Rate Schedule | Cosine |
| Warmup Ratio | 4% |
| Total Tokens | 25B |

## How to Load a Model

You can load any model using the corresponding branch with the Hugging Face Transformers library:

```python
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained("sail/data-mixture-random-1b", revision="model-index-1")
tokenizer = AutoTokenizer.from_pretrained("sail/data-mixture-random-1b", revision="model-index-1")
```

## Data Mixture

The specific data mixture used to train each 1B model is recorded in the file `train_config.yaml` in the corresponding model branch.

## Model Variants

To access a different model variant, change the `revision` argument of `from_pretrained` to the desired model index (e.g., "model-index-2", "model-index-3"); the maximum index is 64.

## Usage Notes

- These models are primarily intended for research purposes.
- Performance may vary depending on the specific task and domain.

## Citation

If you use these models in your research, please cite the RegMix paper:

```
@misc{liu2024regmix,
    title={RegMix: Data Mixture as Regression for Language Model Pre-training},
    author={Qian Liu and Xiaosen Zheng and Niklas Muennighoff and Guangtao Zeng and Longxu Dou and Tianyu Pang and Jing Jiang and Min Lin},
    year={2024},
    eprint={2407.01492},
    archivePrefix={arXiv},
    primaryClass={cs.CL},
    url={https://arxiv.org/abs/2407.01492},
}
```

For more information about the RegMix methodology and its applications, please refer to the [original paper](https://huggingface.co/papers/2407.01492).

## Performance

We evaluated each model with [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness). The reported score for each task is the average of the 0-shot through 5-shot `acc_norm` (normalized accuracy) scores when the task provides them, and `acc` (accuracy) scores otherwise.
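As a concrete illustration, the sketch below reproduces this aggregation for a single task and model. It is a minimal sketch, not the exact evaluation script, and it assumes the v0.4 Python API of lm-evaluation-harness (`lm_eval.simple_evaluate` and result keys such as `acc_norm,none`); key names and signatures may differ in other versions.

```python
# Minimal sketch of the reported metric: for each task, the score is the
# mean of the 0-shot through 5-shot results, preferring acc_norm over acc.
# Assumes lm-evaluation-harness v0.4; result keys may differ across versions.
import lm_eval

MODEL = "sail/data-mixture-random-1b"
REVISION = "model-index-1"  # any branch from model-index-1 to model-index-64
TASK = "hellaswag"

scores = []
for num_fewshot in range(6):  # 0-shot through 5-shot
    out = lm_eval.simple_evaluate(
        model="hf",
        model_args=f"pretrained={MODEL},revision={REVISION}",
        tasks=[TASK],
        num_fewshot=num_fewshot,
    )
    result = out["results"][TASK]
    # Prefer normalized accuracy when the task reports it, else accuracy.
    scores.append(result.get("acc_norm,none", result.get("acc,none")))

print(f"{TASK} 0-5 shot average: {100 * sum(scores) / len(scores):.2f}")
```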
### Table 1: Model Index 1-8

| Task | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 | Model 7 | Model 8 |
|:--------------|:--------|:--------|:--------|:--------|:--------|:--------|:--------|:--------|
| Social IQA | 33.27 | 33.33 | 33.62 | 33.53 | 33.49 | 33.56 | 33.62 | 33.55 |
| HellaSwag | 40.58 | 36.86 | 40.58 | 36.06 | 40.07 | 37.85 | 37.93 | 39.59 |
| PiQA | 67.29 | 65.14 | 67.97 | 64.66 | 67.03 | 65.36 | 66.00 | 66.55 |
| OpenBookQA | 28.63 | 27.87 | 29.33 | 29.10 | 29.23 | 28.33 | 29.13 | 28.73 |
| Lambada | 29.17 | 26.86 | 31.55 | 27.11 | 29.16 | 28.92 | 31.53 | 30.92 |
| SciQ | 80.68 | 79.98 | 81.05 | 80.80 | 82.40 | 79.88 | 78.67 | 79.70 |
| COPA | 70.50 | 63.83 | 69.17 | 65.00 | 67.50 | 66.00 | 66.67 | 68.67 |
| RACE | 29.47 | 30.00 | 32.11 | 28.82 | 31.13 | 30.06 | 29.90 | 30.75 |
| ARC Easy | 50.03 | 48.72 | 50.01 | 46.64 | 51.06 | 47.46 | 46.75 | 48.39 |
| LogiQA | 23.76 | 24.17 | 25.29 | 25.29 | 24.55 | 25.96 | 25.45 | 26.32 |
| QQP | 55.71 | 55.90 | 54.84 | 56.52 | 54.01 | 56.34 | 52.35 | 54.20 |
| WinoGrande | 51.54 | 51.59 | 51.39 | 50.91 | 53.13 | 52.26 | 51.26 | 51.45 |
| MultiRC | 52.65 | 53.39 | 51.89 | 50.92 | 49.03 | 53.09 | 53.64 | 50.23 |
| **Average** | **47.18** | **45.97** | **47.60** | **45.80** | **47.06** | **46.54** | **46.38** | **46.85** |

### Table 2: Model Index 9-16

| Task | Model 9 | Model 10 | Model 11 | Model 12 | Model 13 | Model 14 | Model 15 | Model 16 |
|:--------------|:--------|:---------|:---------|:---------|:---------|:---------|:---------|:---------|
| Social IQA | 33.43 | 33.21 | 33.31 | 33.17 | 33.28 | 32.43 | 33.57 | 33.70 |
| HellaSwag | 40.05 | 35.89 | 39.55 | 39.89 | 38.63 | 36.18 | 39.52 | 35.94 |
| PiQA | 66.60 | 64.74 | 66.29 | 66.27 | 66.90 | 64.05 | 66.70 | 64.51 |
| OpenBookQA | 28.87 | 26.60 | 29.33 | 28.73 | 29.40 | 27.87 | 29.67 | 27.83 |
| Lambada | 31.39 | 27.37 | 30.32 | 30.31 | 31.38 | 26.25 | 29.86 | 26.95 |
| SciQ | 81.10 | 79.12 | 79.97 | 82.85 | 79.42 | 81.40 | 81.38 | 81.23 |
| COPA | 67.00 | 64.50 | 66.83 | 69.50 | 67.33 | 65.83 | 69.50 | 66.33 |
| RACE | 30.57 | 29.63 | 30.49 | 30.85 | 30.35 | 28.66 | 31.21 | 29.57 |
| ARC Easy | 50.66 | 47.74 | 47.47 | 50.18 | 49.92 | 49.52 | 50.73 | 48.65 |
| LogiQA | 23.60 | 25.65 | 26.37 | 23.81 | 25.58 | 26.29 | 25.86 | 25.12 |
| QQP | 54.89 | 54.79 | 54.20 | 55.23 | 53.69 | 57.09 | 53.95 | 54.24 |
| WinoGrande | 50.83 | 51.84 | 51.05 | 51.83 | 52.12 | 52.00 | 51.01 | 51.82 |
| MultiRC | 54.18 | 54.48 | 50.17 | 52.12 | 51.42 | 52.69 | 51.87 | 53.48 |
| **Average** | **47.17** | **45.81** | **46.57** | **47.29** | **46.88** | **46.17** | **47.30** | **46.11** |

### Table 3: Model Index 17-24

| Task | Model 17 | Model 18 | Model 19 | Model 20 | Model 21 | Model 22 | Model 23 | Model 24 |
|:--------------|:---------|:---------|:---------|:---------|:---------|:---------|:---------|:---------|
| Social IQA | 33.89 | 33.31 | 33.53 | 33.38 | 33.75 | 33.24 | 33.56 | 33.71 |
| HellaSwag | 38.68 | 39.90 | 34.67 | 37.12 | 37.44 | 36.07 | 42.15 | 34.67 |
| PiQA | 66.83 | 67.39 | 63.33 | 64.83 | 65.00 | 63.68 | 67.80 | 62.99 |
| OpenBookQA | 28.13 | 30.67 | 28.03 | 29.40 | 27.67 | 27.77 | 29.37 | 25.83 |
| Lambada | 28.78 | 28.56 | 24.13 | 29.41 | 27.67 | 28.03 | 33.47 | 24.04 |
| SciQ | 79.60 | 78.83 | 77.42 | 78.98 | 78.95 | 78.72 | 81.83 | 79.12 |
| COPA | 65.17 | 68.17 | 65.33 | 67.33 | 67.67 | 62.67 | 69.83 | 65.83 |
| RACE | 28.74 | 30.03 | 29.76 | 29.49 | 30.77 | 29.76 | 31.21 | 27.91 |
| ARC Easy | 48.86 | 49.42 | 47.90 | 48.30 | 47.88 | 46.68 | 50.92 | 45.24 |
| LogiQA | 25.91 | 26.34 | 26.24 | 25.76 | 26.11 | 26.24 | 24.17 | 25.91 |
| QQP | 53.35 | 53.18 | 50.61 | 51.49 | 54.27 | 54.99 | 52.77 | 55.19 |
| WinoGrande | 52.54 | 51.17 | 52.01 | 51.09 | 52.13 | 52.03 | 52.50 | 50.28 |
| MultiRC | 51.49 | 52.45 | 55.40 | 54.87 | 51.73 | 49.49 | 50.61 | 50.29 |
| **Average** | **46.30** | **46.88** | **45.26** | **46.27** | **46.23** | **45.34** | **47.71** | **44.69** |

### Table 4: Model Index 25-32

| Task | Model 25 | Model 26 | Model 27 | Model 28 | Model 29 | Model 30 | Model 31 | Model 32 |
|:--------------|:---------|:---------|:---------|:---------|:---------|:---------|:---------|:---------|
| Social IQA | 33.51 | 33.40 | 33.59 | 33.52 | 33.53 | 33.49 | 33.16 | 33.56 |
| HellaSwag | 36.75 | 36.97 | 40.81 | 38.25 | 40.28 | 35.71 | 37.37 | 37.39 |
| PiQA | 64.09 | 64.74 | 67.97 | 66.15 | 66.88 | 63.84 | 64.47 | 65.05 |
| OpenBookQA | 29.47 | 28.70 | 29.57 | 29.77 | 29.50 | 29.13 | 29.47 | 28.00 |
| Lambada | 26.69 | 33.00 | 31.60 | 33.08 | 31.49 | 27.69 | 26.99 | 29.54 |
| SciQ | 80.03 | 79.17 | 80.12 | 80.22 | 81.92 | 78.23 | 77.42 | 80.87 |
| COPA | 67.67 | 65.50 | 69.00 | 65.67 | 68.33 | 63.33 | 64.67 | 67.17 |
| RACE | 30.05 | 30.19 | 30.96 | 30.37 | 30.08 | 29.62 | 30.13 | 29.92 |
| ARC Easy | 47.50 | 46.90 | 50.26 | 48.57 | 50.55 | 46.96 | 48.77 | 48.79 |
| LogiQA | 27.24 | 25.55 | 25.86 | 24.37 | 25.32 | 25.12 | 26.40 | 24.30 |
| QQP | 49.68 | 55.43 | 50.94 | 50.91 | 51.99 | 53.53 | 49.53 | 51.36 |
| WinoGrande | 51.68 | 52.12 | 51.93 | 51.50 | 52.32 | 51.67 | 52.13 | 52.63 |
| MultiRC | 51.24 | 51.91 | 50.33 | 52.42 | 52.52 | 54.04 | 52.05 | 53.04 |
| **Average** | **45.82** | **46.43** | **47.15** | **46.52** | **47.29** | **45.57** | **45.58** | **46.28** |

### Table 5: Model Index 33-40

| Task | Model 33 | Model 34 | Model 35 | Model 36 | Model 37 | Model 38 | Model 39 | Model 40 |
|:--------------|:---------|:---------|:---------|:---------|:---------|:---------|:---------|:---------|
| Social IQA | 33.48 | 33.28 | 33.35 | 33.29 | 33.63 | 33.61 | 33.21 | 33.61 |
| HellaSwag | 38.00 | 40.18 | 43.37 | 37.69 | 32.96 | 32.98 | 37.31 | 37.79 |
| PiQA | 65.30 | 66.68 | 69.04 | 66.46 | 62.25 | 60.17 | 65.24 | 65.32 |
| OpenBookQA | 29.43 | 30.37 | 30.43 | 27.63 | 26.43 | 26.83 | 27.97 | 28.70 |
| Lambada | 26.59 | 31.46 | 31.71 | 30.21 | 18.92 | 20.29 | 28.10 | 28.58 |
| SciQ | 79.82 | 80.58 | 82.13 | 80.83 | 76.73 | 77.90 | 79.12 | 79.60 |
| COPA | 64.33 | 69.33 | 67.00 | 67.83 | 61.50 | 62.67 | 64.67 | 66.00 |
| RACE | 30.03 | 30.16 | 32.47 | 30.49 | 29.27 | 28.12 | 30.11 | 30.21 |
| ARC Easy | 48.86 | 49.88 | 52.22 | 48.32 | 44.86 | 45.54 | 48.15 | 48.86 |
| LogiQA | 25.91 | 24.30 | 23.35 | 24.96 | 26.19 | 27.68 | 25.47 | 25.37 |
| QQP | 56.06 | 56.56 | 52.57 | 56.70 | 52.54 | 48.04 | 49.81 | 57.12 |
| WinoGrande | 50.92 | 50.97 | 52.39 | 52.70 | 52.30 | 51.68 | 51.42 | 52.80 |
| MultiRC | 53.09 | 49.97 | 52.18 | 49.05 | 53.78 | 52.27 | 51.45 | 55.68 |
| **Average** | **46.29** | **47.21** | **47.86** | **46.63** | **43.95** | **43.67** | **45.54** | **46.90** |

### Table 6: Model Index 41-48

| Task | Model 41 | Model 42 | Model 43 | Model 44 | Model 45 | Model 46 | Model 47 | Model 48 |
|:--------------|:---------|:---------|:---------|:---------|:---------|:---------|:---------|:---------|
| Social IQA | 33.49 | 33.43 | 33.07 | 33.28 | 33.44 | 33.08 | 33.78 | 33.17 |
| HellaSwag | 34.51 | 37.59 | 42.69 | 37.37 | 38.31 | 38.30 | 39.67 | 41.07 |
| PiQA | 62.24 | 65.58 | 68.05 | 66.62 | 66.54 | 65.52 | 66.98 | 67.21 |
| OpenBookQA | 27.10 | 28.77 | 28.90 | 28.07 | 28.07 | 27.60 | 31.17 | 29.73 |
| Lambada | 22.78 | 26.99 | 31.34 | 29.51 | 27.87 | 29.47 | 30.34 | 32.71 |
| SciQ | 77.78 | 80.25 | 79.47 | 80.25 | 80.70 | 79.72 | 81.35 | 81.77 |
| COPA | 64.00 | 66.33 | 67.00 | 67.00 | 67.33 | 68.33 | 67.17 | 67.67 |
| RACE | 28.33 | 28.82 | 30.78 | 30.80 | 30.08 | 30.24 | 30.24 | 30.67 |
| ARC Easy | 45.48 | 48.64 | 51.49 | 46.99 | 48.79 | 48.05 | 49.58 | 49.49 |
| LogiQA | 24.83 | 24.96 | 24.76 | 23.25 | 26.06 | 25.55 | 24.32 | 24.68 |
| QQP | 50.27 | 54.73 | 53.96 | 57.00 | 53.73 | 51.19 | 57.52 | 56.91 |
| WinoGrande | 51.79 | 51.63 | 51.32 | 50.76 | 53.18 | 52.45 | 50.72 | 52.24 |
| MultiRC | 54.03 | 53.96 | 48.91 | 50.74 | 53.01 | 50.89 | 47.63 | 53.84 |
| **Average** | **44.35** | **46.28** | **47.06** | **46.28** | **46.70** | **46.18** | **46.96** | **47.78** |

### Table 7: Model Index 49-56

| Task | Model 49 | Model 50 | Model 51 | Model 52 | Model 53 | Model 54 | Model 55 | Model 56 |
|:--------------|:---------|:---------|:---------|:---------|:---------|:---------|:---------|:---------|
| Social IQA | 33.53 | 33.74 | 33.37 | 33.41 | 32.96 | 33.88 | 33.75 | 33.79 |
| HellaSwag | 39.09 | 35.65 | 38.68 | 36.07 | 37.68 | 38.53 | 35.40 | 40.50 |
| PiQA | 66.81 | 64.58 | 65.68 | 63.99 | 65.85 | 65.76 | 64.51 | 66.89 |
| OpenBookQA | 29.13 | 27.57 | 28.27 | 29.10 | 29.43 | 28.73 | 28.30 | 29.87 |
| Lambada | 30.23 | 26.19 | 30.29 | 30.84 | 29.76 | 29.03 | 28.63 | 30.74 |
| SciQ | 79.90 | 80.83 | 78.40 | 80.03 | 81.38 | 80.92 | 77.75 | 82.07 |
| COPA | 68.17 | 61.83 | 67.00 | 66.00 | 66.17 | 63.17 | 66.33 | 64.00 |
| RACE | 31.42 | 29.35 | 30.41 | 31.08 | 30.77 | 29.73 | 30.80 | 31.42 |
| ARC Easy | 49.54 | 47.71 | 49.02 | 47.64 | 48.38 | 49.36 | 46.96 | 51.22 |
| LogiQA | 24.99 | 24.58 | 25.32 | 24.91 | 25.17 | 26.22 | 24.63 | 24.91 |
| QQP | 54.06 | 56.48 | 50.96 | 56.62 | 56.45 | 53.86 | 53.85 | 53.26 |
| WinoGrande | 50.51 | 50.26 | 51.83 | 51.33 | 52.18 | 51.89 | 51.59 | 50.50 |
| MultiRC | 50.25 | 54.37 | 50.94 | 52.38 | 51.21 | 55.34 | 54.52 | 50.50 |
| **Average** | **46.74** | **45.63** | **46.17** | **46.42** | **46.72** | **46.65** | **45.92** | **46.90** |

### Table 8: Model Index 57-64

| Task | Model 57 | Model 58 | Model 59 | Model 60 | Model 61 | Model 62 | Model 63 | Model 64 |
|:--------------|:---------|:---------|:---------|:---------|:---------|:---------|:---------|:---------|
| Social IQA | 33.24 | 33.30 | 33.56 | 33.54 | 33.42 | 33.84 | 33.32 | 33.55 |
| HellaSwag | 41.74 | 39.63 | 35.36 | 38.83 | 38.53 | 36.46 | 38.80 | 36.43 |
| PiQA | 68.07 | 67.31 | 64.44 | 66.38 | 66.50 | 64.74 | 66.54 | 64.87 |
| OpenBookQA | 29.20 | 29.50 | 28.10 | 27.97 | 27.83 | 27.37 | 28.83 | 27.87 |
| Lambada | 31.79 | 31.11 | 27.32 | 30.17 | 28.75 | 26.22 | 30.38 | 26.25 |
| SciQ | 80.42 | 79.83 | 80.85 | 79.60 | 78.93 | 80.05 | 79.50 | 78.65 |
| COPA | 66.17 | 69.00 | 64.00 | 64.83 | 67.00 | 64.00 | 66.00 | 66.83 |
| RACE | 31.39 | 29.82 | 29.67 | 30.08 | 29.98 | 29.46 | 30.37 | 29.19 |
| ARC Easy | 51.14 | 49.24 | 47.13 | 47.88 | 48.20 | 47.09 | 49.09 | 46.90 |
| LogiQA | 25.19 | 25.93 | 23.68 | 25.17 | 25.70 | 25.52 | 26.50 | 26.65 |
| QQP | 55.37 | 54.46 | 52.73 | 53.17 | 59.65 | 58.15 | 57.50 | 55.31 |
| WinoGrande | 53.21 | 51.46 | 50.83 | 52.16 | 52.37 | 51.41 | 51.63 | 51.85 |
| MultiRC | 53.58 | 52.31 | 52.22 | 53.03 | 50.41 | 52.17 | 52.27 | 51.50 |
| **Average** | **47.73** | **47.15** | **45.38** | **46.37** | **46.71** | **45.88** | **46.98** | **45.84** |
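To relate the scores above to the data mixture each model was trained on, the per-branch `train_config.yaml` can be fetched without downloading the model weights. Below is a minimal sketch assuming the `huggingface_hub` and `PyYAML` packages; the exact schema of `train_config.yaml` is not documented here, so the sketch simply loads and prints each file.

```python
# Minimal sketch: pair each model branch with its training data mixture.
# Assumes `huggingface_hub` and `PyYAML` are installed; the schema of
# train_config.yaml is not fixed here, so we only load and print it.
import yaml
from huggingface_hub import hf_hub_download

REPO_ID = "sail/data-mixture-random-1b"

for index in range(1, 65):  # branches model-index-1 ... model-index-64
    config_path = hf_hub_download(
        repo_id=REPO_ID,
        filename="train_config.yaml",
        revision=f"model-index-{index}",
    )
    with open(config_path) as f:
        train_config = yaml.safe_load(f)
    print(f"model-index-{index}: {train_config}")
```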