metadata

library_name: transformers
tags:
  - generated_from_trainer
datasets:
  - kanishka/babylm2-clean-spacy
metrics:
  - accuracy
model-index:
  - name: opt-babylm2-clean-spacy-32k-earlystop_seed-42_1e-3
    results:
      - task:
          name: Causal Language Modeling
          type: text-generation
        dataset:
          name: kanishka/babylm2-clean-spacy
          type: kanishka/babylm2-clean-spacy
        metrics:
          - name: Accuracy
            type: accuracy
            value: 0.43054403912594785

opt-babylm2-clean-spacy-32k-earlystop_seed-42_1e-3

This model was trained from scratch on the kanishka/babylm2-clean-spacy dataset. It achieves the following results on the evaluation set, which match the final logged step (38840) in the training results:

Loss: 2.9103
Accuracy: 0.4305
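
The loss above can be turned into a perplexity figure. This is a minimal sketch that assumes the reported loss is the mean per-token cross-entropy in nats, which is the usual convention for causal language modeling but is not stated on this card:

```python
import math

# Evaluation loss reported on this card.
# Assumption: mean per-token cross-entropy, measured in nats.
eval_loss = 2.9103

# Perplexity is the exponential of the mean per-token cross-entropy.
perplexity = math.exp(eval_loss)

print(f"perplexity ~ {perplexity:.2f}")
```

Under that assumption the evaluation perplexity is roughly 18.4.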

Model description

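A causal language model (task type: text-generation) trained from scratch with the Transformers Trainer. More information needed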

Intended uses & limitations

More information needed

Training and evaluation data

The model was trained and evaluated on the kanishka/babylm2-clean-spacy dataset. More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.001
train_batch_size: 32
eval_batch_size: 64
seed: 42
gradient_accumulation_steps: 8
total_train_batch_size: 256
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 32000
num_epochs: 20.0
mixed_precision_training: Native AMP
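
The three batch-size values in the list are related by simple arithmetic. The sketch below assumes the usual Trainer convention that the total batch size is the per-device batch size times the gradient accumulation steps times the number of devices. The device count itself is not listed on this card, so `implied_devices` is an inferred value, not a reported one:

```python
# Values copied from the hyperparameter list above.
train_batch_size = 32              # assumed to be the per-device batch size
gradient_accumulation_steps = 8
total_train_batch_size = 256

# Assumption: total = per-device batch * accumulation steps * number of devices.
# The device count is then implied by the three listed values.
sequences_per_device_step = train_batch_size * gradient_accumulation_steps
implied_devices = total_train_batch_size // sequences_per_device_step

print(f"sequences per optimizer step on one device: {sequences_per_device_step}")
print(f"implied number of devices: {implied_devices}")
```

With the listed values, 32 * 8 already equals 256, which is consistent with training on a single device.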

Training results

Training Loss	Epoch	Step	Validation Loss	Accuracy
5.9107	0.9995	1942	3.9887	0.3269
3.7896	1.9996	3885	3.5236	0.3657
3.3813	2.9997	5828	3.3040	0.3859
3.174	3.9997	7771	3.1921	0.3962
3.0533	4.9998	9714	3.1266	0.4026
2.9768	5.9999	11657	3.0838	0.4071
2.9232	6.9999	13600	3.0550	0.4101
2.8863	8.0	15543	3.0363	0.4122
2.8563	8.9995	17485	3.0208	0.4139
2.8356	9.9996	19428	3.0117	0.4151
2.816	10.9997	21371	3.0030	0.4162
2.8069	11.9997	23314	2.9951	0.4170
2.7941	12.9998	25257	2.9923	0.4175
2.7889	13.9999	27200	2.9888	0.4182
2.7802	14.9999	29143	2.9839	0.4186
2.7802	16.0	31086	2.9821	0.4190
2.7665	16.9995	33028	2.9626	0.4212
2.6908	17.9996	34971	2.9378	0.4247
2.6058	18.9997	36914	2.9145	0.4284
2.505	19.9910	38840	2.9103	0.4305
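
To relate the table to the learning-rate settings, the following sketch evaluates a linear warmup followed by a linear decay at a few of the logged steps. It is an illustration under stated assumptions, not the training code: `linear_schedule_lr` is a hypothetical helper, and `total_steps` is assumed to equal the last logged step (38840), which may differ from the planned total if training stopped early.

```python
def linear_schedule_lr(step, peak_lr=1e-3, warmup_steps=32_000, total_steps=38_840):
    """Linear warmup to peak_lr, then linear decay to zero at total_steps.

    peak_lr and warmup_steps come from the hyperparameter list;
    total_steps is an assumption (the last step logged in the table).
    """
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    remaining = (total_steps - step) / (total_steps - warmup_steps)
    return peak_lr * max(0.0, remaining)


# Optimizer steps per epoch, estimated from the last table row
# (step 38840 at epoch 19.9910).
steps_per_epoch = 38_840 / 19.9910
warmup_end_epoch = 32_000 / steps_per_epoch

print(f"estimated steps per epoch: {steps_per_epoch:.1f}")
print(f"warmup would end near epoch {warmup_end_epoch:.2f}")

# Learning rate at a few of the logged steps.
for step in (1942, 15543, 31086, 33028, 36914, 38840):
    print(step, f"{linear_schedule_lr(step):.2e}")
```

Under these assumptions the warmup would end between the epoch 16 and epoch 17 rows, so the learning rate would still be rising for most of the logged steps and would decay only over the last few rows.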

Framework versions

Transformers 4.45.1
PyTorch 2.4.1+cu121
Datasets 3.0.1
Tokenizers 0.20.0