kanishka/opt-babylm2-rewritten-clean-spacy-32k-earlystop-40epochs_seed-42_1e-3

	---
	library_name: transformers
	tags:
	- generated_from_trainer
	datasets:
	- kanishka/babylm2-rewritten-clean-spacy
	metrics:
	- accuracy
	model-index:
	- name: opt-babylm2-rewritten-clean-spacy-32k-earlystop-40epochs_seed-42_1e-3
	  results:
	  - task:
	      name: Causal Language Modeling
	      type: text-generation
	    dataset:
	      name: kanishka/babylm2-rewritten-clean-spacy
	      type: kanishka/babylm2-rewritten-clean-spacy
	    metrics:
	    - name: Accuracy
	      type: accuracy
	      value: 0.42334742212654364
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	# opt-babylm2-rewritten-clean-spacy-32k-earlystop-40epochs_seed-42_1e-3

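	This model was trained from scratch on the kanishka/babylm2-rewritten-clean-spacy dataset.
	It achieves the following results on the evaluation set. These figures match the checkpoint at step 48293 (epoch ≈ 25), which has the lowest validation loss in the training results table:
	- Loss: 2.9600
	- Accuracy: 0.4233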
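
For readers who prefer perplexity to raw loss, the evaluation loss can be converted with a one-line calculation. This is a minimal sketch, assuming the reported loss is the mean per-token cross-entropy in nats and that the accuracy is next-token prediction accuracy; the card itself does not state either.

```python
import math

# Values copied from the evaluation results above.
eval_loss = 2.9600      # assumed: mean per-token cross-entropy, in nats
eval_accuracy = 0.4233  # assumed: next-token prediction accuracy

# Perplexity is the exponential of the mean cross-entropy.
perplexity = math.exp(eval_loss)

print(f"perplexity ~ {perplexity:.2f}")
print(f"accuracy   = {eval_accuracy:.1%}")
```

Under those assumptions the evaluation perplexity comes out at roughly 19.3.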

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 0.001
	- train_batch_size: 32
	- eval_batch_size: 64
	- seed: 42
	- gradient_accumulation_steps: 8
	- total_train_batch_size: 256
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: linear
	- lr_scheduler_warmup_steps: 32000
	- num_epochs: 40.0
	- mixed_precision_training: Native AMP
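
The batch-size and scheduler entries above interact, so a small sketch may help. It recomputes the effective batch size and approximates a linear schedule with warmup in plain Python. The device count (1) and the total number of planned steps (about 77270, extrapolated from step 7727 being reported at epoch 4.0) are assumptions, not values stated in the card, and `linear_lr` is a hypothetical helper rather than the Trainer's own code.

```python
# Hyperparameters copied from the list above.
PEAK_LR = 1e-3
PER_DEVICE_BATCH = 32
GRAD_ACCUM_STEPS = 8
WARMUP_STEPS = 32_000
NUM_EPOCHS = 40

# Assumptions (not stated in the card): a single device, and roughly
# 1931.75 optimizer steps per epoch (step 7727 is logged at epoch 4.0).
NUM_DEVICES = 1
STEPS_PER_EPOCH = 7727 / 4
TOTAL_STEPS = round(STEPS_PER_EPOCH * NUM_EPOCHS)


def effective_batch_size() -> int:
    """Sequences consumed per optimizer step."""
    return PER_DEVICE_BATCH * GRAD_ACCUM_STEPS * NUM_DEVICES


def linear_lr(step: int) -> float:
    """Linear warmup to PEAK_LR, then linear decay to zero at TOTAL_STEPS."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    remaining = max(0, TOTAL_STEPS - step)
    return PEAK_LR * remaining / (TOTAL_STEPS - WARMUP_STEPS)


print("effective batch size:", effective_batch_size())
for step in (0, 16_000, 32_000, 54_089, TOTAL_STEPS):
    print(f"step {step:>6}: lr = {linear_lr(step):.6f}")
```

With these assumptions, warmup covers roughly the first 16 to 17 epochs, so the learning rate only peaks well into training.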

	### Training results

	| Training Loss | Epoch | Step | Validation Loss | Accuracy |
	|:-------------:|:-------:|:-----:|:---------------:|:--------:|
	| 5.9216 | 0.9996 | 1931 | 4.0134 | 0.3253 |
	| 3.7977 | 1.9997 | 3863 | 3.5448 | 0.3639 |
	| 3.3887 | 2.9999 | 5795 | 3.3242 | 0.3841 |
	| 3.1805 | 4.0 | 7727 | 3.2082 | 0.3949 |
	| 3.0632 | 4.9996 | 9658 | 3.1432 | 0.4012 |
	| 2.9865 | 5.9997 | 11590 | 3.1010 | 0.4056 |
	| 2.9347 | 6.9999 | 13522 | 3.0715 | 0.4087 |
	| 2.8953 | 8.0 | 15454 | 3.0539 | 0.4108 |
	| 2.8689 | 8.9996 | 17385 | 3.0392 | 0.4122 |
	| 2.8456 | 9.9997 | 19317 | 3.0310 | 0.4134 |
	| 2.8298 | 10.9999 | 21249 | 3.0251 | 0.4144 |
	| 2.817 | 12.0 | 23181 | 3.0175 | 0.4152 |
	| 2.8069 | 12.9996 | 25112 | 3.0119 | 0.4158 |
	| 2.7996 | 13.9997 | 27044 | 3.0060 | 0.4163 |
	| 2.7615 | 14.9999 | 28976 | 3.0038 | 0.4171 |
	| 2.7575 | 16.0 | 30908 | 3.0022 | 0.4169 |
	| 2.7573 | 16.9996 | 32839 | 2.9962 | 0.4179 |
	| 2.7451 | 17.9997 | 34771 | 2.9867 | 0.4189 |
	| 2.7275 | 18.9999 | 36703 | 2.9804 | 0.4201 |
	| 2.7099 | 20.0 | 38635 | 2.9760 | 0.4208 |
	| 2.693 | 20.9996 | 40566 | 2.9683 | 0.4216 |
	| 2.6785 | 21.9997 | 42498 | 2.9666 | 0.4221 |
	| 2.6628 | 22.9999 | 44430 | 2.9646 | 0.4227 |
	| 2.6501 | 24.0 | 46362 | 2.9626 | 0.4228 |
	| 2.6343 | 24.9996 | 48293 | 2.9600 | 0.4233 |
	| 2.6198 | 25.9997 | 50225 | 2.9638 | 0.4236 |
	| 2.604 | 26.9999 | 52157 | 2.9604 | 0.4240 |
	| 2.5876 | 28.0 | 54089 | 2.9601 | 0.4245 |
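
The log stops at epoch 28 although `num_epochs` is 40, and the headline metrics match an earlier row rather than the last one. The sketch below picks the row with the lowest validation loss from the last six rows of the table and counts the evaluations that followed it. Reading the result as early stopping on validation loss with a patience of 3 is an inference from the model name ("earlystop") and the table, not something the card states.

```python
# (epoch, step, validation_loss, accuracy) for the last six rows of the table.
rows = [
    (22.9999, 44430, 2.9646, 0.4227),
    (24.0,    46362, 2.9626, 0.4228),
    (24.9996, 48293, 2.9600, 0.4233),
    (25.9997, 50225, 2.9638, 0.4236),
    (26.9999, 52157, 2.9604, 0.4240),
    (28.0,    54089, 2.9601, 0.4245),
]

# Best checkpoint by validation loss (index 2 of each tuple).
best = min(rows, key=lambda row: row[2])

# Number of evaluations logged after the best one.
evals_since_best = len(rows) - 1 - rows.index(best)

print(f"best step: {best[1]} (val loss {best[2]:.4f}, accuracy {best[3]:.4f})")
print(f"evaluations without improvement afterwards: {evals_since_best}")
```

Accuracy keeps rising slightly in the last three rows even though validation loss does not improve, so the selected checkpoint is the best by loss, not by accuracy.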


	### Framework versions

	- Transformers 4.45.1
	- PyTorch 2.4.1+cu121
	- Datasets 3.0.1
	- Tokenizers 0.20.0
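
To compare a local environment against the versions listed above, something like the following sketch could be used. It relies only on the standard library's `importlib.metadata`; `check_versions` is a hypothetical helper, and the distribution names (notably `torch` for PyTorch) are assumptions about how the packages were installed.

```python
from importlib import metadata

# Versions copied from the list above, keyed by assumed pip distribution name.
EXPECTED = {
    "transformers": "4.45.1",
    "torch": "2.4.1+cu121",
    "datasets": "3.0.1",
    "tokenizers": "0.20.0",
}


def check_versions(expected=EXPECTED):
    """Return {package: (installed_version_or_None, matches_expected)}."""
    report = {}
    for package, wanted in expected.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[package] = (installed, installed == wanted)
    return report


for package, (installed, ok) in check_versions().items():
    status = "ok" if ok else "differs"
    print(f"{package}: installed={installed}, expected={EXPECTED[package]} ({status})")
```

A mismatch does not necessarily mean the model will fail to load; it only flags a difference from the training environment.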