kanishka/opt-babylm2-20-epochs_seed-42_3e-4

	---
	library_name: transformers
	tags:
	- generated_from_trainer
	datasets:
	- kanishka/babylm2-sentence-tokenized
	metrics:
	- accuracy
	model-index:
	- name: opt-babylm2-20-epochs_seed-42_3e-4
	  results:
	  - task:
	      name: Causal Language Modeling
	      type: text-generation
	    dataset:
	      name: kanishka/babylm2-sentence-tokenized
	      type: kanishka/babylm2-sentence-tokenized
	    metrics:
	    - name: Accuracy
	      type: accuracy
	      value: 0.5192642005255711
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	# opt-babylm2-20-epochs_seed-42_3e-4

	This model was trained from scratch on the kanishka/babylm2-sentence-tokenized dataset.
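	After the final epoch (epoch 20, step 427940), it achieves the following results on the evaluation set:
	- Loss: 2.4950
	- Accuracy: 0.5193

	Validation loss was lowest (2.4788) at epochs 11 and 13 and rose slightly afterwards, while accuracy plateaued at 0.5193 from epoch 17 onward.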

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 0.0003
	- train_batch_size: 32
	- eval_batch_size: 64
	- seed: 42
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: linear
	- lr_scheduler_warmup_steps: 32000
	- num_epochs: 20.0
	- mixed_precision_training: Native AMP
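
To make the schedule concrete, here is a minimal sketch of how a linear scheduler with 32000 warmup steps shapes the learning rate over the run. It assumes the total number of optimizer steps equals the final step in the training results (427940) and that the rate decays linearly to zero at that step, which is how the linear schedule in Transformers usually behaves. The function name `linear_schedule_lr` is hypothetical and not part of any library.

```python
# Sketch of linear warmup followed by linear decay (assumed behaviour,
# not code taken from the training run itself).

BASE_LR = 3e-4           # learning_rate from the hyperparameter list
WARMUP_STEPS = 32_000    # lr_scheduler_warmup_steps
TOTAL_STEPS = 427_940    # assumption: final step reported in the results table
STEPS_PER_EPOCH = 21_397  # step count at epoch 1.0 in the results table


def linear_schedule_lr(step, base_lr=BASE_LR,
                       warmup_steps=WARMUP_STEPS, total_steps=TOTAL_STEPS):
    """Return the learning rate at a given optimizer step."""
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 up to base_lr.
        return base_lr * step / warmup_steps
    # Decay: ramp linearly from base_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / (total_steps - warmup_steps)


# Warmup length expressed in epochs rather than steps.
warmup_epochs = WARMUP_STEPS / STEPS_PER_EPOCH

if __name__ == "__main__":
    for step in (0, 16_000, 32_000, 213_970, 427_940):
        print(f"step {step:>7}: lr = {linear_schedule_lr(step):.6f}")
    print(f"warmup covers about {warmup_epochs:.2f} epochs")
```

Under these assumptions the learning rate peaks at 0.0003 roughly one and a half epochs into training and then falls steadily for the remaining epochs.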

	### Training results

	| Training Loss | Epoch | Step   | Validation Loss | Accuracy |
	|:-------------:|:-----:|:------:|:---------------:|:--------:|
	| 2.8017        | 1.0   | 21397  | 2.9043          | 0.4674   |
	| 2.5697        | 2.0   | 42794  | 2.6914          | 0.4903   |
	| 2.4593        | 3.0   | 64191  | 2.5998          | 0.5009   |
	| 2.3962        | 4.0   | 85588  | 2.5532          | 0.5062   |
	| 2.3371        | 5.0   | 106985 | 2.5247          | 0.5100   |
	| 2.3029        | 6.0   | 128382 | 2.5101          | 0.5121   |
	| 2.2663        | 7.0   | 149779 | 2.4970          | 0.5143   |
	| 2.2435        | 8.0   | 171176 | 2.4892          | 0.5155   |
	| 2.2171        | 9.0   | 192573 | 2.4831          | 0.5163   |
	| 2.1902        | 10.0  | 213970 | 2.4811          | 0.5171   |
	| 2.1695        | 11.0  | 235367 | 2.4788          | 0.5177   |
	| 2.1548        | 12.0  | 256764 | 2.4811          | 0.5182   |
	| 2.1307        | 13.0  | 278161 | 2.4788          | 0.5186   |
	| 2.1228        | 14.0  | 299558 | 2.4802          | 0.5188   |
	| 2.0984        | 15.0  | 320955 | 2.4807          | 0.5190   |
	| 2.0845        | 16.0  | 342352 | 2.4828          | 0.5192   |
	| 2.0687        | 17.0  | 363749 | 2.4844          | 0.5193   |
	| 2.0578        | 18.0  | 385146 | 2.4892          | 0.5193   |
	| 2.0413        | 19.0  | 406543 | 2.4918          | 0.5193   |
	| 2.0185        | 20.0  | 427940 | 2.4950          | 0.5193   |
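
The validation losses above can be turned into perplexities and scanned for the best epoch with a short script. This is a rough sketch: it assumes the reported loss is the mean per-token cross-entropy in nats (the usual convention for causal language modeling with the Trainer, but not stated in this card), so that perplexity is `exp(loss)`. The names `VAL_LOSS` and `perplexity` are illustrative only.

```python
import math

# (epoch, validation loss) pairs copied from the training results table.
VAL_LOSS = [
    (1, 2.9043), (2, 2.6914), (3, 2.5998), (4, 2.5532), (5, 2.5247),
    (6, 2.5101), (7, 2.4970), (8, 2.4892), (9, 2.4831), (10, 2.4811),
    (11, 2.4788), (12, 2.4811), (13, 2.4788), (14, 2.4802), (15, 2.4807),
    (16, 2.4828), (17, 2.4844), (18, 2.4892), (19, 2.4918), (20, 2.4950),
]


def perplexity(loss):
    """Convert a mean cross-entropy loss (assumed to be in nats) to perplexity."""
    return math.exp(loss)


# min() returns the first minimum, so ties resolve to the earliest epoch.
best_epoch, best_loss = min(VAL_LOSS, key=lambda row: row[1])
final_epoch, final_loss = VAL_LOSS[-1]

if __name__ == "__main__":
    print(f"best epoch:  {best_epoch} (loss {best_loss:.4f}, "
          f"perplexity {perplexity(best_loss):.2f})")
    print(f"final epoch: {final_epoch} (loss {final_loss:.4f}, "
          f"perplexity {perplexity(final_loss):.2f})")
```

Training loss keeps falling through epoch 20 while validation loss bottoms out around epochs 11 to 13, so an earlier checkpoint may score slightly better on held-out loss if one is available. The accuracy column, by contrast, does not decrease.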


	### Framework versions

	- Transformers 4.45.1
	- PyTorch 2.4.1+cu121
	- Datasets 3.0.1
	- Tokenizers 0.20.0