---
license: apache-2.0
base_model: TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T
tags:
  - xcomet_xl_xxl
  - generated_from_trainer
model-index:
  - name: cpo-xcomet-xl_xxl-inc7b-10p-shuff-5e-7-full-tiny
    results: []
---

# cpo-xcomet-xl_xxl-inc7b-10p-shuff-5e-7-full-tiny

This model is a fine-tuned version of [TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T](https://huggingface.co/TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T) on the [Unbabel/TowerAligned-v0.1](https://huggingface.co/datasets/Unbabel/TowerAligned-v0.1) dataset. It achieves the following results on the evaluation set:

- Loss: 2.5861
- Nll Loss: 0.9631
- Logps/best: -93.9342
- Rewards/chosen: -9.3934
- Rewards/rejected: -8.9617
- Rewards/accuracies: 0.4760
- Rewards/margins: -0.4317
- Logps/rejected: -89.6171
- Logps/chosen: -93.9342
- Logits/rejected: -1.8016
- Logits/chosen: -1.9358
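
Note that each `Rewards/*` value above is exactly 0.1 × the corresponding `Logps/*` value, which is consistent with a preference-optimization reward of β · log p with β = 0.1.

For a quick smoke test, a minimal inference sketch is below. The repo id is an assumption inferred from this card's title and author, and the translation-style prompt is a guess motivated by the TowerAligned training data; neither is specified by the card.

```python
# Minimal inference sketch; repo id and prompt format are assumptions, not from the card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "martimfasantos/cpo-xcomet-xl_xxl-inc7b-10p-shuff-5e-7-full-tiny"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

prompt = (
    "Translate the following text from English into German.\n"
    "English: The weather is nice today.\n"
    "German:"
)
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```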

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training; a hedged sketch wiring them into a trainer follows the list:

- learning_rate: 5e-07
- train_batch_size: 1
- eval_batch_size: 4
- seed: 42
- gradient_accumulation_steps: 16
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.95) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3
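
The card does not include the training script. The `cpo-` name prefix and the NLL/rewards/logps metrics match TRL's `CPOTrainer`, so the sketch below shows how the hyperparameters above could be wired up under that assumption; the dataset split, `beta`, and the preference-column layout are guesses, not taken from the card.

```python
# Hypothetical reconstruction using TRL's CPOTrainer; not the author's actual script.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import CPOConfig, CPOTrainer

base_model = "TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T"
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model)

# CPOTrainer expects prompt/chosen/rejected columns; the split name is an assumption.
train_dataset = load_dataset("Unbabel/TowerAligned-v0.1", split="train")

config = CPOConfig(
    output_dir="cpo-xcomet-xl_xxl-inc7b-10p-shuff-5e-7-full-tiny",
    learning_rate=5e-7,
    per_device_train_batch_size=1,   # train_batch_size: 1
    per_device_eval_batch_size=4,    # eval_batch_size: 4
    gradient_accumulation_steps=16,  # total train batch size = 1 * 16 = 16
    num_train_epochs=3,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    adam_beta1=0.9,
    adam_beta2=0.95,
    adam_epsilon=1e-8,
    seed=42,
    beta=0.1,  # assumption: implied by Rewards/* = 0.1 * Logps/* in the eval metrics
)

trainer = CPOTrainer(
    model=model,
    args=config,
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```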

### Training results

| Training Loss | Epoch | Step | Validation Loss | Nll Loss | Logps/best | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------:|:----------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 2.9692 | 0.1063 | 100 | 2.8057 | 1.0649 | -103.2722 | -10.3272 | -9.7340 | 0.4600 | -0.5932 | -97.3398 | -103.2722 | -1.8268 | -1.9640 |
| 3.0031 | 0.2127 | 200 | 2.7925 | 1.0588 | -102.7081 | -10.2708 | -9.6860 | 0.4580 | -0.5848 | -96.8598 | -102.7081 | -1.8260 | -1.9630 |
| 2.7823 | 0.3190 | 300 | 2.7675 | 1.0456 | -101.5052 | -10.1505 | -9.5824 | 0.4600 | -0.5681 | -95.8241 | -101.5052 | -1.8242 | -1.9611 |
| 2.8692 | 0.4254 | 400 | 2.7382 | 1.0320 | -100.2503 | -10.0250 | -9.4771 | 0.4580 | -0.5479 | -94.7710 | -100.2503 | -1.8210 | -1.9575 |
| 3.1882 | 0.5317 | 500 | 2.7126 | 1.0203 | -99.1754 | -9.9175 | -9.3884 | 0.4580 | -0.5291 | -93.8843 | -99.1754 | -1.8185 | -1.9548 |
| 2.8104 | 0.6381 | 600 | 2.6920 | 1.0101 | -98.2426 | -9.8243 | -9.3098 | 0.4640 | -0.5144 | -93.0982 | -98.2426 | -1.8161 | -1.9520 |
| 3.127 | 0.7444 | 700 | 2.6735 | 1.0018 | -97.4806 | -9.7481 | -9.2466 | 0.4680 | -0.5015 | -92.4660 | -97.4806 | -1.8148 | -1.9505 |
| 2.5488 | 0.8508 | 800 | 2.6599 | 0.9951 | -96.8657 | -9.6866 | -9.1952 | 0.4640 | -0.4914 | -91.9516 | -96.8657 | -1.8113 | -1.9467 |
| 2.9106 | 0.9571 | 900 | 2.6461 | 0.9892 | -96.3278 | -9.6328 | -9.1532 | 0.4700 | -0.4795 | -91.5325 | -96.3278 | -1.8107 | -1.9459 |
| 2.7349 | 1.0635 | 1000 | 2.6355 | 0.9845 | -95.8978 | -9.5898 | -9.1181 | 0.4660 | -0.4717 | -91.1811 | -95.8978 | -1.8091 | -1.9442 |
| 2.607 | 1.1698 | 1100 | 2.6258 | 0.9802 | -95.4924 | -9.5492 | -9.0851 | 0.4660 | -0.4642 | -90.8505 | -95.4924 | -1.8079 | -1.9429 |
| 2.5949 | 1.2762 | 1200 | 2.6187 | 0.9772 | -95.2189 | -9.5219 | -9.0638 | 0.4660 | -0.4581 | -90.6378 | -95.2189 | -1.8067 | -1.9415 |
| 3.0028 | 1.3825 | 1300 | 2.6133 | 0.9745 | -94.9713 | -9.4971 | -9.0415 | 0.4660 | -0.4556 | -90.4151 | -94.9713 | -1.8062 | -1.9409 |
| 2.5891 | 1.4889 | 1400 | 2.6075 | 0.9720 | -94.7468 | -9.4747 | -9.0242 | 0.4700 | -0.4505 | -90.2418 | -94.7468 | -1.8062 | -1.9409 |
| 2.5647 | 1.5952 | 1500 | 2.6035 | 0.9701 | -94.5637 | -9.4564 | -9.0103 | 0.4640 | -0.4460 | -90.1033 | -94.5637 | -1.8044 | -1.9389 |
| 2.566 | 1.7016 | 1600 | 2.5974 | 0.9682 | -94.3869 | -9.4387 | -8.9984 | 0.4660 | -0.4403 | -89.9837 | -94.3869 | -1.8037 | -1.9382 |
| 2.4615 | 1.8079 | 1700 | 2.5960 | 0.9672 | -94.3052 | -9.4305 | -8.9899 | 0.4760 | -0.4406 | -89.8987 | -94.3052 | -1.8029 | -1.9373 |
| 2.5336 | 1.9143 | 1800 | 2.5936 | 0.9662 | -94.2071 | -9.4207 | -8.9834 | 0.4700 | -0.4373 | -89.8344 | -94.2071 | -1.8026 | -1.9369 |
| 2.7186 | 2.0206 | 1900 | 2.5908 | 0.9653 | -94.1252 | -9.4125 | -8.9777 | 0.4820 | -0.4349 | -89.7766 | -94.1252 | -1.8021 | -1.9364 |
| 2.6496 | 2.1270 | 2000 | 2.5912 | 0.9646 | -94.0712 | -9.4071 | -8.9704 | 0.4700 | -0.4367 | -89.7039 | -94.0712 | -1.8020 | -1.9363 |
| 2.4786 | 2.2333 | 2100 | 2.5882 | 0.9642 | -94.0305 | -9.4031 | -8.9690 | 0.4820 | -0.4340 | -89.6904 | -94.0305 | -1.8022 | -1.9365 |
| 2.5261 | 2.3396 | 2200 | 2.5871 | 0.9636 | -93.9762 | -9.3976 | -8.9653 | 0.4720 | -0.4323 | -89.6528 | -93.9762 | -1.8015 | -1.9357 |
| 2.4197 | 2.4460 | 2300 | 2.5855 | 0.9634 | -93.9605 | -9.3961 | -8.9653 | 0.4720 | -0.4308 | -89.6529 | -93.9605 | -1.8015 | -1.9357 |
| 2.9723 | 2.5523 | 2400 | 2.5863 | 0.9633 | -93.9504 | -9.3950 | -8.9631 | 0.4760 | -0.4319 | -89.6311 | -93.9504 | -1.8016 | -1.9358 |
| 2.4721 | 2.6587 | 2500 | 2.5864 | 0.9634 | -93.9537 | -9.3954 | -8.9651 | 0.4740 | -0.4302 | -89.6514 | -93.9537 | -1.8014 | -1.9356 |
| 2.8984 | 2.7650 | 2600 | 2.5856 | 0.9630 | -93.9154 | -9.3915 | -8.9610 | 0.4700 | -0.4305 | -89.6100 | -93.9154 | -1.8014 | -1.9356 |
| 3.0422 | 2.8714 | 2700 | 2.5848 | 0.9630 | -93.9148 | -9.3915 | -8.9617 | 0.4800 | -0.4298 | -89.6169 | -93.9148 | -1.8015 | -1.9357 |
| 2.6226 | 2.9777 | 2800 | 2.5861 | 0.9631 | -93.9342 | -9.3934 | -8.9617 | 0.4760 | -0.4317 | -89.6171 | -93.9342 | -1.8016 | -1.9358 |

### Framework versions

- Transformers 4.41.2
- Pytorch 2.1.2
- Datasets 2.20.0
- Tokenizers 0.19.1
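
To recreate this environment, the pins below mirror the versions listed above (standard PyPI package names; TRL, if it was used for training, is not pinned by the card):

```text
transformers==4.41.2
torch==2.1.2
datasets==2.20.0
tokenizers==0.19.1
```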