cpo-xcomet-xl_xxl-inc7b-10p-shuff-1e-7-full-tiny

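This model is a fine-tuned version of TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T on the Unbabel/TowerAligned-v0.1 dataset. It achieves the following results on the evaluation set. These are the values logged at the final evaluation (epoch 2.9777, step 2800), not the lowest validation loss in the table below: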

Loss: 2.7960
Nll Loss: 1.0602
Logps/best: -102.8461
Rewards/chosen: -10.2846
Rewards/rejected: -9.6988
Rewards/accuracies: 0.4600
Rewards/margins: -0.5858
Logps/rejected: -96.9882
Logps/chosen: -102.8461
Logits/rejected: -1.8264
Logits/chosen: -1.9635
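The model name suggests that "cpo" refers to Contrastive Preference Optimization (CPO), and this section assumes so. The card itself does not state the loss, the value of β, or how each logged value is normalized. In the sketch below, x is the source, y_w is the preferred (chosen) translation and y_l is the rejected one:

```latex
\begin{aligned}
\mathcal{L}_{\text{CPO}} &= \mathcal{L}_{\text{prefer}} + \mathcal{L}_{\text{NLL}},\\
\mathcal{L}_{\text{prefer}} &= -\mathbb{E}_{(x,y_w,y_l)}\big[\log\sigma(r_w - r_l)\big],
\qquad r_w = \beta\log\pi_\theta(y_w\mid x),\quad r_l = \beta\log\pi_\theta(y_l\mid x),\\
\mathcal{L}_{\text{NLL}} &= -\mathbb{E}_{(x,y_w)}\big[\log\pi_\theta(y_w\mid x)\big].
\end{aligned}
```

Under this reading, the logged metrics map onto the objective as follows:

- Rewards/chosen and Rewards/rejected correspond to r_w and r_l.
- Rewards/margins is r_w - r_l.
- Rewards/accuracies is the fraction of evaluation pairs with r_w > r_l.

The final numbers are consistent with β = 0.1. Multiplying the log-probabilities by 0.1 gives -102.8461 × 0.1 ≈ -10.2846 and -96.9882 × 0.1 ≈ -9.6988. The resulting margin is -10.2846 - (-9.6988) = -0.5858. An accuracy of 0.46 would then mean the model still assigns the rejected translation the higher implicit reward in slightly more than half of the pairs.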

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 1e-07
train_batch_size: 1
eval_batch_size: 4
seed: 42
gradient_accumulation_steps: 16
total_train_batch_size: 16
optimizer: Adam with betas=(0.9,0.95) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_ratio: 0.1
num_epochs: 3
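These hyperparameters imply an effective batch of 1 × 16 = 16 sequences per optimizer step, which is consistent with training on a single device. The learning rate rises linearly over the first 10% of steps and then decays linearly to zero. Below is a minimal pure-Python sketch of that schedule, modeled on the `linear` warmup/decay scheduler in Hugging Face Transformers. The total of 2,821 optimizer steps is an assumption estimated from the Epoch and Step columns of the results table. The card does not report it.

```python
import math


def effective_batch_size(per_device_batch, grad_accum_steps, num_devices):
    # Number of sequences that contribute to one optimizer step.
    return per_device_batch * grad_accum_steps * num_devices


def linear_warmup_lr(step, base_lr, total_steps, warmup_ratio):
    """Learning rate at an optimizer step under linear warmup + linear decay."""
    # Warmup length is rounded up from the ratio.
    warmup_steps = math.ceil(warmup_ratio * total_steps)
    if step < warmup_steps:
        # Ramp from 0 to base_lr over the warmup steps.
        return base_lr * step / max(1, warmup_steps)
    # Decay linearly from base_lr to 0 at total_steps.
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


BASE_LR = 1e-7       # learning_rate from the card
TOTAL_STEPS = 2821   # assumption: about 940 steps per epoch x 3 epochs
WARMUP_RATIO = 0.1   # lr_scheduler_warmup_ratio from the card

if __name__ == "__main__":
    print("effective batch:", effective_batch_size(1, 16, 1))
    for step in (0, 141, 283, 1000, 2000, TOTAL_STEPS):
        print(step, linear_warmup_lr(step, BASE_LR, TOTAL_STEPS, WARMUP_RATIO))
```

With these assumptions, warmup lasts ceil(0.1 × 2821) = 283 steps. The peak rate of 1e-07 is reached at step 283.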

Training results

Training Loss	Epoch	Step	Validation Loss	Nll Loss	Logps/best	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen
2.9687	0.1063	100	2.8066	1.0659	-103.3585	-10.3358	-9.7431	0.4540	-0.5928	-97.4308	-103.3585	-1.8274	-1.9646
3.0173	0.2127	200	2.8063	1.0656	-103.3396	-10.3340	-9.7402	0.4560	-0.5937	-97.4022	-103.3396	-1.8275	-1.9648
2.8267	0.3190	300	2.8046	1.0650	-103.2849	-10.3285	-9.7373	0.4540	-0.5912	-97.3725	-103.2849	-1.8273	-1.9644
2.9404	0.4254	400	2.8046	1.0644	-103.2318	-10.3232	-9.7301	0.4600	-0.5931	-97.3013	-103.2318	-1.8271	-1.9643
3.3065	0.5317	500	2.8002	1.0637	-103.1556	-10.3156	-9.7280	0.4600	-0.5875	-97.2803	-103.1556	-1.8268	-1.9640
2.9333	0.6381	600	2.8021	1.0633	-103.1282	-10.3128	-9.7212	0.4560	-0.5916	-97.2122	-103.1282	-1.8271	-1.9642
3.2698	0.7444	700	2.8006	1.0627	-103.0742	-10.3074	-9.7178	0.4580	-0.5897	-97.1777	-103.0742	-1.8268	-1.9640
2.7002	0.8508	800	2.8003	1.0624	-103.0458	-10.3046	-9.7147	0.4580	-0.5899	-97.1470	-103.0458	-1.8269	-1.9641
3.0848	0.9571	900	2.7984	1.0620	-103.0023	-10.3002	-9.7132	0.4580	-0.5870	-97.1324	-103.0023	-1.8267	-1.9638
2.9243	1.0635	1000	2.7987	1.0617	-102.9805	-10.2980	-9.7086	0.4580	-0.5895	-97.0859	-102.9805	-1.8268	-1.9639
2.7945	1.1698	1100	2.7974	1.0615	-102.9564	-10.2956	-9.7084	0.4580	-0.5872	-97.0842	-102.9564	-1.8267	-1.9638
2.7893	1.2762	1200	2.7979	1.0613	-102.9413	-10.2941	-9.7061	0.4620	-0.5880	-97.0609	-102.9413	-1.8266	-1.9638
3.2162	1.3825	1300	2.7978	1.0611	-102.9208	-10.2921	-9.7039	0.4540	-0.5882	-97.0387	-102.9208	-1.8266	-1.9637
2.8123	1.4889	1400	2.7980	1.0611	-102.9247	-10.2925	-9.7032	0.4580	-0.5893	-97.0320	-102.9247	-1.8266	-1.9637
2.785	1.5952	1500	2.7973	1.0606	-102.8798	-10.2880	-9.6993	0.4560	-0.5887	-96.9928	-102.8798	-1.8265	-1.9636
2.7997	1.7016	1600	2.7952	1.0606	-102.8751	-10.2875	-9.7026	0.4600	-0.5849	-97.0257	-102.8751	-1.8267	-1.9638
2.6655	1.8079	1700	2.7956	1.0605	-102.8628	-10.2863	-9.7005	0.4620	-0.5858	-97.0050	-102.8628	-1.8264	-1.9635
2.7597	1.9143	1800	2.7966	1.0605	-102.8715	-10.2871	-9.6999	0.4540	-0.5872	-96.9991	-102.8715	-1.8267	-1.9637
2.9736	2.0206	1900	2.7955	1.0603	-102.8511	-10.2851	-9.6990	0.4600	-0.5861	-96.9900	-102.8511	-1.8266	-1.9637
2.8977	2.1270	2000	2.7954	1.0603	-102.8550	-10.2855	-9.6990	0.4560	-0.5865	-96.9901	-102.8550	-1.8270	-1.9641
2.7043	2.2333	2100	2.7961	1.0604	-102.8632	-10.2863	-9.6997	0.4560	-0.5867	-96.9967	-102.8632	-1.8264	-1.9635
2.7693	2.3396	2200	2.7951	1.0604	-102.8550	-10.2855	-9.6998	0.4600	-0.5857	-96.9983	-102.8550	-1.8263	-1.9634
2.6632	2.4460	2300	2.7943	1.0602	-102.8407	-10.2841	-9.6989	0.4600	-0.5851	-96.9893	-102.8407	-1.8264	-1.9635
3.2451	2.5523	2400	2.7953	1.0602	-102.8434	-10.2843	-9.6989	0.4580	-0.5855	-96.9885	-102.8434	-1.8264	-1.9635
2.7117	2.6587	2500	2.7955	1.0601	-102.8357	-10.2836	-9.6962	0.4580	-0.5873	-96.9625	-102.8357	-1.8263	-1.9634
3.148	2.7650	2600	2.7967	1.0604	-102.8636	-10.2864	-9.6985	0.4560	-0.5878	-96.9853	-102.8636	-1.8265	-1.9636
3.2951	2.8714	2700	2.7959	1.0602	-102.8490	-10.2849	-9.6981	0.4620	-0.5868	-96.9812	-102.8490	-1.8263	-1.9634
2.8486	2.9777	2800	2.7960	1.0602	-102.8461	-10.2846	-9.6988	0.4600	-0.5858	-96.9882	-102.8461	-1.8264	-1.9635
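The table does not report the number of optimizer steps per epoch or the size of the training set. Both can be estimated from the Epoch and Step columns. The minimal sketch below uses a few rows copied from the table, and the derived figures are approximations, not values reported by the author:

```python
# (step, epoch, validation loss, rewards/accuracies) copied from the table above.
ROWS = [
    (100, 0.1063, 2.8066, 0.4540),
    (1000, 1.0635, 2.7987, 0.4580),
    (2000, 2.1270, 2.7954, 0.4560),
    (2800, 2.9777, 2.7960, 0.4600),
]

TOTAL_TRAIN_BATCH_SIZE = 16  # from the hyperparameter list
NUM_EPOCHS = 3               # from the hyperparameter list


def summarize(rows):
    """Derive quantities implied by, but not reported in, the table."""
    # The logged epoch is roughly step / steps_per_epoch, so invert it per row.
    ratios = [step / epoch for step, epoch, _, _ in rows]
    steps_per_epoch = sum(ratios) / len(ratios)
    return {
        "steps_per_epoch": steps_per_epoch,
        "approx_total_steps": round(steps_per_epoch * NUM_EPOCHS),
        "approx_train_examples": round(steps_per_epoch * TOTAL_TRAIN_BATCH_SIZE),
        "val_loss_change": rows[-1][2] - rows[0][2],
        "max_accuracy": max(acc for _, _, _, acc in rows),
    }


if __name__ == "__main__":
    for key, value in summarize(ROWS).items():
        print(f"{key}: {value}")
```

These rows give roughly 940 optimizer steps per epoch, or about 2,820 over three epochs. At 16 sequences per step, that is on the order of 15,000 training pairs. Over the full table, validation loss falls by only about 0.01 (2.8066 to 2.7960). Rewards/accuracies stays between 0.454 and 0.462 throughout.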

Framework versions

Transformers 4.41.2
Pytorch 2.1.2
Datasets 2.20.0
Tokenizers 0.19.1
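When reproducing these results, it can help to confirm that the local environment matches the versions above. Below is a small standard-library sketch that does this. PyTorch is installed under the distribution name `torch`. Builds with a local version tag such as `2.1.2+cu121` are treated as matching their base release.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed on this card, keyed by PyPI distribution name.
EXPECTED = {
    "transformers": "4.41.2",
    "torch": "2.1.2",
    "datasets": "2.20.0",
    "tokenizers": "0.19.1",
}


def version_mismatches(expected):
    """Return {package: (expected, installed or None)} for every mismatch."""
    mismatches = {}
    for name, want in expected.items():
        try:
            have = version(name)
        except PackageNotFoundError:
            have = None
        # Ignore local build tags such as "+cu121" when comparing.
        if have is None or have.split("+")[0] != want:
            mismatches[name] = (want, have)
    return mismatches


if __name__ == "__main__":
    for name, (want, have) in version_mismatches(EXPECTED).items():
        print(f"{name}: card lists {want}, installed {have or 'not installed'}")
```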

Model repository: martimfasantos/cpo-xcomet-xl_xxl-inc7b-10p-shuff-1e-7-full-tiny