|
--- |
|
license: bigscience-openrail-m |
|
datasets: |
|
- iamplus/CoT |
|
--- |
|
First version of the Bloomz-7B1 model fine-tuned on the CoT dataset from the Flan Data Collection (v2) (~64k examples), using ***HF Deepspeed***.
|
|
|
**Base Model:** bigscience/bloomz-7b1 |
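
A minimal usage sketch with 🤗 Transformers is shown below. Only the base model id above is confirmed; the fine-tuned repository id in the sketch is a placeholder, so substitute the actual checkpoint name.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Confirmed base model; the fine-tuned repo id below is a hypothetical placeholder.
BASE_ID = "bigscience/bloomz-7b1"
FINETUNED_ID = "iamplus/bloomz-7b1-cot"  # placeholder, replace with the actual repo id

tokenizer = AutoTokenizer.from_pretrained(BASE_ID)
model = AutoModelForCausalLM.from_pretrained(FINETUNED_ID, device_map="auto")

# CoT-style prompt, matching the zero-shot "think step by step" format.
prompt = ("Q: A farm has 3 barns and each barn holds 12 cows. "
          "How many cows are there in total?\nA: Let's think step by step.")
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```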
|
|
|
**Training Details :** |
|
|
|
* Epochs: 8 |
|
* Batch Size : 5 per device (instantaneous) x 2 gradient accumulation steps x 8 GPUs = 80 effective (see the configuration sketch after this list)
|
* Max Length : 1024 |
|
* Weight Decay : 0 |
|
* Learning Rate : 5e-5 |
|
* Learning Rate Scheduler Type : Linear |
|
* Number of warmup steps : 0 |
|
* Machine : 8xA100 80GB |
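
The Deepspeed config itself is not included here; the sketch below approximates the hyperparameters listed above with Hugging Face `TrainingArguments` (the `ds_config.json` filename and the `bf16` setting are assumptions, not taken from the original run).

```python
from transformers import TrainingArguments

# Approximate reproduction of the listed hyperparameters.
# Effective batch size: 5 per device x 2 accumulation steps x 8 GPUs = 80.
training_args = TrainingArguments(
    output_dir="bloomz-7b1-cot",
    num_train_epochs=8,
    per_device_train_batch_size=5,
    gradient_accumulation_steps=2,
    learning_rate=5e-5,
    weight_decay=0.0,
    lr_scheduler_type="linear",
    warmup_steps=0,
    deepspeed="ds_config.json",  # assumed filename for the HF Deepspeed config
    bf16=True,                   # assumption: A100s typically train in bf16
)
# Note: the 1024 max length is applied at tokenization time, not here.
```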
|
|
|
**Dataset Details :** |
|
|
|
Dataset : iamplus/CoT |
|
|
|
Files : |
|
* cot_fsnoopt.csv |
|
* cot_fsopt.csv |
|
* cot_zsnoopt.csv |
|
* cot_zsopt.csv |
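
A minimal sketch for loading these files with 🤗 Datasets, assuming they are plain CSVs in the `iamplus/CoT` repository:

```python
from datasets import load_dataset

# Load the four CoT CSV files listed above into a single training split.
data_files = [
    "cot_fsnoopt.csv",
    "cot_fsopt.csv",
    "cot_zsnoopt.csv",
    "cot_zsopt.csv",
]
dataset = load_dataset("iamplus/CoT", data_files=data_files, split="train")
print(dataset)  # column names depend on the CSV schema
```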
|
|
|
**Final Review :** |
|
* The model has simply memorized/overfitted the training data and does not perform well on samples outside it.

* It also appears to have shifted the base model weights too far (catastrophic forgetting).

* The Epoch 6 model shows similar problems.

* The Epoch 2 model could not find a middle ground: it performs well neither on the training data nor on new data, and simply increasing the number of epochs leads to the memorization described above.
|
|
|
**Conclusion :** |
|
* More high-quality data is needed for the model to genuinely learn the patterns; increasing the number of epochs alone on a small dataset only leads to overfitting.