---
license: bigscience-openrail-m
datasets:
- iamplus/CoT
---
First version of the fine-tuned Bloomz-7B1 model, trained on the CoT dataset from the Flan Data Collection (v2) (~64k examples) using HF Deepspeed.
Base Model: bigscience/bloomz-7b1
Training Details:
- Epochs: 8
- Batch Size: 5 instantaneous per device x 2 gradient accumulation steps x 8 GPUs = 80 effective
- Max Length: 1024
- Weight Decay: 0
- Learning Rate: 5e-5
- Learning Rate Scheduler Type: Linear
- Number of warmup steps: 0
- Machine: 8x A100 80GB
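The hyperparameters above can be sketched as a plain config dict in the style of Hugging Face `TrainingArguments`, including the effective-batch-size arithmetic from the card. This is an illustrative sketch, not the exact training script used; the key names follow common `transformers` conventions but are assumptions here.

```python
# Effective batch size as stated in the card:
# per-device batch x gradient accumulation x number of GPUs
per_device_batch = 5
grad_accum_steps = 2
num_gpus = 8
effective_batch = per_device_batch * grad_accum_steps * num_gpus
assert effective_batch == 80

# Hyperparameters in a TrainingArguments-style dict (a sketch;
# the actual Deepspeed launch config is not shown in the card)
training_config = {
    "num_train_epochs": 8,
    "per_device_train_batch_size": per_device_batch,
    "gradient_accumulation_steps": grad_accum_steps,
    "max_seq_length": 1024,   # "Max Length" from the card
    "weight_decay": 0.0,
    "learning_rate": 5e-5,
    "lr_scheduler_type": "linear",
    "warmup_steps": 0,
}
```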
Dataset Details:
- Dataset: iamplus/CoT
- Files:
  - cot_fsnoopt.csv
  - cot_fsopt.csv
  - cot_zsnoopt.csv
  - cot_zsopt.csv
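The four CSV splits above could be concatenated into a single training set with a small stdlib helper like the following. This is a hedged sketch: it assumes local copies of the files and Flan-style `inputs`/`targets` columns, neither of which is stated in the card; `load_cot_examples` is a hypothetical helper name.

```python
import csv

# Split names from the card (fs/zs and opt/noopt follow Flan v2
# naming; the few-shot/zero-shot, with/without-options reading is
# an assumption, not stated in the card)
COT_FILES = [
    "cot_fsnoopt.csv",
    "cot_fsopt.csv",
    "cot_zsnoopt.csv",
    "cot_zsopt.csv",
]

def load_cot_examples(paths):
    """Concatenate rows from the CoT CSV splits into one list of dicts.

    Assumes each file has a header row (e.g. Flan-style 'inputs' and
    'targets' columns -- an assumption for illustration).
    """
    examples = []
    for path in paths:
        with open(path, newline="", encoding="utf-8") as f:
            examples.extend(csv.DictReader(f))
    return examples
```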
Final Review:
- The model has simply memorized/overfitted the training data and does not perform well on samples outside it.
- It also appears to have shifted the base model's weights too far (catastrophic forgetting).
- The Epoch 6 model shows similar problems.
- The Epoch 2 model couldn't find a middle ground: it performs poorly on both the training data and new data, and simply increasing the number of epochs leads to the memorization described above.
Conclusion:
- More high-quality data is needed for the model to really learn the patterns. Increasing only the number of epochs on limited data just leads to overfitting.