bloomz-7b1-cot-v1 / README.md
metadata
license: bigscience-openrail-m
datasets:
  - iamplus/CoT

First version of a Bloomz-7B1 model fine-tuned on the CoT dataset from the Flan Data Collection (v2) (~64k examples), trained with the Hugging Face DeepSpeed integration.

Base Model: bigscience/bloomz-7b1

Training Details :

  • Epochs: 8
  • Batch Size : 5 per device x 2 gradient accumulation steps x 8 GPUs = 80 (effective)
  • Max Length : 1024
  • Weight Decay : 0
  • Learning Rate : 5e-5
  • Learning Rate Scheduler Type : Linear
  • Number of warmup steps : 0
  • Machine : 8xA100 80GB
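
The arithmetic behind these settings can be sketched in a few lines of plain Python. This is only an illustration: the dataset size is given above as "~64k", so `approx_examples = 64_000` is a round stand-in, and `linear_lr` is a hypothetical helper that mirrors the usual linear-decay-with-warmup formula rather than the exact code used for training.

```python
# Values taken from the training details above.
per_device_batch = 5
grad_accum_steps = 2
num_gpus = 8
epochs = 8
peak_lr = 5e-5
warmup_steps = 0

# Assumption: the README only says "~64k", so this is a round stand-in.
approx_examples = 64_000

# One optimizer step consumes this many examples across all GPUs.
effective_batch = per_device_batch * grad_accum_steps * num_gpus

# Ceiling division: a partial final batch still counts as a step.
steps_per_epoch = -(-approx_examples // effective_batch)
total_steps = steps_per_epoch * epochs


def linear_lr(step: int) -> float:
    """Linear schedule: linear warmup (none here), then linear decay to zero."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = total_steps - step
    return peak_lr * max(0.0, remaining / max(1, total_steps - warmup_steps))


print("effective batch size:", effective_batch)
print("approx. optimizer steps:", steps_per_epoch, "per epoch,", total_steps, "total")
print("lr at start / midpoint / end:",
      linear_lr(0), linear_lr(total_steps // 2), linear_lr(total_steps))
```

With zero warmup steps, the learning rate starts at its peak of 5e-5 on the first update and decays linearly to zero by the final step. The actual step count depends on the exact number of examples and on how the data loader handles the last partial batch.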

Dataset Details :

Dataset : iamplus/CoT

Files :

  • cot_fsnoopt.csv
  • cot_fsopt.csv
  • cot_zsnoopt.csv
  • cot_zsopt.csv
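
The file names appear to follow the Flan Collection template naming, where `fs`/`zs` stand for few-shot/zero-shot prompts and `opt`/`noopt` indicate whether answer options are included in the prompt. That reading is an assumption and is not stated in this README. Under that assumption, a small hypothetical helper can decode each name:

```python
import os

# File names as listed above.
FILES = ["cot_fsnoopt.csv", "cot_fsopt.csv", "cot_zsnoopt.csv", "cot_zsopt.csv"]


def describe(filename: str) -> dict:
    """Decode a file name, ASSUMING the Flan-style naming described above."""
    stem, _ext = os.path.splitext(filename)   # "cot_fsnoopt"
    task, variant = stem.split("_", 1)        # "cot", "fsnoopt"
    shots = {"fs": "few-shot", "zs": "zero-shot"}[variant[:2]]
    has_options = not variant[2:].startswith("no")
    return {"task": task, "shots": shots, "options": has_options}


for name in FILES:
    print(name, "->", describe(name))
```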

Final Review :

  • The model has memorized (overfit) the training data and does not perform well on samples outside it.
  • It also appears to have shifted the base model weights too far (catastrophic forgetting).
  • The Epoch 6 model shows similar problems.
  • The Epoch 2 model did not find a middle ground either: it performs poorly on both the training data and new data. Simply increasing the number of epochs leads to memorization, as noted above.
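
One common way to choose between per-epoch checkpoints like these is patience-based early stopping on a held-out validation loss. The sketch below assumes such a validation split exists, which the training details do not mention, and the loss values are made up purely for illustration. They are not measurements from this model.

```python
def pick_checkpoint(val_losses, patience=2):
    """Return (best_epoch, stop_epoch), both 1-indexed.

    Training stops once the validation loss has failed to improve for
    `patience` consecutive epochs; the best epoch seen so far is kept.
    """
    best_loss = float("inf")
    best_epoch = 0
    epochs_without_improvement = 0
    stop_epoch = len(val_losses)
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                stop_epoch = epoch
                break
    return best_epoch, stop_epoch


# HYPOTHETICAL validation losses for 8 epochs (illustration only):
# the loss improves at first, then rises as the model starts to memorize.
hypothetical_val_losses = [1.90, 1.60, 1.50, 1.55, 1.70, 1.90, 2.10, 2.30]

best, stopped = pick_checkpoint(hypothetical_val_losses, patience=2)
print(f"keep checkpoint from epoch {best}, stop training at epoch {stopped}")
```

A rising validation loss alongside a falling training loss is the usual sign of the memorization described above. Early stopping does not fix a lack of data; it only avoids spending further epochs after generalization has stopped improving.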

Conclusion :

  • More high-quality data is needed for the model to really learn the patterns. Simply increasing the number of epochs on a small dataset only leads to overfitting.