GPT-NeoXT-20B model instruction-tuned on the instruction-tuning dataset listed below (~5.2M examples) using Colossal-AI.
Base Model: togethercomputer/GPT-NeoXT-Chat-Base-20B (GPT-NeoXT-Chat-Base-20B-v0.16 - fine-tuned on feedback data)
Training Details:
- Epochs: 4
- Batch Size: 5 per device x 1 gradient accumulation step x 8 GPUs = 40 effective
- Block Size: 2020
- Weight Decay: 0
- Learning Rate: 1e-6
- Learning Rate Scheduler Type: Cosine
- Number of warmup steps: 600
- Machine: 8xA100 80GB
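The cosine schedule with linear warmup above can be sketched as follows. This is a generic cosine-with-warmup curve using the card's values (base LR 1e-6, 600 warmup steps); the exact schedule implementation inside Colossal-AI may differ, and `max_steps` is an assumed parameter.

```python
import math

def lr_at_step(step, max_steps, base_lr=1e-6, warmup_steps=600):
    """Linear warmup to base_lr, then cosine decay toward zero.

    A sketch of the standard cosine-with-warmup schedule; the schedule
    actually used by the Colossal-AI trainer is assumed to match this shape.
    """
    if step < warmup_steps:
        # Linear ramp from 0 to base_lr over the warmup steps.
        return base_lr * step / warmup_steps
    # Fraction of the post-warmup training completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, max_steps - warmup_steps)
    # Cosine decay from base_lr down to 0.
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

For example, `lr_at_step(600, 10_000)` returns the peak rate of 1e-6, after which the rate decays smoothly to zero at `max_steps`.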
Training Data Specifics:
- Labels are a copy of the input IDs, but with "human" responses and pad tokens masked so that they do not contribute to the model's loss.
- Block size is 2020; multiple instructions are packed together into each training example.
- "###" is used as the EOS token in the data.
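The label-masking step above can be sketched as below. The helper name and the per-token `human_mask` input are hypothetical; `-100` is the ignore index used by PyTorch's cross-entropy loss, so masked positions are excluded from the error calculation.

```python
def mask_labels(input_ids, human_mask, pad_token_id, ignore_index=-100):
    """Build labels from input IDs, masking human turns and padding.

    input_ids:  token IDs for one packed training example.
    human_mask: per-token booleans, True where the token belongs to a
                "human" response (a hypothetical precomputed mask).
    Masked positions get ignore_index (-100), which PyTorch's
    cross-entropy loss skips, so they contribute no gradient.
    """
    labels = list(input_ids)
    for i, (tok, is_human) in enumerate(zip(input_ids, human_mask)):
        if is_human or tok == pad_token_id:
            labels[i] = ignore_index
    return labels
```

Everything else in the labels matches the input IDs one-to-one, so the model is still trained to predict the assistant-side tokens in context.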