GPT-NeoXT-20B model, instruction-tuned with Colossal AI on the instruction-tuning datasets listed below (~5.2M samples).
Base Model: togethercomputer/GPT-NeoXT-Chat-Base-20B (GPT-NeoXT-Chat-Base-20B-v0.16 - fine-tuned on feedback data)
Training Details :
- Epochs: 5
- Batch Size : 5 instantaneous per device x 1 gradient accumulation step x 8 GPUs = 40
- Block Size : 2020
- Weight Decay : 0
- Learning Rate : 1e-6
- Learning Rate Scheduler Type : Cosine
- Number of warmup steps : 600
- Machine : 8xA100 80GB
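The batch-size arithmetic and learning-rate schedule above can be sketched in plain Python. This is a minimal illustration, not the actual Colossal AI training code: it assumes linear warmup over the 600 warmup steps followed by cosine decay to zero, and `total_steps` is a hypothetical parameter because the card does not state the total number of optimizer steps.

```python
import math

# Values taken from the training details above.
PER_DEVICE_BATCH = 5
GRAD_ACCUM_STEPS = 1
NUM_GPUS = 8
PEAK_LR = 1e-6
WARMUP_STEPS = 600


def effective_batch_size(per_device, grad_accum, num_gpus):
    """Number of samples contributing to one optimizer step."""
    return per_device * grad_accum * num_gpus


def cosine_lr(step, total_steps, peak_lr=PEAK_LR, warmup_steps=WARMUP_STEPS):
    """Learning rate at a given optimizer step.

    Assumption: linear warmup from 0 to peak_lr, then cosine decay to 0.
    total_steps is hypothetical; the card does not report it.
    """
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(1.0, progress)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

Calling `effective_batch_size(PER_DEVICE_BATCH, GRAD_ACCUM_STEPS, NUM_GPUS)` reproduces the global batch size of 40 listed above.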
Training Data Specifics :
- Labels and input IDs are exactly the same.
- Block Size is 2020; multiple instructions are packed together in each training sample.
- "###" is the EOS Token used in the data.
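One way to picture the data layout described above is the following sketch. It is an assumption about how the packing might work, not the preprocessing script actually used: tokenized instructions are concatenated with the EOS token ID after each one, the stream is cut into fixed-size blocks, and labels are a copy of the input IDs. The function name `pack_examples`, the `eos_id` argument, and the choice to drop a trailing partial block are all hypothetical.

```python
def pack_examples(tokenized_examples, eos_id, block_size=2020):
    """Pack several tokenized instructions into fixed-size blocks.

    tokenized_examples: list of lists of token IDs, one list per instruction.
    eos_id: token ID standing in for the "###" EOS token (hypothetical value).
    block_size: number of tokens per training sample (2020 in this card).

    Assumption: a trailing partial block is dropped rather than padded.
    """
    # Concatenate all instructions into one stream, ending each with EOS.
    stream = []
    for ids in tokenized_examples:
        stream.extend(ids)
        stream.append(eos_id)

    # Cut the stream into full blocks; labels mirror the input IDs exactly.
    blocks = []
    n_full = len(stream) // block_size
    for i in range(n_full):
        chunk = stream[i * block_size:(i + 1) * block_size]
        blocks.append({"input_ids": chunk, "labels": list(chunk)})
    return blocks
```

With this layout a single block can start or end in the middle of an instruction, which is why the EOS token is needed to mark instruction boundaries inside each sample.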