---
language:
- en
license:
- creativeml-openrail-m
tags:
- generated_from_trainer
- text generation
- pytorch
- causal-lm
metrics:
- accuracy
base_model: EleutherAI/pythia-125m-deduped
model-index:
- name: openchatgpt-neox-r1
  results: []
---
# openchatgpt-neox-r1
This model is a fine-tuned version of EleutherAI/pythia-125m-deduped on the openchatgpt safe-r1 dataset. It achieves the following results on the evaluation set:
- Loss: 1.3585
- Accuracy: 0.9169
## Model description
Fine-tune based on the inner workings of ChatGPT. I won't elaborate on that; you need at least a faint idea of how its prompts are built for this model to be effective.
This is effectively a wild idea that actually saw the light of day. Practically a collab of three students in a virtual shed.
BTW, Pythia is so much better omg.
## Intended uses & limitations
Intended uses and limitations are in line with OpenAI's. The dataset consists of safe texts (i.e. no highly sexual or erotic content). An NSFW version of the dataset is not planned at the moment.
Keep in mind that this is the 125M-parameter version of GPT-NeoX (Pythia). My GTX 1050 Ti Mobile couldn't handle even that without gradient thingamabobs, and 8-bit Adam was also used. If anyone knows how to effectively fine-tune larger models on free Colab instances, feel free to let me know. The Pile tokenizer also has one downside compared to the native GPT-2/3 tokenizer: `Assistant` is not 1 token, but 2.
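The hyperparameters in this card list `train_batch_size: 1` with `gradient_accumulation_steps: 2`, so at least part of the "gradient thingamabobs" is gradient accumulation; anything else (for example gradient checkpointing) is not stated. A minimal pure-Python sketch, assuming a toy one-weight linear model with mean-squared-error loss (not the actual training code), of why two micro-batches of 1 with each gradient scaled by 1/2 match one batch of 2:

```python
def grad_mse(w, batch):
    """Gradient of mean((w * x - y) ** 2) with respect to w over `batch`."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_grad(w, batch, accumulation_steps):
    """Sum gradients of equal micro-batches, each scaled by 1/accumulation_steps.

    Scaling each micro-batch gradient mirrors dividing the loss by the number
    of accumulation steps before calling backward() in a real training loop.
    """
    if len(batch) % accumulation_steps != 0:
        raise ValueError("batch size must be divisible by accumulation_steps")
    size = len(batch) // accumulation_steps
    total = 0.0
    for i in range(accumulation_steps):
        micro_batch = batch[i * size:(i + 1) * size]
        total += grad_mse(w, micro_batch) / accumulation_steps
    return total  # a real loop would call optimizer.step() once at this point


batch = [(1.0, 2.0), (3.0, 5.0)]  # two made-up (x, y) examples
w = 0.5
full = grad_mse(w, batch)               # one batch of 2
accum = accumulated_grad(w, batch, 2)   # two micro-batches of 1
print(full, accum)  # -12.0 -12.0
```

The equality holds exactly here because the loss is a mean and the micro-batches are equal-sized; with unequal micro-batches the scaling would need adjusting.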
## Training and evaluation data
Data was split 95%/5% (train/eval). Preprocessing included removing mentions of OpenAI wherever they were not deemed appropriate (a GPT-2-related mention is one of the appropriate ones that were kept). The whole dataset consists of just shy of 3k input-output pairs. One input can have multiple outputs (read: one message has several variants of an answer). Well under 1% (3 in total) are curated lines (i.e. a huge mistake was spotted that needed correcting). At least 3 lines (well under 1% of the line count, but more by byte count) are broken.
Heavy bias toward IT.
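A minimal sketch of a reproducible 95%/5% split, assuming a plain shuffled split with the card's `seed: 42`. The card does not say how the split was actually made, and the 2,900-pair list is placeholder data standing in for "just shy of 3k" pairs:

```python
import math
import random


def split_95_5(items, seed=42, eval_percent=5):
    """Shuffle a copy of `items` and return (train, eval) lists.

    The eval share is rounded up so even a tiny dataset gets an eval example.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_eval = math.ceil(len(shuffled) * eval_percent / 100)
    return shuffled[n_eval:], shuffled[:n_eval]


# Placeholder data, not the real dataset
pairs = [(f"input {i}", f"output {i}") for i in range(2900)]
train, eval_ = split_95_5(pairs)
print(len(train), len(eval_))  # 2755 145
```

Because one input can have several answer variants, a row-level shuffle like this can put variants of the same input in both splits; the card does not say whether the original split kept them together.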
## Training procedure
Input and output were concatenated directly because of how ChatGPT works.
An EOS token was appended after the final separator.
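The card does not document the separator or prompt layout, so this sketch uses a hypothetical newline separator and hypothetical `User:`/`Assistant:` labels; only the overall shape (input and output concatenated, a final separator, then EOS) comes from the text above. `<|endoftext|>` is the EOS token string of the Pythia (GPT-NeoX) tokenizer.

```python
SEP = "\n"             # hypothetical separator; the real one is not documented
EOS = "<|endoftext|>"  # EOS token string of the Pythia / GPT-NeoX tokenizer


def build_training_text(prompt, response, sep=SEP, eos=EOS):
    """Concatenate input and output, then add a final separator followed by EOS."""
    return prompt + sep + response + sep + eos


# Hypothetical role labels; the card only says "Assistant" tokenizes to 2 tokens.
text = build_training_text("User: What is Python?", "Assistant: A programming language.")
print(repr(text))
```

Ending every example this way presumably teaches the model to emit EOS once an answer is finished, which gives generation a clear stopping point.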
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 2
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 3.0
- mixed_precision_training: Native AMP
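Two quantities follow from these settings: `total_train_batch_size` is `train_batch_size` × `gradient_accumulation_steps` (1 × 2 on a single GPU), and the `linear` scheduler decays the learning rate from 5e-05 towards 0 over the run. A plain-Python sketch of both, assuming no warmup (none is listed) and taking 4131 total steps from the training results table. It follows the warmup-then-linear-decay shape of the linear schedule in Transformers but is not the library code:

```python
LEARNING_RATE = 5e-05
TRAIN_BATCH_SIZE = 1
GRADIENT_ACCUMULATION_STEPS = 2
TOTAL_STEPS = 4131   # final step in the training results table
WARMUP_STEPS = 0     # assumption: no warmup is listed in the card


def effective_batch_size(per_device, accumulation, devices=1):
    """Sequences contributing to each optimizer update."""
    return per_device * accumulation * devices


def linear_lr(step, base_lr=LEARNING_RATE, total=TOTAL_STEPS, warmup=WARMUP_STEPS):
    """Linear warmup (if any), then linear decay from base_lr to 0 at `total`."""
    if step < warmup:
        return base_lr * step / max(1, warmup)
    return base_lr * max(0.0, (total - step) / max(1, total - warmup))


print(effective_batch_size(TRAIN_BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS))  # 2
for step in (0, 1377, 2754, 4131):  # start, then the end of each epoch
    print(step, linear_lr(step))
```

Under these assumptions the learning rate is back to a third of its starting value (about 1.67e-05) by the end of epoch 2.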
### Training results
| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|---|---|---|---|---|
| 1.1311 | 1.0 | 1377 | 1.3116 | 0.9127 |
| 0.6691 | 2.0 | 2754 | 1.2978 | 0.9160 |
| 0.3463 | 3.0 | 4131 | 1.3585 | 0.9169 |
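A few numbers can be read off this table. Assuming the reported loss is the mean per-token cross-entropy in nats (the usual causal-LM loss in Transformers), exp(loss) gives validation perplexity. And since the Trainer's step counter counts optimizer updates, each covering `total_train_batch_size` = 2 sequences, 1377 steps per epoch implies roughly 2,754 training sequences. A small sketch of that arithmetic:

```python
import math

# (epoch, step, validation_loss) copied from the training results table
results = [(1.0, 1377, 1.3116), (2.0, 2754, 1.2978), (3.0, 4131, 1.3585)]
TOTAL_TRAIN_BATCH_SIZE = 2
TRAIN_FRACTION = 0.95  # from the 95%/5% split described in this card

for epoch, step, val_loss in results:
    # exp(mean cross-entropy in nats) = perplexity, under the assumption above
    print(f"epoch {epoch:.0f} (step {step}): val perplexity ~ {math.exp(val_loss):.2f}")

best_epoch, best_step, best_loss = min(results, key=lambda row: row[2])
print("lowest validation loss:", best_loss, "at epoch", best_epoch)

# Approximate: ignores a possible leftover partial batch at the end of an epoch
train_sequences = results[0][1] * TOTAL_TRAIN_BATCH_SIZE
print("training sequences per epoch ~", train_sequences)  # 2754
print("implied dataset size ~", round(train_sequences / TRAIN_FRACTION))  # 2899
```

By this reading the lowest validation loss is at epoch 2 (1.2978), while the headline figures at the top of the card (loss 1.3585, accuracy 0.9169) match the final epoch-3 row; the implied ~2.9k pairs agree with the "just shy of 3k" stated in the data section.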
### Framework versions
- Transformers 4.25.1
- Pytorch 1.13.1+cu116
- Datasets 2.8.0
- Tokenizers 0.13.2