metadata
license:
  - creativeml-openrail-m
language:
  - en
tags:
  - generated_from_trainer
  - text generation
  - pytorch
  - causal-lm
metrics:
  - accuracy
model-index:
  - name: openchatgpt-neox-r1
    results: []

openchatgpt-neox-r1

This model is a fine-tuned version of EleutherAI/pythia-125m-deduped on the openchatgpt safe-r1 dataset. It achieves the following results on the evaluation set:

  • Loss: 1.3585
  • Accuracy: 0.9169

Model description

A fine-tune based on the inner workings of ChatGPT. I won't elaborate on that. You need at least a faint idea of how the prompt is put together for it to spit out anything that's not a garbled mess.

This is effectively a schizophrenic idea that saw the light of day. Practically a collab of 3 students in a virtual shed.

BTW, Pythia is so much better omg.

Intended uses & limitations

Intended uses & limitations fall in line with OpenAI's. The dataset consists of safe texts (i.e. no highly sexual/erotica type stuff). An NSFW version of the dataset is not planned at the moment.

Keep in mind that this is the 125M version of GPT-NeoX (Pythia). My 1050 Ti Mobile couldn't even handle that without gradient thingamabobs; 8BitAdam was also used. If anyone knows how to effectively fine-tune larger models on free Colabs, feel free to let me know. The Pile tokenizer also has one downside compared to the native GPT-2/3 one: "Assistant" is not 1 token, but 2.

Training and evaluation data

The data was split in a 95%/5% ratio. Preprocessing included removing mentions of OpenAI wherever they were not deemed appropriate (GPT-2 has one of the appropriate mentions). The whole dataset consists of just shy of 3k input-output pairs. One input has multiple outputs (read as: one message has multiple variants of an answer). Well under 1% (3 total) are curated lines (i.e. a huge mistake was spotted that needed correction). At least 3 lines (well under 1% of the line count, but more of the byte count) are broken.

Heavy bias towards IT topics.
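A 95%/5% split over input-output pairs can be sketched as below. This is a minimal illustration, not the script used for this model: the function name `split_pairs`, the placeholder data, and the reuse of seed 42 for shuffling are all assumptions.

```python
import random

def split_pairs(pairs, eval_fraction=0.05, seed=42):
    """Shuffle input-output pairs and split them into train and eval lists."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_eval = max(1, round(len(shuffled) * eval_fraction))
    return shuffled[n_eval:], shuffled[:n_eval]

# Hypothetical stand-in data: 3000 placeholder pairs, roughly the stated dataset size.
pairs = [(f"input {i}", f"output {i}") for i in range(3000)]
train_pairs, eval_pairs = split_pairs(pairs)
print(len(train_pairs), len(eval_pairs))
```

Because one input can have several answer variants, a per-pair split like this one may place variants of the same input on both sides. Whether the original split grouped variants together is not stated.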

Training procedure

Input and output were straight up concatenated due to the nature of how ChatGPT works.

This time the dataset was batched into groups of 2048 tokens, which gave me 628/31 groups for training/eval. Maybe that's what made the difference. EOS was also used after the final separator.
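The concatenate-then-group step can be sketched in plain Python as follows. The separator layout, the token ids, and the helper names `build_example` and `group_blocks` are hypothetical, since the actual prompt format is deliberately not documented here. Dropping the short tail is also an assumption, borrowed from how common causal-LM preprocessing scripts behave.

```python
from itertools import chain

BLOCK_SIZE = 2048  # group length stated in this card

def build_example(input_ids, output_ids, sep_ids, eos_id):
    """Concatenate input and output, ending with a separator followed by EOS.

    Hypothetical layout: the real separator placement is not documented.
    """
    return input_ids + sep_ids + output_ids + sep_ids + [eos_id]

def group_blocks(examples, block_size=BLOCK_SIZE):
    """Join all token lists into one stream and cut it into full blocks."""
    stream = list(chain.from_iterable(examples))
    usable = (len(stream) // block_size) * block_size  # drop the short tail
    return [stream[i:i + block_size] for i in range(0, usable, block_size)]

# Toy demo with made-up token ids and a tiny block size of 8 for readability.
SEP, EOS = [1], 0
examples = [
    build_example([10, 11, 12], [20, 21], SEP, EOS),
    build_example([13, 14], [22, 23, 24], SEP, EOS),
    build_example([15], [25], SEP, EOS),
]
blocks = group_blocks(examples, block_size=8)
print(len(blocks), blocks[0])
```

With real data the same function would be called with the default block size of 2048, so a single block can span several input-output pairs.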

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 2
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 3.0
  • mixed_precision_training: Native AMP
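How the batch-size and scheduler settings above relate can be sketched like this. It assumes a single device and no warmup (neither is listed), and it takes the total of 4131 optimizer steps from the results table. The function names are illustrative, not part of any library.

```python
def effective_batch_size(per_device, accumulation_steps, num_devices=1):
    """Samples contributing to one optimizer update."""
    return per_device * accumulation_steps * num_devices

def linear_lr(step, total_steps, base_lr=5e-5, warmup_steps=0):
    """Linear warmup (if any) followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

TOTAL_STEPS = 4131  # final step count from the results table

print(effective_batch_size(per_device=1, accumulation_steps=2))
for step in (0, 1377, 2754, 4131):  # start and end of each epoch
    print(step, linear_lr(step, TOTAL_STEPS))
```

Under these assumptions the learning rate has dropped to two thirds of its initial value by the end of the first epoch and reaches zero at the last step.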

Training results

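| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|---------------|-------|------|-----------------|----------|
| 1.1311        | 1.0   | 1377 | 1.3116          | 0.9127   |
| 0.6691        | 2.0   | 2754 | 1.2978          | 0.9160   |
| 0.3463        | 3.0   | 4131 | 1.3585          | 0.9169   |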
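To put the validation losses on a more intuitive scale, they can be converted to perplexity. The sketch below assumes the reported loss is the mean per-token cross-entropy in nats, which is the usual convention but is not stated in this card. The numbers are copied from the training results.

```python
import math

# (epoch, training loss, validation loss, accuracy) from the training results
results = [
    (1.0, 1.1311, 1.3116, 0.9127),
    (2.0, 0.6691, 1.2978, 0.9160),
    (3.0, 0.3463, 1.3585, 0.9169),
]

def perplexity(loss):
    """Perplexity for a mean cross-entropy loss given in nats."""
    return math.exp(loss)

for epoch, train_loss, val_loss, acc in results:
    print(f"epoch {epoch:.0f}: val loss {val_loss:.4f}, "
          f"perplexity {perplexity(val_loss):.2f}")

# Epoch with the lowest validation loss
best_epoch = min(results, key=lambda r: r[2])[0]
print("lowest validation loss at epoch", best_epoch)
```

Validation loss is lowest after epoch 2, while training loss keeps falling through epoch 3. That pattern is commonly read as the start of overfitting, even though accuracy still rises slightly.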

Framework versions

  • Transformers 4.25.1
  • Pytorch 1.13.1+cu116
  • Datasets 2.8.0
  • Tokenizers 0.13.2