
Quantization made by Richard Erkhov.

AID-Neo-125M - bnb 8bits

Original model description:

language: en
license: mit
pipeline_tag: text-generation

UPDATE (2023-09-23):

This model is obsolete. Thanks to quantization, you can run AI Dungeon 2 Classic (a 1.5B model) on equivalent hardware. See here.
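
As a rough illustration of why quantization changes the picture, the sketch below estimates weight storage alone for both models at several precisions. It ignores activations, the KV cache, and framework overhead, and the 4-bit row is an assumption: the note above does not say which quantization level it has in mind.

```python
# Bytes needed to store one weight at each precision (weights only).
# "int4" is an assumed quantization level, not one named in this card.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


models = {
    "AID-Neo-125M": 125e6,
    "AI Dungeon 2 Classic (1.5B)": 1.5e9,
}

for name, num_params in models.items():
    for dtype in BYTES_PER_PARAM:
        print(f"{name} @ {dtype}: {weight_memory_gb(num_params, dtype):.2f} GB")
```

By this estimate, the 125M model in FP32 needs about 0.5 GB for its weights, while a 1.5B model needs about 1.5 GB at 8 bits and about 0.75 GB at 4 bits.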


AID-Neo-125M

Model description

This model was inspired by KoboldAI's GPT-Neo-125M-AID (Mia) model and fine-tuned on the same data: the AI Dungeon dataset (text_adventures.txt). The goal was to fix a possible oversight in the original model, which was trained with an unfortunate bug. You could technically consider it a "retraining" of the same model using different software.

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 3.0

Safetensors

  • Model size: 125M params
  • Tensor type: F32, FP16, I8