mgfrantz/distilgpt2-finetuned-reddit-tifu

This model was trained as practice for fine-tuning a causal language model. It has no intended use case beyond having some fun seeing how different things might be screwed up.

Data

This model was trained on the "short" subset of the reddit_tifu dataset. The data was split into 90% train and 10% validation using dataset.train_test_split with a seed of 0.
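The split described above can be sketched as follows. This is a minimal illustration, not the original notebook: the helper names are hypothetical, and the `load_dataset("reddit_tifu", "short", split="train")` call assumes the dataset is loaded from the Hugging Face Hub under that name, which may need extra arguments depending on the installed `datasets` version.

```python
def split_train_validation(dataset, validation_fraction=0.1, seed=0):
    """Split a datasets.Dataset into train and validation parts.

    Mirrors the 90% / 10% split with seed 0 described in this card.
    """
    split = dataset.train_test_split(test_size=validation_fraction, seed=seed)
    # train_test_split names the held-out part "test"; it serves as validation here.
    return split["train"], split["test"]


def load_and_split():
    """Hypothetical helper: load the "short" subset and split it."""
    # Imported lazily so the function above can be used without the library.
    from datasets import load_dataset

    raw = load_dataset("reddit_tifu", "short", split="train")
    return split_train_validation(raw)
```

Fixing the seed keeps the validation set identical across runs, so validation losses from different runs remain comparable.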

To prepare the data for training, the "tldr" and "documents" fields were joined by "\n\n". When multiple items were in the "tldr" or "documents" fields, only the first item was selected for joining. These joined documents were tokenized using the "distilgpt2" tokenizer.
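A rough sketch of the joining step, assuming each example is a dict with "tldr" and "documents" keys and that the summary comes first (the order in which the fields are listed above). The function names are hypothetical.

```python
def first_item(value):
    """Return the first element of a list-like field, or the value itself."""
    if isinstance(value, (list, tuple)):
        return value[0] if value else ""
    return value


def join_example(example, separator="\n\n"):
    """Join the "tldr" and "documents" fields into one training text."""
    # Assumption: "tldr" precedes "documents", matching the order named in the card.
    return first_item(example["tldr"]) + separator + first_item(example["documents"])


example = {"tldr": ["short version"], "documents": ["long story", "ignored extra"]}
joined = join_example(example)
```

The joined strings could then be passed to a tokenizer loaded with `AutoTokenizer.from_pretrained("distilgpt2")` from the `transformers` library.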

Finally, tokenized texts were concatenated end-to-end and split into blocks of 128 tokens.
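The concatenate-and-split step might look like the sketch below. It works on plain lists of token ids, and it drops the trailing tokens that do not fill a whole block. That choice is an assumption borrowed from common language-modeling examples; this card does not say how the remainder was handled. `group_into_blocks` is a hypothetical name.

```python
def group_into_blocks(token_id_lists, block_size=128):
    """Concatenate tokenized texts end-to-end and cut them into fixed-size blocks."""
    concatenated = [token for ids in token_id_lists for token in ids]
    # Assumption: the leftover tail shorter than block_size is discarded.
    usable_length = (len(concatenated) // block_size) * block_size
    return [
        concatenated[start:start + block_size]
        for start in range(0, usable_length, block_size)
    ]


# Toy example with a block size of 4 instead of 128.
toy_blocks = group_into_blocks([[1, 2, 3], [4, 5], [6, 7, 8, 9]], block_size=4)
```

Because texts are concatenated before splitting, a block can span the end of one post and the start of the next, which is why a separator token between documents would help.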

TODO: Add a different separation token between documents that can be used to stop generation.

Training

This model was trained in Colab by fine-tuning distilgpt2 for 174390 steps (3 epochs). Default training arguments were used, except for learning_rate=2e-5 and weight_decay=0.01. At the conclusion of training, a training loss of 3.52 and a validation loss of 3.44 were observed.
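As a rough illustration of these settings, the sketch below records the two non-default arguments, derives the steps per epoch from the reported total, and converts the reported validation loss to perplexity. It assumes the loss is the mean per-token cross-entropy in nats, as the `transformers` Trainer reports for causal language models. The `output_dir` value and the helper names are hypothetical.

```python
import math

# Values reported in this model card.
TOTAL_STEPS = 174390
EPOCHS = 3
VALIDATION_LOSS = 3.44

# The only non-default training arguments named in this card.
TRAINING_OVERRIDES = {"learning_rate": 2e-5, "weight_decay": 0.01}


def build_training_arguments(output_dir="distilgpt2-finetuned-reddit-tifu"):
    """Build TrainingArguments with defaults plus the two overrides."""
    # Imported lazily so the arithmetic below runs without transformers installed.
    from transformers import TrainingArguments

    return TrainingArguments(output_dir=output_dir, **TRAINING_OVERRIDES)


def perplexity(cross_entropy_loss):
    """Perplexity is the exponential of the mean cross-entropy loss (in nats)."""
    return math.exp(cross_entropy_loss)


steps_per_epoch = TOTAL_STEPS // EPOCHS          # 58130
validation_perplexity = perplexity(VALIDATION_LOSS)
```

Under that assumption, a validation loss of 3.44 corresponds to a perplexity of roughly 31.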

Model size: 88.2M params (Safetensors; tensor types F32 and U8)