---
library_name: transformers
license: other
license_name: mrl
inference: false
license_link: https://mistral.ai/licenses/MRL-0.1.md
tags:
- mergekit
- merge
- not-for-all-audiences
language:
- en
base_model:
- mistralai/Ministral-8B-Instruct-2410
---
# Ministral-ATFchan-8B (v1)
A test model trained on unfiltered discussions from one specific forum (ATF). Although initially intended as a text-completion model, it can also chat when used with a suitable Instruct preset.
## Intentional features
- Trained on most text-focused sections of ATF, including the Debate sections but excluding the Personal and TGZ sections.
- All custom emoji have been converted to their closest Unicode counterpart.
- Quotes have been preserved. Spoilers have been converted to `<details>` HTML tags (they work in SillyTavern).
- All external URLs have been left unredacted, but URLs to internal resources (attachments and images) have been converted to a symbolic format.
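The spoiler conversion mentioned above can be sketched as a simple text transform. This is not the actual preprocessing code: the `[SPOILER]…[/SPOILER]` BBCode syntax and the `spoilers_to_details` helper are illustrative assumptions.

```python
import re

# Hypothetical sketch: convert BBCode-style spoilers into <details> HTML tags.
# The exact spoiler syntax used by the forum is an assumption.
SPOILER_RE = re.compile(
    r"\[SPOILER(?:=(?P<title>[^\]]*))?\](?P<body>.*?)\[/SPOILER\]",
    re.DOTALL | re.IGNORECASE,
)

def spoilers_to_details(text: str) -> str:
    def repl(m: re.Match) -> str:
        title = m.group("title") or "Spoiler"
        body = m.group("body").strip()
        return f"<details><summary>{title}</summary>\n{body}\n</details>"
    return SPOILER_RE.sub(repl, text)

print(spoilers_to_details("before [SPOILER=Ending]the hero wins[/SPOILER] after"))
```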
## Known quirks and issues
- Usernames have not been redacted due to technical difficulties, so the model can hallucinate the usernames that appear most frequently in the training data. I might take the model down if I get enough complaints.
- The model may "post" attachments that obviously cannot be displayed.
- Oftentimes the generated users will quote the user and/or each other back, which wastes tokens and compute time. Delete unwanted quotations early in the discussion to prevent this from snowballing.
- Sometimes at the beginning of the conversation users appear to quote messages that don't exist in the prompt history. This is due to an issue while scraping the website (skipped thread pages).
- When roleplaying, the model may not always be receptive to OOC commands.
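If you want to automate the quote-trimming suggested above, a small helper can drop quoted lines from a generated message before it re-enters the context. How quotes are actually rendered in the training data is not documented here, so the `>`-style line prefix below is purely an illustrative assumption that you would need to adapt.

```python
# Hypothetical helper for trimming quoted material from a generated message.
# The quote_prefix default is an assumption, not the model's actual quote format.
def strip_quoted_lines(message: str, quote_prefix: str = ">") -> str:
    kept = [line for line in message.splitlines()
            if not line.lstrip().startswith(quote_prefix)]
    return "\n".join(kept).strip()

print(strip_quoted_lines("> earlier post\nMy actual reply"))
```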
## Prompting format
The prompting format was designed to imitate a chan-like arrangement using forum messages, while simultaneously being easy to read on a monochrome text editor.
In retrospect, I should have made the line separator shorter (it's inconvenient to insert manually, although it's just one token), possibly omitted seconds from the time, and used `>` instead of `»`. Additionally, I should probably have placed the forum section above the thread title rather than after it, since that would have allowed the model to generate thread titles from an initial forum section.
I might add these changes in a future update, but for now the format is as described below.
### Example
```
<s>Thread title
Forum section » Forum subsection » Forum subsubsection
Labels: Label1, Label2, LabelN
Tags: tag1, tag2, tag3, tagn
----------------------------------------------------------------
%%%% [username1] 2024-10-14 12:23:55

Message content
----------------------------------------------------------------
%%%% [username2] 2024-10-14 12:35:03
```
### In detail
The format the model was trained on follows these specifications:
1. BOS token `<s>` followed by the thread title; newline.
2. Forum section tree, each item separated by `»`. Subsection and subsubsection are optional. Newline.
3. [Optional] Comma-separated thread Labels. These are subsection-specific on ATF. Newline.
4. [Optional] Comma-separated thread tags. These are added by the thread starter. Newline.
5. Exactly 64 dashes followed by a newline.
6. Exactly 4 percent signs `%%%%`; space; username delimited by square brackets; space; ISO date; space; ISO time; two newlines.
7. Message content; newline.
8. Repeat from step 5 for every following message.
Small variations from these specifications may be possible without significant changes to the output quality.
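As a sanity check, the specification above can be turned into a small prompt builder. The function name and argument layout are my own; only the format constants (the `<s>` BOS marker, the 64-dash separator, the `%%%%` header) come from the spec, and in real use `<s>` would be the tokenizer's BOS token rather than literal text.

```python
SEPARATOR = "-" * 64  # exactly 64 dashes, per the spec

def build_prompt(title, sections, messages, labels=None, tags=None):
    """Assemble a prompt in the training format. `sections` is the forum
    section tree (1-3 items); `messages` is a list of
    (username, iso_date, iso_time, text) tuples."""
    parts = ["<s>" + title, " » ".join(sections)]  # literal "<s>" for illustration
    if labels:
        parts.append("Labels: " + ", ".join(labels))
    if tags:
        parts.append("Tags: " + ", ".join(tags))
    for user, date, time, text in messages:
        parts.append(SEPARATOR)
        parts.append(f"%%%% [{user}] {date} {time}\n")  # header, then blank line
        parts.append(text)
    return "\n".join(parts)

prompt = build_prompt(
    "Thread title",
    ["General", "Off Topic"],
    [("username1", "2024-10-14", "12:23:55", "Message content")],
    tags=["tag1", "tag2"],
)
print(prompt)
```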
## SillyTavern preset
Master-import this preset: https://files.catbox.moe/bnp94u.json
Note that this omits the ISO time from the prompting format. Feel free to change the Story String to your preferences. In general, any instruction or direction should be added as a forum message that a hypothetical forum user can understand.
## Usage suggestions
- Although not recommended, it is possible to use the official Mistral Tekken format. However, names ("Include names") must be included or the model might get confused on who is talking.
- Unsurprisingly, the model works better with forum/book-style roleplay and natural conversations (no asterisks/actions or quotation marks).
- The forum section tree has a significant influence on the contents of the messages. See further below for specific suggestions.
- It might be possible to invent new forum sections that don't even exist on ATF, with possibly novel results.
## Forum section tree examples
You can use these examples as a guide for what to put in the forum tree line of the prompt.
```
General » General Talk » Gaming
General » Debates » Serious Business
General » Media » Anime
General » Off Topic
Creative Corner » Stories » Furry Stories
Creative Corner » Roleplay Discussion » Roleplaying
Creative Corner » Writing Discussion
```
## Training details
The model was trained on about 140 MB of text for 2 epochs on a single RTX 3090 24GB GPU, at up to 8k context, using Unsloth. Packing was used on a portion of the dataset. Due to memory and compute limitations, only a fraction of the available data could be used.
```python
r = 128
lora_alpha = 128
lora_dropout = 0.50
use_gradient_checkpointing = "unsloth"
target_modules = ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]
use_rslora = False

per_device_train_batch_size = 1
gradient_accumulation_steps = 1
warmup_steps = 0
max_seq_length = 8192
num_train_epochs = 2
learning_rate = 0.00008
max_grad_norm = 0.0005
weight_decay = 0.01
lr_scheduler_type = "constant"
adam_beta1 = 0.95
adam_beta2 = 0.999
```
## Merging strategy
Different models were finetuned on separate forum sections (some grouped together), then merged with the Model Stock method using MergeKit. Model Stock was suggested to be the best-performing merge method in the Baichuan Alignment Technical Report, but I haven't personally verified whether this is true. It is certainly convenient to use, since it has no tunable parameters.
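For reference, a Model Stock merge in MergeKit needs little more than the list of finetunes and their shared base model. The sketch below uses MergeKit's actual `merge_method: model_stock` key, but the checkpoint paths are placeholders, not the real section-finetuned models.

```yaml
# Hypothetical MergeKit config; checkpoint paths are placeholders.
merge_method: model_stock
base_model: mistralai/Ministral-8B-Instruct-2410
models:
  - model: ./atfchan-general
  - model: ./atfchan-debates
  - model: ./atfchan-creative
dtype: bfloat16
```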