Edit model card

Model Description

An uncensored roleplay reasoning Llama 3.2 3B model trained on reasoning data.

It may potentially be a highest scoring RP finetune of Llama 3.2.

It has been trained using improved training code, and gives an improved performance. Here is what inference code you should use:

from transformers import AutoModelForCausalLM, AutoTokenizer

MAX_REASONING_TOKENS = 1024
MAX_RESPONSE_TOKENS = 512

model_name = "piotr25691/thea-rp-3b-25r"

model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Which is greater 9.9 or 9.11 ??"
messages = [
    {"role": "user", "content": prompt}
]

# Generate reasoning
reasoning_template = tokenizer.apply_chat_template(messages, tokenize=False, add_reasoning_prompt=True)
reasoning_inputs = tokenizer(reasoning_template, return_tensors="pt").to(model.device)
reasoning_ids = model.generate(**reasoning_inputs, max_new_tokens=MAX_REASONING_TOKENS)
reasoning_output = tokenizer.decode(reasoning_ids[0, reasoning_inputs.input_ids.shape[1]:], skip_special_tokens=True)

print("REASONING: " + reasoning_output)

# Generate answer
messages.append({"role": "reasoning", "content": reasoning_output})
response_template = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
response_inputs = tokenizer(response_template, return_tensors="pt").to(model.device)
response_ids = model.generate(**response_inputs, max_new_tokens=MAX_RESPONSE_TOKENS)
response_output = tokenizer.decode(response_ids[0, response_inputs.input_ids.shape[1]:], skip_special_tokens=True)

print("ANSWER: " + response_output)

This Llama model was trained faster than Unsloth using custom training code.

Visit https://www.kaggle.com/code/piotr25691/distributed-llama-training-with-2xt4 to find out how you can finetune your models using BOTH of the Kaggle provided GPUs.

Downloads last month
44
Safetensors
Model size
3.21B params
Tensor type
BF16
·
Inference Examples
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social visibility and check back later, or deploy to Inference Endpoints (dedicated) instead.

Model tree for piotr25691/thea-rp-3b-25r

Finetuned
(2)
this model

Datasets used to train piotr25691/thea-rp-3b-25r

Collection including piotr25691/thea-rp-3b-25r

Evaluation results