meta-llama-Meta-Llama-3.1-8B-Instruct_SFT_E1_D30002

This model is a fine-tuned version of unsloth/meta-llama-3.1-8b-instruct-bnb-4bit on the None dataset.

Model description

This model was trained on Successful episodes of the top 10 model similar to D20001 but instead of using the whole episode as input, each episode was split into conversation pieces.

e.g.

[
{
  role: 'user'
  content: '...'
},
{
  role: 'assistant'
  content: '...'
},
{
  role: 'user'
  content: '...'
},
{
  role: 'assistant'
  content: '...'
},
]

is split int:

[
{
  role: 'user'
  content: '...'
},
{
  role: 'assistant'
  content: '...'
},

and

[
{
  role: 'user'
  content: '...'
},
{
  role: 'assistant'
  content: '...'
},
{
  role: 'user'
  content: '...'
},
{
  role: 'assistant'
  content: '...'
},
]

Training and evaluation data

After splitting, the dataset contains about 6635 conversation bits accross all games.

The Dataset ID is D30002

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0002
train_batch_size: 4
eval_batch_size: 8
seed: 7331
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_ratio: 0.03
lr_scheduler_warmup_steps: 5
num_epochs: 1

Training results

Framework versions

PEFT 0.12.0
Transformers 4.44.2
Pytorch 2.4.0+cu121
Datasets 2.21.0
Tokenizers 0.19.1

clembench-playpen
/

meta-llama-Meta-Llama-3.1-8B-Instruct_SFT_E1_D30002

meta-llama-Meta-Llama-3.1-8B-Instruct_SFT_E1_D30002

Model description

Training and evaluation data

Training hyperparameters

Training results

Framework versions

Collections including clembench-playpen/meta-llama-Meta-Llama-3.1-8B-Instruct_SFT_E1_D30002

Llama-3.2-1B

Llama-3.1-8B

Evaluation results