tinyllama_rm_sentiment_1b

This model is a fine-tuned version of TinyLlama/TinyLlama_v1.1, trained as a reward model on the trl-internal-testing/sentiment-trl-style dataset (https://huggingface.co/datasets/trl-internal-testing/sentiment-trl-style). It achieves the following results on the evaluation set:

  • Loss: 0.6514
  • Accuracy: 0.625
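
To put these two numbers in context, the sketch below computes a pairwise loss and accuracy from reward scores. It assumes the standard pairwise logistic (Bradley-Terry) objective commonly used for reward models, which this card does not state explicitly; the function name and the sample scores are hypothetical. Under that assumption, a model that cannot separate chosen from rejected responses has a loss of ln 2 ≈ 0.693, so a loss of 0.6514 is only slightly better than that baseline.

```python
import math

def pairwise_rm_metrics(chosen_rewards, rejected_rewards):
    """Return (mean pairwise loss, accuracy) for paired reward scores.

    Assumption: loss = -log(sigmoid(r_chosen - r_rejected)), and a pair
    counts as correct when the chosen response scores strictly higher.
    """
    losses = []
    correct = 0
    for r_c, r_r in zip(chosen_rewards, rejected_rewards):
        margin = r_c - r_r
        # Numerically stable form of log(1 + exp(-margin)).
        losses.append(max(-margin, 0.0) + math.log1p(math.exp(-abs(margin))))
        correct += margin > 0
    n = len(losses)
    return sum(losses) / n, correct / n

# Hypothetical scores: equal rewards give the ln 2 baseline.
baseline_loss, baseline_acc = pairwise_rm_metrics([0.0], [0.0])

# Hypothetical scores: two of four pairs are ranked correctly.
loss, acc = pairwise_rm_metrics([1.0, 0.0, 2.0, -1.0], [0.0, 1.0, 1.0, 0.0])
```

Accuracy here is the fraction of evaluation pairs ranked in the right order, so 0.625 would mean five out of every eight pairs.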

Model description

Trained using:

python trl/examples/scripts/rm/rm.py \
--dataset_name trl-internal-testing/sentiment-trl-style \
--dataset_train_split train \
--dataset_eval_split test \
--model_name_or_path TinyLlama/TinyLlama_v1.1 \
--chat_template simple_concat \
--learning_rate 3e-6 \
--per_device_train_batch_size 32 \
--per_device_eval_batch_size 32 \
--gradient_accumulation_steps 1 \
--logging_steps 1 \
--eval_strategy steps \
--max_token_length 1024 \
--max_prompt_token_lenth 1024 \
--remove_unused_columns False \
--num_train_epochs 1 \
--eval_steps 100 \
--output_dir models/ppo_torchtune/tinyllama/tinyllama_rm_sentiment_1b \
--push_to_hub

The script was run from the "dataset-processor" branch of trl:

git clone -b "dataset-processor" https://github.com/huggingface/trl

Intended uses & limitations

More information needed

Training and evaluation data

https://huggingface.co/datasets/trl-internal-testing/sentiment-trl-style

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 3e-06
  • train_batch_size: 32
  • eval_batch_size: 32
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 1.0
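
As a rough illustration of what lr_scheduler_type: linear implies, the sketch below reproduces a linear decay from the base learning rate to zero. It assumes no warmup (none is listed above) and a total of 156 optimizer steps; that total is an assumption, chosen because it is consistent with step 100 being reported at epoch 0.6410 in the training results. The function name is hypothetical.

```python
def linear_lr(step, total_steps, base_lr=3e-6, warmup_steps=0):
    """Learning rate at a given optimizer step under linear warmup and decay."""
    if step < warmup_steps:
        # Linear ramp from 0 to base_lr over the warmup steps.
        return base_lr * step / max(1, warmup_steps)
    # Linear decay from base_lr to 0 over the remaining steps.
    remaining = max(0.0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

TOTAL_STEPS = 156  # assumption, not stated in the card

lr_start = linear_lr(0, TOTAL_STEPS)
lr_at_eval = linear_lr(100, TOTAL_STEPS)  # the step at which evaluation was logged
lr_end = linear_lr(TOTAL_STEPS, TOTAL_STEPS)
```

Under these assumptions the learning rate at step 100 is a little over a third of its starting value.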

Training results

| Training Loss | Epoch  | Step | Validation Loss | Accuracy |
|---------------|--------|------|-----------------|----------|
| 0.6033        | 0.6410 | 100  | 0.6514          | 0.625    |
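
The Epoch and Step columns allow a back-of-the-envelope estimate of the run length. The arithmetic below is a sketch that assumes a single training process, so that the per-device batch size of 32 with gradient_accumulation_steps 1 is also the global batch size; with more devices the example count would scale accordingly. The variable names are hypothetical.

```python
step = 100                # from the results table
epoch_fraction = 0.6410   # from the results table
batch_size = 32           # per_device_train_batch_size, assuming one device

# Steps needed to complete one epoch, rounded to the nearest integer.
steps_per_epoch = round(step / epoch_fraction)

# Upper bound on the number of training examples (the last batch may be partial).
max_train_examples = steps_per_epoch * batch_size

# Examples seen by the time the evaluation at step 100 was logged.
examples_seen = step * batch_size
```

Under these assumptions, one epoch is about 156 steps, so the only evaluation logged with eval_steps 100 falls roughly two thirds of the way through training.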

Framework versions

  • Transformers 4.42.2
  • Pytorch 2.2.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1