
SCoReLoRA: Self-Correct via Reinforcement Learning

SCoReLoRA combines Low-Rank Adaptation (LoRA) fine-tuning with reinforcement learning for self-correction. Through a two-stage training process, the model learns to generate an initial response and then revise it into a more accurate final answer.

Features

  • Implements a two-stage training process for self-correction
  • Utilizes reinforcement learning to improve model outputs
  • Compatible with Hugging Face's Transformers library and PEFT
  • Supports quantized models for efficient fine-tuning (a setup sketch follows this list)
  • Includes evaluation metrics for self-correction performance
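
A rough setup sketch, assuming a standard Transformers/PEFT workflow: load a 4-bit quantized base model and attach a LoRA adapter. The base model name and the LoRA hyperparameters (r, lora_alpha, target_modules) are illustrative placeholders, not values taken from this repository.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

base_model_name = "meta-llama/Llama-2-7b-hf"  # placeholder; any causal LM works

# Quantize the base model to 4-bit for memory-efficient fine-tuning.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(base_model_name)
model = AutoModelForCausalLM.from_pretrained(
    base_model_name, quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

# Attach a LoRA adapter; only the low-rank matrices are trained.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```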

How It Works

SCoReLoRA uses a two-stage training process:

  1. Stage I: The model is trained to generate an initial response and then correct it, while minimizing the KL divergence between the fine-tuned model and the base model (a loss sketch follows this list).

  2. Stage II: The model is further trained using reinforcement learning techniques, with rewards based on the quality of self-corrections.
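
A minimal sketch of a Stage I objective along these lines, assuming token-level logits from the fine-tuned (LoRA) policy and the frozen base model: a cross-entropy loss on the self-correction tokens plus a KL penalty toward the base model. The function name, tensor shapes, and kl_coef value are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def stage_one_loss(policy_logits: torch.Tensor,
                   base_logits: torch.Tensor,
                   labels: torch.Tensor,
                   kl_coef: float = 0.1) -> torch.Tensor:
    """policy_logits/base_logits: (batch, seq, vocab); labels: (batch, seq).

    Prompt positions in `labels` are masked with -100 so that only the
    self-correction tokens contribute to the cross-entropy term.
    """
    ce = F.cross_entropy(
        policy_logits.reshape(-1, policy_logits.size(-1)),
        labels.reshape(-1),
        ignore_index=-100,
    )
    # KL(policy || base): with log_target=True, F.kl_div takes log-probs for
    # both arguments and computes sum p_target * (log p_target - input).
    policy_logp = F.log_softmax(policy_logits, dim=-1)
    base_logp = F.log_softmax(base_logits.detach(), dim=-1)
    kl = F.kl_div(base_logp, policy_logp, log_target=True, reduction="batchmean")
    return ce + kl_coef * kl
```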

The training process uses shaped rewards together with a KL-divergence penalty, balancing improvement against staying close to the original model's behavior.
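
As one way to picture this shaping, the hypothetical reward below assumes a binary correctness checker: it rewards a correct final answer and adds a bonus (or penalty) proportional to the change between the first attempt and the correction. The function and the alpha weight are illustrative, not this repository's exact reward.

```python
def shaped_reward(first_correct: bool, second_correct: bool, alpha: float = 0.5) -> float:
    """Hypothetical shaped reward for one (attempt, correction) pair."""
    base = 1.0 if second_correct else 0.0             # reward the final answer
    progress = float(second_correct) - float(first_correct)
    return base + alpha * progress                    # bonus for fixing, penalty for breaking

# wrong -> right earns the most; right -> wrong is penalized
assert shaped_reward(False, True) > shaped_reward(True, True) > shaped_reward(True, False)
```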

Evaluation

The implementation includes functions to evaluate the model's self-correction capabilities (a computation sketch follows the list), measuring metrics such as:

  • Accuracy before and after correction
  • Improvement rate
  • Rate of successful corrections
  • Rate of erroneous corrections
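
A small sketch of how these metrics could be computed, assuming per-example correctness flags for the initial and corrected attempts are available; the input format and function name are hypothetical.

```python
def self_correction_metrics(first_correct: list[bool],
                            second_correct: list[bool]) -> dict[str, float]:
    """Compute self-correction metrics from parallel correctness flags."""
    n = len(first_correct)
    pairs = list(zip(first_correct, second_correct))
    fixed = sum(1 for a, b in pairs if not a and b)   # wrong -> right
    broken = sum(1 for a, b in pairs if a and not b)  # right -> wrong
    acc_before = sum(first_correct) / n
    acc_after = sum(second_correct) / n
    return {
        "accuracy_before": acc_before,
        "accuracy_after": acc_after,
        "improvement_rate": acc_after - acc_before,
        "successful_correction_rate": fixed / n,
        "erroneous_correction_rate": broken / n,
    }

# Example: 3/5 correct initially, 4/5 after correction.
print(self_correction_metrics(
    [True, False, True, False, True],
    [True, True, True, False, True],
))
```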

Reference

Kumar, A., et al. "Training Language Models to Self-Correct via Reinforcement Learning." arXiv:2409.12917, 2024.
