Barnaby Gray
Initial commit
b5468de
metadata
license: bigscience-openrail-m
datasets:
  - lvwerra/stack-exchange-paired
language:
  - en
tags:
  - trl
  - transformers
  - rlhf

Quantized GGML version of Stack-Llama-2

This model can be used directly with llama.cpp-based tools on more modest hardware (e.g. Apple M1/M2 machines).

From: https://huggingface.co/kashif/stack-llama-2
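When passing text to a llama.cpp-based tool, the prompt should presumably follow the template the model was trained on (given under Training Procedure). The sketch below shows one way to build such a prompt and to trim the generated text. The helper names `build_prompt` and `extract_answer` are hypothetical, and the exact whitespace around `Answer:` is an assumption rather than something this card specifies.

```python
def build_prompt(question: str) -> str:
    """Format a question in the card's 'Question: ... Answer: ...' template.

    Assumption: a blank line separates the two parts and a single
    space follows 'Answer:'.
    """
    return f"Question: {question.strip()}\n\nAnswer: "


def extract_answer(generated: str, prompt: str) -> str:
    """Return only the answer text from raw model output.

    Drops the echoed prompt if the tool returns it, and cuts the text at
    any follow-on 'Question:' line the model may start on its own.
    """
    text = generated[len(prompt):] if generated.startswith(prompt) else generated
    return text.split("\nQuestion:", 1)[0].strip()
```

The string returned by `build_prompt` can then be handed to whichever llama.cpp front end is in use as its prompt argument.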

Stack-Llama-2

A Llama-2 7B model fine-tuned with DPO (Direct Preference Optimization). The model is designed to generate human-like responses to questions in Stack Exchange domains such as programming, mathematics, physics, and more. For more information, see the blog post and the GitHub example.

Uses

Direct Use

  • Long-form question-answering on topics of programming, mathematics, and physics
  • Demonstrating a large language model's ability to follow a target behavior: generating answers to a question that would be highly rated on Stack Exchange.

Out of Scope Use

  • Replacing human expertise

Bias, Risks, and Limitations

Recommendations

  • Answers should be validated through the use of external sources.
  • Disparities between the data contributors and the direct and indirect users of the technology should inform developers in assessing what constitutes an appropriate use case.
  • Further research is needed to attribute model generations to sources in the training data, especially in cases where the model copies answers from the training data.

Training Details

Training Data

Original datasets are described in the LLaMA Model Card. Fine-tuning datasets for this model are based on Stack Exchange Paired, which consists of questions and answers from various domains in Stack Exchange, such as programming, mathematics, physics, and more. Specifically:

Traditional Fine-tuning: https://huggingface.co/datasets/lvwerra/stack-exchange-paired/tree/main/data/finetune

DPO Training: https://huggingface.co/datasets/lvwerra/stack-exchange-paired/tree/main/data/rl
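As a rough illustration of how a record from the DPO split might be turned into the prompt/chosen/rejected triple that preference training expects, consider the sketch below. The field names `question`, `response_j` (preferred answer), and `response_k` (less preferred answer) are assumptions based on the TRL example scripts and should be checked against the dataset itself; `to_dpo_example` is a hypothetical helper name.

```python
from typing import Dict

# Assumption: the split can be loaded with the Hugging Face `datasets` library, e.g.
#   from datasets import load_dataset
#   ds = load_dataset("lvwerra/stack-exchange-paired", data_dir="data/rl", split="train")


def to_dpo_example(record: Dict[str, str]) -> Dict[str, str]:
    """Map one paired record to a DPO-style training example.

    Assumed input fields: 'question', 'response_j' (preferred),
    'response_k' (less preferred).
    """
    return {
        # Same Question/Answer template the card gives for prompting.
        "prompt": "Question: " + record["question"] + "\n\nAnswer: ",
        "chosen": record["response_j"],
        "rejected": record["response_k"],
    }
```

Keeping the training prompt identical to the inference prompt template avoids a mismatch between what the model saw during fine-tuning and what it is asked at generation time.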

Training Procedure

The model was first trained with supervised fine-tuning (SFT) on the Stack Exchange question-and-answer pairs, then further trained with the DPO procedure, using the SFT model as the reference model. It is trained to respond to prompts with the following prompt template:

Question: <Query> 

Answer: <Response>
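To make the role of the SFT reference model in the DPO step more concrete, here is a minimal sketch of the standard per-example DPO loss, written in plain Python. It is not the training code used for this model: the function name `dpo_loss` is hypothetical, the inputs are assumed to be summed log-probabilities of each full answer, and `beta=0.1` is an illustrative value, not one stated in this card.

```python
import math


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumption: illustrative value only
) -> float:
    """Per-example DPO loss: -log(sigmoid(beta * (chosen_ratio - rejected_ratio)))."""
    # How much more (or less) likely each answer is under the policy
    # than under the frozen reference (SFT) model, in log space.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # Numerically stable evaluation of log(1 + exp(-margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy equals the reference model the margin is zero and the loss is log 2; the loss falls as the policy raises the preferred answer's likelihood relative to the rejected one, measured against the reference.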