Quantization made by Richard Erkhov.

Github

Discord

Request more models

gemma-2b-orpo - bnb 8bits

Original model description:

license: other
license_name: gemma-terms-of-use
license_link: https://ai.google.dev/gemma/terms
library_name: transformers
base_model: google/gemma-2b
tags:
- trl
- orpo
- generated_from_trainer
model-index:
- name: gemma-2b-orpo
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Challenge
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: acc_norm
      value: 49.15
      name: normalized accuracy
    source:
      url: >-
        https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=anakin87%2Fgemma-2b-orpo
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HellaSwag (10-Shot)
      type: hellaswag
      split: validation
      args:
        num_few_shot: 10
    metrics:
    - type: acc_norm
      value: 73.72
      name: normalized accuracy
    source:
      url: >-
        https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=anakin87%2Fgemma-2b-orpo
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU (5-Shot)
      type: cais/mmlu
      config: all
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 38.52
      name: accuracy
    source:
      url: >-
        https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=anakin87%2Fgemma-2b-orpo
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: TruthfulQA (0-shot)
      type: truthful_qa
      config: multiple_choice
      split: validation
      args:
        num_few_shot: 0
    metrics:
    - type: mc2
      value: 44.53
    source:
      url: >-
        https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=anakin87%2Fgemma-2b-orpo
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Winogrande (5-shot)
      type: winogrande
      config: winogrande_xl
      split: validation
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 64.33
      name: accuracy
    source:
      url: >-
        https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=anakin87%2Fgemma-2b-orpo
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8k (5-shot)
      type: gsm8k
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 13.87
      name: accuracy
    source:
      url: >-
        https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=anakin87%2Fgemma-2b-orpo
      name: Open LLM Leaderboard
datasets:
- alvarobartt/dpo-mix-7k-simplified
language:
- en

gemma-2b-orpo

This is an ORPO fine-tune of google/gemma-2b with alvarobartt/dpo-mix-7k-simplified.

⚡ Quantized version (GGUF): https://huggingface.co/anakin87/gemma-2b-orpo-GGUF

ORPO

ORPO (Odds Ratio Preference Optimization) is a new training paradigm that merges two phases that are usually run separately, SFT (Supervised Fine-Tuning) and preference alignment (usually performed with RLHF or simpler methods like DPO), into a single training step.

  • Faster training
  • Less memory usage (no reference model needed)
  • Good results!
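
To make the idea of a single combined objective concrete, here is a minimal sketch of the per-example ORPO loss as described in the ORPO paper: an SFT term on the chosen response plus a weighted odds-ratio term that compares the chosen and rejected responses. The function names and the default weight `lam=0.1` are illustrative assumptions, not values taken from this model's training run. The inputs are assumed to be length-normalized (average per-token) log-probabilities produced by the same model, which is why no reference model is needed.

```python
import math


def log_odds(avg_logp: float) -> float:
    """Return log(p / (1 - p)) for p = exp(avg_logp), where avg_logp < 0."""
    return avg_logp - math.log1p(-math.exp(avg_logp))


def orpo_loss(chosen_avg_logp: float, rejected_avg_logp: float, lam: float = 0.1) -> float:
    """Per-example ORPO loss: SFT term + lam * odds-ratio term.

    lam is a hypothetical weight chosen for illustration only.
    """
    # SFT term: negative log-likelihood of the chosen response
    sft_term = -chosen_avg_logp
    # Log of the odds ratio between the chosen and rejected responses
    log_odds_ratio = log_odds(chosen_avg_logp) - log_odds(rejected_avg_logp)
    # -log(sigmoid(x)) written as log(1 + exp(-x))
    odds_ratio_term = math.log1p(math.exp(-log_odds_ratio))
    return sft_term + lam * odds_ratio_term


# The loss drops when the model prefers the chosen response over the rejected one
preferred = orpo_loss(math.log(0.5), math.log(0.1))
dispreferred = orpo_loss(math.log(0.5), math.log(0.9))
print(preferred < dispreferred)
```

Because both terms are computed from one forward pass of the policy model, this formulation avoids the frozen reference model that DPO requires.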

๐Ÿ† Evaluation

Nous

gemma-2b-orpo performs well for its size on Nous' benchmark suite.

(evaluation conducted using LLM AutoEval).

| Model | Average | AGIEval | GPT4All | TruthfulQA | Bigbench |
|---|---|---|---|---|---|
| anakin87/gemma-2b-orpo | 39.45 | 23.76 | 58.25 | 44.47 | 31.32 |
| mlabonne/Gemmalpaca-2B | 38.39 | 24.48 | 51.22 | 47.02 | 30.85 |
| google/gemma-2b-it | 36.1 | 23.76 | 43.6 | 47.64 | 29.41 |
| google/gemma-2b | 34.26 | 22.7 | 43.35 | 39.96 | 31.03 |

Open LLM Leaderboard

Detailed results can be found here.

By comparison, on the Open LLM Leaderboard, google/gemma-2b-it has an average of 42.75.

| Metric | Value |
|---|---|
| Avg. | 47.35 |
| AI2 Reasoning Challenge (25-Shot) | 49.15 |
| HellaSwag (10-Shot) | 73.72 |
| MMLU (5-Shot) | 38.52 |
| TruthfulQA (0-shot) | 44.53 |
| Winogrande (5-shot) | 64.33 |
| GSM8k (5-shot) | 13.87 |
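
As a quick sanity check, the reported average appears to be the unweighted mean of the six benchmark scores listed above. The short sketch below recomputes it; the dictionary name is only illustrative.

```python
# Open LLM Leaderboard scores as reported in this card
leaderboard_scores = {
    "AI2 Reasoning Challenge (25-Shot)": 49.15,
    "HellaSwag (10-Shot)": 73.72,
    "MMLU (5-Shot)": 38.52,
    "TruthfulQA (0-shot)": 44.53,
    "Winogrande (5-shot)": 64.33,
    "GSM8k (5-shot)": 13.87,
}

# Unweighted mean over the six benchmarks
average = sum(leaderboard_scores.values()) / len(leaderboard_scores)
print(f"{average:.2f}")  # prints 47.35
```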

๐Ÿ™ Dataset

alvarobartt/dpo-mix-7k-simplified is a simplified version of argilla/dpo-mix-7k. You can find more information in the dataset card.

🎮 Model in action

Usage notebook

📓 Chat and RAG using Haystack

Simple text generation with Transformers

The model is small, so it runs smoothly on Colab. It can also be loaded with quantization to reduce memory use further.

# pip install transformers accelerate
import torch
from transformers import pipeline

# Load the model in bfloat16 and let accelerate choose the device placement
pipe = pipeline("text-generation", model="anakin87/gemma-2b-orpo", torch_dtype=torch.bfloat16, device_map="auto")
messages = [{"role": "user", "content": "Write a rap song on Vim vs VSCode."}]
# add_generation_prompt=True appends the assistant-turn marker so the model replies to the user message
prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
# Sample up to 500 new tokens
outputs = pipe(prompt, max_new_tokens=500, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
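
Loading with quantization, as mentioned above, could look like the following sketch, which uses the `BitsAndBytesConfig` class from Transformers. The helper names `quantization_flags` and `load_quantized` are hypothetical, and the example points at the original `anakin87/gemma-2b-orpo` checkpoint because the repository id of this 8-bit upload is not stated in the card. It assumes `bitsandbytes` is installed and a CUDA GPU is available.

```python
# pip install transformers accelerate bitsandbytes


def quantization_flags(bits: int) -> dict:
    """Map a bit width to the matching BitsAndBytesConfig keyword argument."""
    if bits == 8:
        return {"load_in_8bit": True}
    if bits == 4:
        return {"load_in_4bit": True}
    raise ValueError("bitsandbytes quantization supports 4 or 8 bits")


def load_quantized(model_id: str = "anakin87/gemma-2b-orpo", bits: int = 8):
    """Load the tokenizer and a bitsandbytes-quantized model (hypothetical helper)."""
    # Imported lazily so the helper above can be used without transformers installed
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    quant_config = BitsAndBytesConfig(**quantization_flags(bits))
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=quant_config,
        device_map="auto",
    )
    return tokenizer, model


# Example (downloads the weights, so it is left commented out):
# tokenizer, model = load_quantized(bits=8)
```

Eight-bit loading matches the "bnb 8bits" format named in this card's title; passing `bits=4` trades some additional quality for lower memory use.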

Training

The model was trained using HF TRL. 📓 Training notebook

Framework versions

  • Transformers 4.39.1
  • Pytorch 2.2.0+cu121
  • Datasets 2.18.0
  • Tokenizers 0.15.2