metadata
license: mit
datasets:
  - argilla/distilabel-intel-orca-dpo-pairs
  - jondurbin/truthy-dpo-v0.1
  - argilla/distilabel-math-preference-dpo
  - argilla/distilabel-capybara-dpo-7k-binarized
language:
  - en
library_name: adapter-transformers
base_model: Technoculture/MT7Bi-sft

Technoculture/MedMerge-6-7b-alpha-dpo

Open LLM Leaderboard

| Model Name | ARC | HellaSwag | MMLU | TruthfulQA | Winogrande | GSM8K |
|---|---|---|---|---|---|---|
| Orca-2-7b | 78.4 | 76.1 | 53.7 | 52.4 | 74.2 | 47.2 |
| LLAMA-2-7b | 43.2 | 77.1 | 44.4 | 38.7 | 69.5 | 16 |
| MT7Bi-sft | 54.1 | 75.11 | - | 43.08 | 72.14 | 15.54 |
| MedMerge-6-7b | 29.52 | 41.04 | - | 37.53 | 59.35 | 0.91 |
| MedMerge-6-7b-alpha-dpo | 54.27 | 75.6 | 52.65 | 43.94 | 71.03 | 26.16 |
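
To make the comparison with the base model (MT7Bi-sft, per the metadata) easier to read, the following minimal sketch computes per-benchmark score differences from the numbers in the table above. The names `BENCHMARKS`, `SCORES`, and `score_deltas` are illustrative and not part of any released code; benchmarks where either model has no reported score are skipped.

```python
# Scores copied from the leaderboard table; None marks a "-" entry.
BENCHMARKS = ["ARC", "HellaSwag", "MMLU", "TruthfulQA", "Winogrande", "GSM8K"]
SCORES = {
    "MT7Bi-sft": [54.1, 75.11, None, 43.08, 72.14, 15.54],
    "MedMerge-6-7b-alpha-dpo": [54.27, 75.6, 52.65, 43.94, 71.03, 26.16],
}


def score_deltas(base, tuned):
    """Return {benchmark: tuned - base} for benchmarks both models report."""
    deltas = {}
    for name, b, t in zip(BENCHMARKS, SCORES[base], SCORES[tuned]):
        if b is not None and t is not None:
            deltas[name] = round(t - b, 2)
    return deltas


# Print signed differences, one benchmark per line.
for name, delta in score_deltas("MT7Bi-sft", "MedMerge-6-7b-alpha-dpo").items():
    print(f"{name:<11} {delta:+.2f}")
```

On these figures the largest change is on GSM8K, while Winogrande is slightly lower than the base model. The differences are plain arithmetic on the reported scores and say nothing about statistical significance.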

Training Details

  • GPU: NVIDIA A100 Tensor Core GPU
  • Total Batches: 4266
  • Epochs: 3
  • Duration: 3 hours, 57 minutes, and 00 seconds
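
The figures above can be turned into rough throughput numbers. This is a back-of-the-envelope sketch, assuming "Total Batches" counts optimizer steps across all three epochs; the card does not state the batch size, so the rows-per-batch value is only what the numbers imply if each epoch covers the full 11.38k-row mixture listed in the dataset section. The function name `training_throughput` is hypothetical.

```python
def training_throughput(total_batches, epochs, hours, minutes, seconds, dataset_rows):
    """Derive simple throughput statistics from the reported training details."""
    duration_s = hours * 3600 + minutes * 60 + seconds
    batches_per_epoch = total_batches / epochs
    return {
        "duration_s": duration_s,
        "batches_per_epoch": batches_per_epoch,
        "seconds_per_batch": duration_s / total_batches,
        # Assumption: one epoch is one full pass over the mixture.
        "implied_rows_per_batch": dataset_rows / batches_per_epoch,
    }


stats = training_throughput(
    total_batches=4266, epochs=3, hours=3, minutes=57, seconds=0, dataset_rows=11380
)
for key, value in stats.items():
    print(f"{key}: {value:.2f}")
```

Under these assumptions the run works out to 1422 batches per epoch, about 3.3 seconds per batch, and roughly 8 rows per batch.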

DPO Training Dataset Mixture

| Dataset Name | Original Size (Rows) | Ratio | Size After Ratio (Rows) |
|---|---|---|---|
| argilla/distilabel-math-preference-dpo | 2.4k | 1.0 | 2.4k |
| argilla/distilabel-intel-orca-dpo-pairs | 12.9k | 0.5 | 6.45k |
| jondurbin/truthy-dpo-v0.1 | 1.04k | 1.0 | 1.04k |
| argilla/distilabel-capybara-dpo-7k-binarized | 7.5k | 0.2 | 1.5k |

Total Size: 11.38k
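
The mixture arithmetic can be reproduced with a short sketch. The row counts below are the rounded values from the table, so the computed total (11,390) differs slightly from the reported 11.38k, presumably because the exact row counts are not round numbers. The card does not say how rows were selected within each dataset; `sample_indices` shows one possible approach (seeded uniform sampling without replacement) and is an assumption, not the authors' method.

```python
import random

# (dataset name, approximate original rows, sampling ratio) from the table.
MIXTURE = [
    ("argilla/distilabel-math-preference-dpo", 2400, 1.0),
    ("argilla/distilabel-intel-orca-dpo-pairs", 12900, 0.5),
    ("jondurbin/truthy-dpo-v0.1", 1040, 1.0),
    ("argilla/distilabel-capybara-dpo-7k-binarized", 7500, 0.2),
]


def subsampled_sizes(mixture):
    """Return {dataset name: number of rows kept after applying its ratio}."""
    return {name: round(rows * ratio) for name, rows, ratio in mixture}


def sample_indices(n_rows, ratio, seed=0):
    """Pick round(n_rows * ratio) distinct row indices, reproducibly."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_rows), round(n_rows * ratio)))


sizes = subsampled_sizes(MIXTURE)
for name, kept in sizes.items():
    print(f"{name}: {kept}")
print("total:", sum(sizes.values()))
```

The index lists returned by `sample_indices` could then be used to select rows from each dataset before concatenating and shuffling the result.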

Training Loss Plot

image/png

Training Loss Smoothed Plot

image/png

For full details of this DPO training run, please see our notebook.

Open In Colab