---
license: mit
datasets:
  - openai/summarize_from_feedback
  - openai/webgpt_comparisons
  - Dahoas/synthetic-instruct-gptj-pairwise
  - Anthropic/hh-rlhf
  - lmsys/chatbot_arena_conversations
  - openbmb/UltraFeedback
metrics:
  - accuracy
tags:
  - reward_model
  - reward-model
  - RLHF
  - evaluation
  - llm
  - instruction
  - reranking
language:
  - multilingual
  - en
  - ar
  - bg
  - de
  - el
  - es
  - fr
  - hi
  - ru
  - sw
  - th
  - tr
  - ur
  - vi
  - zh
pipeline_tag: text-generation
---

# Pairwise Reward Model for LLMs (PairRM) based on mdeberta-v3-base

This is an attempt to create a multilingual PairRM model by applying the training procedure from the original [LLM-Blender repository](https://github.com/yuchenlin/LLM-Blender) to [mdeberta-v3-base](https://huggingface.co/microsoft/mdeberta-v3-base).

I have not yet done any real testing beyond some sanity checks with the samples provided for the original PairRM model and a few quick made-up examples.

For additional usage information, please refer to the original [PairRM model](https://huggingface.co/llm-blender/PairRM).
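
For a quick start, this checkpoint should be usable as a drop-in ranker in the same way as the original PairRM. Below is a minimal sketch, assuming compatibility with the llm-blender library's `loadranker` loader; the repository id used here is a placeholder, not necessarily this model's actual Hub id:

```python
# Install the library first, e.g.:
#   pip install git+https://github.com/yuchenlin/LLM-Blender.git
import llm_blender

blender = llm_blender.Blender()
# NOTE: placeholder repo id (an assumption) -- replace with this checkpoint's actual Hub id
blender.loadranker("LemiSt/mdeberta-v3-base-pairrm")

inputs = ["What is the capital of France?"]

# Pairwise comparison: each entry is True if candidate A is preferred over candidate B
candidates_A = ["The capital of France is Paris."]
candidates_B = ["France is a country in western Europe."]
print(blender.compare(inputs, candidates_A, candidates_B))  # e.g. [True]

# Reranking: order several candidate responses per input (lower rank = better)
candidates = [["Paris.", "I am not sure.", "The capital of France is Paris."]]
print(blender.rank(inputs, candidates, return_scores=False, batch_size=1))
```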

Citation & Credits

```bibtex
@inproceedings{llm-blender-2023,
    title = "LLM-Blender: Ensembling Large Language Models with Pairwise Ranking and Generative Fusion",
    author = "Jiang, Dongfu and Ren, Xiang and Lin, Bill Yuchen",
    booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL 2023)",
    year = "2023"
}
```