---
license: mit
datasets:
- openai/summarize_from_feedback
- openai/webgpt_comparisons
- Dahoas/synthetic-instruct-gptj-pairwise
- Anthropic/hh-rlhf
- lmsys/chatbot_arena_conversations
- openbmb/UltraFeedback
metrics:
- accuracy
tags:
- reward_model
- reward-model
- RLHF
- evaluation
- llm
- instruction
- reranking
language:
- multilingual
- en
- ar
- bg
- de
- el
- es
- fr
- hi
- ru
- sw
- th
- tr
- ur
- vi
- zh
pipeline_tag: text-generation
---

# Pairwise Reward Model for LLMs (PairRM) based on mdeberta-v3-base

This is an attempt to create a multilingual [PairRM](https://huggingface.co/llm-blender/PairRM) model by applying the training procedure from the original LLM-Blender repository to [mdeberta-v3-base](https://huggingface.co/microsoft/mdeberta-v3-base).

I have not yet done any real testing beyond sanity checks with the samples provided for the original PairRM model and a few quick made-up examples.

For additional usage information, please refer to the [original](https://huggingface.co/llm-blender/PairRM) model; a minimal usage sketch is also included at the end of this card.

## Citation & Credits

```bibtex
@inproceedings{llm-blender-2023,
    title = "LLM-Blender: Ensembling Large Language Models with Pairwise Comparison and Generative Fusion",
    author = "Jiang, Dongfu and Ren, Xiang and Lin, Bill Yuchen",
    booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL 2023)",
    year = "2023"
}
```
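
## Usage

Below is a minimal ranking sketch adapted from the usage examples of the original PairRM model. It assumes the `llm-blender` package from the LLM-Blender repository is installed; the checkpoint id is a placeholder that should be replaced with this repository's id. I have not verified these exact calls against this checkpoint, so treat the snippet as illustrative rather than authoritative.

```python
# Minimal ranking sketch, following the original PairRM usage examples.
# Assumes: pip install git+https://github.com/yuchenlin/LLM-Blender.git
import llm_blender

blender = llm_blender.Blender()
# Placeholder checkpoint id: replace with this repository's id.
blender.loadranker("<this-repo-id>")

# One multilingual example input (German: "What is the capital of France?")
inputs = ["Was ist die Hauptstadt von Frankreich?"]
candidates = [[
    "Die Hauptstadt von Frankreich ist Paris.",  # direct, correct answer
    "Frankreich liegt in Europa.",               # evasive answer
]]

# rank() returns, for each input, the rank of every candidate (1 = best).
ranks = blender.rank(inputs, candidates, return_scores=False, batch_size=1)
print(ranks)  # e.g. [[1, 2]] if the first candidate is preferred
```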