---
license: mit
datasets:
- openai/summarize_from_feedback
- openai/webgpt_comparisons
- Dahoas/synthetic-instruct-gptj-pairwise
- Anthropic/hh-rlhf
- lmsys/chatbot_arena_conversations
- openbmb/UltraFeedback
metrics:
- accuracy
tags:
- reward_model
- reward-model
- RLHF
- evaluation
- llm
- instruction
- reranking
language:
- multilingual
- en
- ar
- bg
- de
- el
- es
- fr
- hi
- ru
- sw
- th
- tr
- ur
- vi
- zh
pipeline_tag: text-generation
---

# Pairwise Reward Model for LLMs (PairRM) based on mdeberta-v3-base

This is an attempt to create a multilingual [PairRM](https://huggingface.co/llm-blender/PairRM) model by applying the training procedure from the original LLM-Blender repository to [mdeberta-v3-base](https://huggingface.co/microsoft/mdeberta-v3-base).

I have not yet done any systematic evaluation; so far I have only run sanity checks with the samples provided for the original PairRM model and a few quick made-up examples.

For additional usage information, please refer to the [original](https://huggingface.co/llm-blender/PairRM) model. A minimal usage sketch is given below.
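
Assuming this checkpoint can be loaded the same way as the original PairRM, a minimal ranking sketch with the `llm-blender` package might look like the following. The calls mirror the usage documented on the original PairRM card; the repository id is a placeholder for this model's Hub path and the example inputs are made up.

```python
# Sketch only: follows the usage pattern documented for llm-blender/PairRM.
# The repository id below is a placeholder for this multilingual checkpoint.
import llm_blender

blender = llm_blender.Blender()
blender.loadranker("<this-repo-id>")  # placeholder: replace with this model's Hub path

inputs = ["Explain in one sentence what a reward model is."]
candidates = [[
    "A reward model scores a language model's answers so better answers can be preferred.",
    "Reward.",
]]

# Rank the candidate answers per input; rank 1 is the preferred candidate.
ranks = blender.rank(inputs, candidates, return_scores=False, batch_size=1)
print(ranks)  # e.g. [[1, 2]] if the first candidate is preferred

# Pairwise comparison: for each input, is candidate A better than candidate B?
candidates_A = [candidates[0][0]]
candidates_B = [candidates[0][1]]
comparison = blender.compare(inputs, candidates_A, candidates_B)
print(comparison)  # per-input booleans, True where A is preferred over B
```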



## Citation & Credits
```bibtex
@inproceedings{llm-blender-2023,
    title = "LLM-Blender: Ensembling Large Language Models with Pairwise Comparison and Generative Fusion",
    author = "Jiang, Dongfu and Ren, Xiang and Lin, Bill Yuchen",
    booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL 2023)",
    year = "2023"
}
```