nicholasKluge committed · Commit a43e186 · 1 Parent(s): 3954523

Update README.md
README.md CHANGED
@@ -27,7 +27,7 @@ co2_eq_emissions:
 ---
 # RewardModel
 
-The
+The RewardModel is a [BERT](https://huggingface.co/bert-base-cased) model that can be used to score the quality of a completion for a given prompt.
 
 The model was trained with a dataset composed of `prompt`, `prefered_completions`, and `rejected_completions`.
 
@@ -48,7 +48,7 @@ This repository has the [source code](https://github.com/Nkluge-correa/Aira) use
 
 ## Usage
 
-Here's an example of how to use the
+Here's an example of how to use the RewardModel to score the quality of a response to a given prompt:
 
 ```python
 from transformers import AutoTokenizer, AutoModelForSequenceClassification
@@ -125,9 +125,9 @@ and bitching about what the machines do. Score: -10.942
 
 ## Performance
 
-| Acc
-
-| [Aira-RewardModel](https://huggingface.co/nicholasKluge/RewardModel)
+| Acc | [WebGPT](https://huggingface.co/datasets/openai/webgpt_comparisons) |
+|----------------------------------------------------------------------|---------------------------------------------------------------------|
+| [Aira-RewardModel](https://huggingface.co/nicholasKluge/RewardModel) | 55.02%* |
 
 * *Only considering comparisons of the `webgpt_comparisons` dataset that had a preferred option.
 
@@ -149,4 +149,4 @@ and bitching about what the machines do. Score: -10.942
 
 ## License
 
-
+RewardModel is licensed under the Apache License, Version 2.0. See the [LICENSE](LICENSE) file for more details.
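The README's Usage snippet is cut off in this diff (only the `transformers` import at line 54 appears as context). As a rough, illustrative sketch of what scoring with this reward model could look like, and not the README's actual code, the example below loads `nicholasKluge/RewardModel` with `AutoModelForSequenceClassification` and prints a score for each candidate response. The prompt and responses are made up, and the pair-encoding call, `max_length=512` truncation, and single-logit head are assumptions consistent with the scalar "Score: -10.942" output quoted in the hunk headers.

```python
# Minimal sketch (assumed usage, not the README's exact snippet): load the
# reward model and score candidate responses for a single prompt.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained("nicholasKluge/RewardModel")
model = AutoModelForSequenceClassification.from_pretrained("nicholasKluge/RewardModel")
model.to(device)
model.eval()

# Hypothetical prompt and completions, for illustration only.
prompt = "Why is AI ethics important?"
responses = [
    "AI ethics matters because it helps ensure systems are built and deployed responsibly.",
    "Who cares? The machines will just do whatever they are told.",
]

for response in responses:
    # Encode the (prompt, response) pair; 512 is an assumed truncation length.
    inputs = tokenizer(prompt, response, truncation=True, max_length=512,
                       return_tensors="pt").to(device)
    with torch.no_grad():
        # Assumes a single-logit head, matching the scalar scores shown above.
        score = model(**inputs).logits[0].item()
    print(f"Response: {response}\nScore: {score:.3f}\n")
```

Under these assumptions, a higher score indicates a completion closer to the `prefered_completions` seen in training than to the `rejected_completions`.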