jiaqiz committed on
Commit
131cdc4
1 Parent(s): 3e6dd79

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -18,7 +18,7 @@ datasets:
 
 
  ## Description:
- Llama2-13B-RLHF-RM is a 13 billion parameter language model (with context of up to 4,096 tokens) used as the Reward Model in training [NV-Llama2-70B-RLHF](https://huggingface.co/nvidia/NV-Llama2-70B-RLHF-Chat), which achieves 7.59 on MT-Bench and demonstrates strong performance on academic benchmarks.
+ Llama2-13B-RLHF-RM is a 13 billion parameter language model (with context of up to 4,096 tokens) used as the Reward Model in training [NV-Llama2-70B-RLHF-Chat](https://huggingface.co/nvidia/NV-Llama2-70B-RLHF-Chat), which achieves 7.59 on MT-Bench and demonstrates strong performance on academic benchmarks.
 
  Starting from [Llama2-13B base model](https://huggingface.co/meta-llama/Llama-2-13b), it is first instruction-tuned with a combination of public and proprietary data and then trained on [HH-RLHF dataset](https://huggingface.co/datasets/Anthropic/hh-rlhf) with reward modeling objective. Given a conversation with multiple turns between user and assistant, it assigns a preference score on the last assistant turn.
 
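
The README text in this diff says the model is trained on HH-RLHF "with reward modeling objective" but does not spell the objective out. A common choice for chosen/rejected preference pairs is a pairwise (Bradley-Terry style) ranking loss; the sketch below assumes that form for illustration only. The function name `pairwise_reward_loss` and the scalar-score interface are hypothetical and are not taken from the model's code.

```python
import math


def pairwise_reward_loss(chosen_score: float, rejected_score: float) -> float:
    """Return -log(sigmoid(chosen_score - rejected_score)).

    Assumed objective (not confirmed by the README): the loss is small when
    the score of the preferred last assistant turn exceeds the score of the
    rejected one, and grows as the ordering is violated.
    """
    margin = chosen_score - rejected_score
    # -log(sigmoid(m)) equals log(1 + exp(-m)); branch on the sign of the
    # margin so that exp() is only ever called on a non-positive argument.
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Hypothetical scores for the final assistant turn of two candidate replies.
loss_when_ranked_correctly = pairwise_reward_loss(2.0, -1.0)
loss_when_ranked_wrongly = pairwise_reward_loss(-1.0, 2.0)
```

Under this assumed loss, equal scores give a loss of log 2, and the loss decreases toward zero as the preferred reply's score pulls ahead of the rejected one.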