NeMo · PyTorch · English · text generation · causal-lm
aklife97 committed on commit 26e5ce8 (1 parent: 625cc40)

Update README.md

Files changed (1): README.md (+6 -6)
README.md CHANGED
@@ -26,16 +26,16 @@ SteerLM Llama-2 is a 13 billion parameter generative language model based on the
 
 Key capabilities enabled by SteerLM:
 
-- Dynamic steering of responses by specifying desired attributes like quality, helpfulness, and toxicity
-- Simplified training compared to RLHF techniques like fine-tuning and bootstrapping
+- Dynamic steering of responses by specifying desired attributes like quality, helpfulness, and toxicity.
+- Simplified training compared to RLHF techniques like fine-tuning and bootstrapping.
 
 ## Model Architecture and Training
 The SteerLM method involves the following key steps:
 
-1. Train an attribute prediction model on human annotated data to evaluate response quality
-2. Use this model to annotate diverse datasets and enrich training data
-3. Perform conditioned fine-tuning to align responses with specified combinations of attributes
-4. (Optionally) Bootstrap training through model sampling and further fine-tuning
+1. Train an attribute prediction model on human annotated data to evaluate response quality.
+2. Use this model to annotate diverse datasets and enrich training data.
+3. Perform conditioned fine-tuning to align responses with specified combinations of attributes.
+4. (Optionally) Bootstrap training through model sampling and further fine-tuning.
 
 SteerLM Llama-2 applies this technique on top of the Llama-2 architecture. It was pretrained on internet-scale data and then customized using [OASST](https://huggingface.co/datasets/OpenAssistant/oasst1) and [HH-RLHF](https://huggingface.co/datasets/Anthropic/hh-rlhf) data.
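The attribute-conditioned fine-tuning and steering described in the steps above can be sketched in miniature. The attribute names, score scale, and tag-style prompt template below are hypothetical stand-ins chosen for illustration; the real SteerLM attribute set and prompt format are defined by NVIDIA's NeMo training and inference scripts, not by this example.

```python
# Minimal sketch of SteerLM-style attribute conditioning.
# NOTE: attribute names, the 0-4 score scale, and the <prompt>/<attributes>/
# <response> template are all illustrative assumptions, not NeMo's real format.

def attribute_string(attributes: dict) -> str:
    """Serialize attribute scores into a deterministic conditioning tag."""
    return ",".join(f"{name}:{score}" for name, score in sorted(attributes.items()))

def build_training_example(prompt: str, response: str, attributes: dict) -> str:
    """Step 3: pair each response with the attribute values assigned by the
    attribute-prediction model, so fine-tuning learns to condition on them."""
    return (f"<prompt>{prompt}</prompt>"
            f"<attributes>{attribute_string(attributes)}</attributes>"
            f"<response>{response}</response>")

def build_inference_prompt(prompt: str, desired: dict) -> str:
    """At inference time, steer generation by specifying desired attributes."""
    return (f"<prompt>{prompt}</prompt>"
            f"<attributes>{attribute_string(desired)}</attributes>"
            f"<response>")

example = build_training_example(
    "Explain overfitting.",
    "Overfitting is when a model memorizes its training data.",
    {"quality": 4, "helpfulness": 4, "toxicity": 0},
)
steer = build_inference_prompt("Explain overfitting.",
                               {"quality": 4, "helpfulness": 4, "toxicity": 0})
print(example)
print(steer)
```

Because the same template is used for training and inference, changing the requested attribute values at inference time (e.g. lowering `toxicity` or raising `quality`) is what steers the response, with no retraining needed.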