uer committed
Commit 6f533f1
1 Parent(s): 0d0a80c

Update README.md

Files changed (1)
  1. README.md +17 -17
README.md CHANGED
@@ -95,14 +95,14 @@ python3 preprocess.py --corpus_path corpora/cluecorpussmall.txt \
 
 ```
 deepspeed pretrain.py --deepspeed --deepspeed_config models/deepspeed_config.json --dataset_path cluecorpussmall_seq128_dataset.pt \
- --vocab_path models/google_zh_vocab.txt \
- --config_path models/bert/xlarge_config.json \
- --output_model_path models/cluecorpussmall_wwm_roberta_xlarge_seq128_model \
- --world_size 8 --gpu_ranks 0 1 2 3 4 5 6 7 \
- --total_steps 500000 --save_checkpoint_steps 50000 --report_steps 500 \
- --learning_rate 2e-5 --batch_size 128 --deep_init \
- --whole_word_masking --deepspeed_checkpoint_activations \
- --data_processor mlm --target mlm
+ --vocab_path models/google_zh_vocab.txt \
+ --config_path models/bert/xlarge_config.json \
+ --output_model_path models/cluecorpussmall_wwm_roberta_xlarge_seq128_model \
+ --world_size 8 --gpu_ranks 0 1 2 3 4 5 6 7 \
+ --total_steps 500000 --save_checkpoint_steps 50000 --report_steps 500 \
+ --learning_rate 2e-5 --batch_size 128 --deep_init \
+ --whole_word_masking --deepspeed_checkpoint_activations \
+ --data_processor mlm --target mlm
 ```
 
 Before stage 2, we extract fp32 consolidated weights from the ZeRO stage 2 and 3 DeepSpeed checkpoints:
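The extraction command itself sits outside this hunk's context. As a minimal sketch, assuming the stock `zero_to_fp32.py` helper that DeepSpeed copies into each saved checkpoint directory (the paths follow `--output_model_path` above; everything else is illustrative):

```
# Sketch: consolidate ZeRO-partitioned shards into a single fp32 weights file.
# zero_to_fp32.py is placed in the checkpoint directory by DeepSpeed itself and
# defaults to the checkpoint tag recorded in "latest" (override with --tag).
cd models/cluecorpussmall_wwm_roberta_xlarge_seq128_model
python3 zero_to_fp32.py . ../cluecorpussmall_wwm_roberta_xlarge_seq128_model.bin
```

The consolidated `.bin` is what the stage-2 command below loads through `--pretrained_model_path`.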
@@ -124,15 +124,15 @@ python3 preprocess.py --corpus_path corpora/cluecorpussmall.txt \
 
 ```
 deepspeed pretrain.py --deepspeed --deepspeed_config models/deepspeed_config.json --dataset_path cluecorpussmall_seq512_dataset.pt \
- --vocab_path models/google_zh_vocab.txt \
- --config_path models/bert/xlarge_config.json \
- --pretrained_model_path models/cluecorpussmall_wwm_roberta_xlarge_seq128_model.bin \
- --output_model_path models/cluecorpussmall_wwm_roberta_xlarge_seq512_model \
- --world_size 8 --gpu_ranks 0 1 2 3 4 5 6 7 \
- --total_steps 250000 --save_checkpoint_steps 50000 --report_steps 500 \
- --learning_rate 5e-5 --batch_size 32 \
- --whole_word_masking --deepspeed_checkpoint_activations \
- --data_processor mlm --target mlm
+ --vocab_path models/google_zh_vocab.txt \
+ --config_path models/bert/xlarge_config.json \
+ --pretrained_model_path models/cluecorpussmall_wwm_roberta_xlarge_seq128_model.bin \
+ --output_model_path models/cluecorpussmall_wwm_roberta_xlarge_seq512_model \
+ --world_size 8 --gpu_ranks 0 1 2 3 4 5 6 7 \
+ --total_steps 250000 --save_checkpoint_steps 50000 --report_steps 500 \
+ --learning_rate 5e-5 --batch_size 32 \
+ --whole_word_masking --deepspeed_checkpoint_activations \
+ --data_processor mlm --target mlm
 ```
 
 Then, we extract fp32 consolidated weights from the ZeRO stage 2 and 3 DeepSpeed checkpoints:
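The command that follows in the full README is again outside the hunk. A minimal sketch of the same consolidation step for the seq512 checkpoint, assuming DeepSpeed's stock `zero_to_fp32.py` helper and the `--output_model_path` given above:

```
# Sketch: same consolidation as after stage 1, now for the seq512 checkpoint.
cd models/cluecorpussmall_wwm_roberta_xlarge_seq512_model
python3 zero_to_fp32.py . ../cluecorpussmall_wwm_roberta_xlarge_seq512_model.bin
```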
 