Update README.md

README.md
@@ -95,14 +95,14 @@ python3 preprocess.py --corpus_path corpora/cluecorpussmall.txt \
```
deepspeed pretrain.py --deepspeed --deepspeed_config models/deepspeed_config.json --dataset_path cluecorpussmall_seq128_dataset.pt \
                      --vocab_path models/google_zh_vocab.txt \
                      --config_path models/bert/xlarge_config.json \
                      --output_model_path models/cluecorpussmall_wwm_roberta_xlarge_seq128_model \
                      --world_size 8 --gpu_ranks 0 1 2 3 4 5 6 7 \
                      --total_steps 500000 --save_checkpoint_steps 50000 --report_steps 500 \
                      --learning_rate 2e-5 --batch_size 128 --deep_init \
                      --whole_word_masking --deepspeed_checkpoint_activations \
                      --data_processor mlm --target mlm
```
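The ZeRO behaviour itself is controlled by the file passed to --deepspeed_config. As a rough, illustrative sketch only (the models/deepspeed_config.json shipped with the repository may differ, and the values below are placeholders rather than recommendations), a ZeRO stage 2 configuration with fp16 might look like:

```
{
  "train_micro_batch_size_per_gpu": 16,
  "gradient_accumulation_steps": 1,
  "fp16": {
    "enabled": true
  },
  "zero_optimization": {
    "stage": 2
  }
}
```

Setting "stage" to 3 additionally partitions the model parameters; in either case DeepSpeed saves sharded checkpoints, which is why the consolidation step below is needed.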

Before stage 2, we extract fp32 consolidated weights from the ZeRO stage 2 and 3 DeepSpeed checkpoints:
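As a hedged sketch of that step, assuming DeepSpeed's standard zero_to_fp32.py helper (which DeepSpeed saves next to its sharded checkpoints) and the --output_model_path directory used above; the README's actual extraction command may differ:

```
# Consolidate the sharded ZeRO checkpoint into a single fp32 weights file.
# Paths are assumptions derived from --output_model_path above.
python3 models/cluecorpussmall_wwm_roberta_xlarge_seq128_model/zero_to_fp32.py \
        models/cluecorpussmall_wwm_roberta_xlarge_seq128_model \
        models/cluecorpussmall_wwm_roberta_xlarge_seq128_model.bin
```

The resulting models/cluecorpussmall_wwm_roberta_xlarge_seq128_model.bin is what the stage 2 command below loads through --pretrained_model_path.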
@@ -124,15 +124,15 @@ python3 preprocess.py --corpus_path corpora/cluecorpussmall.txt \
```
deepspeed pretrain.py --deepspeed --deepspeed_config models/deepspeed_config.json --dataset_path cluecorpussmall_seq512_dataset.pt \
                      --vocab_path models/google_zh_vocab.txt \
                      --config_path models/bert/xlarge_config.json \
                      --pretrained_model_path models/cluecorpussmall_wwm_roberta_xlarge_seq128_model.bin \
                      --output_model_path models/cluecorpussmall_wwm_roberta_xlarge_seq512_model \
                      --world_size 8 --gpu_ranks 0 1 2 3 4 5 6 7 \
                      --total_steps 250000 --save_checkpoint_steps 50000 --report_steps 500 \
                      --learning_rate 5e-5 --batch_size 32 \
                      --whole_word_masking --deepspeed_checkpoint_activations \
                      --data_processor mlm --target mlm
```

Then, we extract fp32 consolidated weights from the ZeRO stage 2 and 3 DeepSpeed checkpoints:
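Again only a sketch, mirroring the step after stage 1 and assuming the same zero_to_fp32.py helper and directory layout:

```
# Consolidate the stage 2 (seq512) sharded checkpoint; paths are assumptions.
python3 models/cluecorpussmall_wwm_roberta_xlarge_seq512_model/zero_to_fp32.py \
        models/cluecorpussmall_wwm_roberta_xlarge_seq512_model \
        models/cluecorpussmall_wwm_roberta_xlarge_seq512_model.bin
```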