Update README.md
--- a/README.md
+++ b/README.md
@@ -12,7 +12,7 @@ widget:
 
 ## DeBERTa: Decoding-enhanced BERT with Disentangled Attention
 
-[DeBERTa](https://arxiv.org/abs/2006.03654) improves the BERT and RoBERTa models using disentangled attention and enhanced mask decoder.
+[DeBERTa](https://arxiv.org/abs/2006.03654) improves the BERT and RoBERTa models using disentangled attention and enhanced mask decoder. It outperforms BERT and RoBERTa on majority of NLU tasks with 80GB training data.
 
 Please check the [official repository](https://github.com/microsoft/DeBERTa) for more details and updates.
 
@@ -40,8 +40,8 @@ We present the dev results on SQuAD 1.1/2.0 and several GLUE benchmark tasks.
 ```bash
 cd transformers/examples/text-classification/
 export TASK_NAME=mrpc
-python -m torch.distributed.launch --nproc_per_node=8 run_glue.py --model_name_or_path microsoft/deberta-v2-xxlarge
---task_name $TASK_NAME --do_train --do_eval --max_seq_length 128 --per_device_train_batch_size 4
+python -m torch.distributed.launch --nproc_per_node=8 run_glue.py --model_name_or_path microsoft/deberta-v2-xxlarge \\
+--task_name $TASK_NAME --do_train --do_eval --max_seq_length 128 --per_device_train_batch_size 4 \\
 --learning_rate 3e-6 --num_train_epochs 3 --output_dir /tmp/$TASK_NAME/ --overwrite_output_dir --sharded_ddp --fp16
 ```
 
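
The added lines in the second hunk end in `\\`. In a POSIX shell, `\\` followed by a newline is an escaped backslash and then an ordinary newline, so it does not join the lines; a single trailing `\` does. The sketch below assumes the change is meant to add line continuation. It uses `set --` (a stand-in chosen for illustration, not part of the README) to collect the arguments without launching training, so the joined command can be inspected. The flags and values are copied from the diff; the quoting around `$TASK_NAME` is an addition.

```shell
export TASK_NAME=mrpc

# Assumption: the three physical lines are meant to form one command.
# A single trailing backslash continues the line. `set --` only stores the
# words as positional parameters; it does not run python or run_glue.py.
set -- python -m torch.distributed.launch --nproc_per_node=8 run_glue.py --model_name_or_path microsoft/deberta-v2-xxlarge \
  --task_name "$TASK_NAME" --do_train --do_eval --max_seq_length 128 --per_device_train_batch_size 4 \
  --learning_rate 3e-6 --num_train_epochs 3 --output_dir "/tmp/$TASK_NAME/" --overwrite_output_dir --sharded_ddp --fp16

# Record the argument count, then print one argument per line for inspection.
argc=$#
printf '%s\n' "$@"
```

If the printed list contains no lone `\` entry and ends with `--fp16`, the three lines were joined as intended. To launch the run itself, drop `set --` so the command starts at `python`, and run it from `transformers/examples/text-classification/` as the README describes.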