## 简介 Brief Introduction

The core idea of UniMC is to convert natural language understanding tasks into multiple-choice tasks and to use multiple NLU tasks for pre-training. Our experiments on English datasets show that the zero-shot performance of an [ALBERT model](https://huggingface.co/IDEA-CCNL/Erlangshen-UniMC-Albert-235M-English) with only 235 million parameters can surpass that of many models with hundreds of billions of parameters. On the Chinese evaluation benchmarks FewCLUE and ZeroCLUE, the 1.3B-parameter [Erlangshen model](https://huggingface.co/IDEA-CCNL/Erlangshen-UniMC-MegatronBERT-1.3B-Chinese) took first place on both leaderboards.
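To make the conversion concrete, here is a minimal sketch. The `to_multiple_choice` helper and the `texts`/`choice`/`label` field names are illustrative assumptions, not the library's actual preprocessing: an ordinary classification example becomes a passage plus a list of candidate answers, so one model interface can cover classification, NLI, paraphrase detection, and similar NLU tasks.

```python3
# Illustrative only: recast a sentiment-classification example as multiple choice.
# The exact schema used by UniMC/fengshen may differ; see the Usage section below.
def to_multiple_choice(text, candidate_labels, gold=None):
    """Wrap one NLU example as a multiple-choice instance."""
    return {
        "texts": [text],             # input passage(s)
        "choice": candidate_labels,  # answer options for the model to score
        "label": gold,               # index of the correct option, if known
    }

example = to_multiple_choice("这家餐厅的服务太棒了!", ["负面", "正面"], gold=1)
print(example)
```
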
## 模型分类 Model Taxonomy
### 下游效果 Performance
**Few-shot**
| Model | eprstmt | csldcp | tnews | iflytek | ocnli | bustm | chid | csl | wsc | Avg |
|------------|------------|----------|-----------|----------|-----------|-----------|-----------|----------|-----------|-----------|
| [FineTuning](https://arxiv.org/pdf/2107.07498.pdf)-RoBERTa-110M | 65.4 | 35.5 | 49 | 32.8 | 33 | 60.7 | 14.9 | 50 | 55.6 | 44.1 |
| [FineTuning](https://arxiv.org/pdf/2107.07498.pdf)-ERNIE1.0-110M | 66.5 | 57 | 51.6 | 42.1 | 32 | 60.4 | 15 | 60.1 | 50.3 | 48.34 |
| [PET](https://arxiv.org/pdf/2107.07498.pdf)-ERNIE1.0-110M | 84 | 59.9 | 56.4 | 50.3 | 38.1 | 58.4 | 40.6 | 61.1 | 58.7 | 56.39 |
| [P-tuning](https://arxiv.org/pdf/2107.07498.pdf)-ERNIE1.0-110M | 80.6 | 56.6 | 55.9 | 52.6 | 35.7 | 60.8 | 39.61 | 51.8 | 55.7 | 54.37 |
| [EFL](https://arxiv.org/pdf/2107.07498.pdf)-ERNIE1.0-110M | 76.7 | 47.9 | 56.3 | 52.1 | 48.7 | 54.6 | 30.3 | 52.8 | 52.3 | 52.7 |
| [UniMC-RoBERTa-110M](https://huggingface.co/IDEA-CCNL/Erlangshen-UniMC-RoBERTa-110M-Chinese) | 88.64 | 54.08 | 54.32 | 48.6 | 66.55 | 73.76 | 67.71 | 52.54 | 59.92 | 62.86 |
| [UniMC-RoBERTa-330M](https://huggingface.co/IDEA-CCNL/Erlangshen-UniMC-RoBERTa-330M-Chinese) | 89.53 | 57.3 | 54.25 | 50 | 70.59 | 77.49 | 78.09 | 55.73 | 65.16 | 66.46 |
| [UniMC-MegatronBERT-1.3B](https://huggingface.co/IDEA-CCNL/Erlangshen-UniMC-MegatronBERT-1.3B-Chinese) | **89.278** | **60.9** | **57.46** | 52.89 | **76.33** | **80.37** | **90.33** | 61.73 | **79.15** | **72.05** |

**Zero-shot**

| Model | eprstmt | csldcp | tnews | iflytek | ocnli | bustm | chid | csl | wsc | Avg |
|---------------|-----------|-----------|-----------|-----------|-----------|----------|----------|----------|-----------|-----------|
| [GPT](https://arxiv.org/pdf/2107.07498.pdf)-110M | 57.5 | 26.2 | 37 | 19 | 34.4 | 50 | 65.6 | 50.1 | 50.3 | 43.4 |
| [PET](https://arxiv.org/pdf/2107.07498.pdf)-RoBERTa-110M | 85.2 | 12.6 | 26.1 | 26.6 | 40.3 | 50.6 | 57.6 | 52.2 | 54.7 | 45.1 |
| [NSP-BERT](https://arxiv.org/abs/2109.03564)-110M | 86.9 | 47.6 | 51 | 41.6 | 37.4 | 63.4 | 52 | **64.4** | 59.4 | 55.96 |
| [ZeroPrompt](https://arxiv.org/abs/2201.06910)-T5-1.5B | - | - | - | 16.14 | 46.16 | - | - | - | 47.98 | - |
| Yuan1.0-13B | 88.13 | 38.99 | 57.47 | 38.82 | 48.13 | 59.38 | 86.14 | 50 | 38.99 | 56.22 |
| ERNIE3.0-240B | 88.75 | **50.97** | **57.83** | **40.42** | 53.57 | 64.38 | 87.13 | 56.25 | 53.46 | 61.41 |
| [UniMC-RoBERTa-110M](https://huggingface.co/IDEA-CCNL/Erlangshen-UniMC-RoBERTa-110M-Chinese) | 86.16 | 31.26 | 46.61 | 26.54 | 66.91 | 73.34 | 66.68 | 50.09 | 53.66 | 55.7 |
| [UniMC-MegatronBERT-1.3B](https://huggingface.co/IDEA-CCNL/Erlangshen-UniMC-MegatronBERT-1.3B-Chinese) | **88.79** | 42.06 | 55.21 | 33.93 | **75.57** | **79.5** | **89.4** | 50.25 | **66.67** | **64.53** |
## 使用 Usage
```shell
git clone https://github.com/IDEA-CCNL/Fengshenbang-LM.git
cd Fengshenbang-LM
pip install --editable .
```

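If the editable install succeeded, the package should be importable (a quick sanity check; assumes `python` points at the environment you installed into):

```shell
python -c "import fengshen"
```
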
```python3
import argparse
from fengshen.pipelines.multiplechoice import UniMCPipelines
total_parser = argparse.ArgumentParser("TASK NAME")
# register the pipeline's command-line arguments on the parser
total_parser = UniMCPipelines.piplines_args(total_parser)
args = total_parser.parse_args()
args.pretrained_model_path = 'IDEA-CCNL/Erlangshen-UniMC-RoBERTa-110M-Chinese'
args.learning_rate=2e-5
args.max_length=512
args.max_epochs=3
args.batchsize=8
args.default_root_dir='./'
# build the pipeline from the parsed arguments
model = UniMCPipelines(args)
train_data = []
dev_data = []
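# --- Illustrative continuation (an assumption, not the library's documented
# API): populate train_data/dev_data with multiple-choice style items and run
# the pipeline. The field names and method calls below are hypothetical.
train_data.append({
    "texts": ["这家餐厅的服务太棒了!"],  # input passage(s)
    "choice": ["负面", "正面"],          # candidate answers
    "label": 1,                          # index of the gold answer
    "id": 0,
})
# model.train(train_data, dev_data)      # assumed training entry point
# result = model.predict(test_data)      # assumed inference entry point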