# BERTNLU

On top of the pre-trained BERT, BERTNLU uses an MLP for slot tagging and another MLP for intent classification. All parameters are fine-tuned to learn these two tasks jointly.

Dialog acts are split into two groups, depending on whether their values appear in the utterance:

- For dialogue acts whose values are in the utterance, we use **slot tagging** to extract the values. For example, the utterance `"Find me a cheap hotel"` has the dialog act `{intent=Inform, domain=hotel, slot=price, value=cheap}`, and the corresponding BIO tag sequence is `["O", "O", "O", "B-inform-hotel-price", "O"]`. An MLP classifier takes a token's representation from BERT and outputs its tag.
- For dialogue acts whose values may not be present in the utterance, we treat them as **intents** of the utterance. Another MLP takes the `[CLS]` embedding of the utterance as input and performs binary classification for each intent independently. Since some intents are rare, we empirically set the weight of positive samples to $\lg(\frac{num\_negative\_samples}{num\_positive\_samples})$ for each intent (a sketch of both heads follows this list).
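
The following is a minimal, illustrative PyTorch sketch of the two heads described above, not the training code in this repository; the class name `BERTNLUSketch` and the tag/intent counts are placeholders.

```python
import math

from torch import nn
from transformers import BertModel


class BERTNLUSketch(nn.Module):
    """Two MLP heads on top of BERT: per-token slot tags and per-utterance intents."""

    def __init__(self, pretrained="bert-base-uncased", num_tags=10, num_intents=20):
        super().__init__()
        self.bert = BertModel.from_pretrained(pretrained)
        hidden = self.bert.config.hidden_size
        self.slot_classifier = nn.Linear(hidden, num_tags)        # BIO tag per token
        self.intent_classifier = nn.Linear(hidden, num_intents)   # one logit per intent

    def forward(self, input_ids, attention_mask):
        outputs = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        token_repr = outputs.last_hidden_state       # (batch, seq_len, hidden)
        cls_repr = token_repr[:, 0]                  # [CLS] embedding of the utterance
        slot_logits = self.slot_classifier(token_repr)    # slot tagging
        intent_logits = self.intent_classifier(cls_repr)  # multi-label intent classification
        return slot_logits, intent_logits


def intent_pos_weight(num_negative, num_positive):
    """Per-intent positive-sample weight, lg(#negative / #positive) as in the formula above."""
    return math.log10(num_negative / num_positive)
```

In practice the slot head would be trained with a token-level cross-entropy loss over the BIO tags, and the intent head with a per-intent binary cross-entropy loss (e.g. `nn.BCEWithLogitsLoss`) using these positive weights.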

The model can also incorporate context information by setting `context=true` in the config file. The context utterances are concatenated (separated by `[SEP]`) and fed into BERT. The `[CLS]` embedding then serves as the context representation and is concatenated to all token representations in the target utterance, right before the slot and intent classifiers.
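
As a rough illustration of this concatenation step (the function name and shapes here are hypothetical, not the repository's code):

```python
import torch


def add_context(token_repr, context_cls_repr):
    """Append the context [CLS] vector to every token representation of the target utterance."""
    # token_repr: (batch, seq_len, hidden); context_cls_repr: (batch, hidden)
    seq_len = token_repr.size(1)
    expanded = context_cls_repr.unsqueeze(1).expand(-1, seq_len, -1)
    return torch.cat([token_repr, expanded], dim=-1)  # (batch, seq_len, 2 * hidden)
```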

## Usage

Follow the instructions under each dataset's directory to prepare the data and the model config file for training and evaluation.

#### Train a model

```sh
$ python train.py --config_path path_to_a_config_file
```

The model (`pytorch_model.bin`) will be saved under the `output_dir` of the config file.

#### Test a model

```sh
$ python test.py --config_path path_to_a_config_file
```

The result (`output.json`) will be saved under the `output_dir` of the config file. It will also be zipped to the path given by `zipped_model_path` in the config file.
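
For orientation, the snippet below shows how the config keys mentioned in this README fit together. It is only a hypothetical illustration (the config format is assumed to be JSON here); the actual config files under each dataset's directory define the full set of options.

```python
import json

# Hypothetical illustration: only the keys mentioned in this README are read here.
with open("path_to_a_config_file") as f:
    config = json.load(f)

print(config["output_dir"])          # where pytorch_model.bin and output.json are saved
print(config["zipped_model_path"])   # where the zipped model is written after testing
print(config.get("context", False))  # whether context utterances are fed to BERT
```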

## Performance on unified format datasets

To illustrate that it is easy to use the model on any dataset in our unified format, we report its performance on several such datasets. We follow the `README.md` and config files in `unified_datasets/` to generate `predictions.json`, then evaluate it with `../evaluate_unified_datasets.py`. Note that we use almost the same hyper-parameters for all datasets, which may not be optimal.

<table>
<thead>
<tr>
<th></th>
<th colspan=2>MultiWOZ 2.1</th>
<th colspan=2>Taskmaster-1</th>
<th colspan=2>Taskmaster-2</th>
<th colspan=2>Taskmaster-3</th>
</tr>
<tr>
<th>Model</th>
<th>Acc</th><th>F1</th>
<th>Acc</th><th>F1</th>
<th>Acc</th><th>F1</th>
<th>Acc</th><th>F1</th>
</tr>
</thead>
<tbody>
<tr>
<td>BERTNLU</td>
<td>74.5</td><td>85.9</td>
<td>72.8</td><td>50.6</td>
<td>79.2</td><td>70.6</td>
<td>86.1</td><td>81.9</td>
</tr>
<tr>
<td>BERTNLU (context=3)</td>
<td>80.6</td><td>90.3</td>
<td>74.2</td><td>52.7</td>
<td>80.9</td><td>73.3</td>
<td>87.8</td><td>83.8</td>
</tr>
</tbody>
</table>

- Acc: whether all dialogue acts of an utterance are correctly predicted (exact match)
- F1: F1 measure of the dialogue act predictions over the corpus (a sketch of both metrics follows this list)
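
The sketch below illustrates how these two metrics can be computed, assuming each utterance's dialogue acts are represented as a set of `(intent, domain, slot, value)` tuples; it is not the code in `../evaluate_unified_datasets.py`.

```python
def evaluate(golds, preds):
    """golds / preds: one set of (intent, domain, slot, value) tuples per utterance."""
    # Acc: exact match of the full dialogue-act set for each utterance.
    acc = sum(g == p for g, p in zip(golds, preds)) / len(golds)
    # F1: micro-averaged over all gold / predicted dialogue acts in the corpus.
    tp = sum(len(g & p) for g, p in zip(golds, preds))
    fp = sum(len(p - g) for g, p in zip(golds, preds))
    fn = sum(len(g - p) for g, p in zip(golds, preds))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return acc, f1
```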

## References

```
@inproceedings{devlin2019bert,
  title={BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding},
  author={Devlin, Jacob and Chang, Ming-Wei and Lee, Kenton and Toutanova, Kristina},
  booktitle={Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)},
  pages={4171--4186},
  year={2019}
}

@inproceedings{zhu-etal-2020-convlab,
  title={{C}onv{L}ab-2: An Open-Source Toolkit for Building, Evaluating, and Diagnosing Dialogue Systems},
  author={Zhu, Qi and Zhang, Zheng and Fang, Yan and Li, Xiang and Takanobu, Ryuichi and Li, Jinchao and Peng, Baolin and Gao, Jianfeng and Zhu, Xiaoyan and Huang, Minlie},
  booktitle={Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations},
  month={jul},
  year={2020},
  address={Online},
  publisher={Association for Computational Linguistics},
  url={https://aclanthology.org/2020.acl-demos.19},
  doi={10.18653/v1/2020.acl-demos.19},
  pages={142--149}
}
```