# BERTNLU

On top of the pre-trained BERT, BERTNLU uses an MLP for slot tagging and another MLP for intent classification. All parameters are fine-tuned to learn these two tasks jointly.

Dialogue acts are split into two groups, depending on whether their values appear in the utterances:

- For dialogue acts whose values are in the utterances, we use **slot tagging** to extract the values. For example, for `"Find me a cheap hotel"`, the dialogue act is `{intent=Inform, domain=hotel, slot=price, value=cheap}`, and the corresponding BIO tag sequence is `["O", "O", "O", "B-inform-hotel-price", "O"]`. An MLP classifier takes a token's representation from BERT and outputs its tag.
- For dialogue acts whose values may not be present in the utterances, we treat them as **intents** of the utterances. Another MLP takes the `[CLS]` embedding of an utterance as input and performs binary classification for each intent independently. Since some intents are rare, we empirically set the weight of positive samples for each intent to $\lg\left(\frac{\text{num\_negative\_samples}}{\text{num\_positive\_samples}}\right)$.

The model can also incorporate context information by setting `context=true` in the config file. The context utterances are concatenated (separated by `[SEP]`) and fed into BERT. The resulting `[CLS]` embedding serves as the context representation and is concatenated to every token representation of the target utterance right before the slot and intent classifiers.

## Usage

Follow the instructions under each dataset's directory to prepare the data and the model config file for training and evaluation.

#### Train a model

```sh
$ python train.py --config_path path_to_a_config_file
```

The model (`pytorch_model.bin`) will be saved under the `output_dir` of the config file.

#### Test a model

```sh
$ python test.py --config_path path_to_a_config_file
```

The result (`output.json`) will be saved under the `output_dir` of the config file. Also, it will be zipped as `zipped_model_path` in the config file.
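The slot-tagging scheme described at the top of this README turns each dialogue act whose value occurs in the utterance into a `B-`/`I-` tag span. Below is a minimal sketch of that conversion. It assumes whitespace tokens (real BERT input would additionally need WordPiece alignment), and the `bio_tags` helper and `(intent, domain, slot, value)` tuple layout are hypothetical, not this repository's code.

```python
from typing import List, Tuple

# Hypothetical layout of one dialogue act: (intent, domain, slot, value)
DialogAct = Tuple[str, str, str, str]


def bio_tags(tokens: List[str], acts: List[DialogAct]) -> List[str]:
    """Tag each token with O, or with B-/I-<intent>-<domain>-<slot> inside a value span."""
    tags = ["O"] * len(tokens)
    lowered = [t.lower() for t in tokens]
    for intent, domain, slot, value in acts:
        value_tokens = value.lower().split()
        n = len(value_tokens)
        if n == 0:
            continue  # no value to locate in the utterance
        label = f"{intent.lower()}-{domain}-{slot}"
        # Tag the first still-untagged contiguous occurrence of the value.
        for start in range(len(tokens) - n + 1):
            span = slice(start, start + n)
            if lowered[span] == value_tokens and all(t == "O" for t in tags[span]):
                tags[start] = "B-" + label
                for i in range(start + 1, start + n):
                    tags[i] = "I-" + label
                break
    return tags


print(bio_tags("Find me a cheap hotel".split(), [("Inform", "hotel", "price", "cheap")]))
# ['O', 'O', 'O', 'B-inform-hotel-price', 'O']
```

Multi-token values such as `"city centre"` get one `B-` tag followed by `I-` tags, which is what lets a per-token classifier recover spans.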
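The positive-sample weight for intents, $\lg\left(\frac{\text{num\_negative\_samples}}{\text{num\_positive\_samples}}\right)$, can be computed from the training labels as sketched below. This reads $\lg$ as the base-10 logarithm (an assumption about the notation). The `intent_pos_weights` helper and the neutral fallback of `1.0` for intents with no positives or no negatives are hypothetical choices, not taken from this repository.

```python
import math
from collections import Counter
from typing import Dict, List, Set


def intent_pos_weights(labels: List[Set[str]], intents: List[str]) -> Dict[str, float]:
    """Return lg(num_negative / num_positive) for each intent.

    labels[i] is the set of intents present in training utterance i.
    """
    total = len(labels)
    # Number of utterances in which each intent is a positive example.
    positives = Counter(intent for utt in labels for intent in utt)
    weights: Dict[str, float] = {}
    for intent in intents:
        pos = positives[intent]
        neg = total - pos
        if pos == 0 or neg == 0:
            # Hypothetical fallback: the ratio is undefined or zero, so use a neutral weight.
            weights[intent] = 1.0
        else:
            weights[intent] = math.log10(neg / pos)
    return weights


labels = [{"inform-hotel-price"}] + [{"greet"}] * 50 + [set()] * 49
print(intent_pos_weights(labels, ["inform-hotel-price", "greet", "bye"]))
```

Here the rare intent (1 positive out of 100) gets a weight of about 2.0, while `greet` (50 of 100) gets 0.0. The formula drops below 1 once an intent appears in more than about 9% of utterances. In PyTorch, such per-intent weights can be passed as the `pos_weight` tensor of `torch.nn.BCEWithLogitsLoss`, which scales the loss of positive targets.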
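The `context=true` option described above involves two steps: building a `[SEP]`-separated context input for BERT, then appending the resulting context `[CLS]` vector to each token vector of the target utterance. The sketch below shows both on plain Python lists. It assumes whitespace tokenization in place of WordPiece, uses short lists as stand-ins for BERT hidden states, and `context_tokens`, `add_context` and `max_turns` are hypothetical names.

```python
from typing import List


def context_tokens(context: List[str], max_turns: int) -> List[str]:
    """Concatenate the last `max_turns` context utterances, each followed by [SEP]."""
    recent = context[-max_turns:] if max_turns > 0 else []
    tokens = ["[CLS]"]
    for utt in recent:
        tokens.extend(utt.split())  # whitespace split stands in for WordPiece
        tokens.append("[SEP]")
    return tokens


def add_context(token_vecs: List[List[float]], context_cls: List[float]) -> List[List[float]]:
    """Concatenate the context [CLS] vector onto every token vector of the target utterance."""
    return [vec + context_cls for vec in token_vecs]


ctx = ["I need a hotel", "What price range?", "Something cheap"]
print(context_tokens(ctx, max_turns=2))
# ['[CLS]', 'What', 'price', 'range?', '[SEP]', 'Something', 'cheap', '[SEP]']
print(add_context([[0.1, 0.2], [0.3, 0.4]], [0.9, 0.8]))
# [[0.1, 0.2, 0.9, 0.8], [0.3, 0.4, 0.9, 0.8]]
```

Because the context vector is concatenated rather than added, the slot and intent classifiers see inputs of twice the BERT hidden size when context is enabled.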

## Performance on unified format datasets

To illustrate that the model is easy to apply to any dataset in our unified format, we report its performance on several such datasets. We follow the `README.md` and config files in `unified_datasets/` to generate `predictions.json`, then evaluate it with `../evaluate_unified_datasets.py`. Note that we use almost the same hyper-parameters for all datasets, so they may not be optimal for each one.

| Model | MultiWOZ 2.1 Acc | MultiWOZ 2.1 F1 | MultiWOZ 2.1 (all utterances) Acc | MultiWOZ 2.1 (all utterances) F1 | Taskmaster-1 Acc | Taskmaster-1 F1 | Taskmaster-2 Acc | Taskmaster-2 F1 | Taskmaster-3 Acc | Taskmaster-3 F1 |
|---|---|---|---|---|---|---|---|---|---|---|
| BERTNLU | 74.5 | 85.9 | 59.5 | 80.0 | 72.8 | 50.6 | 79.2 | 70.6 | 86.1 | 81.9 |
| BERTNLU (context=3) | 80.6 | 90.3 | 58.1 | 79.6 | 74.2 | 52.7 | 80.9 | 73.3 | 87.8 | 83.8 |
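The Acc and F1 columns above are produced by `../evaluate_unified_datasets.py`. As a rough guide to reading them, here is a minimal sketch that assumes Acc is the percentage of utterances whose predicted dialogue-act set exactly matches the gold set, and F1 is micro-averaged over individual acts. These definitions are assumptions, not taken from that script. The `nlu_scores` helper and the `(intent, domain, slot, value)` tuple layout are also hypothetical.

```python
from typing import Iterable, List, Tuple

Act = Tuple[str, str, str, str]  # hypothetical layout: (intent, domain, slot, value)


def nlu_scores(predictions: List[Iterable[Act]], golds: List[Iterable[Act]]) -> Tuple[float, float]:
    """Return (accuracy, micro F1), both in percent, over per-utterance dialogue-act sets."""
    exact = tp = fp = fn = 0
    for pred, gold in zip(predictions, golds):
        # Compare case-insensitively as sets of acts.
        p = {tuple(x.lower() for x in act) for act in pred}
        g = {tuple(x.lower() for x in act) for act in gold}
        exact += int(p == g)
        tp += len(p & g)
        fp += len(p - g)
        fn += len(g - p)
    acc = 100 * exact / len(golds) if golds else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 100 * 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return acc, f1


preds = [[("Inform", "hotel", "price", "cheap")], [("Request", "hotel", "area", "")], []]
golds = [[("Inform", "hotel", "price", "cheap")],
         [("Request", "hotel", "area", ""), ("Inform", "hotel", "stars", "4")], []]
print(nlu_scores(preds, golds))
```

Under these definitions, missing one act in one utterance costs a whole utterance of accuracy but only one act of recall. This is why accuracy can trail F1 by a wide margin, as in most columns above.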