|
--- |
|
language: |
|
- en |
|
license: apache-2.0 |
|
tags: |
|
- dialogue policy |
|
- task-oriented dialog |
|
|
|
--- |
|
|
|
# lava-policy-multiwoz |
|
|
|
This is the best-performing LAVA_kl model from the [LAVA paper](https://aclanthology.org/2020.coling-main.41/). It can be used as a word-level policy module in the ConvLab-3 pipeline.
|
|
|
Refer to [ConvLab-3](https://github.com/ConvLab/ConvLab-3) for the full model description and usage instructions.
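
As a minimal sketch of how the checkpoint could be loaded: the download uses `huggingface_hub` (a real API), while the repo id is assumed from this card's name, and the commented ConvLab-3 import path and constructor are assumptions rather than a verified interface, so consult the repository above before use.

```python
# Sketch only: fetch the checkpoint from the Hub. snapshot_download is a
# real huggingface_hub API; the repo id is assumed from this card's name.
from huggingface_hub import snapshot_download

model_dir = snapshot_download(repo_id="ConvLab/lava-policy-multiwoz")

# Hypothetical ConvLab-3 usage; the module path, class name, and argument
# below are assumptions, so check the ConvLab-3 repository before use.
# from convlab.policy.lava.multiwoz import LAVA
# policy = LAVA(model_file=model_dir)
# system_response = policy.predict(dialog_state)
```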
|
|
|
## Training procedure |
|
The model was trained on MultiWOZ 2.0 data using the [LAVA codebase](https://gitlab.cs.uni-duesseldorf.de/general/dsml/lava-public). Training started with VAE pre-training, continued with fine-tuning under an informative-prior KL loss, and concluded with corpus-based RL using REINFORCE.
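
Schematically, and in our own notation rather than the paper's (treat this as a reading aid, not an exact reproduction), the KL fine-tuning stage optimizes a response-generation ELBO whose KL term pulls the encoder toward the informative prior obtained from VAE pre-training:

$$
\mathcal{L}(\theta, \phi) = \mathbb{E}_{q_\phi(z \mid c)}\big[\log p_\theta(x \mid z, c)\big] - \beta \, \mathrm{KL}\big(q_\phi(z \mid c) \,\|\, p(z)\big)
$$

where `c` is the dialogue context, `x` the system response, `z` the discrete latent action, `p(z)` the informative prior from VAE pre-training, and `beta` the KL weight listed below (0.1).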
|
|
|
### Training hyperparameters |
|
|
|
The following hyperparameters were used during SL training (see the config sketch after the list):
|
- y_size: 10 |
|
- k_size: 20 |
|
- beta: 0.1 |
|
- simple_posterior: true |
|
- contextual_posterior: false |
|
- learning_rate: 1e-03 |
|
- max_vocab_size: 1000 |
|
- max_utt_len: 50 |
|
- max_dec_len: 30 |
|
- backward_size: 2 |
|
- train_batch_size: 128 |
|
- seed: 58 |
|
- optimizer: Adam |
|
- num_epoch: 100, with early stopping based on the validation set
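
For convenience, the same settings collected as a plain Python dict. The key names simply mirror the list above; whether the LAVA codebase's config loader consumes exactly this structure is an assumption.

```python
# Illustrative only: SL hyperparameters from this card as a dict. Key
# names mirror the list above; the LAVA codebase may expect a different
# config structure.
sl_config = {
    "y_size": 10,                  # number of discrete latent variables
    "k_size": 20,                  # categories per latent variable
    "beta": 0.1,                   # KL weight in the fine-tuning loss
    "simple_posterior": True,      # posterior conditioned on context only
    "contextual_posterior": False,
    "learning_rate": 1e-3,
    "max_vocab_size": 1000,
    "max_utt_len": 50,
    "max_dec_len": 30,
    "backward_size": 2,
    "train_batch_size": 128,
    "seed": 58,
    "optimizer": "adam",
    "num_epoch": 100,              # with early stopping on validation
}
```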
|
|
|
The following hyperparameters were used during RL training (see the REINFORCE sketch after the list):
|
- tune_pi_only: false |
|
- max_words: 100 |
|
- temperature: 1.0 |
|
- episode_repeat: 1.0 |
|
- rl_lr: 0.01 |
|
- momentum: 0.0 |
|
- nesterov: false |
|
- gamma: 0.99 |
|
- rl_clip: 5.0 |
|
- random_seed: 38 |
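
To make the roles of `gamma`, `rl_lr`, and `rl_clip` concrete, here is a minimal, self-contained REINFORCE update using those values. The tiny linear policy and the dummy per-turn rewards are stand-ins for illustration, not the LAVA model or its corpus-based reward.

```python
# Minimal REINFORCE sketch with the RL settings above (gamma=0.99,
# rl_lr=0.01, momentum=0.0, nesterov=False, rl_clip=5.0). The policy
# network and rewards are stand-ins, not the actual LAVA model.
import torch

gamma, rl_lr, rl_clip = 0.99, 0.01, 5.0

policy = torch.nn.Linear(8, 4)  # stand-in for the latent-action policy
opt = torch.optim.SGD(policy.parameters(), lr=rl_lr, momentum=0.0)

def reinforce_step(states, rewards):
    """One episode: states of shape (T, 8), per-turn rewards of shape (T,)."""
    dist = torch.distributions.Categorical(logits=policy(states))
    actions = dist.sample()
    # gamma-discounted returns, accumulated backwards over the episode
    returns, g = [], 0.0
    for r in reversed(rewards.tolist()):
        g = r + gamma * g
        returns.append(g)
    returns = torch.tensor(list(reversed(returns)))
    # REINFORCE: raise log-probability of actions in proportion to return
    loss = -(dist.log_prob(actions) * returns).sum()
    opt.zero_grad()
    loss.backward()
    # rl_clip bounds the gradient norm, matching the setting above
    torch.nn.utils.clip_grad_norm_(policy.parameters(), rl_clip)
    opt.step()

# Dummy episode: 6 turns, sparse success reward at the end.
reinforce_step(torch.randn(6, 8), torch.tensor([0.0, 0.0, 0.0, 0.0, 0.0, 1.0]))
```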
|
|
|
|