	---
	language:
	- en
	license: apache-2.0
	tags:
	- dialogue policy
	- task-oriented dialog

	---

	# lava-policy-multiwoz20

	This is the best-performing LAVA_kl model from the [LAVA paper](https://aclanthology.org/2020.coling-main.41/). It can be used as a word-level policy module in a ConvLab-3 pipeline.

	Refer to [ConvLab-3](https://github.com/ConvLab/ConvLab-3) for model description and usage.

	## Training procedure
	The model was trained on MultiWOZ 2.0 data using the [LAVA codebase](https://gitlab.cs.uni-duesseldorf.de/general/dsml/lava-public). Training proceeded in stages: VAE pre-training, then fine-tuning with an informative-prior KL loss, and finally corpus-based reinforcement learning (RL) with REINFORCE.

	### Training hyperparameters

	The following hyperparameters were used during supervised learning (SL) training:
	- y_size: 10
	- k_size: 20
	- beta: 0.1
	- simple_posterior: true
	- contextual_posterior: false
	- learning_rate: 1e-03
	- max_vocab_size: 1000
	- max_utt_len: 50
	- max_dec_len: 30
	- backward_size: 2
	- train_batch_size: 128
	- seed: 58
	- optimizer: Adam
	- num_epoch: 100 (with early stopping based on the validation set)
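
To make the role of `y_size`, `k_size`, and `beta` more concrete, here is a minimal sketch of a beta-weighted KL term over categorical latent variables. It is not taken from the LAVA codebase. It assumes that `y_size` is the number of categorical latent variables, that `k_size` is the number of classes per variable, and that `beta` scales the KL term added to the response negative log-likelihood. The function names (`categorical_kl`, `total_kl`, `sl_loss`) and the example distributions are hypothetical.

```python
import math

Y_SIZE = 10   # y_size from the card: assumed number of categorical latent variables
K_SIZE = 20   # k_size from the card: assumed number of classes per latent variable
BETA = 0.1    # beta from the card: assumed weight on the KL term


def categorical_kl(q, p, eps=1e-12):
    """KL(q || p) for two categorical distributions given as probability lists."""
    return sum(qi * (math.log(qi + eps) - math.log(pi + eps)) for qi, pi in zip(q, p))


def total_kl(posterior, prior):
    """Sum the per-variable KL over all latent variables."""
    return sum(categorical_kl(q, p) for q, p in zip(posterior, prior))


def sl_loss(nll, posterior, prior, beta=BETA):
    """Hypothetical SL objective: response NLL plus a beta-weighted KL term."""
    return nll + beta * total_kl(posterior, prior)


# Illustrative distributions only: a uniform one and a peaked one.
uniform = [[1.0 / K_SIZE] * K_SIZE for _ in range(Y_SIZE)]
peaked = [[0.81] + [0.01] * (K_SIZE - 1) for _ in range(Y_SIZE)]

if __name__ == "__main__":
    print("KL(peaked || uniform):", total_kl(peaked, uniform))
    print("loss with nll=2.0:", sl_loss(2.0, peaked, uniform))
```

With identical distributions the KL term vanishes and the loss reduces to the NLL. Which distribution acts as the informative prior, and the direction of the KL, are defined by the LAVA paper and codebase rather than by this sketch.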

	The following hyperparameters were used during reinforcement learning (RL) training:
	- tune_pi_only: false
	- max_words: 100
	- temperature: 1.0
	- episode_repeat: 1.0
	- rl_lr: 0.01
	- momentum: 0.0
	- nesterov: false
	- gamma: 0.99
	- rl_clip: 5.0
	- random_seed: 38
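
The sketch below illustrates how `gamma`, `rl_clip`, and `rl_lr` typically enter a REINFORCE update. It is a plain-Python illustration under stated assumptions, not the LAVA implementation. It assumes that `gamma` discounts per-turn rewards, that `rl_clip` is a maximum gradient norm, and that `rl_lr` with `momentum: 0.0` and `nesterov: false` corresponds to a plain SGD step. All function names are hypothetical.

```python
import math

GAMMA = 0.99    # gamma from the card: discount factor
RL_CLIP = 5.0   # rl_clip from the card: assumed to be a maximum gradient norm
RL_LR = 0.01    # rl_lr from the card: assumed plain SGD step size (momentum 0.0)


def discounted_returns(rewards, gamma=GAMMA):
    """Return G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns = []
    running = 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def reinforce_loss(log_probs, rewards, gamma=GAMMA):
    """REINFORCE surrogate loss: -sum_t log pi(a_t | s_t) * G_t."""
    returns = discounted_returns(rewards, gamma)
    return -sum(lp * g for lp, g in zip(log_probs, returns))


def clip_by_norm(grads, max_norm=RL_CLIP):
    """Rescale the gradient vector if its L2 norm exceeds max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return list(grads)
    scale = max_norm / norm
    return [g * scale for g in grads]


def sgd_step(params, grads, lr=RL_LR):
    """One SGD step without momentum, applied to norm-clipped gradients."""
    return [p - lr * g for p, g in zip(params, clip_by_norm(grads))]


if __name__ == "__main__":
    # Illustrative episode: a reward arrives only on the final turn.
    print(discounted_returns([0.0, 0.0, 1.0]))
    print(sgd_step([1.0, 1.0], [6.0, 8.0]))
```

A reward given only at the end of a dialogue is propagated back to earlier turns through the discount factor, so earlier actions receive slightly smaller credit than later ones.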