Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence (AAAI-17)

SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient

Lantao Yu,† Weinan Zhang,†∗ Jun Wang,‡ Yong Yu†
†Shanghai Jiao Tong University, ‡University College London
{yulantao,wnzhang,yyu}@apex.sjtu.edu.cn, [email protected]
Abstract

As a new way of training generative models, Generative Adversarial Net (GAN), which uses a discriminative model to guide the training of the generative model, has enjoyed considerable success in generating real-valued data. However, it has limitations when the goal is to generate sequences of discrete tokens. A major reason is that the discrete outputs from the generative model make it difficult to pass the gradient update from the discriminative model to the generative model. Also, the discriminative model can only assess a complete sequence, while for a partially generated sequence, it is non-trivial to balance its current score and the future one once the entire sequence has been generated.

... of sequence increases. To address this problem, Bengio et al. (2015) proposed a training strategy called scheduled sampling (SS), where the generative model is partially fed with its own synthetic data as the prefix (observed tokens) rather than the true data when deciding the next token during training. Nevertheless, Huszár (2015) showed that SS is an inconsistent training strategy and fails to address the problem fundamentally. Another possible solution to the training/inference discrepancy problem is to build the loss function on the entire generated sequence instead of on each transition. For instance, in the application of machine translation, a task-specific sequence score/loss, bilingual evaluation understudy (BLEU), can be adopted to guide the sequence generation.
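To make the first limitation concrete, the following minimal PyTorch sketch (a toy linear "generator" head and "discriminator", with hypothetical shapes) illustrates how sampling a discrete token severs the computation graph, so no gradient flows from the discriminator back to the generator. This is only an illustration of the problem, not the paper's method.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab_size, embed_dim = 10, 8
generator = nn.Linear(embed_dim, vocab_size)   # toy generator output head
discriminator = nn.Linear(vocab_size, 1)       # toy discriminator

state = torch.randn(1, embed_dim)
logits = generator(state)                      # differentiable up to here
probs = torch.softmax(logits, dim=-1)
token = torch.multinomial(probs, num_samples=1)  # discrete sample: an integer index

# Re-encoding the sampled integer as a one-hot vector severs the graph,
# so the discriminator's gradient can never reach the generator.
one_hot = F.one_hot(token.squeeze(1), vocab_size).float()
score = discriminator(one_hot)
score.sum().backward()
print(generator.weight.grad)  # None: no gradient flows back through sampling
```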
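Scheduled sampling as described above can be sketched as follows. This is a minimal Python sketch assuming a hypothetical `step_fn` that maps the previous token to the model's next prediction; the teacher-forcing probability `epsilon` is typically annealed from 1.0 toward 0.0 over training.

```python
import random

def scheduled_sampling_unroll(step_fn, true_tokens, epsilon):
    """Unroll one sequence; step_fn maps the previous token to a prediction."""
    prev = true_tokens[0]          # start from the true first token
    predictions = []
    for truth in true_tokens[1:]:
        pred = step_fn(prev)
        predictions.append(pred)
        # With probability epsilon feed the true token (teacher forcing);
        # otherwise feed the model's own synthetic token as the next prefix.
        prev = truth if random.random() < epsilon else pred
    return predictions

# A common decay schedule (one of several proposed) is linear:
# epsilon = max(0.0, 1.0 - step / total_steps)
```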
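As an example of such a sequence-level score, BLEU compares an entire generated sentence against a reference rather than scoring each transition. A minimal sketch using NLTK's `sentence_bleu` with toy token lists:

```python
from nltk.translate.bleu_score import sentence_bleu

reference = [["the", "cat", "sat", "on", "the", "mat"]]  # list of references
candidate = ["the", "cat", "sat", "on", "a", "mat"]      # generated sequence

# BLEU is defined over the whole sequence, in [0, 1]; higher is better.
score = sentence_bleu(reference, candidate)
print(f"BLEU = {score:.3f}")
```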