applied sciences Review A Review of Deep Learning Based Speech Synthesis Yishuang Ning 1,2,3,4, Sheng He 1,2,3,4, Zhiyong Wu 5,6,*, Chunxiao Xing 1,2 and Liang-Jie Zhang 3,4 1 Research Institute of Information Technology Beijing National Research Center for Information Science and Technology, Tsinghua University, Beijing 100084, China 2 Department of Computer Science and Technology Institute of Internet Industry, Tsinghua University, Beijing 100084, China 3 National Engineering Research Center for Supporting Software of Enterprise Internet Services, Shenzhen 518057, China 4 Kingdee Research, Kingdee International Software Group Company Limited, Shenzhen 518057, China 5 Tsinghua-CUHK Joint Research Center for Media Sciences, Technologies and Systems, Shenzhen Key Laboratory of Information Science and Technology, Graduate School at Shenzhen, Tsinghua University, Shenzhen 518055, China 6 Tsinghua National Laboratory for Information Science and Technology (TNList), Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China * Correspondence:
[email protected] Received: 1 August 2019; Accepted: 20 September 2019; Published: 27 September 2019 Abstract: Speech synthesis, also known as text-to-speech (TTS), has attracted increasingly more attention. Recent advances on speech synthesis are overwhelmingly contributed by deep learning or even end-to-end techniques which have been utilized to enhance a wide range of application scenarios such as intelligent speech interaction, chatbot or conversational artificial