2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2017)
Total Page:16
File Type:pdf, Size:1020Kb
2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2017) New Orleans, Louisiana, USA 5-9 March 2017 Pages 1-650 IEEE Catalog Number: CFP17ICA-POD ISBN: 978-1-5090-4118-3 1/10 Copyright © 2017 by the Institute of Electrical and Electronics Engineers, Inc All Rights Reserved Copyright and Reprint Permissions: Abstracting is permitted with credit to the source. Libraries are permitted to photocopy beyond the limit of U.S. copyright law for private use of patrons those articles in this volume that carry a code at the bottom of the first page, provided the per-copy fee indicated in the code is paid through Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923. For other copying, reprint or republication permission, write to IEEE Copyrights Manager, IEEE Service Center, 445 Hoes Lane, Piscataway, NJ 08854. All rights reserved. *** This is a print representation of what appears in the IEEE Digital Library. Some format issues inherent in the e-media version may also appear in this print version. IEEE Catalog Number: CFP17ICA-POD ISBN (Print-On-Demand): 978-1-5090-4118-3 ISBN (Online): 978-1-5090-4117-6 ISSN: 1520-6149 Additional Copies of This Publication Are Available From: Curran Associates, Inc 57 Morehouse Lane Red Hook, NY 12571 USA Phone: (845) 758-0400 Fax: (845) 758-2633 E-mail: [email protected] Web: www.proceedings.com TABLE OF CONTENTS AASP-L1: SOURCE SEPARATION AASP-L1.1: INFORMED SOURCE SEPARATION VIA COMPRESSIVE GRAPH SIGNAL .............................................. 1 SAMPLING Gilles Puy, Alexey Ozerov, Ngoc Q.K. Duong, Patrick Pérez, Technicolor, France AASP-L1.2: MOTION INFORMED AUDIO SOURCE SEPARATION ..................................................................................... 6 Sanjeel Parekh, Technicolor, France; Slim Essid, Télécom ParisTech, Universite Paris-Saclay, France; Alexey Ozerov, Ngoc Q.K. Duong, Patrick Perez, Technicolor, France; Gaël Richard, Télécom ParisTech, Universite Paris-Saclay, France AASP-L1.3: SUPERVISED MONAURAL SOURCE SEPARATION BASED ON ..................................................................11 AUTOENCODERS Keiichi Osako, Yuki Mitsufuji, Sony Corporation, Japan; Rita Singh, Bhiksha Raj, Carnegie Mellon University, United States AASP-L1.4: AN EM ALGORITHM FOR JOINT SOURCE SEPARATION AND DIARISATION ..................................... 16 OF MULTICHANNEL CONVOLUTIVE SPEECH MIXTURES Dionyssos Kounades-Bastian, Inria, France; Laurent Girin, Grenoble-INP, France; Xavier Alameda-Pineda, University of Trento, Italy; Sharon Gannot, Bar-Ilan University, Israel; Radu Horaud, Inria, France AASP-L1.5: BLIND SOURCE SEPARATION BASED ON INDEPENDENT LOW-RANK .................................................21 MATRIX ANALYSIS WITH SPARSE REGULARIZATION FOR TIME-SERIES ACTIVITY Yoshiki Mitsui, The University of Tokyo, Japan; Daichi Kitamura, SOKENDAI (The Graduate University for Advanced Studies), Japan; Shinnosuke Takamichi, The University of Tokyo, Japan; Nobutaka Ono, National Institute of Information, Japan; Hiroshi Saruwatari, The University of Tokyo, Japan AASP-L1.6: MULTICHANNEL AUDIO SOURCE SEPARATION: VARIATIONAL ........................................................... 26 INFERENCE OF TIME-FREQUENCY SOURCES FROM TIME-DOMAIN OBSERVATIONS Simon Leglaive, Roland Badeau, Gaël Richard, LTCI, Télécom ParisTech, Université Paris-Saclay, France AASP-L2: NON-NEGATIVE AUDIO MODELING AASP-L2.1: OVERLAPPING SOUND EVENT DETECTION WITH SUPERVISED .......................................................... 31 NONNEGATIVE MATRIX FACTORIZATION Victor Bisot, Slim Essid, Gaël Richard, Télécom ParisTech, France AASP-L2.2: SUPERVISED GROUP NONNEGATIVE MATRIX FACTORISATION WITH ..............................................36 SIMILARITY CONSTRAINTS AND APPLICATIONS TO SPEAKER IDENTIFICATION Romain Serizel, Université de Lorraine, LORIA, UMR 7503, France; Victor Bisot, Slim Essid, Gaël Richard, LTCI, CNRS, Télécom ParisTech, Université Paris - Saclay, France AASP-L2.3: TRACKING METRICAL STRUCTURE CHANGES WITH SPARSE-NMF .................................................... 41 Elio Quinton, Ken O’Hanlon, Simon Dixon, Mark B. Sandler, Queen Mary University of London, United Kingdom AASP-L2.4: DRUM EXTRACTION IN SINGLE CHANNEL AUDIO SIGNALS USING .................................................... 46 MULTI-LAYER NON NEGATIVE MATRIX FACTOR DECONVOLUTION Clement Laroche, Télécom ParisTech, France; Helene Papadopoulos, CNRS-LSS, France; Matthieu Kowalski, Université Paris Sud / LSS, France; Gaël Richard, Télécom ParisTech, France AASP-L2.5: INTERFERENCE REDUCTION IN MUSIC RECORDINGS COMBINING .................................................. 51 KERNEL ADDITIVE MODELLING AND NON-NEGATIVE MATRIX FACTORIZATION Delia Fano Yela, Sebastian Ewert, Queen Mary University of London, United Kingdom; Derry Fitzgerald, Cork Institute of Technology, Ireland; Mark B. Sandler, Queen Mary University of London, United Kingdom xxxiii AASP-L2.6: COMPLEX NMF WITH THE GENERALIZED KULLBACK-LEIBLER ....................................................... 56 DIVERGENCE Hirokazu Kameoka, NTT Corporation, Japan; Hideaki Kagami, Masahiro Yukawa, Keio University, Japan AASP-L3: DEEP LEARNING FOR SOURCE SEPARATION AND ENHANCEMENT I AASP-L3.1: DEEP CLUSTERING AND CONVENTIONAL NETWORKS FOR MUSIC ................................................... 61 SEPARATION: STRONGER TOGETHER Yi Luo, Zhuo Chen, Columbia University, United States; John R. Hershey, Jonathan Le Roux, Mitsubishi Electric Research Laboratories, United States; Nima Mesgarani, Columbia University, United States AASP-L3.2: DNN-BASED SPEECH MASK ESTIMATION FOR EIGENVECTOR ............................................................ 66 BEAMFORMING Lukas Pfeifenberger, Matthias Zöhrer, Franz Pernkopf, TU Graz, Austria AASP-L3.3: RECURRENT DEEP STACKING NETWORKS FOR SUPERVISED SPEECH .............................................71 SEPARATION Zhong-Qiu Wang, DeLiang Wang, The Ohio State University, United States AASP-L3.4: COLLABORATIVE DEEP LEARNING FOR SPEECH ENHANCEMENT: A ................................................76 RUN-TIME MODEL SELECTION METHOD USING AUTOENCODERS Minje Kim, Indiana University Bloomington, United States AASP-L3.5: DNN-BASED SOURCE ENHANCEMENT SELF-OPTIMIZED BY ................................................................. 81 REINFORCEMENT LEARNING USING SOUND QUALITY MEASUREMENTS Yuma Koizumi, Kenta Niwa, NTT Corporation, Japan; Yusuke Hioka, University of Auckland, New Zealand; Kazunori Kobayashi, NTT Corporation, Japan; Yoichi Haneda, The University of Electro-Communications, Japan AASP-L3.6: A NEURAL NETWORK ALTERNATIVE TO NON-NEGATIVE AUDIO ........................................................ 86 MODELS Paris Smaragdis, Shrikant Venkataramani, University of Illinois, United States AASP-L4: SOUND FIELDS AASP-L4.1: ANALYTICAL APPROACH TO 2.5D SOUND FIELD CONTROL USING A ................................................. 91 CIRCULAR DOUBLE-LAYER ARRAY OF FIXED-DIRECTIVITY LOUDSPEAKERS Takuma Okamoto, National Institute of Information and Communications Technology, Japan AASP-L4.2: PERCEPTUAL EVALUATION OF A MULTIBAND ACOUSTIC CROSSTALK ............................................96 CANCELER USING A LINEAR LOUDSPEAKER ARRAY Christoph Hohnerlein, Technical University of Berlin, Germany; Jens Ahrens, Chalmers University of Technology, Sweden AASP-L4.3: SOUND FIELD ESTIMATION USING TWO SPHERICAL MICROPHONE ...............................................101 ARRAYS Satoru Emura, NTT Corporation, Japan AASP-L4.4: TIME OF ARRIVAL DISAMBIGUATION USING THE LINEAR RADON .................................................. 106 TRANSFORM Youssef El Baba, Friedrich-Alexander University, Germany; Andreas Walther, Fraunhofer Institute for Integrated Circuits, Germany; Emanuël Habets, Friedrich-Alexander University, Germany AASP-L4.5: LISTENING-AREA-INFORMED SOUND FIELD REPRODUCTION BASED .............................................111 ON CIRCULAR HARMONIC EXPANSION Natsuki Ueno, Shoichi Koyama, Hiroshi Saruwatari, The University of Tokyo, Japan xxxiv AASP-L4.6: ONLINE SECONDARY PATH MODELLING IN WAVE-DOMAIN ACTIVE ...............................................116 NOISE CONTROL Wen Zhang, The Australian National University, Australia; Christian Hofmann, Michael Büerger, Friedrich-Alexander University Erlangen-Nurnberg, Germany; Thushara Abhayapala, The Australian National University, Australia; Walter Kellermann, Friedrich-Alexander University Erlangen-Nurnberg, Germany AASP-L5: DEEP LEARNING FOR AUDIO CONTENT ANALYSIS AASP-L5.1: DEEP RANKING: TRIPLET MATCHNET FOR MUSIC METRIC LEARNING ......................................... 121 Rui Lu, Kailun Wu, Tsinghua University, China; Zhiyao Duan, University of Rochester, United States; Changshui Zhang, Tsinghua University, China AASP-L5.2: A COMPARISON OF DEEP LEARNING METHODS FOR ENVIRONMENTAL ........................................126 SOUND DETECTION Juncheng Li, Robert Bosch LLC, United States; Wei Dai, Florian Metze, Carnegie Mellon University, United States; Shuhui Qu, Stanford University, United States; Samarjit Das, Robert Bosch LLC, United States AASP-L5.3: CNN ARCHITECTURES FOR LARGE-SCALE AUDIO CLASSIFICATION ...............................................131 Shawn Hershey, Sourish Chaudhuri, Daniel P. W. Ellis, Jort F. Gemmeke, Aren Jansen, Channing Moore, Manoj Plakal, Devin Platt, Rif A. Saurous, Bryan Seybold, Malcolm Slaney, Ron Weiss, Kevin Wilson, Google Inc., United States AASP-L5.4: CNN-LTE: A CLASS OF 1-X POOLING CONVOLUTIONAL NEURAL