The VoxCeleb Speaker Recognition Challenge (VoxSRC) Workshop 2019
Joon Son Chung and Andrew Zisserman

Introduction
• Creation of the VoxCeleb dataset
• Overview of the speaker recognition challenge

Datasets: VoxCeleb2
A large-scale audio-visual dataset of human speech:
• 150,000+ YouTube videos of 7,000+ different celebrity speakers
• 1 million+ utterances
• 2,000+ hours of video
Chung, J. S., Nagrani, A., and Zisserman, A. VoxCeleb2: Deep Speaker Recognition. INTERSPEECH, 2018.
[Figure: clips from the same identity]

YouTube videos are a great source
• Multi-speaker environments
• Varying audio quality and background channel noise
• Freely available
[Figure: example sources, including red carpet interviews, studio interviews, and outdoor and pitch interviews]

Fully Automated Pipeline
Aim: automatically obtain audio segments of speakers from videos uploaded to YouTube. To do this we need to solve the following:
• When is a person speaking? Done using Active Speaker Verification (ASV).
• Which speaker is the celebrity that we want? Done using Face Verification.
[Figure: pipeline diagram. Videos of each celebrity (e.g. Elon Musk, Felicity Jones) are downloaded; face detection and tracking run alongside audio feature extraction; a segment enters the VoxCeleb database only if both active speaker verification and face verification match.]

1. Candidate List
• Celebrities are the ideal choice: there are many 'interview' videos of them.
• 7,000+ identities, ranging from actors and sportspeople to entrepreneurs.
• Names are taken from the existing VGGFace2* dataset.
Sample identities: A.J. Buckley, A.R. Rahman, Aamir Khan, Aaron Tveit, Aaron Yoo, Abbie Cornish, Abigail Breslin, Abigail Spencer, Adam Beach, Adam Brody, Adam Copeland, Adam Driver, Adrianne Curry, Adrianne Palicki, Agyness Deyn, Aidan Turner, Ajay Devgn, Akshay Kumar, Alain Delon, Alan Alda, Alan Cumming, Alan Rickman, Alan Tudyk, Alba Rohrwacher, Aldis Hodge, Alex Borstein, Alex Kingston, Alex Pettyfer, Alex Trebek, Alexa Davalos, Alexander Siddig, Alexandra Daddario, Alexandra Roach, Alexz Johnson, Alfre Woodard, Alice Eve, Alicja Bachleda, Alison Arngrim, Alison Pill, Allison Williams, Amanda Seyfried, Amaury Nolasco, Amber Riley, America Ferrera, Amitabh Bachchan, Amy Poehler, Amy Schumer, Ana Gasteyer, Andre Braugher, Andrea Riseborough, Andrew Dice Clay, Andrew Garfield, Andrew Lee Potts, Andrew Rannells, Andrew Scott, Andy Richter, Andy Samberg, Aneurin Barnard, Ang Lee, Angela Kinsey, Anne Hathaway, Ansel Elgort, Anthony Anderson, Anthony Mackie, Anthony Rapp, Anton Yelchin, Antonio Cupo, Arden Cho, Armand Assante, Armie Hammer, Asa Butterfield, Ashley Greene, Ashley Jensen, Audra McDonald, Audrina Patridge, Avan Jogia, B.J. Novak, Barbara Eden, Barbara Hershey, Bear Grylls, Bellamy Young, Ben Feldman, Ben McKenzie, Ben Stiller, Ben Whishaw, Beth Grant, Bethany Mota, Betty White, Bill Nighy, Bill Pullman, Billie Joe Armstrong, Bingbing Li, Blair Underwood, Blake Michael, Blake Shelton, Bob Barker, Bobby Cannavale, Bonnie Wright, Booboo Stewart, Brad Paisley, Bradley Steven Perry, Breckin Meyer, Brenda Blethyn, Brendan Gleeson, Brett Davern, Brian Blessed, Brian Cox, Brian Dennehy, Bridget Moynahan, Bridget Regan, Brit Marling, Brooke Burke-Charvet, Bruce Boxleitner, Bruno Ganz, Bruno Mars, Burt Reynolds, CCH Pounder, Caitriona Balfe, Caity Lotz, Callan McAuliffe, Callie Thorne, Cameron Boyce, Candace Cameron Bure, Candice Accola, Candice Patton, Caroline Rhea, Carolyn Hennesy, Carrie Ann Inaba, Carrie Fisher, Carrie Underwood, Casey Affleck, Casey Wilson, Cassandra Peterson, Caterina Murino, Caterina Scorsone, Catherine Hardwicke, Cedric the Entertainer, Celia Imrie, Chace Crawford, Chadwick Boseman, Charice, Charles Dance, Charles S. Dutton, Charlie Day, Charlotte Gainsbourg, Chazz Palminteri, Chelsea Handler, Cher, Cheryl Ladd, Chi McBride, Chiwetel Ejiofor, Chloe Bennet, Chloe Dykstra, Chris Colfer, Chris Hemsworth, Chris Lowell, Chris Martin, Chris Messina, Chris Pine, Christian Kane, Christina Hendricks, Christina Ricci, Christopher Mintz-Plasse, Cierra Ramirez, Cilla Black, Cillian Murphy, Cindy Williams, Claudia Black, Cliff Curtis, Clive Owen, Colin Donnell, Constance Zimmer, Corbin Bleu, Corey Stoll, Cory Monteith, Costas Mandylor, Cote de Pablo, Cristin Milioti, Cyndi Lauper, Dakota Fanning, Damian Lewis, Damon Lindelof, Damon Wayans, Dan Aykroyd, Dan Fogler, Dan Stevens, Dana Delany, Dane Cook, Danica McKellar, Daniel Auteuil, Daniel Craig, Daniel Dae Kim, Daniel Tosh, Danielle Campbell, Danielle Panabaker, Danny Dyer, Danny McBride, Danny Pino, Darren Criss, Dave Bautista, Dave Foley, David Attenborough, David Cassidy, David Faustino, David Giuntoli, David Harewood, David Henrie, David Jason, David Koechner, David Kross, David Letterman, David Lyons, David Mamet, David Mazouz, David Morrissey, David Morse, David Oyelowo, David Schwimmer, David Suchet, David Tennant, David Warner, David Wenham, David Zayas.
*Cao, Qiong, et al. "VGGFace2: A dataset for recognising faces across pose and age." Automatic Face & Gesture Recognition, 2018.

2. Data Extraction: Face Tracking and Landmark Detection
Face detection and facial landmark detection are run on the video frames, and the detections are grouped into face tracks. MFCC features are extracted from the audio.
3. Active Speaker Detection: SyncNet
SyncNet embeds the audio and the video of a face track into a shared space. The distance between the two embeddings is small if they are synchronised and large if they are not.
Chung, J. S., and Zisserman, A. "Out of time: automated lip sync in the wild." Asian Conference on Computer Vision, 2016.

4. Face Verification
A pre-trained SE-ResNet-50 CNN (VGGFace2) produces a vector of classification scores over the 7,000 candidate identities. The score of the target identity decides whether a face track shows that celebrity. In the example, two tracks score Jones = 0.00 and one scores Jones = 0.99.
Cao, Qiong, et al. "VGGFace2: A dataset for recognising faces across pose and age." Automatic Face & Gesture Recognition, 2018.
High thresholds are used, so no manual intervention is needed.
[Figure: precision-recall curves for active speaker verification and face verification (precision vs. recall, both axes from 0.5 to 1)]

The VoxCeleb1 Dataset
22,496 YouTube videos of 1,251 different celebrity speakers.
Nagrani, A., Chung, J. S., and Zisserman, A. VoxCeleb: A large-scale Speaker Identification Dataset. INTERSPEECH, 2017.

The VoxCeleb2 Dataset
150,480 YouTube videos of 7,000+ different celebrity speakers.
Chung, J. S., Nagrani, A., and Zisserman, A. VoxCeleb2: Deep Speaker Recognition. INTERSPEECH, 2018.
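Steps 3 and 4 of the pipeline act as two filters on every face track. The following is a minimal sketch of one way to combine them. It relies on several assumptions that the slides do not state. First, per-frame audio and video embeddings are already available; in SyncNet they come from its two network streams. Second, the synchronisation confidence is taken as the median minus the minimum of the mean audio-video distances over a range of candidate offsets. Third, the face classifier's raw scores are converted to probabilities with a softmax. Finally, both thresholds are placeholder values. All function names and defaults are hypothetical.

```python
import math
import statistics
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two equal-length embeddings."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def sync_offset_and_confidence(
    video_emb: Sequence[Vector],
    audio_emb: Sequence[Vector],
    max_offset: int = 15,
) -> Tuple[int, float]:
    """Hypothetical SyncNet-style offset search (assumed confidence measure).

    For each candidate offset, average the distance between video frame t and
    audio frame t + offset. Synchronised streams give a sharp minimum, so
    confidence = median(distances) - min(distances) is large.
    """
    results: List[Tuple[int, float]] = []
    for offset in range(-max_offset, max_offset + 1):
        dists = [
            euclidean(video_emb[t], audio_emb[t + offset])
            for t in range(len(video_emb))
            if 0 <= t + offset < len(audio_emb)
        ]
        if dists:  # skip offsets with no overlapping frames
            results.append((offset, sum(dists) / len(dists)))
    if not results:
        raise ValueError("no overlapping audio and video frames")
    best_offset, best_dist = min(results, key=lambda r: r[1])
    confidence = statistics.median([d for _, d in results]) - best_dist
    return best_offset, confidence


def target_probability(logits: Sequence[float], target_index: int) -> float:
    """Softmax probability of the target identity (assumed score form)."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    return exps[target_index] / sum(exps)


def keep_segment(
    sync_confidence: float,
    face_probability: float,
    sync_threshold: float = 3.0,  # placeholder value, not from the slides
    face_threshold: float = 0.99,  # placeholder value, not from the slides
) -> bool:
    """Keep a track only if both filters pass their (high) thresholds."""
    return sync_confidence >= sync_threshold and face_probability >= face_threshold
```

A typical call would be `offset, conf = sync_offset_and_confidence(video_emb, audio_emb)` followed by `keep_segment(conf, target_probability(logits, idx))`. A track is kept only when both filters pass, which mirrors the two questions the pipeline answers: when someone is speaking, and whether that speaker is the target celebrity.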
The VoxCeleb Speaker Recognition Challenge (VoxSRC) 2019
• The goal of this challenge is to probe how well current methods can recognize speakers from speech obtained 'in the wild'.
• A new dataset has been collected for evaluation.
• The VoxCeleb1 test sets are used for validation.

VoxSRC 2019 test set
• 500+ speakers, 19K utterances, 208K trial pairs
• Collected using a pipeline similar to the one used for VoxCeleb
• From YouTube videos of celebrities who do not appear in the VoxCeleb datasets
• ~90% of the impostor pairs are from speakers of the same gender
• All utterances are at least 4 seconds long
• All speech segments are verified manually; annotators pay particular attention to examples whose speaker embeddings are far from their cluster centres

VoxSRC 2019 tracks
• Fixed: participants may train only on the VoxCeleb2 dev set, for which speaker verification labels have already been released.
• Open: participants may use the VoxCeleb datasets and any other data (including data that is not publicly released), except the challenge's test data.
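The review step for the test set gives extra attention to utterances whose embeddings lie far from their speaker's cluster centre, and it can be sketched as a ranking. This is a minimal sketch under assumptions the slides do not specify. Each utterance is assumed to already have a fixed-length speaker embedding. A speaker's cluster centre is taken as the mean of that speaker's L2-normalised embeddings, and the distance measure is cosine distance. The function names are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def l2_normalise(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity: 0 for the same direction, 2 for opposite (non-zero inputs)."""
    a, b = l2_normalise(a), l2_normalise(b)
    return 1.0 - sum(x * y for x, y in zip(a, b))


def rank_for_review(
    embeddings: Dict[str, Sequence[float]],
    speaker_of: Dict[str, str],
    top_k: int,
) -> List[Tuple[str, float]]:
    """Hypothetical helper: the top_k utterances farthest from their speaker's centre."""
    by_speaker: Dict[str, List[str]] = defaultdict(list)
    for utt, spk in speaker_of.items():
        by_speaker[spk].append(utt)

    # Assumed centre: mean of the speaker's unit-length embeddings.
    centres: Dict[str, List[float]] = {}
    for spk, utts in by_speaker.items():
        unit = [l2_normalise(embeddings[u]) for u in utts]
        centres[spk] = [sum(col) / len(unit) for col in zip(*unit)]

    scored = [
        (utt, cosine_distance(embeddings[utt], centres[spk]))
        for utt, spk in speaker_of.items()
    ]
    scored.sort(key=lambda item: item[1], reverse=True)  # farthest first
    return scored[:top_k]
```

The sketch ranks utterances instead of applying a fixed cut-off, so `top_k` can be set to however many utterances the annotators have time to inspect. With a mislabelled utterance in a speaker's set, that utterance typically appears near the top of the list.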