The VoxCeleb Speaker Recognition Challenge (VoxSRC) Workshop 2019
Joon Son Chung and Andrew Zisserman

Introduction
• Creation of the VoxCeleb dataset
• Overview of the speaker recognition challenge

Datasets: VoxCeleb2
A large-scale audio-visual dataset of human speech
150,000+ YouTube videos of 7000+ different celebrity speakers
1 million+ utterances
2000+ hours of video
Chung, J. S., Nagrani, A., and Zisserman, A. VoxCeleb2: Deep Speaker Recognition. INTERSPEECH 2018.

[Figure: clips from the same identity]

YouTube videos are a great source
• Multi-speaker environments
• Varying audio quality and background channel noise
• Freely available

[Figure: example clips from red carpet interviews, studio interviews, and outdoor and pitch interviews]

Fully Automated Pipeline
Aim: Automatically obtain audio segments of speakers from videos uploaded to YouTube
To do this we need to solve the following:
• When is a person speaking? Done using Active Speaker Verification (ASV)
• Which speaker is the celebrity that we want? Done using Face Verification

Fully Automated Pipeline
[Figure: pipeline diagram. For each candidate (e.g. Elon Musk, Felicity Jones): download videos; face detection and tracking; audio feature extraction; active speaker verification (when is a person speaking?); face verification (who is the speaker?); matching segments go into the VoxCeleb database.]
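The two questions the pipeline answers can be combined into a single filter over face tracks. The following is a minimal sketch, not the authors' implementation: the `FaceTrack` class, its field names, the 0-to-1 score scale, and the 0.9 thresholds are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class FaceTrack:
    """One tracked face in a video (hypothetical structure)."""
    video_id: str
    start: float            # segment start, in seconds
    end: float              # segment end, in seconds
    speaking_score: float   # active speaker verification confidence, assumed 0..1
    identity_score: float   # face verification score for the target celebrity, assumed 0..1


def select_utterances(tracks, speaking_threshold=0.9, identity_threshold=0.9):
    """Keep tracks where the target person is both speaking and correctly identified."""
    return [
        t for t in tracks
        if t.speaking_score >= speaking_threshold
        and t.identity_score >= identity_threshold
    ]


tracks = [
    FaceTrack("vid_a", 0.0, 6.2, speaking_score=0.97, identity_score=0.99),  # target, speaking
    FaceTrack("vid_b", 3.0, 9.5, speaking_score=0.95, identity_score=0.02),  # someone else speaking
    FaceTrack("vid_c", 1.0, 5.0, speaking_score=0.10, identity_score=0.98),  # target, silent
]
kept = select_utterances(tracks)
print([t.video_id for t in kept])
```

Only the first track passes both checks, so only its audio segment would be kept for the dataset.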
1. Candidate List
• Celebrities are the ideal choice: many ‘interview’ videos are available
• 7000+ identities, ranging from actors and sportspeople to entrepreneurs
Sample of candidate names: A.J. Buckley, A.R. Rahman, Aamir Khan, Aaron Tveit, Aaron Yoo, Abbie Cornish, Abigail Breslin, Abigail Spencer, Adam Beach, Adam Brody, Adam Copeland, Adam Driver, Adrianne Curry, Adrianne Palicki, Agyness Deyn, Aidan Turner, Ajay Devgn, Akshay Kumar, Alain Delon, Alan Alda, Alan Cumming, Alan Rickman, Alan Tudyk, Alba Rohrwacher, Aldis Hodge, Alex Borstein, Alex Kingston, Alex Pettyfer, Alex Trebek, Alexa Davalos, Alexander Siddig, Alexandra Daddario, Alexandra Roach, Alexz Johnson, Alfre Woodard, Alice Eve, Alicja Bachleda, Alison Arngrim, Alison Pill, Allison Williams, Amanda Seyfried, Amaury Nolasco, Amber Riley, America Ferrera, Amitabh Bachchan, Amy Poehler, Amy Schumer, Ana Gasteyer, Andre Braugher, Andrea Riseborough, Andrew Dice Clay, Andrew Garfield, Andrew Lee Potts, Andrew Rannells, Andrew Scott, Andy Richter, Andy Samberg, Aneurin Barnard, Ang Lee, Angela Kinsey, Anne Hathaway, Ansel Elgort, Anthony Anderson, Anthony Mackie, Anthony Rapp, Anton Yelchin, Antonio Cupo, Arden Cho, Armand Assante, Armie Hammer, Asa Butterfield, Ashley Greene, Ashley Jensen, Audra McDonald, Audrina Patridge, Avan Jogia, B.J. Novak, Barbara Eden, Barbara Hershey, Bear Grylls, Bellamy Young, Ben Feldman, Ben McKenzie, Ben Stiller, Ben Whishaw, Beth Grant, Bethany Mota, Betty White, Bill Nighy, Bill Pullman, Billie Joe Armstrong, Bingbing Li, Blair Underwood, Blake Michael, Blake Shelton, Bob Barker, Bobby Cannavale, Bonnie Wright, Booboo Stewart, Brad Paisley, Bradley Steven Perry, Breckin Meyer, Brenda Blethyn, Brendan Gleeson, Brett Davern, Brian Blessed, Brian Cox, Brian Dennehy, Bridget Moynahan, Bridget Regan, Brit Marling, Brooke Burke-Charvet, Bruce Boxleitner, Bruno Ganz, Bruno Mars, Burt Reynolds, CCH Pounder, Caitriona Balfe, Caity Lotz, Callan McAuliffe, Callie Thorne, Cameron Boyce, Candace Cameron Bure, Candice Accola, Candice Patton, Caroline Rhea, Carolyn Hennesy, Carrie Ann Inaba, Carrie Fisher, Carrie Underwood, Casey Affleck, Casey Wilson, Cassandra Peterson, Caterina Murino, Caterina Scorsone, Catherine Hardwicke, Cedric the Entertainer, Celia Imrie, Chace Crawford, Chadwick Boseman, Charice, Charles Dance, Charles S. Dutton, Charlie Day, Charlotte Gainsbourg, Chazz Palminteri, Chelsea Handler, Cher, Cheryl Ladd, Chi McBride, Chiwetel Ejiofor, Chloe Bennet, Chloe Dykstra, Chris Colfer, Chris Hemsworth, Chris Lowell, Chris Martin, Chris Messina, Chris Pine, Christian Kane, Christina Hendricks, Christina Ricci, Christopher Mintz-Plasse, Cierra Ramirez, Cilla Black, Cillian Murphy, Cindy Williams, Claudia Black, Cliff Curtis, Clive Owen, Colin Donnell, Constance Zimmer, Corbin Bleu, Corey Stoll, Cory Monteith, Costas Mandylor, Cote de Pablo, Cristin Milioti, Cyndi Lauper, Dakota Fanning, Damian Lewis, Damon Lindelof, Damon Wayans, Dan Aykroyd, Dan Fogler, Dan Stevens, Dana Delany, Dane Cook, Danica McKellar, Daniel Auteuil, Daniel Craig, Daniel Dae Kim, Daniel Tosh, Danielle Campbell, Danielle Panabaker, Danny Dyer, Danny McBride, Danny Pino, Darren Criss, Dave Bautista, Dave Foley, David Attenborough, David Cassidy, David Faustino, David Giuntoli, David Harewood, David Henrie, David Jason, David Koechner, David Kross, David Letterman, David Lyons, David Mamet, David Mazouz, David Morrissey, David Morse, David Oyelowo, David Schwimmer, David Suchet, David Tennant, David Warner, David Wenham, David Zayas

• Names are taken from the existing VGGFace2* dataset
*Cao, Qiong, et al. "VGGFace2: A dataset for recognising faces across pose and age." Automatic Face & Gesture Recognition, 2018.

Fully Automated Pipeline
2. Data Extraction
• Audio feature extraction: MFCC extraction
• Face detection and tracking: face detection, landmark detection

2. Face Tracking and Landmark Detection
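The slides do not say how detections are linked into tracks. One common approach, shown here purely as an illustrative sketch, is to link per-frame face boxes by intersection over union (IoU). The `(x1, y1, x2, y2)` box format, the function names, and the 0.5 threshold are assumptions, not details from the talk.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def track_faces(frames, min_iou=0.5):
    """Greedily link detections across consecutive frames.

    frames: list where item i is the list of face boxes detected in frame i.
    Returns a list of tracks; each track is a list of (frame_index, box).
    """
    finished, active = [], []
    for index, boxes in enumerate(frames):
        unmatched = list(boxes)
        still_active = []
        for track in active:
            last_box = track[-1][1]
            best = max(unmatched, key=lambda b: iou(last_box, b), default=None)
            if best is not None and iou(last_box, best) >= min_iou:
                track.append((index, best))      # same face, slightly moved
                unmatched.remove(best)
                still_active.append(track)
            else:
                finished.append(track)           # face lost: close the track
        # Every detection that matched no track starts a new one.
        still_active.extend([(index, box)] for box in unmatched)
        active = still_active
    return finished + active


frames = [[(0, 0, 10, 10)], [(1, 1, 11, 11)], [(50, 50, 60, 60)]]
face_tracks = track_faces(frames)
print([len(t) for t in face_tracks])
```

In this toy input the face moves slightly between the first two frames and is then replaced by a face elsewhere in the image, so the sketch yields one two-frame track and one single-frame track.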
3. Active Speaker Detection - SyncNet
• Distance between the audio and video embeddings is small if the two streams are synchronised
• Distance is large if they are not synchronised
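The distance criterion above can be turned into a simple speaking/not-speaking decision. The sketch below assumes the audio and video embeddings have already been computed by some network and are paired window by window. The embedding values, the function names, and the 0.5 distance threshold are placeholders, not values from SyncNet.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def mean_sync_distance(audio_embeddings, video_embeddings):
    """Average distance between paired audio and video embeddings of one face track."""
    distances = [euclidean(a, v) for a, v in zip(audio_embeddings, video_embeddings)]
    return sum(distances) / len(distances)


def is_active_speaker(audio_embeddings, video_embeddings, max_distance=0.5):
    """Treat the face as the active speaker when audio and lips are close in embedding space."""
    return mean_sync_distance(audio_embeddings, video_embeddings) <= max_distance


audio = [[1.0, 0.0], [0.0, 1.0]]
synced_video = [[0.9, 0.1], [0.1, 0.9]]      # lips follow the audio
unsynced_video = [[0.0, 1.0], [1.0, 0.0]]    # e.g. a voice-over or another speaker

print(is_active_speaker(audio, synced_video))
print(is_active_speaker(audio, unsynced_video))
```

With these toy vectors the synchronised track has a mean distance of about 0.14 and is accepted, while the unsynchronised one has a mean distance of about 1.41 and is rejected.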
Chung, J. S., and Zisserman, A. "Out of time: automated lip sync in the wild." Asian Conference on Computer Vision, 2016.
4. Face Verification
• VGGFace classification score
• A pre-trained SE-ResNet-50 CNN produces a 7000-element score vector, with one score for each identity
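To make the score vector concrete, the sketch below turns a vector of per-identity classifier outputs into a score for the one identity being searched for. It assumes the outputs are logits that are converted to probabilities with a softmax, and it uses 3 identities instead of 7000 to stay readable. The function names and the 0.9 threshold are illustrative assumptions.

```python
import math


def softmax(logits):
    """Convert raw classifier outputs into probabilities that sum to 1."""
    peak = max(logits)                       # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def target_score(logits, target_index):
    """Probability assigned to the identity we are looking for."""
    return softmax(logits)[target_index]


def track_matches(frame_logits, target_index, threshold=0.9):
    """Average the target score over the frames of a face track and apply a high threshold."""
    scores = [target_score(logits, target_index) for logits in frame_logits]
    return sum(scores) / len(scores) >= threshold


TARGET = 2  # index of the wanted identity in the score vector (hypothetical)

print(round(target_score([0.0, 0.0, 8.0], TARGET), 2))
print(track_matches([[0.0, 0.0, 8.0], [0.0, 1.0, 9.0]], TARGET))
print(track_matches([[8.0, 0.0, 0.0]], TARGET))
```

A face that the classifier confidently assigns to the target identity scores close to 1, and a face of anyone else scores close to 0, which is what makes a high threshold workable.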
Cao, Qiong, et al. "VGGFace2: A dataset for recognising faces across pose and age." Automatic Face & Gesture Recognition, 2018.

4. Face Verification
[Figure: three example faces scored against the target identity: Jones = 0.00, Jones = 0.00, Jones = 0.99]

High thresholds: no manual intervention
[Figure: precision-recall curves for active speaker verification and face verification; the precision and recall axes both run from 0.5 to 1.]
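Operating at a high threshold trades recall for precision. The following sketch shows one way such a threshold could be chosen from a small labelled validation set: take the lowest threshold whose precision meets a target, which also gives the highest recall among the thresholds that qualify. The scores, labels, and function names are made up for illustration.

```python
def precision_recall_at(scores, labels, threshold):
    """Precision and recall when every score >= threshold is accepted."""
    accepted = [s >= threshold for s in scores]
    tp = sum(1 for a, y in zip(accepted, labels) if a and y)
    fp = sum(1 for a, y in zip(accepted, labels) if a and not y)
    fn = sum(1 for a, y in zip(accepted, labels) if not a and y)
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def lowest_threshold_for_precision(scores, labels, target_precision):
    """Smallest observed score that, used as a threshold, reaches the target precision."""
    for threshold in sorted(set(scores)):
        precision, _ = precision_recall_at(scores, labels, threshold)
        if precision >= target_precision:
            return threshold
    return None  # no threshold reaches the target


scores = [0.2, 0.4, 0.6, 0.7, 0.9, 0.95]
labels = [False, False, True, False, True, True]   # True = correct match

threshold = lowest_threshold_for_precision(scores, labels, target_precision=1.0)
print(threshold, precision_recall_at(scores, labels, threshold))
```

On this toy data a threshold of 0.9 gives perfect precision at the cost of discarding one of the three correct matches. For a pipeline with no manual checking, that loss is acceptable because discarded segments cost only dataset size, not label quality.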
The VoxCeleb1 Dataset
22,496 YouTube videos of 1,251 different celebrity speakers
Nagrani, A., Chung, J. S., and Zisserman, A. VoxCeleb: A large-scale Speaker Identification Dataset. INTERSPEECH 2017.

The VoxCeleb2 Dataset
150,480 YouTube videos of 7000+ different celebrity speakers
Chung, J. S., Nagrani, A., and Zisserman, A. VoxCeleb2: Deep Speaker Recognition. INTERSPEECH 2018.

The VoxCeleb Speaker Recognition Challenge

VoxSRC 2019
• The goal of this challenge is to probe how well current methods can recognize speakers from speech obtained 'in the wild'.
• A new dataset has been collected for evaluation.
• VoxCeleb1 test sets are used for validation.

VoxSRC 2019 test set
• 500+ speakers, 19K utterances, 208K pairs
• Collected using a similar pipeline to VoxCeleb
• From YouTube videos of celebrities that do not appear in the VoxCeleb datasets
• ~90% of the impostor pairs are from the same gender
• All utterances are at least 4 seconds in length

VoxSRC 2019 test set
• Manual verification of all speech segments
• In addition, annotators pay particular attention to examples whose speaker embeddings are far from cluster centres

VoxSRC 2019 - Tracks
• Fixed: Participants can train only on the VoxCeleb2 dev dataset, for which we have already released speaker verification labels.
• Open: Participants can use the VoxCeleb datasets and any other data (including data that is not publicly released), except the challenge's test data.
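Returning to the manual verification of the test set: the idea of prioritising examples whose speaker embeddings are far from cluster centres could be sketched as follows. This is an assumption-laden illustration, not the organisers' tooling. It treats each speaker as one cluster, uses the mean embedding as the cluster centre, and ranks utterances by Euclidean distance to it. The speaker and utterance IDs and the 2-dimensional embeddings are invented.

```python
import math


def centroid(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def rank_for_review(embeddings_by_speaker):
    """Return (speaker, utterance_id, distance) tuples, farthest from its own centre first."""
    ranked = []
    for speaker, utterances in embeddings_by_speaker.items():
        centre = centroid(list(utterances.values()))
        for utterance_id, embedding in utterances.items():
            ranked.append((speaker, utterance_id, euclidean(embedding, centre)))
    ranked.sort(key=lambda item: item[2], reverse=True)
    return ranked


embeddings = {
    "spk_a": {"u1": [1.0, 0.0], "u2": [0.9, 0.1], "u3": [-1.0, 0.0]},  # u3 looks suspicious
    "spk_b": {"u4": [0.0, 1.0], "u5": [0.1, 0.9]},
}
review_order = rank_for_review(embeddings)
print(review_order[0][:2])
```

Utterance `u3` sits far from the other two utterances of the same speaker, so it comes first in the review order and would be the first one an annotator checks.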