
The VoxCeleb Speaker Recognition Challenge (VoxSRC) Workshop 2019

Joon Son Chung and Andrew Zisserman

Introduction

• Creation of the VoxCeleb dataset

• Overview of the speaker recognition challenge

Datasets: VoxCeleb2

A large-scale audio-visual dataset of human speech

150,000+ YouTube videos of 7000+ different celebrity speakers

1 million+ utterances

2000+ hours of video

Chung, J. S., Nagrani, A., & Zisserman, A., VoxCeleb2: Deep Speaker Recognition. INTERSPEECH, 2018.

[Figure: clips from the same identity]

YouTube videos are a great source

• Multi-speaker environments

• Varying audio quality and background channel noise

• Freely available

[Figure: example frames from red carpet interviews, studio interviews, and outdoor and pitch interviews]

Fully Automated Pipeline

Aim: Automatically obtain audio segments of speakers from videos uploaded to YouTube

To do this we need to solve the following:

• When is a person speaking? Done using Active Speaker Verification (ASV)

• Which speaker is the celebrity that we want? Done using Face Verification

[Figure: pipeline diagram for two example identities (Elon Musk, Felicity Jones). Download videos; audio feature extraction; face detection and tracking; active speaker verification (when is a person speaking?); face verification (who is the speaker?); matching segments go into the VoxCeleb database.]
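The two questions above can be combined frame by frame: a frame is kept only when the tracked face is speaking and the face belongs to the wanted celebrity. The sketch below is a minimal illustration of that idea, not the authors' implementation. The function name `speaking_segments`, the per-frame boolean inputs, the 25 fps default, and the `min_duration_s` filter are all assumptions made for the example.

```python
from typing import List, Tuple


def speaking_segments(
    is_speaking: List[bool],
    is_target_face: List[bool],
    fps: float = 25.0,           # assumed video frame rate
    min_duration_s: float = 0.0, # assumed optional minimum segment length
) -> List[Tuple[float, float]]:
    """Return (start, end) times in seconds where both per-frame tests pass."""
    if len(is_speaking) != len(is_target_face):
        raise ValueError("per-frame decision lists must have the same length")

    segments: List[Tuple[float, float]] = []
    start = None  # index of the first frame of the currently open segment
    for i, (speaking, target) in enumerate(zip(is_speaking, is_target_face)):
        keep = speaking and target
        if keep and start is None:
            start = i
        elif not keep and start is not None:
            segments.append((start / fps, i / fps))
            start = None
    if start is not None:
        # Close a segment that runs to the end of the video.
        segments.append((start / fps, len(is_speaking) / fps))

    return [(s, e) for s, e in segments if e - s >= min_duration_s]
```

Raising `min_duration_s` would discard very short segments, which may be useful if only longer utterances are wanted.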

1. Candidate List

• Celebrities are the ideal choice: they appear in many ‘interview’ videos

• 7000+ identities, ranging from actors and sportspeople to entrepreneurs

A.J. Buckley, Alex Trebek, Andy Samberg, Ben Whishaw, Bruno Ganz, Charles Dance, Cliff Curtis, Danny McBride, A.R. Rahman, Alexa Davalos, Aneurin Barnard, Beth Grant, Bruno Mars, Charles S. Dutton, Clive Owen, Danny Pino, Aamir Khan, Alexander Siddig, Ang Lee, Bethany Mota, Burt Reynolds, Charlie Day, Colin Donnell, Darren Criss, Aaron Tveit, Alexandra Daddario, Betty White, CCH Pounder, Constance Zimmer, Dave Bautista, Aaron Yoo, Alexandra Roach, Anne Hathaway, Caitriona Balfe, Chazz Palminteri, Corbin Bleu, Dave Foley, Abbie Cornish, Alexz Johnson, Ansel Elgort, Bill Pullman, Caity Lotz, Chelsea Handler, Corey Stoll, David Attenborough, Abigail Breslin, Anthony Anderson, Billie Joe Armstrong, Callan McAuliffe, Abigail Spencer, Alice Eve, Anthony Mackie, Bingbing Li, Callie Thorne, Cheryl Ladd, Costas Mandylor, David Cassidy, Adam Beach, Alicja Bachleda, Anthony Rapp, Blair Underwood, Cameron Boyce, Chi McBride, Cote de Pablo, David Faustino, Adam Brody, Alison Arngrim, Anton Yelchin, Blake, Candace Cameron Bure, Chiwetel Ejiofor, Cristin Milioti, David Giuntoli, Adam Copeland, Alison Pill, Antonio Cupo, Blake Shelton, Candice Accola, Chloe Bennet, Cyndi Lauper, David Harewood, Adam Driver, Allison Williams, Arden Cho, Bob Barker, Candice Patton, Chloe Dykstra, Dakota Fanning, David Henrie, Adrianne Curry, Amanda Seyfried, Bobby Cannavale, Caroline Rhea, Damian Lewis, David Jason, Adrianne Palicki, Amaury Nolasco, Armie Hammer, Bonnie Wright, Carolyn Hennesy, Chris Hemsworth, Damon Lindelof, Agyness Deyn, Amber Riley, Asa Butterfield, Booboo Stewart, Carrie Ann Inaba, Chris Lowell, Damon Wayans, David Kross, Aidan Turner, America Ferrera, Ashley Greene, Brad Paisley, Carrie Fisher, Chris Martin, Dan Aykroyd, David Letterman, Ajay Devgn, Amitabh Bachchan, Ashley Jensen, Bradley Steven Perry, Carrie Underwood, Chris Messina, Dan Fogler, David Lyons, Akshay Kumar, Audra McDonald, Breckin Meyer, Casey Affleck, Chris Pine, Dan Stevens, Amy Schumer, Audrina Patridge, Casey Wilson, Christian Kane, David Mazouz, Ana Gasteyer, Avan Jogia, Cassandra Peterson, Christina Hendricks, Dane Cook, David Morrissey, Alan Cumming, B.J. Novak, Brett Davern, Caterina Murino, Danica McKellar, Barbara Eden, Brian Blessed, Caterina Scorsone, Christopher Mintz-Plasse, Alan Tudyk, Andrew Dice Clay, Brian Cox, Brian Dennehy, Catherine Hardwicke, Daniel Craig, David Schwimmer, Alba Rohrwacher, Bear Grylls, Bridget Moynahan, Cedric the Entertainer, Cierra Ramirez, Daniel Dae Kim, Aldis Hodge, Andrew Lee Potts, Bellamy Young, Bridget Regan, Cilla Black, Daniel Tosh, David Tennant, Alex Borstein, Andrew Rannells, Ben Feldman, Brit Marling, Chace Crawford, Cillian Murphy, Danielle Campbell, David Warner, Alex Kingston, Andrew Scott, Ben McKenzie, -Charvet, Chadwick Boseman, Cindy Williams, Danielle Panabaker, David Wenham, Alex Pettyfer, Andy Richter, Ben Stiller, Bruce Boxleitner, Charice, Claudia Black, David Zayas

• Names are taken from the existing VGGFace2* dataset

*Cao, Qiong, et al. "VGGFace2: A dataset for recognising faces across pose and age." Automatic Face & Gesture Recognition, 2018.

2. Data Extraction

• Audio feature extraction: MFCC extraction

• Face detection and tracking: face detection, landmark detection

2. Face Tracking and Landmark Detection

3. Active Speaker Detection: SyncNet

• Small distance between the audio and video embeddings if synchronised

• Large distance if not synchronised

Chung, J. S., and Zisserman, A. "Out of time: automated lip sync in the wild." Asian Conference on Computer Vision, 2016.
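The distance rule on the SyncNet slide can be illustrated with a small sketch. It assumes that paired audio and video embeddings have already been computed for each time window of a face track; the networks that would produce them are not shown. The function names and the `threshold` value are hypothetical, and this simplified version only compares aligned windows, whereas a full system may also search over audio-video offsets.

```python
import math
from typing import Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two embeddings of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_sync_distance(
    audio_embs: Sequence[Sequence[float]],
    video_embs: Sequence[Sequence[float]],
) -> float:
    """Average audio-video distance over the paired windows of one face track."""
    if not audio_embs or len(audio_embs) != len(video_embs):
        raise ValueError("need the same non-zero number of audio and video windows")
    total = sum(euclidean(a, v) for a, v in zip(audio_embs, video_embs))
    return total / len(audio_embs)


def is_active_speaker(
    audio_embs: Sequence[Sequence[float]],
    video_embs: Sequence[Sequence[float]],
    threshold: float,  # hypothetical value; would be tuned on held-out data
) -> bool:
    """Small distance means the lips and the audio are in sync."""
    return mean_sync_distance(audio_embs, video_embs) < threshold
```

A lower `threshold` accepts fewer tracks, so less non-synchronised speech slips through at the cost of discarding more usable data.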

4. Face Verification

• VGGFace classification score

• Pre-trained SE-ResNet-50 CNN

• Output: a score vector with 7000 entries, one for each identity

Cao, Qiong, et al. "VGGFace2: A dataset for recognising faces across pose and age." Automatic Face & Gesture Recognition, 2018.
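One way to turn per-frame classification scores into a decision for a whole face track is to average the target identity's score over the track and compare it with a high threshold. The sketch below illustrates this under stated assumptions: the score vectors are taken as already computed by a classifier, `target_index` is a hypothetical position of the wanted identity in the score vector, and averaging with a 0.9 default threshold is an illustrative choice rather than the documented procedure.

```python
from typing import Sequence


def track_score(frame_scores: Sequence[Sequence[float]], target_index: int) -> float:
    """Mean classification score of the target identity over a face track."""
    if not frame_scores:
        raise ValueError("a face track needs at least one frame")
    return sum(scores[target_index] for scores in frame_scores) / len(frame_scores)


def accept_track(
    frame_scores: Sequence[Sequence[float]],
    target_index: int,
    threshold: float = 0.9,  # assumed high threshold, favouring precision
) -> bool:
    """Keep the track only if the target identity scores highly on average."""
    return track_score(frame_scores, target_index) >= threshold
```

With two identities and frames scored `[0.01, 0.99]` and `[0.02, 0.98]`, the track would be accepted for identity 1 and rejected for identity 0.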

[Figure: example face tracks with classification scores Jones = 0.00, Jones = 0.00, Jones = 0.99]

High thresholds: no manual intervention

[Figure: precision-recall curves for active speaker verification and face verification; precision and recall axes both run from 0.5 to 1]
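Choosing a high threshold amounts to picking an operating point on the precision-recall curve where precision is close to 1, accepting the loss in recall. The following sketch shows how such a point might be selected from a small labelled validation set. The function names and the idea of scanning observed scores as candidate thresholds are assumptions for illustration.

```python
from typing import Optional, Sequence, Tuple


def precision_recall(
    scores: Sequence[float], labels: Sequence[bool], threshold: float
) -> Tuple[float, float]:
    """Precision and recall when every score >= threshold is accepted."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and not y)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y)
    precision = tp / (tp + fp) if tp + fp else 1.0  # nothing accepted: no errors
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def lowest_threshold_for_precision(
    scores: Sequence[float], labels: Sequence[bool], target_precision: float
) -> Optional[float]:
    """Smallest observed score that, used as threshold, meets the target precision.

    Recall never increases as the threshold rises, so the smallest qualifying
    threshold also gives the highest recall among the qualifying ones.
    """
    for t in sorted(set(scores)):
        precision, _ = precision_recall(scores, labels, t)
        if precision >= target_precision:
            return t
    return None
```

If no candidate threshold reaches the target precision, the function returns `None`, which signals that the target is not attainable on that validation set.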

The VoxCeleb1 Dataset

22,496 YouTube videos of 1,251 different celebrity speakers

Nagrani, A., Chung, J. S., and Zisserman, A. VoxCeleb: A large-scale Speaker Identification Dataset. INTERSPEECH 2017.

The VoxCeleb2 Dataset

150,480 YouTube videos of 7000+ different celebrity speakers

Chung, J. S., Nagrani, A. and Zisserman, A. VoxCeleb2: Deep Speaker Recognition. INTERSPEECH 2018.

The VoxCeleb Speaker Recognition Challenge

VoxSRC 2019

• The goal of this challenge is to probe how well current methods can recognize speakers from speech obtained ‘in the wild’.

• A new dataset has been collected for evaluation.

• VoxCeleb1 test sets are used for validation.

VoxSRC 2019 test set

• 500+ speakers, 19K utterances, 208K pairs

• Collected using a similar pipeline to VoxCeleb

• From YouTube videos of celebrities that do not appear in the VoxCeleb datasets

• ~90% of the impostor pairs are from the same gender

• All utterances are at least 4 seconds in length

• Manual verification of all speech segments

• In addition, annotators pay particular attention to examples whose speaker embeddings are far from cluster centres

VoxSRC 2019: Tracks

• Fixed: Participants can train only on the VoxCeleb2 dev dataset, for which we have already released speaker verification labels.

• Open: Participants can use the VoxCeleb datasets and any other data (including data that is not publicly released), except the challenge's test data.
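Returning to the manual verification of the test set: the idea of paying particular attention to examples whose speaker embeddings are far from cluster centres can be sketched as a simple ranking. The code below assumes that one embedding per utterance is already available and that each speaker's cluster centre is the mean of that speaker's embeddings. The data layout and the function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def centroid(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def rank_by_distance_from_centre(
    embeddings_by_speaker: Dict[str, Dict[str, Sequence[float]]],
) -> List[Tuple[float, str, str]]:
    """Return (distance, speaker, utterance_id) tuples, farthest first.

    Utterances at the top of the list are the ones an annotator
    would inspect most carefully.
    """
    ranked: List[Tuple[float, str, str]] = []
    for speaker, utterances in embeddings_by_speaker.items():
        centre = centroid(list(utterances.values()))
        for utt_id, emb in utterances.items():
            ranked.append((math.dist(emb, centre), speaker, utt_id))
    ranked.sort(reverse=True)
    return ranked
```

Because the mean itself is pulled towards outliers, a more robust centre (for example a component-wise median) may be preferable when a speaker has only a few utterances.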