Evaluating Spoken Dialogue Processing for Time-Offset Interaction
David Traum, Kallirroi Georgila, Ron Artstein, Anton Leuski

The work depicted here was sponsored by the U.S. Army. Statements and opinions expressed do not necessarily reflect the position or the policy of the United States Government, and no official endorsement should be inferred.

Outline
• What is Time-offset interaction & “New Dimensions in Testimony”
• Data collection
• System Architecture
• System Evaluation
  - ASR
  - Classification
  - User Impact

The Big Idea: changing how we communicate through space and time
• Space (can be 2-way, interactive)
  - Semaphore
  - Telegraph
  - Radio/Telephone
  - Video Conference
  - Virtual worlds (e.g., 2nd life)
  - 3D video conference
• Time (so far non-interactive, maybe periodic)
  - Writing on paper/stone/tablets
  - Audio Recordings
  - Film
  - Electronic media
  - Time-offset Interaction
    - Mostly interactive

Science Fiction/Fantasy imaginings of Conversations with Historical People
• Star Trek: Savage Curtain
  Kirk: “I cannot conceive it possible that Abraham Lincoln could have actually been reincarnated. And yet his kindness, his gentle wisdom, his humor, everything about him is so right.”
• Holodeck: Star Trek TNG: Descent
  Hawking: “Wrong again, Albert!”
• Harry Potter Portraits
  Headmaster portraits are capable of interaction with the living world. The headmaster or headmistress is painted before they die. When the portrait is completed, it is kept in a cupboard in the castle, and the headmaster or headmistress can teach their portrait to act and behave like themselves. Additionally, they can impart specific information and knowledge that can be shared down the centuries with their successors. (Pottermore website)

I Robot (2005)

From Fiction to Fact: ICT Question-answering characters
• TLAC-XL 2003
• SGT Blackwell 2004
• SGT STAR 2007
• Twins 2010
Also, elsewhere:
  - Synthetic Interview (Marinelli & Stevens 1998)
  - August (Gustafson et al 1999)
  - Ben Franklin’s Ghost (Sloss and Watzman, 2005)

Testbed: New Dimensions in Testimony (NDT)
Project Goal: History as Intimate Conversation
• Record Holocaust Survivor Testimony Now:
  - Holocaust survivors’ average age is >80
• New Dimensions
  - Interactive
    - Talk to the survivor
    - Get answers to your questions
  - Immersive
    - 3-D display
    - Image-based relighting

NDT Project Interaction Challenges
• Can automated question-answering dialogue work with a real (octogenarian) person, not actors and authors?
  - Yes, Artstein et al IUI 2014
• Can we know what to record in one or a small number of recording sessions to reach adequate coverage?
  - Yes, Artstein et al FLAIRS 2015
• Can automated ASR and NLU work well enough to facilitate a good experience?
  - Yes: this paper

Today Show, May 11 2015 (excerpt from http://www.today.com/video/hologram-of-matt-created-for-3-d-time-capsule-443841091842)

Preparing for First Recording
• Personnel Training
  - Selecting Interviewers
    - Stephen Smith
    - Unfamiliar Adults
    - Children of different ages
  - Interaction issues
    - No interruption
    - rest pose
    - Full context
    - restate question?
• Script development
  - What questions should be asked?
    - Top 100 questions to survivors
    - Theme areas
    - Previous testimony
    - NJ Questions (note cards)
  - Offtopic taxonomy
  - Off-topics

Wizard of Oz Data Collection & Testing
• Test: content coverage, training classifier, ASR & language models.
• 4 locations (ICT, SFI, LA Museum of the Holocaust, New Roads School), over 120 participants, 1350 utterances.

Analysis of Coverage of First Recording Session
• Sampling 3 question sets, separately annotated by two annotators
• Inter-annotator reliability: 0.82 Krippendorff’s alpha
• Analysis of coverage of first answer set

Second Recording
• Script motivated by repeat questions without good answers
• Missing content not captured in first session
• Result: 600 additional clips
  - including some in other languages (Polish, Hebrew)

System Architecture
[Diagram: components connected over a VHMsg / ActiveMQ bus: AcquireSpeech, PocketSphinx ASR, Google Chrome ASR, NPCEditor, Video Player, JLogger]

ASR Tools
• AcquireSpeech front-end
• Multiple ASR Engines
  - PocketSphinx ASR
  - Apple Dictation
  - Google Chrome ASR

General language models vs. domain-specific language models
• Purely domain-specific language models (LMs) cannot recognize out-of-domain utterances
  - User input: can i send this audio to kallirroi
  - ASR output: can i some decided to tell early
• General LMs do not perform well with domain-specific words or utterances
  - User input: why is your name pinchas
  - ASR output: why is your name pink us

Local ASR vs. ASR on the cloud
• Local ASR: easier to control
• ASR on the cloud: internet connection is required, privacy issues, do not have to worry about installation

ASR systems that we compared

ASR system          Trainable acoustic models?   Trainable language models?   Local?
Google Chrome ASR   X                            X                            X
Apple Dictation     X                            X
CMU PocketSphinx

NPCEditor: Classification Algorithm (Leuski et al Sigdial 2006)

NPCEditor (Leuski & Traum 2011): New Additions
• Extended off-topic policy
  - Please repeat
  - I do not know
  - I cannot answer
  - Why don’t you ask me about …
  - Let me tell you about …
• Performance Report Generation

Data Collection & Testing with full system
• Iterations of collecting data, annotating, retraining system
• 3 Locations:
  - ICT
  - Shoah Foundation
  - Museum of Tolerance
• Approximately 75 participants
• Over 4800 utterances collected

Automatically Generate Question & Response Set from Logs (plus manual transcription)

Utterance ID | Question Text | ASR | Response ID | Response Text | Code | Best Response
201410221221-0035 | how old were you when the war started | HOW OLD WERE YOU WHEN THE WAR STARTED | 00104_age_war_start.mp4 | i was eight years old | |
201410271858-0279 | what is your favorite prayer | WHAT IS YOUR FAVORITE PRAYER | 01559_favorite_color_blue.mp4 | my favorite color is blue | |
201410291315-0320 | did you ever give up hope when you were in the concentration camp | DID YOU EVER GIVE UP HOPE WHEN YOUR IN THE CONCENTRATION CAMP | 00517_say_again.mp4 | can you just repeat that | |
201410241315-0004 | tell me about your childhood | TELL ME ABOUT YOUR CHILDHOOD | 00971_hat_from_childhood.mp4 | the hat from my childhood well it goes a long long way back my wife sorry my mother and i used to go... | |
201411051342-0062 | how old are your children | HOW OLD ARE YOUR CHILDREN | 00590_say_again.mp4 | could you just repeat that | |

Answer Quality Coding Scheme
• 4: one of the best set of answers available
• 3: ok answer, but there are better answers available
• 2: on the same topic, some coherence with question, but not really an answer
• 1: completely irrelevant
For off-topic answers (e.g., “I am afraid I cannot answer that question”)
• 5: there is an answer that would be better than an off-topic in this spot
• 6: there is no good answer (off-topic is optimal answer)

Annotations

Utterance ID | Question Text | ASR | Response ID | Response Text | Code | Best Response
201410221221-0035 | how old were you when the war started | HOW OLD WERE YOU WHEN THE WAR STARTED | 00104_age_war_start.mp4 | i was eight years old | 4 | 00104_age_war_start.mp4
201410271858-0279 | what is your favorite prayer | WHAT IS YOUR FAVORITE PRAYER | 01559_favorite_color_blue.mp4 | my favorite color is blue | 1 | 01152_favorite_prayer.mp4
201410291315-0320 | did you ever give up hope when you were in the concentration camp | DID YOU EVER GIVE UP HOPE WHEN YOUR IN THE CONCENTRATION CAMP | 00517_say_again.mp4 | can you just repeat that | 5 | 00891_why_not_give_up.mp4
201410241315-0004 | tell me about your childhood | TELL ME ABOUT YOUR CHILDHOOD | 00971_hat_from_childhood.mp4 | the hat from my childhood well it goes a long long way back my wife sorry my mother and i used to go... | 3 | 01578_life_before_the_war.mp4
201411051342-0062 | how old are your children | HOW OLD ARE YOUR CHILDREN | 00590_say_again.mp4 | could you just repeat that | 6 | OFFTOPIC

How important was the second recording?
• Sources for best responses
• Results: 95% of utterances have a direct response

Evaluation
• ASR
• Classification
• User Experience/Impact

Google Chrome ASR vs. Apple Dictation vs. CMU PocketSphinx
• 350 sentences
• Word Error Rate (percentage of insertions, deletions, and substitutions)
• Google Chrome ASR: 5.09%
• Apple Dictation: 18.85%
• CMU PocketSphinx (domain-specific language model trained on the NPCEditor’s plist training data): 26.73%

Examples of ASR output

User Input | Google Chrome ASR Output | Apple Dictation Output | CMU PocketSphinx Output
hello pinchas | hello pinterest | hello princess | hello pinchas
where is lodz | where is lunch | where is lunch | where is lodz
were you in majdanek | were you in my dannic | were you in my donick | were you in majdanek
were you in kristallnacht | were you and krystal knox | where you went kristallnacht | were you when kristallnacht from
how do you feel about the nazis today | how do you feel about the nazis today | how do you feel about the notsi’s today | how do you feel of the nazis today
did you serve in the army | did you serve in the army | he served in the army | did you certain the army
have you ever lived in israel | have you ever lived in israel | that ever lived in israel | are you ever live in a israel
what’s your favorite restaurant | what’s your favorite restaurant | what’s your favorite restaurant | what’s your favorite rest shot

Classifier Evaluation

Effect of ASR Errors on Classification

Beta-testing at Illinois Holocaust Museum and Education Center

Letter from Museum Docent Doris Lazarus to Pinchas
https://sfi.usc.edu/blog/doris-lazarus/meeting-pinchas

Dear Pinchas,

Although we are meeting for the first time today, I feel like we are old friends. I have been one of the few lucky docents at the Museum that were trained to share you with our visitors. Every time I have presented your demo, the audience response has been powerful. Your warm, engaging manner has connected with each and every visitor.