Speech Recognition in ELT: the Impact on Teachers and Students
Total Page:16
File Type:pdf, Size:1020Kb
Speech Recognition in ELT: the impact on teachers and students Michael Carrier IATEFL Harrogate, 2014 Contents 1-What is ASR? What is it not? 2-How does it work? 3-How is it being used? Examples… 4-How can we use it in class? 5-ASR and Sp2SpT translation 6-Using Sp2SpT in class 7-ASR auto-marking of speech 8-Future trends 1-What is ASR? What is it not? • Automated Speech Speech recognition has Recognition (ASR) converts come of age. It is accurate audio streams into text, but and part of everyday life, and does not analyse it powering automatic semantically. translation and testing • The ASR output cannot assess systems. meaning or coherence • ASR is not the same as Natural What impact will this have on Language Processing ELT and how should we develop appropriate • ASR is flawed but improving pedagogical model, and rapidly prepare teachers for the • ASR is based on corpora and application of speech finding matching patterns in recognition to our data classrooms? Small vocabulary / many-users These systems are ideal for automated telephone answering. The users can speak with a great deal of variation in accent and speech ASR & ELT patterns, and the system will still understand them most of the time. • Speech recognition, also • History of failure. However, usage is limited to a small referred to as speech-to-text or number of predetermined commands voice recognition, is and inputs, such as basic menu • ASR facilitates auto-responseoptions or to numbers. communicative technology that recognizes interactions in the classroom, where students can speech, allowing voice to serve use their tablets (in pairs) to speak or write as the "main interface between Large vocabulary / limited-users the human and the computer". responses to a task andThese get ansystems instant work correction best in a or formative assessment. business environment where a small number of users will work with the • ASR also facilitates newprogram. ways to work on phonology • Voice recognition can refer to products that need to be and accent - using IBM'sWhile programme these systems 'Reading work with a good degree of accuracy (85 percent or trained to recognize a Companion' for examplehigher. with an expert user) and have specific voice, or those vocabularies in the tens of thousands products used in automated call • Automatic translation. Thereof words, are you already must train mobile them to workapps that allow students to speakbest with into a small a phone number or of tabletprimary centers that are capable of users. The accuracy rate will fall recognizing a limited and instantly hear the spokendrastically translation. with any other user. vocabulary from any user. • These 'speech-to-speech' systems are mainly accurate in narrow domains (eg domestic or tourist language) but are likely to impact on students' motivation and expectations of learning English. • ASR facilitates computer-based automatic marking of ELT examinations - both written and spoken exams. Cambridge University has set up a new institute, ALTA, to research this and is trialling auto-marking Cambridge ELT exams. 2-How does it work? 2-How does it work? Speech recognition engines require: an acoustic model, which is created by taking audio recordings of Small vocabulary / many-users speech and their transcriptions These systems are ideal for (taken from a speech corpus), and automated telephone answering. The 'compiling' them into a statistical users can speak with a great deal of representations of the sounds that variation in accent and speech patterns, and the system will still make up each word (through a understand them most of the time. process called 'training'). However, usage is limited to a small number of predetermined commands a language model or grammar file. A and inputs, such as basic menu language model is a file containing options or numbers. the probabilities of sequences of words. A grammar is a much smaller file Large vocabulary / limited-users containing sets of predefined These systems work best in a business environment where a small combinations of words. Language number of users will work with the models are used for dictation program. applications, whereas grammars are used in desktop command and While these systems work with a good degree of accuracy (85 percent or control or telephony interactive higher with an expert user) and have voice response (IVR) type vocabularies in the tens of thousands applications. of words, you must train them to work best with a small number of primary users. The accuracy rate will fall drastically with any other user. Works? Part-of-speech tags used: MD modal auxiliary (can, should, will) Markov models NC cited word (hyphenated after regular tag) NN singular or mass noun Vocabulary base NN$ possessive singular noun Corpora NNS plural noun NNS$ possessive plural noun Language modelling NP proper noun or part of name phrase NP$ possessive proper noun Context dependency NPS plural proper noun Accuracy criteria: NPS$ possessive plural proper noun NR adverbial noun (home, today, west) • Vocabulary size and confusability OD ordinal numeral (first, 2nd) • Speaker dependence vs. independence PN nominal pronoun (everybody, nothing) PN$ possessive nominal pronoun • Isolated, discontinuous, or continuous speech PP$ possessive personal pronoun (my, our) • Task and language constraints second (nominal) possessive pronoun PP$$ • Read vs. spontaneous speech (mine, ours) singular reflexive/intensive personal PPL • Adverse conditions pronoun (myself) plural reflexive/intensive personal PPLS pronoun (ourselves) objective personal pronoun (me, him, it, PPO them) Siri SS Activity – correct Siri How Siri works… 1 - The sounds of your speech were immediately 2 - The signal from your connected phone was encoded into a compact digital form that preserves relayed wirelessly through a nearby cell tower and its information. back to your Internet Service Provider where it communicated with a server in the cloud, loaded with a series of models honed to comprehend 3 - Simultaneously, your speech was evaluated language. locally, on your device. A recognizer installed on 4 - The server compares your speech against a your phone communicates with that server in the statistical model to estimate, based on the cloud to gauge whether the command can be best sounds you spoke and the order in which you handled locally -- such as if you had asked it to spoke them, what letters might constitute it. (At the play a song on your phone -- or if it must connect same time, the local recognizer compares your to the network for further assistance. (If the local speech to an abridged version of that statistical recognizer deems its model sufficient to process model.) For both, the highest-probability estimates your speech, it tells the server in the cloud that it is get the go-ahead. no longer needed: "Thanks very much, we're OK here.") 5 - Based on these opinions, your speech -- now understood as a series of vowels and consonants - 6 - If there is enough confidence in this result, the - is then run through a language model, which computer determines that your intent is to send an estimates the words that your speech is comprised SMS, Erica Olssen is your addressee (and of. Given a sufficient level of confidence, the therefore her contact information should be pulled computer then creates a candidate list of from your phone's contact list) and the rest is your interpretations for what the sequence of words in actual note to her -- your text message magically your speech might mean. appears on screen, no hands necessary. Reflection What is the impact of this for teachers in the classroom? What is the impact on teachers need for training and development to be able to use this technology in the classroom and adapt to its use in examinations? 3-How is it being used? Applications of ASR Dictation • Telephony Voice search • In-car systems Pronunciation • Military Translation • Healthcare • Education • Disability support – vision- impaired, RSI etc Nuance today announced that Samsung’s new GALAXY Gear wearable device and Samsung GALAXY Note 3 integrate Nuance’s voice and language capabilities as part of Samsung’s expanding lineup of Dragon S-Voice powered devices. Today’s announcement also marks the first use of Pros and cons Nuance’s voice and intelligent systems- based technology into the wearables category as part of a larger expansion of Nuance Cloud Services. Samsung integrates Nuance’s voice technology across handsets, tablets, TVs and now wearables. Nuance’s voice technology enables an incredibly intuitive and natural interface. Nuance has been at the forefront of revolutionizing devices to create intelligent systems through voice, Activity text and gesture-based technologies that Use IOS app to do chinese whispers are transforming the way we access our content. Together, Nuance and Samsung create a simple, effortless and personalized mobile experience for Android that understands, learns and adapts to the preferences of the consumer. Google Voice Search • Ask your questions out loud and get answers spoken back whether you are out and about or sitting at your desk. Just tap the mic on the Google search bar and speak up. This works on the Google Search App for iOS, Android and Chrome browsers for laptops and desktops. Other ASR apps Not just Siri… Google Voice Search Google Voice Typing Vlingo Nuance's Dragon Go! True Knowledge's Evi voice assistant Samsung S Voice Microsoft's TellMe Android's Speaktoit The Knowledge Graph is a Knowledge Graph knowledge base used by Google to enhance its search engine's search results with semantic- Conversational Search: search information Singhal stated, "A computer you can talk to? And it will answer gathered from a wide everything you ask it? Little did I know, I would grow up to variety of sources. become the person responsible for building my dream for the entire world." Conversational search technology was It provides structured and then featured and Singhal introduced the term "hot- detailed information wording" to describe search without the need for an about the topic in interface, whereby the user simply prompts the Google addition to a list of links search engine by stating, "OK Google." to other sites.