SCIENTIFIC LEARNING READING ASSISTANT™: CMU TECHNOLOGY IN A COMMERCIAL EDUCATIONAL SOFTWARE APPLICATION

Valerie L. Beattie, Ph.D.

Scientific Learning Corporation
300 Frank H. Ogawa Plaza, Suite 600, Oakland, CA 94612
www.scilearn.com, [email protected]

ABSTRACT

The Reading Assistant™ software application is a reading tutor designed to provide the support and practice needed to improve reading skills. CMU Sphinx technology is used to provide the guided oral reading experience that is key to building reading fluency and ability. The application listens and follows along as a student reads the displayed text, providing feedback and help if the student gets stuck or makes an error.

In many ways this application of speech recognition technology is different from typical commercial applications, and we have made many modifications to the CMU Sphinx technology to adapt it to the needs of the Reading Assistant application. The goal for our application is what we term ‘reading verification’: rather than trying to determine what the user said, our goal is to determine whether the user read the presented text, and how well it was read. In addition, the majority of users are children, requiring the development of custom acoustic models.

Index Terms— CMU Sphinx-2, CMU PocketSphinx, children’s speech, reading tutor

1. BACKGROUND

The application of speech recognition technology to enable reading tutors and other educational software has been an active area of research and development over the past several years. Notable efforts in this arena include CMU’s Project LISTEN[1,2], IBM’s Watch-me!-read[3], SRI’s EduSpeak®[4], and reading tutor work in CSLR (now CLEAR) at the University of Colorado[5].

Guided oral reading is recommended as a reading instruction practice, and its value is supported by research[6]. However, given public school class sizes and budget constraints, it is not possible for teachers or other trained professionals to give students sufficient practice of this kind, with one-on-one help and feedback while they read. Speech recognition technology has the potential to fill this gap, but the challenges are considerable. On the one hand, the speech recognition software must accommodate the acoustic variability that comes from a wide range of user ages, accents, reading rates, and noise conditions. At the same time, the application must strive to identify ‘undesirable’ variations in the user’s reading of the text, i.e. reading errors.

The development of the Reading Assistant application began in 2000 with the foundation of Soliloquy Learning, Inc. In 2008 Soliloquy Learning was acquired by Scientific Learning Corporation, and the Reading Assistant product continued its development as part of Scientific Learning’s suite of educational software products. Since its commercial launch, we estimate that more than 50,000 students have used the Reading Assistant application.

2. THE READING ASSISTANT APPLICATION

Reading Assistant is a comprehensive reading program designed to build vocabulary, comprehension, and fluency. In the most recent release, Reading Assistant Expanded Edition, the user progresses through a library of reading selections grouped by reading level.

For each selection, the user first previews the selection, then reads it silently and answers guided reading questions. At this point he or she also has the option to listen to a model narration of the selection, and to use the glossary capabilities to get context-sensitive definitions of key terms. Figure 1 shows a page of a selection in preview mode.

Figure 1 - Preview Mode Page Spread in Reading Assistant Expanded Edition

Next, using a headset microphone, the student reads the selection aloud. Sphinx speech recognition technology is used to interpret the reading. When the application detects that the student is unable to read a word, or has made a significant error, the software will intervene with visual and/or audio prompting. The application will first highlight the word so the student can try again, and then, if needed, it will read the word to the student. Due to the importance of this reading practice, Reading Assistant Expanded Edition requires users to read a selection two or three times. After reading a selection, a user may listen to his or her reading of the selection.

In the final step, users take a quiz to measure their comprehension of the material. The quiz questions may be multiple choice, true/false, or ask the student to select the sentence or paragraph that communicates a particular idea. The student is given feedback on his or her performance in each of the task areas (preview, read aloud, and quiz). The user is also given a review list of words, including words that required prompting (intervention) during the oral reading, as well as words that the software determined had more subtle errors or hesitations associated with them.

The software also provides a wide range of reports to teachers and administrators. These include measures of performance and of progress over time in different areas (fluency, comprehension strategies). Teachers can listen to recorded samples of oral reading and see the words students had difficulty with. Reports can be generated at the student, class, grade, school, or even district level.

3. THE READING VERIFICATION TASK

As mentioned previously, the application of speech recognition technology to the guided oral reading process in Reading Assistant is different from a typical speech recognition application. This difference has motivated many of the modifications made to the recognition technology, and has necessitated the development of application components outside of the recognition technology that manage the recognizer configuration, data processing, position tracking, and user interaction functionality. Figure 2 illustrates the architecture of the guided oral reading functionality in Reading Assistant.
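To make this division of labor concrete, the following is a minimal, purely illustrative Python sketch of the position tracking and intervention logic described above; the class and method names are our own inventions, not Reading Assistant's actual interfaces.

    from dataclasses import dataclass

    @dataclass
    class TextWord:
        spelling: str
        attempts: int = 0

    class OralReadingSession:
        """Hypothetical sketch: track position in the text and intervene."""

        def __init__(self, text_words):
            self.words = [TextWord(w) for w in text_words]
            self.position = 0  # index of the next expected word

        def on_word_result(self, hyp_word, verified):
            """Handle one word-level result from the recognizer."""
            expected = self.words[self.position]
            if verified and hyp_word == expected.spelling:
                self.position += 1        # word read acceptably; move on
            else:
                self.intervene(expected)  # word missed; prompt the student

        def intervene(self, word):
            word.attempts += 1
            if word.attempts == 1:
                print(f"[highlight] {word.spelling}")   # let the student retry
            else:
                print(f"[read aloud] {word.spelling}")  # model the word

    # Example: a student stumbles twice on 'quick'.
    session = OralReadingSession("the quick brown fox".split())
    session.on_word_result("the", True)
    session.on_word_result("kick", False)  # first miss: word is highlighted
    session.on_word_result("kick", False)  # second miss: word is read aloud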

Figure 2 - Guided Oral Reading Components

Fundamentally this is a verification task, where we wish to determine whether or not the student read each word of the text aloud correctly. This influences both how the recognizer is configured for the task and how we measure performance. In terms of configuration, we know the exact text the user is supposed to be reading, and we track progress through this text during the oral reading. Therefore we can reasonably limit the words from the text that we include in the language model to those in a relatively small window of text around where the user is currently reading. At the same time, the user may misread words and interject unrelated speech or non-speech sounds. Therefore the language model configuration at any given point in the text consists of a combination of words local to that point in the text, plus ‘competition’ elements. Competition elements include word foils[7] (models of mispronunciations or partial pronunciations of a word), noise models, and context-independent phoneme models.
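As a rough sketch of how such a language model configuration might be assembled at a given reading position, consider the following Python fragment. The window size, foil table, and filler names are invented for illustration and are not Reading Assistant's actual values.

    WINDOW = 8  # assumed number of text words on either side of the position

    # Hypothetical word foils: partial or mispronounced variants of text words.
    FOILS = {"together": ["tog-", "geth-"], "important": ["im-", "port-"]}

    # Competition elements that are not text-specific.
    FILLERS = ["<noise>", "<breath>", "<phone_loop>"]

    def active_vocabulary(text_words, position):
        lo = max(0, position - WINDOW)
        hi = min(len(text_words), position + WINDOW + 1)
        vocab = set(text_words[lo:hi])      # words local to the position
        for word in list(vocab):            # plus foils for those words
            vocab.update(FOILS.get(word, []))
        vocab.update(FILLERS)               # plus noise/phoneme competition
        return vocab

    text = "reading together is an important daily habit".split()
    print(sorted(active_vocabulary(text, position=3)))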
The guided oral reading task, with the goal of promoting fluency in reading, also influences the design of the application. Aside from whether the recognizer heard a user read a word, we also measure whether it was read fluently, i.e. without inappropriate hesitations or false starts[8]. Therefore timing information from the recognition process, and not just the word sequence, is important and must be accurate.

In addition, we have implemented a hierarchy of word importance which is used both in the configuration of the recognizer and in the processing of recognition results. In terms of comprehension, some words (such as articles and prepositions) are less likely to be critical to meaning. Readers are expected to know these short, common words well. Because they are short and less important to meaning, these words are often de-emphasized or mumbled in read text, and often misrecognized. Therefore we may not prompt or stop the user even if we do not get a correct recognition on a word in this category. Whether or not a word should be placed in this category depends both on the reading level of the text and on context, since at lower levels readers may still be learning some of these words.

In addition, some application and recognizer changes were motivated by the difficult acoustic environments in schools and the challenges of obtaining a high quality audio signal, especially when the users are children. A significant amount of effort has gone into a microphone check capability which detects signal quality issues[9] and instructs the user regarding correct microphone headset placement.

Finally, it is worth briefly discussing performance measurement for this application and the guided oral reading activity. Since this is a verification task, the most useful metrics are:

• false negative rate: the percentage of the time we prompt or intervene on a word that was read correctly
• false positive rate: the percentage of the time we do not prompt or intervene on a word that was read incorrectly

In our experience, the fundamental usability requirement is that the false negative rate be kept very low, typically 1% or less on average across a corpus of test data. Higher false negative rates lead to frustration and detract from the goal of building and promoting fluency. False negative and false positive rates trade off against one another when tuning a system, depending on the language model weights that are used and the penalties applied to ‘competition’ elements such as phoneme filler models. This trade-off led to the development of a graded categorization of errors, where less severe errors such as mispronunciations and hesitations are marked by the software and placed on the word review list, but there is no intervention or real-time correction of the user. Mispronunciations are detected using a word confidence metric, and hesitations are categorized using timing analysis.
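Given a corpus of oral readings annotated for whether each word was actually read correctly, these two rates are straightforward to compute. The record format below is invented for illustration; note that, following the definitions above, a 'false negative' is an intervention on a correctly read word.

    def verification_rates(records):
        """Compute false negative and false positive rates as defined above."""
        correct = [r for r in records if r["read_correctly"]]
        incorrect = [r for r in records if not r["read_correctly"]]
        # False negative: we intervened although the word was read correctly.
        fn_rate = sum(r["intervened"] for r in correct) / len(correct)
        # False positive: we did not intervene although the word was misread.
        fp_rate = sum(not r["intervened"] for r in incorrect) / len(incorrect)
        return fn_rate, fp_rate

    records = [
        {"read_correctly": True,  "intervened": False},  # desired behavior
        {"read_correctly": True,  "intervened": True},   # false negative
        {"read_correctly": False, "intervened": True},   # desired behavior
        {"read_correctly": False, "intervened": False},  # false positive
    ]
    fn, fp = verification_rates(records)
    print(f"false negative rate: {fn:.0%}, false positive rate: {fp:.0%}")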
4. CMU SPHINX TECHNOLOGY IN READING ASSISTANT

Work on the development of Reading Assistant using CMU Sphinx technology began in 2002. Among the recognition technologies available at the time, Sphinx was chosen because of its accessibility and its application orientation. Sphinx-2 in particular was designed to run in real time and had a basic application interface. The version initially incorporated into the application was Sphinx-2 version 0.4.

Since elementary-age students were the primary initial focus of the software, the application also required the development of acoustic models based on children’s speech. To develop these we used the SphinxTrain suite of programs for acoustic modeling. In more recent releases of the software, we have created improved models for adults as well as children, using data collected with the application software.

In 2006 we updated the Sphinx recognizer code base by merging the PocketSphinx code base into our code base. The goal of this merge was to obtain code fixes, support for running the recognizer on an embedded device, and the ability to evaluate fully continuous models.

5. MODIFICATIONS TO SPHINX

Developing the Reading Assistant application for the education market required many modifications to the Sphinx software. These changes were largely driven by the requirements of the education market as well as by the unique needs of the Reading Assistant application.

First of all, the source code had to be ported to the Apple Macintosh platform, because Macs represent a significant share of the education market and install base. The work required was mainly in developing the audio input and output component for this platform.

Another area of modifications was to make memory use, memory management, and load time more efficient for recognizer configuration data. In particular, the way Reading Assistant creates and uses language models required significant re-work of the corresponding code. Even though Reading Assistant does use the Sphinx-2 ‘n-gram’ language model implementation designed for statistical language models, true statistical language models (generated from a large corpus of text) are not appropriate for the reading verification task.

Therefore we have implemented a language model generation process which creates language models by rule from the given text. Trigram models represent the expectation that the user will mostly adhere to the presented text, with back-offs to bigram and unigram models allowing for departure from the correct word order. Word-specific competition (a word foil or mispronunciation) can occur in the same n-gram positions as the correct word. Items that are not text-specific, such as noise models, context-independent phoneme filler models, and silence models, can be inserted at any point.

In fact, finite-state grammar language models might be more appropriate for the reading verification task, in order to model user behaviors such as word repeats and sentence re-starts. The Sphinx version originally integrated did not include finite-state grammar support, so we adapted the n-gram implementation to suit our application as described above. However, finite-state grammars are an approach we wish to investigate in future work.

Since we know the text the user is reading and are following along during reading, we can use this information to ‘focus’ the language model on the area of text the user is reading at the moment. This implies that we need to generate on the fly, and switch between, many ‘small’ language models, each representing a short segment of text. These needs led to a number of changes to the language modeling portion of the code, including the development of a binary format for language models, the ability to load language models from a stream, and other changes to optimize memory usage and memory management.
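To illustrate the rule-based generation, the toy Python sketch below emits trigram entries directly from the presented text. The probability values and the foil table are invented; a real model would also allow noise and phoneme fillers at any position, with back-off mass covering departures from the text.

    FOILS = {"river": ["ri-", "riv-"]}  # hypothetical partial pronunciations

    def rule_based_trigrams(text_words):
        """Build trigram entries by rule from the text, not from corpus counts."""
        lm = {}
        for i in range(len(text_words) - 2):
            history = (text_words[i], text_words[i + 1])
            nxt = text_words[i + 2]
            entries = {nxt: 0.8}  # expect the reader to follow the text
            # Word-specific foils compete in the same n-gram position...
            for foil in FOILS.get(nxt, []):
                entries[foil] = 0.05
            # ...and the remaining mass would back off to bigrams/unigrams.
            lm[history] = entries
        return lm

    text = "the elephant walked to the river".split()
    for history, entries in rule_based_trigrams(text).items():
        print(history, "->", entries)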
In the Reading Assistant application acoustic models may also need to be switched or re-loaded. The changes made to enable this included adding the capability to re-initialize the recognizer within the application, and binary formats for some acoustic model files to reduce load time. These improvements were needed in particular to support an automatic model selection capability that was introduced in Reading Assistant 4.1. Reading Assistant has models which cover a spectrum of users from 5 or 6 years old to adults, but it is not necessarily easy or obvious for a user (or even a teacher) to select the model set that is going to work best, particularly for older children and teenagers. Model selection is done automatically based on a one-time enrollment process where the user is asked to repeat a small number of phrases.

At the same time as automatic model selection, a vocal tract length normalization (VTLN) factor[12] is also calculated. This capability was added to the front-end feature extraction in Sphinx and is used to further improve the match of the user’s speech to the selected model.

Another modification made to the Sphinx front-end was the detection of specific signal quality problems. The following signal quality issues are detected: breath noise (breath pops), low signal-to-noise ratio, and hum in the signal due to 60 Hz power line harmonics. These signal quality issues occur frequently in school situations; school-age children have a lot of difficulty using a headset microphone correctly, and poorly designed audio hardware and environments contribute to a high incidence of hum interference in the audio signal. Detection of these issues can be used to give instruction regarding corrective action (e.g. better microphone placement). This capability can also be used to alert users and teachers that a problem exists, and to prevent use of the recording functionality when signal quality is too poor.

Another significant group of modifications to CMU Sphinx is aimed at providing ‘competition’ elements during speech recognition processing, in order to determine whether the user has made an error in reading. As mentioned previously, an additional ‘filler’ dictionary consisting of context-independent phoneme models has been added to the recognizer. This is implemented as a separate dictionary from the noise fillers so that separate penalties can be applied. A word confidence score measure[10,11] has also been implemented, which required the addition of a context-independent phoneme network in parallel with the main recognition search. The score from this ‘competitor’ network is used in the word confidence calculation. Finally, the dictionary implementation has been modified to accommodate the addition of partial pronunciations and mispronunciations of words (word foils) to the dictionary at load time.

6. ACOUSTIC MODELS

The development of acoustic models using SphinxTrain for the Reading Assistant application began with the development of models based on children’s speech, in order to obtain adequate performance for the Reading Assistant’s largest target user group. The initial acoustic models were developed from commercially available children’s speech corpora. More recent work in acoustic model development has included developing improved models for adult female speakers (another important user group, since the majority of primary education teachers are women). All of the models developed for the application have been semi-continuous acoustic models.

Finally, in our most recent release of the software, we enhanced our set of acoustic models further by adding adult and child acoustic models focused on the dialects of the Southern United States. These models were developed using more than 110 hours of audio data from 685 speakers, collected at schools using a customized version of the application. With the addition of the new models to the application, the false negative error rate on test speakers from the Southern region was reduced from 1.5% to 1%, while the false positive rate stayed constant.

7. SUMMARY

Reading Assistant is an interactive reading tutor which uses speech recognition technology to provide a helpful listener for guided oral reading practice. CMU Sphinx recognizer technology, including Sphinx-2, PocketSphinx, and SphinxTrain, has been used to develop and deploy this commercially successful application. The unique requirements of the recognition task for this application, and of the education market, have led to many modifications of the original CMU Sphinx technology.

8. REFERENCES

[1] J. Mostow and J. Beck, “When the Rubber Meets the Road: Lessons from the In-School Adventures of an Automated Reading Tutor that Listens”, in B. Schneider and S.-K. McDonald (Eds.), Scale-Up in Education (Vol. 2, pp. 183-200), Rowman & Littlefield Publishers, Lanham, MD, 2007.

[2] J. Mostow, S. Roth, A. G. Hauptmann, and M. Kane, “A Prototype Reading Coach that Listens”, Proceedings of the Twelfth National Conference on Artificial Intelligence (AAAI-94), American Association for Artificial Intelligence, Seattle, WA, pp. 785-792, August 1994.

[3] S. Williams, D. Nix, and P. Fairweather, “Using Speech Recognition Technology to Enhance Literacy Instruction for Emerging Readers”, in B. Fishman and S. O’Connor-Divelbiss (Eds.), Fourth International Conference of the Learning Sciences, Erlbaum, Mahwah, NJ, pp. 115-120, 2000.

[4] H. Franco, V. Abrash, K. Precoda, H. Bratt, R. Rao, J. Butzberger, "The SRI EduSpeak™ System: Recognition and Pronunciation Scoring for Language Learning", Proceedings of InSTIL 2000 (Integrating Speech Technology in (Language) Learning), Dundee, Scotland, 2000.

[5] A. Hagen, B. Pellom, and R. Cole, "Highly accurate children’s speech recognition for interactive tutors using subword units", Speech Communication, Elsevier, Amsterdam, The Netherlands, pp. 861-873, December 2007.

[6] National Institute of Child Health and Human Development, Report of the National Reading Panel. Teaching children to read: An evidence-based assessment of the scientific research literature on reading and its implications for reading instruction (NIH Publication No. 00-4769), U.S. Government Printing Office, Washington, DC, 2000.

[7] S. Barker, “Word competition models in voice recognition”, U.S. Patent Application 20060058996, U.S. Patent and Trademark Office, Washington, DC, March 2006.

[8] M. Adams and V. Beattie, “Assessing fluency based on elapsed time”, U.S. Patent 7433819, U.S. Patent and Trademark Office, Washington, DC, October 2008.

[9] S. Barker and J. Wolf, “Microphone setup and testing in voice recognition software”, U.S. Patent 7243068, U.S. Patent and Trademark Office, Washington, DC, July 2007.

[10] F. Alleva, D. Beeferman, and X.D. Huang, “Confidence measures and their application to automatic speech recognition”, IEEE International Workshop on Automatic Speech Recognition and Understanding (ASRU), Vol. 01, pp. 173-174, Snowbird, UT, December 1995.

[11] H. Jiang, “Confidence measures for speech recognition: A survey”, Speech Communication, Elsevier, Amsterdam, The Netherlands, pp. 455-470, April 2005.

[12] P. Zhan and A. Waibel, “Vocal Tract Length Normalization for Large Vocabulary Continuous Speech Recognition”, Technical Report CMU-CS-97-148, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, May 1997.