Automating Error Frequency Analysis Via the Phonemic Edit Distance Ratio
JSLHR Research Note

Automating Error Frequency Analysis via the Phonemic Edit Distance Ratio

Michael Smith, Kevin T. Cunningham, and Katarina L. Haley

Purpose: Many communication disorders result in speech sound errors that listeners perceive as phonemic errors. Unfortunately, manual methods for calculating phonemic error frequency are prohibitively time consuming to use in large-scale research and busy clinical settings. The purpose of this study was to validate an automated analysis based on a string metric—the unweighted Levenshtein edit distance—to express phonemic error frequency after left hemisphere stroke.

Method: Audio-recorded speech samples from 65 speakers who repeated single words after a clinician were transcribed phonetically. By comparing these transcriptions to the target, we calculated the percent segments with a combination of phonemic substitutions, additions, and omissions and derived the phonemic edit distance ratio, which theoretically corresponds to percent segments with these phonemic errors.

Results: Convergent validity between the manually calculated error frequency and the automated edit distance ratio was excellent, as demonstrated by nearly perfect correlations and negligible mean differences. The results were replicated across 2 speech samples and 2 computation applications.

Conclusions: The phonemic edit distance ratio is well suited to estimate phonemic error rate and can be applied immediately in research and clinical practice. It can be calculated from any paired strings of transcription symbols and requires no additional steps, algorithms, or judgment concerning alignment between target and production. We recommend it as a valid, simple, and efficient substitute for manual calculation of error frequency.

In the context of communication disorders that affect speech production, sound error frequency is often linked to the severity of the condition. Error frequency measures have clinical value across a wide variety of patient populations, including children with phonological disorders, childhood apraxia of speech (AOS), craniofacial disorders, and a host of acquired and developmental neurogenic communication disorders. In our laboratory, we use error frequency as a severity index for acquired AOS and aphasia with phonemic paraphasia (APP). Disorder severity can vary dramatically, particularly for AOS, and sound error frequency therefore assumes a broad range of values (Haley, Jacks, Richardson, & Wambaugh, 2017). Whereas subjective impressions provide preliminary severity estimates, precise measurements are preferable for comparison and documentation purposes alike.

Phonemic error frequency in AOS and APP can be estimated through a variety of strategies that index the difference between a series of targets and a series of productions. As a basic clinical estimate, accuracy may be expressed as the simple percentage of words in a reasonably challenging speech sample that a speaker produces without phonemic or phonetic errors (Duffy et al., 2017; Haley, Jacks, de Riesthal, Abou-Khalil, & Roth, 2012). Whereas this metric excels in providing a quick snapshot of the severity of the speech impairment, it falls short in its lack of resolution.

Precision increases with whole-word phonetic transcription, but once transcribed, the summary is not straightforward. Phoneme-level analysis has been used to derive metrics such as percent consonants or phonemes correct (Shriberg, Austin, Lewis, McSweeny, & Wilson, 1997; Shriberg & Kwiatkowski, 1982) and the proportion of omission, addition, and substitution errors that a person makes when speaking (Bislick, McNeil, Spencer, Yorkston, & Kendall, 2017; Haley, Bays, & Ohde, 2001; Haley et al., 2012; Miller, 1995; Odell, McNeil, Rosenbek, & Hunter, 1990, 1991; Scholl, McCabe, Heard, & Ballard, 2018).
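The word- and phoneme-level summary measures described above can be sketched in a few lines. This is an illustrative sketch only: the function names and example data are ours, not the article's, and the phoneme-level count assumes each production has already been aligned segment-by-segment with its target by a human coder—precisely the step that is hard to standardize.

```python
# Illustrative summary measures over paired phonetic transcriptions.
# Each pair is (target, production), with one IPA character per segment.

def percent_words_correct(pairs):
    """Percent of word pairs produced with no phonemic error."""
    return 100 * sum(t == p for t, p in pairs) / len(pairs)

def percent_phonemes_correct(pairs):
    """Percent of target segments realized correctly, assuming the coder
    has already aligned the strings one-to-one (equal lengths)."""
    correct = total = 0
    for target, production in pairs:
        assert len(target) == len(production), "requires pre-aligned strings"
        correct += sum(t == p for t, p in zip(target, production))
        total += len(target)
    return 100 * correct / total

# Hypothetical sample: 3 of 4 single-word repetitions are fully correct,
# and 11 of 12 segments match (/g/ -> /k/ in the second word).
pairs = [("kæt", "kæt"), ("dɔg", "dɔk"), ("sun", "sun"), ("bɛd", "bɛd")]
print(percent_words_correct(pairs))              # 75.0
print(round(percent_phonemes_correct(pairs), 1))  # 91.7
```

The equal-length assertion makes the limitation explicit: as soon as a speaker omits or adds a segment, a naive position-by-position comparison breaks down, which is the alignment problem taken up below.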
Author note: Division of Speech and Hearing Sciences, Department of Allied Health Sciences, University of North Carolina, Chapel Hill. Correspondence to Katarina L. Haley: [email protected]. Editor-in-Chief: Julie Liss. Editor: Stephanie Borrie. Received October 23, 2018; revision received January 21, 2019; accepted April 3, 2019. https://doi.org/10.1044/2019_JSLHR-S-18-0423. Disclosure: The authors have declared that no competing interests existed at the time of publication. Journal of Speech, Language, and Hearing Research • Vol. 62 • 1719–1723 • June 2019 • Copyright © 2019 American Speech-Language-Hearing Association.

These manual phonemic error counts are the unfortunate antithesis of word accuracy in their inefficiency—they are time consuming, attention demanding, and potentially vulnerable to coding error. Additionally, they depend on coding decisions that require consideration of the phonetic context and the phonetic relationships between alternative segments. This complexity increases the risk of error and reduces utility for settings where analysis time is limited. In our case, inefficient quantification is prohibitive in large-scale studies that involve hundreds of speech samples. Application is equally challenging in clinical settings, where productivity requirements limit time to process assessment results.

A natural candidate for automating these counts is the Levenshtein edit distance (Levenshtein, 1966). This algorithm generates a sum of the fewest number of operations that can transform one string into another. The operations are defined as omissions, additions, and substitutions—the same phonemic error categories we track manually when we calculate phonemic error frequency.
One of the main challenges with manual error coding is that it is not always clear how to properly segment utterances and align equivalent segment strings with one another. This problem is particularly evident for speakers who make numerous errors (Haley, Cunningham, Eaton, & Jacks, 2018; Haley, Jacks, & Cunningham, 2013). To demonstrate segmentation challenges, consider the following example (see Table 1). The target is the word octopus (/ɑktəpʊs/). The speaker produces /təbɛn/, and Coder A decides that the production has no phonetic relationship to the target, thereby coding all of its segments as substitutions and the two missing phonemes as omissions, resulting in seven total phonemic errors. Coder B, on the other hand, recognizes the common syllable /tə/, as well as the shared characteristics of /pʊs/ and /bɛn/. In these syllables, the first sounds are bilabial stops, the second sounds are produced with approximately the same tongue height, and the third sounds are alveolar continuants. As a result, Coder B links this production to the target and consequently considers only the final three segments to be substitutions, reducing the total number of errors to five. These instances in which some correspondence can be made between target and production have the potential to create unnecessary discrepancy and introduce ample opportunity for human judgment differences and errors. Whereas algorithms are available to achieve rule-based phoneme alignment based on maximal phonetic similarity (Covington, 1996), a simpler solution is preferable for the basic purpose of estimating phonemic error frequency. Ideally, phonetically trained clinicians and coders should be able to allocate their effort to the transcription process itself and rely on automated processes for segment alignment and error quantification.

Fundamentally, phonetic transcriptions are character strings, and error coding requires that these strings be compared with one another. Because the edit distance is suitable for strings of any form, such as text, numbers, shapes, and item clusters, it has been implemented through many different algorithms and employed extensively in the natural sciences. Applications have also been productive in the area of speech, but primarily within computational linguistics (Gooskens & Heeringa, 2004; Yang & Castro, 2008) and automatic speech recognition (Kruskal & Liberman, 1999; Schluter, Nussbaum-Thom, & Ney, 2011). To answer questions about phonetic variation and similarity, custom weights or costs have usually been assigned to the operations based on phonetic features and prevalence in a referenced speech corpus. With weights appropriate to the sample and purpose, the magnitude of similarity between alternative productions can potentially be expressed with good precision. Despite the demonstrated power of edit distance metrics in other fields, comparatively few applications have been used to analyze speech disorders. A recent study used a dynamic cost model from computational linguistics to answer a question about speech development in a communication disorder (Faes, Gillis, & Gillis, 2016). The research team used a weighted Levenshtein distance to quantify similarity between words produced by children with cochlear implants and a reference standard and then compared the magnitude of this similarity to the corresponding value for age-matched, normally hearing peers to identify differences in the developmental trajectory.

For the purpose of calculating phonemic error frequency, the simpler unweighted Levenshtein edit distance appears to be a more practical choice. The minimum number of addition, omission, and substitution operations that separates two strings of phonemes should, in concept, express frequency of phonemic errors. With this algorithm, alignment of transcribed phonemes would occur as a natural consequence of simply selecting the smallest number of operations for the transformation. Applied to the example
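The claim that alignment falls out of choosing the fewest operations can be illustrated by backtracing the Levenshtein table for the octopus example. This sketch is ours, not the article's procedure: the function name is hypothetical, and normalizing the distance by the number of target segments is our illustrative reading of the ratio described in the abstract.

```python
# Levenshtein distance with backtrace: recover which operations were chosen,
# then normalize by target length to illustrate an edit-distance ratio.
def edit_ops(target, production):
    m, n = len(target), len(production)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if target[i - 1] == production[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1,
                          d[i - 1][j - 1] + sub)
    # Walk back from the bottom-right cell, recording the chosen operations.
    ops, i, j = [], m, n
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1]
                and target[i - 1] == production[j - 1]):
            i, j = i - 1, j - 1  # match: no operation
        elif i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + 1:
            ops.append(("substitute", target[i - 1], production[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            ops.append(("omit", target[i - 1]))
            i -= 1
        else:
            ops.append(("add", production[j - 1]))
            j -= 1
    return d[m][n], list(reversed(ops))

target, production = list("ɑktəpʊs"), list("təbɛn")
distance, ops = edit_ops(target, production)
print(distance, round(distance / len(target), 3))  # 5 0.714
for op in ops:
    print(op)
```

For /ɑktəpʊs/ → /təbɛn/, the recovered operations are two omissions (/ɑ/, /k/) and three substitutions (/p/→/b/, /ʊ/→/ɛ/, /s/→/n/): the same five errors Coder B identified, obtained here with no human alignment judgment. Under the illustrative normalization, 5 errors over 7 target segments gives a ratio of about 0.714, or roughly 71% of segments in error.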