A Review of Automated Speech and Language Features for Assessment of Cognitive and Thought Disorders Rohit Voleti, Student Member, IEEE, Julie M
Total Page:16
File Type:pdf, Size:1020Kb
IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING (JSTSP), SPECIAL ISSUE 1 A Review of Automated Speech and Language Features for Assessment of Cognitive and Thought Disorders Rohit Voleti, Student Member, IEEE, Julie M. Liss, Visar Berisha, Member, IEEE, Abstract—It is widely accepted that information derived from Initial analyzing speech (the acoustic signal) and language production Prompt (words and sentences) serves as a useful window into the health of an individual’s cognitive ability. In fact, most neuropsychological Record testing batteries have a component related to speech and language Subject where clinicians elicit speech from patients for subjective evalu- Response ation across a broad set of dimensions. With advances in speech signal processing and natural language processing, there has been ASR or Manual recent interest in developing tools to detect more subtle changes Transcription in cognitive-linguistic function. This work relies on extracting a set of features from recorded and transcribed speech for objective assessments of speech and language, early diagnosis of Natural Language Speech Signal neurological disease, and tracking of disease after diagnosis. With Processing Processing an emphasis on cognitive and thought disorders, in this paper Lexical Complexity Pause Rate we provide a review of existing speech and language features used in this domain, discuss their clinical application, and Syntactic Complexity Prosody & Rate highlight their advantages and disadvantages. Broadly speaking, the review is split into two categories: language features based on Semantic Coherence Articulation natural language processing and speech features based on speech signal processing. Within each category, we consider features Machine that aim to measure complementary dimensions of cognitive- Learning linguistics, including language diversity, syntactic complexity, Model semantic coherence, and timing. We conclude the review with a proposal of new research directions to further advance the field. Index Terms—cognitive linguistics, vocal biomarkers, Clincal Alzheimer’s disease, schizophrenia, thought disorders, natural Decision language processing Fig. 1: Overview of the process of using natural language I. INTRODUCTION processing and speech signal processing for extraction of speech and language features for clinical decision-making. ARLY detection of neurodegenerative disease and Example language features include lexical complexity, syntac- episodes of mental illness that impact cognitive function E tic complexity, semantic coherence, etc. Example of acoustic (henceforth, cognitive disorder and thought disorder, respec- arXiv:1906.01157v2 [cs.CL] 5 Nov 2019 speech features include pause rate, prosody, articulation, etc. tively) is a major goal of current research trends in speech and language processing. These afflictions have both significant societal and economic impacts on affected individuals. It is estimated that approximately one in six adults in the United gross domestic product [3]. The ability to identify early signs States lives with some form of mental illness, according and symptoms is critical for the development of interventions to the National Institute of Mental Health (NIMH), totaling to impact progression or episodes. 44.6 million people in 2016 [1]. In the United States alone, Many aspects of cognitive and thought disorders are man- some estimate that the economic burden of mental illness ifest in the way speech is produced and what is said. Irre- is approximately $1 trillion annually [2]. The World Health spective of the underlying disease or condition, the analysis Organization estimates that, in 2015, the global burden of of speech and language can provide insight to the underlying Alzheimer’s disease and dementia equaled 1:1% of the global neural function. This has motivated current research trends in quantitative speech and language analytics, with the hope of R. Voleti is with the School of Electrical, Computer, & Energy Engineering, eventually developing clinically-validated algorithms for better Arizona State University, Tempe, AZ, 85281 USA e-mail: [email protected] Manuscript received 16 May 2019; revised 23 Aug 2019; revised 24 Oct diagnosis, prediction, and characterization of these conditions. 2019; Accepted 26 Oct 2019 This has both long-term and nearer-term potential for impact. J-STSP-AAHD-00183-2019 © 2019 IEEE IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING (JSTSP), SPECIAL ISSUE 2 Category Subcategory Features or Methods Used Cognitive & Thought Disorder(s) Assessed Text-based Lexical features Bag of words vocabulary analysis Semantic dementia (SD) [4]; Alzheimer’s disease (AD) [5] Linguistic Inquiry & Word Count (LIWC) [6] Mild cognitive impairment (MCI) [7], schizophrenia (Sz/Sza) [8]; AD [9] Lexical Diversity (TTR, MATTR, BI, HS, etc.) AD [5], [10], [11]; primary progressive aphasia (PPA) [12], Sz/Sza [13], bipolar disorder (BPD) [13] Lexical Density (content density, idea density, P -density) MCI [14]; AD [5], [11], [9]; PPA [12]; chronic traumatic encephalopathy (CTE) [15]; Sz/Sza [13], BPD [13] Part-of-speech (POS) tagging MCI [14]; PPA [12]; AD [5], [9]; Sz/Sza [16], [17], [18] Syntactical features Constituency-based parse tree scores (Yngve [19], Frazier [20]) AD [5], [9]; MCI [14]; PPA [12] Dependency-based parse tree scores MCI [14] Speech graphs and attributes Sz/Sza [21], [22]; BPD [21], [22]; MCI [23] Semantic features Word & sentence embeddings: - LSA [24] Sz/Sza [25], [16], [17] - Neural word embeddings (word2vec [26], GloVe [27], etc.) Sz/Sza [28], [18], [13]; BPD [13] - Neural sentence embeddings (SIF [29], InferSent [30], etc.) Sz/Sza [18], [13]; BPD [13] Topic modeling: - LDA [31] Sz/Sza [28], [8] - Vector-space topic modeling with neural networks AD [32], [33]; MCI [33] Semantic role labeling [34] Sz/Sza [28] Pragmatics Sentiment analysis Sz/Sza [28] Acoustic Prosodic features Temporal (pause rate, phonation rate, voiced durations, etc.) MCI [14], [35], [36], [37]; AD [35], [38]; Sz/Sza [39] Frontotemporal lobal degeneration (FTLD) [40] Fundamental frequency (F0) and trajectory AD [38]; BPD [41] Loudness and energy AD [38] Emotional content AD [38] Spectral features Formant trajectories (F1, F2, F3, etc.) PPA [42]; AD [5] Spectral centroid [43] AD [38] MFCC statistics [44] PPA [42]; AD [5] Vocal quality Jitter, shimmer, harmonic-to-noise ratio (HNR) PPA [42]; AD [5], [38] ASR-related Phone-level detection of filled pauses & temporal features MCI [36], [37] Improving WER for clinical data AD [45], [9]; neurodegenerative dementia (ND) [46], [47], [48] TABLE I: Summary of work covered in this review, including textual and acoustic features to assess cognitive and thought disorders that are automatically extracted from speech and language samples using NLP and speech signal processing. Note that in the abbreviations in the table, “Sz” refers to schizophrenia and “Sza” refers to the related schizoaffective disorder, which are considered together for simplicity in this summary. In the long term, there is the potential for new diagnostics for clinical applications [2]. Objective analysis of this sort and early intervention for improved treatment outcomes and has the potential to overcome some of the limitations asso- reduced economic burden. In the nearer term, there is the ciated with the current state-of-the-art for improved diagno- potential for improving the efficiency of clinical trials evalu- sis, prediction, and characterization of cognitive and thought ating new drugs. It is generally accepted that early enrollment disorders. A high-level block diagram of existing methods in clinical trials for evaluation of new drugs maximizes the in clinical-speech analytics is shown in Figure 1. Patients chances of showing that a drug is successful [49], [50]. In provide speech samples via a speech elicitation task. This addition, adopting endpoints that are more sensitive to change could be passively collected speech, patient interviews, or means that these studies can be powered with fewer partic- recorded neuropsychological batteries. The resulting speech is ipants. Digital endpoints collected frequently have recently transcribed, using either automatic speech recognition (ASR) garnered interest in this domain [51]. or manual transcription, and a set of speech and language In this review, we limit our focus to speech and language features are extracted that aim to measure different aspects of analysis for cognitive and thought disorders in the context of cognitive-linguistic change. These features become the input neurodegenerative disease and mental illness. In keeping with of a machine learning model that aims to predict a dependent the Diagnostic and Statistical Manual of Mental Disorders variable of interest, e.g. detection of clinical conditions or (DSM-5) [52] classification system, “cognitive disorders” refer assessment of social skills [53], [14], [13]. to disturbances in memory and cognition, whereas “thought Perhaps the most important parts of the analysis frame- disorders” refer to the inability to produce and sustain coherent work in Figure 1 are the analytical methods used to extract communication. Thus, Alzheimer’s disease (the most common clinically-relevant features from the samples. With a focus on form of dementia) is an example of a cognitive disorder, while cognitive and thought disorders, we provide a survey of the schizophrenia is an example of a mental illness that presents existing literature and common speech and language features as a thought