Low-Power Audio Keyword Spotting Using Tsetlin Machines
Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 29 January 2021 | doi:10.20944/preprints202101.0621.v1

Article

Low-Power Audio Keyword Spotting using Tsetlin Machines

Jie Lei 1, Tousif Rahman 1, Rishad Shafik 1,*, Adrian Wheeldon 1, Alex Yakovlev 1, Ole-Christoffer Granmo 2, Fahim Kawsar 3 and Akhil Mathur 3

1 Microsystems Research Group, School of Engineering, Newcastle University, NE1 7RU, UK; [email protected], [email protected], [email protected], [email protected], [email protected]
2 Centre for AI Research (CAIR), University of Agder, Kristiansand, Norway; [email protected]
3 Pervasive Systems Centre, Nokia Bell Labs Cambridge, UK; [email protected], [email protected]
* Correspondence: [email protected]

Abstract: The emergence of Artificial Intelligence (AI) driven Keyword Spotting (KWS) technologies has revolutionized human to machine interaction. Yet, the challenge of end-to-end energy efficiency, memory footprint and system complexity of current Neural Network (NN) powered AI-KWS pipelines has remained ever present. This paper evaluates KWS utilizing a learning automata powered machine learning algorithm called the Tsetlin Machine (TM). Through a significant reduction in parameter requirements and choosing logic over arithmetic based processing, the TM offers new opportunities for low-power KWS while maintaining high learning efficacy. In this paper we explore a TM based KWS pipeline to demonstrate low complexity with a faster rate of convergence compared to NNs. Further, we investigate the scalability with increasing keywords and explore the potential for enabling low-power on-chip KWS.

Keywords: Speech Command; Keyword Spotting; MFCC; Tsetlin Machine; Learning Automata; Pervasive AI; Machine Learning; Artificial Neural Network

Citation: Lei, J.; Rahman, T.; Shafik, R.; Wheeldon, A.; Yakovlev, A.; Granmo, O.-C.; Kawsar, F.; Mathur, A. Low-Power Audio Keyword Spotting using Tsetlin Machines. J. Low Power Electron. Appl. 2021, 1, 0. https://doi.org/

Copyright: © 2021 by the authors. Submitted to J. Low Power Electron. Appl. for possible open access publication under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

1. Introduction

Continued advances in Internet of Things (IoT) and embedded system design have allowed for accelerated progress in artificial intelligence (AI) based applications [1]. AI driven technologies utilizing sensory data have already had a profoundly beneficial impact on society, including those in personalized medical care [2], intelligent wearables [3] as well as disaster prevention and disease control [4].

A major aspect of widespread AI integration into modern living is underpinned by the ability to bridge the human-machine interface, viz. through sound recognition. Current advances in sound classification have allowed for AI to be incorporated into self-driving cars, home assistant devices and aiding those with vision and hearing impairments [5]. One of the core concepts that has enabled these applications is KWS [6]. Selection of specifically chosen keywords narrows the training data volume, thereby allowing the AI to have a more focused functionality [7].

With the given keywords, modern keyword detection based applications are usually reliant on responsive real-time results [8] and, as such, the practicality of transitioning keyword recognition based machine learning to wearable and other smart devices is still dominated by the challenges of the algorithmic complexity of the KWS pipeline, the energy efficiency of the target device and the AI model's learning efficacy.

The algorithmic complexity of KWS stems from the pre-processing requirements of speech activity detection, noise reduction, and subsequent signal processing for audio feature extraction, gradually increasing application and system latency [7]. When considering on-chip processing, the issue of algorithmic complexity driving operational latency may still be inherently present in the AI model [7,9].

AI based speech recognition often offloads computation to a cloud service. However, ensuring real-time responses from such a service requires constant network availability and offers poor return on end-to-end energy efficiency [10]. Dependency on cloud services also leads to issues involving data reliability and, increasingly, user data privacy [11].

Currently, the most commonly used AI methods apply a neural network (NN) based architecture or some derivative of it in KWS [8,9,12,13] (see Section 5.1 for a relevant review). The NN based models employ arithmetically intensive gradient descent computations for fine-tuning feature weights. The adjustment of these weights requires a large number of system-wide parameters, called hyperparameters, to balance the dichotomy between performance and accuracy [14]. Given that these components, as well as their complex controls, are intrinsic to the NN model, energy efficiency has remained challenging [15].
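As a minimal illustration of the arithmetic-intensive updates described above, a single gradient descent step multiplies every gradient by a learning-rate hyperparameter before adjusting the weights. This is our own toy sketch in plain Python, not code from any KWS system:

```python
# Toy sketch of one stochastic gradient descent update, w <- w - lr * dL/dw.
# Every weight adjustment requires floating-point multiply/subtract operations,
# and `lr` is just one of the many hyperparameters the text refers to.

def sgd_step(weights, grads, lr=0.01):
    """Apply one gradient descent update to each weight."""
    return [w - lr * g for w, g in zip(weights, grads)]

weights = [0.5, -0.3, 0.8]
grads = [0.1, -0.2, 0.05]
print(sgd_step(weights, grads))  # each weight nudged against its gradient
```

Even this single step is multiply-accumulate arithmetic; a full NN repeats it across millions of weights per training pass, which is the cost the TM's logic-based processing aims to avoid.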
To enable alternative avenues toward real-time, energy efficient KWS, low-complexity machine learning (ML) solutions should be explored. A different ML model will eliminate the need to focus on issues NN designers currently face, such as optimizing arithmetic operations or automating hyperparameter searches. In doing so, new methodologies can be evaluated against the essential application requirements of energy efficiency and learning efficacy.

The challenge of energy efficiency is often tackled through intelligent hardware-software co-design techniques or a highly customized AI accelerator, the principal goal being to exploit the available resources as much as possible.

To obtain adequate learning efficacy for keyword recognition, the KWS-AI pipeline must be tuned to adapt to speech speed and irregularities, but most crucially it must be able to extract the significant features of the keyword from the time-domain to avoid redundancies that lead to increased latency.

Overall, to effectively transition keyword detection to miniature form factor devices, there must be a conscious design effort in minimizing the latency of the KWS-AI pipeline through algorithmic optimizations and exploration of alternative AI models, development of dedicated hardware accelerators to minimize power consumption, and understanding the relationships between specific audio features and their associated keywords and how they impact learning accuracy.

This paper establishes an analytical and experimental methodology for addressing the design challenges mentioned above. A new automata based learning method called the Tsetlin machine (TM) is evaluated in the KWS-AI design in place of the traditional perceptron based NNs. The TM operates by deriving propositional logic that describes the input features [16].
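This propositional formulation can be sketched in a few lines. The sketch below is an illustrative toy with names of our own choosing, not the authors' implementation: each clause is a conjunction (AND) of Boolean literals, i.e. input features or their negations, and a class is scored by summing positive-polarity clause outputs and subtracting negative-polarity ones:

```python
# Illustrative sketch of Tsetlin Machine inference: clauses are conjunctions
# of Boolean literals, and classification is a majority vote over clauses.

def eval_clause(x, include_pos, include_neg):
    """A clause fires when all included literals hold: x[k] for k in
    include_pos AND (not x[k]) for k in include_neg."""
    return all(x[k] for k in include_pos) and all(not x[k] for k in include_neg)

def class_score(x, pos_clauses, neg_clauses):
    """Positive clauses each add one vote; negative clauses subtract one."""
    score = sum(eval_clause(x, p, n) for p, n in pos_clauses)
    score -= sum(eval_clause(x, p, n) for p, n in neg_clauses)
    return score

# Toy example: one positive clause "x0 AND NOT x1", one negative clause "x1".
x = [True, False, True]
pos = [([0], [1])]
neg = [([1], [])]
print(class_score(x, pos, neg))  # 1
```

Note that inference here reduces to AND, NOT and counting, with no multiplications, which is the logic-over-arithmetic trade the text highlights.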
It has shown great potential over NN based models in delivering energy frugal AI applications while maintaining faster convergence and high learning efficacy [17–19]. Through exploring design optimizations utilizing the TM in the KWS-AI pipeline, we address the following research questions:

• How effective is the TM at solving real-world KWS problems?
• Does the Tsetlin Machine scale well as the KWS problem size is increased?
• How robust is the Tsetlin Machine in the KWS-AI pipeline when dealing with dataset irregularities and overlapping features?

This initial design exploration will uncover the relationships concerning how the Tsetlin Machine's parameters affect the KWS performance, thus enabling further research into energy efficient KWS-TM methodology.

1.1. Contributions

The contributions of this paper are as follows:

• Development of a pipeline for KWS using the TM.
• Using data encoding techniques to control feature granularity in the TM pipeline.
• Exploration of how the Tsetlin Machine's parameters and architectural components can be adjusted to deliver better performance.

1.2. Paper Structure

The rest of the paper is organized as follows: Section 2 offers an introduction to the core building blocks and hyperparameters of the Tsetlin Machine. Through exploring the methods of feature extraction and encoding process blocks, the KWS-TM pipeline is proposed in Section 3.3. We then analyze the effects of manipulating the pipeline hyperparameters in Section 4, which presents the experimental results.
We examine the effects of changing the number of Mel-frequency cepstral coefficients (MFCCs) generated, the granularity of the encoding and the robustness of the pipeline through the impact of acoustically similar keywords. We then apply our understanding of the Tsetlin Machine's attributes to optimize performance and energy expenditure in Section 4.5. Through the related works presented in Section 5.2, we explore the current research progress on AI powered audio recognition applications and offer an in-depth look at the key component functions of the TM. We summarize the major findings in Section 6 and present the direction of future work in Section 7.

2. A Brief Introduction to the Tsetlin Machine

The Tsetlin Machine is a promising new ML algorithm based on the formulation of propositional logic [16].