Reflections on Artificial Intelligence, Speech Technology
Total Page:16
File Type:pdf, Size:1020Kb
Proceedings from FONETIK 2019 Stockholm, June 10–12, 2019 Reflections on artificial intelligence, speech technology, brain imaging and phonetics Björn Lindblom Phonetics laboratory, Stockholm University, Sweden [email protected] Whither speech research? his blackboard: ‘What I cannot create I Broadly defined, the field of spoken lan- do not understand!’ (Gleick 1993). Let guage research is undergoing rapid us interpret this quote as a requirement change as a result of several factors. to be met by all good science. Let us call Computer technology can be used to it the Feynman criterion. If we apply it simulate interactive human speech com- to speech research, what do we find? munication in spectacularly diverse and First let us consider some recent realistic ways. achievements in speech technology. Since the 90’s sophisticated meth- On an almost daily basis, we hear ods have become increasingly available about new, often spectacular, artificial for imaging the real-time processes that intelligence (AI) developments. For in- take place in the brain during speaking, stance, some car models now have auto- listening and learning to speak. pilots. Medical diagnoses and treatments It would seem that the present situa- can be drastically improved thanks to tion offers speech research yet another vast data sets and clever methods that opportunity to expand its experimental detect patterns in the data that escape toolbox and orient itself towards neuro- even the most experienced expert. Fur- science with a view to acquiring a more thermore, law breakers get more easily profound understanding of the many caught with the aid of clever facial forms of language use under normal as recognition algorithms. well as disordered conditions. Already And then there is AlphaGo - a pro- many investigators have done so and gram that has beaten the world’s top hu- have contributed significant new man players of Go - a much more com- knowledge (Guenther 2016). plex game than chess (Mozur 2017). Its Our field presents a diverse mosaic training consisted in learning not only of research interests and special exper- from records of human games but from tise. Has time come for unification? discovering, on its own, winning and How could such interdisciplinary inte- losing strategies by playing innumerable gration be brought about? times against itself. The subject matter of a discipline The list of potential benefits of AI – tends to defined by the questions it asks, and disadvantages - is long (Foer 2018, but there are of course other factors such Friedman 2019). as tradition, ineffective administration Part of the story is AI-based speech and obsolete educational programs that technology. create obstacles. Recently we learned that attempts to Those are some of the questions and improve communication for paralyzed issues that need to be urgently addressed patients are under way and show consid- and dealt with today. erable promise. One such project (Akbar et al 2019) used a Brain Computer Inter- Speech technology and AI face technique to reconstruct input There is an anecdote about Richard speech signals from a population of Feynman, the eminent physicist, who evoked neural activity in the human au- had the following sentence written on ditory cortex. 139 Proceedings from FONETIK 2019 Stockholm, June 10–12, 2019 The investigators used a deep neural deep neural networks learn to a great ex- network to make direct estimates of tent on their own by accumulating and speech synthesizer parameters. This comparing a massive number of exam- model achieved high subjective and ob- ples of successes and failures –in part jective intelligibility scores on a digit found in records of human data, in part recognition allowing the authors to con- from information generated by the algo- clude that reconstructing speech from rithm itself. the human auditory cortex offers a Such results take steps towards re- speech neuroprosthetic for direct com- moving a major hurdle in the modeling munication with a paralyzed patient’s of human cognition. This hurdle is cre- brain. The technique is said to work un- ated by the fact that humans ‘know more der both overt and covert conditions. than they can tell’ - Polanyi’s Paradox. If this becomes a characteristic also of The IBM debater project (2019) computers, it would imply a major This project has produced a computer breakthrough for AI. Machines would program that recently participated in a then have become even more human- debate on ‘whether preschools should be like. Are we already there? subsidized’. Its performance was based There is an extremely serious down- on fifteen minutes of preparation, devel- side to these developments. ‘Not know- oping an argument in favor of subsidies, ing exactly what deep learning algo- producing a text and then giving an oral rithms do’ is problematic. It feeds right presentation of it. A large human audi- into people’s worst fears about future ence attended the event and was found to computers evolving to surpass and dom- prefer the human participant, but the inate their human masters. simulated debater came close to win- The ‘secrecy’ of the modeling ning. should also disqualify them as scientific Listening to this AI debater one is tools. They are unquestionably great im- struck by its ability to ‘reason’ and pro- itators. But imitation is hardly enough. If duce fluent and natural sounding speech these frameworks do not allow us to (see references). The big news is not that study the observed behavior, what use the human debater had won the debate. would they be to linguists, phoneticians, It is that the IBM debater had given its speech pathologists and communication human competitor a good run for his engineers? money – despite the complexity of the Instead of helping us understand our task. I cannot blame those who respond own brains, AI projects would create an by thinking that these developments are additional problem perhaps even more nothing short of extraordinary. I agree. forbidding: Understanding artificial But we have to ask: Where do we brains. place them within general science? What are the implications for research on Explanation: Holy Grail of research speech and language learning? Exactly Reasoning from first principles goes how does the debater do it? It meets the back to Aristotle. We can illustrate this Feynman criterion, but is imitation time-honored method by taking a look at enough? the acoustic theory of speech production AI people admit that they do not (Fant 1960, Stevens 1998). know exactly how these systems work At first one might assume that the (McAfee & Brynjolfsson 2016). But term ‘formant’ refers to an empirical no- they know enough to attribute the suc- tion derived from observations and anal- cess of AlphaGo (and other projects) to yses of large corpora of recorded speech, the use of deep learning networks. These but it is not. Anyone who has struggled models make it possible to abandon with making formant frequency meas- what has been used so far, viz. explicitly urements from short-term spectra and specified, computational rules. Instead, spectrograms knows that voice quality, 140 Proceedings from FONETIK 2019 Stockholm, June 10–12, 2019 nasalization and a high fundamental fre- human speech. Rather they hide what quency often make that exercise diffi- they do. cult. Those and other factors also cause Consequently they leave a big piece errors for automatic methods such as of the scientific task undone and might LPC whose results must be checked force upon us the epiphenomenal task of manually for accuracy. It is true that in- understanding - not only our own brains verse filtering can provide solutions to - but also artificial brains and their such problems – provided that you know deeply embedded organization. what the shape of the residue glottal pulse is supposed to look like. Neurobiology of speech We love formants, but they can be Students of language and speech are not very elusive! the only ones trying to get used to new The ‘formant’ is an independently technology. Neuroscience - now attract- motivated concept deduced from phys- ing our attention - is itself in fact a cur- ics. It is not a product of a data-driven rent instructive example: The Human approach. Rather the acoustic theory of Brain Project (HBP) is a gigantic EU- speech rests on the following first prin- sponsored effort in which some 500 sci- ciples (Beranek 1954): Newton’s second entists from over 100 universities partic- law, Boyle’s law and the Conservation of ipate (2012). mass. They provide the foundation for To motivate such a large project, the the wave equation that can be solved for applicants argue that time has come to arbitrary vocal tract configurations bring medicine, neuroscience and com- (Flanagan 1965). When the vocal tract puting together in response to the great- shape is known its resonances (for- est challenge of the 21st century: the un- mants) can be identified and their fre- derstanding of the human brain. Their quencies can be calculated. application underscores that, despite sig- When someone like me (with a lib- nificant progress neuroscience in recent eral arts education) invokes first princi- years, the field still suffers from frag- ples, it is not an expression of ‘physics mentation. envy’, nor a wish to reduce everything to To open the door to new treatments physics. of brain diseases – more than 500 have [I would perhaps be willing to plead been identified so far – the application guilty of ‘science envy’ because science states that it will be necessary to show is undoubtedly a good thing to be striv- how the ‘parts fit together in a single ing towards]. multi-level system’. A two-way process There are no absolute explanations, is envisioned. New computing technolo- only deeper and deeper accounts. Indi- gies will be discovered as more is vidual fields choose their explanans learned about the brain, and conversely, principles at different levels.