
Buenos Aires – 5 to 9 September, 2016
Acoustics for the 21st Century…

PROCEEDINGS of the 22nd International Congress on Acoustics

Communication Acoustics: Paper ICA2016-187

The advent of Communication Acoustics in retrospect

Jens Blauert

Ruhr-Universität Bochum, Germany, [email protected]

Abstract

Communication Acoustics is a cover label for those aspects of acoustics that involve relations between the classical fields of acoustics and the information and communication technologies. The usage of the term started around 1974, but it took 42 years until it finally became an explicit topic at the International Congress on Acoustics, namely, here in Buenos Aires at the ICA 2016. In the current talk, the history of Communication Acoustics will be recalled, considering the roles of electro-acoustics, auditory perception and audio-signal processing in the course of the development of this field. In this context, two areas of application will be taken as examples to discuss the essence of Communication Acoustics, namely, (a) Virtual-Reality (VR) generation and (b) Computational Auditory-Scene Analysis (CASA), both dealing with parametric representations of auditory scenes. In both of these fields, a trend can be identified towards including more explicit knowledge as well as learning algorithms in Communication-Acoustics systems and their components. For this purpose, proficiency in computational symbol processing is required in terms of scientific craftsmanship, besides pure signal-processing skills.

Keywords: communication acoustics; communication acoustics, definition of; communication acoustics, history of;


1 Introduction

This paper does not present scientific results. It is, in essence, a subjective report by an eyewitness, namely, the author himself, on how he experienced the advent of Communication Acoustics. He had written a PhD thesis and an inaugural thesis on spatial hearing in the 1960s and was, in 1974, appointed professor in Bochum, Germany, with teaching obligations in electrical-field and network theory. When he disclosed to his faculty colleagues that he intended to start a research program in Perceptual Acoustics, they were quite concerned, as they did not accept sensory perception as a topic of scientific research. The relevance of this field for the information technologies was not yet recognized, although this author had already proposed the basic idea of perceptual coding [1] at that time.

It has long been established that Acoustics has two aspects to it, the physical one, see [2], and the perceptual one, see [3]. In fact, the word Acoustics is derived from the ancient Greek word for "to hear" (AKOÝEIN ... ak’u:in). Even so, engineers had strong reservations regarding the perceptual side of it. Thus, neither the term Perceptual Acoustics nor the term Communication Acoustics (sic!) was accepted for a research field in engineering, and we had to settle for Electroacoustics. It took years until this attitude changed and the Institute of Communication Acoustics in Bochum was finally officially established, the first of its kind in those days. Nevertheless, this put this author into the position of recognizing his own professional activities as being flanked by two important milestones of modern acoustics, with Communication Acoustics right in the middle between the two.

2 Milestone #1: Electroacoustics

When this author received his basic university education, most academic teachers in Acoustics were specialized in Electroacoustics, for the reason that Acoustics had by then taken advantage of technologies from electrical engineering. However, this could only happen at a large scale after the independent invention of the vacuum triode by Robert von Lieben and Lee de Forest [4], although important communication-technology-related inventions (telephone, telegraphone, photographophone) had been made much earlier (Fig. 1). Only then was a device finally available for amplifying “weak” currents. This paved the way for developing applications for a broader public, such as radio, television, public-address systems, and many relevant military applications (e.g., radar). Consequently, the adoption of electrical-engineering technologies by classical acoustics (physical acoustics and perceptual acoustics) marks a milestone of modern acoustics and led to an enormous upswing in the field.

Figure 1: Milestone #1 − Electrical engineering joined forces with Acoustics

3 Milestone #2: Communication Acoustics

At the beginning of the 1960s, laboratory computers became available, and it was most likely M. Schroeder at Bell Labs who started to apply them to acoustic-signal processing at a larger scale. After having listened to his famous talk at the Tokyo ICA (Fig. 2), many of us realized that this would shape the future of acoustics. This author, by the way, spent all his Bochum start-up money (about $600,000 in today’s value) on acquiring an 8-bit computer with a one-screen DOS system and 16k (!) floppy disks. For this new field in Acoustics, which developed from an integration of physics, electrical engineering, computers, and perception, the term Communication Acoustics was soon accepted. An operational definition reads as follows: “Communication Acoustics deals with those areas of acoustics which relate to the modern communication and information sciences and technologies.” At least two comprehensive books are meanwhile available in print [5, 6].

Figure 2: Milestone #2 ─ Computers and digital signal processing entered the game

Looking at the essence of communication-acoustics research, a foremost task appears to be the analysis and synthesis of auditory objects and scenes, and their representation in parametric form [7, 8]. This leads to two prominent application areas, namely, computational auditory-scene analysis (CASA) and generation of (so-called) virtual reality (VR). We start here with the discussion of the schematic of a bimodal (audio-tactile) VR generator (Fig. 3). Controlled by a world model, the system renders acoustic and tactile stimuli to the human observer. In doing so, it continuously receives information from position trackers [9] mounted on the head and hand of the observer. This makes the system interactive, which is an important feature, as only interactive systems provide VR; this is what makes them more than just displays. But note that interactivity requires fast processing speeds, ideally processing in almost real time. The world model includes a parametric representation of the space to be generated, typically based on a ray-tracing [10] or image-source model [11], or a combination of the two [12]. The acoustic signals are presented via headphones [13], and for the tactile rendering a special data glove is needed that employs tactile and thermal actuators.

Figure 3: Schematic of a bimodal VR generator [8]

Many applications of virtual-reality generators aim at exposing the observers to situations in such a way that they feel perceptually “present” in them. This is especially important for scenarios in which the observers are supposed to act intuitively, as they would do in a corresponding real environment. Human–system interfaces which are based on the principle of virtual reality have the potential to simplify human–system interaction considerably. One may think of teleoperation systems, design systems and dialog systems in this context, and also of computer games. The effort involved in creating perceptual presence is task-dependent and also depends on user requirements. For example, for vehicle simulators the perceptual requirements are less stringent than for virtual control rooms for engineers. In general, virtual reality must appear sufficiently “plausible” to the observer in order to provide perceptual presence. Since VR systems are just about to enter the consumer market, various solutions to this problem can be expected.
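The image-source model [11] mentioned above as one basis for the parametric room representation can be illustrated with a minimal sketch. It is not the implementation used in the VR generator of Fig. 3; it assumes a rectangular ("shoebox") room with one corner at the origin, considers first-order reflections only, and uses a single frequency-independent reflection coefficient. All function names and numerical values are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 degrees Celsius


def first_order_image_sources(source, room):
    """Mirror a point source at each of the six walls of a shoebox room.

    source: (x, y, z) position in metres, inside the room
    room:   (Lx, Ly, Lz) room dimensions in metres, one corner at the origin
    """
    images = []
    for axis in range(3):
        for wall in (0.0, room[axis]):
            image = list(source)
            image[axis] = 2.0 * wall - source[axis]  # reflection at the wall plane
            images.append(tuple(image))
    return images


def arrival(source_pos, listener, reflection_coeff=1.0):
    """Return (delay in seconds, amplitude) for one spherical-wave path."""
    distance = math.dist(source_pos, listener)
    return distance / SPEED_OF_SOUND, reflection_coeff / distance


def early_echogram(source, listener, room, reflection_coeff=0.8):
    """Direct sound plus six first-order reflections, sorted by delay."""
    paths = [arrival(source, listener)]
    for image in first_order_image_sources(source, room):
        paths.append(arrival(image, listener, reflection_coeff))
    return sorted(paths)
```

Calling `early_echogram((1.0, 2.0, 1.5), (3.0, 2.0, 1.5), (5.0, 4.0, 3.0))` yields seven (delay, amplitude) pairs. In an interactive system such a list would have to be recomputed whenever the position trackers report a new listener position, which is one reason why processing speed matters.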

The schematic shown in Fig. 3 houses, as its core, a “world model”. This is basically a repository that contains detailed descriptions of the space and of all objects which are to exist in the virtual reality. In one layer of the world model, termed application, rules are listed which regulate the interaction of the virtual objects with respect to the specific applications intended. Further, a central-control layer collects the reactions of the subjects who use the virtual reality interactively and prompts the system to render appropriate responses. It goes without saying that, in order to render suitable stimuli to the human observer, the system has to make decisions based on the actual situation and the tasks assigned to it. Depending on the tasks, this requires specific world knowledge and/or capabilities for autonomous learning in order to enable suitable cognitive functions. Indeed, current implementations of such systems can be distinguished by the level of intelligence and knowledge that they are furnished with.

We shall now discuss the second representative application area of Communication Acoustics as announced above, namely, systems for computational auditory-scene analysis (CASA). As an example, we take a system whose architecture was originally conceptualized in Bochum [14, 15] and, after substantial refinement, was adopted by both the AABBA initiative [16] and the EU project “Two!Ears” <www.twoears.eu>. Fig. 4 provides a block diagram of it.

Figure 4: Architecture of an advanced CASA system [15]


(A) The input signals to the system are given by the two ear signals as recorded from a dummy head (head-and-torso simulator). The head is mounted on a robotic platform and is movable in 3 degrees of freedom (2 translatory ones plus head rotation).
(B) Filters that mimic the middle ears (sloppy band-pass filters) follow.
(C) The next processing step consists of a simulation of the two cochleae (spectral decomposition in critical bands, compression, generation of neural spikes and their probability of appearance as a function of time).
(D) The outputs of the two cochlea modules enter a binaural processor that computes binaural activity, providing information on binaural attributes, such as loudness, pitch, interaural arrival-time differences (ITDs) and interaural level differences (ILDs). Monaural processing is considered in parallel.
(E) Here the output of module (D) is visualized as a time-variant binaural-activity map with the coordinates time, laterality, and intensity. There is evidence that similar representations exist in biological systems.
(F) The information from the binaural-activity map is analyzed in order to extract relevant features from it that are suited for the further analysis. Since Gestalt rules are considered in this process, this stage is labelled “Gestalt experts”. Please note that here a transition from signal processing to symbol processing takes place, since features are denoted by labels.
(G) From the feature sets rendered by stage (F), “proto-events” are formed by applying appropriate rule sets and machine-learning procedures. The output of this stage is evidence for specific events being identified, including confidence data on whether they actually occur. This stage is thus denoted “event experts”.
(H) All information available from lower stages on proto-events, the features that characterize them, and the respective confidence intervals is stored on a “blackboard”. The blackboard consists of various graphical models, in which these items and their mutual probabilistic relationships are stored.
(I) The blackboard is not only accessible from the lower model stages, but also from a stage on top of it that consists of a set of expert programs, each of which is knowledgeable with regard to specific scenes and/or tasks, for instance, search-and-rescue scenarios, or the assessment of quality of aural experience in multi-channel loudspeaker settings. The experts not only have specific world knowledge but also know the rules which govern their specific scenes and tasks. Under the control of a scheduler program (so to speak, the chairperson of the experts), the experts check the information available on the blackboard, try to make sense of it, and write their inferences back into it.
(K) Finally, the blackboard puts out a task-specific response, for example, a scene description or a quality judgement.
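To make stage (D) more tangible, the following sketch estimates one ITD and one ILD from a pair of ear signals by means of interaural cross-correlation and an energy ratio. It is a strongly simplified illustration and not the binaural processor of the system described: it works on the broadband signals rather than per critical band after a cochlea simulation, and the function name, the parameter `max_lag` and the sign convention are assumptions made for this example.

```python
import math


def estimate_itd_ild(left, right, sample_rate, max_lag):
    """Estimate ITD (seconds) and ILD (dB) from two equally long ear signals.

    Sign convention (an assumption of this sketch): a positive ITD means the
    sound reaches the left ear first; a positive ILD means the left ear is louder.
    """
    n = len(left)
    best_lag, best_value = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        # Correlate left[i] with right[i + lag]; a peak at lag > 0 means the
        # right-ear signal is a delayed copy of the left-ear signal.
        value = sum(left[i] * right[i + lag]
                    for i in range(max(0, -lag), min(n, n - lag)))
        if value > best_value:
            best_lag, best_value = lag, value
    energy_left = sum(x * x for x in left)
    energy_right = sum(x * x for x in right)
    ild_db = 10.0 * math.log10(energy_left / energy_right)
    return best_lag / sample_rate, ild_db
```

Evaluating such estimates in short successive time frames, and plotting the full correlation function instead of only its maximum, leads towards the kind of time-variant binaural-activity map described in stage (E).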


An important property of the architecture plotted in Fig. 4 is that it allows for feedback loops in the course of processing. We know that feedback also exists in biological systems. The following feedback loops are currently discussed. Some of them are cognitively controlled [15].

‒ To improve localization accuracy, head movements are performed, properly controlled by mimicking human strategies when exploring aural scenes.
‒ Feedback is used to change processing parameters in stages B to D, like adjusting auditory-filter bandwidths, changing spectral weights in combining information across filters, adjustment of operating points of temporal adaptation processes, or providing additional information to support auditory-stream segregation.
‒ On the “cognitive” level of our model, feedback can be integrated by treating the graphical models as an active blackboard architecture. Higher-level processes in application-specific subsystems, such as an expert on scene analysis, can set variables according to their specific intentions, and after an inference in the graphical model has been carried out accordingly, it will be visible how higher-level feedback corresponds with the rules and observations of the system, and what implications can be drawn from it.
‒ Feedback has to be employed in sound-quality assessment, where auditory percepts are to be compared to internal references that represent listeners’ preferences [16, 17, 18].
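The interplay of blackboard, experts and scheduler, including feedback triggered from the cognitive level, can be sketched in a few lines. The sketch is deliberately naive: the blackboard of the system described consists of graphical models with probabilistic relationships, whereas here it is reduced to a plain dictionary of labels and confidence values. The two experts, their rules and all labels are hypothetical and serve only to show the control flow.

```python
class Blackboard:
    """Shared store of hypotheses, each with a confidence between 0 and 1."""

    def __init__(self):
        self.entries = {}

    def post(self, label, confidence):
        self.entries[label] = confidence


def speech_expert(board):
    """Hypothetical event expert: voiced and modulated sound suggests speech."""
    needed = ("voiced", "modulated")
    if "speech" not in board.entries and all(k in board.entries for k in needed):
        board.post("speech", min(board.entries[k] for k in needed))
        return True
    return False


def rescue_expert(board):
    """Hypothetical task expert: confident speech in a search-and-rescue task
    triggers feedback, here a request to turn the head towards the talker."""
    if "turn_head" not in board.entries and board.entries.get("speech", 0.0) > 0.5:
        board.post("turn_head", board.entries["speech"])
        return True
    return False


def run_scheduler(board, experts, max_rounds=10):
    """Let the experts take turns until none of them adds anything new."""
    for _ in range(max_rounds):
        changed = [expert(board) for expert in experts]
        if not any(changed):
            break
    return board.entries
```

After posting the features `voiced` (0.9) and `modulated` (0.7) and running the scheduler with both experts, the blackboard additionally holds the hypothesis `speech` and the feedback request `turn_head`, regardless of the order in which the experts are listed.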

While former binaural models for scene analysis consisted of modules corresponding to the stages A to E, leaving it to human experts to further analyse and judge on the basis of the binaural-activity maps [8], the approach shown in Fig. 4 includes cognitive processing. The goal is eventually to replace the human expert. Although it has to be admitted that the realization of the cognitive part is still in its early stage, it is clear that the new approach contains elements that reach beyond the scope of today’s Communication Acoustics, and thus marks a further milestone of this field.

4 Milestone #3: Communication Acoustics becomes cognitive!

From what has been discussed above, it becomes clear that Communication Acoustics is about to break the limits of acoustic-signal processing. New topics are joining the traditional mix of physics, sensory perception, electrical engineering, and acoustic-signal processing. In the first place, the systems get more intelligent. This means that they are now equipped with cognitive functions, or in other words, “brains” are implemented on them. Referring to our two examples, namely, VR and CASA, the good news is that different systems may share knowledge, for instance, as regards ground-truth data. Fig. 6 illustrates that in the CASA system such ground truth, for instance, regarding auditory scenes, is embedded in the experts, while in the VR generator it is part of the world model.


Figure 6: Milestone #3: Communication Acoustics becomes cognitive! (a) CASA system, cognitive part, (b) VR generator

A further issue is that systems with inherent knowledge need to be able to adapt their knowledge according to the specific situations and tasks at stake. Machine-learning techniques [19] are increasingly applied for this purpose. Fortunately, cognitive functions as well as learning algorithms are basically independent of specific sensory modalities. Further, starting from the assumption that human beings form their perceptual world in an active explorative process and use all their senses to acquire information for this purpose, it goes without saying that advanced CASA systems have to consider multi- and cross-modal information. This strongly supports a current demand in technology, that is, the development of multimodal and multi-media applications.

This third milestone is rather an ongoing process and can be associated with a general trend in the information and communication technologies. Actually, in some sub-areas of Communication Acoustics it has already been passed.

For example, modern speech-recognition systems incorporate information such as domain knowledge, semantic networks, language models, word models, grammatical, syntactic, phonotactic and phonetic models, represented in the form of rules, fuzzy logic, transition probabilities, look-up tables, dictionaries, and so on. It was probably also in speech technology that it first became obvious that, for the task of speech recognition, bottom-up (signal-driven) processing does not suffice but has to be complemented by top-down (hypothesis-driven) procedures and, consequently, by the modelling of functions which are located more centrally in the human nervous system. An important task in this context is the collection of data and the training of the intelligent systems. It is not far from the mark to expect the upcoming Big-Data technologies to initiate a major boost in this respect.
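How top-down knowledge can complement bottom-up evidence may be illustrated with a toy rescoring step in which signal-driven acoustic scores are combined with a hypothesis-driven language-model prior. This is a generic textbook scheme (a product of likelihood and prior, evaluated in the log domain) and is not taken from any particular recognizer; the function name, the weight `lm_weight` and all probabilities in the usage note are invented for illustration.

```python
import math


def rescore(acoustic_scores, language_model, lm_weight=1.0):
    """Combine bottom-up and top-down evidence in the log domain.

    acoustic_scores: hypothesis -> P(signal | hypothesis), signal-driven evidence
    language_model:  hypothesis -> P(hypothesis | context), hypothesis-driven prior
    Returns the best hypothesis and the combined log scores.
    """
    combined = {
        hyp: math.log(p_acoustic) + lm_weight * math.log(language_model[hyp])
        for hyp, p_acoustic in acoustic_scores.items()
    }
    return max(combined, key=combined.get), combined
```

With the invented acoustic scores `{"wreck a nice beach": 0.55, "recognize speech": 0.45}` and the invented priors `{"wreck a nice beach": 0.01, "recognize speech": 0.30}`, the signal alone favours the first hypothesis, while the combined score favours the second.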

A further major challenge for CASA is the assignment of meaning to auditory objects [20, 21]. Human beings do not react according to what they perceive, but rather, they react on the grounds of what their percepts mean to them in their current action-specific, emotional and cognitive situation. Again, it is to be expected that the Big-Data technologies will contribute substantially to tasks of this kind, that is, assigning meaning to objects and scenes.

5 Conclusions

Modern information, communication and control systems frequently contain components which deal with the analysis and synthesis of auditory scenes, whereby these components are commonly "embedded" in more complex systems. To evaluate their function in isolation is often impossible or leads to irrelevant results. Thus, sophisticated test-beds are needed for this purpose. In any case, Communication Acoustics represents an integral constituent of the modern information technologies and should be seen and rated in the context of these.

Taking these current trends into account, it is obvious that Communication Acoustics will have to open the doors for new relevant topics, such as cognition, multimodality, and interactivity. It certainly would not survive as a stringently bounded discipline. This also means that, just 42 years after the introduction of the term Communication Acoustics, its meaning will change considerably towards a broader concept. Students who aim at working their way into Communication Acoustics are strongly advised to acquire skills not only in signal processing but also in symbol processing. Further, besides acoustics, electrical engineering and perceptual acoustics, they should keep an open eye on machine learning, cognitive psychology, cognitive physiology and, last but not least, on modern trends in robotics.

Acknowledgments

The author thanks his former PhD students; for a complete list see [5]. The compilation of this paper was supported by the EU project “Two!Ears” (FP7-ICT-2013-C-#618075).

References

[1] Blauert, J., Trittart, P., (1975), Ausnutzung von Verdeckungseffekten bei der Sprachkodierung (Exploiting masking in speech coding), Fortschr. Akustik, DAGA'75, 377−380, Physiker-Verlag, Weinheim, Germany

[2] Lord Rayleigh (J.W. Strutt) (1869, 1877) The theory of sound, Vols. 1, 2. MacMillan, New York

[3] Von Helmholtz, H. (1863) Die Lehre von den Tonempfindungen als physiologische Grundlage für die Theorie der Musik (Sensation of tone as a physiological basis for the theory of music). Vieweg und Sohn, Braunschweig, Germany

[4] Bosch, B. (2001) Lee de Forest – “Vater des Radios” (Lee de Forest – “father of radio”). Funk Gesch. 24:5–22 and 24:57–73


[5] Blauert, J. (ed.) (2005), Communication Acoustics, Springer, Berlin−Heidelberg

[6] Pulkki, V., Karjalainen, M. (2015) Communication Acoustics: An introduction to speech, audio and psychoacoustics, Wiley, Hoboken NJ

[7] Blauert, J. (2002) Instrumental analysis and synthesis of auditory scenes: “Communication Acoustics”, Proc. 22nd Int. Conf. Audio Engr. Soc. Virtual Synthesis, Entertainment and Audio, 387–395, Audio Engr. Soc., New York NY

[8] Blauert, J. (2005), Analysis and synthesis of auditory scenes, in: J. Blauert (ed.), Communication Acoustics, 1−26, Springer, Berlin−Heidelberg

[9] Börger, G., Blauert, J., Laws, P. (1977) Stereophone Kopfhörerwiedergabe mit Steuerung bestimmter Übertragungsfaktoren durch Kopfdrehbewegungen (Stereophonic headphone reproduction with variations of specific transfer factors by head rotations). Acustica 39:22–26

[10] Krokstad, A., Strøm, S., Sørsdal, S. (1968) Calculating the acoustical room response by the use of a ray-tracing technique. J. Sound Vibr. 8:118–125

[11] Allen, J. B., Berkley, D. A. (1979) Image method for efficiently simulating small-room acoustics, J. Acoust. Soc. Am. 65, 943−950

[12] Lehnert, H. (1992) Binaurale Raumsimulation: Ein Computermodell zur Erzeugung virtueller auditiver Umgebungen (A computer model for the generation of auditory virtual environments). Doct. diss., Ruhr-Univ. Bochum, Shaker, Germany

[13] Hammershøi, D., Møller, H. (2005) Binaural technique: Basic methods for recording, synthesis and reproduction, Chap. 9 in: Blauert, J. (ed.), Communication Acoustics, Springer, Berlin−Heidelberg−New York NY

[14] Blauert, J. (1999) Binaural auditory models: architectural considerations. Proc. 18th Danavox Symp. 189–206. Scanticon, Kolding, Denmark

[15] Blauert, J. and Obermayer, K. (2012), Rückkopplungswege in Binauralmodellen (Feedback loops in binaural models), Fortschr. Akust. DAGA’12, 2015–2016, Dtsch. Ges. Akustik, Berlin, Germany

[16] Blauert, J., Braasch, J., Buchholz, J., Colburn, H.S., Jekosch, U., Kohlrausch, A., Mourjopoulos, J., Pulkki, V. and Raake, A. (2010), Aural assessment by means of binaural algorithms – the AABBA project. In: Buchholz, J.M., Dau, T., Dalsgaard, J.C. & Poulsen, T. (eds.) Binaural Processing and Spatial Hearing, Proc. 2nd Int. Symp. Auditory & Audiolog. Res. − ISAAR’09, 113–124, Danavox Jubilee Foundation, Ballerup, Denmark

[17] Raake, A., Wierstorf, H., Blauert, J. (2014), A case for Two!Ears in audio-quality assessment. Proc. 7th Forum Acusticum, Paper SS16-19, Kraków, Poland

[18] Raake, A., Blauert, J. (2013), Comprehensive modeling of the formation process of sound-quality. Proc. QoMEX 2013. Klagenfurt, Austria

[19] Blauert, J., Kolossa, D., Obermayer, K., & Adiloglu, K. (2013) Further challenges and the road ahead. In: J. Blauert (ed.), The technology of binaural listening, 477–502. Springer, Berlin–Heidelberg–New York–Dordrecht–London, and ASA Press, New York NY

[20] Jekosch, U. (2005) Assigning meaning to sounds – Semiotics in the context of product-sound design. In: Blauert, J. (ed.), Communication Acoustics, 193−221, Springer, Berlin−Heidelberg

[21] Jekosch, U. (1999) Meaning in the context of sound quality, Acta Acustica united with Acustica 85:681−684
