Date: Tue, 17 Jun 1997 11:17:59 +0200 (MET DST)

Mobile Intelligent MultiMedia

Research in the field of new interactive and "intelligent" systems has been developed in a number of R&D laboratories and universities around the world. This paper reviews recent projects undertaken at one of the leading European research centers. By Paul Mc Kevitt, Paul Dalsgaard, and Jorgen Bach Andersen

Keywords: artificial intelligence, Intelligent MultiMedia, IntelliMedia 2000+, IntelliMedia, Mobile computing, Natural Language Processing, NLP, spoken dialogue systems, SuperinformationhighwayS

1. Introduction

The area of MultiMedia is growing rapidly and has various meanings from various points of view. MultiMedia can be separated into at least two areas: (1) (traditional) MultiMedia and (2) Intelligent MultiMedia (IntelliMedia). The former focuses on the display or optimal presentation of text, voice, sound and video/graphics, possibly with touch and virtual reality linked in. However, the computer is not involved in any "intelligent" interpretation of the meaning of what it is presenting. It is not possible for the user to ask questions about the content of what the computer is presenting, because the computer usually has no description of what is in a given video, speech or sound.

IntelliMedia, which involves the computer processing and interpreting perceptual input from speech, text and visual images in terms of its semantics and reacting to it, is much more complex. It requires integrating research from Engineering, Computer Science, Artificial Intelligence and Cognitive Science. This is the newest area of MultiMedia research; it has seen an upsurge over the last few years, and it is one where most universities do not yet have much expertise (see Mc Kevitt 1995, 1996, 1997). With IntelliMedia systems, people can interact in spoken dialogues with machines, querying the semantics of what is being presented, and even their gestures and body language can be interpreted.

The mobile computing aspects of IntelliMedia are particularly relevant. Mobility enables users to interact with perceptual speech and image data at remote sites, with that data integrated and processed at some central source and the results relayed back to the user. The increase in bandwidth for wired and wireless networks and the proliferation of hand-held devices (e.g. the NOKIA 9000 communicator) (footnote: NOKIA 9000 communicator is a trademark of NOKIA) and computers (see Bruegge and Bennington 1996, Rudnicky et al. 1996, Smailagic and Siewiorek 1996) bring this possibility even closer, even with today's separation of information interchange into different data and voice channels. Future introduction of information exchange over a common data/voice channel is expected to further

extend mobile computing, and we will undoubtedly see the emergence of totally new applications with impressive performance.

Applications of mobile IntelliMedia are numerous, including data fusion during emergencies, remote maintenance, remote medical assistance, distance teaching and internet web browsing. One can imagine mobile offices where one can transfer money to and from one's bank account and order goods and tickets, even while cruising in a car. Our application research demonstrator is a system which provides information on building usage and on which people are located in which offices. People interact with the system through spoken dialogue, pointing at plans (or a model) of a building and asking questions like "Who's in this office?" A wireless mobile version of the demonstrator is one where the user has a wearable computer and head-mounted display and becomes immersed in the context by walking around the building itself. There are also applications within virtual reality, and a feel for these is given in IEEE Spectrum (1997). SuperinformationhighwayS, which have massive stores of information in MultiMedia forms, require more intelligent means of information retrieval, where "less" means "more", through spoken dialogue and other methods (see Mc Kevitt 1997).

The recent proliferation of conferences on wireless technologies and on mobile MultiMedia, computing and networking indicates the surge of interest in this area. There has been rapid convergence of computing and telecommunications technologies in the past few years (see IEEE Spectrum 1996). Although there has been great interest in the communication of traditional MultiMedia over wired and wireless networks, less attention has been given to Intelligent MultiMedia, which requires greater bandwidth.

All over the world there is activity concerning wideband wireless networks. In Europe it is dominated by UMTS (Universal Mobile Telecommunication Services), and the trend is towards wide-area coverage on the order of 150 kb/s, and up to 2 Mb/s with less mobility in restricted areas such as a building. These activities would take place in the 2 GHz region, where some bandwidth could be available. Other proposals discuss the use of wideband CDMA (Code Division Multiple Access) with data rates in the same range as UMTS. When much larger bandwidths and rates are discussed, such as several hundreds of Mb/s, it is necessary to go to higher carrier frequencies in the millimeter range, where propagation coverage is more limited and where there are some economic constraints for the moment. There is no doubt, however, that at the beginning of the next century we shall see a major bandwidth expansion for mobility, initially in limited areas. The bandwidths mentioned above should be sufficient for the first generation of systems (see Roehle 1997).
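
To put these rates in perspective, a rough back-of-envelope calculation of idealised transfer times can be sketched as follows. The payload size is an invented example, and protocol overhead, retransmissions and channel errors are ignored.

```python
# Rough, idealised transfer-time estimates for the access rates mentioned
# above. The 1 MB payload is an invented example; protocol overhead,
# retransmissions and channel errors are ignored.

def transfer_time_s(payload_bits, rate_bps):
    """Time in seconds to move payload_bits over a link of rate_bps."""
    return payload_bits / rate_bps

payload = 8 * 1_000_000  # a hypothetical 1 MB image, in bits

for label, rate_bps in [("UMTS wide-area, 150 kb/s", 150_000),
                        ("UMTS in-building, 2 Mb/s", 2_000_000)]:
    print(f"{label}: {transfer_time_s(payload, rate_bps):.1f} s")
```

Even this crude estimate shows why image-heavy IntelliMedia interaction is far more comfortable at the in-building rates than over the wide-area channel.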

2. IntelliMedia 2000+

Speech is the most natural means of communication between humans, and also between humans and computers. Speech (and other input modalities) has to be processed up to the semantic level in order for the computer to "understand" the intended interaction of the human user: the system must intelligently resolve ambiguities and application-oriented questions by performing grammatical analyses and consulting a database which can support decisions. Future systems will undoubtedly also be expected to be available anywhere and at any time. Continuous speech recognition application demonstrators that can recognise, understand and react upon specific and limited vocabularies have been available for some time now (see Fraser and Dalsgaard 1996, Larsen 1996).

CPK is a research centre which is financially supported by the Danish Technical Research Council and the Faculty of Science and Technology at Aalborg University (see http://www.cpk.auc.dk/CPK). Four relevant research groups exist within this faculty in the Institute of Electronic Systems, each of them covering expertise which together is necessary for building up intelligent MultiMedia systems. The four research groups are Computer Science (CS), Medical Informatics (MI), Laboratory of Image

Analysis (LIA) and Center for PersonKommunikation (CPK), each of them contributing knowledge to platforms for specification, learning, integration and interactive applications, expert systems and decision taking, image/vision processing, and spoken language processing/sound localisation.

The functionality of our research and software platforms is designed with a focus on the specification of new MultiMedia/MultiModal applications, not necessarily by engineering experts but rather by those having a professional background in an application domain. For example, a travel agent would be able to use our platforms to build a spoken dialogue system for answering questions about flight information without needing to know the ins and outs of the platform itself. To that end a generic MultiMedia specification and design platform has been implemented (see Baekgaard 1994, Dalsgaard and Baekgaard 1994) and tested in a number of applications which were operated by spoken input only. This platform is now being redesigned into a second-generation MultiMedia/MultiModal platform to be used for building a number of application domain demonstrators.
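
The idea that a domain expert specifies the dialogue declaratively while the platform supplies the control machinery can be sketched in miniature. This is an illustration only: the state names, slots and prompts below are invented, and the actual CPK specification platform is far richer than this toy interpreter.

```python
# A toy declarative dialogue specification plus a generic control loop.
# The domain expert edits only the table; the platform code never changes.
# All names here are hypothetical, not the CPK platform's actual language.

FLIGHT_DIALOGUE = {
    "start":    {"prompt": "Where do you want to fly to?",
                 "slot": "destination", "next": "get_date"},
    "get_date": {"prompt": "On which date?",
                 "slot": "date", "next": "confirm"},
    "confirm":  {"prompt": "Booking {destination} on {date}. OK?",
                 "slot": None, "next": None},
}

def run_dialogue(spec, answers):
    """Drive the dialogue spec using a canned list of user answers."""
    slots, state, transcript = {}, "start", []
    answers = iter(answers)
    while state:
        node = spec[state]
        transcript.append(node["prompt"].format(**slots))
        if node["slot"]:
            slots[node["slot"]] = next(answers)  # fill the slot from the user
        state = node["next"]
    return slots, transcript
```

A run with answers ["Copenhagen", "Friday"] fills both slots and ends with the confirmation prompt; swapping in a different table yields a different application without touching the loop, which is the point of a generic specification platform.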

Aalborg University, Denmark has already initiated IntelliMedia 2000+, which involves research on the production of a number of real-time demonstrators showing examples of IntelliMedia applications, the set-up of a new education scheme with a new Master's degree in IntelliMedia, and a nation-wide MultiMedia Network concerned with technology transfer to industry. IntelliMedia 2000+ is coordinated from the Center for PersonKommunikation (CPK), which has a wealth of experience and expertise in spoken language processing and radio communication, two of the most central components of mobile IntelliMedia (see CPK Annual Report 1997, Fujisaki 1996). More details can be found on WWW: http://www.cpk.auc.dk/CPK/MMUI/.

2.1 Computer Science (CS)

Research at Computer Science (CS) includes computer systems and the design/implementation of object-oriented programming languages and environments. The scientific approach covers the formally logical, the experimentally constructive, as well as the empirically descriptive.

Of particular interest for MultiMedia are the following subjects: principles for hypermedia construction, theories of synchronisation and cognition, distributed systems and networking, high volume databases, and the design and use of language mechanisms based on conceptual modelling.

CS contributions include experiments for performance evaluation of the available technology (e.g. high-speed networking) and experiments on the methodology for the design of MultiMedia systems. These contributions will be based on existing research activities, which include networks, distributed models, and prototype hypermedia environments.

2.2 Medical Informatics (MI)

The research in the Medical Decision Support System group is centered around medical knowledge-based systems and the development of general tools to support complex decision making.

The research builds on a theory for representing causal dependencies by graphs (Bayesian networks) and uses these to propagate probability estimates. The group has developed several successful medical decision support systems, addressing sophisticated human-computer interaction issues along the way.
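
The propagation idea can be illustrated with the smallest possible network, a single cause-effect pair. The probabilities below are invented for illustration; real systems, such as those built with the HUGIN tool mentioned later, handle large graphs with many interdependent variables.

```python
# Minimal illustration of probability propagation in a two-node Bayesian
# network (disease -> symptom). All numbers are invented; this is not a
# medical model, just the Bayes-rule update that propagation generalises.

p_disease = 0.01              # prior P(D): the disease is rare
p_symptom_given_d = 0.90      # P(S | D): symptom likely if diseased
p_symptom_given_not_d = 0.05  # P(S | ~D): symptom rare otherwise

# Evidence arrives: the symptom is observed. Propagate it to the
# disease node via Bayes' rule.
p_symptom = (p_symptom_given_d * p_disease
             + p_symptom_given_not_d * (1 - p_disease))
p_disease_given_s = p_symptom_given_d * p_disease / p_symptom

print(f"P(disease | symptom) = {p_disease_given_s:.3f}")
```

Even with a strongly indicative symptom, the posterior stays modest because the prior is small; handling exactly this kind of uncertain evidence correctly is what makes the Bayesian-network approach suitable for decision support.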

The knowledge-based system technology based on Bayesian networks, allowing for proper handling of uncertain information, has shown itself to be usable in creating intelligent coupling between

interface components and the underlying knowledge structure. This technology may be integrated in IntelliMedia systems.

It is foreseen that IntelliMedia systems will play a central role in the dissemination of information technology in the medical informatics sector. Systems representing complex knowledge, models and data structures, e.g. advanced medical diagnostics systems, the virtual operating room, telemedical practice and so on, will require the use of knowledge-based techniques for efficient interfacing.

2.3 Laboratory of Image Analysis (LIA)

The research at LIA is directed towards three areas: systems for computer vision, computer vision for autonomous robots, and medical and industrial applications of image analysis.

So far the research has addressed sensory processing using single modalities, but it seems obvious that the available methods may be integrated into multi-modal systems, where a major objective is coordination and optimal use of the available modalities. New IntelliMedia systems may also include much more flexible modes of interaction between humans and computers, including speech, body movements, gestures, facial expressions and sign language. This motivates and reinforces research in the interpretation of manipulation and description of dynamically changing objects. Research issues also include the combination of live images and computer graphics to create enhanced reality systems, which may for example be used in medical informatics systems (e.g. on medical MRI images of brain tissue), telepresence systems, and telemanipulation.

2.4 Center for PersonKommunikation (CPK)

Research at CPK is focused within the following three areas: Speech Communications, Data Communications and Radio Communications.

The research so far has been focused on the engineering design and development of IntelliMedia for speech and language in the context of professional use. The research is now ready to be extended into the subsequent research paradigm, which is based on the use of a number of available user interface components, such as pen-based character recognition, optical character recognition, bar code readers, speech recognition, images and text, combined into an integrated MultiMedia interface (e.g. for report generation or Personal Data Assistants (PDAs)).

A major effort at CPK within the radio area is measuring and characterizing mobile channels in different environments, including the use of smart antennas, both at the mobile user and at the base station. Smart antennas help improve signalling properties as well as allowing higher capacity.

[PHOTOGRAPH]

Figure 1. "Measuring the wideband distribution of radio waves in an indoor environment at Aalborg University"

3. The Chameleon demonstrator

The four groups of IntelliMedia 2000+ are developing an IntelliMedia computing platform called CHAMELEON, which will be general enough to be used for a number of different applications. It is initially being applied to the interpretation of plans/models of the architectural layout of buildings and could later be envisaged in a multi-user Intelligent VideoConferencing scenario.

A first scenario for this application is an IntelliMedia Workbench with architectural layout plans (or a model) on it and a laser pointing device which the computer uses for pointing and even for giving route descriptions to some destination. The system keeps a database record of offices and their functionality/tenants. An advanced scenario would involve multiple speakers planning building and institution layout.

The application involves the integration of a distributed processing and learning platform, the HUGIN decision-taking tool, image processing of plans and pointing, spoken dialogue processing (see Figure 2) and microphone arrays (see Figure 3), using a specification and design hub platform which can take MultiMedia input and conduct a semantic resolution of the individual inputs.

[PHOTOGRAPH] Figure 2: Architecture of spoken dialogue platform

[PHOTOGRAPH] Figure 3: Microphone array experimental setup

Examples of interesting problems to be solved as part of this application include the resolution of ambiguity where a user says "Whose office is this?" but the pointing gesture is ambiguous because the person points sloppily between two rooms rather than into one. The system can then ask the user for clarification as to which office he/she means. Other interesting interactions are "Point to Paul's office" and "Who's in the office beside Paul's?", especially when there are two Pauls in the building!
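
The gesture-disambiguation step just described can be sketched as follows: the pointing position is matched against office positions on the plan, and the system asks for clarification when two candidates are nearly equally likely. The office names, coordinates and ambiguity threshold are invented for illustration; they are not taken from the CHAMELEON demonstrator itself.

```python
import math

# Hypothetical office positions (metres) on the architectural plan.
OFFICES = {"A2-101": (2.0, 5.0), "A2-102": (4.0, 5.0)}

def resolve_pointing(point, offices, ambiguity_ratio=1.5):
    """Return the office pointed at, or None when the gesture is ambiguous.

    The gesture counts as unambiguous only when the runner-up office is
    clearly further away than the nearest one (by ambiguity_ratio).
    """
    dist = {name: math.dist(point, xy) for name, xy in offices.items()}
    ranked = sorted(dist, key=dist.get)
    best, runner_up = ranked[0], ranked[1]
    if dist[runner_up] < ambiguity_ratio * dist[best]:
        return None  # too close to call: ask "Which office do you mean?"
    return best
```

A point close to one office resolves directly, while a point midway between two offices returns None, triggering the clarification sub-dialogue described above.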

The hub platform demonstrates that (1) it is possible for agent modules to receive inputs, particularly in the form of images, pointing gestures, spoken language and sound sources, and respond with the required outputs; (2) individual agent modules within the platform can produce output in the form of semantic representations to show their internal workings; (3) the semantic representations can be used for effective communication of information between different modules for various applications; and (4) various means of synchronising the communication between agents can be tested to produce optimal results.
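
A minimal sketch of points (1)-(3): agent modules communicate through a hub by exchanging semantic frames which remain inspectable at every step. The frame fields, module behaviour and office data below are invented for illustration and are not CHAMELEON's actual representations.

```python
# Toy hub routing semantic frames between agent modules. Frames are plain
# dictionaries, so their contents (the "semantic representations") can be
# logged and inspected. All names and fields here are hypothetical.

class Hub:
    def __init__(self):
        self.handlers = {}  # intention -> agent callback
        self.log = []       # every frame that passed through the hub

    def register(self, intention, handler):
        self.handlers[intention] = handler

    def post(self, frame):
        self.log.append(frame)  # keep the frame inspectable
        handler = self.handlers.get(frame["intention"])
        return handler(frame) if handler else None

hub = Hub()
# A hypothetical "database agent" answering office queries.
hub.register("query_office",
             lambda f: {"intention": "answer",
                        "text": f"Office {f['office']} belongs to Paul."})

reply = hub.post({"intention": "query_office",
                  "office": "A2-101",
                  "modalities": ["speech", "gesture"]})
```

Because every module speaks the same frame format, a gesture module, a speech module and a database module can be developed independently and still exchange results through the hub, which is the integration property the demonstrator is meant to test.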

Mobile computing aspects of this demonstrator become evident if we consider the user walking in the building represented by the plans/model with a wearable computer (see Bruegge and Bennington 1996, Rudnicky et al. 1996, Smailagic and Siewiorek 1996) and head-up display. This research could eventually be incorporated into more advanced scenarios involving multiple speakers in a VideoConferencing environment.

4. Education

Teaching is a large part of the IntelliMedia programme and three courses have been initiated: (1) Graphical User Interfaces, (2) IntelliMedia Systems and (3) Readings in Advanced Intelligent MultiMedia. Graphical User Interfaces is a more traditional course teaching methods for the development of optimal interfaces for Human-Computer Interaction (HCI). The course takes students through methods for laying out buttons, menus and form-filling screens, with hands-on experience of the XV windows development tool.

A new international Master's Degree (M.Sc.) has been established which incorporates the courses just mentioned as core modules of a one-and-a-half-year course on IntelliMedia taught in English. More details can be found on WWW: http://www.kom.auc.dk/ESN/. A Lifelong Learning course is given in August

for returning students of Aalborg University who wish to continue their education. This course is a compression of the core IntelliMedia courses.

The emphasis on group-oriented and project-oriented education at Aalborg University is an excellent framework in which IntelliMedia, an inherently interdisciplinary subject, can be taught. Groups can even each design and implement a smaller part of a system which has been agreed upon between the groups.

5. MultiMedia Network (MMN)

Aalborg University has initiated a number of networks to link to industry and to conduct technology transfer. A MultiMedia Network (MMN) has been established which integrates IntelliMedia 2000+ from the engineering and computer science faculties with the humanistic faculty, and we have already made a number of links to companies. It is also a natural forum for conducting joint research and projects for student groups. The board of the network meets regularly to discuss joint activities between the University and industry. More details can be found on WWW: http://www.auc.dk/nc/

6. Conclusion

Intelligent MultiMedia is an obvious area for the application of mobile technologies. Although there are a number of applications of IntelliMedia within the wired framework, the applications become more exciting in a wireless scenario where users can become immersed in the context which they are investigating (e.g. building usability).

IntelliMedia 2000+ at Aalborg University brings together the necessary ingredients from research, teaching and links to industry to enable the successful implementation of Mobile Intelligent MultiMedia. In particular, we have research groups in spoken dialogue processing, image processing and radio communications, the necessary components of this technology. Our application demonstrator, which focuses on giving information on building usage, is an ideal one for testing the integration of the various modules.

The emphasis on groupwork and project-based education at Aalborg University enables better multidisciplinary integration, which can normally be a problem. Our new Master's education will help to produce graduates who are proficient in the necessary theories and tools. The MultiMedia Network is helping not only to bring together the engineering, computer science and humanistic faculties but also to build links to industry. All of these ingredients are necessary for the future success of Mobile Intelligent MultiMedia, which is part of the ever increasing Global Information Society.

7. References

Baekgaard, Anders (1996) Dialogue management in a generic dialogue system. In Proceedings of the Eleventh Twente Workshop on Language Technology (TWLT): Dialogue Management in Natural Language Systems, 123-132. Twente, The Netherlands.

Bruegge, Bernd and Ben Bennington (1996) Applications of wireless research to real industrial problems: applications of mobile computing and communication. IEEE Personal Communications, 64-71, February.

CPK Annual Report (1997) Center for PersonKommunikation (CPK), Fredrik Bajers Vej 7-A2, Institute of Electronic Systems (IES), Aalborg University, DK-9220, Aalborg, Denmark.

Dalsgaard, Paul and A. Baekgaard (1994) Spoken language dialogue systems. In Chr. Freksa (Ed.), Prospects and Perspectives in Speech Technology: Proceedings in Artificial Intelligence, 178-191, September. München, Germany: Infix.

Emmerson, Bob and Martyn Warwich (1997) "Mind your language": voice recognition: out of the laboratory and into the market. Communications International, 8-14, May.

Fraser, N. and P. Dalsgaard (1996) Spoken dialogue systems: a European perspective. In Proceedings of the International Symposium on Spoken Dialogue (ISSD 96), October 2-3, Wyndham Franklin Plaza Hotel, Philadelphia, USA, Fujisaki, Hiroya (Ed.). Tokyo, Japan: Acoustical Society of Japan (ASJ).

Fujisaki, Hiroya (1996) Proceedings of 1996 International Symposium on Spoken Dialogue (ISSD 96), October 2-3, Wyndham Franklin Plaza Hotel, Philadelphia, USA. Tokyo, Japan: Acoustical Society of Japan (ASJ).

IEEE Spectrum (1996) Technology 1996: analysis and forecast issue. "Can we talk? The great hookup debate", January. NY, USA: IEEE.

IEEE Spectrum (1997) Sharing virtual worlds: avatars, agents, and social computing. Vol. 34, No. 3, March. NY, USA: IEEE.

Larsen, L.B. (1996) "Voice controlled home banking - objectives and experiences of the Esprit OVID project" In Proceedings of IVTTA-96, New Jersey, USA September (IEEE 96-TH-8178).

Mc Kevitt, Paul (Ed.) (1995a) Integration of Natural Language and Vision Processing (Volume I): Computational Models and Systems Dordrecht, The Netherlands: Kluwer-Academic Publishers

Mc Kevitt, Paul (Ed.) (1995b) Integration of Natural Language and Vision Processing (Volume II): Intelligent Multimedia Dordrecht, The Netherlands: Kluwer-Academic Publishers

Mc Kevitt, Paul (Ed.) (1996a) Integration of Natural Language and Vision Processing (Volume III): Theory and Grounding Representations Dordrecht, The Netherlands: Kluwer-Academic Publishers

Mc Kevitt, Paul (Ed.) (1996b) Integration of Natural Language and Vision Processing (Volume IV): Recent Advances Dordrecht, The Netherlands: Kluwer-Academic Publishers

Mc Kevitt, Paul (1997) SuperinformationhighwayS In ``Sprog og Multimedier'' (Speech and Multimedia), Proc. of the Sixth Annual Meeting of the Danish Linguistics Society (DAnsk DAtalingvistik Forening, DAlF), Aalborg University, Denmark (May 1996), Tom Brondsted and Inger Lytje (Eds.), 166-183, April 1997, Aalborg, Denmark: Aalborg Universitetsforlag (Aalborg University Press).

Roehle, B. (1997) "Channeling the data flow", IEEE Spectrum, March, 32-38

Rudnicky, Alexander I., Stephen D. Reed and Eric H. Thayer (1996) SpeechWear: a mobile speech system. In Proceedings of the International Symposium on Spoken Dialogue (ISSD 96), October 2-3, Wyndham Franklin Plaza Hotel, Philadelphia, USA, Fujisaki, Hiroya (Ed.), 161-164. Tokyo, Japan: Acoustical Society of Japan (ASJ).

Smailagic, Asim and P. Siewiorek (1996) Matching interface design with user tasks: modalities of interaction with CMU wearable computers. IEEE Personal Communications, 14-25, February.

Paul Mc Kevitt is a Visiting Professor of Intelligent MultiMedia Computing at Aalborg University in Denmark and a British EPSRC (Engineering and Physical Sciences Research Council) Advanced Fellow in the Department of Computer Science at the University of Sheffield in England. He completed his Ph.D. at the University of Exeter, England, Master's Degree at New Mexico State University, USA in 1988 and Bachelor's Degree from University College Dublin, Ireland, all in Computer Science. His primary research interests are in Natural Language Processing (NLP) including the processing of pragmatics, beliefs and intentions in dialogue. He is also interested in Philosophy, MultiMedia and the general area of Artificial Intelligence.

Paul Dalsgaard is a Professor in speech communications at the Center for PersonKommunikation, where his group is involved in spoken dialogue systems, speech recognition and understanding, speech analysis and synthesis, and the design of MultiMedia/MultiModal human-computer interfaces. He was employed at the Academy of Electronic Engineering in Copenhagen and Aalborg until 1974, when he joined the newly founded Aalborg University. The speech communications group has been involved in a large number of EU-funded projects over the last ten years, which has led to the establishment of demonstrators applying results from ongoing research. Paul Dalsgaard's special interests lie in applying acoustic-phonetic interdisciplinary research in speech processing and in modelling similar acoustic sounds across languages in an attempt to generalise speech processing multilingually. He is a Senior Member of IEEE, a board member of the European Speech Communication Association (ESCA), and has held a visiting university position in Canada. He obtained his degree in electronic engineering in 1962 from the Technical University of Denmark.

Jorgen Bach Andersen is a professor of radio communications and is presently head of the Center for PersonKommunikation (CPK), a research center dealing with speech, data, and radio communications. He is especially involved with wireless communications, and CPK has a significant involvement in European projects in this area. His personal interests lie in the area of propagation and antennas, where he has published widely. He has had visiting posts in the USA, Italy, and New Zealand. Prof. Andersen is a Fellow of the IEEE and a former vice-president of URSI, the International Scientific Radio Union. He obtained his degrees in electrical engineering (M.Sc. 1961, dr. techn. 1971) from the Technical University of Denmark, where he was employed until 1973, when he joined Aalborg University, Denmark. The authors can be contacted at: Center for PersonKommunikation (CPK), Fredrik Bajers Vej 7-A2, Institute of Electronic Systems (IES), Aalborg University, DK-9220 Aalborg, DENMARK. E-mail: {pmck,pd,jba}@cpk.auc.dk

Glossary

artificial intelligence: an area of study where people attempt to build machines that perform tasks we normally think of only people as doing, such as processing language and visual information and "understanding" it.
CHAMELEON: the IntelliMedia computing platform being developed by IntelliMedia 2000+ (see Section 3).
cognitive science: an area of study focussing on modelling the mechanisms by which people process information, language and vision.
concurrent structures: structures which enable the modelling of concurrency and synchronisation of multiple inputs.
decision-taking systems: systems which make decisions based on data inputs, e.g. decisions about diseases based on symptoms, or decisions on semantic structures for language/vision processing.
enhanced reality: integration of virtual reality with standard vision processing/situations.

expert systems: systems which model a task normally done by experts, such as diagnosis of diseases, configuration of computers, etc.
Global Information Society: local users and groups of users are becoming linked globally due to wired and wireless SuperinformationhighwayS implemented on the internet.
hand-held devices: hand-held computers, such as mobile phones or devices integrated with mobile phones (e.g. the NOKIA 9000 communicator).
head-mounted display: a device worn on the head where the user sees a computer display through a visor.
HUGIN: a decision-taking software tool using Bayesian statistics.
HyperMedia environments: platforms which enable users to build distributed MultiMedia systems running across multiple machines and sites.
Intelligent MultiMedia: an area which integrates MultiMedia with Artificial Intelligence and focusses on determining the semantics or meaning of various media, such as speech and images, mainly as input data.
IntelliMedia 2000+: an initiative at Aalborg University, Denmark, funded by the Faculty of Science and Technology, to focus on Intelligent MultiMedia from the point of view of teaching, research and a technology transfer network.
Java: an object-oriented programming language which enables users to build moving graphical interfaces, particularly useful for the internet.
lifelong learning: learning for life, where one continues to take courses and further one's education even after leaving university.
microphone arrays: arrays of microphones used to perform sound localisation.
MiniModule: a course unit at the Institute of Electronic Systems, Aalborg University, DK, which usually consists of 4 hours: 2 hours of teaching and 2 of exercise work in groups.
mobile computing: computing platforms which are mobile, such as the NOKIA 9000 communicator (trademark of NOKIA).
Module: a course at the Institute of Electronic Systems, Aalborg University, DK, which usually consists of 5 MiniModules.
natural language processing (NLP): processing natural language (text) by computer.
Personal Data Assistants (PDAs): systems which enable the user to effectively manage multiple data incoming from multiple sources; these are usually hand-held devices.
SuperinformationhighwayS: many people see the internet as a wired and wireless network of highways with information passing through just like vehicles.
traditional MultiMedia: focusses on the traditional problem of finding the best way to present various media, such as speech, text and images, as output data on a computer screen.
wearable computer: a computer the user wears on his/her body.
wired framework: communication between computing devices is performed through cables or wires.
wireless framework: communication between computing devices is by radio or otherwise, with no cables or wires.

