NORWEGIAN UNIVERSITY OF SCIENCE AND TECHNOLOGY

DESIGN, IMPLEMENTATION AND EVALUATION OF A VOICE CONTROLLED INFORMATION PLATFORM APPLIED IN SHIPS INSPECTION

by

TOR-ØYVIND BJØRKLI

DEPARTMENT OF ENGINEERING CYBERNETICS

FACULTY OF ELECTRICAL ENGINEERING AND TELECOMMUNICATION

THESIS IN NAUTICAL ENGINEERING

ABSTRACT

This thesis describes the set-up of a platform in connection with ship inspection. Ship inspections involve recording of damage, which is traditionally done by taking notes on a piece of paper. It is assumed that considerable double work, i.e. manually re-entering the data into the ship database, can be avoided by introducing a speech recogniser. The thesis explains the challenges and requirements such a system must meet when used on board. Its individual system components are described in detail and discussed with respect to their performance. Various backup solutions in case the speech recogniser fails are presented and considered. A list of selected relevant commercially available products (microphones, speech recognisers and backup solutions) is given, including an evaluation of their suitability for their intended use. Based on published literature and own experiences gained from a speech demonstrator having essentially the same interface as the corresponding part of the DNV ship database, it can be concluded that considerable improvements in speech recognition technology are needed before such systems are applicable in challenging environments. The thesis ends with a future outlook and some general recommendations about promising solutions.


PREFACE

The Norwegian University of Science and Technology (NTNU) offers a Nautical Engineering Studies programme; the required entrance qualifications are graduation from a Naval Academy or Maritime College, along with practical maritime experience as an officer. The studies at NTNU are concluded with a post-graduate thesis. I embarked on a career at sea when I commenced my education at Vestfold Maritime College and later joined the Navy's Officers' training school. With my background from the RNoCG (Royal Norwegian Coastguard), the RNoN (Royal Norwegian Navy) as well as the merchant fleet, I am naturally interested in shipping and safety at sea. The thesis is a result of my maritime experience along with the theoretical basis acquired at NTNU; results from the semester project "The surveyor in the information age", written in the spring of 1999, have also contributed to the outcome. When forming the structure of this project, the NTNU project template has been used as the main guideline. The thesis is the result of a literature study as well as information given by former and present surveyors within DNV. This thesis in nautical engineering has been accomplished under the supervision of Professor Tor Onshus at the Department of Engineering Cybernetics (ITK1), NTNU.

ACKNOWLEDGEMENTS

This thesis in nautical engineering has been carried out during the autumn of 1999 at the Department for Strategic Research at DNV. During this time, I was privileged to work in DNV's Mobile Worker project, which provided an inspiring environment at the junction between research and product testing. I would like to sincerely thank the project members for fruitful collaboration during my stay. The assistance and guidance given by a number of individuals has been essential for the accomplishment of the thesis. I would like to thank my head supervisor, Professor Tor Onshus, for his guidance during this project. I wish to thank all the researchers at DTP 343, Department for Strategic Research at DNV's head office at Høvik, and especially Dr. Scient., Dipl.-Ing. (FH) Thomas Mestl, for allowing me to carry out this project as a part of the Mobile Worker research programme and for his support during the thesis. Finally, I would also like to thank Thomas Jacobsen, a fellow student, for his assistance in checking and commenting on my work.

Oslo Wednesday, 15 December 1999

Tor-Øyvind Bjørkli

1 Department of Engineering Cybernetics


Table of contents page

ABSTRACT ...... i

PREFACE...... ii

1 INTRODUCTION...... 1

2 SYSTEM DESIGN...... 4
2.1 Constraints, requirements and potentials 4
2.2 System set up 7
2.3 Back up solutions 8

3 EVALUATION OF COMMERCIALLY AVAILABLE SYSTEM COMPONENTS...... 15
3.1 Microphones 15
3.1.1 Physical principles 15
3.1.2 Noise reducing measures 18
3.1.3 Body placement 23
3.1.4 Commercially available microphones 24
3.2 Speech recognition software 28
3.2.1 Principles 31
3.2.2 Recognition enhancing measures 31
3.2.3 Commercially available products 33
3.2.4 General conclusions 38

4 SPEECH DEMONSTRATOR...... 40
4.1 Set-up 41
4.2 Experiences gained from the speech demonstrator 44

5 RECOMMENDATIONS AND FUTURE OUTLOOK ...... 46

6 REFERENCES...... 48
Appendices 50



1 INTRODUCTION

Speech recognition itself is nothing new; in fact, everybody does it every day. However, a machine that recognises the spoken word is a technological challenge, and only recently have such machines become available. Dictation systems for specific professions, such as radiology, have been around for years and carry five-figure price tags. Less expensive general-purpose systems require discrete speech, which is a tedious method of dictation with a pause after each word. Two years ago, Dragon Systems achieved a new milestone with the release of NaturallySpeaking, the first general-purpose speech recognition system that allows dictating in a conversational manner. IBM quickly followed with ViaVoice, costing hundreds of dollars less than the first version of NaturallySpeaking. A major factor driving the development of these speech-enabled applications is the steady increase in computing power, since speech recognition systems demand a lot of processing power and disk space. The timeline below gives the history of speech recognition systems (PC Magazine, 10 March 1998):

Speech Technology Timeline
Late 1950s: Speech recognition research begins.
1964: IBM demonstrates Shoebox for spoken digits at the New York World's Fair.
1968: The HAL-9000 computer in the movie 2001: A Space Odyssey introduces the world to speech recognition.
1978: Texas Instruments introduces the first single-chip speech synthesiser and the Speak and Spell toy.
1993: IBM launches the first packaged speech recognition product, the IBM Personal Dictation System for OS/2.
1993: Apple ships PlainTalk, a series of speech recognition and speech synthesis extensions for the Macintosh.
1994: Dragon Systems' DragonDictate for Windows 1.0 is the first software-only PC-based dictation product.
1996: IBM introduces MedSpeak/Radiology, the first real-time continuous-speech recognition product.
1996: OS/2 Warp 4 becomes the first operating system to include built-in speech navigation and recognition.
June 1997: Dragon ships NaturallySpeaking, the first general-purpose continuous-speech recognition product.
August 1997: IBM ships ViaVoice.
Fall 1997: Microsoft CEO Bill Gates identifies speech recognition as a key technological advance.
Future: The next generation of speech-based interfaces will enable people to communicate with computers in the same way they communicate with other people (Scientific American, August 1999).

In fact, the dream that machines could understand human speech has existed for centuries, as Leonhard Euler expressed already in 1761: "It would be a considerable invention indeed, that of a machine able to mimic our speech, with its sounds and articulations. I think it is not impossible." As functioning speech recognition systems appear on the market, they are being tried out in a large variety of everyday applications, ranging from cars, toys, personal computers and mobile phones to telephone call centres. It has taken more than four decades for speech recognition technology to become mature enough for these practical applications. Moreover, some computer industry visionaries have predicted that speech will be the main input modality in future user interfaces. It is nevertheless important to note that the current speech recognition and application boom is not only due to advanced speech recognition algorithms developed during the last few years, but may be mainly due to huge processing power improvements in current


microprocessors. In fact, the core speech recognition technology on which current applications mainly rely was already developed in the late 1980s and early 1990s. The trend in IT development goes towards miniaturisation of components. As the components become smaller and smaller, they will reach a size that easily fits into e.g. clothing, jewellery, and helmets. The ideal situation would be that the individual components are so small that the user would not notice wearing them. Already today, there exists various equipment that could allow digital information to be entered into the reporting system. These devices are, however, not designed for the environments found on board ships; they are mainly usable in office surroundings. Assume an inspector is about to inspect a tank section in a ship: it is hot, noisy, dirty, and humid. To get access to the areas of importance, a ladder has been arranged. The traditional inspection tools for tank coating and structure (rust, cracks, etc.) are a hammer and a flashlight. As you climb the ladder, both hands are needed, one for securing yourself and the other for the inspection tool. A crack in the tank structure must of course be reported. Today this is done by scribbling down a note on a piece of paper that is stowed away in a pocket. Vision (potential applications of the technology): Imagine having access to all the information, recordings, and equipment needed through a not yet invented device. The device consists of a miniature digital camera integrated in your helmet, a very small PC unit, and a microphone. Assume further that this device is fully voice controlled, allowing the user to navigate within an information database and take notes even when both hands are occupied. A written report, including pictures, could then be completed "on the job" with the help of a speech recognition system, securing the information needed as well as saving time.

Figure 1: Ideally, any equipment shall support the surveyor in his work in such a way that he can fully concentrate on his primary task, the detection of defects; a speech recognition system would be a desired "secretary". This thesis addresses the design, implementation, and evaluation of a voice recognition system in connection with DNV's ship inspection. The thesis will mainly reflect on problems and tasks concerning voice recognition. The background for this theme is to make surveys more effective (save time and money), increase the quality of work, enable immediate updating of class status upon completion of a survey, and contribute to improving the surveyor's working conditions. Speech entry for entering comments and findings would therefore be of great benefit for today's surveyors, since it would eliminate double work and sources of error, and speed up the necessary "paper work".



Because of the tremendous potential that new IT technology offers, DNV has initiated a project called Mobile Worker, which focuses on the utilisation of new technology and cordless communication to help the surveyors in their everyday work. What can be done to make the best of these developments? And how can the tasks and processes be arranged in order to work faster and better? The traditional inspection tasks are varied, and the surveyors solve them in many different ways using various conventional aids. Inspectors look at the condition of what is being inspected, make notes, and fill in standard forms. They have rulebooks, instructions, and other necessary documentation to supplement their own knowledge and memory. Many DNV inspectors have also used mobile phones and portable PCs for a long time to gain access to information and to communicate with customers and colleagues. However, at some point the mobility will cease and the equipment will have to remain in a cabin, a suitcase or at the office. The inspectors therefore often need information in situations where it is not available, such as when they want to compare observed conditions and damage with the regulations and standard examples. What is the point of having stored all kinds of information in a ship or loss database if you cannot use it when you have to make a decision? Alternatives that are even more mobile are now starting to appear on the scene, and the possible mobile tasks and specifications of mobile solutions will be described. The human aspect may be an even greater challenge. Unless the users accept the new opportunities as a natural part of their job, the new equipment may lower their job satisfaction, with the result that they may quickly stop using it. Portable technology can easily create the feeling that the tool is taking control over the work situation instead of supporting it. An important part of every new solution will be to adapt tasks such that the technology will motivate and make work simpler (Andersen, 1999). Surveyors may have to leave the work site to find a required manual or a list of approved suppliers and equipment, e.g. the right pumps from the right vendor. If they fail to refer to the manual, they may attempt lengthy or complex procedures from memory and produce errors. Once the manual is retrieved, the surveyor may have to find room within a cramped workspace to put a large manual or drawing. Also, any attempt to climb onto equipment while holding a large manual might jeopardise safety. Speech recognition systems are error-prone and not very robust to real-world disturbances, such as ambient background noise (including speech from other surrounding speakers), communication channel distortions, pronunciation variations, speaker stress, or the effects of spontaneous speech. Current speech recognition systems can be categorised in two groups according to their robustness level. Applications falling into the first category typically have large vocabularies and have been designed to recognise continuous speech. Successful use of these systems requires efficient minimisation of all possible interference sources. High-quality audio equipment, including a close-talk microphone, a noise-free operating environment and, particularly, a co-operative and motivated user are needed in order to achieve high recognition performance. Dictation systems for continuous speech are typical examples of speech recognition systems belonging to this application category.
Truly robust speech recognition applications form the second group. Robust systems can cope with distorted speech input (to a certain extent) and still provide high recognition accuracy even with inexperienced novice users. These systems can usually recognise only discrete words, and their vocabulary size is limited to some tens of words. A good example of a robust system is a speaker-dependent name recognition application in voice dialling. In the name dialling system, the user has trained voice-tags, i.e. names that have phone numbers attached to them. By speaking a certain voice-tag, a phone call to the attached number is then made. Because of this apparent simplicity, name dialling is very useful for example in a car environment where the user's hands and eyes are busy. It is important to note


that speech recognition alone does not have any particular value. To use speech as an input modality, there must always be some practical advantage. Furthermore, one cannot overestimate the importance of a good user interface; it is essential that speech recognition applications are extensively tested with real users in realistic operating environments.

Organisation of the Thesis In the next chapter, a possible design of a speech recognition system is presented. The subsequent chapters discuss the individual components in more detail. Chapter 4 describes a speech recognition demonstrator and chapter 5 ends the thesis with some discussion and future outlook.

2 SYSTEM DESIGN

This chapter describes a possible system design. Constraints and requirements, the surveyor's work process, and potential backup systems are addressed.

2.1 Constraints, requirements and potentials

DNV's constraints can be divided into two groups: DNV's database (NAUTICUS), and the surveyor's work process, including the effect of a speech-based reporting system on it.

Det Norske Veritas has developed the NAUTICUS database system, which contains all information about a ship over its entire life (Lyng, 1999). NAUTICUS is based on a product model, allowing in principle unlimited information to be attached to each element of the ship. For example, a geometrical representation of the hull, as well as mathematical analyses of structure and machinery, are contained in it. The system also permits analysis of the ship's structural strength and behaviour under any sea state and loading condition, with visual feedback. Several analysis options are available, including fully integrated finite-element capabilities and direct wave-load calculations. The same model also allows a full set of machinery calculations to be performed. With only one model serving all functions, repetitive data input is eliminated. The accumulated information regarding a specific NAUTICUS class vessel is available to DNV at any time; a surveyor is able to retrieve updated data on matters relating to ship status and condition, survey feedback, newbuilding and certificate status, and component and system information.



Figure 2: A surveyor may retrieve information by accessing the Product Model on his laptop PC.

Figure 3: The NAUTICUS product model is intended to be a mirror of the real world, and any information such as user guides, hints, warnings or restrictions is attached to the model.

Ambient conditions: a worker in a challenging environment (temperature, water, shielded and confined spaces, noise, etc.) with no support available (power supply, Internet or phone connection).



NAUTICUS provides a 3D representation of the ship's structure and allows information to be recorded, stored, and retrieved. The product model will, at any instant of the ship's lifetime, hold its original description as well as its present and historical condition, both for the structure and the equipment. Throughout a ship's life, reports, drawings, sketches and engineering calculations will be stored in the NAUTICUS product model. The information can be accessed simply by "clicking" on a part, system, or compartment in a 3D view or a tree structure. The Classification and Statutory Certificates are the main "deliverables" of a classification society. The main deliverable of a surveyor is, however, the inspection report. Introduction of NAUTICUS to DNV surveyors in the field may enable them to issue full-term certificates while still onboard the vessel.

Figure 4: The survey work process: planning of the survey at the office (from NAUTICUS, resulting in a checklist), execution of the survey onboard the ship (resulting in a survey report), and reporting of the survey at the office (back into NAUTICUS). Through NAUTICUS, the surveyors will have ready access to the ship information needed for planning of surveys. When onboard, the surveyor may wish to retrieve information about e.g. the minimum allowable steel thickness; this information can only be retrieved from NAUTICUS, so the surveyor must either be connected to the headquarters or to a local copy of NAUTICUS. At the completion of a survey, and after verification of the survey data, the results are recorded straight into the ship database and are then immediately accessible to other surveyors.



Figure 5: Screen dump of NAUTICUS in operation and some of the options available (renewed plates, a fully integrated sketcher tool, crack description). The integrated sketcher tool, for drawings, pictures and annotation of the inspected item, is scheduled for future implementation in NAUTICUS.

2.2 System set up

The speech recognition system can be compared to a chain, usually consisting of four independent units (see Figure 6):
• User.
• Microphone.
• Speech recognition software.
• Computer hardware.
It may be possible to insert more components into the chain to improve the recognition accuracy. Examples of such components are:
• Noise reducing software.
• Noise reducing hardware.

Depending on the degree of integration, these "extra" components may increase the total weight and size of the equipment and may decrease the user's ability to move around. Extra equipment may also consume power, which means that the operation time is reduced.



Figure 6: The speech recognition system can be compared with a chain consisting of a user, microphone, noise reducing hardware, noise reducing software, speech recognition software, and computer hardware.

Each component of the speech recognition system has its individual strengths; the chain consists of a number of independent links. As indicated in Figure 6, the user represents the system's first link. In today's systems, the recognition rate depends principally on the user's skill in handling speech recognition systems (dictating, etc.). If the user is not familiar with speech as an input option, recognition may fail totally; the user is therefore defined as a weak link. The microphone as an input device is discussed in chapter 3.1. The microphone is the second component in any speech recording or transmission system; its function is to convert acoustic sound waves into an equivalent electrical signal. The commercially available microphones of today are not constructed to operate in high ambient noise, and backup solutions for input may therefore be necessary, as discussed in chapter 2.3. The microphone is identified as a weak link. Noise-reducing hardware may increase the quality of the speech signal and is considered a strong link. Likewise, noise-reducing software is a strong link, since powerful algorithms are achievable. Chapter 3.2 discusses speech recognition software, which is considered a weak link; when it comes to dealing with non-native speakers, the level of recognition is unsatisfactory. Today's computer hardware is not the limiting factor in a speech recognition system.
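As a loose numerical illustration of the chain picture (the probabilities below are assumed for illustration only, not measured values): if each link preserves a spoken word with a certain probability, the overall recognition rate is roughly the product of the individual rates. For example, user 0.95 x microphone 0.90 x noise reduction 0.99 x recognition software 0.90 x hardware 1.00 ≈ 0.76, i.e. even fairly good individual components can combine into an overall recognition rate that is too low for practical use, and the weakest links dominate the result.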

2.3 Back up solutions

If something fails, a backup system is needed. An example of such a backup system is a hand-held keyboard for text input or a trackball for navigation, as shown in Figure 7. Trackballs and touchpads do not allow immediate textual input, which reduces their range of use. Furthermore, all the backup solutions presented here forfeit the ability to give input without the use of the hands. A backup device may also be space consuming and thus interfere with the user's ability to move, e.g. in confined spaces. However, some backup systems may be preferable when navigating in a two-dimensional space; this is very difficult to achieve with voice. The error frequency when using a keyboard is also small compared to the error frequency when using voice.



Figure 7: A keyboard or a trackball may function as a backup system if the speech recognition fails. Furthermore, a keyboard may be used for text editing in idle periods of the inspection. One may consider different backup systems such as keyboards, hand scanners, mice, etc. These backup systems should primarily serve as a supplement to the voice input.

Figure 8: Potential backup systems, from left to right: a trackball, which provides mouse functions; a data glove, which in addition can type letters or numbers into the computer; the Twiddler2, which gives the user full keyboard access using one hand; and, in the right picture, a hand-held Dictaphone for audio input.

If the speech recognition system is used on a mobile platform (palmtop, wearable computer), standard desktop computer input devices are inadequate. A conventional keyboard, for example, is not a practical input device since it was designed to be used while sitting down. A major factor in the development of input devices concerns the placement of the devices. Keyboards require the user to have the fingers free for typing; thus, the keyboard must be held in place by a means other than the user having to grasp it. The advantages and disadvantages of various backup solutions are summarised in Table 1 and Table 2. As mentioned, full-size keyboards are cumbersome. Chorded keyboards use fewer keys to input text; combinations of keys are used to indicate particular letters, and some can be strapped to one hand or a wrist. A keyboard allows a full range of textual input, but in mobile work the keyboard has to be worn and positioned for input. This conflict has given rise to alternative keyboard devices.

2 http://www.handykey.com/



(a) Full size keyboards: A normal keyboard is unattractive in a wearable context; because of this, a variety of small text-input devices have been developed.

(b) Miniaturised keyboards: Miniaturised keyboards offering the options of a full-size keyboard are available on the commercial market. If used only occasionally, the backup keyboard could be stored in a jacket and pulled out when needed. If used more frequently, it could be strapped to the wrist, the belt or elsewhere on the body. The advantages are that the user has all the features of a full-size keyboard, that it demands a minimum of space, and that little or no training is required; it is inexpensive, requires low power and low bandwidth, and is compatible with existing software. The shortcomings are that it is cumbersome to use for navigational purposes and may annoy the user because of its placement (on the arm); using a miniaturised keyboard in confined spaces may also require back lighting. Furthermore, there is no feedback (click or beep) on whether the pressing of a button was successful, and there is no pointing capability inherent in the device.

Commercially available miniaturised keyboards → The QWERTY3 keyboard from L3 Systems4 is designed for wrist mounting. This keyboard is totally sealed and has optional adjustable back lighting with a choice of PS/2 or USB5 interface. Features: • Optional wrist strap providing the capability of attaching it to your wrist. • Back lighting. • PS/2 or USB interface. → The PGI micro keyboard from Phoenix Group, Inc6 is rugged and sealed to protect it from the elements. Weighing less than 160 grams, it is designed to be arm mounted and offers PC compatibility with 59 keys and 99 functions in a package about the size of a dollar bill. It is supplied with a PS/2-compatible mini-DIN connector. Features: • Optional wrist strap providing the capability of attaching it to your wrist. • PS/2 or USB interface. • Back lighting.

(c) Virtual keyboards → The AUDIT7 is a text editor and computer control program particularly prepared for acoustic communication and remote computer operation8 9 (a keyboard interface devised for remote computer control). The keyboard interface may be used as an ordinary text editor at any office or personal computer terminal with a graphic screen and a normal keyboard, or as

3The name derives from the first six characters on the top alphabetic line of the keyboard 4 www.l3sys.com 5 Universal Serial Bus. 6 www.ivpgi.com/ 7 Audible Editor 8 www.dnv.com/ocean/nbt/audit/docs/report.htm 9 www.dnv.com/ocean/nbt/audit/zframe.htm


a remote text editor with a 12-button keypad, for instance a push-button telephone, a cellular phone or a calculator. A virtual keyboard could also be based on a modified Braille alphabet (Gran, 1992).

Figure 9: Screen dump of the AUDIT virtual keyboard.

(d) Chording keyboards: These are smaller, have fewer keys, and can be strapped to one hand or a wrist. Such input devices typically provide one button for each finger and/or the thumb, and each button controls multiple key combinations. Instead of the usual one-at-a-time key presses, chording requires simultaneous key presses for each character typed, similar to playing a chord on a piano. The advantages are that it requires fewer keys than a conventional keyboard; with fewer keys and the fingers never leaving the keys, finger strain is minimised, and the user can place the keyboard wherever it is convenient, which helps alleviate unnatural typing positions. The disadvantages are that the one-handed requirement for input means it cannot be used for applications where the user must have both hands totally free at all times; it requires at least 10 to 15 hours of training to operate, is only suitable for textual input, and usually slows data entry considerably. There is no pointing capability inherent in the device. Commercially available chording keyboards → The BAT personal keyboard from Infogrip, Inc10 is a one-handed, compact input device that replicates all the functions of a full-size keyboard, but with greater efficiency and convenience. Letters, numbers, commands and macros are simple key combinations, "chords", that can be mastered after some training. The BAT's ergonomic design reduces hand strain and fatigue. The BAT is also a typing solution for persons with physical or visual impairments and can increase productivity when used with graphic or desktop publishing software. Features: • Left or right hand configuration. • Dual keyboard option includes both left and right-hand units. → The Twiddler from Handykey Corporation11 is a pocket-sized mouse pointer plus a full-function keyboard in one unit that fits in

10 www.infogrip.com/ 11 www.handykey.com/


either the right or the left hand. It plugs into both the keyboard and serial ports on IBM-compatible PCs and works with DOS, Microsoft Windows 3.x/95/NT, Unix, and Palm Pilot operating systems. The Twiddler's mouse pointer is based on a sensor sealed inside the unit and is immune to dust and dirt. The Twiddler incorporates an ergonomic keypad designed for "chord" keying, i.e. pressing one or more keys at a time; each key combination generates a character or command. With 12 finger keys and 6 thumb keys, the Twiddler can emulate the 101 keys of a standard keyboard.
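A rough illustration of why so few keys suffice (the arithmetic is illustrative, not taken from the vendor): 12 finger keys alone allow 2^12 - 1 = 4095 different non-empty chords, so even a small subset of comfortable one- and two-finger chords, combined with the 6 thumb keys used as modifiers, is more than enough to cover the 101 keys of a standard keyboard.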

Table 1: Evaluation of commercially available keyboards as backup systems.
Advantages:
• Enables textual input
• Reasonable speed (50 wpm) is achievable
• Inexpensive, low power and low bandwidth requirements
• Can be made waterproof
Disadvantages:
• Cannot be used if the task requires two hands
• Training required for proficiency
• No pointing capability inherent in the device

(e) POINTING DEVICES: Pointing devices may be necessary even if the speech recognition works perfectly. Joysticks, touchpads and trackballs are defined as pointing devices by virtue of their ability to move the cursor on the screen. A pointing device does not enable hands-free navigation and requires training and a free surface. The ability to point to a position on a screen is important for all direct-manipulation interfaces and for all applications where there is a figure or a map to annotate. The advantages are that pointing devices are intuitive, allow random access and positional input, and are compatible with desktop interfaces (Krauss & Zuhlke, 1998). They are widely available and could provide a virtual keyboard by showing a representation of a keyboard on the screen and pointing to the desired keys. The disadvantages are that the interfaces that currently utilise pointing devices are resource intensive, that they are inexact for precise co-ordinate specification, and that they are slow when used to provide a virtual keyboard.

(f) TRACKBALLS: This stationary device, lodged in the keyboard or found as a stand-alone product, lets users control the cursor with a rotating ball (rather than a conventional mouse). Trackballs have been around for years and have been continually refined for better performance. The advantages are that a trackball requires little space and gives the ability to point at and "enter" a pushbutton, scroll menu or text field in a program. The disadvantages are that there is no textual input option, the placement of the trackball is cumbersome, training is required, and hands-free operation is not possible.

(g) TOUCHPADS: Touchpads, most commonly seen on notebook computers, are made of a flexible material similar to a laptop screen. Users control cursor movement by running their fingertips or a stylus along the touch-sensitive surface. The advantage of a touchpad is that it allows continuous positioning of the cursor, and many users find that it offers a more natural motion than a Track-Point button. The disadvantages of touchpads are that they are extremely sensitive to moisture contamination. They are also energy consuming, demanding a constant power supply and offering no battery-saving "sleep mode" capability. In today's



sub-notebooks, palmtops, cordless keyboards, and handheld remote controls, battery life is a major concern. Capacitance-based touchpads have another operational downside. They are insensitive to pressure directed downward on the pad, and will not operate using a common stylus. They, therefore, have no potential for 3D input options such as pressure-sensitive scrolling, signature capture, character recognition, etc.

Commercially available pointing devices

TRACKBALL: The RAT-TRAK™ trackball from Industrial Computer Source (ICS)12 features user-defined keys, instant speed control and an ergonomic design. The product comes with a PS/2 connection and is Microsoft mouse compatible. Dimensions (L x W x H): 201.4 mm x 102.7 mm x 57.2 mm. The price is approx. $40.

JOYSTICK: The MicroPoint™ from Varatouch Technology, Inc. (VTI)13 is a small, fully functional joystick with a base diameter of 10 mm. MicroPoint is a variable-resistance electronic analogue device that uses resistive rubber. It can be used with a variety of analogue-to-digital converters. The price is approx. $60.

TOUCHPAD: The Smart Cat from Cirque14 measures about 10 cm square and has a touch surface of 3 by 7.62 cm. The Smart Cat allows single- or double-clicking by tapping on the surface: tap in the left corner for a left-button click, and tap on the right side for the right button. The device also comes with standard left and right buttons. It also scrolls both horizontally and vertically through applications that support scrolling. The price is approx. $49.

Table 2: Evaluation of commercially available pointing devices as backup systems.
Advantages:
• Can indicate a point in two-dimensional space (map)
• Are intuitive
• Faster than typing
• Can be made waterproof
Disadvantages:
• Current devices are resource intensive and usually require a surface for positioning
• Inexact for precise co-ordinate specification
• Slow when used to provide a virtual keyboard

(h) Dictaphone: Findings could also be described and recorded via a Dictaphone. In an earlier DNV project, analogue Dictaphones were tried out by surveyors with unsatisfactory results. The reason could be found in the handling of the Dictaphones

12 www.labyrinth.net.au/~ieci/products/input/html/rat-trak.html 13 www.varatouch.com 14 www.cirque.com



and discipline of the user. However, in some situations and for some persons this may be a suitable backup solution.

Commercially available Dictaphones → The Olympus D1000 Dictaphone15. According to the producer, it is possible to dictate 140 words/min. The machine further allows: • Indexing: automatic date and time recording, automatic dictation numbering. • Two recording modes, Standard (15 min) and Long (33 min), which determine the recording time available in the 2 MB of flash memory. A special feature of the Olympus D1000 is that voice recorded in standard mode can be transcribed into editable text by the IBM voice recognition software ViaVoice. For this task, the recorded voice must be transferred to the PC either by cable or by a flash card. ViaVoice Transcription uses approx. 30,000 words in its basic vocabulary, extendable up to 64,000 words by the user. In addition, ViaVoice Transcription has a dictionary with approx. 320,000 words and a back-up vocabulary that includes spell checking and pronunciation.
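To put the storage figures into perspective (a rough estimate assuming the full 2 MB is used for audio): 2 MB over 15 minutes of standard-mode recording corresponds to about 2,000,000 bytes / 900 s ≈ 2.2 kB/s, or roughly 18 kbit/s, while long mode at 33 minutes corresponds to about 1 kB/s (8 kbit/s), i.e. heavily compressed speech. This is consistent with only the higher-quality standard mode being suitable for transcription by ViaVoice.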

Table 3: Evaluation of the Dictaphone as a backup system.
Advantages:
• Intuitive
• Faster than typing
• Can be made waterproof
Disadvantages:
• Only possible to use in quiet surroundings
• Requires correct handling for a good result
• Too many and too small buttons

Summary and recommendations on input devices: The choice of backup input device will depend on the application, the work surroundings, and the user's experience with these devices; the user should be allowed to choose his preferred input device.

RECOMMENDATIONS As seen from the discussion in this chapter, the skull- and throat-mounted microphones seem to be the most attractive solutions when it comes to hands-free speech input in a noisy environment. However, these microphones are not yet available for speech recognition purposes. Another positive feature is the non-obstructing nature of a skull-mounted microphone, since it can simply be hidden in the helmet. Real user friendliness requires a wireless connection between the system components; it would allow the user to move freely and would also take the aspect of safety into consideration.

15 www.olympus-europa.com/voice_processing/index.htm



3 EVALUATION OF COMMERCIALLY AVAILABLE SYSTEM COMPONENTS

3.1 Microphones

The use of voice applications (speech-based technologies) is becoming increasingly common on personal computers. Audio applications like Internet telephony, computer telephony, videoconferencing and speech recognition are transforming the PC into the desired communications appliance. High-quality microphones are required to enable these voice applications. However, many applications simply presuppose an ideal microphone, with the result that a sub-optimal microphone is selected, leading to poor acoustic input to the voice recognition software. Severe performance degradation can result when the microphone is not viewed as a critical performance element in the speech recognition chain. By selecting the proper microphone element (skull-mounted, noise cancelling, etc.) and implementing it correctly, the performance of the voice application can be dramatically improved. The primary barrier to a successful introduction and user acceptance of voice recognition software has been noise that contaminates the speech signal and degrades the performance and quality of speech recognition. The current commercial remedies, such as noise cancellation software and noise cancelling hardware, prove to be inadequate for dealing with real-world situations. Certain unwanted signals, e.g. background talk, are very similar to the actual voice signal of interest and thus indistinguishable from it, very often degrading the recognition.

3.1.1 Physical principles

The voice is the user's "keyboard", and the recognition result of the voice input depends on the sound characteristics of the microphone. Although there are different models of microphones, they all do the same job: they transform acoustical movements (the vibrations of air created by the sound waves) into electrical signals. This conversion is relatively direct, and the electrical signal can then be amplified, recorded, or transmitted.



Definition16: a microphone is a generic term for any element that transforms acoustic energy (sound) into electrical energy (an audio signal).

BASIC MICROPHONE THEORY
Microphones can be classified with respect to several operating principles:
• Current induction in a coil
• Voltage change in a capacitor
• Accelerometer
• Change in resistance
The most common types, however, are the dynamic and the condenser microphone. Dynamic microphones are dependable, rugged, and reliable (compared to condenser microphones) and are used where physical durability is important. They are also reasonably insensitive to environmental factors, and thus find extensive use in outdoor applications. Figure 10 shows the construction of a dynamic (coil) microphone and Figure 11 that of a condenser microphone.

Dynamic microphone: The characteristic of a dynamic microphone is that a flexibly mounted diaphragm is coupled to a coil of fine wire. The coil is placed in the air gap of a magnet such that it is free to move back and forth within the gap. When sound strikes the diaphragm, the diaphragm surface vibrates in response. The motion of the diaphragm couples directly to the coil, which moves back and forth in the field of the magnet. As the coil moves in the magnetic field, a small electrical current is induced in the wire. The magnitude and direction of that current are directly related to the motion of the coil, and the current is thus an electrical representation of the sound wave. The main characteristics are:

• Moving coil with magnet (like a speaker)
• Requires no electrical power
• Generally more rugged than condenser microphones
• Generally not as sensitive as condenser microphones

Figure 10: In a dynamic microphone, a coil is mounted in the field of a permanent magnet. Sound waves move the coil back and forth, thereby inducing an electrical current that follows the sound wave.
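As a hedged illustration of the underlying physics (the numbers are assumed for illustration, not taken from any datasheet): the induced voltage follows Faraday's law for a conductor moving in a magnetic field, e = B * l * v, where B is the flux density in the air gap, l the length of wire in the coil and v the coil velocity. With, say, B = 1 T, l = 5 m of fine wire and a peak coil velocity of 0.01 m/s, the peak output voltage is e = 1 * 5 * 0.01 = 0.05 V, i.e. about 50 mV, which is why dynamic microphones need no external power but deliver relatively low signal levels.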

Condenser (electret) microphones: This type of microphone transforms sound waves into electrical signals by changing the distance between capacitor plates. The electret condenser microphone is the dominant choice for microphones used with computers because of its superior price/performance ratio and its small size. Sound waves cause the top plate to vibrate, which in turn alters the capacitance, resulting in a varying voltage. The electrical signal varies correspondingly with the frequency and amplitude of the sound waves. An external power supply is needed to measure the capacitance changes and to pre-amplify the signal. Some condenser microphones have a battery attachment that is either part of the microphone housing or on the end of the cable, as part of the connector. Condenser

16 www.acronymfinder.com


microphones are, as mentioned, also less durable than dynamic microphones. The main characteristics are:

• Moving diaphragm only
• More sensitive than dynamic microphones
• Requires power: traditional condenser microphones require a high-voltage power supply, while modern electret microphones require a battery
• Not as rugged as dynamic microphones

Figure 11: A condenser microphone measures a change in capacitance caused by varying the distance between two thin metal plates. This type of microphone requires a power source, e.g. a battery.
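A hedged sketch of the principle (an illustrative relationship, not from a specific datasheet): the capacitance of the plate pair is C = ε0 * A / d, where A is the plate area and d the plate distance. With a fixed charge Q on the plates, the output voltage is V = Q / C = Q * d / (ε0 * A), i.e. directly proportional to d; a sound wave that moves the diaphragm by 1 % of the gap therefore changes the output voltage by roughly 1 %.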

Microphone response: The construction of a microphone determines its behaviour, or response, with respect to the physical properties of a sound wave. A pressure microphone has a response that is proportional to the pressure in a sound wave, whereas a gradient microphone has a response that corresponds to the difference in pressure across some distance in a sound wave. The pressure microphone is a fine reproducer of sound, while the response of a gradient microphone is typically greatest in a certain direction, thereby rejecting undesired background sounds; gradient microphones are therefore direction sensitive.

Figure 12: The basic microphone design, independent of the physical measurement principle (condenser or dynamic). Most voice-based applications require that background noise is cancelled or attenuated and that the microphone captures the voice input clearly and with high fidelity. The main task of the microphone is to transform the sound wave into an electrical signal that ideally contains only the desired signal. The microphone must therefore deliver high quality


signals to the computer even in noisy surroundings (up to 100 dB), humid areas, and high-temperature zones, and should have EX17 features. In addition, it should be easy to use. Some commercially available products fulfil these requirements, but they are usually not made for use with a PC. These products are mainly developed in connection with VHF/UHF radios, to suit professions like smoke divers, police officers, etc. The conversion of such a VHF/UHF microphone to a PC microphone is in principle straightforward. However, its perceived performance will be considerably lower, because in VHF/UHF usage the human brain is able to detect spoken words buried in noise that is even much higher than the signal level. This means that the brain is capable of reconstructing a sentence based on just a few fragments. A computer does not have this capability yet; its performance depends strongly on the signal-to-noise ratio (defined below). Other equally important characteristics relevant to speech recognition are:
• Frequency bandwidth.
• Distortion.
• Echo and echo delay.
• Noise type (interference, reverberation, stationary background noise).
• Signal-to-noise ratio.
• Accelerations and movements.
• Positioning of the microphone on the body.
• Other characteristics, such as the mechanical effects that may occur when using a press-to-talk microphone.
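For reference (the standard definition, with illustrative numbers): the signal-to-noise ratio in decibels is SNR = 10 * log10(P_signal / P_noise) = 20 * log10(V_signal / V_noise). A speech signal of 1 V RMS over a noise floor of 0.1 V RMS thus gives 20 * log10(10) = 20 dB, while the 30-40 dB later quoted for a quiet office corresponds to a voltage ratio of roughly 30 to 100.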

3.1.2 Noise reducing measures

Noise or other undesired signals can be reduced in a variety of different ways.

Figure 13: Different approaches to noise reduction: (A) shielding, (B) microphone construction, (C) hardware signal processing, (D) software signal processing.

A. Shielding: Old socks, etc., i.e. material that absorbs certain frequencies of a sound. Shielding usually works better for high-frequency noise than for low-frequency noise.
B. Microphone construction: There exist two different types of noise cancelling microphones, namely:
• Acoustic Noise Cancelling Microphone (ANCM) (passive)
• Electronic Noise Cancelling Microphone (ENCM) (active)

→ Acoustic Noise Cancelling Microphone (ANCM): Both sides of an ANCM diaphragm are equally open to arriving sound waves, see Figure 14. The two port openings are a distance "D" apart. Because of this distance, the magnitude of the sound pressure from a nearby source is greater at the front than at the rear of the diaphragm, and the rear signal is slightly delayed in time. These two effects create a net pressure difference (Pnet = Pfront - Prear) across the diaphragm that

17 The component is certified for explosive areas


causes it to move. This signal is less contaminated with noise than that of a microphone with only one opening; noise cancelling is thus achieved by the design of the microphone itself.

Figure 14: Acoustic noise cancelling microphone.
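A rough, illustrative calculation of why this design favours the user's own voice (the distances are assumed for illustration): the sound pressure from a small source falls off roughly as 1/r. With the lips 2 cm from the front port and a port spacing D = 1 cm, the front-to-rear pressure ratio is about 3/2 = 1.5, giving a large Pnet; for a noise source 2 m away the ratio is only 201/200 ≈ 1.005, so Pnet is close to zero and the distant noise largely cancels.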

→ Electronic Noise Cancelling Microphone (ENCM, active): In active noise cancellation, a secondary noise source is introduced that destructively interferes with the unwanted noise. In general, active noise cancellation microphones rely on multiple sensors to measure the unwanted noise field and the effect of the cancellation. The noise field is modelled as a stochastic process, and an adaptive algorithm is used to estimate the parameters of the process. Based on these parameter estimates, a cancelling signal is generated. The challenge of this approach is that future values of the noise field must be predicted. The electronic (active) noise-cancelling microphone is built in the same way as the acoustic noise-cancelling microphone in that it measures the net pressure difference in a sound wave between two points in space. The characteristic of the active electronic noise-cancelling microphone is that it utilises an array of two "pressure" microphones arranged in opposing directions, with a spacing between the microphones that equals the port distance "D", as illustrated in Figure 15. A typical pressure microphone in such an array has the rear diaphragm port sealed to the acoustic wave front while the front is open. The result is that the diaphragm movement represents the absolute magnitude of the compression and rarefaction of the incoming sound wave, and not a pressure difference between two points. An array of two pressure microphones achieves noise-cancelling characteristics because the output signal of each microphone is electrically subtracted from the other by an operational amplifier. The output signal of the operational amplifier gives the "cleaned" sound signal (Oppenheim, Weinstein, Zangi, Feder & Gauger, 1994).


Figure 15: Electronic (active) noise-cancelling microphone.
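The adaptive-estimation idea described above can be illustrated with a minimal least-mean-squares (LMS) noise canceller. The sketch below is a generic textbook scheme written in Python, not the algorithm used in any of the commercial products discussed in this thesis; the filter length and step size are assumed values and would have to be tuned to the actual noise field and signal scale.

    import numpy as np

    def lms_noise_cancel(primary, reference, n_taps=32, mu=0.005):
        # primary:   microphone signal containing speech + noise
        # reference: second sensor measuring (mostly) the noise field
        # returns an estimate of the speech, i.e. primary minus predicted noise
        w = np.zeros(n_taps)                     # adaptive filter weights
        cleaned = np.zeros(len(primary))
        for n in range(n_taps, len(primary)):
            x = reference[n - n_taps:n][::-1]    # latest reference samples, newest first
            noise_estimate = np.dot(w, x)        # predicted noise at time n
            e = primary[n] - noise_estimate      # error = cleaned speech sample
            w += 2.0 * mu * e * x                # LMS weight update
            cleaned[n] = e
        return cleaned

With a well-placed reference sensor, the error signal converges towards the speech alone; if the reference also picks up the speech, the algorithm will start cancelling the voice as well, which is one reason such systems are sensitive to sensor placement.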



Comparison of noise cancelling microphone technologies18: Table 4 compares a passive and an active noise cancellation microphone (the ANC-100 from Andrea Electronics19 and the Nomad from Telex20). The active noise-cancelling microphone is more susceptible to electronic interference and fluctuating temperatures; it also costs more because of the "extra" electronics packed inside (two opposing omni-directional microphones).
Table 4: Comparison of boom microphone technologies.

Characteristic | Passive noise cancellation | Active noise cancellation
Number of microphones | One | Two
Microphone element type | Bi-directional | Omni-directional
Frequency response pattern | Same | Same
Product facts | Single element has inherent balance over all frequencies, temperature and time | Dual elements and electronics susceptible to system imbalances over changing frequencies, temperature and time
Noise cancellation approach | Acoustic | Electronic
Noise cancellation performance | Good in office surroundings | Bad in office surroundings
Susceptibility to electronic noise | None | Moderate
Voice recognition performance | Good in office surroundings | Bad in office surroundings
Cost | Approx. $30 | Approx. $60

C. Noise filtering hardware: Hardware filters are often used against stationary noise, e.g. for removal of 50 Hz hum or constant engine noise. Hardware filters can be designed to pass only frequencies below (low-pass), above (high-pass), or within a band around (band-pass) some specified frequency, thereby filtering out the noise outside that range. They are usually built up from operational amplifiers. The time-continuous signal is smoothed by the filter, resulting in a (still) time-continuous signal (Kuo, 1966).

Time-continuous microphone signal → Hardware filter → Time-continuous filtered signal
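As an illustrative example (the component values are assumed, not taken from any of the products below): a first-order RC high-pass filter has the cut-off frequency f_c = 1 / (2 * pi * R * C). Choosing R = 5.3 kΩ and C = 0.1 µF gives f_c ≈ 1 / (2 * pi * 5300 * 0.0000001) ≈ 300 Hz, which passes the speech band while attenuating 50 Hz mains hum and low-frequency engine rumble.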

COMMERCIALLY AVAILABLE PRODUCTS

18 Test from Speech technology Magazine January/February 1998 19 www.andreaelectronics.com. 20 www.computeraudio.telex.com



→ ClearSpeech-Microphone21: The ClearSpeech-Mic is digital noise reduction hardware that significantly reduces background noise.
Noise cancellation characteristics:
• 300 Hz to 3,400 Hz voice bandwidth
• Single tone noise reduction > 70 dB
• White noise reduction > 12 dB
Power supply:
• 9 to 24 V DC
• 0.5 W power consumption
Price: $129.00.

D. Noise filtering software: Powerful software algorithms can be written that perform tasks similar to those of hardware filters (low-pass, high-pass and band-pass); these filters require, however, that the time-continuous sound signal is first digitised, e.g. by an A/D converter. Directional microphones reduce both continuous and discrete noise events from "off-axis" locations. Processing the digitised microphone signal in software reduces continuous noise from all sources and directions, including any internal noise from the microphone and sound card circuitry. Noise reducing software and hardware usually assume a stationary noise source; if the noise characteristics change over time (non-stationary noise), more adaptive software is needed (Oppenheim & Schafer, 1975).
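As a minimal sketch of such a digital filter (parameter values assumed for illustration; this is not the algorithm of any product mentioned below), a fourth-order Butterworth band-pass covering the 300-3,400 Hz speech band could be applied to the digitised signal as follows:

    import numpy as np
    from scipy import signal

    def bandpass_speech(samples, fs=16000, low=300.0, high=3400.0, order=4):
        # samples: digitised microphone signal (1-D numpy array)
        # fs:      sampling rate in Hz
        nyquist = fs / 2.0
        b, a = signal.butter(order, [low / nyquist, high / nyquist], btype='bandpass')
        return signal.filtfilt(b, a, samples)    # zero-phase band-pass filtering

Such a fixed band-pass only removes noise outside the speech band; noise inside the band, e.g. background talk, requires adaptive or spectral methods of the kind described for the commercial products below.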

COMMERCIALLY AVAILABLE PRODUCTS

→ ClearSpeech technology22: ClearSpeech is an algorithm designed to remove background noise from speech and other transmitted digital signals. ClearSpeech improves communication through devices such as telephones and radios, and can be used to increase the performance of speech recognition programs. ClearSpeech can be implemented in real time on embedded chips or can be run on a PC under Windows. The algorithm is designed to remove stationary or near-stationary noise, i.e. noise with constant signal statistics, from an input signal containing both noise and speech, and it adapts as the background noise changes. Speech picked up by a PC's microphone can be impacted by background noise; NCT's ClearSpeech-PC/COM removes ambient noise from speech while it is being recorded, thereby dramatically improving receive-side intelligibility. ClearSpeech-PC/COM can be used in a variety of PC-based voice applications, including voice recognition, voice mail, Internet voice communications, real-time processing of voice from noisy environments and post-processing of previously recorded voice. The characteristics are:
• Continuous and adaptive removal of background noise from speech
• Up to 20 dB signal-to-noise improvement
• Programmable noise reduction parameters
• Includes application software to invoke ClearSpeech-PC/COM while recording and to process previously recorded audio files
• Integrated with Windows® 95 or NT Audio Compression Manager

Noise cancellation characteristics:
• 300 Hz to 3.85 kHz voice bandwidth
• Single tone reduction > 70 dB

21 www.nct-active.com/csmicr.htm 22 www.nct-active.com/



• White noise reduction > 12 dB
In the following, a test performed by Defence Group Incorporated (DGI) is quoted (Grover & Makovoz, 1999).

Figure 1623: Noisy speech recorded simultaneously with noise cancelling microphones24 from different vendors, before (red) and after (black) software noise reduction. In both cases the software was able to remove the noise recorded by the microphones satisfactorily. The two microphones had quite different outputs, with distorted speech input signals, but the software caused no added distortion in either case.

Speech-to-text dictation requires a very low error rate in order to be accepted. Even with a headset microphone and in a "quiet" office, the background noise limits the achievable error rates for large-vocabulary dictation systems. This is shown by testing the VoiceType dictation system from IBM, both with and without software noise processing. Tests were done in an office, with no background speakers, using an Andrea ANC-600 headset microphone. Two different recordings were made: one for enrolment, another for testing. The speech-to-noise ratios (SNRs) were 30-40 dB. One copy of VoiceType was trained and tested with no software noise reduction, and another was trained and tested with software noise processing. Training and testing without the software noise reduction gave 76 errors in 1009 words of spoken text. Training and testing with added software filtering gave 22 errors out of the same 1009 words of spoken text. The results are summarised in Table 5 below.

Table 5: Test results of voice filtering with IBM VoiceType.
IBM VoiceType | No filtering | Filtering
Error rate in quiet office | 7.5 % | 2.2 %
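These percentages follow directly from the counts above: error rate = errors / words, i.e. 76 / 1009 ≈ 7.5 % without filtering and 22 / 1009 ≈ 2.2 % with filtering, a reduction of the error rate by roughly a factor of 3.5.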

By using more restricted vocabularies and grammars, voice command systems tend to be more noise resistant than large-vocabulary dictation systems. Yet they often must tolerate very high noise levels in automotive, industrial, military and other applications. The gains from combining noise reduction software with a commercial voice command system are evident. Data was recorded for 100 speech commands in a "quiet" environment25. An extended test set was then prepared by mixing in a range of noise levels and types from various vehicle and industrial environments. Enrolment using only the "quiet" data was then done both with and without software noise

23 pictures taken from http://www.ca.defgrp.com/n_test.html#microphones 24 (A) The Andrea ANC-600, and (B) the Shure 10A. 25 Test performed by Defence Group Incorporated (DGI) Signal and Image Processing Group: http://www.ca.defgrp.com/.


Testing was then done, again both with and without filtering included, now using the extended test data set with added noise mixing. For voice command applications a false positive response is a critical failure, actually worse than no response at all. Table 6 summarises both correct responses and false positives versus the input speech-to-noise ratio, with and without noise reduction by software processing.

Table 6: Results with and without noise filtering.
Voice command       No noise filtering             With noise filtering
SNR (dB)            Correct    False positives     Correct    False positives
35                  100 %      0 %                 100 %      0 %
20                  88 %       4 %                 99 %       1 %
10                  6 %        4 %                 77 %       0 %

Table 7 shows the results when the processing was used only for testing, not for enrolment. Enrolment here (as above) used the "quiet" enrolment data, but no filtering. Testing was again done on the noisy test data, both with and without noise removal filtering. The performance gains in this case, from using noise removal filtering only during testing, are not as good, since there was inevitably some noise in the basic training data, where the processing was not used, while the corresponding (and larger) noise perturbations were removed only during testing. Even so, the processing still provides appreciable benefits in a noisy environment, even if it is not used in the initial enrolment.

Table 7: Results with and without noise filtering in testing.
Voice command       No noise filtering             With noise filtering
                    (training and testing)         (final testing only)
SNR (dB)            Correct    False positives     Correct    False positives
35                  100 %      0 %                 100 %      0 %
20                  88 %       4 %                 99 %       1 %
10                  6 %        4 %                 58 %       1 %
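For illustration, the following sketch shows how a noisy test set at a given speech-to-noise ratio could be prepared by mixing recorded noise into the "quiet" command recordings. It is a generic construction under assumed array inputs, not DGI's actual test procedure.

```python
# Minimal sketch of preparing a noisy test set at a target speech-to-noise ratio (SNR).
# This is a generic illustration, not DGI's actual test procedure.
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so that speech power / noise power equals the target SNR, then mix."""
    noise = np.resize(noise, speech.shape)          # repeat/trim the noise to the speech length
    p_speech = np.mean(speech.astype(float) ** 2)
    p_noise = np.mean(noise.astype(float) ** 2)
    gain = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10.0)))
    return speech + gain * noise

# Example: build test material at the SNRs used in Tables 6 and 7.
# speech, noise = ...  # 1-D numpy arrays at the same sampling rate
# noisy_10db = mix_at_snr(speech, noise, 10)
```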

3.1.3 Body placement
The location of the microphone during recording plays an important role. For instance, a microphone placed directly under the nose or mouth will capture a lot of breathing sounds and thus contaminate the signal unnecessarily. There are several options for placing the microphone on the body, see Figure 17.


Figure 17: Different options for placing the microphone on or near the body. The best placement of the microphone is one that does not obstruct the user.

A skull-mounted microphone appears very appealing for various reasons: it does not obstruct the user, it is small, it weighs as little as 40-50 g, and it has a high signal-to-noise ratio. The skull microphone can be placed inside the helmet, so that when the user puts on his helmet, he "puts on" his microphone as well. The headphone is automatically connected with the microphone and is also positioned inside the helmet. This communication system is based on the principle of bone transmission and does therefore not obstruct the face. Due to its placement inside the helmet, the influence of outside noise is considerably decreased. Clear and easy communication is possible even while wearing a mask or a breathing apparatus, and its use is simple and comfortable. Further, some users might not be comfortable talking to a machine and may not accept a headset microphone easily; a microphone attached to the helmet may therefore be more readily accepted.

3.1.4 Commercially available microphones
(a) Active noise cancelling boom microphones

EMKAY26: offers an RF wireless headset, a single-channel, full-duplex system with a transmit range greater than 10 m. The headset can operate for up to 10 hours between recharges. Its lightweight construction and ear-loop design ensure a comfortable fit. The headset has been designed for use in PC voice recognition, computer telephony, and Internet telephony.

ANDREA27: Andrea Electronics Corporation offers an Active Noise Reduction (ANR) earphone, an Active Noise Cancellation (ANC) near-field microphone, and the patented Digital Super Directional Array (DSDA™) and Directional Finding and Tracking Array (DFTA™) far-field microphones.

26 www.emkayproducts.com. 27 www.andreaelectronics.com.

Page 24 DNV NORWEGIAN UNIVERSITY OF SCIENCE AND TECHNOLOGY

EVALUATION OF COMMERCIAL AVAILABLE SYSTEM COMPONENTS

TELEX28: offers a USB29 digital microphone for speech dictation applications. This microphone delivers a pure digital signal to the speech recogniser and eliminates the performance variations inherent in analogue sound cards. The headset also includes acoustic noise cancellation technology to cancel background noise that can degrade speech recognition performance. Speech recognition software performance depends greatly on the quality of the audio signal, and software developers have had a difficult challenge dealing with the wide variations in performance and quality of analogue sound cards. With a USB interface, the voice signal bypasses the sound card and is fed directly in digital form to the USB bus. Table 8 below is an independent judgement of an active noise cancelling microphone (ANCM).

Table 8: Pros and cons of active noise cancelling boom microphones.
Pros:
• Commercially available for connection with a PC
Cons:
• Not certified for Ex environments
• Does not function in extreme areas
• Does not stand rough use
• Not user friendly
• Consists of many components

(b) Skull-mounted microphone
Two manufacturers of skull microphones are CGF Gallet30, a French producer, and Ceotronics31, a German producer. Their microphone characteristics are similar:
• Measuring principle: accelerometer with a sensitivity of 1 mV/mG; bandwidth from 20 Hz to 20 kHz.
• Amplifier: bandwidth from 300 Hz to 3 kHz at -3 dB.
• Weight of head equipment: 55±2 g.
• Electrical tightness: IP 54 cover.
• Has Ex (explosion safety) features.

Table 9: Pros and cons of the skull-mounted microphone.
Pros:
• Certified for Ex environments
• Light weight
• Functions in extreme32 areas
• Stands rough use
• User friendly
Cons:
• Requires use of a helmet
• Not commercially available for connection with a PC
• Rather expensive (approx. 4000 Nkr)
• The microphone from Ceotronics needs an amplifier

28 www.computeraudio.telex.com 29 Universal Serial Bus 30 http://www.gallet.fr 31 http://www.ceotronics.de

(c) Throat microphone
The throat microphone uses an "indirect air-borne vibration" technique, picking up the vibration energy generated at the skin near the vocal cords. It features high isolation not only from environmental noise but also from the frictional vibration sound generated by the microphone head. These microphones can even be made water, dust, and corrosion resistant, and they provide clear communication when wearing breathing apparatus or in very high noise environments. A dual-slope band-pass filter circuit rejects the unwanted low-frequency body resonance.
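As an illustration of the kind of voice-band filtering described above, the sketch below applies a digital band-pass filter that attenuates low-frequency body resonance and out-of-band noise. The actual filter in the throat microphone is an analogue circuit whose exact characteristics are not specified here; the 300 Hz and 3 kHz cut-offs are assumptions borrowed from the amplifier bandwidth quoted for the skull microphone.

```python
# Minimal sketch of a digital voice-band band-pass filter. The cut-off frequencies
# and sampling rate are illustrative assumptions, not the specification of any
# particular microphone product.
from scipy.signal import butter, sosfilt

def voice_bandpass(signal, fs, low_hz=300.0, high_hz=3000.0, order=4):
    """Attenuate low-frequency body resonance and high-frequency noise outside the voice band."""
    sos = butter(order, [low_hz, high_hz], btype="bandpass", fs=fs, output="sos")
    return sosfilt(sos, signal)

# Example usage with an assumed 16 kHz sampling rate:
# filtered = voice_bandpass(raw_samples, fs=16000)
```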

Table 10: Pros and cons of the throat microphone.
Pros:
• Light weight
• Noise cancelling
• Functions in extreme33 areas
• Stands rough use
• Ideal for wearing under protective or hazardous-material clothing
• Total hands-free operation
• VOX or PTT activation
• Provides clear audio in high noise environments
Cons:
• Not certified for Ex environments
• Not "user friendly", since the throat microphone must be taken on and off
• Not commercially available for connection with a PC
• Consists of many components

32 Areas with noise, humidity and high temperatures 33 Areas with noise, humidity and high temperatures


Table 11: The different microphone products compared with each other.
Consideration   Noise cancelling boom microphone        Skull-mounted microphone               Throat microphone
Performance     Bad                                     Good                                   Good
Price           Cheap                                   Medium                                 Medium
Easy use        No                                      Yes                                    No
Other           Available for use with speech           N.A. for use with speech               N.A. for use with speech
                recognition applications                recognition applications               recognition applications

Table 12: Evaluation of commercially available microphones as input devices for speech recognition.
Advantages:
• Faster than a keyboard and mouse.
• Does not require use of the hands.
• Can improve performance in hands-busy (maintenance) and eyes-busy (inspection) tasks.
Disadvantages:
• When the user is working with co-workers, a cue system is needed to let the computer know when an utterance is intended for the computer rather than for the co-worker.
• May need a press-to-talk switch, requiring a hand.
• Use of "bracket words" requires some training.
• High background noise levels can cause inaccurate word recognition and false inputs, so a backup input device may be required.
• Behavioural states (e.g. anxiety, stress) and task loading can affect human voice characteristics and degrade interactive speech system performance.
• Prompts are required when assistance is needed to recall the appropriate procedure in a given situation; interfaces must be developed to prompt users when the vocabulary used by the system is beyond the user's recall.
• Feedback must be presented to the user when spoken words are not understood (this takes up valuable display space).
• Recognition rates of 95 % mean one error in 20 words; easy, quick ways to correct errors must be developed.
• Specifying a position in a two-dimensional space is difficult; a pointing device is needed for such tasks.
• When using a speech synthesis system, it can be difficult (or impossible) to interrupt the system while it is "speaking" during the recognition process.
• Annoyance when the recognition is not correct.

3.2 Speech recognition software

Speech recognition systems are error-prone and not very robust to real-world disturbances, such as ambient background noise (including speech from other surrounding speakers), communication channel distortions, pronunciation variations, speaker stress, or the effects of spontaneous speech. Current speech recognition systems can be categorised in two groups according to their robustness level.

Applications falling into the first category typically have large vocabularies and have been designed to recognise continuous speech. Successful use of these systems requires efficient minimisation of all possible interference sources. High quality audio equipment, including a close-talk microphone, a noise-free operating environment and, particularly, a co-operative and motivated user are needed in order to achieve high recognition performance. Dictation systems for continuous speech are typical examples of this category.

Truly robust speech recognition applications form the second group. Robust systems can cope with distorted speech input (to a certain extent) and still provide high recognition accuracy, even with inexperienced novice users. These systems can usually recognise only discrete words, and their vocabulary size is limited to some tens of words. A good example of a robust system is a speaker-dependent name recognition application for voice dialling. In such a system the user has trained voice-tags, i.e. names that have phone numbers attached to them; by speaking a certain voice-tag, a phone call to the attached number is made. Because of this apparent simplicity, name dialling is very useful for example in a car environment, where the user's hands and eyes are busy.

It is important to note that speech recognition alone does not have any particular value: to use speech as an input modality, there must always be some practical advantage. Furthermore, one cannot overestimate the importance of a good user interface; it is essential that speech recognition applications are extensively tested with real users under realistic operating conditions.

All modern speech recognition systems follow roughly the same basic architecture, as shown in Figure 18. The task of a speech recognition system is to transform the digital speech signal into a discrete, editable word or sequence of words. This transformation process consists of several steps (Viiki, 1999):
1. First, a time-continuous, digital microphone signal is converted into a sequence of discrete acoustic observations.
2. Then the actual recognition process makes use of three different knowledge sources: the acoustic models, the lexicon, and the recognition algorithm. The algorithm extracts individual blocks that with high probability represent single words.
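Step 1 above, the conversion of the digital microphone signal into a sequence of discrete acoustic observations, can be sketched as follows. The feature choice (log power spectra per frame) is a simplification; commercial recognisers typically use more elaborate features, and the frame sizes are assumptions.

```python
# Minimal sketch of step 1 above: turning a digital microphone signal into a sequence
# of discrete acoustic observations (here simple log power spectra per frame).
import numpy as np

def acoustic_observations(samples, frame_len=400, hop=160):
    """Split the signal into overlapping frames and compute one feature vector per frame."""
    window = np.hamming(frame_len)
    observations = []
    for start in range(0, len(samples) - frame_len, hop):
        frame = samples[start:start + frame_len] * window
        power = np.abs(np.fft.rfft(frame)) ** 2
        observations.append(np.log(power + 1e-10))   # log compression, avoid log(0)
    return np.array(observations)                     # shape: (n_frames, n_bins)

# At a 16 kHz sampling rate, frame_len=400 and hop=160 correspond to 25 ms frames every 10 ms.
```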

[Figure 18 block diagram blocks: Speech Database, Training, Acoustic Models, Recognition]

Figure 18: A block diagram of a speech recognition system.

The actual "speech recognition algorithm" is the module dealing with individual words. Designers usually focus on three major elements: the vocabulary (complexity, syntax, and size), the environment (bandwidth, noise level, and distortion type), and the speakers (stressed/relaxed, trained/untrained). Language-specific models (typical sentence constructions, phrases, grammar, etc.) are used to identify different word types (substantives, adjectives, verbs, etc.), and each recorded word is compared with the word signatures in the corresponding vocabulary databases.

A number of voice recognition systems are available on the market. The most powerful can recognise thousands of words. However, they generally require an extended training session during which the computer system becomes accustomed to a particular voice and accent; such systems are said to be speaker dependent. Many systems also require that the speaker speaks slowly and distinctly and separates each word with a short pause; these systems are called discrete speech systems. Recently, great strides have been made in continuous speech systems, i.e. voice recognition systems that allow the user to speak naturally, and there are now several continuous-speech systems available for personal computers.

Because of their limitations and high cost, voice recognition systems have traditionally been used only in a few specialised situations. For example, such systems are useful when the user is unable to use a keyboard to enter data because his or her hands are occupied or disabled; instead of typing commands, the user can simply speak into a headset. Increasingly, however, as cost decreases and performance improves, speech recognition systems are entering the mainstream.

Important characteristics of a speech recognition program


The bottom line for speech recognition software is speed and accuracy. If the software cannot decipher what was said correctly, it is not usable; likewise, if the deciphering process takes too long, nobody will use it. Speech recognition programs can, however, do more than take basic dictation, and there are a number of other features that can make these packages productivity-enhancing tools. Further important characteristics are:
→ Set-up and training: Wizards that assist in setting up the system and help the user get started are considered important. The figure to the right shows the Dragon NaturallySpeaking 3.0 user wizard, used to adjust the volume, measure sound quality, and train the program. To improve accuracy further, documents containing common words can be imported.
→ Editing and formatting: Getting the words onto the screen is only half the job; how well speech recognition software handles editing and formatting is also critical. The way modeless operation and natural-language support are achieved influences the user friendliness. For example, when NaturallySpeaking stumbles on a homonym34, one should simply be able to repeat the word and have the program select the alternative.
→ Application integration: In the future, speech recognition will take place in the background and speech will become simply another way of interacting with the PC. As a precursor to that, vendors have been developing tight links between their speech recognition programs and the applications commonly used every day, especially word processors. In this example, we dictated directly into Word using L&H Voice Xpress Plus; the Command Browser shows all the variations that can be used to insert a table into Word.
→ Command-and-control: Although continuous speech dictation is a relatively recent development, command-and-control applications have been around for years. A "What Can I Say" command should list the commands that are available at any point in e.g. Windows.

34 Homonym (two words are homonyms if they are pronounced or spelled the same way but have different meanings)


3.2.1 Principles
Speech recognition and Natural Language Processing (NLP) systems are complex pieces of software (Raghavan, 1998), and a variety of algorithms are used in their implementation. Speech recognition works by disassembling sound into small units and then piecing them back together, while NLP translates words into ideas by examining context, patterns, phrases, etc.

Speech recognition works by breaking down the sounds the hardware "hears" into smaller, non-divisible sounds called phonemes. Phonemes are distinct units of sound. For example, the word "those" is made up of three phonemes: the first is the "th" sound, the second the hard "o" sound, and the final phoneme the "s" sound. A series of phonemes makes up syllables, syllables make up words, and words make up sentences, which in turn represent ideas and commands. Generally, phonemes can be thought of as the sound made by one or more letters in sequence with other letters. When the speech recognition software has broken sounds into phonemes and syllables, a "best guess" algorithm is used to map the phonemes and syllables to actual words.

Once the speech recognition software has translated sound into words, the natural language processing software takes over. The NLP software parses strings of words into logical units based on context, speech patterns, and more "best guess" algorithms. These logical units of speech are then parsed and analysed, and finally translated into actual commands the computer can understand, based on the same principles used to generate the logical units. Optimally, speech recognition and NLP software work with each other non-linearly in order to facilitate better comprehension of what the user says and means. For example, a speech recognition package could ask an NLP package whether it thinks the "tue" sound means "to", "two", "too", or whether it is part of a larger word such as "particularly". The NLP system could make a suggestion to the speech recognition system by analysing what seems to make the most sense given the context of what the user has previously said. Speech recognition systems may determine which sounds or words were emphasised by analysing the volume, tone, and speed of the phonemes spoken by the user and report that information back to the NLP system.

3.2.2 Recognition enhancing measures
Using speech recognition systems in real working surroundings reveals that they do not perform as stated by the sales agent. A number of factors determine the recognition rate (Allen, 1992). The major requirements relate to:
• Vocabulary, speech and language modelling.
• Training material (if needed), the data collection platform, and pre-processing procedures.
• Speaker dependency and speaking modes.
• Environment conditions (ambient conditions).
In the following, I will discuss the main factors influencing the recognition process. A speech recogniser is based on some form of speech modelling using various paradigms; the best known are Dynamic Time Warping (DTW), Hidden Markov Modelling (HMM), and Artificial Neural Networks (ANNs). A minimal sketch of the DTW idea is given at the end of this subsection. Most of the approaches distinguish two phases: a training phase and an exploitation phase. The first phase is devoted to learning speech characteristics from data:
1) Acoustic wave forms.


2) Phonetic/linguistic descriptions.
3) Specific features, etc.

Speaker dependency
The system may either be tailor-made for a particular speaker (speaker dependent) or designed to tolerate a large variety of speaker variability (Stern et al., 1996). Other systems may be tuned to the voice of a particular single speaker or to a set of speakers (multi-speaker systems). They may also be tuned to trained rather than general, untrained speakers. In order to achieve a higher recognition rate, a training phase on a pre-specified set of words/sentences is usually needed. This required training set is described in terms of:
1. Type of data.
2. Speech acoustic wave forms.
3. Acoustic data with phonetic labelling.
4. Acoustic data with the corresponding orthographic forms.
5. Acoustic data with the corresponding phonetic transcription.
6. Acoustic data with the corresponding recognition-unit transcription, etc.
7. Size of data (how many hours/minutes of speech).
8. Number of speakers and how they are selected.
9. Other characteristics (sex, age, physical and psychological state, experience, attitude, accent, etc.).
10. Acquisition channels (single microphone, set of microphones, a single telephone handset or as many handsets as possible).
11. Environment conditions (noisy, quiet, all conditions, etc.) and constraints derived from the operating conditions.

A speaker-adapted system learns the characteristics of the current speaker and thereby continuously improves its performance. At the beginning the system may be used in a degraded mode (either speaker-independent or speaker-dependent), ending up as an optimised speaker-adapted system. The adaptation should be done by the person who is going to use the system, in order to tune in on his specific voice signature. Usually two approaches are used:
→ Static adaptation: The static adaptation process starts with off-line learning from pre-recorded data and a training phase before the system is used. The system references are adapted to the new speaker once and for all. The duration of this process matters: it can be real-time or last for hours. The speech data needed can be acoustic data without any manual labelling or pre-processing, or it may have to be labelled (orthographically plus phonetically). The speech corpus may range from a few minutes of speech to a few hours.
→ Dynamic adaptation: The system learns the current speaker's characteristics while the speaker is using the system. This may be done by manually correcting errors during the adaptation, or the system may automatically take into consideration the speech data uttered by the present speaker.

Three speaking modes can be distinguished, each characterised by a different recognition performance:
• Isolated: The words are pronounced in isolation with pauses between successive words; this gives the highest recognition rate.
• Connected: Usually used when spelling names or giving phone numbers digit by digit; a lower recognition rate is achieved.
• Continuous: Fluent speech, the mode with the lowest recognition rate.
The speaking rate varies from one speaker to another and depends on various factors such as stress, culture and emotion; it can be slow, normal, or fast, and a measure of it is the average number of speech frames within a given set of sentences.
Non-speech sounds, such as coughing, sneezing or clearing one's throat, may represent a challenge for the software. These non-linguistic utterances must be considered as part of the speech modelling.
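Of the modelling paradigms mentioned above, Dynamic Time Warping is the simplest to illustrate. The sketch below computes a DTW matching cost between an utterance and a stored word template, both given as sequences of acoustic feature vectors; it is a textbook formulation, not the algorithm of any particular commercial product.

```python
# Minimal sketch of the Dynamic Time Warping (DTW) paradigm mentioned above:
# compute an alignment cost between two sequences of acoustic feature vectors
# (e.g. a spoken word and a stored template), tolerating differences in speaking rate.
import numpy as np

def dtw_distance(template, utterance):
    """Classic DTW with Euclidean frame distances; a lower cost means a better match."""
    n, m = len(template), len(utterance)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(template[i - 1] - utterance[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                 cost[i, j - 1],      # deletion
                                 cost[i - 1, j - 1])  # match
    return cost[n, m]

# In an isolated-word recogniser the utterance is compared against one template per
# vocabulary word, and the word with the smallest DTW distance is chosen.
```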


The lexicon size
The size of the vocabulary is one of the main characteristics of automated speech recognition (ASR). The vocabulary size has a dual effect: if it is large, the user has greater freedom (quality of free speech), but at the same time both the recognition speed and the recognition rate are degraded. The vocabulary may consist of a small set of words (about 10 words), a medium-size set (from 10 to 100), a large set (from 100 to 1000) or a very large set (over 1000 words). The vocabulary may be seen as a single dictionary or divided into several sub-lexicons downloaded to the application depending on the dialogue phase. Almost all speech recognition systems, such as IBM ViaVoice, have a predefined lexicon that is, however, extendable. A speech recognition system applied in the shipping industry must handle maritime terms and abbreviations.

Lexicon generation
The recognition process can either use a set of words (global approach) or a set of sub-word units to identify the user's utterances. If the system uses whole-word models (global approach), these have to be learned in advance by the system; this additional vocabulary has to be recorded and used for training for each different application (fixed vocabulary). In the case of sub-word units, the speech units are learned once and for all, and the vocabulary lexicon is generated as a concatenation of such units (flexible vocabulary). During the thesis I scanned a number of existing survey reports and translated them with OCR software into editable text. Ship-specific vocabulary was then filtered out manually, see Appendix C.

Grammar
The most widely used grammar model in speech recognition systems is the so-called trigram model. In this model, a word is predicted based solely upon the two words immediately preceding it (a minimal counting sketch of the trigram idea is given after the list below). The simplicity of the trigram model is its greatest strength, since trigram statistics can be estimated by counting over millions of words of data. The implementation of the model involves only table lookups and is thereby computationally efficient and usable in real-time systems. This model captures the relations between words by the sheer force of numbers. It ignores, however, the rich syntactic and semantic structure that constrains natural language but also allows it to be easily processed and understood by humans. Further advances in speech recognition will be based on computational methods for predicting and analysing natural language data to a greater extent than the methods used in today's systems. A new statistical model for language modelling has been proposed which preserves the strengths and computational advantages of trigrams, but also incorporates long-range dependencies and more complex information. This approach is based upon the ideas of probabilistic link grammar. These techniques may improve the predictive power by naturally incorporating trigrams into a unified framework for modelling long-distance grammatical dependencies. Moreover, the methods are computationally efficient, which will allow them to be used in actual natural language systems on today's computers. The result of this approach may be significant in three different ways:
• First, the work allows the construction of language models that have greater predictive power than those constructed by current methods.
• Secondly, it is expected that it will deepen the understanding of the technical foundations of this area of computer science.
• And finally, when the methods are incorporated into the speech recognition, translation, and understanding systems at both Carnegie Mellon and IBM, the approach can be quantitatively measured, not only in terms of entropy but also in terms of the improvement it brings to speech recognition applications.
The information above is a summary of the article "A robust parsing algorithm for link grammars" (Grinberg et al., 1995).
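The trigram idea referred to above can be illustrated with a minimal counting sketch. It only gathers raw trigram counts and predicts the most frequent continuation; real systems add smoothing for unseen word pairs, and the example sentences below are invented.

```python
# Minimal counting sketch of the trigram model described above: the next word is
# predicted from the two words immediately preceding it, using counts from training text.
from collections import Counter, defaultdict

def train_trigrams(sentences):
    """Count how often each word follows each pair of preceding words."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = ["<s>", "<s>"] + sentence.lower().split() + ["</s>"]
        for w1, w2, w3 in zip(words, words[1:], words[2:]):
            counts[(w1, w2)][w3] += 1
    return counts

def predict_next(counts, w1, w2):
    """Return the most frequent next word after the pair (w1, w2), if the pair was seen."""
    following = counts.get((w1.lower(), w2.lower()))
    return following.most_common(1)[0][0] if following else None

# Example with made-up survey phrases:
model = train_trigrams(["rudder found in order", "steering gear found not in order"])
print(predict_next(model, "found", "in"))   # -> "order"
```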

3.2.3 Commercially available products


In 1989 the first dictation system from Dragon came on the market (DragonDictate SST), with a recognition rate of 12 words per minute spoken discretely, at a cost of about $10,000. Since then, tremendous progress has been made in the development of automatic speech recognition systems. In 1999, Dragon Systems and several other vendors, including IBM and Lernout & Hauspie, offer continuous automatic speech recognition software packages for under $50. In the following, the three most commonly used speech recognisers are presented: IBM's ViaVoice, Dragon NaturallySpeaking and L&H VoiceXpress.

Figure 19: The requirements for voice recognition in DNV follow the same chain as the basic speech recognition chain, but include a final reporting step into NAUTICUS.

DNV's organisation is scattered throughout the world, covering over 100 nations and 80 different languages. Any voice recognition software has to cope with many different pronunciations and ways of applying grammar. DNV thus has several difficult constraints and requirements on voice recognition software. Each surveyor has his individual way of working and of speaking. Furthermore, new technology is often met with resistance due to perhaps irrational fears. The following list states the major evaluation criteria for relevant speech recognition software:
• System requirements.
• User friendliness.
• Vocabulary, speech and language modelling.
• Recognition rate, speaker dependency, environment conditions (ambient conditions).
• Speaking modes.
• Price.
System requirements: hardware requirements, memory, sound card, etc.
User friendliness: Determining user friendliness is usually a rather subjective task; often it refers to the general impression of the software and its performance. More objective evaluation criteria could be: measuring the time to learn specific tasks with or without documentation, the number of steps needed to make one's own shortcuts or to calibrate the microphone, etc.
Vocabulary, speech and language modelling: Evaluation criteria could be the ability to define one's own vocabulary and grammar, and whether this can be achieved dynamically or not. Words typically used in the shipping industry should be implementable in the software's main lexicon.
Recognition rate, speaker dependency, and environment conditions: The recognition rate and speaker dependency are perhaps the major evaluation criteria. The recognition must be satisfactory also in difficult surroundings such as an engine room or a wind-blown deck. The evaluation is based on the recognition rate in per cent (the number of words that are recognised by the system), the lingual restrictions (e.g. "Indian English" vs Oxford English) and the recognition rate in difficult surroundings (background noise, humid areas, etc.).


Speaking mode: The speaking mode could be evaluated in terms of a limited-vocabulary mode and a dictation mode.

IBM VIAVOICE35
System requirements: Pentium/166 PC or better with MMX and 256K L2 cache, 32MB RAM (48MB for Win NT or for natural-language integration with MS Word 97), 180MB hard disk space, 16-bit sound card, Windows 95, 98, or NT 4.0.
User friendliness: ViaVoice is intelligently designed and easy to learn. Once the set-up is completed, a wizard provides a short tour to help configure the microphone and speakers. A 30-minute Quick Training module gives an introduction. The system learning consists of reading several texts, followed by processing of the voice information. The system supports multiple users and multiple enrolments per user, and the working vocabulary can be transferred between machines. Modeless operation allows switching between dictation, correction, navigation, and formatting mode. To correct an unrecognised word, you select it and say "Correct this"; a list of alternatives appears in the correction window, and you can select the appropriate one by voice. If the right word is not there, you can type it in a text box or begin spelling it. You can also change the format of a word from the correction window. Words that you change during correction are automatically added to your active vocabulary if needed, and you can add words en masse by importing files into ViaVoice's Vocabulary Expander.
Vocabulary, speech and language modelling: The base active vocabulary is 64,000 words, with a maximum active vocabulary of 128,000 words and a backup dictionary of 260,000 words. The program automatically prompts you to add unknown words, and you can import documents. The system supports multiple users and dictionaries. The software supports macros for commonly used phrases, and the macros can be more than one line. Modeless operation is supported, provided you pause before and after a system command, and the user can correct words by spelling. ViaVoice dictates into any application that accepts text, and natural-language commands work in both ViaVoice's SpeakPad and Word 97.
Recognition rate, speaker dependency, and environment conditions: The main disappointment with ViaVoice is that it did not yield a high level of recognition accuracy. An average accuracy score of approximately 70 per cent was achieved even after extensive system learning. Another concern is that the accuracy varied a lot. Despite its modeless operation and intuitive handling of ambiguous words, ViaVoice's throughput ranked near the bottom of the tests.
Price: $150, including the Andrea Electronics NC-80 microphone headset.
Conclusions: IBM ViaVoice 98 offers some compelling features, but until its accuracy improves it will remain a program that is better at giving orders to your PC than at taking down what you say. (Test results from PC Magazine, 12 November 1999.)

35 www-4.ibm.com/software/speech/


L & H VOICE XPRESS36
System requirements: Pentium/166 PC with MMX, 40MB RAM (48MB for Win NT), 130MB hard disk space, 16-bit SoundBlaster-compatible sound card (L&H has a list of compatible cards), Windows 95, 98, or NT 4.0.
User friendliness: The program has several positive features, such as modeless operation (switching between dictation and commands), which allows both dictation and commands at the same time. A wizard configures the hardware. During enrolment, you read a chapter of a book; the entire process took over an hour in the test. It has to be noted that one cannot return to the enrolment phase later to enhance accuracy. The program also includes a speech synthesiser for text-to-speech conversion (in a female computer voice); this feature could be used, for example, to read e-mail aloud. A voice-activated "command browser" allows one to see which commands are available at any given point. Another feature of the software is its ability to turn on a small control/task bar at the top of the screen, which allows one to dictate into every Windows application, and even into a DOS window. The correction window available in Windows applications can be turned on whenever text is selected, a potentially useful feature for correction. The correct word can be spelled using the standard alphabet, with the letters dictated into the correction box. Alternative word choices are listed in the correction dialogue box and may be chosen by saying "take one", "take two", etc. The software also supports multiple users/dictionaries.
Vocabulary, speech and language modelling: The base active vocabulary is 30,000 words, with a maximum active vocabulary of 60,000 words and a backup dictionary of 230,000 words. New words can be added with the Vocabulary Tool, and documents can be imported. The software supports macros for commonly used phrases, and the macros can be more than one line.
Recognition rate, speaker dependency, environment conditions: The recognition accuracy depends on the lexicon size. L&H VoiceXpress has a good recognition rate (99 %) when using a specialised lexicon. Using the standard lexicon, the recognition rate is dramatically reduced: a fluent Oxford English speaker had a recognition rate of approx. 75 %, and for speakers with foreign or heavy regional accents it dropped further, to about 40 %. There is an optional setting that allows the recognition to be faster and less accurate, or slower and more accurate. Slower and more accurate is often preferable, since one can dictate at length and then go back and make corrections after the recognition process has taken place. Faster recognition presumably requires a fast computer processor.
Price: The program is available in "Standard", "Advanced" and "Professional" editions, in a price range from about $50 to $130. The Standard version is minimal. The Advanced version allows one to work within Microsoft Word, contains additional vocabulary, and allows the use of macros and transcription from recorded material. The Professional edition includes all the features of the Standard and Advanced editions.

36 www.lhs.com


Conclusions: The program comes with an audio-visual teaching module and online help, and I expect that a computer user could master it with the help available in the program. The advantage of this product is that some of its user interfaces are kept simple, but in our opinion it is too complex because of its multiple options and depth. L&H can be recommended for professional use, but for unskilled users the program will not be satisfying. Like all the other programs, one needs to see how it performs with continued learning over time.

DRAGON NATURALLYSPEAKING37
System requirements: Pentium/133 PC or better, 32MB RAM (48MB for Win NT), 180MB hard disk space, 16-bit SoundBlaster-compatible sound card, Windows 95, 98, or NT 4.0.
User friendliness: Dragon NaturallySpeaking has several positive features, such as modeless operation (switching between dictation and commands), dictation into most Windows applications, integration with Microsoft Word (basic command-and-control, but without full control over the Word 97 menus) and Corel WordPerfect, improved natural-language support, and a host of hands-free editing improvements. Like the other speech recognition programs mentioned above, a wizard configures the hardware and trains the system to recognise your voice in about 4 minutes. To improve accuracy, you can read from one of four popular works for about 30 minutes. The software also supports multiple users/dictionaries. NaturallySpeaking's New User Wizard guides you through the process of creating a new speech file, configuring the microphone headset, and training NaturallySpeaking to recognise your voice.
Vocabulary, speech and language modelling: The base active vocabulary is 30,000 words, with a maximum active vocabulary of 62,000 words and a backup dictionary of 230,000 words. New words can be added with the Vocabulary Editor, and documents (text files) can be imported using the Vocabulary Builder. The software supports macros for commonly used phrases; the macros can, however, only be one line long.
Recognition rate, speaker dependency, environment conditions: In the tests, NaturallySpeaking was the most accurate product. On average it translated 80 per cent of the words correctly, and most users should attain accuracy levels of 87 to 95 per cent.
Price: $150, including the VXI Parrott 10-3 microphone headset.
Conclusions: NaturallySpeaking supports dictation into most Windows applications, and in Word or WordPerfect you can use natural-language commands. One of the biggest shortcomings of NaturallySpeaking, however, is that unlike ViaVoice and Voice Xpress, it does not allow you to control Word 97's pull-down menus. (Test performed by PC Magazine, 12 November 1999.)

37 www.naturallyspeaking.com


3.2.4 General conclusions
The ultimate test of any speech recognition system is its performance in use. Other aspects may, however, also be considered: how well it learns over time, how well the recognition accuracy can be boosted with continued tweaking, and how fully it can be customised to the individual user's needs. Although each product initially recognises about 80 % of spoken words, each user must spend a tedious half hour to several hours reading passages of text to the computer to train the software. NaturallySpeaking relieves some of the tedium by offering interesting read-back text from books by Arthur C. Clarke and Dave Barry. During my initial use, I made corrections to almost every sentence. As the products remember corrections and additions, the need for corrections declines rapidly until, after about 40 hours of use (an estimated month's worth of dictation), the recognition rate plateaus and corrections are minimal.

The major difference between the products lies in correcting mistakes. NaturallySpeaking lets users select replacement words from a list or mark the words to correct using the keyboard or voice; ViaVoice requires that the user select the words via the keyboard. If typing skills are adequate, either program works well; if you are keyboard-averse, NaturallySpeaking has a pronounced advantage. However, IBM can dictate text directly into its notepad, and the text can then be pasted into any program, or you can dictate directly into Microsoft Word; NaturallySpeaking offers only the notepad approach. ViaVoice also offers a text-to-speech option and better support for multiple users. But you cannot fully command the computer by speech with either product. Additionally, both products have substantial system requirements: 133-MHz (NaturallySpeaking) or 150-MHz (ViaVoice) Pentium computers with 32M bytes of RAM for Windows 95 (or 48M bytes for Windows NT 4.0), plus a 16-bit or greater SoundBlaster-compatible soundboard and about 50M bytes of disk space. Improved versions are scheduled for release later this year.

Examples of what the different software did with a news story38: You might scratch your head in amazement when you see what ViaVoice (centre) and NaturallySpeaking (bottom) can do to a news story (top).

38 This example is found online at: http://www2.computerworld.com/home/online9697.nsf/All/970915screen_shot


The future of speech recognition
From time to time the press (popular science magazines, periodicals, and newspapers) publishes articles about revolutionary inventions, like the Berger-Liaw Neural Network Speaker-Independent Speech Recognition System (BLNNSISRS)39. However, to the author's knowledge the presented information has not yet been verified by other research institutions or independent reliable sources, and it must therefore be treated with caution. If the information turns out to be true, the BLNNSISRS has a great potential for enhancing speech recognition systems.

Researchers at the University of Southern California (USC) claim to have created the world's first machine system that can recognise spoken words better than humans can, a stated achievement attributed to a fundamental rethinking of long underperforming computer architecture. According to USC, in benchmark testing using just a few spoken words, the neural network approach bested all existing computer speech recognition systems and outperformed human ears. Neural networks mimic the way brains process information, and speaker-independent systems can recognise a word no matter who or what pronounces it. Theodore W. Berger, Ph.D., at USC claims that the system can distinguish words in vast amounts of random "white" noise, with noise amplitudes 1,000 times the strength of the target auditory signal; human listeners can deal with only a fraction as much. Furthermore, he claims that the system can filter out words from the background clutter of other voices, the hubbub heard in bus stations, theatre lobbies and cocktail parties, for example.

39 http://www.usc.edu/ext-relations/news_service/real/real_video.html


Even the best existing systems fail completely when as little as 10 per cent hubbub masks a speaker's voice, and at slightly higher noise levels the likelihood that a human listener can identify spoken test words is mere chance. By contrast, Berger and Liaw's system is claimed to give 60 per cent recognition with a hubbub level 560 times the strength of the target stimulus. USC also claims that, with just a minor adjustment, the system can identify different speakers of the same word with superhuman acuity.

First proposed in the 1940s and the subject of intensive research in the '80s and early '90s, neural nets are software configured to imitate the brain's system of information processing, wherein data are structured not by a central processing unit but by an interlinked network of simple units called neurones. Rather than being programmed, neural nets learn to do tasks through a training regimen in which desired responses to stimuli are reinforced and unwanted ones are not. "Even large nets with more than 1,000 neurones and 10,000 interconnections have shown lacklustre results compared with theoretical capabilities. Deficiencies were often laid to the fact that even 1,000-neuron networks are tiny, compared with the millions or billions of neurones in biological systems" (Nelson & Wan, 1998). Remarkably, USC's neural net system uses an architecture consisting of just 11 neurones connected by a mere 30 links. Berger and Liaw's computer neurones were combined into a small neural network using standard architecture. While all the neurones shared the same hippocampus-mimicking general characteristics, each was randomly given slightly different individual characteristics, in much the same way that individual hippocampus neurones have slightly different individual characteristics.
More information on neural speech enhancement: http://ee.ogi.edu/~ericwan/NSEL/
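As a toy illustration of the general neural-network principle described above (weights adjusted from examples rather than explicit programming), the sketch below trains a tiny two-layer network on the XOR problem. It has no relation to the Berger-Liaw system or to speech data; it only shows the training-regimen idea.

```python
# Toy illustration of the neural-network training principle described above: the network
# learns a task by having its connection weights nudged towards desired responses.
# This is NOT the Berger-Liaw system; just a tiny two-layer network trained on XOR.
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)       # input -> hidden weights
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)       # hidden -> output weights

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for _ in range(20000):                               # training regimen: reinforce desired responses
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    err = out - y                                    # how far the response is from the desired one
    grad_out = err * out * (1 - out)
    grad_h = (grad_out @ W2.T) * h * (1 - h)
    W2 -= 0.5 * h.T @ grad_out
    b2 -= 0.5 * grad_out.sum(axis=0)
    W1 -= 0.5 * X.T @ grad_h
    b1 -= 0.5 * grad_h.sum(axis=0)

print(np.round(out.ravel(), 2))                      # should approach [0, 1, 1, 0]
```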

4 SPEECH DEMONSTRATOR

Since NAUTICUS' API40 was not known to us and due to resource shortage, a NAUTICUS imitation was made in Visual Basic. This speech demonstrator mimics the same screen interface as the most relevant parts of NAUTICUS SIO. The purpose was to get an indication of the performance of a speech-controlled application. Figure 22 compares the existing NAUTICUS data entry structure with the demonstrator interface design. The demonstrator has the same layout as NAUTICUS, and results obtained from it may therefore be comparable. The characteristics of the demonstrator are that it:
1. Is easy to realise.
2. Makes it possible to demonstrate dialogue principles.
3. Gives an indication of the feasibility.
The demonstrator will, however, give no experience with integration with NAUTICUS. The actual programming work was done in co-operation with SINTEF41.

40 Application Program Interface 41 The Foundation of Scientific and Industrial Research at the Norwegian Institute of Technology


A fully integrated speech recogniser in NAUTICUS will not have as its primary goal to speed up the actual survey, but to remove double work, to allow data entry on site and even to permit hands-free work.

4.1 Set-up
The demonstrator was built from:
• IBM ViaVoice SDK (Software Development Kit) for Windows.
• Visual Basic 5.0 (for the programming).
The ViaVoice recognition software was chosen since it is freely downloadable for trial purposes, and Visual Basic 5.0 is a well-suited tool for development and demonstration purposes. The demonstrator mimics only the part of NAUTICUS SIO that deals with data entry. A copy of the speech demonstrator, along with a short user's manual, is enclosed in Appendix A. The reader may experiment with the software to test its capabilities and quality, and how well it is suited for speech-based data input. The demonstrator was designed such that it imitates only the relevant fields of the NAUTICUS program currently used for data entry, see Figure 21. Data entry is typically done via check-boxes and radio buttons giving the status of the inspected item, and via free-text areas. The demonstrator allows:
• Navigation.
• Conclusion.
• Free text.
It is hoped that the demonstrator gives an indication of how a voice-based inspection software might act in a hands-free "NAUTICUS mode". Furthermore, the software was not trained to recognise the speaker individually, i.e. better performance can be expected when the software is trained.

Design of the demonstrator
The idea was to use IBM's ViaVoice directly to enter information and, if possible, for navigation in NAUTICUS SIO. However, as described in Figure 20, the NAUTICUS APIs were not known to us and, due to resource shortage, this could not be achieved. The speech demonstrator has three modes; the main reason for having modes is to reduce the number of available commands in each situation, thus making the speech recognition faster and more accurate. The modes correspond well with the task execution as usually performed by surveyors. The three modes are:
1. Selection mode.
2. Conclusion mode.
3. Dictation mode.
A minimal sketch of this mode-switching logic is given after the mode descriptions below.

Changing modes
To switch from selection mode to conclusions mode, the command "Conclusion" must be given; as a response, the tab folder labelled "Conclusions" is chosen. To switch from conclusions mode back to selection mode, the command "Select" must be given; there is no visual feedback from this command. To switch from conclusions mode to dictation mode, the command "Add memo" must be given; as a response, keyboard focus is moved to the Memo edit control. To switch from dictation mode back to conclusions mode, the command "Cancel memo" or "Save memo" must be given; there is no visual feedback from this command.


Selection mode
In selection mode, the user may select the different parts of the tree representing the chosen survey scope. It is also possible to expand and collapse the chosen branch of the tree. The commands used relate to the following concepts: a survey scope consists of a set of surveys; each survey concerns a set of systems that are inspected; connected to each system there are a number of items that are controlled. In NAUTICUS there may also be sub-items (or sub-systems), but this is not implemented in the speech demonstrator. In selection mode, the user may give voice commands that select different surveys, systems, and items. In conclusions mode, the user may give voice commands that set the status of the chosen survey, system, or item. In dictation mode, the user may dictate any free text. At start-up, selection mode is active, with the survey scope node selected and collapsed. Navigation inside the demonstrator consists of a few predefined commands; the user may issue the following:
• Next survey.
• Previous survey.
• This survey.
• Next system.
• Previous system.
• This system.
• Next item.
• Previous item.
• Expand.
• Collapse.
• Conclusions.

Conclusion mode
In conclusions mode, the user may give commands for choosing among the four radio button values:
1. Found in order.
2. Found not in order.
3. Not applicable.
4. Not inspected.

Dictation mode
In dictation mode, the user may dictate text in plain English, or give the commands "Cancel memo" or "Save memo" to go back to conclusions mode. In addition, the dictation mechanism in ViaVoice includes some generic commands, like "New line", "New paragraph", "Comma", and "Full stop".
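The mode-switching logic described above can be sketched as follows. The real demonstrator was implemented in Visual Basic 5.0 on top of the IBM ViaVoice SDK; the sketch below is only a language-neutral illustration of how restricting the active command set per mode works, with the command names taken from the mode descriptions above.

```python
# Minimal sketch of the demonstrator's mode-switching logic (illustrative only; the
# actual demonstrator was written in Visual Basic 5.0 with the IBM ViaVoice SDK).

# Commands accepted in each mode (taken from the mode descriptions above).
COMMANDS = {
    "selection":  {"next survey", "previous survey", "this survey", "next system",
                   "previous system", "this system", "next item", "previous item",
                   "expand", "collapse", "conclusion"},
    "conclusion": {"found in order", "found not in order", "not applicable",
                   "not inspected", "select", "add memo"},
    "dictation":  {"cancel memo", "save memo"},       # anything else is treated as free text
}

# Mode transitions triggered by specific commands.
TRANSITIONS = {
    ("selection", "conclusion"): "conclusion",
    ("conclusion", "select"): "selection",
    ("conclusion", "add memo"): "dictation",
    ("dictation", "cancel memo"): "conclusion",
    ("dictation", "save memo"): "conclusion",
}

def handle_utterance(mode, utterance):
    """Return the new mode and a description of the action taken."""
    text = utterance.strip().lower()
    if mode == "dictation" and text not in COMMANDS["dictation"]:
        return mode, f"append to memo: {utterance!r}"
    if text not in COMMANDS[mode]:
        return mode, "ignored (command not active in this mode)"
    new_mode = TRANSITIONS.get((mode, text), mode)
    return new_mode, f"execute {text!r}"

mode = "selection"                                    # at start-up, selection mode is active
for spoken in ["next system", "conclusion", "found in order", "add memo", "rudder ok", "save memo"]:
    mode, action = handle_utterance(mode, spoken)
    print(mode, "-", action)
```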


Figure 20: Communication between NAUTICUS SIO and ViaVoice requires that the APIs are known. ViaVoice can communicate with standard MS Windows applications; non-standard applications like NAUTICUS can be interacted with based on their APIs, and this interaction has to be tailor-made using the SDK software. Unfortunately, the APIs of NAUTICUS were not known to us, nor were the necessary resources available to deal with it. Therefore, an imitator with roughly the same screen interface as NAUTICUS, and with known APIs, was programmed in Visual Basic.

[Figure 21 annotations: mouse activated; Selection mode: expand/collapse, next/this/previous survey, next/previous item; Conclusion mode: ok (in order), not ok (not in order); Memo mode: free text input]

Figure 21: The current NAUTICUS SIO interface; it is mainly based on mouse interaction.


Figure 22: Screen dump of the demonstrator interface mimicking the real NAUTICUS SIO layout. Experiences gained from it may therefore be transferable to the real NAUTICUS.

4.2 Experiences gained from the speech demonstrator
The demonstrator itself is easy to use; it has only a few control commands, and the interface is a good imitation of the actual screen in NAUTICUS SIO. Results obtained from the demonstrator might therefore be transferable to NAUTICUS SIO. Experience shows that successful speech recognition requires a disciplined user and a training phase. The way of talking affects the recognition rate: if the user is unskilled in dictation, the recognition accuracy degrades. It became obvious that ending a sentence with "full stop" improves the recognition, because the recognition processor's internal language model deals better with complete sentences than with "open" sentences. The recognition speed and rate when navigating in the tree structure (within the three modes) are as desired, but the speed and rate in free-text mode are not satisfactory, see Figure 23. By reducing the vocabulary size, the recognition speed could be increased to a satisfactory level. The demonstrator (and ViaVoice in general) was very sensitive to noise and to the speaking mode of the user. It became obvious that speech recognition in challenging environments will not be feasible for a long time, and the tests showed clearly that a back-up system is absolutely necessary when using speech as the primary input.

Recognition enhancing measures
During the thesis, a number of existing survey reports were analysed with respect to shipping terminology, see Appendix B. Such words are usually not found in the vocabulary of standard speech recognition software. By adding these words to the system vocabulary, the recognition rate can be improved.


Difficulties will arise when implementing a voice controlled ship inspection system, simply because DNV today has employees speaking 80 different languages. Even though the working language is English, there will be great variation in pronunciation. This will further decrease the already poor recognition accuracy in dictation mode, where a recognition rate of approximately 90 % is desired; the user should be able to recognise what he or she has dictated. To achieve higher recognition rates, the software depends entirely on a training session for each individual user. The training sessions found in today's speech recognition software are time consuming, and a faster session to distinguish between users should be implemented. To reduce speaker dependencies, the vocabulary should contain as few words as possible. The implementation of a speech-based reporting system must be 100 % functional, otherwise it will quickly be rejected. We may therefore conclude that today's speech recognition systems do not perform satisfactorily in challenging environments. Enhancing the recognition refers to the basic approaches described in section 3.2.2.

Bottom survey in dock Some minor dents noted in different locations on side shell, memo for owner’s issued, see 3. Some bolts for the steering gear were found loose and the steering gear rotor was found at an angle (tilted)

Figure 23: Comparison of a text that is dictated into the demonstrator (left) and what the demonstrator recognised (right).


5 RECOMMENDATIONS AND FUTURE OUTLOOK

The most common microphone solution used for speech purposes today consists mainly of noise cancelling boom microphones (chapter 3.1). These microphones are, however, not even close to operating satisfactorily in really noisy surroundings. Much more focus should be placed on the development of new microphones and the enhancement of existing ones. In my opinion, a very promising approach is the skull-mounted microphone (chapter 3.1), since it has already proven to work satisfactorily in difficult ambient conditions in connection with VHF (in fire fighting and police work). One might therefore expect that such skull microphones would be applicable also in surveyor environments. The advantages of this type of microphone are obvious:
• It does not obstruct the user.
• It has full functionality in harsh ambient conditions.
• It is light weight.
• It has low power consumption.
• It has Ex features.
The main advantage might, however, lie in the fact that it is easily integrated in the helmet. The helmet would provide protection, some degree of noise shielding, and room to house additional hardware such as a pre-amplifier or a hardware-based noise filter. Another argument in favour of the skull microphone is the human factor: surveyors might not be pleased to look like an "alien astronaut", and this might increase the distance between a surveyor and the ship's crew. The development of PC-compatible skull-mounted microphones should therefore be a focus area.

Future speech-based software in ship inspection must support mobile work. Taking notes and comments must often be done while both hands are occupied, so speech-based input is an interesting option. If the spoken word is to be recognised on site, some kind of feedback of successful recognition may be necessary, either visual or audible. Alternatively, the speech could be recorded, e.g. as a wav file, for later machine processing or as a backup for later cross-checking of the recognised text. The system set-up described in chapter 2 does not specify where the speech recognition software is actually located. Small handheld or wearable devices are starting to be equipped with speech recognition systems. Alternatively, standard cellular phones could be used to dial into some powerful computer that does the number crunching. The phone would then act as microphone and feedback device (automated reading or displaying of recognised text). The recognised text could be saved on the remote machine, on the phone or on some connected (cabled or wireless) remote device.

The large number of current speech recognition products may nevertheless give a somewhat too optimistic view of the performance of speech recognition systems. Current speech recognition applications work reasonably well as long as the usage conditions match the data used to train the recognition system. Unfortunately, this does not always happen in real-world operating environments. Practical speech recognition systems never operate in stationary conditions, but


Practical speech recognition systems never operate in stationary conditions; they are usually surrounded by various interference sources, e.g. environment and speaker variability, that continuously change their characteristics. Speech recognition systems that can cope with all these variations typically have a restricted vocabulary size and a restricted number of speakers that the system can recognise.
It seems that surprisingly little attention has been paid to the hardware. In my opinion, the microphone is one of the weakest links in the speech recognition chain, and its potential does not appear to be fully exploited. If hardware could be developed that delivers a speech signal largely unaffected by noise or other disturbing influences, the requirements on signal-enhancing software, exceedingly long training sessions or highly developed speech recognition systems could be lowered. Ideally, the speech-delivering hardware (microphone or hardware filter), in combination with software-based signal-enhancement algorithms, should give the speech recogniser a signal that corresponds to office quality (low noise). In this way, the speech recognition system could concentrate on its primary task, the recognition of speech. The signal processing software should adapt to time-varying disturbances, whereas the hardware filter could deal with stationary noise sources.
Speech recognition should be kept casual, and should not require special user effort or intrusive equipment (Oberteuffer, 1999). Despite the advances in desktop dictation systems, the required training represents a psychological barrier. Disfluencies in the speech of non-native speakers are another barrier and are not yet handled well enough to give the user a meaningful outcome. Compared with the human ear, current speech recognition systems perform only a preliminary analysis of the speech signal. Most systems are insensitive to the phase of the speech signal, even though this is important information for human beings, allowing them to pick out individual speakers and to suppress high levels of background noise in e.g. a crowded room (the so-called cocktail party effect). A more detailed analysis of the speech wave might even eliminate the need for sophisticated microphones. As mentioned in chapter 3, one approach has been based on neural net technology for both phoneme determination and natural language processing; neural nets are closer to the structures and methods humans use for processing speech signals.
Presently, the traditional mouse-based and stationary desktop programs are simply "audified", ignoring the fact that speech-based input might require a different interface. No system based on speech alone can offer a natural, intuitive way of creating text documents. It is apparent that speech not only has great potential but also many severe limitations (chapter 2.3, backup solutions). A new interface should be designed that combines speech input with either visual or audible feedback mechanisms. Applications requiring well-formatted documents with accurate spelling, punctuation and capitalisation need speech-to-text systems that provide easy formatting and editing capability; in ship inspection there are many standard sentences and phrases, and some shorthand might be developed. In current speech recognition systems, editing usually involves keyboard and mouse; although in some applications it is possible to correct words, capitalise or move the cursor by speech, doing so is impractical and tedious.
An effective speech recognition system therefore needs an additional input device, and any computer with a graphical user interface needs display navigation capability. A successful introduction of speech recognition systems will also depend strongly on the "maturity" of the user. Converting everyday speech into poetry, technical reports or minutes of meetings requires disciplined use of the language, with complete sentences, correct grammar and without the all too common "eeh", "hmm", "shit" contaminations of the language. Educating the user in this way of speaking may be one of the greatest challenges. Introducing the above-mentioned shorthands for pre-defined sentences could lighten this task (a small example is sketched below), but then much of the freedom of speech-to-text will disappear, and the user might feel uncomfortable or forced to use expressions that are unnatural to him.
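The shorthand idea can be illustrated with a few lines of Python. The mapping below is purely hypothetical; the shorthand phrases and the expanded standard sentences are invented examples, not actual DNV survey wording.

    # Illustrative sketch: expanding spoken shorthand into pre-defined standard
    # survey sentences. Phrases and expansions are invented examples only.
    SHORTHAND = {
        "memo owner dents": "Some minor dents noted on side shell; memo for owner issued.",
        "steering bolts loose": "Some bolts for the steering gear were found loose.",
        "anchor ok": "Anchor and chain cable examined and found in order.",
    }

    def expand(recognised_phrase):
        """Return the standard sentence for a shorthand, or the phrase unchanged."""
        return SHORTHAND.get(recognised_phrase.lower().strip(), recognised_phrase)

    print(expand("Steering bolts loose"))   # prints the full standard sentence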


The success of speech recognition depends strongly on the mathematical methods used. These are, however, far from optimal, and considerable improvement may therefore be expected in the future. This improvement will come from more complex models benefiting from increased processor speed, but also from applying more noise-resistant recognition methods. In principle there are two complementary approaches: either adapting the model parameters to the noisy environment, or cleaning the speech data so that they can be recognised by a model trained on clean speech. As usual, the best strategy will be a middle way between the two. The following three methods are examples of these approaches (Bateman, 1992):
Using noise-resistant features and similarity measurements: This technique focuses on the effects of noise on the speech signal rather than on the removal of noise, and derives noise-resistant speech parameters. Although not the most successful approach, it has the significant advantage of being applicable to a wide range of noise, since it does not assume special noise characteristics.
Speech model compensation for noisy environments: The models (Hidden Markov Models, Artificial Neural Networks etc.) are trained with clean speech and are then transformed to a specific noisy environment. The shortcoming of this approach is that such a model is only operational in that specific noise environment, making the system too inflexible for general use.
Speech enhancement: This method represents the second way of obtaining noise resistance. Noisy speech is transformed to be as close to the training environment as possible. The pre-processing is supposed to recover the waveform or specific parameters of the clean speech, in the hope that the cleaned speech is then easier to recognise (a simple sketch of this idea is given below).
The future potential of speech recognition is promising.
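As a concrete illustration of the speech enhancement approach, the following Python/numpy sketch implements a very simple magnitude spectral subtraction. It assumes that the first few frames of the recording contain noise only; the frame length, overlap and spectral floor are illustrative choices rather than values taken from the cited literature.

    # Minimal sketch of speech enhancement by magnitude spectral subtraction.
    import numpy as np

    def spectral_subtraction(signal, frame_len=512, hop=256, noise_frames=5, floor=0.02):
        """Enhance a 1-D speech signal (assumed longer than frame_len samples)."""
        window = np.hanning(frame_len)
        n_frames = 1 + (len(signal) - frame_len) // hop
        frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                           for i in range(n_frames)])
        spectra = np.fft.rfft(frames, axis=1)
        magnitude, phase = np.abs(spectra), np.angle(spectra)

        # Estimate the stationary noise magnitude from the leading, speech-free frames.
        noise_mag = magnitude[:noise_frames].mean(axis=0)

        # Subtract the noise estimate, keeping a small spectral floor to avoid negatives.
        cleaned_mag = np.maximum(magnitude - noise_mag, floor * noise_mag)
        cleaned = np.fft.irfft(cleaned_mag * np.exp(1j * phase), n=frame_len, axis=1)

        # Overlap-add the enhanced frames back into one waveform.
        out = np.zeros(len(signal))
        for i in range(n_frames):
            out[i * hop:i * hop + frame_len] += cleaned[i]
        return out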

6 REFERENCES.

Allen, J. 1992: Overview of text-to-speech systems. In Furui, S. and Sondhi, M.M., editors, Advances in Speech Signal Processing, chapter 23. Marcel Dekker, Inc., New York.

Andersen, K.A. 1999: News in brief. Internal DNV magazine, Vol. 3, 1999.

Bateman, D.C., Bye, D.K. and Hunt, M.J. 1992: Spectral contrast normalisation and other techniques for speech recognition in noise. Proc. IEEE Internat. Conf. Acoust. Speech Signal Process., Vol. I, pp. 241-244, 1992.

Gran, S. 1992: Audible Editor-Audit keyboard interface. DNV report no: 92-2034.

Gran, S. 1994: Ocean Engineering Auto-Course. Det Norske Veritas research report no: 94-2022. Internet version: http://www.dnv.com/ocean/.

Grinberg, D., Lafferty, J. and Sleator, D. 1995: A robust parsing algorithm for link grammars. Technical report CMU-CS-95-125, Department of Computer Science, Carnegie Mellon University, 1995.

Grover, M. and Makovoz, D. 1999: Speech Technology Magazine, August edition. Internet address: http://www.speechtechmag.com.


Krauss, L. and Zuhlke, D. 1998: Pointing devices for machine controllers - investigation and result of cursor positioning with different pointing devices. University of Kaiserslautern internal report.

Kuo, F. 1966: Network Analysis and Synthesis, 2nd edition. John Wiley and Sons, N.Y.

Lyng, J. 1999: DNV. Internet address: http://www.dnv.com.

Madden, J. and Baldwin, T. 1997: Technology overview of mobile computers. Naval Education and Training Professional Development and Technology Center (NETPDTC), Orlando.

Mestl, T. and Lindgren, R. 1999: The future of ship inspection and reporting. Internal report, DNV.

Nelson, A. and Wan, E. 1998: Handbook of Neural Networks for Speech Processing (Shigeru Katagiri, editor). Artech House, Boston, USA, 1998 (in press).

Nilsson, E.G. 1999: Nauticus by voice. SINTEF Telecom and Informatics.

Oberteuffer, J.A. 1999: Development of effective speech-to-text systems. Speech Technology Magazine, pp. 23-28, October/November edition, 1999.

Oppenheim, A. and Schafer, R. 1975: Digital Signal Processing. Prentice Hall.

Oppenheim, A.V., Weinstein, E., Zangi, K., Feder, M. and Gauger, D. 1994: Single-Sensor Active Noise Cancellation. IEEE Transactions on Speech and Audio Processing, Vol. 2, No. 4, April 1994.

Raghavan, P. 1998: Speaker and environment adaptation in continuous speech recognition. Computer Aids for Industrial Productivity (CAIP) Technical report CAIP-TR-227.

Stern, R.M., Acero, L. Oshima. 1996: Signal processing for robust speech recognition. In C.-H. Lee and F. Soong, editors. Kluwer Academic Publishers, Boston, pp. 351-378.

Stern, R.M., Raj, B. and Moreno, P. 1997: Compensation for environmental degradation in automatic speech recognition. Kluwer Academic Publishers, Boston, MA, 1997.

Viikki, O. 1999: Adaptive methods for robust speech recognition. Tampere University of Technology (TEK) publication 257.


APPENDICES

Appendix A Introduction to Det Norske Veritas (DNV).
Appendix B Brief user guide of the speech demonstrator.
Appendix C List of words found in shipping.
Appendix D Glossary of terms.


A INTRODUCTION TO DET NORSKE VERITAS (DNV)
Established in 1864, Det Norske Veritas is an independent foundation working with the objective of 'safeguarding life, property and the environment'. DNV has a total of 5,500 employees and comprises a network of 300 exclusive offices in 100 countries. DNV's head office is in Oslo, Norway. DNV is a leading provider of safety and reliability services, where classification, certification, verification and advisory services are key activities. The staff consists mainly of highly qualified engineers and technical personnel. The Society is authorised to act on behalf of some 110 national maritime authorities.
As of 1 June 1999, DNV has a 22.2 percent market share of the oil tankers world-wide and 19.6 percent of the newbuildings. Of the world's bulk carriers, 10.6 percent are classed with DNV, and 7.7 percent of the newbuildings are built to DNV class. With regard to container carriers, DNV has 2 percent of the ships in operation and 1.8 percent of the newbuildings. In addition, some 120 drilling and service rigs are classed with DNV.
DNV establishes rules and guidelines for the classification of ships, mobile offshore platforms and other floating marine structures. It also issues rules and standards for the classification, certification and verification of fixed offshore structures. DNV is the leading classification society in certification of safety and quality management systems for shipping companies, based on DNV's SEP rules, the ISM code and ISO 9000. DNV is also assigned work on more than 500 process plants world-wide. DNV is accredited by 14 countries to certify quality assurance systems according to the ISO 9000 standards, and has so far certified close to 22,000 companies. DNV's Safety Rating System is used at over 6,000 industrial sites throughout the world.
DNV provides certification of management systems, products and personnel to the land-based and offshore industries. Within accredited quality system certification (ISO 9000/BS 5750), DNV is among the world-wide market leaders with a market share of some six percent. DNV is also a leading provider of advisory services within safety, environmental and quality management, as well as a specialist in technical services and software. Det Norske Veritas provides safety, quality and reliability services to the world's offshore and process industries, with major markets in the United States, Europe and Asia. DNV is also active in the aerospace and aviation industries. It has extensive research and development facilities, with laboratories in Norway, the Netherlands, Singapore, Fujairah and the US.
The US Coast Guard and Tokyo MOU statistics, issued most recently in 1998, rated DNV as the best of all classification societies when it comes to Port State detentions. DNV has highly qualified surveyors working from an extensive world-wide network of offices strategically placed at shipping centres, enabling quick and efficient service. Local managers are technically competent and authorised to make most decisions on site, avoiding unnecessary delays or confusion.


B BRIEF USER GUIDE OF THE SPEECH DEMONSTRATOR
Installation
In order to run the NAUTICUS speech demonstrator, the ViaVoice run-time (RT) and the ViaVoice SDK tools must be installed. This requires approximately 200 MB of free disk space on the PC. To install the ViaVoice RT, the installation file "rtduk.exe" (on the installation CD42) must be run (follow the instructions given by the installation program). When the RT is installed, the ViaVoice SDK must be installed by running the file "vvsdk15.exe" (on the installation CD; again, follow the instructions given by the installation program). Note that the ViaVoice RT must be installed before the ViaVoice SDK. Lastly, copy the files "ScopeTree.exe" and "scope.txt" to an arbitrary directory (or run the demonstrator from the CD). The file scope.txt may be altered (at own risk) to change the data in the Survey Scope tree, but make sure not to change the numbering scheme.
Starting the demonstrator
To run the demonstrator, execute the ScopeTree.exe file, either from the installation CD or from the directory to which it was copied.
Modes
The system has three modes43:
1. Selection mode.
2. Conclusions mode.
3. Dictation mode.
In selection mode, the user may give voice commands that select different surveys, systems and items. In conclusions mode, the user may give voice commands that set the status of the chosen survey, system or item. In dictation mode, the user may dictate any free text. At start-up, selection mode is active, with the survey scope node selected and collapsed.
Changing modes
To switch from selection mode to conclusions mode, the command "Conclusion" must be given. In response, the tab folder labelled "Conclusions" is chosen. To switch from conclusions mode back to selection mode, the command "Select" must be given. There is no visual feedback from this command. To switch from conclusions mode to dictation mode, the command "Add memo" must be given. In response, keyboard focus is moved to the Memo edit control. To switch from dictation mode back to conclusions mode, the command "Cancel memo" or "Save memo" must be given. There is no visual feedback from this command.
Commands in Selection mode
In selection mode, the user may select the different parts of the tree representing the chosen survey scope. It is also possible to expand and collapse the chosen branch of the tree.

42 If running the installation file for the ViaVoice RT from the CD does not work (aborts with an error message), the installation file must be copied to a hard disk and run from there (an error message may still occur, but the installation is performed). To save disk space, the installation file ("rtduk.exe") should be removed from the hard disk after the RT has been successfully installed.
43 The main reason for having modes is to reduce the number of available commands in different situations, thus making the speech recognition faster and more accurate. The modes correspond well with the task execution as usually performed by surveyors.


The commands relate to the following concepts: a survey scope consists of a set of surveys; each survey concerns a set of systems that are inspected; connected to each system, there are a number of items that are controlled. In Nauticus, there may also be sub-items (or sub-systems?), but this is not implemented in the speech demonstrator.

The user may issue the following predefined commands:
• Next survey.
• Previous survey.
• This survey.
• Next system.
• Previous system.
• This system.
• Next item.
• Previous item.
• Expand.
• Collapse.
• Conclusions.
All these commands are orthogonal to the chosen tree node; e.g. the command "next survey" will select the next survey (if any) regardless of whether the survey scope (the root node), a system or an item is selected. Note that there is no command for selecting the survey scope node. In addition, the user may choose an arbitrary node in the survey tree by saying the text connected to it. The node is selected as soon as a unique phrase is uttered. If the survey tree is small, or contains varied text, the first word may well be enough to identify a node. As a visual aid in cases of ambiguous texts, the available "next words" are shown in a list box when an ambiguous word or phrase is given.
Commands in Conclusions mode
In conclusions mode, the user may give commands for choosing among the four radio button values:
1. Found in order
2. Found not in order
3. Not applicable
4. Not inspected
If radio button 2 is selected, a check box (repaired/rectified) is enabled. Such sets are available for all nodes in the survey scope tree except for the root. In Nauticus, this is slightly different: if the user sets these values for a system in Nauticus, the value is propagated to all items in that system (the same for surveys?). This mechanism is not implemented in the speech demonstrator. Each of the radio buttons and the check box (if active) may be chosen by a set of voice commands (in addition to the texts themselves):
• Found in order: "In order", "OK".
• Found not in order: "Not in order", "Failed to pass".
• Repaired/rectified (either of the words may be said): "Not so important".
• Not applicable: "N.A.", "Not relevant".
• Not inspected: "Inspect later", "Do it tomorrow".
In addition, the command "Select" must be given to change back to selection mode. This is easy to forget44!

44 There are good reasons for an automatic mode change to selection mode as soon as a legal conclusion mode command is recognised.


Functionality in Dictation mode
In dictation mode, the user may dictate a text in plain English, or give the commands "Cancel memo" or "Save memo" to go back to conclusions mode. In addition, the dictation mechanism in ViaVoice includes some generic commands, like "New line", "New paragraph", "Comma" and "Full stop". While dictating, full sentences ending with "Full stop" give the best recognition results. In all modes, mouse and keyboard may to some extent be used in combination with voice commands, but care should be taken, especially in dictation mode, where keyboard focus should not be moved outside the memo edit control. Furthermore, it is not recommended to use mouse/keyboard navigation in the survey tree while in modes other than selection mode; this may even cause the program to crash. (Nilsson, 1999)
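The mode logic described above can be summarised in a few lines of code. The sketch below is not taken from the ViaVoice-based demonstrator itself; it only illustrates, with hypothetical Python names, how the three modes, the mode-switching commands and the conclusion command synonyms relate to each other.

    # Illustrative sketch of the demonstrator's three modes and voice commands;
    # not the actual ViaVoice/ScopeTree implementation.
    MODE_SWITCHES = {
        "selection":   {"conclusions": "conclusions"},
        "conclusions": {"select": "selection", "add memo": "dictation"},
        "dictation":   {"cancel memo": "conclusions", "save memo": "conclusions"},
    }

    CONCLUSION_COMMANDS = {                  # synonyms accepted in conclusions mode
        "in order": "Found in order", "ok": "Found in order",
        "not in order": "Found not in order", "failed to pass": "Found not in order",
        "n.a.": "Not applicable", "not relevant": "Not applicable",
        "inspect later": "Not inspected", "do it tomorrow": "Not inspected",
    }

    def next_mode(mode, command):
        """Return the new mode after a recognised command (unchanged if not a switch)."""
        return MODE_SWITCHES[mode].get(command.lower(), mode)

    mode = "selection"
    mode = next_mode(mode, "Conclusions")    # -> "conclusions"
    print(CONCLUSION_COMMANDS["ok"])         # -> "Found in order"
    mode = next_mode(mode, "Add memo")       # -> "dictation"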


C LIST OF WORDS FOUND IN SHIPPING

Acceptable Access Accommodation Accumulator Acetylene Action Add checklist Add occasional survey Additional Additional report Adjacent Afloat Aft Agent Aggregate Ahead Air-pipe Alarm Align Amendment Amended Analysis Anchor And Angle Annex Annual Anodes Apparatus Applicable Applicator Applied Approve Area Arm Arrangement Assembled Assignment Astern Atmosphere Attached Audible Authorisation Automatic Automatic survey Auxiliary Ballast Band report Bars Battery Beacon Bearing Bed-plate Belt Between Biennial Bilge Black Blades Blank Blend Blending Blower Boatswain Boiler Boiler-casing Bolt Bolthole Book Booster Bottle Bottom Boundary Buoy Bow Bow-port Box Bracket Breaker Bridge Brine Bronze Browsers Buckled Build Bulbous Bulk Bulkhead Bulwark Burner Cabin Cable Calibrate Camshaft Cancel Captain Cargo Carrier Carry Casing Castle CC CC Ceiling Cellulose-nitrate Centre Centrifugal Certificate Chaffing Chain Chainlocker Channel Charger Charts Checked Chemical Chief Chief-engineer Chock Cinematograph Circulation Clamp Clamping Class Class status Classification Clean Clearance Cleating Closable Close Close Closed Close-up Closure Clutch Coated Coating Cocks Code Cofferdam Coil Combination Combustible Combustion Commenced Commencement Communication Companionway Compass Complete Completion Comply Component Compressor Computer Concept Conclusions Concrete Condensate Condenser Condition Condition of Class Connection Considered Construction Content Continuos


Control Convention Converter Cooler Cooling Corridor Corrosion Country Coupling Cover Cover letter Crack Crank Cranckpin Crankshaft Crew Crew Cropped Cylinder Damage Damages Damp Damper Dangerous Date Dat Davit Daylight Dead Dead-end Deadlight Deck Deckhouse Declaration Deepwell Deficiencies Delete Deletion Derated Det Norske Veritas Detector Detention Deterioration Deviation Device Diesel Direction Dirty Discharge Disclosed Disconnected Dismantled Displacement Distribution Division Dock Document Dome Donkey-boiler Door properties Double Down Downcomers Drain Drainage Draught Drawing Drill Drip Drive Dry Dry-dock DTP Duct Due Due date Dump Echo Effect Ejector Electric Electronic Embark Emergency Employed Empty Empty quick report Emulsifiable Enclosures Endorsement Engine Engineer Enhanced EO EPIRB Equipment Escape Escapeways ESP Evaporator Event Examined Exception Exhaust Existing Expansion Expiry Exposed Expansion Extended Extinguish Failure Fair Fairlead Fall Fan Favourites Feed Fill Filter Final recording Findings Fired Fireman Firepump Fitted Fitting Fittings Fixed Flag Flame Flammable Flap Flare Flashlight Flexible Flooding Floors Fluid Foam Following Fore Form Forward Found Foundation Frame Free Freeboard Freeing Frequency Fresh Fuel Full Funnel Fuse Galley Gangway Gases Gasket Gauge Gear General Generator Girder Gland Glass Glycol GMDSS Golten Good Gouged Governing Gravity Grease Grid Gross Guard-rail Guide Gyro Halon Harbour Hard Hatch Hatchcoaming Hatchcover Head Header


Heading 40.9a Heater Heavy High Historical Hoist Hold Housing Homogenizer Hose House Hydrocarbon Hull Hull survey report Hydraulic IMO Hydrostatic Hyperbaric Immersion Incidents Importance Including Inboard Indication Incinerator Inergen Indentation Inert-gas Induced Information Inert Inner Inflatable Installation Inlet Instrument Install Instruction Insulate Insulation Intake Interface Interior Interlock Intermediate Internal memos Internals In-water IOPP Isolation Isotope Issue Issue Issue place Item Job status Job templates Jobs Journal Keel Kit Laboratory Ladder Laying Leak Leakage Length Level Liability Lifeboats Lifebouys Lifejacket Lifeline Lift Light Lighting Limit Line Liner Lip Liquid Load Local Location Locker Lockers Log Log Longitudinal Loose Loss Low Lower Low-low Lubrication Machined Machinery Main Maintained Maintenance Maiwheel Major Make-up Management Manager Manifoil Manning Manometer Manoeuvring Manual Mapping Mark Marker Mast Master Mate Material Measurements Mechanic Medium Megger Members Memo Memo to Owner Memo to Surveyor Mess Middle Minor Mixture MO Mobile Modified Monitoring Mooring More Moveable Mover Musterlist Name Nautical NAUTICUS Navigation Necessary Non Newbuildings NIS No Notice Non-explosion Non-return Notation Occasional surveys Nozzle Nut Obstruction Official Offshore Officer Officer Oily water separator OK Oil Oilfired Ongoing jobs Open Onboard Ongoing Order confirmation Order number Operable form Order Out Outboard Original Other Overall Overboard Outlet Outside Own unit Owner Overhaul Overrule Page Paint Owner


Oxygen Panels Part Palm Pan Passenger Peak Partial Particular Penetration Pennant Pedestal Pellet Permanent Petrol Periodical Periodical survey Pilot Pinion Photograph Pickup Piston Pitch Pintles Pipe Planned Plant Place Place PMS Point Platform Plating Poor Poorly Pollution Poop Position Post survey Port documents Portable Powder Power Postpone Postponed Preheater Preparations Practice Pre survey Primary Prime Pressure documents Prevention Procedures Process Program Propeller Protect Protection Provided Pulley Pump Pumproom Purifier Quick Quick recording Quick report Radar Radio Radiotelephone Raft Rails RCH Readily Realigned Receptacles Recharging Receiver Record Reduction Reefer Refilled Refrigeration Registration Regulator Release Remote Remove checklist Removed Renew Renewal Repair Report Representative Requested Requirement Rescue Rescueboat Resisting Restricted Result Retaining Return Reverse Reversing Rigged Ro/ro Rocker Rocket Rod Roller Room Rope Rope Rotate Rotor Round Rudder Running Safety Salinometer Sandblasted Sanitary Satisfaction Scavenger Scope Screen Scrubber Scupper Scuppers Sea Seal Sealing Search Secured See Segregation Select scope Select vessel Selected Self Selfclosing Semiportable Sensor Sent jobs Separated Separator Serious Servo Set completed Set survey scope Severit Shaft Shaft Shaftline Shape Shell Shelter Shield Shielding Ship Shipyard Short Shut Shutdown Shutter Side Sidescuttles Signal Signature Signboard Sills Single SIO SIO survey job Sketch Skimmed Skylight Slide Slipway Slop Slow


Sludge Smokesignal Soft SOLAS Soot Sounder Sounding Source Spaces Spanner Spare Sparks Special Spell check Spills Spot Spray Spraying Spring Sprinkler Stability Staging Stairway Stamp Standard Standard features Starboard Start Station Station code Statutory Steam Steam-steam Steel Steering Stem Stern Sterntube Stiffeners Stock Stop Stopper Stopping Store Stowage Stringers Stripping Strong Strongpoint Structural Submitted Substantial Suction Suit Supercharger Superstructure Supply Support Surface Survey Survey checklist Survey checklist with Survey job completed Survey job recording information Survey job started Survey report Survey report 40.9a Survey report owners Surveyor signature Switch Switchboard Switchover System Systematic Systematic Tailshaft Tank Tanker Technical Teeth Telegraph Temperature Temporary Term Test Text Thermal Thermometer Thickness Thordon Thrust Tie Tiebolt Tier Tightness Tilted Timber Time Tonnage Tools Top Torch Tow Towing Track Transducer Transfer Transverse Trap Traps Trunk Trustshaft Tube Tunnel Turbo Type UHF Ullage Ultrasonic Unacceptable Unapproved Underneath Undersigned Underwater Undo Unit Upper UTM Vacuum Valid Valve Vapours Varnishes Vehicle Ventilation Ventilator Verified Vessel ID Vessel name VHF Vibrate View View details View flag info View scope View survey plan Visual Void Waiving Walk Wall Warranty Watertight Waste Wasted Water Weld Way Weather Web Where Well Wheel Wheelhouse Wing Widely Wind Window Workshop Wiring With Work Yard Yard name Yard vessel number


D GLOSSARY OF TERMS

Active Matrix Displays: A type of flat-panel display in which the screen is refreshed more frequently than in conventional passive-matrix displays. The most common type of active-matrix display is based on a technology known as TFT (thin film transistor). The two terms, active matrix and TFT, are often used interchangeably.

API (Application Program Interface): The interface (calling conventions) by which an application program accesses operating system and other services. An API is defined at source code level and provides a level of abstraction between the application and the kernel (or other privileged utilities) to ensure the portability of the code. An API can also provide an interface between a high level language and lower level utilities and services which were written without consideration for the calling conventions supported by compiled languages. In this case, the API's main task may be the translation of parameter lists from one format to another and the interpretation of call-by-value and call-by-reference arguments in one or both directions.

Ambient Noise: The prevailing sound field in a room in the absence of an applied signal from a loudspeaker, musical instrument, or other sound source.

Bandwidth: The amount of data that can be transmitted in a fixed amount of time. For digital devices, the bandwidth is usually expressed in bits or Bytes Per Second (BPS). For analogue devices, the bandwidth is expressed in cycles per second, or Hertz (Hz).

Bi-directional microphone: A microphone that is equally sensitive to sounds arriving from the front and back, and insensitive to sounds arriving from the sides.

BIOS: Pronounced "bye-ose", an acronym for basic input/output system. The BIOS is built-in software usually placed on a ROM chip that determines what a computer can do without accessing programs from a disk. On PCs, the BIOS contains all the codes required to control the keyboard, display screen, disk drives, serial communications, and a number of miscellaneous functions.

Cardioid Microphone: A unidirectional microphone with 6 dB of attenuation at the sides (±90 degrees) and a null at 180 degrees. So called due to the cardioid-like shape of its polar pattern. In a few words, it picks up more sound from the front than from anywhere else.

Central Processing Unit (CPU): The CPU is the "brains" of the computer. Sometimes referred to simply as the processor or central processor, the CPU is where most calculations take place. In terms of computing power, the CPU is the most important element of a computer system.

Continuous speech recognition: A vocal-to-digital translation system with heightened capabilities; unlike standard speech recognition systems, it can interpret words spoken in a natural cadence and within several contexts.

dB (Decibel): One dB is the smallest change in loudness that the average human ear can detect. 0 dB is the threshold of human hearing. The threshold of pain is between 120 and 130 dB. The decibel is a ratio, not an absolute number, and is used to identify the relationship between true power, voltage, and sound pressure levels. Decibels alone have no specific meaning. For example, dBV is a voltage ratio; 0 dB = 0.775 V root-mean square (RMS). dBSPL is the sound-pressure level ratio; it measures acoustic pressure. dBM is a power ratio. dBA takes into account the unequal sensitivity of the ear, and sound-pressure level is measured through a circuit that compensates for this (equal loudness).

Diaphragm: The moving element of a microphone that converts sound-wave energy into mechanical energy.

Digital Signal Processing (DSP): Refers to manipulating analogue information, such as sound or photographs, that has been converted into a digital form. DSP also implies the use of a data compression technique.

Dynamic Microphone: Any microphone whose output is a function of magnetic induction in a voice coil, ribbon, or other conductor moving within a permanent magnetic field.

Dynamic Time Warping (DTW): Dynamic time warping is a framework for algorithms that can resolve a large number of optimisation problems, such as character (OCR) and speech recognition, or picture modification software.

Ethernet: A local-area network (LAN) protocol developed by Xerox Corporation in co-operation with DEC and Intel in 1976. Ethernet uses a bus topology and supports data transfer rates of 10 MBPS. The Ethernet specification served as the basis for the IEEE 802.3 standard, which specifies the physical and lower software layers. Ethernet employs a media access control mechanism called CSMA/CD (Carrier Sense Multiple Access with Collision Detection) and employs a "bus" topology using coaxial cable operating at 10 MBPS. There is no central controller and all devices have equal status. With CSMA/CD, each device, which is sending or receiving, has the entire line capacity to itself. Ethernet allows a device to transmit at any time, first having listened to ensure that the line is not already busy. It is still possible that two devices can start transmitting simultaneously, so Ethernet devices are equipped with "Collision Detection". If a device detects a collision, it simply stops, listens again, and re-transmits when the line is free. The CSMA/CD access method is one of the most widely implemented LAN standards. A new version of Ethernet, called 100BaseX (or Fast Ethernet), supports data transfer rates of 100 MBPS, and a proposed standard, called Gigabit Ethernet, will support data rates of 1 gigabit (1,000 megabits) per second.

Flash Memory: A non-volatile memory that "remembers" data stored in it, even without power (or backup batteries). A special type of EPROM (Erasable Programmable Read Only Memory) that can be erased and reprogrammed in blocks (blocks are groups of bytes, usually in multiples like 4k, 16k, etc.) instead of one byte at a time. Some modern PCs have their BIOS stored on a flash memory chip so that it can easily be updated if necessary. Such a BIOS is sometimes called a flash BIOS.

Frequency Response: A measure of the effectiveness with which a circuit, device or system transmits the different frequencies applied to it. The way in which an electronic device (microphone, amplifier, or speaker) responds to signals having a varying frequency. This is a measurement of how well an amplifier reproduces and amplifies a specified audible range with equal amplitude or intensity, for example, 30 to 16,000 Hz.

GB: Gigabyte.

Geographic Information System (GIS): A very simple way of defining GIS is that they are electronic maps. GIS allows the user to define layers, or levels, that generally contain related types of information.
For example: a GIS could have a level that had all of the hydrology for a circumscribed part of the world, or a level that had contour information of the same part of the world, or streets, vegetation types, geology, archaeology, biology and so on. If you envision each of these “thematic” levels on mylar, and overlaid the hydrology, archaeology, and contours, you would then see the relationship between water resources, physiography, and archaeological sites. By entering basemap data (contours, political features, Township, Range, Section lines, etc.), and then importing data from GPS (Global Positioning System) into the GIS, resource management researchers have a powerful tool for analysis.


GlidePoint Touchpads: A touchpad designed to replace the computer mouse and related pointing devices. Without requiring contact pressure, the finger manipulates the screen cursor by simply gliding across the surface. Tapping the surface serves the same purpose as clicking the button on a mouse. A touchpad provides all the functions of a mouse, and more. A mouse reports relative motion, i.e., its change in position, while touchpads can also report the absolute position of a finger or a handheld pen-like stylus. This supports the signing of electronic documents, simple sketching, and a host of exciting future applications. Additionally, a "Z" factor relating to finger contact pressure is available. These characteristics give the glidepoint touchpad an intelligence potential and capability beyond that of the mouse.

Global Positioning System (GPS): A satellite-based device that records x, y, z co-ordinates and other data using global positioning. GPS devices can be taken into the field to identify an individual's global position while driving, flying, or hiking. Ground locations are calculated by signals from satellites orbiting the Earth. GPS devices play a significant role in geographic data collection.

Head Mounted Display (HMD): State-of-the-art system worn on the head consisting of optics that present visual information to the user. Miniature display technologies generate images, either monocular or binocular. High-resolution VGA displays are the standard.

Hidden Markov Modelling (HMM): Based upon a statistical state sequence known as a Markov chain, consisting of a set of states with transitions between the states. Each state corresponds to a symbol, and to each transition is associated a probability. Symbols are produced as the output of the Markov model by probabilistic transitioning from one state to another.

Infrared Transmission: Wireless computing is one step closer to broad industry acceptance thanks to a new infrared data-transfer standard. The Infrared Data Association (IRDA) has expanded the existing protocol to accommodate transfer speeds of 1.152 and 4 megabits per second (MBPS), a significant improvement over the 115-Kbps rate of IRDA 1.0. At these faster speeds, infrared (IR) data transfer is a viable option for PC-to-PC transfer, printing, and, most important, network access. Mobile computers with an IR port can access a network via a beam of light. Windows 95 ships with drivers that preserve data from loss or corruption if the connection is interrupted. All infrared transfers share a few key benefits, such as bi-directional data exchange. And unlike radio-frequency devices, which broadcast data and are subject to interference, IR devices are secure in that they require a direct line of sight. The drawbacks: IR transfers are effective at distances of up to only one meter, and the one-to-one connection means that only one notebook can be connected to a network access device at a given time. Multiple connections are possible sequentially, meaning that any number of notebooks can access the same IRDA device in succession.

Local-Area Network (LAN): A computer network that spans a relatively small area. Most LANs are confined to a single building or group of buildings. However, one LAN can be connected to other LANs over any distance via telephone lines and radio waves. A system of LANs connected in this way is called a wide-area network (WAN). Most LANs connect workstations and personal computers.
Each node (individual computer) in a LAN has its own CPU with which it executes programs, but it is also able to access data and devices anywhere on the LAN. This means that many users can share expensive devices, such as laser printers, as well as data. Users can also use the LAN to communicate with each other, by sending e-mail or engaging in chat sessions. There are many different types of LANs, token-ring networks, Ethernet, and ARCnets being the most common for PCs. Most Apple Macintosh networks are based on Apple's AppleTalk network system, which is built into Macintosh computers. The following characteristics differentiate one LAN from another: Topology: The geometric arrangement of devices on the network. For example, devices can be arranged in a ring or in a straight line.


Protocols: The rules and encoding specifications for sending data. The protocols also determine whether the network uses peer-to-peer or client/server architecture.

Media: Twisted-pair wire, coaxial cables, or fibre optic cables can connect devices. Some networks do without connecting media altogether, communicating instead via radio waves. LANs are capable of transmitting data at very fast rates, much faster than data can be transmitted over a telephone line; but the distances are limited, and there is also a limit on the number of computers that can be attached to a single LAN.

Liquid Crystal Display (LCD): A type of display used in digital watches and many portable computers. LCD displays utilise two sheets of polarising material with a liquid crystal solution between them. An electric current passed through the liquid causes the crystals to align so that light cannot pass through them. Each crystal, therefore, is like a shutter, either allowing light to pass through or blocking the light.

LINUX: A freely distributable implementation of UNIX that runs on a number of hardware platforms, including Intel and Motorola microprocessors. It was developed mainly by Linus Torvalds. Because it is free, and because it runs on a number of platforms, Linux has become extremely popular.

MB: Megabyte.

Microphone: A transducer for converting acoustic energy to electrical energy.

Microwave Signals: An electromagnetic wave used to transmit data with a wavelength ranging from approximately one millimetre to one meter. This region is between infrared and short wave radio wavelengths.

Neural network: A system that uses the human inference concept of an expert system but widens the scope to include many subjects. Several processors, each with its own "speciality", form a problem-solving network.

Omnidirectional Microphone: A microphone that is equally sensitive in all directions.

Personal Computer Memory Card International Association (PCMCIA): PCMCIA is an organisation consisting of some 500 companies that has developed a standard for small, credit card-sized devices, called PC Cards. Originally designed for adding memory to portable computers, the PCMCIA standard has been expanded several times and is now suitable for many types of devices. There are three types of PCMCIA cards. All three have the same rectangular size (85.6 by 54 millimetres), but different widths. Type I cards can be up to 3.3 mm thick and are used primarily for adding additional ROM or RAM to a computer. Type II cards can be up to 5.5 mm thick. These cards are often used for modem and fax modem cards. Type III cards can be up to 10.5 mm thick, which is sufficiently large for serving as portable disk drives.

Peripheral Devices: Any external device attached to a computer. Examples of peripherals include printers, disk drives, display monitors, keyboards, and mice.

QWERTY Keyboard: Pronounced kwer-tee, refers to the arrangement of keys on a standard English computer keyboard or typewriter. The name derives from the first six characters on the top alphabetic line of the keyboard.

RAM: An acronym for random access memory, a type of computer memory that can be accessed randomly; that is, any byte of memory can be accessed without touching the preceding bytes. RAM is the most common type of memory found in computers and other devices, such as printers.

Resolution: Refers to the sharpness and clarity of an image. The term is most often used to describe monitors, printers, and bit-mapped graphic images. For graphics monitors, the resolution signifies the number of dots (pixels) on the entire screen. For example, a 640-by-480-pixel screen is capable of displaying 640 distinct dots on each of 480 lines, or about 300,000 pixels. This translates into different dpi (dots per inch) measurements depending on the size of the screen. For example, a 15-inch VGA monitor (640x480) displays about 50 dots per inch. Printers, monitors, scanners, and other I/O devices are often classified as high resolution, medium resolution, or low resolution. The actual resolution ranges for each of these grades is constantly shifting as the technology improves.

Radio Frequency (RF): A transmission frequency used by radio stations, in the range in which radio waves may be transmitted, 10 kHz to 300,000 MHz.

ROM: An acronym for read-only memory. This is computer memory on which data has been pre-recorded. Once data has been written onto a ROM chip, it cannot be removed and can only be read.

RS-232: IEEE standard that defines three types of connections: electrical, functional, and mechanical. It is the most commonly used interface for the data-transmission range of 0-20 Kbps/50 ft (15.2 m). It employs unbalanced signalling and is usually used with 25-pin D-shaped connectors (DB25) to interconnect Data Terminal Equipment (DTE) such as computers and controllers, and Data Communications Equipment (DCE) such as modems and converters. Serial data exits through an RS-232 port via the Transmit Data (TD) lead and arrives at the destination device's RS-232 port through its Receive Data (RD) lead. Serial data transmission is the most common method of sending data from one DTE to another. Data is sent out in a stream, one bit at a time over one channel.

SCSI: Abbreviation of small computer system interface. Pronounced "scuzzy", SCSI is a parallel interface standard used by Apple Macintosh computers, some PCs, and many UNIX systems for attaching peripheral devices such as disk drives and printers to computers. SCSI interfaces provide for faster data transmission rates (up to 40 megabytes per second) than standard serial and parallel ports. In addition, many devices can be attached to a single SCSI port, so that a SCSI can be an I/O bus rather than simply a parallel interface. Although SCSI is an ANSI standard, there are many variations of it.

Signal-to-Noise Ratio: The ratio, usually expressed in decibels, of the average signal (recorded or processed) to the background noise (caused by the electronic circuits).

Speech recognition: A computer's ability, through software, to accept spoken words as dictation or to follow voice commands. Vocabulary limitations and recognition abilities can vary greatly from system to system. Also called voice recognition or speech understanding.

Speaker dependent: This technology requires users to participate in extensive training exercises that can last several hours. Once the user is done "drilling the machine", the computer performs several calculations on the data it has received from these exercises. After these calculations, the computer builds a voice profile that attempts to match the user's way of speaking.

Speaker-Independent: Speech recognition systems that will respond to any user voice regardless of dialect or tone are said to be speaker-independent. Older generations of speech recognition systems were generally speaker-dependent. These systems had to be trained to each individual's voice characteristics for useful speech recognition.

Streaming Video/Audio: A new class of intelligent, integrated media servers that deliver interactive, real-time, high quality MPEG-1, MPEG-2, and H.263/G.723 video and audio streams to clients via networks. This system will ultimately replace the method of "downloading video clips", which uses large amounts of local storage (not required by streaming video solutions).
SVGA: Short for Super Video Graphics Array (VGA), a set of graphics standards designed to offer greater resolution than VGA. There are several varieties of SVGA, each providing a different resolution, 800 by 600 pixels, 1024 by 768 pixels, 1280 by 1024 pixels, 1600 by 1200 pixels. All SVGA standards support a palette of 16 million colours, but the number of colours that can be displayed simultaneously is limited by the amount of video memory installed in a system. One SVGA system might display only 16 simultaneous colours while another displays the entire palette of 16 million colours. A consortium of monitor and graphics manufacturers develops the SVGA standards.


UNIX: A popular multi-user, multitasking operating system developed at Bell Labs in the early 1970s. Created by just a handful of programmers, UNIX was designed to be a small, flexible system used exclusively by programmers. Although it has matured considerably over the years, UNIX still betrays its origins by its cryptic command names and its general lack of user-friendliness.

VGA: Abbreviation of video graphics array, a graphics display system for PCs developed by IBM. VGA has become one of the de facto standards for PCs. In text mode, VGA systems provide a resolution of 720 by 400 pixels. In graphics mode, the resolution is either 640 by 480 (with 16 colours) or 320 by 200 (with 256 colours). The total palette of colours is 262,144.

VHS: Short for video home system, a trademark for an electronic system for recording video and audio information on videocassettes.

White Noise: A full audio spectrum signal with the same energy level at all frequencies.

Windows NT: An advanced version of the Windows operating system. Windows NT is a 32-bit operating system that supports pre-emptive multitasking. There are actually two versions of Windows NT: Windows NT Server, designed to act as a server in networks, and Windows NT Workstation for stand-alone or client workstations.

Wireless Local Area Network: A LAN utilising radio transmission of voice, data, and images to interconnect mobile users. These systems rely on transportable high-speed reliable radio frequency communications in an area typically a few kilometres in radius.

- o0o -
