Windows Vista Media Center Controlled by Speech

Total Pages: 16

File Type: PDF, Size: 1020 KB

UNIVERSIDADE DE LISBOA
Faculdade de Ciências
Departamento de Informática

WINDOWS VISTA MEDIA CENTER CONTROLLED BY SPEECH

Mário Miguel Lucas Pires Vaz Henriques
Mestrado em Engenharia Informática, 2007

Project supervised by Prof. Dr. Carlos Teixeira and co-supervised by Prof. Dr. Miguel Sales Dias.

Acknowledgments

I would like to thank:
Prof. Doutor Miguel Sales Dias, for his guidance and supervision and for welcoming me into the MLDC team;
Prof. Doutor Carlos Teixeira, for his guidance and supervision during my project;
Eng. Pedro Silva, for his amazing friendship, constant support and help;
Eng. António Calado, for his constant help and support;
Sandra da Costa Neto, for all the support and help in running the usability tests;
all other colleagues from MLDC;
and all Microsoft colleagues who participated in the usability tests.

Contents

Index of Figures ... iv
Acronyms ... v
1. Introduction ... 1
  1.1 Introduction ... 1
  1.2 Automatic Speech Recognition ... 2
  1.3 Microsoft Windows Vista Media Center ... 3
  1.4 Host institution ... 4
  1.5 Thesis problem identification ... 5
  1.6 Solution approaches ... 5
    1.6.1 Automatic language detection ... 5
    1.6.2 Acoustic Echo Cancellation ... 5
  1.7 Main thesis goals ... 6
  1.8 Chapter Organization ... 7
  1.9 Conclusion ... 8
2. User and System Requirements ... 9
  2.1 Introduction ... 9
  2.2 Overall user requirements ... 9
  2.3 The Usage Scenario ... 10
  2.4 Detailed system requirements ... 10
    2.4.1 Hardware Requirements ... 10
    2.4.2 Application requirements ... 12
  2.5 Conclusion ... 14
3. System Architecture and Development ... 15
  3.1 Technologies used ... 15
    3.1.1 Vista Media Center SDK ... 15
    3.1.2 Vista Media Center Add-In ... 16
    3.1.3 Media Player SDK and Media Database ... 17
  3.2 The Hardware Description ... 17
  3.3 The Media Center Add-In architecture ... 18
    3.3.1 Speech Recognition and Speech Synthesis ... 19
    3.3.2 Microphone Array ... 20
    3.3.3 Dynamic grammar ... 21
  3.4 Developed work ... 23
    3.4.1 The multiple recognition engines problem ... 23
    3.4.2 EVA – Easy Voice Automation ... 23
  3.5 Conclusion ... 24
4. Usability Evaluation ... 25
  4.1 Introduction ... 25
  4.2 The interfaces: Remote Control vs. Speech ... 25
    4.2.1 Usability evaluation methodology ... 25
    4.2.2 Usability experiment ... 27
    4.2.3 Evaluation results and analysis ... 30
    4.2.4 Problems in the system ... 31
    4.2.5 Subject comments ... 31
    4.2.6 Time analysis ... 33
    4.2.7 Questionnaire and observed results ... 33
  4.3 Hoolie vs. Speech Macros (Media Center Add-In) ... 36
    4.3.1 Other usability evaluation ... 36
    4.3.2 Usability evaluation ... 36
    4.3.3 User comments ... 37
    4.3.4 Other issues found ... 37
  4.4 Conclusion ... 40
5. Conclusion & Future Work ... 41
6. References ... 42
7. Annexes ... 43

Index of Figures

Figure 1 – Media Center logo
Figure 2 – MLDC logo
Figure 3 – Media Center music menu
Figure 4 – Media Center simple User Interface
Figure 5 – MC remote control
Figure 6 – The usage scenario
Figure 7 – Media Center box
Figure 8 – High-definition TV
Figure 9 – Linear four-microphone array
Figure 10 – Wireless headset microphone
Figure 11 – Standard desktop microphone
Figure 12 – The Media Center Add-In architecture
Figure 13 – Dynamic grammar process diagram
Figure 14 – Speech interface
Figure 15 – System help
Figure 16 – Subjects performing their usability tests
Figure 17 – Media Center main menu
Figure 18 – Command cannot be performed
Figure 19 – Options available in the Graphical User Interface
Figure 20 – Options unavailable in the Graphical User Interface

Acronyms

AEC – Acoustic Echo Cancellation
API – Application Programming Interface
ASR – Automatic Speech Recognition
EMEA – Europe, Middle East and Africa
ENG – English
EVA – Easy Voice Automation
HCI – Human-Computer Interface
HMM – Hidden Markov Model
LAN – Local Area Network
MC – Microsoft Media Center
MLDC – Microsoft Language Development Center
PTG – European Portuguese
SAPI – Microsoft Speech API
SDK – Software Development Kit
SR – Speech Recognition
TTS – Text-to-Speech
WMP – Windows Media Player

1. Introduction

1.1 Introduction

Automatic Speech Recognition (ASR) and speech synthesis technologies are already present in the daily life of an increasing number of people. Due to the potential of speech as a natural Human-Computer Interface (HCI) modality, a large number of applications are adopting
Recommended publications
  • Through the Looking Glass: Webcam Interception and Protection in Kernel Mode
    VIRUS BULLETIN (www.virusbulletin.com) – Covering the global threat landscape

    THROUGH THE LOOKING GLASS: WEBCAM INTERCEPTION AND PROTECTION IN KERNEL MODE
    Ronen Slavin & Michael Maltsev, Reason Software, USA

    INTRODUCTION
    When we talk about digital privacy, the computer's webcam is one of the most relevant components. We all have a tiny fear that someone might be looking through our computer's camera, spying on us and watching our every move [1]. And while some of us think this scenario is restricted to the realm of movies, the reality is that malware authors and threat actors don't shy away from incorporating such capabilities into their malware arsenals [2]. Camera manufacturers protect their customers by incorporating into their devices an indicator LED that illuminates when the camera is in use.

    ... and WIA (Windows Image Acquisition), which provides a still image acquisition API.

    ATTACK VECTORS
    Let's pretend for a moment that we're the bad guys. We have gained control of a victim's computer and we can run any code on it. We would like to use his camera to get a photo or a video to use for our nefarious purposes. What are our options? The simplest option is just to use one of the user-mode APIs mentioned previously. By default, Windows allows every app to access the computer's camera, with the exception of Store apps on Windows 10. The downside for the attackers is that camera access will turn on the indicator LED, giving the victim an indication that somebody is watching him. A sneakier method is to spy on the victim when he turns on the camera himself. Patrick Wardle described a technique like this for Mac [8], but there's no reason the principle can't be applied to Windows, albeit with a slightly different
  • Windows 7 Operating Guide
    Welcome to Windows 7

    You told us what you wanted. We listened. This Windows® 7 Product Guide highlights the new and improved features that will help deliver the one thing you said you wanted the most: your PC, simplified.

    Contents
    INTRODUCTION TO WINDOWS 7 ... 6
    DESIGNING WINDOWS 7 ... 8
      Market Trends that Inspired Windows 7 ... 9
    WINDOWS 7 EDITIONS ... 10
      Windows 7 Starter ... 11
      Windows 7 Home Basic ... 11
      Windows 7 Home Premium ... 12
      Windows 7 Professional ... 12
      Windows 7 Enterprise / Windows 7 Ultimate ... 13
      Windows Anytime Upgrade ... 14
      Microsoft Desktop Optimization Pack ... 14
      Windows 7 Editions Comparison ... 15
    GETTING STARTED WITH WINDOWS 7 ... 16
      Upgrading a PC to Windows 7 ... 16
    WHAT'S NEW IN WINDOWS 7 ... 20
      Top Features for You ... 20
      Top Features for IT Professionals ... 22
      Application and Device Compatibility ... 23
    WINDOWS 7 FOR YOU ... 24
    WINDOWS 7 FOR YOU: SIMPLIFIES EVERYDAY TASKS ... 28
      Simple to Navigate ... 28
      Easier to Find Things ... 35
      Easy to Browse the Web ... 38
      Easy to Connect PCs and Manage Devices ... 41
      Easy to Communicate and Share ... 47
    WINDOWS 7 FOR YOU: WORKS THE WAY YOU WANT ... 50
      Speed, Reliability, and Responsiveness ... 50
      More Secure ... 55
      Compatible with You ... 62
      Better Troubleshooting and Problem Solving ... 66
    WINDOWS 7 FOR YOU: MAKES NEW THINGS POSSIBLE ... 70
      Media the Way You Want It ... 70
      Work Anywhere ... 81
      New Ways to Engage ... 84
    WINDOWS 7 FOR IT PROFESSIONALS ... 88
    WINDOWS 7 FOR IT PROFESSIONALS: MAKE PEOPLE PRODUCTIVE ANYWHERE ... 92
      Remove Barriers to Information ... 92
      Access
  • General Subtitling FAQs
    General Subtitling FAQs

    What's the difference between open and closed captions?
    Open captions are sometimes referred to as 'burnt-in' or 'in-vision' subtitles. They are generally encoded as a permanent part of the video image. Closed captions is a generic term for subtitles that are viewer-selectable, and is generally used for subtitles that are transmitted as a discrete data stream in the VBI and then decoded and displayed as text by the TV receiver, e.g. the Line 21 Closed Captioning system in the USA, and Teletext subtitles, using line 335, in Europe.

    What's the difference between "live" and "offline" subtitling?
    Live subtitling is the real-time captioning of live programmes, typically news and sports. This is achieved by using either speech recognition systems or stenographic (steno) keyboards. Offline subtitling is created by viewing previously recorded material. Typically this produces better results, because the subtitle author can ensure they produce:
    - an accurate translation/transcription with no spelling mistakes;
    - subtitles timed to coincide precisely with the dialogue;
    - subtitles positioned to avoid obscuring other important on-screen features.

    Which speech recognition packages does Starfish support?
    Starfish supports speech recognition packages from IBM and Dragon, and from other suppliers that conform to the Microsoft Speech API. The choice of software is usually dictated by the availability of a specific language.

    What's the difference between subtitling and captioning?
    The use of these terms varies in different parts of the world. Subtitling often refers to the technique of using open captions for language translation of foreign programmes.
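The offline-subtitling advantages listed in the FAQ (precise timing, no overlap with the dialogue) lend themselves to a simple automated check. Below is a minimal sketch, assuming subtitle cues are represented as (start, end, text) tuples in seconds; this cue format and the checks are illustrative, not any vendor's actual tooling:

```python
def check_cues(cues):
    """Validate that subtitle cues are well-formed and non-overlapping.

    Returns a list of human-readable problems (empty if the cues are clean).
    """
    problems = []
    for i, (start, end, text) in enumerate(cues):
        if end <= start:
            problems.append(f"cue {i}: non-positive duration")
        # Each cue must start at or after the previous cue's end time.
        if i and start < cues[i - 1][1]:
            problems.append(f"cue {i}: overlaps previous cue")
    return problems

if __name__ == "__main__":
    cues = [(0.0, 2.0, "Hello."), (1.5, 3.0, "World.")]
    print(check_cues(cues))  # the second cue starts before the first ends
```

A real offline workflow would additionally compare cue boundaries against the dialogue's timecodes; that alignment step is out of scope for this sketch.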
  • Speech Recognition Technology for Disabilities Education
    J. EDUCATIONAL TECHNOLOGY SYSTEMS, Vol. 33(2) 173-184, 2004-2005

    SPEECH RECOGNITION TECHNOLOGY FOR DISABILITIES EDUCATION
    K. Wendy Tang, Gilbert Eng, Ridha Kamoua, Wei Chern Chu, Victor Sutan, Guofeng Hou, Omer Farooq
    Stony Brook University, New York

    ABSTRACT
    Speech recognition is an alternative to traditional methods of interacting with a computer, such as textual input through a keyboard. An effective system can replace, or reduce the reliance on, standard keyboard and mouse input. This can especially assist dyslexic students who have problems with character or word use and manipulation in a textual form, and students with physical disabilities that affect their data entry or their ability to read, and therefore to check, what they have entered. In this article, we summarize the current state of available speech recognition technologies and describe a student project that integrates speech recognition technologies with Personal Digital Assistants to provide a cost-effective and portable health monitoring system for people with disabilities. We are hopeful that this project may inspire more student-led projects for disabilities education.

    © 2004, Baywood Publishing Co., Inc.

    1. INTRODUCTION
    Speech recognition allows "hands-free" control of various electronic devices. It is particularly advantageous to physically disabled persons. Speech recognition can be used in computers to control the system, launch applications, and create "print-ready" dictation, and thus provide easy access to persons with hearing or vision impairments. For example, a hearing-impaired person can use a microphone to capture another's speech and then use speech recognition technologies to convert the speech to text.
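The "hands-free control" the article describes ultimately maps recognized utterances to actions such as launching applications. A minimal sketch of that mapping step, assuming the recognizer's output is already plain text; the command set and action results here are invented for illustration:

```python
def dispatch(recognized_text, commands):
    """Map a recognized utterance to an action; return None if nothing matches."""
    # Normalize casing and surrounding whitespace before lookup, since
    # recognizer output may not match the command table exactly.
    action = commands.get(recognized_text.strip().lower())
    return action() if action else None

# Hypothetical command set for illustration.
commands = {
    "launch browser": lambda: "browser started",
    "start dictation": lambda: "dictation mode on",
}

if __name__ == "__main__":
    print(dispatch("Launch Browser", commands))  # prints "browser started"
```

Real systems constrain the recognizer with a grammar so that only these phrases can be recognized in the first place, which greatly improves accuracy over open dictation.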
  • Microsoft Patches Were Evaluated up to and Including CVE-2020-1587
    Honeywell Commercial Security
    2700 Blankenbaker Pkwy, Suite 150, Louisville, KY 40299
    Phone: 1-502-297-5700 / 1-800-323-4576, Fax: 1-502-666-7021
    https://www.security.honeywell.com

    The purpose of this document is to identify the patches that have been delivered by Microsoft® which have been tested against Pro-Watch. All the patches listed below have been tested against the current shipping version of Pro-Watch with no adverse effects being observed. Microsoft patches were evaluated up to and including CVE-2020-1587. Patches not listed below are not applicable to a Pro-Watch system.

    2020 – Microsoft® Patches Tested with Pro-Watch
    CVE-2020-1587 – Windows Ancillary Function Driver for WinSock Elevation of Privilege Vulnerability
    CVE-2020-1584 – Windows dnsrslvr.dll Elevation of Privilege Vulnerability
    CVE-2020-1579 – Windows Function Discovery SSDP Provider Elevation of Privilege Vulnerability
    CVE-2020-1578 – Windows Kernel Information Disclosure Vulnerability
    CVE-2020-1577 – DirectWrite Information Disclosure Vulnerability
    CVE-2020-1570 – Scripting Engine Memory Corruption Vulnerability
    CVE-2020-1569 – Microsoft Edge Memory Corruption Vulnerability
    CVE-2020-1568 – Microsoft Edge PDF Remote Code Execution Vulnerability
    CVE-2020-1567 – MSHTML Engine Remote Code Execution Vulnerability
    CVE-2020-1566 – Windows Kernel Elevation of Privilege Vulnerability
    CVE-2020-1565 – Windows Elevation of Privilege Vulnerability
    CVE-2020-1564 – Jet Database Engine Remote Code Execution Vulnerability
    CVE-2020-1562 – Microsoft Graphics Components Remote Code Execution Vulnerability
  • I Had the Same Problem, Again, Recently and the K-Lite Codec Pack Didn't Work for Me Either
    I had the same problem, again, recently, and the K-Lite codec pack didn't work for me either. I am running Windows 7, and got my DVDs working with Neuroguide by downloading the codecs from http://www.windows7codecs.com/ I went there and found that they also offer a Windows 8 package at http://www.windows8codecs.com/ You may want to try that.
    -- Jim Friess

    Sorry to be late in posting about DVDFab. There is a free version which I find excellent. I copy all DVDs (often double-density DVDs are needed), and never have difficulty. The copy only includes the movie, so there is no ploughing through warnings and adverts. The copy works equally well with the Neuroguide and Brainmaster DVD players. Google "DVDFab free" to find it.
    -- Atholl

    Today at 12:40 PM
    Hi, here are a set of instructions that completely fixed my problem. This is compiled from several previous threads, mostly from Nick Dogris. I have included links, and several folks have had their problems resolved using the steps below. Hope it helps:
    1. Upgrade from Windows 7 to Windows 8.1 … This link will give you all the details: http://windows.microsoft.com/en-us/windows-8/upgrade-assistant-download-online-faq ($119.99) and Windows 8.1 Pro ($199.99). This link shows you a comparison of the features in each: http://windows.microsoft.com/en-us/windows/compare. Note: 8.1 Pro does not come with Windows Media Center; you still have to buy the "8.1 Pro Pack" detailed in #2 below.
  • Automated Regression Testing Approach to Expansion and Refinement of Speech Recognition Grammars
    University of Central Florida – STARS
    Electronic Theses and Dissertations, 2004-2019

    Automated Regression Testing Approach To Expansion And Refinement Of Speech Recognition Grammars (2008)
    Raul Dookhoo, University of Central Florida
    Part of the Computer Sciences Commons and the Engineering Commons. Find similar works at https://stars.library.ucf.edu/etd (University of Central Florida Libraries, http://library.ucf.edu). This Masters Thesis (Open Access) is brought to you for free and open access by STARS and has been accepted for inclusion in Electronic Theses and Dissertations, 2004-2019 by an authorized administrator of STARS. For more information, please contact [email protected].

    STARS Citation: Dookhoo, Raul, "Automated Regression Testing Approach To Expansion And Refinement Of Speech Recognition Grammars" (2008). Electronic Theses and Dissertations, 2004-2019. 3503. https://stars.library.ucf.edu/etd/3503

    AUTOMATED REGRESSION TESTING APPROACH TO EXPANSION AND REFINEMENT OF SPEECH RECOGNITION GRAMMARS
    by Raul Avinash Dookhoo, B.S. University of Guyana, 2004
    A thesis submitted in partial fulfillment of the requirements for the degree of Master of Science in the School of Electrical Engineering and Computer Science in the College of Engineering and Computer Science at the University of Central Florida, Orlando, Florida. Fall Term 2008.
    © 2008 Raul Avinash Dookhoo

    ABSTRACT
    This thesis describes an approach to automated regression testing for speech recognition grammars. A prototype Audio Regression Tester called ART has been developed using Microsoft's Speech API and C#. ART allows a user to perform any of three tasks: automatically generate a new XML-based grammar file from standardized SQL database entries, record and cross-reference audio files for use by an underlying speech recognition engine, and perform regression tests with the aid of an oracle grammar.
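The first of ART's three tasks, generating an XML grammar from SQL database entries, can be illustrated with a small sketch. This is not ART itself (which is built on Microsoft's Speech API in C#); it is a minimal Python analogue, assuming a hypothetical `commands` table with a `phrase` column, and emitting SRGS-style elements rather than ART's actual schema, which the excerpt does not show:

```python
import sqlite3
import xml.etree.ElementTree as ET

def build_grammar(db_path=":memory:"):
    """Build an SRGS-style XML grammar from rows of a (hypothetical)
    `commands` table, mirroring the generate-grammar-from-SQL idea."""
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE IF NOT EXISTS commands (phrase TEXT)")
    con.executemany("INSERT INTO commands VALUES (?)",
                    [("play music",), ("pause",), ("next track",)])
    # One root rule containing a <one-of> alternative per stored phrase.
    grammar = ET.Element("grammar", {"version": "1.0", "root": "main"})
    rule = ET.SubElement(grammar, "rule", {"id": "main"})
    one_of = ET.SubElement(rule, "one-of")
    for (phrase,) in con.execute("SELECT phrase FROM commands ORDER BY phrase"):
        ET.SubElement(one_of, "item").text = phrase
    con.close()
    return ET.tostring(grammar, encoding="unicode")

if __name__ == "__main__":
    print(build_grammar())
```

Keeping the phrases in a database and regenerating the grammar file on change is what makes regression testing against an oracle grammar practical: the test inputs and the grammar always come from the same source of truth.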
  • Speech@Home: an Exploratory Study
    Speech@Home: An Exploratory Study
    A.J. Brush, Paul Johns, Kori Inkpen, Brian Meyers
    Microsoft Research, 1 Microsoft Way, Redmond, WA 98052 USA
    ([email protected], [email protected], [email protected], [email protected])

    Abstract
    To understand how people might use a speech dialog system in the public areas of their homes, we conducted an exploratory field study in six households. For two weeks each household used a system that logged motion and usage data, recorded speech diary entries and used Experience Sampling Methodology (ESM) to prompt participants for additional examples of speech commands. The results demonstrated our participants' interest in speech interaction at home, in particular for web browsing, calendaring and email tasks, although there are still many technical challenges that need to be overcome. More generally, our study suggests the value of using speech to enable a wide range of interactions.

    Keywords: Speech, home technology, diary study, domestic
    ACM Classification Keywords: H.5.2 User Interfaces: Voice I/O.
    General Terms: Human Factors, Experimentation

    Copyright is held by the author/owner(s). CHI 2011, May 7–12, 2011, Vancouver, BC, Canada. ACM 978-1-4503-0268-5/11/05.

    Introduction
    The potential for using speech in home environments has long been recognized. Smart home research has suggested using speech dialog systems for controlling home infrastructure (e.g. [5, 8, 10, 12]), and a variety of novel home applications use speech interfaces, such as cooking support systems [2] and the Nursebot personal robotic assistant for the elderly [14]. We conducted a two-week exploratory speech diary study in six households to understand whether or not speech interaction was desirable, and, if so, what types
  • Personalized Speech Translation Using Google Speech API and Microsoft Translation API
    International Research Journal of Engineering and Technology (IRJET), Volume 07, Issue 05, May 2020 (www.irjet.net; e-ISSN: 2395-0056, p-ISSN: 2395-0072)

    Personalized Speech Translation using Google Speech API and Microsoft Translation API
    Sagar Nimbalkar, Tekendra Baghele, Shaifullah Quraishi, Sayali Mahalle, Monali Junghare
    Students, Dept. of Computer Science and Engineering, Guru Nanak Institute of Technology, Maharashtra, India

    Abstract - Speech translation technology enables speakers of different languages to communicate. It is therefore of enormous value to mankind in terms of science, cross-cultural exchange and worldwide trade. Hence we propose a system that can bridge this language barrier. In this paper we propose a personalized speech translation system using the Google Speech API and the Microsoft Translation API: personalized in the sense that the user has the choice to select the languages which the system will use as input and output.

    ... inherent in globalization [6]. Speech-to-speech translation systems are often used for applications in a specific situation, such as supporting conversations in non-native languages. The demand for trans-lingual conversations, triggered by IT technologies, has boosted research activities on speech-to-speech technology. Most of these systems consist of three modules, namely speech recognition, machine translation and text-to-speech synthesis. Speech recognition is an interdisciplinary sub-field of computational linguistics that
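The three-module pipeline the paper describes (speech recognition, then machine translation, then text-to-speech) can be sketched as three composed stages. The functions below are stubs: a real system would call the Google Speech API and the Microsoft Translation API at the marked points, and the tiny phrase table and audio representation are invented for illustration:

```python
def recognize(audio):
    # Stub: a real system would send `audio` to the Google Speech API here.
    return audio["transcript"]

def translate(text, src, dst):
    # Stub: a real system would call the Microsoft Translation API here.
    phrase_table = {("en", "fr"): {"hello": "bonjour"}}  # invented example data
    table = phrase_table.get((src, dst), {})
    return " ".join(table.get(word, word) for word in text.split())

def synthesize(text):
    # Stub: a real system would hand `text` to a TTS engine and get audio back.
    return f"<audio:{text}>"

def speech_to_speech(audio, src="en", dst="fr"):
    """Chain the three modules: recognition -> translation -> synthesis."""
    return synthesize(translate(recognize(audio), src, dst))

if __name__ == "__main__":
    print(speech_to_speech({"transcript": "hello"}))
```

The "personalized" aspect of the paper maps onto the `src`/`dst` parameters: the user's language choices simply parameterize the middle stage of the chain.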
  • Release 90 Notes
    ForceWare Graphics Drivers
    Release 90 Notes, Version 91.45
    For Windows XP / 2000 and Windows XP Media Center Edition
    NVIDIA Corporation, August 2006

    Published by NVIDIA Corporation, 2701 San Tomas Expressway, Santa Clara, CA 95050

    Notice
    ALL NVIDIA DESIGN SPECIFICATIONS, REFERENCE BOARDS, FILES, DRAWINGS, DIAGNOSTICS, LISTS, AND OTHER DOCUMENTS (TOGETHER AND SEPARATELY, "MATERIALS") ARE BEING PROVIDED "AS IS." NVIDIA MAKES NO WARRANTIES, EXPRESSED, IMPLIED, STATUTORY, OR OTHERWISE WITH RESPECT TO THE MATERIALS, AND EXPRESSLY DISCLAIMS ALL IMPLIED WARRANTIES OF NONINFRINGEMENT, MERCHANTABILITY, AND FITNESS FOR A PARTICULAR PURPOSE.

    Information furnished is believed to be accurate and reliable. However, NVIDIA Corporation assumes no responsibility for the consequences of use of such information or for any infringement of patents or other rights of third parties that may result from its use. No license is granted by implication or otherwise under any patent or patent rights of NVIDIA Corporation. Specifications mentioned in this publication are subject to change without notice. This publication supersedes and replaces all information previously supplied. NVIDIA Corporation products are not authorized for use as critical components in life support devices or systems without express written approval of NVIDIA Corporation.

    Trademarks
    NVIDIA, the NVIDIA logo, 3DFX, 3DFX INTERACTIVE, the 3dfx Logo, STB, STB Systems and Design, the STB Logo, the StarBox Logo, NVIDIA nForce, GeForce, NVIDIA Quadro, NVDVD, NVIDIA Personal Cinema, NVIDIA Soundstorm, Vanta,
  • License Clerks
    TORQ Analysis of Paralegals and Legal Assistants to License Clerks

    ANALYSIS INPUT
    Transfer: from Paralegals and Legal Assistants (23-2011.00) to License Clerks (43-4031.03)
    Labor Market Area: Maine Statewide
    O*NET Filters: Abilities (importance level 50, weight 1); Skills (importance level 69, weight 1); Knowledge (importance level 69, weight 1)

    TORQ RESULTS
    Grand TORQ: 93
    Ability TORQ: 96 | Skills TORQ: 91 | Knowledge TORQ: 91

    Gaps to Narrow if Possible: no critical gaps recorded.
    Upgrade These Skills: Speaking (level 76, gap 13, importance 83)
    Knowledge to Add: Customer and Personal Service (level 88, gap 32, importance 73); Transportation (level 12, gap 2, importance 76)

    LEVEL and IMPT (importance) refer to the target occupation, License Clerks. GAP refers to the level difference between Paralegals and Legal Assistants and License Clerks.

    ASK ANALYSIS
    Ability Level Comparison - Abilities with importance scores over 50

    Description            Paralegals and Legal Assistants    License Clerks    Importance
    Oral Comprehension                  67                         51                75
    Oral Expression                     66                         53                75
    Written Comprehension               69                         50                72
    Written Expression                  64                         48                65
    Speech Recognition                  59                         41                62
    Speech Clarity                      53                         44                62
    Near Vision                         71                         51                59
    Problem Sensitivity                 51                         42                53
    Deductive Reasoning                 59                         44                50
    Inductive Reasoning                 64                         42                50
    Information Ordering                55                         44                50
    Selective Attention                 42                         39                50

    Jul-07-2009 - TORQ Analysis, Page 1 of 53. Copyright 2009, Workforce Associates, Inc.

    Skill Level Comparison - Abilities with importance
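Reading the analysis notes above, a gap appears to be the amount by which the source occupation's level falls short of the target's, reported only for factors above the importance filter. That interpretation is an assumption, not TORQ's published formula; the sketch below just reproduces the Speaking row's numbers (level 76, gap 13, importance 83) under it, with the source level of 63 invented to make the arithmetic work:

```python
def ability_gaps(source, target, importance, threshold=50):
    """Gap = how far the source occupation falls short of the target,
    for factors whose importance exceeds the threshold (assumed semantics)."""
    return {
        name: max(0, target[name] - level)
        for name, level in source.items()
        if importance.get(name, 0) > threshold and target[name] > level
    }

# Sample levels: Speaking mirrors the report's row; the source value is invented.
source = {"Speaking": 63, "Near Vision": 71}
target = {"Speaking": 76, "Near Vision": 51}
importance = {"Speaking": 83, "Near Vision": 59}

if __name__ == "__main__":
    print(ability_gaps(source, target, importance))
```

Note that Near Vision produces no gap: the source occupation already exceeds the target level, which matches the report listing no critical ability gaps.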
  • Automatic Speech Recognition Using Limited Vocabulary: a Survey
    Date of publication xxxx 00, 0000, date of current version xxxx 00, 0000. Digital Object Identifier 10.1109/ACCESS.xxxx.DOI

    Automatic Speech Recognition using limited vocabulary: A survey
    Jean Louis K. E. Fendji (Member, IEEE) (1), Diane M. Tala (2), Blaise O. Yenke (Member, IEEE) (1), and Marcellin Atemkeng (3)
    (1) Department of Computer Engineering, University Institute of Technology, University of Ngaoundere, 455 Ngaoundere, Cameroon ([email protected], [email protected])
    (2) Department of Mathematics and Computer Science, Faculty of Science, University of Ngaoundere, 454 Ngaoundere, Cameroon ([email protected])
    (3) Department of Mathematics, Rhodes University, Grahamstown 6140, South Africa ([email protected])
    Corresponding authors: Jean Louis K. E. Fendji ([email protected]) and Marcellin Atemkeng ([email protected]).

    ABSTRACT
    Automatic Speech Recognition (ASR) is an active field of research due to its huge number of applications and the proliferation of interfaces and computing devices that can support speech processing. But the bulk of applications is based on well-resourced languages, which overshadow under-resourced ones. Yet ASR represents an undeniable means to promote such languages, especially when designing human-to-human or human-to-machine systems involving illiterate people. One approach to designing an ASR system targeting under-resourced languages is to start with a limited vocabulary. ASR using a limited vocabulary is a subset of the speech recognition problem that focuses on the recognition of a small number of words or sentences. This paper aims to provide a comprehensive view of the mechanisms behind ASR systems, as well as the techniques, tools, projects, recent contributions, and possible future directions in ASR using a limited vocabulary.
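Limited-vocabulary recognition, as the survey frames it, reduces the problem to choosing the best match from a small, fixed word set. A toy sketch of that selection step follows, using string similarity over symbol sequences as a stand-in for acoustic scoring; real systems score against acoustic models such as HMMs, and the "pronunciation" strings here are invented:

```python
from difflib import SequenceMatcher

def recognize_limited(observed, vocabulary):
    """Pick the vocabulary entry whose symbol sequence best matches the
    observed sequence (a toy stand-in for acoustic model scoring)."""
    def score(word):
        return SequenceMatcher(None, observed, vocabulary[word]).ratio()
    return max(vocabulary, key=score)

# Invented 'pronunciations' for a three-word vocabulary.
vocabulary = {"yes": "jes", "no": "no", "stop": "stap"}

if __name__ == "__main__":
    print(recognize_limited("jess", vocabulary))  # closest entry is "yes"
```

Because the search space is tiny, even a crude similarity measure performs acceptably; this is exactly why starting with a limited vocabulary is attractive for under-resourced languages with little training data.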