TTS (Text-To-Speech) Survey. Project: Wearable Text Recognition System for the Visually Impaired. Work title: 노텍, noʊ tek (Know Text)


Capstone Design Survey Report

University of Incheon, College of Information Technology, Department of Embedded Systems Engineering
Team: Kim's Club
Advisor: Prof. Hwang Kwang-il (황광일)
Mentor: Park Su-min (박수민; UI, graduated 2009, currently at Aromasoft)
Department of Embedded Systems Engineering, 2401251, Kim Dae-yu (김대유)
Department of Embedded Systems Engineering, 2401307, Kim Ho-seong (김호성)
Department of Embedded Systems Engineering, 2501268, Kim Ji-sang (김지상)
Department of Embedded Systems Engineering, 2501214, Kim Su-cheol (김수철)

Contents
1 Introduction
  1.1 Motivation and significance
  1.2 Survey overview
  1.3 Notes
2 Introduction to various TTS systems
  2.1 ERICSSON LABS
  2.2 Free TTS Open Source Project
  2.3 eSpeak Open Source Project
  2.4 The MBROLA Project
  2.5 Microsoft Speech API (SAPI)
  2.6 AT&T Labs Natural Voices® TTS
  2.7 Neo Speech™ VoiceText™ TTS Engine
  2.8 iSpeech
  2.9 Power TTS
  2.10 Edu tool JSK
  2.11 Other major commercial products
3 Conclusion
  3.1 Overall assessment and alternatives
  3.2 References and sources

2011 Capstone Design. Wearable Text Recognition System for the Visually Impaired, Kim's Club TTS Survey

1. Introduction

1.1 Motivation and significance

This TTS survey is preliminary research for the speech-output stage of our capstone design project, the "Wearable Text Recognition System for the Visually Impaired", which must read recognized text aloud. Its purpose is to understand the trends and characteristics of the TTS technologies that currently exist and, beyond that, to find a TTS engine or library suited to our project.

1.2 Survey overview

The survey relied mainly on the Internet, using the NAVER™ and Google™ search portals.
Survey period: March 25 to April 14
Main topic: TTS APIs, libraries, and engines for embedded environments.
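Search keywords: text to speech, 무료 tts (free tts), tts 라이브러리 (tts library), tts 소스 (tts source), tts 이용 (using tts), tts, tts api, tts source, tts library, tts free, tts sdk, tts android, tts linux

For technology trends and case studies involving TTS, we consulted papers through DBpia (http://www.dbpia.co.kr/). For definitions of terms, we referred to the online dictionary Terms (http://terms.co.kr/) and to Wikipedia (http://www.wikipedia.org/). Unfortunately, we found no books on TTS, including in our university library, and the few papers that were available dealt with research unrelated to our project, so they are omitted.

1.3 Notes

During the survey, the search keywords themselves turned out to overlap in meaning. To give the survey a clear basis that matches its purpose, we first defined the key terms and then repeated the search.

① TTS (Text To Speech)
TTS is a kind of speech synthesis program that turns the content of computer documents, such as help files or web pages, into speech as if a person were reading it aloud. TTS can also read out the information shown on a computer screen for people with visual impairments.
Current applications of TTS include e-mail that is read aloud and the voice guidance of automated answering systems. TTS is often used together with speech recognition programs. Commercial TTS products include Read Please 2000, Proverbe Speech Unit, and TextAloud, among many others. Lucent and AT&T also have their own products called "Text-to-Speech".

② API
An API (Application Programming Interface) is an interface through which an application can control functionality provided by an operating system or a programming language. APIs typically cover file control, window control, image processing, character handling, and similar tasks. In short, an API is a collection of functions.
Examples include the Windows API, DirectX for Microsoft Windows (mainly graphics support for games), the Single UNIX Specification, the Java API, the Scala API, OpenGL, OpenAL, and OpenCL.

③ Library
A library is a collection of files holding one or more subroutines or functions that exist to be linked with other programs. It usually takes the form of compiled object code so that it can be linked. Libraries are one of the earliest organized methods of code reuse and are often supplied by operating system or development environment vendors so that many different programs can use them.
The routines in a library may be general purpose, or they may be designed for a special purpose such as 3D animation graphics.
A library is linked with the user's program to form a complete executable program. This linking is usually static, but depending on the system it can also be dynamic (DLL). On Linux, object files use the .o extension and are bundled into static libraries with the .a extension, while dynamically linked (shared) libraries use the .so extension.

④ Engine
In computer science, a software engine refers to the core of a computer program.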
Software engines drive the functionality of the program, and are distinct from peripheral aspects of the program, such as look and feel. (Wikipedia)
A TTS engine in this sense is ultimately a complete program. It is therefore a different kind of resource from the library or API that we are looking for, and this should be kept in mind.
Before introducing the individual TTS systems, note that this document was written in a mixture of Korean and English.

2. Introduction to Various TTS Systems

2.1 ERICSSON LABS
① URL: https://labs.ericsson.com/apis/text-to-speech/
② Provider: ERICSSON LABS of ERICSSON Co.
③ Screenshot: Figure 1) TEXT TO SPEECH home page of ERICSSON LABS
④ Overview: The text-to-speech enabler provides you with the opportunity to develop speech enabled applications. The API consists of a simple web interface as well as an Android SDK. The Android SDK wraps the REST API for translating text to audio data in the requested format. The REST protocol and the server of Text-To-Speech enabler are independent of the platform used on the client side. The Android SDK is for easing your development on Android platform with our text-to-speech enabler. Your application will be able to convert text to audio data through our text-to-speech enabler using the API.
⑤ Pros: It is free and supports Android, and the REST protocol can also be used from a web environment once an API key has been obtained. It is a useful resource to evaluate in case our system is later built as a client.
⑥ Cons: The user base is small, so it is hard to regard the service as proven. The latest update is listed as November 1999, so we expect its performance to lag behind current TTS systems. The lack of regular updates is also a minus, and the absence of Korean support is a major drawback.
⑦ Requirements:
REST protocol: Internet access (Ethernet, Wi-Fi), Apache, PHP, API key (issued on the site)
Android SDK: Android programming (Java), Android SDK

2.2 Free TTS 1.2 Open Source Project
① URL: http://freetts.sourceforge.net/
② Provider: Open Source Community. Originally developed by the Sun Microsystems Laboratories Speech Team, based on CMU's Flite engine.
It also partially includes JSAPI 1.0.
③ Screenshot: Figure 2) Free TTS Open Source Project site
④ Overview: FreeTTS is a speech synthesis system written entirely in the Java™ programming language. It is based upon Flite: a small run-time speech synthesis engine developed at Carnegie Mellon University.
⑤ Pros: Free API. Well documented.
⑥ Cons: Supports English only. Written entirely in Java.

2.3 eSpeak text to speech Open Source Project
① URL: http://espeak.sourceforge.net/
② Provider: Open Source Community.
③ Screenshot: Figure 3) eSpeak Open Source Project site
④ Overview: A command line program (Linux and Windows) to speak text from a file or from stdin. A shared library version for use by other programs. (On Windows this is a DLL). A SAPI5 (1) version for Windows, so it can be used with screen-readers and other programs that support the Windows SAPI5 interface. eSpeak has been ported to other platforms, including Solaris and Mac OSX.
⑤ Pros: It has more recommending users (83) than the Free TTS group above.
⑥ Cons: The user interface is the command line.
⑦ Requirements: O/S: 32-bit MS Windows (NT/2000/XP), All POSIX (Linux/BSD/UNIX-like OSes)
(1) SAPI: Speech Application Programming Interface, an API produced by Microsoft for Speech Recognition and Speech Synthesis.

2.4 The MBROLA Project
① URL: http://tcts.fpms.ac.be/synthesis/
② Provider: TCTS Lab of the Faculté Polytechnique de Mons (Belgium)
③ Screenshot: Figure 4) The MBROLA Project home page
④ Overview: The aim of the MBROLA project, initiated by the TCTS Lab of the Faculté Polytechnique de Mons (Belgium), is to obtain a set of speech synthesizers for as many languages as possible, and provide them free for non-commercial applications. The ultimate goal is to boost academic research on speech synthesis, and particularly on prosody generation, known as one of the biggest challenges taken up by Text-To-Speech synthesizers for the years to come. Central to the MBROLA project is MBROLA, a speech synthesizer
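
Section 2.3 lists the command-line interface as a drawback of eSpeak, but a host program can still drive that interface directly. The following is a minimal sketch of that idea in Python, not part of the original survey. The function names build_espeak_command and speak are hypothetical, the default voice and speed are illustrative choices, and the sketch assumes an executable named espeak is on the PATH. The -v (voice), -s (speed in words per minute) and -w (write a WAV file) options are standard eSpeak command-line options.

```python
import shutil
import subprocess


def build_espeak_command(text, voice="en", words_per_minute=175, wav_path=None):
    """Build the argument list for one eSpeak invocation (hypothetical helper)."""
    cmd = ["espeak", "-v", voice, "-s", str(words_per_minute)]
    if wav_path is not None:
        # -w writes the synthesized audio to a WAV file instead of playing it.
        cmd += ["-w", wav_path]
    # The text to speak is passed as the final positional argument.
    cmd.append(text)
    return cmd


def speak(text, **options):
    """Run eSpeak on the text; return False if espeak is missing or fails."""
    if shutil.which("espeak") is None:
        # Assumption: the executable is called "espeak" on this system.
        return False
    result = subprocess.run(build_espeak_command(text, **options), check=False)
    return result.returncode == 0


if __name__ == "__main__":
    # In the wearable system, this string would come from the text recognizer.
    print(build_espeak_command("Know Text"))
    speak("Know Text")
```

Keeping command construction separate from execution lets the argument list be checked on a development machine that has no audio device or no eSpeak installation. The shared library version mentioned in the overview would avoid starting a new process for every utterance.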