Context Aware Voice User Interfaces for Workflow Support

Total Pages: 16

File Type: PDF, Size: 1020 KB

Context Aware Voice User Interfaces for Workflow Support

Dissertation approved by the Department of Computer Science of Technische Universität Darmstadt for the academic degree of Dr.-Ing., submitted by Dipl.-Inf. Univ. Dirk Schnelle, born in Celle. Date of submission: 28.06.2007. Date of defense: 27.08.2007. Referees: Prof. Dr. Max Mühlhäuser, Darmstadt; PD Dr.-Ing. habil. Elmar Nöth, Erlangen. Darmstadt 2007, D17.

Declaration

I hereby declare that I have prepared the submitted thesis for the academic degree "Dr.-Ing." with the title "Context Aware Voice User Interfaces for Workflow Support" independently and exclusively using the aids indicated (pursuant to §9(1) of the doctoral regulations of TU Darmstadt). I have not previously made any attempt at a doctorate. Darmstadt, 28.06.2007, Dipl.-Inf. Univ. Dirk Schnelle.

Academic Career of the Author (pursuant to §20(3) of the doctoral regulations of TU Darmstadt)

10/1989–06/1995: Studies in computer science at Friedrich-Alexander-Universität Erlangen-Nürnberg
07/1993–10/1993: Student research project at the Chair for Pattern Recognition, Friedrich-Alexander-Universität Erlangen-Nürnberg: automatic detection of prosodic phrase boundaries
06/1993–12/1994: Diploma thesis at the Chair for Pattern Recognition, Friedrich-Alexander-Universität Erlangen-Nürnberg: fundamental frequency estimation with artificial neural networks
04/2003–09/2006: Research assistant at Technische Universität Darmstadt

Abstract

Audio is a significant factor in the design of the human-computer interface in ubiquitous computing. The characteristics of the medium allow for hands-free interaction without the need to switch the focus to a display (eyes-free). Moreover, the graphical capabilities of the wearable device do not have to be determined in order to adapt the output to the device. Audio-based interfaces are also challenging, however, since humans are visually oriented. The ubiquitous computing community has recognized the advantages of the audio channel, but the restrictions inherent to the medium are mostly ignored. Authors of such systems who are aware of these challenges often look for a solution in additional modalities, preferably visual ones. This thesis analyses these challenges with respect to human cognitive capabilities and shows a possible solution using audio design patterns. Users on the move face additional challenges, since wearable devices do not have the computational power and storage capacity of desktop PCs, which off-the-shelf speech recognizers and speech synthesizers require. Recognition and synthesis software running on such a device therefore does not have the capabilities of its desktop counterparts. Streaming technologies can use high-end voice technology on a server, at the cost of network dependency. This thesis introduces a new approach that combines both worlds. In order to aid the user best, context information is used, first and foremost the user's location. The context information is obtained from the current environment and from the task that she wants to accomplish. Workflow engines are a standard for storing and controlling tasks as processes. Additionally, workflow engines consider the data flow, which is a fundamental aspect of each process. Initial approaches to using context information in workflow engines exist, but they lack a reusable concept. This thesis introduces such a concept.
The process descriptions in the workflow engine are modality independent in order to keep the business logic reusable. Modality-dependent specifications of the current activity reside outside the workflow engine and are stored in a meta format that can address multiple modalities. The user interfaces are generated from this meta format by code generators at run time. Existing concepts neither provide such a separation from the modality-independent business logic nor do they take the data flow into account. If a user is working on multiple processes in parallel, she needs support in returning to the context of an interrupted process. The characteristics of audio make it hard to aid her in resuming the task efficiently. This thesis introduces a first concept for audio-only support of task resumption.
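The central mechanism described above, a modality-independent activity description stored in a meta format from which a code generator produces the concrete user interface at run time, can be illustrated with a small sketch. The activity descriptor, its field names, and the VoiceXML mapping below are illustrative assumptions, not the thesis's actual meta format.

```javascript
// Minimal sketch: one modality-independent activity description (JSON) is
// rendered into a concrete voice UI (VoiceXML) at run time. The descriptor
// format and the mapping are assumptions made for illustration only.
const activity = {
  id: "confirm-delivery",
  prompt: "Which item did you deliver?",
  fields: [{ name: "item", options: ["printer", "toner", "paper"] }],
};

// Voice modality: generate a VoiceXML form with one field per input.
function toVoiceXml(act) {
  const fields = act.fields
    .map((f) => {
      const options = f.options
        .map((o) => `      <option>${o}</option>`)
        .join("\n");
      return `    <field name="${f.name}">\n      <prompt>${act.prompt}</prompt>\n${options}\n    </field>`;
    })
    .join("\n");
  return `<vxml version="2.1">\n  <form id="${act.id}">\n${fields}\n  </form>\n</vxml>`;
}

// A second generator (e.g. toHtmlForm) could target a graphical modality
// from the same descriptor, leaving the workflow's business logic untouched.
console.log(toVoiceXml(activity));
```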
Summary

In ubiquitous computing, audio is an essential component in the design of the human-machine interface. The characteristics of the medium allow interaction without the use of the hands (hands-free) and without turning the gaze away from the main activity (eyes-free). In addition, the graphical capabilities of the wearable device do not have to be probed in order to adapt the current output to them. However, since humans are primarily visually oriented, the design of audio-based interfaces brings a number of challenges with it. The advantages of the audio channel are acknowledged in most work on ubiquitous computing, but the accompanying restrictions are often ignored entirely. Authors who recognize the technical limitations of audio-based interfaces often look for a solution by adding further modalities, preferably visual ones. This thesis systematically analyses these challenges with respect to the cognitive capabilities of users of purely audio-based applications and shows how they can be addressed with the help of design patterns. Mobile users face the additional problem that mobile devices do not offer the performance of desktop PCs, in particular with respect to computational power and storage capacity, which is what makes the use of standard software for speech recognition and speech synthesis possible in the first place. Speech software running on wearable devices therefore does not reach the capabilities of this standard software. Streaming technologies, in contrast, use higher-quality speech software on a server at the cost of network dependency. This thesis presents a new approach that combines both worlds. In order to support the user efficiently, context information is used, in particular the user's location. This information is obtained partly from the user's current environment and partly from the context of the task to be accomplished. Processes in a workflow engine provide a standardized form for storing these tasks and controlling their execution. Workflow engines also have the advantage of emphasizing a consistent data flow, which always belongs to a process. First approaches to using context information in workflow engines exist, but a reusable concept is missing. This thesis presents such a concept. The process description in the workflow engine is kept modality independent in order to achieve reusability. A modality-dependent concretization of the current process step is stored outside the workflow engine in a meta format, from which a code generator creates the user interface at run time. Previous concepts have neither provided such a separation nor actively supported the data flow. If a user works on several processes at the same time, she must be actively supported when resuming a process in order to continue working efficiently, which is made difficult by the special characteristics of the auditory medium. This thesis presents, for the first time, a concept for actively supporting this re-entry into the task context.

Acknowledgements

This thesis was written during my time as a research assistant in the Telecooperation group at Technische Universität Darmstadt. I would like to thank everyone who helped me with this thesis and without whose broad support it would not have come into being. First of all, I thank Prof. Dr. Max Mühlhäuser for the courage to take on the experiment of guiding an employee from industry to a doctorate as a research assistant, and for the freedom he gave me to develop my ideas. I thank PD Dr. habil. Elmar Nöth for his willingness to act as second referee and for his support in the area of speech recognition. The many friendly colleagues of the Telecooperation group, whom I will remember fondly, contributed considerably to the success of this work and to the pleasant course of this dissertation. Special thanks go to Prof. Dr. Jussi Kangasharju for constructive discussions and valuable support with publications, to Fernando Lyardet for the joint development of ideas around patterns, and to Erwin Aitenbichler for the support in using MundoCore. Furthermore, I would like to thank Melanie Hartmann, Tao Wei, Marcus Ständer, Hidir Akhan, Alexander Hannappel and Falko Radke, who, in their diploma and student theses, implemented essential building blocks of the prototypes or carried out the user studies on which many results of this work are based. My brother Detlef Schnelle and Bríd Phelan deserve thanks for their stylistic and orthographic suggestions.
Recommended publications
  • Voice Interfaces
    VIEW POINT: VOICE INTERFACES
    Abstract: A voice user interface (VUI) makes human interaction with computers possible through a voice/speech platform in order to initiate an automated service or process. This Point of View explores the reasons behind the rise of voice interfaces, the key challenges enterprises face in adopting them, and possible solutions.
    Are We Ready for Voice Interfaces? Let's get talking! Since Apple's integration of Siri, voice interfaces have progressed significantly. Echo and Google Home have demonstrated that we do not need a graphical user interface to talk to computers and have opened up a new channel for communication. Recent demos of voice-based personal assistance at Google I/O showed the new promise of voice interfaces. Almost all the big players (Google, Apple, Microsoft) have 'office productivity' applications that are being adopted by businesses (Microsoft and their Office Suite already have a big advantage here, but things like Google Docs and Keynote are sneaking in), and they have also started offering integrations with their voice assistants. As per industry forecasts, over the next decade, 8 out of every 10 people in the world will own a device (a smartphone or some kind of assistant) that supports voice-based conversations in their native language. Just imagine the impact!
    [Chart: Voice Assistant Market, approx. USD 7.8 billion, CAGR ~39%; market size (USD billion), 2016-2023]
    The Sudden Interest in Voice Interfaces: Although voice technology and assistants have been around in some shape or form for many years, the relentless growth of low-cost computational power... Voice recognition accuracy continues to improve, as we now have the capability to train the models using neural networks. Convenience (speaking vs. typing): humans can speak 150 words per minute versus a typing speed of 40 words per minute.
  • Compound Word Formation.Pdf
    Snyder, William (in press). Compound word formation. In Jeffrey Lidz, William Snyder, and Joseph Pater (eds.), The Oxford Handbook of Developmental Linguistics. Oxford: Oxford University Press.
    CHAPTER 6: Compound Word Formation. William Snyder. Languages differ in the mechanisms they provide for combining existing words into new, "compound" words. This chapter will focus on two major types of compound: synthetic -ER compounds, like English dishwasher (for either a human or a machine that washes dishes), where "-ER" stands for the crosslinguistic counterparts to agentive and instrumental -er in English; and endocentric bare-stem compounds, like English flower book, which could refer to a book about flowers, a book used to store pressed flowers, or many other types of book, as long as there is a salient connection to flowers. With both types of compounding we find systematic cross-linguistic variation, and a literature that addresses some of the resulting questions for child language acquisition. In addition to these two varieties of compounding, a few others will be mentioned that look like promising areas for coordinated research on cross-linguistic variation and language acquisition.
    6.1 Compounding: A Selective Review. 6.1.1 Terminology. The first step will be defining some key terms. An unfortunate aspect of the linguistic literature on morphology is a remarkable lack of consistency in what the "basic" terms are taken to mean. Strictly speaking one should begin with the very term "word," but as Spencer (1991: 453) puts it, "One of the key unresolved questions in morphology is, 'What is a word?'." Setting this grander question to one side, a word will be called a "compound" if it is composed of two or more other words, and has approximately the same privileges of occurrence within a sentence as do other word-level members of its syntactic category (N, V, A, or P).
  • NLP-5X Product Brief
    NLP-5x: Natural Language Processor with Motor, Sensor and Display Control. The NLP-5x is Sensory's new Natural Language Processor targeting consumer electronics. The NLP-5x is designed to offer advanced speech recognition, text-to-speech (TTS), high-quality music and speech synthesis, and robotic control to cost-sensitive high-volume products. Based on a 16-bit DSP, the NLP-5x integrates digital and analog processing blocks and a wide variety of communication interfaces into a single-chip solution, minimizing the need for external components. The NLP-5x operates in tandem with FluentChip™ firmware, an ultra-compact suite of technologies that enables products with up to 750 seconds of compressed speech, natural language interface grammars, TTS synthesis, Truly Hands-Free™ triggers, multiple speaker-dependent and speaker-independent vocabularies, high-quality stereo music, speaker verification (voice password), robotics firmware, and all application code built into the NLP-5x as a single-chip solution. The NLP-5x also represents unprecedented flexibility in application hardware designs. Thanks to the highly integrated architecture, the most cost-effective voice user interface (VUI) designs can be built with as few additional parts as a clock crystal, speaker, microphone, and a few resistors and capacitors. The same integration provides all the necessary control functions on-chip to enable cost-effective man-machine interfaces (MMI) with sensing technologies, and complex robotic products with motors, displays and interactive intelligence. Features: BROAD...
  • Hunspell – the Free Spelling Checker
    Hunspell – The free spelling checker
    About Hunspell: Hunspell is a spell checker and morphological analyzer library and program designed for languages with rich morphology and complex word compounding or character encoding. Hunspell interfaces: Ispell-like terminal interface using the Curses library, Ispell pipe interface, OpenOffice.org UNO module.
    Main features of the Hunspell spell checker and morphological analyzer:
    - Unicode support (affix rules work only with the first 65535 Unicode characters)
    - Morphological analysis (in custom item and arrangement style) and stemming
    - Max. 65535 affix classes and twofold affix stripping (for agglutinative languages like Azeri, Basque, Estonian, Finnish, Hungarian, Turkish, etc.)
    - Support for complex compounding (for example, Hungarian and German)
    - Support for language-specific features (for example, special casing of the Azeri and Turkish dotted i, or the German sharp s)
    - Handling of conditional affixes, circumfixes, fogemorphemes, forbidden words, pseudoroots and homonyms
    - Free software (LGPL, GPL, MPL tri-license)
    Usage: The src/tools directory contains ten executables after compiling (some of them are in src/win_api):
    - affixcompress: dictionary generation from large (millions of words) vocabularies
    - analyze: example of spell checking, stemming and morphological analysis
    - chmorph: example of automatic morphological generation and conversion
    - example: example of spell checking and suggestion
    - hunspell: main program for spell checking and others (see manual)
    - hunzip: decompressor of hzip format
    - hzip: compressor of...
  • Building a Treebank for French
    Building a treebank for French. Anne Abeillé, Lionel Clément, Alexandra Kinyon. TALaNa, Université Paris 7, 75251 Paris cedex 05, France; University of Pennsylvania, Philadelphia, USA.
    Abstract: Very few gold standard annotated corpora are currently available for French. We present an ongoing project to build a reference treebank for French starting with a tagged newspaper corpus of 1 million words (Abeillé et al., 1998), (Abeillé and Clément, 1999). Similarly to the Penn Treebank (Marcus et al., 1993), we distinguish an automatic parsing phase followed by a second phase of systematic manual validation and correction. Similarly to the Prague treebank (Hajicova et al., 1998), we rely on several types of morphosyntactic and syntactic annotations for which we define extensive guidelines. Our goal is to provide a theory-neutral, surface-oriented, error-free treebank for French. Similarly to the Negra project (Brants et al., 1999), we annotate both constituents and functional relations.
    1. The tagged corpus. As reported in (Abeillé and Clément, 1999), we present the general methodology, the automatic tagging phase, the human validation phase and the final state of the tagged corpus. [...] a pronoun (= him) or a weak clitic pronoun (= to him or to her); plus can either be a negative adverb (= not any more) or a simple adverb (= more). Inflectional morphology also has to be annotated, since morphological endings are important for gathering constituents (based on agreement marks) and also because lots of forms in French are ambiguous with respect to mode, person, number or gender.
    1.1. Methodology. 1.1.1. [...]
  • A Guide to Chatbot Terminology
    Machine Language. A Guide to Chatbot Terminology.
    Decision Trees: The most basic chatbots are based on tree-structured flowcharts. Their responses follow IF/THEN scripts that are linked to keywords and buttons.
    Natural Language Processing (NLP): A computer's ability to detect human speech, recognize patterns in conversation, and turn text into speech.
    Natural Language Understanding: A computer's ability to determine intent, especially when what is said doesn't quite match what is meant. This task is much more difficult for computers than Natural Language Processing.
    Layered Communication: Human communication is complex. Consider the intricacies of: misused phrases, intonation, double meanings, passive aggression, poor pronunciation, regional dialects, subtle humor, speech impairments, non-native speakers, slang, syntax.
    Messenger Chatbots: Messenger chatbots reside within the messaging applications of larger digital platforms (e.g., Facebook, WhatsApp, Twitter, etc.) and allow businesses to interact with customers on the channels where they spend the most time.
    Chatbot Design Programs: There's no reason to design a messenger bot from scratch. Chatbot design programs help designers make bots that: can be used on multiple channels (social, web, apps); have custom design elements (response time, contact buttons, images, audio, etc.); collect payments; track analytics (open rates, user retention, subscribe/unsubscribe rates); allow for human takeover when the bot's capabilities are surpassed; integrate with popular digital platforms (Shopify, Zapier, Google Site Search, etc.); provide customer support when issues arise.
    Voice User Interface: A voice user interface (VUI) allows people to interact with a computer through spoken commands and questions.
    Conversational User Interface: Like a voice user interface, a conversational user interface (CUI) allows people to control a computer with speech, but CUIs differ in that they emulate the nuances of human conversation.
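The "Decision Trees" entry above describes the simplest kind of bot: responses follow IF/THEN rules keyed on keywords. A minimal sketch of that idea is shown below; the tree, keywords, and replies are invented purely for illustration.

```javascript
// Minimal decision-tree chatbot sketch: replies follow IF/THEN rules keyed
// on keywords. The tree and its wording are invented for illustration.
const tree = {
  start: {
    prompt: "Hi! You can ask about 'hours' or 'returns'.",
    rules: [
      { keyword: "hours", next: "hours" },
      { keyword: "return", next: "returns" },
    ],
  },
  hours: { prompt: "We are open 9am to 5pm, Monday to Friday.", rules: [] },
  returns: { prompt: "Items can be returned within 30 days.", rules: [] },
};

function reply(state, userText) {
  const node = tree[state];
  const hit = node.rules.find((r) =>
    userText.toLowerCase().includes(r.keyword)
  );
  const nextState = hit ? hit.next : state; // no match: repeat current prompt
  return { state: nextState, text: tree[nextState].prompt };
}

console.log(reply("start", "What are your opening hours?").text);
```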
  • Voice User Interface on the Web
    Voice User Interface on the Web. Human Computer Interaction, Fulvio Corno, Luigi De Russis, Academic Year 2019/2020.
    How to create a VUI on the Web? Three (main) steps, typically: speech recognition; text manipulation (e.g., Natural Language Processing); speech synthesis. We are going to start from a simple application and reach a quite complex scenario by using HTML5, JS, and PHP. Reminder: we are interested in creating an interactive prototype, in the end.
    Weather Web App: a VUI for "chatting" about the weather. Base implementation at https://github.com/polito-hci-2019/vui-example
    Speech Recognition and Synthesis: the Web Speech API, currently a draft, experimental, unofficial HTML5 API (!), https://wicg.github.io/speech-api/, covers both speech recognition and synthesis, with different degrees of support by browsers.
    Web Speech API, Speech Recognition: accessed via the SpeechRecognition interface, which provides the ability to recognize voice from an audio input, normally via the device's default speech recognition service. Generally, the interface's constructor is used to create a new SpeechRecognition object. The SpeechGrammar interface can be used to represent a particular set of grammar that your app should recognize; grammar is defined using the JSpeech Grammar Format (JSGF).
    Speech Recognition, a minimal example: const recognition = new window.SpeechRecognition(); recognition.onresult = (event) => { const speechToText = event.results[0][0].transcript;
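The code in the excerpt stops mid-handler. A completed minimal version of the same example might look like the following sketch; the prefixed webkitSpeechRecognition fallback and the logging are assumptions, since the remainder of the original slides is not shown here.

```javascript
// Minimal sketch completing the truncated Web Speech API example above.
// The fallback and the logging are assumptions, not from the original slides.
const SpeechRecognition =
  window.SpeechRecognition || window.webkitSpeechRecognition;
const recognition = new SpeechRecognition();
recognition.lang = "en-US";

recognition.onresult = (event) => {
  const speechToText = event.results[0][0].transcript;
  console.log("Recognized:", speechToText);
};

// Start listening; browsers typically require a user gesture to allow this.
recognition.start();
```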
  • Eindversie-Paper-Rianne-Nieland-2057069
    Talking to Linked Data: Comparing voice interfaces for general-purpose data. Master thesis in Information Sciences. Rianne Nieland, Vrije Universiteit Amsterdam. [email protected]
    ABSTRACT: People in developing countries cannot access information on the Web, because they have no Internet access and are often low literate. A solution could be to provide voice-based access to data on the Web by using the GSM network. Related work states that Linked Data could be a useful input source for such voice interfaces. The aim of this paper is to find an efficient way to make general-purpose data, like Wikipedia information, available using voice interfaces for GSM. To achieve this, we developed two voice interfaces, one for Wikipedia and one for DBpedia, by doing requirements elicitation from literature and developing a voice user interface and conversion algorithms for Wikipedia and DBpedia concepts. With user tests the users evaluated the two voice interfaces, to be able to compare them.
    [...] Internet access (Miniwatts Marketing Group, 2012) and 31.1% is literate (UNESCO, 2010). A solution to the literacy and Internet access problem is to provide voice-based access to the Internet by using the GSM network (De Boer et al., 2013). 2G mobile phones are digital mobile phones that use the GSM network (Fendelman, 2014). In Africa the primary mode of telecommunication is mobile telephony (UNCTAD, 2007). This paper is about how information on the Web can efficiently be made available using voice interfaces for GSM. We developed two voice interfaces, one using Wikipedia and the other using DBpedia. In this paper we see Wikipedia and DBpedia as two different [...]
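The DBpedia interface described in the abstract presumably has to turn a concept into a query and reduce the result to something speakable. A rough sketch of that retrieval step is shown below; the public DBpedia SPARQL endpoint and result format are standard, but the concept, the query shape, and the trimming to one sentence are assumptions for illustration, not the thesis's actual conversion algorithm.

```javascript
// Rough sketch: fetch the English DBpedia abstract of a concept so that a
// voice interface can read it out. Concept choice and trimming are
// illustrative assumptions, not taken from the thesis.
const endpoint = "https://dbpedia.org/sparql";
const query = `
  PREFIX dbo: <http://dbpedia.org/ontology/>
  SELECT ?abstract WHERE {
    <http://dbpedia.org/resource/Amsterdam> dbo:abstract ?abstract .
    FILTER (lang(?abstract) = "en")
  }`;

async function fetchSpokenAnswer() {
  const url =
    endpoint +
    "?query=" + encodeURIComponent(query) +
    "&format=" + encodeURIComponent("application/sparql-results+json");
  const res = await fetch(url);
  const data = await res.json();
  const text = data.results.bindings[0].abstract.value;
  // Keep only the first sentence so the spoken answer stays short.
  return text.split(". ")[0] + ".";
}

fetchSpokenAnswer().then((answer) => console.log(answer));
```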
  • Voice Assistants and Smart Speakers in Everyday Life and in Education
    Informatics in Education, 2020, Vol. 19, No. 3, 473–490. © 2020 Vilnius University, ETH Zürich. DOI: 10.15388/infedu.2020.21
    Voice Assistants and Smart Speakers in Everyday Life and in Education. George Terzopoulos, Maya Satratzemi, Department of Applied Informatics, University of Macedonia, Thessaloniki, Greece. Email: [email protected], [email protected]. Received: November 2019.
    Abstract: In recent years, Artificial Intelligence (AI) has shown significant progress and its potential is growing. An application area of AI is Natural Language Processing (NLP). Voice assistants incorporate AI by using cloud computing and can communicate with users in natural language. Voice assistants are easy to use, and thus there are millions of devices that incorporate them in households nowadays. The most common devices with voice assistants are smart speakers, and they have just started to be used in schools and universities. The purpose of this paper is to study how voice assistants and smart speakers are used in everyday life and whether there is potential for them to be used for educational purposes.
    Keywords: artificial intelligence, smart speakers, voice assistants, education.
    1. Introduction. Emerging technologies like virtual reality, augmented reality and voice interaction are reshaping the way people engage with the world and transforming digital experiences. Voice control is the next evolution of human-machine interaction, thanks to advances in cloud computing, Artificial Intelligence (AI) and the Internet of Things (IoT). In recent years, the heavy use of smartphones has led to the appearance of voice assistants such as Apple's Siri, Google's Assistant, Microsoft's Cortana and Amazon's Alexa.
  • Integrating a Voice User Interface Into a Virtual Therapy Platform Yun Liu Lu Wang William R
    Integrating a Voice User Interface into a Virtual Therapy Platform. Yun Liu, Lu Wang, William R. Kearns, Linda E Wagner, and John Raiti (University of Washington, Seattle, WA, USA); Yuntao Wang (Tsinghua University, Beijing, China, and University of Washington, WA, USA); Weichao Yuwen (University of Washington, Tacoma, WA, USA).
    ABSTRACT: More than 1 in 5 adults in the U.S. serving as family caregivers are the backbones of the healthcare system. Caregiving activities significantly affect their physical and mental health, sleep, work, and family relationships over extended periods. Many caregivers tend to downplay their own health needs and have difficulty accessing support. Failure to maintain their own health leads to diminished ability in providing high-quality care to their loved ones. Voice user interfaces (VUIs) hold promise in providing the tailored support family caregivers need in maintaining their own health, such as flexible access and hands-free interactions.
    ACM Reference Format: Yun Liu, Lu Wang, William R. Kearns, Linda E Wagner, John Raiti, Yuntao Wang, and Weichao Yuwen. 2021. Integrating a Voice User Interface into a Virtual Therapy Platform. In CHI Conference on Human Factors in Computing Systems Extended Abstracts (CHI '21 Extended Abstracts), May 08–13, 2021, Yokohama, Japan. ACM, New York, NY, USA, 6 pages. https://doi.org/10.1145/3411763.3451595
    1 INTRODUCTION: Chronic diseases, such as diabetes and cancer, are the leading causes [...]
  • Master Thesis
    Czech Technical University in Prague, Faculty of Electrical Engineering, Department of Computer Science. MASTER THESIS: Creation of Dialog Applications Based on Schema.org Ontology. Bc. Tatiana Okonechnikova. Supervisor: Ing. Jan Šedivý, CSc. January 2019.
    Author Statement: I declare that the presented work was developed independently and that I have listed all sources of information used within it in accordance with the methodical instructions for observing the ethical principles in the preparation of university theses. Prague, 1st January 2019. (signature)
    MASTER'S THESIS ASSIGNMENT
    I. Personal and study details. Surname: Okonechnikova. Given name: Tatiana. Personal ID number: 393194. Faculty/Institute: Faculty of Electrical Engineering. Assigning department/institute: Department of Computer Science. Study programme: Open Informatics. Branch of study: Data Sciences.
    II. Master's thesis details. Thesis title (Czech): Tvorba dialogových aplikací na základě ontologie schema.org. Thesis title (English): Creation of dialog applications based on schema.org ontology.
    Guidelines: Review conversational AI technologies. Focus on the semi-automatic creation of a dialog from structured and semi-structured data. As a source of structured data, use an RDF knowledge database (DBpedia, Wikidata). Use a straightforward static web page as a representative of the semi-structured data. Design and implement a semi-automatic application creating a simple dialog for intelligent speakers, such as Amazon Alexa or Google Home. Select a topic (for example, movies), find the RDF database or a web portal, and build and test simple dialogs conveying user-selected information. Test the application with at least three users and report the results.
    Recommended literature: Speech and Language Processing, https://web.stanford.edu/~jurafsky/slp3/ (selected chapters).
    Name and workplace of the master's thesis supervisor: Ing. [...]
  • Compounds and Word Trees
    Compounds and word trees. LING 481/581, Winter 2011.
    Organization: compounds (heads, types of compounds); word trees.
    Compounding: [root] [root], e.g. machine gun, land line, snail mail, top-heavy, fieldwork. Notice variability in punctuation (to ignore).
    Compound stress: Green Lake, bluebird, fast lane, Bigfoot, bad boy, old school, hotline, Myspace.
    Compositionality: (extent to which) meanings of words and phrases are determined by morpheme meaning and structure. Predictability of meaning in compounds: 'roid rage (< 1987; 'road rage' < 1988), football (< 1486; 'American football' 1879), soap opera (< 1939).
    Head: "A word in a syntactic construction or a morpheme in a morphological one that determines the grammatical function or meaning of the construction as a whole. For example, house is the head of the noun phrase the red house..." (Aronoff and Fudeman 2011).
    Righthand head rule: In English (and most languages), morphological heads are the rightmost non-inflectional morpheme in the word. The lexical category of the entire compound = the lexical category of the rightmost member. Not Spanish (HS example p. 139). English be-: devil, bedevil.
    Compounding and lexical category (examples grouped by the category of the first member): noun: machine gun, friend request, skin-deep, foot bail, rage quit; verb: thinktank, stir-fry(?), ?; adjective: high school, dry-clean(?), red-hot, low-ball. 'The V+N pattern is unproductive and limited to a few lexically listed items, and the N+V pattern is not really productive either.' (HS: 138)
    Heads of words: the morphosyntactic head determines the lexical category, i.e. syntactic distribution (That thinktank is overrated.) and morphological characteristics, e.g. inflection: plurality (highschool, highschools), tense on V (dry-clean, dry-cleaned), case on N. Sahaptin: wáyxti- "run, race", wayxtitpamá "pertaining to racing", wayxtitpamá k'úsi 'race horse'. Máytski=ish á-shapaxwnawiinknik-xa-na wayxtitpamá k'úsi-nan...