
Semantic Authoring for Augmented Communication Using Multilingual Text Generation

Thesis submitted in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

by

Yael Netzer

Submitted to the Senate of

Ben-Gurion University of the Negev

November 2006

Beer-Sheva

Approved by the advisor

Approved by the Dean of the Kreitman School of Advanced Graduate Studies

This work was carried out under the supervision of Dr. Michael Elhadad

in the Department of Computer Science, Faculty of Natural Sciences

Acknowledgments

During the course of life, we meet people who become significant to us and change our life in a meaningful way. I feel lucky that I met my advisor, Michael Elhadad, from whom I learned about Natural Language Processing in general and Generation in particular. I thank Dr. Elhadad for his cleverness and kindness. Michael agreed to enter the AAC research field with me and shared my excitement about it. I admire his ability to translate thoughts into solvable problems, his patience, and most of all his belief in me, which kept me working. I thank Yoav Goldberg for the implementation of the Bliss lexicon - no one would have done it better - and Ofer Biller for the development of SAUT. Meetings of the NLP group at Ben-Gurion University were always a joy, especially the discussions on music afterwards with Meni Adler and Oren Hazai. The Department of Computer Science at Ben-Gurion University in Beer-Sheva hosted me for the last 15 years (for all of my studies), so it was one of the most stable things in my life - I especially thank Prof. Abraham Melkman and Prof. Klara Kedem for their sincere concern for me, Dr. Mayer Goldberg for answering my Lisp queries, dear Dr. Tzachi Rosen for the useful discussions and his true friendship, Ami Berler for the coffee breaks, and Valerie Glass for being my friend and assisting me with the formalities of the University. The lab people were always helpful. I thank Prof. Nomi Shir for teaching me linguistics and for her loving attitude, and Dr. Judy Wine for introducing me to the AAC world in her course in Shaare Zedek. The remarkable personality of my late grandmother, Dr. Puah Menczel, and the devotion of my mother Dvorah and my sister Ruti to society were the initial motivation for my drifting into the AAC field, and I'm grateful for that. I thank my beloved sons Guy, Eitan, and Daniel for being such inspiring language users, especially Daniel, who taught me not to take the acquisition and usage of language for granted.

My sisters Chana and Ruti, my brother Yosef, and especially my parents Dvorah and Ehud were always available for me with love and support and I am grateful.

This work is dedicated with love to my parents Ehud and Dvorah

Contents

Abstract ix

List of Figures xiii

List of Tables xiv

List of Abbreviations xv

1 Introduction 1
1.1 Background ...... 1
1.2 Motivation ...... 3
1.3 Objectives ...... 4
1.4 Contributions ...... 5

2 Background 7
2.1 The need for communication - AAC ...... 7
2.1.1 What is Augmentative and Alternative Communication? ...... 8
2.1.2 Who Needs AAC – Disability Types ...... 10
2.1.3 A Brief History of AAC ...... 11
2.1.4 AAC Techniques ...... 13
2.2 Speeding up Communication ...... 18
2.2.1 Natural Language Processing and AAC ...... 20
2.2.2 Language Techniques for Assistive Systems ...... 21
2.3 Summary ...... 30

3 Objectives 32
3.1 Generation from Telegraphic Input ...... 33
3.2 Generation as Semantic Authoring ...... 37

4 Usage Scenario 39
4.1 Maintaining a View of Context ...... 41
4.2 Argument Structure Specification ...... 41
4.3 Referring Expressions ...... 42
4.4 Lexical Choice and Syntactic Realization ...... 43
4.5 Summary ...... 45

5 System Architecture 46
5.1 Infrastructure Development ...... 46
5.2 Flow of Information ...... 48
5.2.1 Changing Displays Dynamically ...... 49
5.2.2 Lexical Choice and Syntactic Realization ...... 52
5.3 Summary ...... 54

6 Natural Language Generation and Syntactic Realization 56
6.1 Natural Language Generation ...... 57
6.1.1 The Architecture of an NLG System ...... 57
6.1.2 Multilingual Generation (MLG) ...... 59
6.1.3 AAC as an MLG Application ...... 60
6.2 The Syntactic Realizer ...... 61
6.2.1 Input for Surface Realization Module ...... 62
6.3 HUGG ...... 62
6.3.1 FUF/SURGE ...... 63
6.3.2 SURGE input of a clause ...... 64
6.3.3 Main Issues in Hebrew Generation ...... 66
6.3.4 Hebrew Clause ...... 67
6.3.5 Subjectless Clauses ...... 67
6.3.6 Existential, equative, possessive, and attributive clauses ...... 68
6.3.7 Morphology ...... 70
6.4 Summary ...... 71

7 Lexical resources 72
7.1 Lexicons in NLG ...... 73
7.1.1 Levin's verb classes ...... 74
7.1.2 Online Resources ...... 75
7.1.3 Choice of Lexical Sources ...... 79
7.2 Bliss Lexicon ...... 80
7.2.1 Overview on Blissymbolics ...... 80
7.2.2 The Design of the Bliss Lexicon ...... 85
7.2.3 Bliss Lexicon Software Development ...... 87
7.3 Using Lexical Resources for the System Lexical Chooser ...... 88
7.4 Integrating a Large-scale Reusable Lexicon for NLG ...... 90
7.5 Summary ...... 93

8 Communication Boards 94
8.1 The SAUT Semantic Authoring Tool ...... 94
8.1.1 Conceptual Graphs ...... 95
8.1.2 Authoring Tools ...... 97
8.1.3 The SAUT Editor ...... 98
8.2 Bliss Communication Board ...... 102
8.3 Implementing a Communication Board ...... 102
8.4 The Processing Method - Adopting the SAUT Technique ...... 104
8.5 Summary ...... 106

9 Comparison with Existing NLG-AAC Systems 107
9.1 Blisstalk ...... 107
9.2 Compansion ...... 108
9.3 Transforming Telegraphic Language to Greek ...... 111
9.4 PVI Intelligent Voice Prosthesis ...... 113
9.5 Cogeneration ...... 115
9.6 Summary ...... 116

10 Evaluation 118
10.1 Evaluation of NLG systems ...... 119
10.2 Evaluation of AAC systems ...... 120
10.3 Evaluating our System ...... 123
10.4 Evaluating SAUT ...... 124
10.4.1 User Experiment ...... 124
10.4.2 Evaluation ...... 125
10.5 Evaluating Efficiency ...... 128
10.6 Summary ...... 129

11 Contributions and future work 131
11.1 Bliss symbols lexicon ...... 132
11.2 HUGG ...... 132
11.3 Integration of a large-scale, reusable lexicon with a natural language generator ...... 133
11.4 SAUT ...... 133
11.5 Communication Board ...... 134
11.6 Future Work ...... 134

Bibliography 137

Abstract

This work presents a new approach to generating messages in an augmentative and alternative communication system, in the context of natural language generation.

Background

The field of Augmentative and Alternative Communication (AAC) is concerned with studying methods of communication that can be added to natural communication (speech and writing), especially when an individual lacks some of the skills to achieve it. An AAC system is defined as an “integrated group of components, including the symbols, aids, strategies, and techniques used by individuals to enhance communication” [ASHA, 1991]. In the absence of an oral ability, symbols of various types are presented on a display (or a communication board). Communication is conducted by the sequential selection of symbols on the display, until the sequence can be interpreted and pronounced by the partner in the interaction. If speech-synthesis technology is present, an artificial voice is used.

Natural language generation (NLG) is a subfield of Natural Language Processing (NLP). The term NLG refers to the process of generating utterances in a spoken language from another representation of data, based on linguistic resources. For all applications, the generated text can be in various languages, leading to applications of multilingual generation (MLG), which aims to generate text in several languages from one source of information, without using translation.

Objectives

This work presents a novel way to generate full sentences from a sequence of symbols, using NLG techniques and the notion of dynamic displays [Porter, 2000]. In this work, we investigate ways to exploit natural language generation (NLG) techniques for designing communication boards or dynamic displays for AAC users. The purpose of this work is to design an NLG symbols-to-text system for AAC purposes. Previous works on NLG-AAC have adopted a technique of first parsing a telegraphic sequence, then re-generating a full sentence in natural language. The main difficulty in this method is that when parsing a telegraphic sequence of words or symbols, many of the hints that are used to capture the structure of the text, and accordingly the meaning of the utterance, are missing; in addition, the lack of many pragmatic clues makes semantic parsing of telegraphic style even harder.

The main question we address in this dissertation is whether generation is possible, not through the process of parsing and regeneration, but through a controlled process of authoring, where each step in the selection of symbols is controlled by the input specification defined for the linguistic realizer. In addition, we address the need to implement a wide-coverage lexicon, which will not restrict the system to a small vocabulary. We investigate how a reusable, wide-coverage lexicon can be integrated with existing syntactic realizers and within the AAC usage scenario. The third aspect we address is multilingual (English/Hebrew) generation. In continuation of our previous work ([Dahan-Netzer and Elhadad, 1998a], [Dahan-Netzer and Elhadad, 1998b], [Dahan-Netzer and Elhadad, 1999]), the aim is to develop a system that can generate text in both Hebrew and English from the same sequence of symbols.

We have chosen Bliss symbols as the input language of the communication board. Bliss is an iconic language which is used world-wide by AAC users. Bliss is composed of a set of approximately 200 atomic meaning-carrying symbols. The rest of the symbols (approximately 2500) are combinations of these atomic symbols. This compositionality is a very important characteristic of Bliss as a language, and we designed a lexicon which captures the strong connection between the meaning and the form of the symbols. We investigate how the explicit, graphic meaning of words can be used in the process of language generation. Finally, a practical objective of our work is to provide Bliss tools for Hebrew speakers. Most software developed in the world for Bliss (either commercial or experimental) cannot be used by Hebrew-speaking users. We have developed a set of tools (lexicon, composition) to work with Hebrew Bliss as part of this research.

Contributions

The construction of this project is based on a set of tools, which have been developed separately, then integrated into the AAC system. The underlying process of message generation is based on layered lexical knowledge bases (LKB) and an ontology. Each LKB adds necessary information to the overall lexical knowledge. The main developments of this work are the Bliss Lexicon and an English verbs lexicon.

We designed and implemented the Bliss symbols lexicon for both Hebrew and English. The lexicon can be used either as a stand-alone lexicon for reference or as part of an application. The design of the lexicon takes advantage of the unique properties of the language. Technically, only a set of atomic shapes is physically drawn, while combined symbols are generated automatically, following the symbol's entry in a database that was constructed from the Hebrew and English Bliss Dictionaries. The lexicon was implemented in a way that allows searches through either textual components (a word), semantic components (e.g., “all symbols that contain a wheel”), or forms (e.g., “all symbols that contain a circle”).

We have integrated a large-scale, reusable verbs lexicon with FUF (Functional Unification Formalism) [Elhadad, 1991] / SURGE (a comprehensive generation grammar of English written in FUF) [Elhadad and Robin, 1996] as a tactical component, so that the knowledge encoded in the lexicon can be reused, and the development of the lexical realization component in a generation application can be partly automated. The integration of the lexicon with FUF/SURGE also brings other benefits to message generation, including the possibility of accepting a semantic input at the level of WordNet synsets, the production of lexical and syntactic paraphrases, the prevention of non-grammatical outputs, reuse across applications, and wide coverage.

An additional component of the system's infrastructure is the syntactic realizer. HUGG (Hebrew Unification Grammar for Generation) is a syntactic realizer (SR) for Hebrew generation, implemented with FUF. HUGG inputs are designed to be as similar as possible to the inputs of the English SR SURGE.

The core of the processing machinery of the AAC message generation system is based on SAUT (Semantic AUthoring Tool) [Biller, 2005] [Biller et al., 2005] – an authoring system for logical forms encoded as conceptual graphs (CG). The system belongs to the family of WYSIWYM (What You See Is What You Mean) text generation systems: logical forms are entered interactively and the corresponding linguistic realization of the expressions is generated in several languages. The system maintains a model of the discourse context corresponding to the authored documents.

The overall purpose of this work is the development of an AAC system, namely a dynamic (virtual) communication board for Bliss users. The communication board we designed is inspired both by the semantic authoring technique as implemented in SAUT and by dynamic displays as studied by [Burkhart, 2005]. The symbols displayed on the screen at each step depend on the context of the previously entered symbols. For example, if the previous symbol denotes a verb which requires an instrumental theme, only symbols that can function as instruments are presented on the current display. The general context of each utterance or conversation can be determined by the user, therefore narrowing the diversity of symbols displayed.

Finally, we review evaluation strategies for both NLG and AAC systems. Both fields struggle with similar issues to define evaluation metrics that can be reproduced and can drive system improvement in a predictable manner. We present two aspects of the evaluation of the AAC system we developed: we first performed a user evaluation of the coverage, efficiency, and usability of the semantic authoring approach, as implemented in the SAUT system. We then established a detailed evaluation scenario of the potential rate of data entry of the system by analyzing a small corpus of Bliss sentences.

Keywords: Natural Language Generation, Augmentative and Alternative Communication, Lexical resources, Blissymbols Language, Semantic Authoring, Dynamic Display.

List of Figures

2.1 PCS board ...... 16
2.2 Rebus board ...... 17
2.3 Comparison of symbols of concrete objects [CallCentre, 1998] ...... 18
2.4 Comparison of symbols of abstract concepts [CallCentre, 1998] ...... 19
2.5 Minspeak® changes in meaning of apple ...... 20

3.1 DynaVox© sentence starters ...... 37

4.1 VerbNet entry for Play ...... 42
4.2 Bliss sequences for to be yeS verbs ...... 44

5.1 General architecture and flow of information ...... 47
5.2 System Architecture ...... 49
5.3 Ontology fragment for the concepts: pan, breakfast, girl, egg, serve ...... 50
5.4 Ontology fragment of relations ...... 51
5.5 The display after the choice of the to play symbol ...... 52

6.1 A fragment of Hspell database for the word celev (dog) ...... 70

7.1 Wordnet entry for the word girl ...... 77
7.2 VerbNet entries for make - build-26.1 and watch ...... 78
7.3 FrameNet entry of the verb abstain ...... 79
7.4 ComLex entry of the verb abstain ...... 79
7.5 Hebrew and Bliss Medical Words ...... 80
7.6 Example for Bliss symbol types ...... 83
7.7 Usages of Pointers for Meaning Selection ...... 84
7.8 Example: mind, minds, brain, thoughtful, think, thought, will think ...... 85
7.9 Semantic modifiers: much, intensifier, opposite ...... 86
7.10 Hebrew vs. English Representation of Symbols ...... 87
7.11 Hierarchy of Bliss Objects ...... 88
7.12 A snapshot of the Bliss Lexicon Web Application ...... 89
7.13 Lexicon entry for the verb appear ...... 92
7.14 VerbNet make - build-26.1 ...... 92

8.1 Architecture of the SAUT System ...... 95
8.2 Linear representation of a Conceptual Graph ...... 96
8.3 Snapshot of editing state in the SAUT system ...... 98

9.1 The preferred semantic structure for the input Apple eat John ...... 110

10.1 Output of LAM [Hill et al., 2001] ...... 123

List of Tables

10.1 Learning time measures of recipe writing in SAUT ...... 125
10.2 Translation vs. Semantic Authoring time ...... 126
10.3 Accuracy percentage of four documents written in SAUT ...... 127
10.4 Error analysis in subjects' generated documents ...... 127
10.5 Sentences vs. SAUT representation, number of words ...... 130

List of Abbreviations

AAC Augmentative and Alternative Communication

CG Conceptual Graphs

FD Functional Description

EVCA English Verb Classes and Alternations

FUF Functional Unification Formalism

HUGG Hebrew Unification Grammar for Generation

LKB Lexical Knowledge Bases

MLG Multilingual Generation

NLG Natural Language Generation

NLP Natural Language Processing

PCS Picture Communication Symbols

SAUT Semantic AUthoring Tool

SR Surface (Syntactic) Realizer

SURGE Surface Realizer for Generation of English

Chapter 1

Introduction

The greatest problem in communication is the illusion that it has been accomplished. - George Bernard Shaw

This work presents a new approach to generating messages in an augmentative and alternative communication system, in the context of natural language generation.

1.1 Background

The human method of acquiring language is a complicated process that may last a lifetime, and the use of language keeps developing throughout life as well. For the great majority of human beings, communication via natural language is an obvious act, but this is not the case for everyone. People who suffer from severe language impairments lack the ability to express themselves and cannot achieve various forms of communication.

The field of Augmentative and Alternative Communication (AAC) is concerned with studying methods of communication that can be added to natural communication (speech and writing), especially when an individual lacks some of the skills to achieve it. An AAC system is defined as an “integrated group of components, including the symbols, aids, strategies, and techniques used by individuals to enhance communication” [ASHA, 1991]. Research in this area includes psychology, medicine, speech therapy, engineering, and education.

AAC devices refer to either manual or automated tools, and include all devices that, in some way, support the process of production or understanding of spoken or written utterances [Langer and Newell, 1997]: message generation devices, text simplification devices, TV subtitle generators, and interpretation and reading aids for vision-impaired people. An aided communication system is the actual device that a person uses to communicate with his environment (a person may use more than one such system at different times). In the absence of a verbal ability, symbols of various types are presented on a display (or a communication board). Communication is conducted by the sequential selection of symbols on the display, which are then interpreted by the partner in the interaction. If speech-synthesis technology is present, an artificial voice is used.

Natural Language Processing (NLP) is the field of computer science that studies how linguistic knowledge can help develop text-based applications, such as machine translation, text summarization, expert systems, and document production. Natural language generation (NLG) is a subfield of NLP, a field lying at the intersection of Computer Science, Linguistics, and Cognitive Science. The term NLG refers to the process of generating utterances in a spoken language from another representation of data, based on linguistic resources.

NLG techniques are finding a growing range of applications. Systems where vast volumes of data require expert interpretation can exploit NLG so that the data is summarized and explained in spoken language. The use of NLG is, in general, (1) to make data understandable (expert systems, reports) and (2) to produce routine documents that must be updated often.

In some applications, NLG fulfils part of the overall requirement and the NLG techniques are combined with other NLP aspects: Machine Translation (MT) ([Dorr et al., 1998], [Temizsoy and Cicekli, 1998]) and automatic summarization ([Barzilay et al., 1999], [Hovy and Lin, 1998]). For all applications, the generated text can be in various languages, leading to applications of multilingual generation (MLG). Multilingual generation aims to generate text in several languages from one source of information, without using translation.

NLP and AAC relate to the use of language from two very different points of view, but have very much in common: both fields seek ways to produce language by non-natural means, as well as to make text easier to understand when the ability to understand it is impaired or absent.

Using NLP techniques for AAC purposes (NLP-AAC in short) as a field of research has developed in the last decade, and a few dedicated workshops were organized (for instance, at the ACL conference, 1997). A special issue on the subject was published by the Journal of Natural Language Engineering in 1998. Several systems that integrate NLG techniques in aided communication systems have been developed in the past ([McCoy et al., 1998], [Vaillant and Checler, 1995], [Karberis and Kouroupetroglou, 2002], [Copestake, 1997]). This work presents a novel way to generate full sentences from a sequence of symbols, using NLG techniques and the notion of dynamic displays [Porter, 2000].

1.2 Motivation

In this work, we investigate ways to exploit natural language generation (NLG) techniques for designing communication boards or dynamic displays for AAC users. The scenario we consider is the following: an AAC user selects a sequence of symbols; his partner then reads out the sequence and utters a natural language sentence. We interpret this scenario as a typical natural language generation process: content planning is performed by the AAC user and content is expressed by the sequence of selected symbols; linguistic realization is performed by the interlocutor. The purpose of this work is to design an NLG symbols-to-text system for AAC purposes. In the design of an AAC system, the main motivation is to provide the user with a communication tool that enables a high rate of communication with as wide an expressive power as possible.

Another way to consider the task we address is to compare it to the task of expanding telegraphic-style input to fully articulated language – with function words (determiners, prepositions) and proper handling of morphology (such as inflections, plural markers, etc.). In this way, NLG techniques save the user avoidable keystrokes and produce more fluent output. Moreover, the process of message generation is incremental, i.e., a partial linguistic representation is displayed to the user after each choice of a symbol. This incremental method, along with the immediate feedback, can be used to aid the user not only to generate grammatical utterances, but also in the process of planning a message that will be well understood by his companion and will correctly represent the communication goal.1

1 I thank the anonymous reviewer for this remark.
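As an illustration only, the telegraphic-expansion idea can be sketched with a toy subject-verb-object expander. The mini-lexicon, the article rule, and the fixed inflection below are invented for this sketch and are not the system described in this dissertation.

```python
# Toy sketch of telegraphic expansion: restore function words and verb
# inflection for a fixed subject-verb-object pattern. The mini-lexicon
# and rules are invented for illustration only.

LEXICON = {
    "John":  {"pos": "noun", "proper": True},
    "apple": {"pos": "noun", "proper": False},
    "eat":   {"pos": "verb", "3sg": "eats"},
}

def determiner(noun):
    """Pick an indefinite article for a common noun (naive vowel rule)."""
    return "an" if noun[0].lower() in "aeiou" else "a"

def expand(subj, verb, obj):
    """Expand a telegraphic S-V-O triple into a full English sentence."""
    subj_np = subj if LEXICON[subj]["proper"] else determiner(subj) + " " + subj
    verb_form = LEXICON[verb]["3sg"]  # subject is 3rd person singular here
    obj_np = obj if LEXICON[obj]["proper"] else determiner(obj) + " " + obj
    sentence = " ".join([subj_np, verb_form, obj_np])
    return sentence[0].upper() + sentence[1:] + "."

print(expand("John", "eat", "apple"))  # John eats an apple.
```

A real system must also recover tense, number, and omitted arguments, which is exactly the recovery problem the parsing-based approaches above struggle with.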

Previous works on NLG-AAC systems ([Vaillant, 1997], [Copestake, 1997], [McCoy et al., 1998], for example) have adopted a technique of first parsing a telegraphic sequence, then re-generating a full sentence in natural language. The initial message is of a telegraphic nature because it lacks the main cues of morphological and syntactic structure that exist in natural language. As a consequence, reconstruction of the intended meaning is made difficult. Deep semantic and lexical knowledge sources are required to recover the meaning. Such resources are not readily available in general and, as a result, systems with only a reduced vocabulary have been demonstrated. The main difficulty in this method is that when parsing a telegraphic sequence of words or symbols, many of the hints that are used to capture the structure of the text, and accordingly the meaning of the utterance, are missing. Moreover, as an AAC device is used not only for typing text, but also for real-time conversations, the interpretation of the utterance relies to a large extent on pragmatics – such as the time of a mentioned event, omitted syntactic roles, and references to the immediate environment. The need to recover such pragmatic clues makes the semantic parsing of telegraphic style even harder.

1.3 Objectives

The main question we address in this dissertation is whether generation is possible, not through the process of parsing and regeneration, but through a controlled process of authoring, where each step in the selection of symbols is controlled by the input specification defined for the linguistic realizer. In addition, we address the need to implement a wide-coverage lexicon, which will not restrict the system to a small vocabulary. We investigate how a reusable, wide-coverage lexicon can be integrated with existing syntactic realizers and within the AAC usage scenario. The third aspect we address is multilingual (English/Hebrew) generation. In continuation of our previous work ([Dahan-Netzer and Elhadad, 1998a], [Dahan-Netzer and Elhadad, 1998b], [Dahan-Netzer and Elhadad, 1999]), the aim is to develop a system that can generate text in both Hebrew and English from the same sequence of symbols.

We have chosen Bliss symbols as the input language of the communication board. Bliss is an iconic language which is used world-wide by AAC users. Bliss is composed of a set of approximately 200 atomic meaning-carrying symbols. The rest of the symbols (approximately 2500) are combinations of these atomic symbols. This compositionality is a very important characteristic of Bliss as a language, and we designed a lexicon which captures the strong connection between the meaning and the form of the symbols. We investigate how the explicit, graphic meaning of words can be used in the process of language generation.

Finally, a practical objective of our work is to provide Bliss tools for Hebrew speakers. When Bliss was adopted for use in Israel, a decision was made to write Bliss symbols from right to left, as in the Hebrew script, and consequently to invert the display of the symbols (or at least most of them). As a result, most software developed in the world for Bliss (either commercial or experimental) could not be used by Hebrew-speaking users. We have developed a set of tools (lexicon, composition) to work with Hebrew Bliss as part of this research.
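To make the compositional design concrete, here is a minimal sketch of how such a lexicon could be represented and searched. The component analyses below are invented stand-ins, not the actual Blissymbolics decompositions or the database built in this work.

```python
# Sketch of a compositional lexicon: only atomic symbols correspond to
# drawable shapes; every other entry lists its atomic components. The
# analyses below are invented placeholders, not real Bliss decompositions.

ATOMIC_SHAPES = {"house", "wheel", "circle", "water", "person"}

# entry gloss -> component symbols (hypothetical decompositions)
ENTRIES = {
    "house":  ["house"],
    "car":    ["wheel", "house"],
    "rain":   ["water", "circle"],
    "driver": ["person", "wheel"],
}

def components(word):
    """Textual search: look up one entry by its gloss."""
    return ENTRIES.get(word)

def containing(symbol):
    """Component search, e.g. 'all symbols that contain a wheel'."""
    return sorted(g for g, parts in ENTRIES.items() if symbol in parts)

print(containing("wheel"))  # ['car', 'driver']
print(components("rain"))   # ['water', 'circle']
```

Because only the atomic shapes need to be drawn, composite symbols can be rendered on demand from their component lists, which is the property the lexicon design described above exploits.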

1.4 Contributions

The construction of this project is based on a set of tools which have been developed separately, then integrated into the AAC system. The underlying process of message generation is based on layered lexical knowledge bases (LKB) and an ontology. Each LKB adds necessary information. The main developments of this work are the Bliss Lexicon and an English verbs lexicon.

We designed and implemented the Bliss symbols lexicon for both Hebrew and English. The lexicon can be used either as a stand-alone lexicon for reference or as part of an application. The design of the lexicon takes advantage of the unique properties of the language. Technically, only a set of atomic shapes is physically drawn, while combined symbols are generated automatically, following the symbol's entry in a database that was constructed from the Hebrew and English Bliss Dictionaries. The lexicon was implemented in a way that allows searches through either textual components (a word), semantic components (e.g., “all symbols that contain a wheel”), or forms (e.g., “all symbols that contain a circle”).

We have integrated a large-scale, reusable verbs lexicon with FUF/SURGE [Elhadad, 1991] [Elhadad and Robin, 1996] as a tactical component, so that the knowledge encoded in the lexicon can be reused, and the development of the lexical realization component in a generation application can be partly automated. The integration of the lexicon with FUF/SURGE also brings other benefits to generation, including the possibility of accepting a semantic input at the level of WordNet synsets, the production of lexical and syntactic paraphrases, the prevention of non-grammatical output, reuse across applications, and wide coverage.

An additional component of the system's infrastructure is the syntactic realizer. HUGG is a syntactic realizer (SR) for Hebrew generation, implemented with FUF. HUGG inputs are designed to be as similar as possible to the inputs of the English SR SURGE.

The core of the processing machinery of the AAC message generation system is based on SAUT [Biller, 2005] [Biller et al., 2005] – an authoring system for logical forms encoded as conceptual graphs (CG). The system belongs to the family of WYSIWYM (What You See Is What You Mean) text generation systems: logical forms are entered interactively and the corresponding linguistic realization of the expressions is generated in several languages. The system maintains a model of the discourse context corresponding to the authored documents.

The overall purpose of this work is the development of an AAC system, namely a dynamic (virtual) communication board for Bliss users. The communication board we designed is inspired both by the semantic authoring technique as implemented in SAUT and by dynamic displays as studied by [Burkhart, 2005]. The symbols displayed on the screen at each step depend on the context of the previously entered symbols. For example, if the previous symbol denotes a verb which requires an instrumental theme, only symbols that can function as instruments are presented on the current display. The general context of each utterance or conversation can be determined by the user, therefore narrowing the diversity of symbols displayed.

Finally, we review evaluation strategies for both NLG and AAC systems. Both fields struggle with similar issues to define evaluation metrics that can be reproduced and can drive system improvement in a predictable manner.
We present two aspects of the evaluation of the AAC system we developed: we first performed a user evaluation of the coverage, efficiency and usability of the semantic authoring approach, as implemented in the SAUT system. We established a detailed evaluation scenario of the potential rate of data entry of the system by analyzing a small corpus of Bliss sentences.
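The context-dependent display behavior described above (showing only the symbols that can fill the next semantic role expected by the selected verb) can be illustrated with a small sketch. The verb frames and symbol categories below are invented for the example; the actual system derives this information from its lexical knowledge bases and ontology.

```python
# Sketch of dynamic-display filtering: after a verb is chosen, display
# only symbols compatible with the verb's next unfilled semantic role.
# Frames and symbol categories are invented for illustration.

# Hypothetical verb frames: the ordered roles each verb expects.
VERB_ROLES = {
    "cut":  ["patient", "instrument"],
    "play": ["instrument"],
}

# Hypothetical symbol inventory: the roles each symbol can fill.
SYMBOLS = {
    "bread":  {"patient"},
    "knife":  {"patient", "instrument"},
    "guitar": {"instrument"},
    "happy":  set(),  # a modifier; fills no argument role in this sketch
}

def next_display(verb, roles_filled):
    """Return the symbols to display for the verb's next unfilled role."""
    remaining = VERB_ROLES[verb][roles_filled:]
    if not remaining:
        return []  # frame saturated; a fuller sketch would offer closers
    wanted = remaining[0]
    return sorted(s for s, roles in SYMBOLS.items() if wanted in roles)

print(next_display("play", 0))  # ['guitar', 'knife']
```

Each selection advances `roles_filled`, so the display shrinks to exactly the choices that keep the partial message realizable by the linguistic realizer.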

Chapter 2

Background

“Communication is the essence of life”

2.1 The need for communication - AAC

For most people, communicating through language is an obvious act, which is done thoughtlessly and naturally. Communication is required for various purposes, such as interaction, expressing needs and desires, expressing knowledge, and inventing new ideas. Although communication between human beings can be achieved with body gestures, facial expressions, or written messages – it is mostly achieved with spoken words. However, this is not the case for everyone. Estimates of the percentage of people with severe language impairments across the world show that approximately one percent of the world population suffers from severe communication impairment1 [Beukelman and Mirenda, 1998, pp. 4-5]. People with severe language impairments cannot use language in a natural way, and must use additional augmentative techniques in order to communicate. In some cases, people use facial gestures or sign language (as do deaf people). In other cases, additional devices are required, either hi-tech or low-tech (non-electronic devices) such as communication boards.

The study of augmentative and alternative communication and the use of communication boards is a relatively new field of research and practice, a field that involves speech pathologists, physical

1Figures vary between 0.8-1.2% in the USA down to 0.12% in Australia. Variations may be due to different definitions of severe language impairments (the American figures probably include deaf people, who are not considered AAC users, and the Australian figures exclude adults with acquired language disabilities, such as people with aphasia).

and occupational therapists, assistive technology engineers, teachers, psychologists, medical experts, and the social services.

2.1.1 What is Augmentative and Alternative Communication?

Augmentative and alternative communication (AAC) is concerned with studying methods of communication that can be added to natural communication (speech and writing), especially when an individual lacks some of the skills to achieve it. An AAC system is defined as an “integrated group of components, including the symbols, aids, strategies, and techniques used by individuals to enhance communication” [ASHA, 1991]. The objectives of an AAC system have been specified in a variety of ways. [Beukelman and Mirenda, 1998] analyze communication with respect to the participants' goals, the interaction content, scope and rate, and the participants' tolerance for communication breakdown. For example, they categorize the participants' goals as follows:

• Express one’s needs and/or wants

• Transfer information

• Achieve social closeness

• Meet social etiquette

When designing an AAC system, these aspects provide a way to evaluate the effectiveness of the system and define its scope (what is the goal of the interaction, how fast it should be produced, who the partner is, etc.). [Porter, 2000] lists the requirements that an AAC intervention (i.e., the use of an AAC system in a specific interaction setting) needs to fulfill to meet the communication needs:

• Intelligibility – the AAC system provides access to sufficient vocabulary to enable communication and to stimulate the further development of the interaction.

• Specificity – the AAC system provides access to vocabulary related to the current context.

• Efficiency – the AAC system provides easy and fast access to the vocabulary, overcoming specific motor or physical difficulty.

• Autonomy – the AAC system provides the possibility to initiate an interaction with minimal aid from a peer.

• Social value – the AAC system enables communication in different environments and with different people.

[McCoy et al., 2001] refine these criteria to design an evaluation grid for AAC systems. When comparing AAC systems, or comparing an AAC system with a non-assisted environment, the following measures can quantify the quality of the system:

Intelligibility -

• better ability to express oneself

• more fluent (natural) conversation

• more natural interactions

Efficiency -

• faster communication

• fewer keystrokes

Social value -

• longer turns

• perception of communicative competence

[McCoy et al., 2001] also list the long-range consequences of the use of an AAC device on the language impaired participant:

• Development of interaction skills

• Development of literacy skills

• Development of turn-taking skills

• Socialization

• Personal opportunities because of improved communication abilities

• Communicative competence

Different AAC techniques target different objectives. As a consequence, the environment of a user must be engineered with a combination of devices, each enabling different forms of communication: social chatting requires a pool of socializing utterances with fast access; writing devices are slower to operate but provide more expressive language. Different devices are used during bathing time or at night time, when in hospital, or when shopping [McCoy et al., 2001].

2.1.2 Who Needs AAC – Disability Types

There are various physical and/or cognitive reasons for the disabilities and impairments of language and speech. [Beukelman and Mirenda, 1998] define a dichotomy between developmental disabilities and acquired physical disabilities.

Developmental Disabilities

Cerebral Palsy (CP) is a developmental neuromotor disorder that results from a nonprogressive abnormality of brain development. CP is most commonly spastic, i.e., increased muscle tone causes certain degrees of dysfunction of the limbs, but it may otherwise be characterized by abrupt, involuntary movements of the extremities, or be rigid or atonic. 60-70% of children with CP have some degree of mental retardation. Approximately half of them suffer from visual impairments; some from hearing loss and seizures. Speech is affected too: dysarthria (the inability to control the speech muscles) is very common, and other speech problems exist that are caused by muscle dysfunction. Some speech disorders are connected to mental retardation, hearing status, and learned helplessness.

Mental retardation is characterized by significantly low intellectual skills, accompanied by possibly limited communication, self-care, and other social skills. It may be defined by the level of support a person needs [Beukelman and Mirenda, 1998, p. 250], assuming that appropriate support can impact the ability of individuals to live in a community. It is very likely to be accompanied by other disabilities.

Developmental apraxia of speech (or childhood dyspraxia): children with articulation errors and difficulty with volitional or imitative production of speech may also suffer from slowness in motor skill development or mental retardation, probably due to neurological impairments.

Autism and Pervasive Developmental Disorders (PDD). The three main characteristics of autism are impairments in social interaction, impairments in communication, and restricted and stereotypical patterns of behavior. The range of communication skills among individuals with autism is wide: from almost no communication skills to good ones (as in Asperger's syndrome).

Acquired physical disabilities

Amyotrophic Lateral Sclerosis (ALS) is a progressive degenerative disease of unknown etiology involving the motor neurons of the brain and the spinal cord. As the disease progresses, patients completely lose their ability to speak.

Other brain diseases, such as Multiple Sclerosis and Guillain-Barré Syndrome, may cause dysarthria. In Parkinson's disease, the speech disorder first affects the voice and intonation, but later speech intelligibility may be reduced or totally lost.

Spinal Cord Injury or Brain-Stem Stroke (cerebrovascular accident) may cause a temporary or a permanent loss of the ability to speak. Writing or the use of keyboards may also be impossible due to the physical condition.

Aphasia is the inability to comprehend or produce language, due to a stroke, an injury, a brain tumor, or other diseases.

Most forms of language impairment are associated with motor limitations. AAC has, therefore, traditionally focused on easing the selection of words or symbols from lists of pre-selected items.

2.1.3 A Brief History of AAC

AAC as a field of practice is continuously affected by social changes, psychological theories, and the development of technology. Several aspects of AAC have changed through time, including the identification of the target population (who merits AAC intervention), assessment (the decision of when to intervene), means (what tools are to be used), and the evaluation of AAC interventions. Starting from the 1950s and 1960s, growing awareness of human rights led to efforts to increase the integration of persons with disabilities into society. In these early years, assessment by speech therapists was limited to individuals with particular skills, such as the ability to imitate sounds

and comprehend and learn a spoken language. This restriction was due to a failure to distinguish between language and speech disabilities. Practitioners started teaching sign language to individuals with disabilities, in addition to deaf people. This approach had the benefit of enabling fast communication. However, sign language is not understood by most people, cognitive impairments affect the quality of the language, and sign language requires accurate signaling, which in many cases was not possible, since people with speech disabilities often also suffer from motor impairments. Through the 1970s, public schools in many countries became legally obliged to accept all children with disabilities. This legal action encouraged more significant efforts at finding solutions for non-speaking children. However, the professional attitude was to wait until it was certain that a person would not acquire spoken language, and to require prerequisite skills, before candidates were considered viable for AAC services [Hourcade et al., 2004]. The focus in the intervention of aided communication was on the pragmatic aspect, i.e., recognizing that the ability to communicate requires not only knowledge of a language but also learning its functions. Aided communication still consisted mostly of sign language, gestures, and picture symbols, but these were usually not combined. Symbol sets such as Rebus [Beukelman and Mirenda, 1998] and Blissymbols [McDonald, 1982] were developed. Electronic devices were introduced, such as message printing devices and scanning devices. Augmentative and alternative communication methods were still restricted to people with good cognitive skills or without severe motor disabilities. Throughout the 80s, assessment of AAC for individuals was based on the Communication Needs Model: the primary goal was to reduce an individual's unmet communication needs.
Speech therapists first identified an individual's current communication needs and then the degree to which those needs were being met. At first, candidacy for AAC was determined by considering one's cognitive abilities, age, and motor-oral abilities. If a decision for intervention was taken, an aided or unaided communication device was chosen, and finally the goals of communication were determined. With the development of computers, new means of communication became possible: voice prostheses, pointing devices, and assistive software. It was also understood that in order to enable good communication, both aided and unaided techniques must be available for each individual, regardless of his cognitive and physical disabilities, and that, in any case, there are no prerequisites for aided communication assessment. Contemporary assessment is mostly characterized by the Participation Model, a model which

assumes that each individual is entitled to and can achieve enhanced communication. Following this model, therapists identify the individual's patterns of communication throughout the day and in different contexts, and then assess the future communication needs. The AAC system for an individual is the overall set of answers and devices for her needs [Hourcade et al., 2004]. Starting in the mid-1980s [Hunnicutt, 1986], side by side with the development of Natural Language Processing (NLP) research, many novel approaches were introduced to the field. These will be discussed later in Section 2.2.2.

2.1.4 AAC Techniques

An aided communication system is the actual device that a person uses to communicate with his environment (a person may use more than one such system at different times). In the absence of an oral ability, symbols of various types are presented on a display (or a communication board). Communication is conducted by the sequential selection of symbols on the display, which are then interpreted by the partner in the interaction. If technology is present, an artificial voice is used.

AAC devices are characterized by three aspects [Hill and Romich, 2002]:

1. Selection method

2. Input language

3. Output medium

In a computerized system, as [McCoy and Hershberger, 1999] mention, a processing method aspect is added to this list. This method refers to the process which creates the output once symbols are entered. Each individual's condition determines which method will be chosen for each of these aspects. Choosing the right communication device is done by a team of professionals: a speech pathologist, an occupational therapist, an engineer, and others. This section elaborates on these three aspects – first, the possible selection methods, i.e., the physical choice of symbols on the communication board; then, the types of input languages that are commonly used for aided communication and the various considerations in their initial choice; and finally, the output devices that are in use. The processing methods are discussed later in Sections

2.2.2 and 8.4, and Chapter 9.

Selection methods are strongly connected to the person's cognitive and physical abilities, and are affected by the device used to communicate. Selection can be either direct or assisted. Direct selection is achieved by pointing with a finger or applying physical pressure on a display, a keyboard, or a touchscreen. It is also possible with eye gaze or with the use of an alternative pointing device, such as a head-mounted light pointer or an eye-gaze tracking device. If direct selection is not possible, scanning techniques can be used with the aid of a peer, by using a set of switches, or by an auditory scan. The display or the keyboard is scanned at an adjustable rate, and a selection is made when the speaker indicates that the desired symbol has been reached. Effective scanning techniques are crucial, since the process of selection is much slower than direct selection. If speech intelligibility is good enough and the technology is available, voice recognition is very useful, although it may be restricted to a small set of words and phrases. Selection methods are tied to the type of display that is used. As displays vary in many ways, so do the parameters of symbol selection: the number of items presented, the size of the display and of the symbols, and their orientation. Size is affected by the particular visual, motor, and cognitive abilities of the user, and by the space available in his environment or, for example, by the motor skills required for direct selection [Woltosz, 1997]. Displays may be static or dynamic. Static displays consist of a board where the presented symbols do not change automatically. They “provide a fixed set of symbols which are mechanically affixed to an underlying layer of plastic or paper material” [Woltosz, 1997]. In dynamic displays, the symbols displayed may change automatically in response to the use of the device.2 Low-tech paper static displays have the advantage of being cheap, easily made, and highly portable.
Electronic static devices offer more novel uses, such as semantic interpretation of symbol sequences, lower cost, and mobility. Dynamic devices have the benefits of a wider vocabulary and a decision-based usage of the display, as opposed to a memory-based usage, which may be cognitively more demanding. Selecting the device is affected by the various factors that characterize AAC

2Although the term dynamic display refers to electronic devices in most cases (for example http://www.augcominc.com/articles/7 2 1.html), it is also used for booklet-style carton displays as in Porter’s system [Porter, 2000].

decisions – the special needs of the communicator. For a motor disability which affects accuracy of selection but not the language skills, a dynamic display which offers a reduced number of symbols at any given time may be more appropriate than a static display with a large number of small symbols. However, if selection itself is at a very slow rate, and navigation between displays creates an additional load, a static display may be more useful [Woltosz, 1997]. Nowadays, there are several off-the-shelf computerized devices for both dynamic and static displays. Some are dedicated devices, such as DynaVox, and some take advantage of laptop computers as a basis for the communication device (Mayer-Johnson's Speaking Dynamically©, Don-Johnston Talk-About©). Considerations in designing the layout of dynamic displays, especially for young children, are described in detail by [Burkhart, 2005] and [Porter, 2000]. The main idea is to allow the communicator access to as wide a vocabulary as possible while keeping every presentation simple and easy to use. This is achieved by easing access to each page using category buttons, as well as other browsing options (next page, main menu, and similar). Other important issues are to leave space for newly acquired words with easy access to them, and the overall positioning of all symbols.

The input language for AAC purposes varies across countries, kinds of disability, and special characteristics of the individual (who may use, for instance, both spelling methods and symbolic displays). A significant dimension for classifying symbols is the scale of transparency. Transparent symbols have an immediately recoverable referent (icons), while opaque symbols require knowledge in order to understand their meaning (e.g., written language). In between these two extremes there are translucent symbols [Beukelman and Mirenda, 1998] – symbols that are not readily guessed without additional information. The trade-off between expressiveness and transparency is significant and affects the decision of which symbol system should be used, usually according to the speaker's abilities. I will not discuss here unaided symbols, which refer to body and facial gestures and signs, or aided tangible symbols (real or miniature objects). Aided representational symbols [Beukelman and Mirenda, 1998] refer to two-dimensional symbols at various levels of abstraction (or transparency). Representational symbols are further divided into:

1. photographs – colored or black and white;

2. line drawing symbols

Within line-drawing symbols, the most common of all are the Picture Communication Symbols (known as PCS).

Figure 2.1: PCS board

The PCS symbol system (from the Mayer-Johnson Co.) is a line-drawing set of symbols, which is accompanied by software (BoardMaker©) (Figure 2.1). PCS has a set of 3900 symbols and continues to develop intensively. PCS is effectively used by pre-school children without cognitive disabilities and by adults with cognitive disabilities. The use of PCS seems to be acquired more quickly than that of Blissymbols ([Beukelman and Mirenda, 1998] p. 59). Rebus symbols are also line-drawing symbols (Figure 2.2). To date, there are 7000 symbols,3 either colored or black-and-white, covering a vocabulary of over 20,000 words. Originally, the idea of the symbols was to represent homophone words with the same symbol (i.e., the symbol for not stands for both not and knot ([Beukelman and Mirenda, 1998] p. 60)); however, this method is not applied anymore.4 Rebus symbols are about as easy to learn as PCS, or slightly harder.

3http://www.widgit.com/symbols/about symbols/widgit rebus.htm
4http://www.widgit.com/symbols/about symbols/literacy/02.htm

Figure 2.2: Rebus board

Other line-drawn symbol systems are Picsyms, DynaSyms, and similar sets, which are all translucent to some extent and are all easier to learn than Blissymbols. We have chosen Blissymbols as the set of symbols in the implementation suggested here. The reasons for this choice are (a) an immediate need for Blissymbol tools for usage in Hebrew, as reported by Israeli practitioners; and (b) the internal structure of Blissymbols can be used efficiently, for instance, in the search process. Bliss symbols are composed of meaning-carrying atoms, and therefore the graphical representation can be linked directly to the semantic representation. However, the architecture of the system does not depend completely on Bliss, and it is possible to use another set of symbols, provided each symbol is linked to the matching concept in the lexicon. An extended review of Bliss symbols is found in Section 7.2.1. Figures 2.3 and 2.4 exemplify various representations of concrete and abstract words in the above-mentioned symbol sets. On the opaque side of the transparency scale there are the orthographic symbols – written language, text, and phonemic symbols. There are several possible outputs for communication displays. Electronic devices (VOCA – Voice Output Communication Aid) have either

1. digital output – i.e., recorded utterances.

2. synthesized output – a text-to-speech system.

Digital output has the benefit of being more personal, but requires recording in advance. Synthesized output is flexible, but may not be as pleasant for the user.

Figure 2.3: Comparison of symbols of concrete objects [CallCentre, 1998]

2.2 Speeding up Communication

A very important aspect of an AAC device is the rate of communication it enables for its user. The average rate of a normal spoken conversation is 150-250 words per minute, but for an AAC user it is less than 15 words per minute under most circumstances, and may even vary between 2 and 8 words per minute. Therefore, a major aim in designing communication tools is to find methods to enhance this rate. Measuring rate enhancement is a complex matter, since it is affected by various factors, which vary among the individuals who use the systems. [Beukelman and Mirenda, 1998] list the following factors:

• Linguistic cost (average number of selections)

• Motor act index (number of keystrokes)

• Time or duration of message production.

• Cognitive processing time that is needed to make the selections.

• Productivity and clarity indices (i.e., measures of which meaning may be encoded and how well it is encoded).

Three main factors of rate measurement include [Hill et al., 2001]:

Figure 2.4: Comparison of symbols of abstract concepts [CallCentre, 1998]

1. language representation method usage.

2. selection rate

3. errors

The first factor is measured by the number of words generated per minute, and enhancement is measured by the ratio of the words produced with and without enhancement techniques [Hill et al., 2001]. One option to enhance communication is to use message encoding. Encoding can be done by letter, letter-category, alpha-numeric, or numeric encoding. For example, with letter encoding, Please open the door for me can be encoded as OD.

Methods used for abbreviation expansion are elaborated in Section 2.2.2. A more sophisticated encoding method is the iconic encoding system, as realized in Minspeak's semantic compaction [Baker, 1984]. In this system, which contains 128 basic symbols, the same symbol can be used for various meanings as context determines, and symbol sequences can be prestored in an electronic device so that when a sequence is chosen the right vocal output is produced. For instance, apple by itself does not have any meaning, but apple + rainbow will refer to the word red, apple + house means a grocery, and time + apple means what time do we eat

Figure 2.5: Minspeak© changes in meaning of apple

(see Fig. 2.5). It is important to note that this system was intended for vocal output, since the complicated encoding system may not be understandable to a conversation partner [CallCentre, 1998].
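The lookup mechanism behind such iconic encoding can be sketched as a small prestored table mapping icon sequences to outputs. This is an illustrative sketch only – the icon names and mappings follow the apple example above, not the actual Minspeak vocabulary:

```python
# Sketch of iconic encoding (semantic compaction): a small prestored
# table maps icon sequences to output words or phrases. The mappings
# below follow the apple example in the text and are illustrative,
# not the actual Minspeak vocabulary.
ICON_TABLE = {
    ("apple", "rainbow"): "red",            # apple + color icon -> the color
    ("apple", "house"): "grocery",          # apple + building icon -> a store
    ("time", "apple"): "what time do we eat",
}

def decode(icons):
    """Return the prestored output for an icon sequence, if any."""
    return ICON_TABLE.get(tuple(icons))

print(decode(["apple", "rainbow"]))   # -> red
print(decode(["apple"]))              # -> None: a lone icon has no meaning
```

The table illustrates why this scheme suits vocal output: the partner hears the expanded phrase, not the icon sequence.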

2.2.1 Natural Language Processing and AAC

Natural Language Processing (NLP) is the field of computer science that studies how linguistic knowledge can help develop text-based applications, such as machine translation, text summarization, expert systems, and document production. NLP research and application development can be divided into three related subfields:

1. Natural Language Understanding (NLU) – understanding the meaning of a given text

2. Natural Language Generation (NLG) – generating text representing a given meaning

3. Language transformation (such as machine translation) – transforming a given text into another textual representation

Underlying all three of these objectives, several lower-level tasks are required by all NLP applications: Part of Speech (POS) tagging consists of assigning to each word in a text its part of speech (a label such as verb, noun, pronoun, preposition). POS tagging is a crucial task for further levels

of text processing, such as shallow parsing (or chunking), i.e., identifying phrases in a given text. Attachment resolution is needed to ensure correct syntactic parsing, i.e., finding the syntactic structure of a sentence. Anaphora resolution – finding the antecedents of referring expressions – and word sense disambiguation – finding the most likely sense of a given word – pave the road to semantic parsing, i.e., understanding the meaning of text. NLP applications can be viewed from another perspective: how linguistic knowledge is encoded and how it is acquired. Some applications rely heavily on statistical information and machine learning techniques, and produce a program with opaque information encoding. Some statistical applications produce a set of rules which is readable and understandable to a human reader. Non-statistical methods (also called symbolic) rely on hand-written encoding of rules.
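As an illustration of the POS tagging task mentioned above, a minimal frequency-based tagger can be sketched as follows. The tiny tagged corpus and the noun fallback are invented for the example; real taggers also model context (e.g., with HMMs or discriminative models):

```python
from collections import Counter, defaultdict

# Toy unigram POS tagger: each word is assigned the tag it received
# most often in a (tiny, illustrative) tagged corpus.
tagged_corpus = [
    ("the", "DET"), ("dog", "NOUN"), ("barks", "VERB"),
    ("the", "DET"), ("bark", "NOUN"), ("is", "VERB"), ("rough", "ADJ"),
]

counts = defaultdict(Counter)
for word, pos in tagged_corpus:
    counts[word][pos] += 1

def tag(word):
    """Most frequent tag for the word; fall back to NOUN for unseen words."""
    if word in counts:
        return counts[word].most_common(1)[0][0]
    return "NOUN"

print([(w, tag(w)) for w in ["the", "dog", "barks"]])
```

Even this crude approach shows how ambiguity is resolved by frequency: bark would be tagged NOUN here, regardless of its verbal reading.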

2.2.2 Language Techniques for Assistive Systems

NLP techniques have been used in AAC applications to enhance the rate of communication and extend the range of expressions that can be generated. The key applications include message generation, abbreviation expansion, word prediction, and text simplification. Enhancements brought by NLP techniques focus first of all on reducing the number of characters typed by the user as much as possible. [Boissiere, 2003] defines the coding principle, and accordingly distinguishes three aspects of writing assistive systems:

• User’s point of view (with reference to the coding principle)

1. Abbreviation expansion – the user memorizes a set of abbreviated words and rewriting rules

2. Word prediction with a list of possible words

3. Word prediction with letter guessing

• Designer’s point of view – how syntactic, statistic, lexical and semantic knowledge sources are used to improve the coding principle

• Combined view

Word Prediction

Word prediction aims at easing word insertion in text-entry software by guessing the next word that will possibly be written, or by giving the user a list of possible word options. A similar process happens naturally in a human conversation between an AAC user and a speaking partner. In such a situation, the speaking partner is most likely to predict the word that is about to be said by using her knowledge about language and the context of the conversation [Garay-Vitoria and Abascal, 2004]. The main purpose of word prediction is to speed up typing, but it can also help dyslexic people reduce writing errors. This field of research has seen a surge of interest with the development of mobile phones (with their limited keyboards) and of handheld devices. An example of word prediction in a given state of text insertion is as follows. Given the input:

I play b

the system may offer the following words: be, born, ball, baseball, brand. Now, if the next letter inserted is a, the system narrows the offered words to: ball, baseball, basketball, baglama, balalaika. The strategies taken in word prediction software are either to complete the currently typed word, for example by recalculating probabilities with each new character that is inserted, or to offer, in a pop-up menu, a choice of words which the user probably meant to write, given the previous letters or words already typed. The process of prediction itself relies on the following knowledge sources:

1. Statistical information – starting from unigrams, i.e., taking into account the probabilities of isolated words, or using more complex language models such as Markov models. The most common method in prediction applications is the unigram (see references in [Garay-Vitoria and Abascal, 2004]).

2. Syntactic knowledge – considering part of speech tags and phrase structures. Syntactic knowledge can be statistical in nature or can be based on hand-coded rules [Garay-Vitoria and Abascal, 1997].

3. Semantic knowledge can be used by assigning categories to words and finding a set of rules

which constrain the possible candidates for the next word. This method is not widely used in word prediction, mostly because it requires complex hand coding and may be time-consuming and inefficient for real-time requirements [Garay-Vitoria and Abascal, 1997].
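A minimal sketch of the first knowledge source – completing a typed prefix by unigram frequency – echoing the "I play b" example above. The word list and counts are invented for illustration:

```python
# Unigram prefix completion: rank candidate completions of the typed
# prefix by corpus frequency. The frequency table is invented for
# illustration; a real system derives it from a corpus plus the
# user's own vocabulary.
FREQ = {"ball": 120, "baseball": 80, "basketball": 75,
        "be": 500, "born": 60, "brand": 40, "balalaika": 2}

def complete(prefix, k=3):
    """Top-k most frequent lexicon words starting with the prefix."""
    candidates = [w for w in FREQ if w.startswith(prefix)]
    return sorted(candidates, key=lambda w: -FREQ[w])[:k]

print(complete("b"))    # -> ['be', 'ball', 'baseball']
print(complete("ba"))   # -> ['ball', 'baseball', 'basketball']
```

Each extra typed character re-runs the ranking over a smaller candidate set, which is exactly the per-keystroke recalculation described above.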

All methods require lexical data. Such data can be acquired from corpora, along with word frequencies and lexical databases (which may be incorporated into the system). A word-prediction lexicon usually includes word frequencies. It may also include part of speech data and semantic data. Lexicons must be adaptable, e.g., updated with the user's vocabulary, and should be organized in an efficient way (linear vs. tree structure, with the trade-off of insertion cost) [Garay-Vitoria and Abascal, 2004]. In languages with a rich inflectional morphology, a statistical method based on frequencies alone is not efficient, and a wide variety of syntactic knowledge is required [Boissiere, 2003] [Garay-Vitoria and Abascal, 1997]. A mixed approach which combines language models with part of speech information is more appealing and has been implemented in various systems (see references in [Boissiere, 2003]). For instance, [Garay-Vitoria and Abascal, 1997] present a system where, at the beginning of the sentence, words are predicted using a statistical model, but afterwards parsing of the partial sentence is used to predict words. Another possible method uses two steps in prediction – first predicting a root and then its possible inflections [Garay-Vitoria and Abascal, 2004]. Syntactic approaches require a set of linguistic tools, such as POS taggers and lemmatizers, which are not available in all languages. Statistical methods are based on learning parameters from large corpora. This is problematic when the language that is written with the aid of the word prediction system is of a different style than the training data (which, in most cases, is obtained from newspapers). Since the personal language that is used may be very different from the one on which the model was based, systems must have a good strategy for handling unseen words or sequences of words (backoff models). Some word predictors build their language model on-line and update it as the user enters more text.
This strategy is an effective way to balance the mismatch with “off the shelf” language models, but it suffers from the limited amount of data available to construct the individual language model.
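The backoff strategy mentioned here can be sketched in a few lines: use bigram counts when the context word provides evidence, and otherwise fall back to unigram frequencies. The toy training text is invented for illustration; real models smooth and weight the backoff step:

```python
from collections import Counter

# Simple backoff: score the next word with bigram counts when the
# context word was seen, otherwise back off to unigram counts.
# The tiny training text stands in for a real corpus.
text = "i play ball . i play baseball . we play ball .".split()
unigrams = Counter(text)
bigrams = Counter(zip(text, text[1:]))

def predict_next(context_word, vocab):
    """Most likely next word after context_word, with unigram backoff."""
    scored = {w: bigrams[(context_word, w)] for w in vocab}
    if any(scored.values()):                      # bigram evidence exists
        return max(scored, key=scored.get)
    return max(vocab, key=lambda w: unigrams[w])  # back off to unigrams

vocab = ["ball", "baseball", "play"]
print(predict_next("play", vocab))   # -> ball  ('ball' follows 'play' twice)
print(predict_next("zzz", vocab))    # -> play  (unseen context: unigram backoff)
```

An on-line predictor of the kind described above would simply keep updating `unigrams` and `bigrams` as the user enters text.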

There are several heuristics which are claimed to reduce the number of keystrokes significantly:

1. Recency promotion – either increasing the statistical parameters of recently seen words, or managing a file of the most recently used words.

2. The trigger-and-target method, where certain words can be used as a trigger for the possible presence of another word within some distance.

3. Capitalization of proper nouns and at the beginning of sentences.

4. Inflecting words where needed (based on syntactic knowledge).

5. Writing compounds (in languages with rich compounding like German or Dutch).
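The first heuristic, recency promotion, can be sketched as a score boost for words in a most-recently-used list. The base frequencies and the boost value are invented for illustration:

```python
from collections import deque

# Recency promotion sketch: boost the score of recently used words so
# they climb the prediction list. Frequencies and the boost factor are
# illustrative, not tuned values.
FREQ = {"ball": 120, "baseball": 80, "basketball": 75}
recent = deque(maxlen=20)   # most recently used words

def score(word, boost=200):
    return FREQ.get(word, 0) + (boost if word in recent else 0)

def rank(prefix):
    words = [w for w in FREQ if w.startswith(prefix)]
    return sorted(words, key=score, reverse=True)

print(rank("ba"))            # -> ['ball', 'baseball', 'basketball']
recent.append("basketball")  # the user just typed 'basketball'
print(rank("ba"))            # -> ['basketball', 'ball', 'baseball']
```

The alternative mentioned in the text – directly increasing the stored statistical parameters – differs only in making the promotion persistent rather than windowed.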

The drawbacks of the word prediction method lie mostly in the need to take an overt action to verify the system's suggestion. Typing is, therefore, not a fluent task and may impose a cognitive load [Shieber and Baker, 2003]. Evaluation of word prediction systems considers the keystroke savings, time savings, and cognitive overload (length of the choice list vs. accuracy). A predictor is considered adequate if its hit ratio remains high while the required number of selections decreases [Garay-Vitoria and Abascal, 2004]. Word prediction can save approximately 50% of the keystrokes (a detailed analysis can be found, for instance, in [Garay-Vitoria and Abascal, 2004]).
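The keystroke-savings figure cited above is conventionally computed as the proportion of keystrokes avoided relative to typing every character in full; the numbers below are illustrative:

```python
def keystroke_savings(chars_full, keystrokes_used):
    """KS = (chars_full - keystrokes_used) / chars_full, as a percentage."""
    return 100.0 * (chars_full - keystrokes_used) / chars_full

# e.g., a 40-character message entered with 20 key presses
print(keystroke_savings(40, 20))   # -> 50.0, matching the ~50% cited
```

Note that keystroke savings alone ignores the scanning and verification time spent on the choice list, which is why time savings and cognitive load are evaluated separately.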

Abbreviation Expansion

Another option to enhance communication is to use message encoding. Encoding can be done by letter, letter-category, alpha-numeric, or numeric encoding. For example, the letter encoding for Please open the door for me can be OD. This is a very natural way to increase the communication rate. The naive and primary method for text expansion is a pre-defined look-up table, which is typically defined by the user. This technique requires memorizing the codes and maintaining a look-up table, and may cause a cognitive overload [Moulton et al., 1999]. An ideal system would allow the user to generate abbreviations with no cognitive load or extra cost in keystrokes, handle spelling or typing errors, and allow the use of new words in the lexicon. [Shieber and Baker, 2003] suggest a system that applies both prediction and compression to text insertion, using a human-centered compression method. This is accomplished by allowing the user

to drop all vowels in a word (except for initial letters), as well as dropping one letter in consecutive duplicate consonants. A language model was learned on the Wall Street Journal corpus in four stages: construction of an n-gram model of the corpus (basically, this means that the model can predict the likelihood of any sequence of n words), translating the language model to the compressed version of words, taking care of unknown words, and handling numbers. For example, consider the sequence: <An> <example> <of> <NUM> <words>. The sequence gets a probability by applying the language model. It is then converted to a sequence of characters and the unknown number is inserted – resulting in an example of 5 words. The words are then compressed – an exmpl of 5 wrds. In this example, the last word can also be the abbreviation of wards. However, probabilities are assigned to each possible sequence and the algorithm finds the most likely source. The keystroke reduction of this system, measured in characters, was 26.5%, with a low error rate of 3%.
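The vowel-dropping compression rule described above can be sketched as follows (a simplified illustration only; the function name is assumed, and the statistical decompression via the language model is omitted):

```python
def compress(word):
    # Keep the initial letter; drop non-initial vowels; drop one letter
    # of each pair of consecutive duplicate consonants.
    if not word:
        return word
    out, prev = word[0], word[0]
    for ch in word[1:]:
        if ch.lower() in "aeiou":
            prev = ch      # vowels are dropped but still break adjacency
            continue
        if ch == prev:     # second of a duplicate consonant pair
            continue
        out += ch
        prev = ch
    return out

print(compress("example"), compress("words"))  # exmpl wrds
```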

Symbols and Prediction

In addition to word prediction, several systems use prediction for sequences of symbols. [Waller and Jack, 2002] used word prediction methods for translating Bliss symbols into English. Based on the idea of language-independent word prediction [Claypool et al., 1998], a system for language-independent translation from Bliss to English was developed, combining the translation module into a Bliss word-processor [Andreasen et al., 1998]. For this purpose, two dictionary files were created: a word association dictionary, containing trigram information from a given corpus (a source text file), and a file with information for each Bliss symbol: the symbol's translation in English, synonyms, and possible inflections of the word. The word association dictionary contains balanced binary trees of three levels: each word is a node in a balanced binary tree and functions as the root of another binary tree, which contains the words found to follow it in the text; in turn, each of these nodes is the root of a third-level binary tree for the third word of a sequence. Each node also contains the frequency of the word. Once these files are created, translation proceeds as follows: given a Bliss sequence to translate, the program consults the Bliss dictionary and retrieves all synonyms/inflections for the given word. For every possible sequence, using a Markov language model, the association dictionary is searched and the probability of the sequence is calculated. However, because of the telegraphic nature of

the Bliss utterance, for a given input sequence A B, the trees are also searched for sequences with a word intervening between A and B. For example, if the given input is: boy + to go – assuming lad is a synonym of boy and going is a possible inflection of to go – possible strings that are computed are: boy + is + going or lad is going. Evaluation on input text files of different sizes shows that, for a 1,000,000-word file, shorter sentences are translated better, while longer sentences yield more mistakes (sentences of up to 5 words were translated well). Better results may be achieved with different source files. An additional symbol prediction system, CABA2L (Composition Assistant for Bliss Augmentative Alternative Language) [Gatti and Matteucci, 2005], predicts Bliss symbols and claims to reduce by 60% the time required to produce a message. The system's approach is statistic/semantic, using a discrete implementation of an Auto-Regressive Hidden Markov Model (AR-HMM). The hidden Markov states are the semantic categories of the previous Bliss symbol. All Bliss symbols were assigned one of six grammatical categories, and each category is further subcategorized by semantics. Subcategories may share a logical connection, but substantive categories, for instance, are specified only if they have a parallel verbal category (for instance, food and feeding; there is no substantive animal category since there is no corresponding verbal category). CABA2L is integrated into BLISS2003, communication software centered on the Bliss language. CABA2L receives the last symbol entered from BLISS2003 and calculates the four symbols most likely to be chosen next. These symbols are presented in a separate pane, and scanning of symbols starts from this pane. Tests with users of BLISS2003 showed a time reduction of 60%, a very short adjustment time, and no significant delays due to system calculations.
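The scoring of candidate expansions against the word-association dictionary can be sketched roughly as follows (a toy illustration: the lexicon and counts are invented, a bigram model stands in for the trigram trees, and only a single hard-coded gap word is tried):

```python
from itertools import product

# Hypothetical candidates for each Bliss symbol (translation, synonyms,
# inflections), standing in for the Bliss dictionary lookup.
candidates = [["boy", "lad"], ["go", "going", "goes"]]

# Toy bigram counts standing in for the word-association dictionary.
bigram = {("lad", "is"): 3, ("is", "going"): 5, ("boy", "going"): 1}

def score(seq):
    # Product of smoothed bigram counts over adjacent word pairs.
    s = 1.0
    for a, b in zip(seq, seq[1:]):
        s *= bigram.get((a, b), 0) + 0.1
    return s

def best_translation(cands):
    # Try every expansion, optionally inserting "is" between words
    # (a stand-in for the gapped A _ B search described above).
    best, best_s = None, -1.0
    for seq in product(*cands):
        for expanded in (list(seq), [seq[0], "is"] + list(seq[1:])):
            s = score(expanded)
            if s > best_s:
                best, best_s = expanded, s
    return best

print(best_translation(candidates))  # ['lad', 'is', 'going']
```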

Text Simplification

Aphasia refers to a loss of communication skills in adults as a result of a stroke, brain tumor, degenerative disease, or head injury. Disabilities may be in comprehension of language (Wernicke's aphasia) or in producing language (Broca's aphasia), as well as in reading and writing. A total loss of reading skills is called alexia; partial reading disorders are called acquired dyslexia. Most aphasic people display difficulties in sentence comprehension. The most acute problems for aphasic patients are:

1. Comprehension of sentences with multiple verbs and their functional argument structures.

2. The tendency to read sentences in an SVO (Subject-Verb-Object) order makes passive clauses problematic – especially when the meaning allows a reverse reading of the clause (e.g., a cake was eaten by the boy will be understood correctly, but not the man was slapped by the woman) [Canning et al., 2000].

3. Anaphora resolution.

Text simplification is a language-transformation task in NLP research. The purpose is to rephrase a given text to make it comprehensible to aphasic readers while preserving the original meaning of the text. Complex syntactic structures and non-frequent words are identified, and the text is generated anew with simpler syntactic structures and more frequent words. There are several text simplification systems, such as PSET (Practical Simplification of English Text) [Carroll et al., 1998], SYSTAR [Canning et al., 2000], and ENDOCRINE [Liben-Nowell, 2000]. The typical architecture of a text simplification system is as follows:

Analyzer Syntactic analyzer and a partial disambiguator.

Simplifier Generator of text in simpler structures.

The Analyzer is composed of three main modules, structured in a pipeline:

1. a lexical tagger;

2. a morphological analyzer - inflectional analysis of words in the text given the part of speech;

3. a parser - builds a syntactic tree and marks words with their grammatical relations.

The resources used by the system include a lexicon of places, organizations, institutes, and the like for named-entity recognition. Quoted text as well as headlines are not simplified. As mentioned above, there are two simplification tasks: lexical simplification – i.e., the use of more frequent or less sophisticated words – and syntactic simplification – i.e., transforming complex syntactic structures into simpler ones. The aim of the system is not to summarize but to simplify the source text, thus keeping it as cohesive as possible. Cohesion is kept in two manners: (1) ensuring that resolved anaphors are not replaced if the original noun phrase (NP) appears

previously within the sentence, and (2) replacing original sentence-opening anaphors with NPs to maintain the text style. In the SYSTAR system, while regenerating the simplified text, cohesion is kept by filling elided NPs after splitting compound sentences, and by preserving the tense, mood, and aspect of passive sentences [Canning et al., 2000].

The architecture of the simplifier also consists of three modules in a pipeline:

1. Anaphora resolution (while considering context)

2. Splitting compound sentences and transforming passive to active sentences (single sentence processing)

3. Replacement of some resolved anaphors.

Simplification is done using a set of rules: a given sentence is unified with the left-hand pattern of a rule and, when a match occurs, it is transformed following the rule's right-hand pattern. Resolving and replacing anaphors is done when a pronoun of a given set (he, she, him, her, they, his, hers, their) occurs in a sentence. The resolution is based on the CogNIAC system [Baldwin, 1995], which returns a set of possible antecedents for a given pronoun. A sequence of rules is then applied: (1) coreference information (gender, number, type); (2) subject/object pronouns pick a subject/object antecedent, respectively; (3) pronouns with an unknown grammatical function pick the most recent antecedent; (4) recency (a window of up to two sentences). A pronoun is replaced only if its antecedent was detected in a previous sentence. In [Liben-Nowell, 2000], the syntactic simplifier is based on a set of rules written by linguists and applied to parse trees. The lexical simplifier [Carroll et al., 1998] finds the set of synonyms for a given word in WordNet, calculates their frequencies and, considering the simplification level required by the user, consults the Oxford Psycholinguistic Database [Quinlan, 1992] for the synonyms' Kucera-Francis frequencies. Finally, the most frequent word is chosen. Since true disambiguation requires deep semantic analysis of the text, in cases of ambiguity the assumption is that frequent words won't have to be replaced and that less frequent words tend to be less ambiguous.
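The lexical simplification step can be sketched as follows (a minimal illustration; the frequency table stands in for the Kucera-Francis counts and all names are assumed):

```python
def simplify_word(word, synonyms, freq):
    # Replace a word by its most frequent synonym; max() keeps the
    # original word on ties, so frequent words are left untouched.
    candidates = [word] + synonyms.get(word, [])
    return max(candidates, key=lambda w: freq.get(w, 0))

# Toy frequency table standing in for Kucera-Francis counts (assumed).
freq = {"buy": 500, "purchase": 90, "acquire": 60}
synonyms = {"purchase": ["buy", "acquire"]}

print(simplify_word("purchase", synonyms, freq))  # buy
print(simplify_word("buy", synonyms, freq))       # buy (already frequent)
```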

Evaluation of the simplifier was calculated per module: 60% recall and 84% precision for anaphora resolution, 100% recall and 88% precision for complex sentences, and 70% of passive clauses correctly converted to active. The user evaluations show that the simplified text shortens reading time.

Message Generation

Although most systems described above can be considered message generation devices, I include in this category systems that are not used to enhance or ease the text typing process, but serve as a computerized communication board with symbols, letters, or words, capable of generating a full sentence from a partial input sequence. This section describes only sentence retrieval systems; Section 8.4 and Chapter 9 discuss in depth systems that use natural language generation (NLG) techniques for message generation. A meaningful use for a sentence retrieval system is storytelling. It is very important to give the AAC user not only the ability to express his needs and wants, or to react to what was said to him beforehand, but also to initiate a conversation or a social chat, as well as to tell his own stories. The slow rate of communication and the limited range of symbols on a display can narrow this ability. The use of storytelling systems encourages users to take a more active part in discussions and improves literacy skills, especially when the messages are edited online by the user [Waller et al., 2000b]. [Waller et al., 2000a] have addressed this issue in their research, developing Talk:About, a commercial system (produced by the Mayer-Johnson company), which enables a user to store personal stories in advance and allows these stories to be quickly edited and retrieved during a conversation. Selected sentences are vocalized with a speech synthesizer. Stories are categorized by topics or people, and a list of possible stories is offered based on frequency and the history of storytelling. The system includes both Quick:Chat (a Don Johnston, Inc. product) for quick access to commonly used phrases, and Co:Write, a word prediction software.
[Pennington et al., 1998] discusses SchemaTalk [Vanderheyden et al., 1996], software designed to access large amounts of prestored text in an efficient and intuitive manner to speed up communication in predictable conversations, based on psychological research. The program offers a set of stereotypical conversation schemas with slots to be filled, and for each slot a set of pre-defined fillers. For example, a schema for "buying food at a store" would include the following template:

I want to buy [slot]. The slot would then be associated with a list of candidate filler words. [Vanderheyden et al., 1996] shows an evaluation of SchemaTalk by two subjects – an AAC user and a speaking person – in an artificial job interview. The evaluation shows an increase, which grew over the course of the study, in both the number of words per turn (from an average of 10 to 22.6 for the AAC user) and the overall speech rate (from 4.5 to 5.3 words per minute).
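A SchemaTalk-style schema with slots and fillers can be sketched as follows (the template, slot name, and fillers are invented for illustration):

```python
# Hypothetical schema: a template with a slot and its pre-defined fillers.
schema = {
    "template": "I want to buy {item}.",
    "fillers": {"item": ["milk", "bread", "cheese"]},
}

def realize(schema, choices):
    # Fill every slot of the template with the user's selections.
    return schema["template"].format(**choices)

print(realize(schema, {"item": "cheese"}))  # I want to buy cheese.
```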

2.3 Summary

NLP and AAC relate to the use of language from two very different points of view, but have much in common. Both fields search for ways to produce language in a non-natural way, as well as to make language easier to understand when the ability to understand is damaged or absent. AAC systems can be evaluated along three main dimensions: intelligibility (how the system helps users be understood and understand conversations), efficiency (how the system speeds up communication to overcome physical and cognitive impairments), and social value (how the system enables users to participate in social interactions). AAC systems vary according to the target population, the communication goals, the technology used, the input language and its layout, the selection methods, the output, and the processing methods in computerized systems. NLP systems also vary in their goals, knowledge representation, and processing methods. Processing refers either to different levels of understanding and analyzing text, or to the production of texts for various communication goals. This work investigates the potential of using NLP techniques to improve AAC systems. NLP techniques are used (i) to enhance communication via prediction or expansion, (ii) to generate full messages, and (iii) to simplify text. We focus on the field of Natural Language Generation and, specifically, on the construction of a dynamic display to generate messages using the Bliss symbolic language, for both Hebrew and English speakers. The key objectives of the work are to improve intelligibility by providing an explicit representation of the semantic content of the interaction, and to improve efficiency by exploiting semantic

representation to generate well-formed messages, taking into account linguistic knowledge. The next chapter presents in detail the objectives of this research and the approach of using semantic authoring for message generation in the AAC context.

Chapter 3

Objectives

In this work we investigate ways to exploit natural language generation (NLG) techniques for designing communication boards or dynamic displays for AAC users. The scenario we consider is the following: an AAC user selects a sequence of symbols; his partner then reads out the sequence and utters a natural language sentence. We interpret this scenario as a typical natural language process: content planning is performed by the AAC user and content is expressed by the sequence of selected symbols; linguistic realization is performed by the interlocutor. We use NLG techniques to produce utterances automatically from the sequence of symbols, while the content determination is done by the AAC user. Previous works on NLG-AAC systems ([Vaillant, 1997], [Copestake, 1997], [McCoy et al., 1998], for example) have adopted a technique of first parsing a telegraphic sequence, then re-generating a full sentence in natural language. The initial message is of a telegraphic nature because it lacks the main cues of morphological and syntactic structure that exist in natural language. As a consequence, reconstruction of the intended meaning is made difficult. Deep semantic and lexical knowledge sources are required to recover the meaning. In general, such resources are not readily available and, as a result, systems with only a reduced vocabulary have been demonstrated. The main question we address in this thesis is whether generation is possible, not through the process of parsing and regeneration, but through a controlled process of authoring, where each step in the selection of symbols is controlled by the input specification defined for the linguistic realizer. In addition, we address the need to implement a wide coverage lexicon, which will not restrict

the system to a small vocabulary. We investigate how a reusable, wide coverage lexicon can be integrated with existing syntactic realizers and within the AAC usage scenario. The third aspect we address is multilingual (English/Hebrew) generation. In continuation of our previous work ([Dahan-Netzer and Elhadad, 1998a], [Dahan-Netzer and Elhadad, 1998b], [Dahan-Netzer and Elhadad, 1999]), the aim is to develop a system that can generate text in both Hebrew and English from the same sequence of symbols. We have chosen Bliss symbols as the input language of the communication board. Bliss is an iconic language used world-wide by AAC users. Bliss is composed of a set of approximately 200 atomic meaning-carrying symbols. The rest of the symbols (approximately 2,500) are combinations of these atomic symbols. This compositionality is a very important characteristic of Bliss as a language, and we designed a lexicon that captures the strong connection between the meaning and the form of the symbols. We investigate how the explicit, graphic meaning of words can be used in the process of language generation. Finally, a practical objective of our work was to provide Bliss tools for Hebrew speakers. When Bliss was adopted for use in Israel, a decision was taken to write Bliss symbols from right to left, like the Hebrew writing system, and consequently to invert the display of the symbols (or at least most of them). As a result, most software developed for Bliss (either commercial or experimental) could not be used by Hebrew-speaking users.1 As part of this research, we have developed a set of tools (lexicon, composition) to work with Hebrew Bliss.

3.1 Generation from Telegraphic Input

Existing NLG systems for AAC purposes share a common architecture: a telegraphic input sequence is first parsed, and then a grammatical sentence that represents the message correctly is generated. The main difficulty in this method is that, when parsing a telegraphic sequence of words or symbols, many of the hints used to capture the structure of the text and, accordingly, the meaning of the utterance are missing. Moreover, as an AAC device is used not only for typing text but also for real-time conversations, the interpretation of the utterance relies to a large extent on pragmatics – such as the time of a mentioned event, omitted syntactic roles, and references to the immediate environment.

1See the report by Judy Seligman-Wine: http://www.blissymbolics.org/canada/pg10 30th.htm

In [McCoy et al., 1994], a set of examples from real conversations between therapists and AAC users is given: each example is a pair consisting of the user's telegraphic utterance and the therapist's full-sentence interpretation (as confirmed by the user). The paper analyzes the syntactic and semantic inferences made by the therapist. These data were used when designing the Compansion system [McCoy et al., 1998]. For example:

S:
T: Girl will make the eggs in the pan for breakfast

In this example, the therapist (marked with T) added the future tense, plural number to the egg, and the preposition for. In other examples, the original word order was changed by the therapist (1), a missing agent (2) or verb (3) was inferred, and conjunctions (3) and even content words (4) were added.

(1) S:

T: Boy is dusting the table and the grandmom is sweeping the floor.
(2) S:
T: They are washing clothes.
(3) S:
T: They have toys.
(4) S:
T: The girl makes up the bed and the boy helps the girl make up the bed.
(5) S: Girl clothes up.
T: She's hanging the clothes up.

The main questions at stake are: how good can a semantic parser be at reconstructing the full structure of the sentence, and are the pragmatic gaps in the given telegraphic utterances recoverable? In order to answer these questions, we must investigate the knowledge and inference tools that

can serve the purpose:

1. Rich lexical information

2. Data representation and unification tools

3. View of context

Since telegraphic text contains mostly content words and lacks the function words and morphological inflections that are used to identify a word's part of speech, the most reasonable method to parse such an utterance is by using dependency relations, and therefore data structures that support such dependencies. Such methods were used in several works dealing with the translation of telegraphic text, mostly military messages. In [Grishman and Sterling, 1989], the parsing grammar was enriched with rules that allow omitted prepositions. However, this method considerably increases structural ambiguity (since every NP can also be interpreted as a PP) and, therefore, requires both rich semantic coding in the lexicon and good scoring functions [Lee et al., 1997]. Rich lexical knowledge is needed to identify the possible dependencies in a given utterance, i.e., to find the predicate and to apply constraints, such as selectional restrictions, to recognize its arguments. In the sequence - the animate girl is most likely the agent of the make process, and the egg is its theme. But in structurally similar sentences, recovering the semantics of the process and the possible relations between its arguments can be more complicated2:

(1) There is a teacher.
(2) The teacher is Dina.
(3) The teacher is grumpy.

2The star notation means that the sentence following it is ungrammatical or semantically ill-formed.

(4) The teacher is in the room. There is a teacher in the room.

The table is in the room. *The room is in the table.
The book is on the table. *The book is in the table. *The table is in the book.

The verb to be can be used for several semantic purposes: existential (1), equative (2), attributive (3), and locative (4). Furthermore, locative relations are recoverable from the nature of the located entity and the location, and very particular attributes of the location must be known in advance (such as surface, container, size) (4). For this purpose, a rich ontology, which supplies relevant features (not necessarily lexical), is required in addition to the lexicon. Inferring missing verbs depends heavily on the context of the utterance. While shopping for food, the message could be interpreted as "Let's buy cheese"; but during lunchtime at school it may be "I have cheese in my sandwich." On the more immediate question of context, a system must also supply defaults and make inferences on the deictic properties of the references in the message. The verb tense and the definiteness of nouns depend on the history of the conversation as well as on the immediate happenings in the conversation context. It is crucial to understand these obstacles in order to conclude that fully automatic generation based on parsing only is not possible (as such a process cannot be fed with all the context variables that can be perceived by the senses). The questions, therefore, are: how can text be generated in an optimal way despite these obstacles, while enhancing the communication rate, keeping the process easy, and allowing wide expressive possibilities?
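The licensing of locative relations by location attributes such as surface and container, discussed above, can be sketched as follows (a toy ontology; all feature names are assumed):

```python
# Toy ontology features (assumed) licensing locative prepositions:
# "in" requires a container, "on" a supporting surface.
features = {"room": {"container"}, "table": {"surface"}, "book": set()}

def allowed_preps(location):
    f = features.get(location, set())
    preps = []
    if "container" in f:
        preps.append("in")
    if "surface" in f:
        preps.append("on")
    return preps

print(allowed_preps("room"))   # ['in']  -> "The table is in the room."
print(allowed_preps("table"))  # ['on']  -> "The book is on the table."
print(allowed_preps("book"))   # []      -> *"The table is in the book."
```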

Figure 3.1: Dynavox© sentence starters

3.2 Generation as Semantic Authoring

Our approach to the generation process in an AAC context is based on the scenario of semantic authoring. In this method, each step of input insertion is controlled by a set of constraints and rules drawn from an ontology. At each step, the system offers only possible complements to a small set of concepts. Generation is an incremental process, and the full input to the syntactic realizer is revised with each step taken. If the final input is not complete, missing constituents are given default values – either syntactic (such as pronouns) or from a set of pre-defined participants. The system also preserves a view of context through an underlying management of references – both to entities that were mentioned in the conversation and to propositions in general. Following the paradigm of dynamic displays as introduced by Gayle Porter in the Dynavox© communication board (see Figure 3.1), we also allow sentence starters such as I'm going to or I'd like to. We view such sentence templates as pre-defined partial semantic structures. This approach avoids the difficulty of semantic parsing by constructing a semantic structure explicitly while constructing the input sequence incrementally. It combines three aspects into an integrated approach to the design of an AAC system:

• Semantic authoring drives a natural language realization system and provides rich semantic

input.

• The communication board is updated on the fly as the authoring system requires the user to select options.

• Ready-made inputs, corresponding to predefined pragmatic contexts, are made available to the user as semantic templates.
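The constraint-driven selection underlying this approach can be sketched as follows (a toy ontology and verb frame; all names are invented for illustration):

```python
# Minimal sketch of the authoring loop: after each selection, only
# type-compatible symbols are offered on the board.
ontology = {"girl": "animate", "boy": "animate", "egg": "food", "pan": "container"}
frames = {"make": {"agent": "animate", "theme": "food"}}

def options_for(verb, role):
    # Offer only the concepts whose ontological type satisfies the
    # selectional restriction of the verb's role.
    required = frames[verb][role]
    return sorted(c for c, t in ontology.items() if t == required)

print(options_for("make", "agent"))  # ['boy', 'girl']
print(options_for("make", "theme"))  # ['egg']
```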

In the following chapters, we investigate each of the problems raised by message generation with Bliss symbols using semantic authoring. We cover the definition of the input language, the organization of the communication board, the definition and acquisition of the required ontology, the use of a large-scale lexicon, and, eventually, the evaluation of the effectiveness of this approach to (1) speed up communication and (2) extend the range of expressible content that an AAC system can support.

38 Chapter 4

Usage Scenario

We present in this chapter sample interaction scenarios between users and the AAC system we have designed. These usage scenarios illustrate the requirements that the system must meet. Overall, the system appears to the user as a dynamic communication board, which shows symbols in Bliss and produces fluent output in either Bliss, English, or Hebrew. As the user selects new symbols, the communication board is re-organized to ease the selection of further symbols. The sequence of symbols entered by the user is "translated" into fluent language incrementally – so that at each stage, output text appears on the screen. The main display is initialized in three possible manners: (1) most frequently used words, (2) most frequent shapes/symbols (e.g., person, water, activity), (3) possible scenarios (e.g., school, home, family). The board is initialized with a selection of candidate Bliss symbols and of references to entities that are likely to be useful in the interaction (we call these "participants"). Each user can tailor the initial set of participants for different scenarios. In addition, defaults can be specified for different settings. Defaults provide values for attributes such as the tense or the mood of the clauses. Defaults need not be repeated for each sentence, but can be specified once at the beginning of the session. They can be overridden in specific sentences (but at the cost of extra typing), and can be changed while editing.
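The handling of session defaults with per-sentence overrides might look as follows (attribute names are assumed for illustration):

```python
# Session-level defaults (attribute names assumed); a sentence may
# override them at the cost of extra selections.
session_defaults = {"tense": "present", "aspect": "progressive", "mood": "declarative"}

def sentence_spec(overrides=None):
    # Start from the session defaults and apply any per-sentence overrides.
    spec = dict(session_defaults)
    if overrides:
        spec.update(overrides)
    return spec

print(sentence_spec())                     # defaults apply unchanged
print(sentence_spec({"tense": "future"}))  # one attribute overridden
```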

To understand the manner in which our system works, we show how we deal with the following issues:

1. Choosing an initial board.

2. Setting defaults - tuning participants and speech acts for the user's various environments (school, home, shopping).

3. Selecting symbols and changing the board accordingly.

4. Generating text: adding function words, dealing with morphology, aggregation, and referring expressions.

Assume the desired sentence is "A girl is making eggs in the pan for breakfast."1 The first word to be chosen can be either an action or a noun. Once the symbol is chosen, the display changes and shows only symbols of verbs that take an animate agent. Since the list may include more symbols than can be shown, they are sorted by frequency (most frequent symbols are shown first). If the food/drink category is chosen, the symbols that are shown are filtered by the constraint of being labelled as food/drink. The food/drink category is made available as a selection if the situation home or shopping is activated. At any point while entering the input, it is possible to return to the main display and choose symbols from other categories. The determiner a will be generated by the grammar, since an instance of the word girl was not used previously in the interaction. Once the verb make is chosen, the display shows categories of nouns that can act as the theme of this verb. Again, if the context food/drink was already chosen, then the symbols displayed will be from this category. Next, the system offers sentence complements that are realized as circumstantials or adverbials, such as where and what for.

Consider the following descriptions of photos, given by a Bliss user:2 Pablo and I are playing. We are watching TV. The following sections describe how the system provides the tools to generate the desired sentences.

1This example is deliberately different from the example presented in the previous chapter, since the input insertion is done differently, as will be explained.
2Examples taken from http://www.blissymbolics.org/canada/readingroom/english/text/filip contents.htm?Innhold.x=18&Innhold.y=32

4.1 Maintaining a View of Context

References (entities, symbols) that will be used frequently in the discourse are located on the board and are given default properties. Entities mentioned in previous utterances are shown as well; in subsequent utterances, these may be matched to complete an utterance with missing entities, using selectional restriction constraints. The symbols that appear on the board are not just words: internally, they are connected to a semantic representation, including attributes and types from an underlying ontology. In the rest of the section, we present these semantic structures as expressions in the Conceptual Graphs (CG) formalism (described in further chapters). The user first selects the "participants" s/he wants to introduce to the discourse. This is done by selecting entities from the default pane and adding them to the "participants" context. If two participants are selected, the output shows a conjunction structure in the text pane. For example, the selected participants are described by the following semantic descriptions (using the CG notation):

[Boy: #I]- (Name) --> [Word: "Felipe"]
           (Age) --> [12]

[Boy: #Pablo]- (Brother-of) --> [Boy: #I]
               (Name) --> [Word: "Pablo"]
               (Age) --> [10]

And the generated text will be Pablo and I. Note how, internally, the system encodes participants as complex conceptual graphs – and not as words. The graphs displayed above indicate that Pablo is the brother of Felipe, who is now speaking. Reference planning determines that the forms "Pablo" and "I" are appropriate in the current discourse context to refer to these two entities.
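A rough rendering of such participant structures as plain data, together with the conjunction realization, might look as follows (nested dictionaries stand in for conceptual graphs; all names are illustrative):

```python
# Toy participant structures standing in for the conceptual graphs above.
felipe = {"type": "Boy", "ref": "I", "name": "Felipe", "age": 12}
pablo = {"type": "Boy", "ref": "Pablo", "name": "Pablo", "age": 10,
         "brother_of": felipe}

def realize_participants(entities, speaker_ref="I"):
    # Conjoin the selected participants, placing the speaker last
    # ("Pablo and I" rather than "I and Pablo").
    names = [e["ref"] for e in entities]
    names.sort(key=lambda n: n == speaker_ref)
    return " and ".join(names)

print(realize_participants([felipe, pablo]))  # Pablo and I
```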

4.2 Argument Structure Specification

After choosing the participants, the main pane shows symbols referring to verbs (activities) that require a Boy or one of the super-concepts of Boy (up to the semantic type Animate in the underlying ontology) as one of their arguments. The search for candidate activities is driven by the semantic type of the selected participants (in our example, Boy).

WordNet Senses: play (5, 34)
Thematic Roles: Actor1[+animate], Actor2[+animate]
Frames:
  Intransitive (+ with-PP): "Brenda met with Molly." Actor1 V Prep(with) Actor2
  Intransitive (plural subject): "The committee met." Actor1[+plural] V
  Simple Reciprocal Alternation Intransitive: "Brenda and Molly met." Actor1 and Actor2 V
  With Preposition Drop Alternation: "Anne met Cathy." Actor1 V Actor2
Verbs in same (sub)class: [consult, meet, play, visit]

Figure 4.1: VerbNet entry for play

Since there may be many such verbs, they are filtered and ordered by frequency of usage or by context. If the symbol to play is chosen, the text 'Pablo and I are playing' is generated immediately, using the correct inflection of the verb 'to be' and the default progressive tense. The verb 'to play' belongs to the 'meet' class of verbs [Levin, 1993]. This class defines possible syntactic alternations (e.g., the Understood Reciprocal Object Alternation) (see Figure 4.1). This information is encoded in the lexicon of the system and drives the generation of the sentence in one of the possible syntactic structures: I played with Pablo, Pablo and I played (with each other). The second option is chosen for the default realization; however, the system can provide the other alternation with a single push of a button. Once all the participants required by the verb's categorization structure are given, the system offers to generate circumstantial adjuncts (such as location and time).

4.3 Referring Expressions

In a subsequent reference to Pablo, either a pronoun will be used (he) or, in case of possible ambiguity, the phrase my brother will be generated. We use the algorithm of [Reiter and Dale, 1992] for choosing among the possible forms of referring expressions (pronoun, full definite expression, partial definite expression, one-anaphora). This algorithm relies on discourse context information, which we maintain in the form of the list of CGs referred to in the previous discourse.
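A much-simplified sketch of this decision follows; the actual algorithm of [Reiter and Dale, 1992] is considerably richer, and the data layout here is hypothetical:

```python
# Toy stand-in for referring-expression choice: use a pronoun when
# the entity is the only salient one of its gender in the discourse
# context; otherwise fall back to a definite description.

def refer(entity, context):
    """entity/context items: dicts with 'id', 'gender', 'description'."""
    competitors = [e for e in context
                   if e["gender"] == entity["gender"]
                   and e["id"] != entity["id"]]
    if not competitors:
        return {"m": "he", "f": "she"}[entity["gender"]]
    return "my " + entity["description"]

pablo = {"id": "pablo", "gender": "m", "description": "brother"}

print(refer(pablo, [pablo]))                               # he
print(refer(pablo, [pablo, {"id": "dad", "gender": "m",
                            "description": "father"}]))    # my brother
```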

The system also performs aggregation – combining utterances into a single one following the algorithm defined in [Shaw, 1995]. For example, for the above sentences, the system can generate one clause: Pablo and I are playing and watching television.
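The effect of this aggregation step can be sketched as a same-subject merge; Shaw's algorithm handles many more cases, and the clause representation here is a hypothetical simplification:

```python
# Toy aggregation: adjacent clauses sharing subject and auxiliary
# are merged into one clause with a conjoined predicate, as in
# "Pablo and I are playing and watching television".

def aggregate(clauses):
    """clauses: list of (subject, auxiliary, predicate) triples."""
    merged = []
    for subj, aux, pred in clauses:
        if merged and merged[-1][:2] == (subj, aux):
            s, a, p = merged[-1]
            merged[-1] = (s, a, p + " and " + pred)
        else:
            merged.append((subj, aux, pred))
    return [f"{s} {a} {p}" for s, a, p in merged]

print(aggregate([("Pablo and I", "are", "playing"),
                 ("Pablo and I", "are", "watching television")]))
# ['Pablo and I are playing and watching television']
```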

4.4 Lexical Choice and Syntactic Realization

Internally, Bliss symbols are mapped to conceptual graphs. As symbols are entered, the graphs are joined to form a larger graph depicting a complex entity or a situation. Each time the graph under construction is modified (by typing a new symbol or modifying one of the symbols in the concept), the full generation chain is re-executed. This happens with no noticeable delay – the output sentence is simply updated, in English and/or Hebrew. Concepts are associated with a lexical entry in the lexical chooser. A default lexeme is specified in the lexicon, together with synonyms. In most cases, the default lexeme is selected by the lexical chooser, but sometimes collocation constraints override the default. For example, for the selected symbols to see television, we generate "We are watching television" and not "We are seeing television." To sum up, the user proceeds to produce the two sentences according to the following steps:

1. An initial board is populated according to the "family" scenario. Symbols corresponding to family members become accessible in the "Participants" pane.

2. The symbols Pablo and I are selected in the "Participants" pane. The expression "Pablo and I" appears in the output pane.

3. Possible actions are presented in the main board. Candidate actions are selected based on selectional restrictions, frequency information, and the current scenario.

4. The clause "Pablo and I are playing" appears in the output pane. The tense is selected by default (as specified by the current scenario).

5. A new sentence is started. The action see is selected. The subject for this action is matched by looking up the context, and the group "Pablo and I" is provided as a default. It is now rendered in text as "we", since the reference is now recoverable. The sentence "we are seeing" appears in the output pane.

6. Possible complements for see are proposed on the board. We search the ontology for concepts that match the selectional restriction of see (encoded in the VerbNet lexicon as the WordNet synset stimulus). We narrow the search according to the current scenario (family).

7. The participant TV is now selected. The lexical chooser adapts the sentence to "we are watching TV" based on a collocational constraint.

Let us see how to deal with the set of sentences given in the previous chapter.
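The collocational override in the lexical chooser can be sketched as a lookup that takes precedence over the default lexeme. The tables below are illustrative, not the system's actual lexicon:

```python
# Sketch of default-lexeme selection with a collocational override:
# the concept's default verb is replaced when a (verb, object)
# collocation constraint applies, yielding "watch television"
# rather than "see television". Tables are invented.

DEFAULT_LEXEME = {"see": "see"}            # concept -> default verb
COLLOCATIONS = {("see", "television"): "watch"}

def choose_verb(concept, obj):
    return COLLOCATIONS.get((concept, obj), DEFAULT_LEXEME[concept])

print(choose_verb("see", "television"))  # watch
print(choose_verb("see", "bird"))        # see
```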

Figure 4.2: Bliss sequences for to be / yeš verbs

(1) There is a teacher.
(2) The teacher is Dina.
(3) The teacher is tall.
(4) The teacher is in the room.
There is a teacher in the room.


The table is in the room. *The room is in the table.
The book is on the table. *The book is in the table. *The table is in the book.

In the case of using Bliss symbols as the input language, the process is somewhat simpler. First, there are distinct symbols for some of the possible relations that the copula represents: there is/are is a distinct symbol from to be or to have (see Figure 4.2). In the case of spatial relations such as on or in, the relation itself is available on the display and the generation process generates the complete sentence accordingly. However, our ontology is still not specific enough to recognize whether an object has the properties of being a container or a surface.
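One way to make the ontology specific enough for this distinction would be to mark concepts with container/surface properties, along these lines. This is a hypothetical extension, explicitly not implemented in the system described:

```python
# Hypothetical sketch: marking concepts as containers or surfaces
# so a spatial relation can be validated before generation. The
# property sets below are invented for illustration.

PROPS = {
    "room": {"container"},
    "table": {"surface"},
    "book": set(),
}

def spatial_ok(figure, relation, ground):
    needed = {"in": "container", "on": "surface"}[relation]
    return needed in PROPS[ground]

print(spatial_ok("book", "on", "table"))  # True
print(spatial_ok("book", "in", "table"))  # False
print(spatial_ok("table", "in", "room"))  # True
```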

4.5 Summary

As we have shown in this chapter, the generation of a message is done incrementally and without the need for parsing. At each step during message production, the system processes an internal representation of the ontological and lexical data, generates a partial sentence, and updates the display according to the context and to selectional restrictions. The system uses default values based on pre-defined settings and performs referring expression planning. The next chapter presents the system architecture and the flow of information among the components of the system.

Chapter 5

System Architecture

A typical natural language generation system contains two main components with distinct (but strongly connected) functions: content planning addresses the question of what to say, i.e., producing content, and surface realization determines how to say this content (see Section 6.1). In a system that automatically generates text from a telegraphic symbolic message (an AAC-NLG system), content determination is practically performed by the speaker, and surface realization is performed automatically by the system (see Figure 5.1). This chapter presents the structure of our system by presenting in turn:

1. The infrastructure upon which the system is built, including a set of lexical databases, realization grammars, and ontologies.

2. The User Interface presented to the AAC user.

3. The internal process of generating a message.

5.1 Infrastructure Development

This project is built upon a set of tools which were developed separately and then integrated into the AAC tool:

• Lexicons -

1. Bliss lexicon

Figure 5.1: General architecture and flow of information

2. Integrated verbs lexicon

• Ontology - concepts and relations database; we developed an ontology acquisition tool using online lexical resources.

• SURGE/HUGG - English/Hebrew syntactic realizer for natural language generation.

• Semantic Authoring Tool (SAUT) - a platform to design semantic authoring tools, where the user edits a semantic representation and is presented with realtime feedback in natural language.

Each of these tools has been used in contexts other than the AAC system we present here, and has been evaluated separately. The details of the tools will be presented in the following chapters in turn. In this chapter, we explain the overall flow of data from one component to the next within the AAC usage scenario. The Bliss lexicon we have designed and built lists the symbols (which we interpret as the atomic conceptual units of the system) that are available in our system with their graphical representation (see Section 7.2.2). The lexicon comes with graphical tools to display or create new

symbols, a search engine to retrieve symbols given parts of the symbol or their translation in English or Hebrew, and an engine to compute semantic relations among symbols based on their shared structure. The integrated verbs lexicon combines structural and semantic information from several sources: the WordNet lexical database [Miller, 1995], English Verb Classes and Alternations (EVCA) [Levin, 1993], and the COMLEX syntax dictionary [Grishman and Sterling, 1989]. The various sources have been merged into a single, rich lexicon of English verbs. This lexicon has been formatted as an extension to the SURGE realization grammar for English [Elhadad, 1992] (see Section 7.4). We also use this rich source of knowledge as part of the ontology acquisition method. We developed an ontology which serves as the basis for the semantic authoring process. The ontology includes a hierarchy of concepts, and the information it encodes interacts with the conceptual graph processing performed as part of content determination and with the lexical chooser. The ontology is described together with the other lexical knowledge bases in Chapter 7. We developed a semi-manual ontology acquisition tool which relies on the lexical knowledge databases WordNet and VerbNet [Kipper et al., 2000]. This module is presented in detail in Section 7.3. In order to allow output text in both English and Hebrew, we have extended the development of HUGG [Dahan-Netzer, 1997], a syntactic realizer for Hebrew. Chapter 6 presents in detail the process of natural language generation and the extensions of HUGG. Finally, Chapter 8 describes SAUT [Biller, 2005] [Biller et al., 2005], a system for semantic authoring, and the use of the SAUT technique in a communication board, with additional design decisions for the unique properties of AAC usage, such as initializing the display and setting conversation defaults.

5.2 Flow of Information

In this section, we describe how data flows from the user input, through the various knowledge sources used by the system, and through the various processing components. The initial display set is based on the user configuration and on the desired situation of use. The user begins her conversation by choosing a symbol (either from the main display or by pressing a hyperlink key which leads to a specific domain display, then choosing a symbol).

Figure 5.2: System Architecture

Once a symbol is chosen from the display, the system looks up the symbol in the ontology and retrieves its information. A fragment of the ontology is shown in Figure 5.3. Each entry in the concepts ontology contains the object's name, a string that is later used for lexical choice (including its synonyms), its immediate parent in the hierarchy, and a binary value that indicates whether it is an internal node, i.e., one that has no symbol to represent it. The hierarchic information was retrieved from WordNet (see Chapter 7 on the acquisition process). If a symbol represents a predicating concept, it includes information about its outgoing relations (e.g., the serve-6 concept in Figure 5.3). A fragment of the relations ontology is shown in Figure 5.4. Once the user has selected a symbol, the system creates an object of the chosen concept type. Figure 5.5 shows the display after the choice of the to play symbol. After the object is created, the system triggers two processes:

1. Change of display.

2. Surface realization of the partial (or complete) utterance
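The concept entries just described (name, lexical string with synonyms, parent, internal-node flag, and outgoing relations for predicating concepts) might be represented as follows. This is a sketch; field names and the toy hierarchy are hypothetical:

```python
# Sketch of a concepts-ontology entry and a lookup that climbs the
# hierarchy. Fields mirror the description in the text: a lexical
# string with synonyms, a parent, an internal flag, and outgoing
# relations for predicating concepts.

ONTOLOGY = {
    "girl":    {"lex": ["girl"], "parent": "person",
                "internal": False, "relations": []},
    "person":  {"lex": ["person"], "parent": "animate",
                "internal": True, "relations": []},
    "animate": {"lex": [], "parent": None,
                "internal": True, "relations": []},
    "serve-6": {"lex": ["serve"], "parent": None, "internal": False,
                "relations": [("Actor", "animate"),
                              ("Theme", "animate")]},
}

def ancestors(concept):
    """Return the concept and all its super-concepts, bottom-up."""
    chain = []
    while concept is not None:
        chain.append(concept)
        concept = ONTOLOGY[concept]["parent"]
    return chain

print(ancestors("girl"))  # ['girl', 'person', 'animate']
```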

5.2.1 Changing Displays Dynamically

For each symbol chosen, in accordance with its type (action, object, attribute, or preposition), a new display is structured. If the symbol is an activity, i.e., a concept with relations, then the new


Figure 5.3: Ontology fragment for the concepts: pan, breakfast, girl, egg, serve


Figure 5.4: Ontology fragment of relations

display will show symbols that are compatible with the selectional restrictions of its relations. For instance, if the symbol to play is chosen, the concept is looked up in the ontology. Its entry indicates that two relations must be instantiated to build a valid conceptual graph (Actor1 and Actor2). The system looks up each relation in the relations ontology and retrieves the information which determines that these relations connect to concepts of type living.

Figure 5.5: The display after the choice of the to play symbol

5.2.2 Lexical Choice and Syntactic Realization

Each time a symbol is chosen, the system converts the current expression to a conceptual graph (CG), maps the CG to a FUF Functional Description (FD), which serves as input to the lexical

chooser; lexical choice and syntactic realization are performed, and feedback is provided in English or Hebrew. If the symbols chosen so far are I and to play, the conceptual graph built is:

[Play]-(Actor1)->[Person: {I}]

This CG is transformed into an FD of the appropriate form and is unified with the lexical chooser, using the information on the verb play as embedded in the concept representation. The intransitive structure is chosen since there is only one participant given, and the resulting string generated is I play. However, once Pablo is chosen as the second actor relation, the CG is complete:

[Play]-
  (Actor1)->[Person: {I}]
  (Actor2)->[Person: Pablo]

The system consults the lexical chooser again and unifies the given input with the verb's possible syntactic structures following its alternations, in this case:

alternation alternation-of-verb-play-simple_reci_intrans
  [struct with-np]
  [struct subj-and-np-v]

The [STRUCT WITH-NP] argument structure means that the input specification given to the syntactic realizer will be of the form:

((struct with-np)
 (cat clause)
 (proc ((type accompaniment)
        (lex "play")))
 (partic ((located ((cat personal-pronoun)
                    (person first)))
          (location ((cat pp)
                     (prep ((lex "with")))
                     (np ((cat proper)
                          (lex "Pablo"))))))))

This syntactic alternation indicates that the clause I play with Pablo can be generated. Alternatively, following the other alternation available for the verb play in its current sense, the structure [STRUCT SUBJ-AND-NP-V] can be chosen as well, with the final output Pablo and I play. In the GUI of the system, a button switches the generation of the clause from one argument structure to the next, according to the alternations supported by the verb. Next, the system offers the opportunity to add sentence modifiers such as time, location, and other possible circumstances. Once the utterance is complete, the done button is pressed and the final sentence is generated. The sentence is generated with reference to previous utterances, i.e., the system handles referring expressions and performs aggregation (see Sections 8.1 and 4.3). To this end, the system maintains a data structure encoding the entities which are referred to in each clause. As the discourse proceeds, the discourse context is updated with the conceptual graph representation of each entity that is mentioned. This context representation is used by the reference planning module to determine whether further references are to be realized as pronouns, definite noun phrases, or partial descriptions.
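Switching between the two argument structures of play, as the GUI button does, can be sketched as selecting among per-alternation realizations. The templates below are illustrative only; actual realization goes through the SURGE grammar:

```python
# Sketch of realizing the two alternations of "play" listed above.
# The string templates stand in for full syntactic realization.

ALTERNATIONS = {
    "with-np":       lambda a1, a2: f"{a1} play with {a2}",
    "subj-and-np-v": lambda a1, a2: f"{a2} and {a1} play",
}

def realize(struct, actor1, actor2):
    return ALTERNATIONS[struct](actor1, actor2)

print(realize("with-np", "I", "Pablo"))        # I play with Pablo
print(realize("subj-and-np-v", "I", "Pablo"))  # Pablo and I play
```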

5.3 Summary

Two aspects of the system's architecture were discussed in this chapter: the underlying components that compose the system, and the internal process of generating a message. The main knowledge sources of our system are the lexicons (Bliss, English, and Hebrew), the ontology (derived from lexical resources), and the syntactic realization grammars (SURGE for English and HUGG for Hebrew). The flow of information in our system is typical of an NLG system, driven by a semantic input interactively authored by the user. Bliss symbols are entered together with a minimal input syntax. Internally, a semantic structure corresponding to the intended meaning is constructed in the Conceptual Graph (CG) formalism. The CG is then mapped to a lexicalized structure in English or Hebrew using a lexical choice module. The structure is then realized into a fluent sentence using a realization grammar. The discourse context is maintained as new utterances are entered, and reference planning and aggregation are performed on each utterance during the generation process, thus improving the fluency of the conversation and speeding up the selection of entities to which

past discourse has already referred. In the next chapter, we provide more details on the Natural Language Generation components of the system, with special attention to the Multilingual Generation aspect, and introduce our contribution to NLG in Hebrew.

Chapter 6

Natural Language Generation and Syntactic Realization

In face-to-face communication, a speaking partner and a speech-disabled partner use paper-based displays with lists of symbols. The speech-disabled partner selects a sequence of symbols (by pointing at them), and the speaking partner interprets and pronounces the desired sentences out loud, according to the symbols chosen by the AAC user, while adding function words and inflecting verbs and nouns following the syntax of the spoken language. With computerized AAC systems with textual or vocalized output, whether dedicated devices or software on a personal computer, the speaker aims to reach autonomous communication. He must, however, explicitly choose all symbols, including morphological inflections, function words, and prepositions, in order to get a full grammatical sentence. This may not be possible for those who lack literacy skills, and in any case it requires additional keystrokes and slows the communication rate. Pre-stored sentence retrieval [Waller et al., 2000a] is a method which aims to avoid this burden. However, the sentence retrieval method suffers from a restricted pool of utterances and limits the user's ability to express himself. It has, therefore, limited applicability (it is most useful when quick responses or fluent conversation are required [Vanderheyden and Pennington, 1998]). Natural language generation (NLG) techniques can be used to generate full sentences from telegraphic-style messages. Merging this capability within AAC presents an attractive route of investigation. This chapter surveys the field of NLG and introduces our own contribution. We present the

typical architecture of an NLG system and methods used for multilingual generation. Section 6.2 focuses on the syntactic realizer, which is responsible for producing the linear form of the words. Section 6.3 presents our implementation of HUGG, a unification-based grammar for the generation of Hebrew. HUGG is the first available realization grammar for Hebrew. In the next chapter, we present our further contribution to NLG in the form of a reusable large-scale lexicon for generation.

6.1 Natural Language Generation

Natural language generation (NLG) is a subfield of Natural Language Processing (NLP), studying the process of language production from a non-linguistic representation of data. The NLG process can be viewed [Reiter and Dale, 2000] as goal-driven communication: the production of an utterance in natural language is an attempt to satisfy a set of communicative goals of the speaker. The generation process consists of making a series of decisions – starting from planning the content and ending with lexical and syntactic decisions. The use of NLG techniques is growing in various fields. For instance, systems which deal with vast volumes of data that require expertise to interpret and rewrite in spoken language are good candidates for an NLG component. The main uses of NLG are (1) to make data understandable (expert systems, reports), and (2) to produce routine documents that must be updated often. In some NLP applications, NLG techniques complement other NLP tasks, such as Machine Translation (MT) ([Dorr et al., 1998], [Temizsoy and Cicekli, 1998]) or automatic summarization ([Barzilay et al., 1999], [Hovy and Lin, 1998]). In all these applications, the generated text can be in various languages, leading to multilingual generation (MLG). MLG systems generate text in several languages from a single source of information, without using translation.

6.1.1 The Architecture of an NLG System

Traditional NLG systems address the following tasks: content planning (content determination and document structuring) and surface realization (lexicalization, aggregation, referring expression generation, and finally syntactic realization) [Reiter and Dale, 2000]. The content planner includes several sub-modules:

Content determination is the module that decides which information should be communicated in a text. The decision depends on the communication goals of the intended text, on the intended reader of the text (expert reader, children, etc.), the size of the text, and the nature of the underlying information.

Document structuring is the process of ordering and structuring the chosen information in a text: deciding where to put paragraph boundaries, determining the rhetorical structure, and so on.

The surface realization module contains the following sub-processes:

Lexicalization is the process in which content words are chosen to represent the meanings to be conveyed. This process may first aggregate data into meaning components (conceptual lexicalization) and then find the words in the target language to express them (expressive lexicalization).

Aggregation can be performed at various stages of linguistic generation (in addition to conceptual lexicalization) – several concepts can be expressed in a single word, and two sentences can be aggregated into one (if, for example, they differ in subject only).

Referring expression generation is the process of determining how to produce a reference to an entity that should be mentioned in the utterance.

Syntactic realization generates the final linear form of words, deals with morphology, and is responsible for the uttered output being syntactically correct. This process is elaborated below.

All of these tasks are generally arranged in a pipeline architecture:

Document planner - content determination, document structuring, and conceptual lexicalization.

Microplanner - responsible for expressive lexicalization, linguistic aggregation, and referring expression generation.

These two processes provide a text specification which is realized by the syntactic realizer. Section 6.2 further elaborates on the syntactic realizer.
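The pipeline just described can be sketched as a sequence of stage functions. These are stubs only, with hypothetical data shapes; each real stage hides substantial machinery:

```python
# Skeleton of the classical NLG pipeline: document planner ->
# microplanner -> syntactic realizer. Each stage passes annotated
# data along; real systems do far more at every step.

def document_planner(goals):
    """Content determination, document structuring, conceptual
    lexicalization -> a document plan."""
    return {"messages": goals, "structure": "sequence"}

def microplanner(plan):
    """Expressive lexicalization, aggregation, referring-expression
    generation -> a text specification."""
    return [{"spec": m} for m in plan["messages"]]

def syntactic_realizer(text_spec):
    """Function words, word order, morphology -> linear text."""
    return " ".join(s["spec"] for s in text_spec)

print(syntactic_realizer(microplanner(document_planner(
    ["Pablo and I", "are playing"]))))  # Pablo and I are playing
```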

58 6.1.2 Multilingual Generation (MLG)

Writing documents in different languages in parallel is a common task; it is done daily for weather reports or software manuals. Automatic MLG refers to the production of documents in several languages from a single database (rather than translating from one source language to other target languages). The production of technical manuals was found to be an effective application of MLG ([VanderLinden and Scott, 1995], [Paris and Linden, 1996]). A central question in MLG relates to the representation of the input to the syntactic realizer (SR). This representation corresponds to the "interlingua" used in machine translation (MT). Several recent systems have explored this issue in some depth (WYSIWYM [Scott et al., 1998], Drafter [Paris and Vander Linden, 1996], kpml [Bateman, 1997], UNITRAN [Dorr, 1994]). The question can be rephrased as: what is the highest level of information that can be common to all languages? These questions concern the interface between the knowledge source (an ontology, for instance) and the lexicon (mapping from terms and concepts to lexemes) with reference to the various languages. In [Stede, 1996], multilingual generation is viewed as a paraphrasing problem in a single language. This work refers, though, to the lexical level only, and does not define a unified input specification for the SR. Since not all multilingual generation systems depend on a stable ontology, and all the knowledge used may be specific to the application (as in [Callaway et al., 1999]), a more 'shallow' approach is often needed – that is, the interlingua must be established at a level closer to the observed syntactic level of the various languages. In [Dahan-Netzer and Elhadad, 1999] and [Dahan-Netzer and Elhadad, 1998b], we established an input representation for the generation of Hebrew/English noun phrases, starting from the same input structure with different lexemes only.
The methodology we pursued there was to express the syntactic form of the noun phrase in the two languages as a set of minimal distinctions (for example: does the noun phrase include a compound construct - smixut - in Hebrew? does the determiner express a vague or an exact quantity, etc.?). We then analyzed the knowledge required to make these decisions. The role of the input specification to the SR is to provide the minimal set of answers that can guide the generation process. By comparing the set of decisions required for Hebrew and English NPs, we were able to produce a compact set of semantic features which provides answers to the decisions required by both languages in their various syntactic forms.

MLG systems aim to be as domain-independent as possible (since the development of such systems is expensive), but are usually applied to a narrow domain, since the design of the interlingua refers to domain information. MLG systems share a common architecture consisting of the following modules:

• A language-independent underlying knowledge representation: knowledge represented as AI plans [Rosner and Stede, 1994] [Delin et al., 1994], [Paris and Vander Linden, 1996], knowledge bases (or ontologies) such as OWL, the Penman Upper-model, and other (domain-specific) concepts and instances [Rosner and Stede, 1994].

• Micro-structure planning (rhetorical structure) - language independent - is usually done by human writers using the MLG application GUI.

• Sentence planning - different languages can express the same content in various rhetorical structures, and planning must take this into consideration: either by avoiding tailoring the structure to a specific language [Rosner and Stede, 1994] or by taking advantage of knowledge on different realizations of rhetorical structures in different languages at the underlying representation [Delin et al., 1994].

• Lexical and syntactic realization resources (e.g., English PENMAN/German NIGEL in [Rosner and Stede, 1994])

6.1.3 AAC as an MLG Application

Our approach in this work is to consider the AAC application of message generation as an MLG application: symbols are entered and are interpreted as a semantic specification of the intended meaning. From this point on, we apply MLG techniques to translate the semantic specification into several languages – a fully specified Bliss sequence, English, and Hebrew. The English and Hebrew versions of the message are intended for the communication partner (who may not be fluent in Bliss), and the three versions of the message are intended as a feedback tool for the disabled partner producing the message, to confirm the validity of his input. We call this approach semantic authoring – that is, our tool provides an environment where the user can specify a semantic expression of the intended message interactively and in context.

As an MLG system, our system [Biller, 2005] [Biller et al., 2005] includes similar modules. We have chosen to use Conceptual Graphs as an interlingua for encoding document data [Sowa, 1987]. We use existing generation resources for English – SURGE [Elhadad, 1992] for syntactic realization and the lexical chooser described in [Jing et al., 2000] – and the HUGG grammar for syntactic realization in Hebrew (see [Dahan-Netzer, 1997] and below). For micro-planning, we have implemented the algorithm for reference planning described in [Reiter and Dale, 1992] and the aggregation algorithm described in [Shaw, 1995]. The NLG components rely on the C-FUF implementation of the FUF language [Kharitonov, 1999] [Elhadad, 1991], which is fast enough to be used interactively in realtime for every single editing modification of the semantic input.

6.2 The Syntactic Realizer

Syntactic realizers are best characterized by the structure of their input. The input for an SR varies from a pure syntactic structure (RealPro [Lavoie and Rambow, 1997]) to, at the other extreme, semantic inputs founded on a generic ontology (called an upper model [Bateman, 1997]). We use an intermediate approach, implemented in the fuf/surge [Elhadad, 1993] environment. Syntactic realizers are also distinguished by their theoretical basis: some are dedicated to a single theory, like RealPro on MTT [Mel'cuk and Pertsov, 1987], or kpml on SFL [Halliday, 1994]. surge uses the SFL theory, but also descriptive grammars [Quirk et al., 1985] and other theories such as MTT and HPSG [Pollard and Sag, 1987]. Nitrogen [Langkilde and Knight, 1998] is based on an n-gram model learned from corpus analysis, but it still relies on the SFL approach for the characterization of the input language (it is used in [Dorr et al., 1998]). The syntactic realizer we use as a model, a basis for extension, and a development environment (both as a theoretical and an implementation framework) is the English syntactic realizer surge [Elhadad and Robin, 1996], implemented in FUF [Elhadad, 1993]. surge is a reusable grammar – it provides a compositional input specification language and defaults, determines function words based on functional descriptions, orders constituents, performs morphological processing, and handles syntactic pronominalization.

6.2.1 Input for the Surface Realization Module

The syntactic realizer (SR) is the component that maps an input set of communication goals into a natural language utterance. The input contains knowledge, possibly at various levels of abstraction, of a linguistic phrase. Making a syntactic realizer available for many applications with different needs requires it to allow a flexible input specification, without a commitment to a single lexicon or ontology. This flexibility allows one to plug the SR into a system that provides its own knowledge sources. In natural language generation, surface realizers are the front-end modules that convert an abstract semantic representation into a linguistic utterance. Several plug-in syntactic realization components are available for English sentence generation: surge, implemented in FUF [Elhadad and Robin, 1996]; NITROGEN, which uses a statistical model of lexical collocations and syntactic relations [Langkilde and Knight, 1998]; RealPro, which is based on the MTT formalism; and nigel, which evolved from the penman project [Mann, 1983]. nigel later evolved into the multilingual text generation workbench kpml [Bateman, 1997].

6.3 HUGG

In a text generation system, the syntactic realizer is the last module of the process and is responsible for adding function words, controlling the linear order of the words in the utterance, and handling morphology. The design of a syntactic realizer depends heavily on the type of input that is given to it by the preceding modules in the process. A basic assumption in developing an input specification is to keep syntactic knowledge in the syntactic realizer, i.e., the input should be as semantic as possible so that preceding modules can be as language-independent as possible, and relatively free of re-coding linguistic knowledge. The motivation here is to consider multilingual generation in advance, as well as to allow non-linguist experts to design generation systems. HUGG (Hebrew Unification Grammar for Generation) is a syntactic realizer for the generation of Hebrew. We have developed HUGG as a Hebrew version of SURGE [Elhadad and Robin, 1996]. Our objective in designing the HUGG input specification was to keep the input given to the Hebrew SR as similar as possible to that of the parallel English SR, SURGE, with the exception of language-specific lexemes. We have found that, although meaningful differences exist between Hebrew and English, it is possible to use the input as defined for SURGE with minor changes, usually by raising the

level of abstraction in the specification. We reviewed some of these phenomena in the noun phrase syntax in previous work [Dahan-Netzer, 1997]. In this work, we have expanded the HUGG grammar to the clause level and found that by using the transitivity system, as defined for verbs in SURGE, we can handle various phenomena of the Hebrew clause, especially the use of the copula.

6.3.1 FUF/SURGE

FUF

In this formalism, all linguistic knowledge is represented as a set of features called a functional description (FD). Each feature is a pair [a:v], composed of a unique attribute a and a value v, which is either an atomic symbol, an FD, or a path (which points to another feature in the overall FD and means that the two features must share the same value at all times). If an attribute is not present in an FD, this is equivalent to it being present with the value NIL. The process of unification is defined on two FDs and, unlike structural unification, is not based on the size or order of the terms being unified. Basically, the unification X ∪ Y is the smallest FD containing both X and Y (while preserving the FD requirement preventing contradictory values for the same attribute). A grammar in FUF is a meta-FD – a set of FDs with additional features. These features control unification and further processing of the FD: CSET lists the immediate linguistic constituents of the FD, for further unification; PATTERN constrains the linear order of the constituents; and ALT (ALTernation) allows non-deterministic decisions. Unification of an (ALT list-of-FDs) with an FD is attempted with the first FD of the list; if it does not succeed, it proceeds to the next one; if all fail, unification fails as well. Some special values are possible, such as ANY, which requires a value to be instantiated by the end of the unification process; NONE, which indicates that an attribute cannot have any value other than NIL; and GIVEN, which requires an attribute to have a value different from NIL at the time of unification. Further elaborations of the FUG formalism in FUF include types, the usage of FSET, and the special feature CAT (see [Elhadad, 1991]). Practically, we use CFUF [Kharitonov, 1999], a time-efficient implementation of FUF.
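The core of FD unification (ignoring paths, ALT, and the special values ANY/NONE/GIVEN) can be sketched as a recursive merge that fails on contradictory atoms. The dict encoding below is an illustration, not the FUF implementation:

```python
# Sketch of FD unification over nested dicts: the result is the
# smallest FD containing both inputs; conflicting atomic values
# make unification fail. Paths and ALT are deliberately omitted.

FAIL = object()  # sentinel for failed unification

def unify(x, y):
    if isinstance(x, dict) and isinstance(y, dict):
        result = dict(x)
        for attr, val in y.items():
            if attr in result:
                sub = unify(result[attr], val)
                if sub is FAIL:
                    return FAIL
                result[attr] = sub
            else:
                result[attr] = val   # absent attribute = NIL, unifiable
        return result
    return x if x == y else FAIL

print(unify({"cat": "clause"}, {"proc": {"lex": "play"}}))
# {'cat': 'clause', 'proc': {'lex': 'play'}}
print(unify({"cat": "clause"}, {"cat": "np"}) is FAIL)  # True
```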

SURGE

SURGE is a comprehensive, domain-independent, portable syntactic realizer for the generation of English, written in FUF. SURGE draws its linguistic sources mostly from systemic-functional theory [Halliday, 1994], but incorporates other linguistic theories such as HPSG [Pollard and Sag, 1987] and MTT [Mel'cuk and Pertsov, 1987] as well. The input specification of SURGE was designed in consideration of the overall text generation process, and especially with reference to the preceding process of lexical choice. The input FD contains linguistic constituents with functional attributes that mark their function in the overall context, such as process and participants. Each constituent includes a special attribute cat, which indicates the syntactic category of its head.

6.3.2 SURGE input of a clause

In SURGE, linguistic constituents are labelled by their thematic role. Nuclear roles refer to the process described by the clause and its participants, and, therefore, depend on the type of process described. Satellite roles are the adverbials that describe where/when/why/how the process happened, and do not depend on the process type. The clause sub-grammar is composed of several orthogonal systems:

• Transitivity system - the ideational system - maps thematic roles into core syntactic roles.

• Voice system - the textual system - handles syntactic alternations that change the order and function of core syntactic roles.

• Mood system - the interpersonal system - handles variations that are affected by the communication goal of the utterance - i.e., interrogative, declarative, imperative clauses - or by its syntactic function (matrix or relative clause).

• Adverbial system - responsible for the ordering of satellite constituents of the clause.

The transitivity system is based on a basic dichotomy of verbs into simple and composite processes. Simple processes can be events (such as material, mental, and verbal) or relations (ascriptive, possessive, temporal, spatial).

cat          clause
process      [type [type], lex [verb], tense [tense], polarity [polarity]]
participants [agent [agent], affected [affected]]

Composite processes involve both an event and a relation, following Fawcett's unified analysis of three-role thematic structures as a causal superposition of two two-role structures sharing a common role [Fawcett, 1987]:

cat          clause
process      [type [type], lex [verb], tense [tense], polarity [polarity]]
participants [agent [agent], affected [1] [affected], possessor [1], possessed [possessed]]

An additional approach to designing inputs for SURGE is to use a lexical process. This approach allows the input to define a process not in terms of transitivity, but through subcategorization. In this approach, based on dependency grammars such as Meaning-Text Theory (MTT) [Mel'cuk and Pertsov, 1987], and following HPSG, a lexical head subcategorizes its constituents (SUBCAT for short) and determines their order.

The input in this case has the following structure:

cat       clause
process   [type lexical, lex [verb], tense [tense], polarity [polarity]]
lex-roles [role1 [role1], role2 [role2]]

We have further elaborated the possible inputs for SURGE to allow syntactic inputs as well:

cat        clause
verb       [lex [verb]]
synt-roles [subject [cat [syn-cat], lex [lex]],
            object  [cat [syn-cat], lex [lex]]]
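The three levels of input specification described in this section can be sketched as plain data structures. The following is an illustrative sketch, not SURGE's concrete syntax: the attribute names follow the FDs in the text, while the example sentence ("the king opened the door") and the concrete feature values are ours:

```python
# The same clause specified at three decreasing levels of abstraction:
# transitivity (thematic roles), lexical process (subcategorized roles),
# and purely syntactic roles.

transitivity_input = {
    "cat": "clause",
    "process": {"type": "material", "lex": "open", "tense": "past"},
    "participants": {"agent": {"lex": "king"}, "affected": {"lex": "door"}},
}

lexical_input = {
    "cat": "clause",
    "process": {"type": "lexical", "lex": "open", "tense": "past"},
    "lex-roles": {"role1": {"lex": "king"}, "role2": {"lex": "door"}},
}

syntactic_input = {
    "cat": "clause",
    "verb": {"lex": "open"},
    "synt-roles": {
        "subject": {"cat": "np", "lex": "king"},
        "object": {"cat": "np", "lex": "door"},
    },
}

# All three describe the same surface sentence.
for spec in (transitivity_input, lexical_input, syntactic_input):
    assert spec["cat"] == "clause"
```

The choice among the three styles depends on how much semantic analysis the content planner has performed before reaching the realizer.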

6.3.3 Main Issues in Hebrew Generation

To date, HUGG is the only syntactic realizer that has been developed for Hebrew generation. One of our objectives is to investigate constraints on the design of the input specification language of a syntactic realization component through a contrastive analysis of the requirements of English and Hebrew. By design, we attempt to keep the input to HUGG as similar as possible to the one we defined in the SURGE syntactic realizer for English [Elhadad, 1992]. We addressed various problems specific to Noun Phrase generation in previous papers [Dahan-Netzer, 1997], [Dahan-Netzer and Elhadad, 1999], [Dahan-Netzer and Elhadad, 1998a]. We have shown that, since a variety of lexical, semantic, syntactic, and pragmatic constraints affect the generation of the construct state (smixut), semantic information must be included in the input to enable paraphrasing when such a construction is not possible. We have also refined the classification of quantifiers and determiners into a new set of determiner, partitive, and quantifier words. In this work, we have pursued the same methodology at the level of the clause.

6.3.4 Hebrew Clause

Hebrew's unmarked order of words in a clause is SVO (Subject-Verb-Object), but word order is relatively free. In addition, subjects are not always explicitly present, and several clause structures do not have any verb. Hebrew verbs are inflected for gender, number, and person, and show agreement with the subject as follows:

• Past: full agreement, except for the third person plural: hen/hem Axlu (they(fem/masc) ate);

• Present: agreement in gender and number;

• Future: full agreement, except for the second and third person plural.

Definite objects are marked with the case marker et.
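The agreement facts above can be summarized as a small table-driven function. This is a sketch of the paradigm as stated in the text; the feature names and encoding are ours:

```python
# Which features a Hebrew verb shares with its subject, by tense.
# Past: full agreement except 3rd person plural (gender collapses: hen/hem Axlu).
# Present: gender and number only (no person agreement).
# Future: full agreement except 2nd and 3rd person plural.

def agreement_features(tense, person, number):
    """Return the set of features on which the verb agrees with its subject."""
    if tense == "present":
        return {"gender", "number"}
    full = {"gender", "number", "person"}
    if tense == "past" and person == 3 and number == "plural":
        return full - {"gender"}
    if tense == "future" and person in (2, 3) and number == "plural":
        return full - {"gender"}
    return full

assert agreement_features("past", 1, "singular") == {"gender", "number", "person"}
assert "gender" not in agreement_features("past", 3, "plural")
assert agreement_features("present", 2, "plural") == {"gender", "number"}
```

A morphological generator can consult such a table to decide which subject features must be copied onto the verb before lookup.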

6.3.5 Subjectless Clauses

There are several cases in which a subject is not explicitly pronounced in Hebrew:

Subject pro-drop is the case where subjects are dropped when they are recoverable. Since Hebrew verbs are inflected for gender, number, and person, and show agreement with the subject, the latter can be omitted in the first and second persons in the past and future tenses.

In imperative clauses, as in English, no explicit subject is pronounced.

General subject - with a plural third person subject: bonym batym hadaSym ba-sxunah ((someone-dropped) is building new houses in the neighborhood).

Raising verbs with a sentential complement: yaZA ba-sof S-lO hiZlaHnu le-hagyaw (it turned out in the end that we did not manage to arrive).

Intransitive Clauses

Intransitive clauses have no objects and tend to have SV order. Unergative verbs are verbs with an agentive subject and express volitional acts: ha-yeladym ZaHaku (the children laughed). The SV order is not mandatory, and VS is possible as well: bA ha-davar ve-ZilZel ba-pawamon (came the postman and-rang the-bell).

Unaccusative verbs have a theme (non-volitional) subject and usually express a change of state. Hebrew allows both SV and VS orders in this case.

6.3.6 Existential, equative, possessive, and attributive clauses

There is a variety of uses of the Hebrew copula, which are integrated into all kinds of relations in the transitivity system:

ha-melex hayah/hu'/yihiyeh semel - ascriptive relation, equative mode (the-king was/is/will-be a-symbol)
ha-melex hu'/[NULL] werom - ascriptive type, attributive mode (the-king is/[dropped] naked)
hayah/yeS/Eyn melex - existential type (there-was/there-is/there-is-no king)
hayah/yeS/Eyn le-yarden melex - possessive type of relation (there-was/there-is/there-is-no to-Jordan a-king)

Existential clauses are characterized by the use of the word yeS ("there is") in the present tense. yeS is considered to be an adverb (for instance, in the Rav-Milim online dictionary), but is mostly treated as a semi-verb (or a verboid). However, since Modern Hebrew allows sentences such as yeS ly Et ha-sfarym (there-is to-me [case-marker] the-books) - i.e., yeS with the objective case marker Et - it is considered to be a verb as well [Henkin, 1994]. In the past and future tenses, the inflected Hebrew copula "to be" is used, i.e., hayah/yihiyeh. The negation of yeS in the present tense is realized by the word Eyn; in the past and future tenses, negation is realized with the word lO and the tensed copula.

Possessives are also expressed with the word yeS, but the possessor is realized as a prepositional phrase with the preposition le-. Agreement is determined by the possessed:

haytah ly Hatulah (there-was to-me a-kitten)
hayu lanu Hatulym (there-were to-us cats)
hayu ly Hatulym (there-were to-me cats)

Order within possessive clauses is flexible and is affected by various factors, such as the definiteness of the possessed NP and the casualness of the speech act:

le-savtA yeS kapryzot (to-grand-ma there-are caprices)
yeS le-savtA savlanut biSvily (there-is to-grand-ma patience for-me)
yeS le-ImA carTysym la-sereT (there-are to-mother tickets for-the-movie)

le-ImA hayu Et ha-carTysym la-sereT (to-mother there-were [case marker] the-tickets for-the-movie)
Eyn la-Hatul te'avon (there-is-no to-the-cat appetite)

The word yeS is also used to express modality, with an infinitive verb as a complement.

In ascriptive clauses, a copula is used as well to mark the relation; however, in the present tense it is realized with a pronoun, which agrees with the subject (the carrier in the attributive mode, or the identified in the equative mode). The word order in attributive clauses depends on the definiteness of the carrier - the subject of the clause:

ha-sereT hayah mewanyen (the-movie was interesting)
hayah sereT mewanyen (was a-movie interesting - it was an interesting movie)
*hayah ha-sereT mewanyen (*was the-movie interesting)
*sereT hayah mewanyen (*a-movie was interesting)

In the present tense, if the carrier is definite, then the unmarked structure is verbless: ha-sereT mewanyen (the-movie interesting - the movie is interesting); agreement of the noun and adjective would otherwise be understood as a noun phrase. In the marked clause, a copula is used (and in the past/future tenses as well): ha-melex hu' werom (the-king he-is naked).

In equative clauses, the unmarked structure is SVO with a copula as the verb; however, there are some cases in the present tense where the copula can be omitted:

ha-morah Selanu hy' rynah (the-teacher of-us she-is Rina - our teacher is Rina)
Samawatem? rynah ha-morah Selanu (heard-you? Rina the-teacher of-us - did you hear? Rina is our teacher)

In summary, Hebrew relations are realized in most cases with a copula, but differ in the type of relation in the present tense and in word order. The distinctions defined in the inputs for SURGE (which correspond to the SFL analysis of simple relational processes) enable the correct realizations.
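The copula and existential facts above lend themselves to a table-driven choice of the realizing word. The following is a simplified sketch under our own encoding (function name, keys, and the decision to return the pronoun copula hu' for marked present-tense ascriptives are ours); transliterations follow the text:

```python
# Choose the word realizing a Hebrew relation, given relation type,
# tense, and polarity:
#  - present existential/possessive: yeS, negated Eyn
#  - past/future: the tensed copula hayah/yihiyeh, negated with lO
#  - present ascriptive/equative: a pronoun copula (or verbless).

def relation_word(rel_type, tense, negated=False):
    if tense in ("past", "future"):
        copula = "hayah" if tense == "past" else "yihiyeh"
        return ("lO " + copula) if negated else copula
    if rel_type in ("existential", "possessive"):
        return "Eyn" if negated else "yeS"
    return "hu'"  # marked present ascriptive/equative (unmarked is verbless)

assert relation_word("existential", "present") == "yeS"
assert relation_word("possessive", "present", negated=True) == "Eyn"
assert relation_word("existential", "past") == "hayah"
assert relation_word("possessive", "future", negated=True) == "lO yihiyeh"
```

Agreement of the chosen copula with the possessed or carrier, and the word-order decisions discussed above, would be handled by separate systems in the grammar.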

Figure 6.1: A fragment of the Hspell database for the word celev (dog)

6.3.7 Morphology

Hebrew morphology is quite complex. Several broad-coverage and robust systems exist that handle morphology: RAV-MILIM,1 a commercial system developed by Yaacov Choueka; AVGAD, which was developed at IBM [Bentur et al., 1992]; and, recently, Hspell, which has appeared as a very useful resource for morphological analysis,2 and which we use in our generator. Hspell was developed neither for the analysis nor for the generation of Hebrew morphology, but as a speller in the IVRIX project - a free open-source project that was initiated to establish Hebrew support in the Linux environment. The developers of Hspell hand-coded a list of approximately 22,000 lexemes, on which they semi-automatically applied inflectional rules, resulting in a list of 444,400 words. In addition, they collected a set of rules for word disassembly in order to identify prepositions, definite markers, and other prefixed words. Hspell's linguistic processing includes general inflection rules and a set of exceptions. During the compilation of Hspell, all possible inflections are generated and stored in files (see Figure 6.1). We have indexed the inflected words and their attributes (root, gender, number, possessive, construct

1http://www.ravmilim.co.il 2http://www.ivrix.org.il/projects/spell-checker/

state for nouns, and additional tense information for verbs) and, in the linearization process, retrieve them on request. This mechanism basically means we use a table-driven morphology generation mechanism.
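The table-driven mechanism amounts to a lookup from a lemma plus a feature bundle to a stored inflected form, with no rules applied at generation time. The sketch below illustrates the idea; the index layout and the sample entries are illustrative, not Hspell's actual data:

```python
# A sketch of table-driven morphology generation: all inflected forms are
# precomputed (by Hspell's compilation) and indexed by (lemma, features);
# the linearizer only performs lookups. Entries here are invented examples
# in the transliteration scheme used in the text.

INFLECTIONS = {
    ("celev", ("noun", "masculine", "singular", "absolute")): "celev",
    ("celev", ("noun", "masculine", "plural", "absolute")): "clavym",
    ("celev", ("noun", "masculine", "plural", "construct")): "calvey",
}

def inflect(lemma, features):
    """Retrieve a stored inflected form, or None if it is not in the table."""
    return INFLECTIONS.get((lemma, features))

assert inflect("celev", ("noun", "masculine", "plural", "absolute")) == "clavym"
assert inflect("celev", ("noun", "feminine", "plural", "absolute")) is None
```

The cost of this design is storage (hundreds of thousands of forms), but lookup is constant-time and sidesteps run-time application of Hebrew's complex inflectional rules.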

6.4 Summary

Natural Language Generation, in general, is the process of mapping communication goals to a surface realization that satisfies those goals [Reiter and Dale, 2000]. In practice, it is used for generating text in human/computer interfaces and for presenting data in a readable manner. An NLG system is traditionally composed of a content planner and a surface realizer. This work is mostly concerned with the surface realizer - the lexical chooser and syntactic realization. We reviewed the field in general, and the FUF/SURGE method in particular. Since our system is planned to generate both Hebrew and English outputs, we expanded the Hebrew syntactic realizer HUGG to deal with sentences, taking special care with clauses that represent relational processes. We found that the existing input specification formalism used in SURGE is appropriate to cover the wide variation of surface structures observed in Modern Hebrew relational clauses, and obtained an abstraction level for syntactic realization that can be mapped to both Hebrew and English, for noun phrases and for most clauses. In the next chapter, we focus on the lexical knowledge bases that were built for the system - these lexicons function in the basic process of message generation (the ontology and the Bliss lexicon) and in the lexical choice phase.

Chapter 7

Lexical resources

The process of generating text from a telegraphic message (textual or symbolized) relies heavily on lexical information, whether the telegraphic message is parsed and re-generated, as in the Compansion system [McCoy et al., 1998], or handled with the tactical approach we have taken in the semantic authoring method. The lexical knowledge encoded for this system is the heart of the system. We compiled three lexicons for this work:1

1. A Blissymbols lexicon

2. An ontology and a lexical chooser

3. A large-scale, reusable verb lexicon for text generation (joint work with Hongyan Jing, Michael Elhadad, and Kathleen McKeown) [Jing et al., 2000].

Each lexicon in this list is intended for a different layer in the system architecture (see Figure 5.2), but all three are interrelated by means of origin and representation, and complement each other in the overall knowledge acquired, resulting in a system with rich lexical information.

The Bliss lexicon was designed for the symbols to be presented in our AAC display. It was designed in a way that considers the unique characteristics of Bliss, and although it explicitly contains only the concepts/words, the categories they belong to, their part of speech, and their graphical presentation, the connections on which they are based provide semantic information as well. The

1We include here the ontology since its content is derived from lexical sources (WordNet, VerbNet)

Bliss lexicon can also be used as a stand-alone Web application, and is used as a basis for an editor of the type of "Writing with Symbols"©.2 Section 7.2.1 presents the Blissymbol language and Section 7.2.2 describes the Bliss lexicon.

The ontology is the backbone of the semantic authoring process; it provides a hierarchical structure for the concepts, relations, and properties that are used in the process (in our case, the words represented by the Bliss symbols: verbs, nouns, adjectives, and adverbs), and contains information on the concepts/words such as synonyms, their parent in the hierarchy, and required relations if such exist. The knowledge acquisition for the ontology originated from WordNet and VerbNet. Section 7.3 describes this process.

The lexical chooser (Section 7.4) is partially hand-coded (for nouns and adjectives) and partially automatically built. It includes specific knowledge on the syntactic characteristics of the words, such as gender, countability, and subcategorization. Much effort was put into the verbs lexicon. A large-scale and reusable lexicon of verbs, which draws on information from various lexical resources such as WordNet, Levin's verb classes, and ComLex, enables an input specification that contains a verb and a list of arguments. The possible alternations for each verb are given, together with shallow information on selectional restrictions. Each alternation is mapped onto a set of corresponding sentence structures (called structs) and, accordingly, onto SURGE inputs.

This chapter first gives general background on the use of lexicons in NLG, then presents the lexicons that were compiled for our system (the Blissymbols lexicon in Section 7.2.2, the ontology in Section 7.3, and the verbs lexicon in Section 7.4). In each section we describe the lexical sources that were used for their construction.

7.1 Lexicons in NLG

Contemporary grammatical theories are becoming more lexically driven, recognizing that lexical knowledge and lexical semantics play a central role in the overall structure of an utterance, acting as the interface between meaning and form (concepts and syntax) [Faber and Usón, 1999]. In general, the link between the conceptual structure and the syntactic function is called the linking theory. There are three main approaches to the lexical function [Faber and Usón, 1999]:

1. Role-centered approach (Government Binding Theory for instance): in this approach, a set of

2http://www.widgit.com/products/wws2000/

thematic roles is considered to capture the generalizations concerning the relation between syntax and semantics (i.e., which thematic role can be realized as which syntactic function).

2. Predicate-centered approach (Levin's verb classification, for example): predicates are composed of a set of primitive elements. Thematic roles depend on the primitives and on the event structure of the word, and words are arranged in classes accordingly. The clause structure is determined by the composition of the primitives and the eventuality.

3. Constructionist approach: states of affairs are classified into states, events, actions – and these determine the thematic roles and overall structure of the clause.

Whichever approach is taken, in an NLG system the function of the lexicon is to mediate between meaning and form. The lexicon must be adjusted both to the tactics of the syntactic realizer (i.e., the level of abstraction of its input specification) and to the meaning representation. In practice, the vocabulary and its coding depend strongly on the domain of the system realized, since most systems are domain-specific; broad-coverage lexicons that could be adjusted to both sides (meaning and form) are not available. Lexicons are usually hand-coded with the specific senses of words in the system's context. Since our system is not domain-dependent and has a relatively rich vocabulary (approximately 2,200 words, the vocabulary found in the Blissymbols lexicons we use), we need to identify available lexical sources that can serve semantic authoring, which relies heavily on the thematic structure (the predicate-centered approach), but can easily be transformed into the constructionist approach of the input specification of SURGE (the transitivity system). We have achieved this objective by using existing lexical knowledge bases to construct a robust, reusable lexicon for generation and by adjusting it to the input specification of SURGE/HUGG. In the next sections, we survey the existing lexical knowledge sources that have provided us with the required knowledge: Levin's verb classes [Levin, 1993], WordNet, VerbNet, FrameNet, and ComLex.

7.1.1 Levin’s verb classes

Levin [Levin, 1993], in her influential work, has sorted English verbs into classes which share common syntactic structure. Levin showed that there is a very strong connection between the meaning

of the verb and the possible alternations it allows in a clause. For example, consider the Substance/Source Alternation:

1a. Heat radiates from the sun.

1b. The sun radiates heat.

This alternation is possible only for verbs of substance emission, which take two arguments: a source and a substance emitted by it. The subject of the intransitive form (1a above) has the same semantic relation to the verb as the object of the transitive form (1b above). These two arguments must be expressed in both transitive and intransitive uses. The source is expressed as the subject in the clause with the transitive occurrence of the verb, and as the object of the preposition from in its intransitive use. Levin defined 80 such semantic classes, listing all verbs of each class, the semantic constraints of each, exceptions, and other idiosyncrasies. From an NLG perspective, this knowledge is very useful as part of the transitivity system of the syntactic realizer, in mapping the various semantic classes to possible syntactic structures.
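From a generation standpoint, a class entry of this kind lets a single argument structure drive both surface variants. The sketch below illustrates this for the Substance/Source Alternation using naive string templates (the function and its format are ours, not Levin's notation, and the templates only handle regular third-person singular verbs):

```python
# One argument structure (source, substance) realized as either variant of
# the Substance/Source Alternation for a substance-emission verb.

def realize(verb, source, substance, transitive):
    if transitive:
        # source as subject, substance as direct object
        return f"The {source} {verb}s {substance}."
    # substance as subject, source as object of "from"
    return f"{substance.capitalize()} {verb}s from the {source}."

assert realize("radiate", "sun", "heat", transitive=True) == "The sun radiates heat."
assert realize("radiate", "sun", "heat", transitive=False) == "Heat radiates from the sun."
```

A realizer with access to the verb's class can thus offer both paraphrases of sentences 1a/1b from the same input, choosing between them on textual grounds.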

7.1.2 Online Resources

From a computational point of view, there are five main categories of knowledge that should be included in a Lexical Knowledge Base (LKB) [Faber and Usón, 1999]:

1. Phonological information (e.g., sound system, intonation, stress).

2. Morphological information (part of speech, irregularities).

3. Syntactic information (subcategorization).

4. Semantic information (selectional restrictions, relationships with other words).

5. Pragmatic information (casualness, communicative intentions, register, and genre).

Choosing a representation of the lexical knowledge is a crucial step in the construction of an NLG system in general and in our work in particular. The choice must consider:

1. availability,

2. adjustability,

3. reusability,

4. multilinguality.

To date, there are several online reusable lexicons (mostly of verbs) that are used in NLP research. Not all lexicons contain all of the information specified above, and many lexicons are structured in an application-driven manner, i.e., they contain only the words and information necessary for a particular application. The most widely used lexicons are WordNet, which includes nouns, adverbs, adjectives, and verbs, and several verb lexicons: FrameNet, VerbNet, and ComLex. Verb lexicons are particularly important since they link the semantic content of the concepts that have to be realized with the syntactic structure that determines subcategorization and, therefore, the sentence structure.

WordNet

WordNet ([Miller et al., 1990], [Miller, 1995])3 is an online lexical database which includes (in version 2.1) 11,488 verbs, 117,097 nouns, 22,141 adjectives, and 4,601 adverbs of English. Each entry in WordNet includes a list of synonyms (a synset), a gloss, and some examples of usage. Word entries are determined according to orthography and, therefore, different senses (as in the case of bank or table) are enumerated and may belong to different synsets. The strength of WordNet lies in the fact that words and synsets are interconnected by additional lexical-semantic relations, such as hyponymy (the subclass relation) and hypernymy (the superclass relation) between synsets, and antonymy (opposites) between words. For verbs, there are two main additional relations: troponymy (from events to their subtypes) and entailment (from events to the events they entail). In this way, WordNet forms a semantic net of synsets, and each synset actually represents a semantic concept. The hyponymy relation (such as the relationship between pear and fruit) is transitive and forms a hierarchy with a single supertype (entity). This overall structure is very useful for knowledge acquisition.

3http://wordnet.princeton.edu/

5 senses of girl

Sense 1
girl, miss, missy, young lady, young woman, fille -- (a young woman; "a young lady of 18")
  => woman, adult female -- (an adult female person (as opposed to a man); "the woman kept house while the man hunted")

Sense 2
female child, girl, little girl -- (a youthful female person; "the baby was a girl"; "the girls were just learning to ride a tricycle")
  => female, female person -- (a person who belongs to the sex that can have babies)

Sense 3
daughter, girl -- (a female human offspring; "her daughter cared for her in her old age")
  => female offspring -- (a child who is female)

Sense 4
girlfriend, girl, lady friend -- (a girl or young woman with whom a man is romantically involved; "his girlfriend kicked him out")
  => woman, adult female -- (an adult female person (as opposed to a man); "the woman kept house while the man hunted")
  => lover -- (a person who loves or is loved)

Sense 5
girl -- (a friendly informal reference to a grown woman; "Mrs. Smith was just one of the girls")
  => woman, adult female -- (an adult female person (as opposed to a man); "the woman kept house while the man hunted")

Figure 7.1: Wordnet entry for the word girl

A similar lexicon is being developed for European languages (MultiWordNet [Pianta et al., 2002]), as well as for Hebrew [Ordan and Wintner, 2005].
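The transitivity of hyponymy, which makes the hierarchy so useful for knowledge acquisition, can be illustrated with a toy walk up hypernym links. The mini graph below is hand-coded for the pear/fruit example from the text, not extracted from WordNet (whose actual chain contains more intermediate synsets):

```python
# Walking hypernym (superclass) links upward in a toy WordNet-like hierarchy
# always terminates at the supertype "entity".

HYPERNYM = {
    "pear": "fruit",
    "fruit": "food",
    "food": "entity",
}

def hypernym_chain(word):
    """Return the chain of hypernyms from word up to the hierarchy's root."""
    chain = [word]
    while chain[-1] in HYPERNYM:
        chain.append(HYPERNYM[chain[-1]])
    return chain

assert hypernym_chain("pear") == ["pear", "fruit", "food", "entity"]
assert hypernym_chain("pear")[-1] == "entity"
```

Because the relation is transitive, any property asserted of a hypernym (e.g., fruit is edible) can be inherited by all of its hyponyms by following these chains, which is how we exploit the hierarchy for ontology acquisition.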

VerbNet

VerbNet [Kipper et al., 2000]4 is a verb lexicon compatible with WordNet and enriched with additional semantic and syntactic information, mainly derived from Levin's verb classes [Levin, 1993]. This knowledge connects the semantic and thematic information of the verbs with their syntactic structure and selectional restrictions. The syntactic information is coded in the Lexicalized Tree Adjoining Grammar (LTAG) formalism [Schabes et al., 1988], and the information is further expanded with knowledge about the eventuality structure of each verb. Each sense of a verb in VerbNet refers to a particular class. Selectional restrictions are explicitly assigned, as well as additional semantic characterizations when these are not captured by the verb's class. The additional semantic information refers to the eventuality of the verbs (i.e., whether the predicate is true in the preparatory, culmination, or consequent stage of an event).

4http://www.cis.upenn.edu/group/verbnet/

build-26.1-1
WordNet Senses: make(6 11 12 17 28 32 33 39)
Thematic Roles:
  Agent[+animate OR +machine]
  Asset[+currency]
  Beneficiary[+animate OR +organization]
  Material[+concrete]
  Product[+concrete]
Frames:
  Basic Transitive
  Benefactive Alternation (double object)
  Benefactive Alternation (for variant)
  Material/Product Alternation Transitive (Material Object)
  Material/Product Alternation Transitive (Product Object)
  Raw Material Subject Alternation ()
  Sum of Money Subject Alternation (Agent Subject)
  Sum of Money Subject Alternation (Asset Subject)
  Unspecified Object Alternation
Verbs in same (sub)class: [build, carve, cut, make, sculpt, shape]

watch
WordNet Senses: watch(1 2 3 4 5 6)
Thematic Roles:
  Experiencer[+animate]
  Stimulus[]
Frames:
  Basic Transitive () "The crew spotted the island" Experiencer V Stimulus
Verbs in same (sub)class: [descry, discover, espy, examine, eye, glimpse, inspect, investigate, note, observe, overhear, perceive, recognize, regard, savor, scan, scent, scrutinize, sight, spot, spy, study, survey, view, watch, witness, sniff]

Figure 7.2: VerbNet entries for make - build-26.1 and watch

FrameNet

FrameNet [Baker et al., 1998]5 is an online lexicon for English developed at the University of California, Berkeley, based on frame semantics and supported by corpus evidence. As of October 2005, it contains about 8,900 lexical units, with about 6,100 of them fully annotated, and 625 semantic frames exemplified in about 135,000 annotated sentences. A FrameNet entry lists every set of arguments a word can take, including the possible sets of thematic roles, syntactic phrase types, and their grammatical functions. A lexical unit is a pair of a word and a sense. Each sense of a word belongs to a different semantic frame: a structure that describes a particular type of event, object, or situation, and its possible participants if the word is predicating. For instance, the Apply heat frame describes a common situation involving a Cook, Food, and a Heating Instrument, and is evoked by words such as bake, blanch, boil, broil, brown, simmer, steam. The roles of semantic frames are called frame elements, and they usually describe syntactic dependents of a word [Ruppenhofer et al., 2005]. Relations between words are expressed by several relations defined on frames: Inheritance (IsA), Using (for instance, the Speed frame uses the Motion frame), and Subframe (e.g.,

5http://framenet.icsi.berkeley.edu/

abstain.v
Frame: Forgoing
Definition (COD): restrain oneself from doing something
Frame Elements and their Syntactic Realizations:
  Desirable (15): PP[from].Dep (10), PP[on].Dep (2), PPing[from].Dep (3)
  Forgoer (15): NP.Ext (15)
Valence Patterns (15 total):
  (10) Desirable: PP[from] Dep; Forgoer: NP Ext
  (2)  Desirable: PP[on] Dep; Forgoer: NP Ext
  (3)  Desirable: PPing[from] Dep; Forgoer: NP Ext

Figure 7.3: FrameNet entry of the verb abstain

the Criminal process frame has subframes of Arrest, Arraignment, Trial, and Sentencing).

ComLex

ComLex [Macleod and Grishman, 1995]6 is an English computational lexicon developed at New York University, which contains approximately 36,000 lexical items (21,000 nouns, 8,000 adjectives, and 6,000 verbs). Each entry is organized as a nested typed feature-value list, with a predefined set of possible features and complements. Each entry contains morphological data and subcategorization for predicate words. Subcategorization is marked with only syntactic features, such as the complement phrase type and control features (e.g., NP, NP-PP).

(verb :orth "abstain"
      :subc ((intrans)
             (pp :pval (("from")))
             (p-ing-sc :pval (("from")))))

Figure 7.4: ComLex entry of the verb abstain

7.1.3 Choice of Lexical Sources

In our implementation, we have used Levin's verb classes and ComLex (in the verbs lexicon for generation); WordNet was used both for the ontology and for the verbs lexicon. We have also mapped each Bliss symbol in the Bliss lexicon to WordNet senses. This is a somewhat problematic decision, since it can narrow the variety of meanings that a Bliss symbol may represent. VerbNet was used for the ontology, since it refers to both lexical sources used in the verbs lexicon: Levin's alternations and WordNet senses. We have not used FrameNet, since it refers to neither WordNet nor Levin, which made it impractical for our application.

6http://nlp.cs.nyu.edu/comlex/index.html

Figure 7.5: Hebrew and Bliss Medical Words

7.2 Bliss Lexicon

The Bliss lexicon provides the list of Bliss symbols accessible to the user, along with their graphic representation, semantic information, and the mapping of symbols to English and Hebrew words. Bliss is constructed to be a written-only language, with basically non-arbitrary symbols. The form of Bliss symbols is rooted in their meaning in an iconic manner [Ducrot and Todorov, 1983]. Because words are structured from semantic components, the graphic representation by itself provides information on the connectivity of words. For example, the written form of the words in Figure 7.5 indicates nothing about their meaning (for non-Hebrew readers) or their semantic relatedness. In contrast, the Bliss forms of the words suggest a possible meaning connection (in this example: doctor, nurse, and hospital). The next section provides a thorough description of the language. Section 7.2.2 describes the implementation of the Bliss lexicon, which is the basis for the graphic representation and for the vocabulary of the communication board.

7.2.1 Overview of Blissymbolics

Blissymbols7 is an iconic language founded as a written language by Charles K. Bliss and adopted in the 1970s for communication by non-speaking children. Although located low on the transparency scale of symbols, we have chosen to implement our system with Blissymbols for various reasons. While it is not used as much as PCS, for instance, people who use it regard it not as a set of symbols, but as a language. An Israeli user claims “I speak

7I use the terms Blissymbols and Bliss (for short) interchangeably.

three languages: Hebrew, English, and Bliss” [Nir, 2005]. From a linguistic point of view, Bliss is a challenging language. Its semantic structure is appealing and provides a useful basis for structuring a computerized lexicon for the process of natural language generation. In addition, the lack of up-to-date software for Bliss in Hebrew without doubt affects the number of Bliss users in Israel.

The History of Bliss

Blissymbolics (Bliss for short) is a graphic, meaning-referenced language created by Charles Bliss to be used as a written language. It was first published in 1949 and elaborated later, in 1965, in his book Semantography [Bliss, 1965]. Bliss, a survivor of the Holocaust, was influenced by the Chinese orthographic system and by his life experience, and wished to establish an understandable written language that could be used by people of different nations and languages, as he believed that linguistic misunderstanding is a main cause of wars in the world. In 1971, the Bliss symbol system was first used for communication with severely language-impaired children, when the staff of the Ontario Crippled Children’s Center (OCCC) realized that a set of symbols more abstract than pictures would enable non-speaking children to communicate more effectively. Shirley McNaughton of the OCCC found out about Blissymbols and the center adopted them; new symbols were specially developed, since many words used by the children were missing from the language. Charles Bliss visited the OCCC in 1972 and helped to improve and revise the new symbols. Ever since, it has been widely used for communication with children who cannot (yet) learn to read sound-referenced words [McDonald, 1982]. Bliss usage is standardized, and new symbols are added by an international committee of the BCI (Blissymbolics Communication International). The authority of the BCI is based on its usage since 1971, through legal agreements with Charles K. Bliss [(BCI), 2004]. Bliss is used in more than 33 countries worldwide and has been translated into 17 languages [Beukelman and Mirenda, 1998]. Bliss symbols can be used with one of three approaches [Hunnicutt, 1986]:

1. The telegraphic style – word-for-word, with no morphological or syntactic analysis.

2. Bliss syntax – the original intention of Charles Bliss was to make the language as simple as

possible: (I) SVO order; (II) the negative marker is placed before the verb; (III) modifiers precede the modified word; (IV) a question mark is placed as the first symbol of a sentence, the rest of which is as in a declarative sentence; (V) exclamations are prefaced with an exclamation mark; and, finally, (VI) place and time are located at the beginning of a sentence (place first, then time).

3. Natural spoken language syntax - following the language’s syntactic rules.

In most cases, the decision was to adopt the spoken language syntax with Bliss symbols, since this assists literacy skills and eases the later acquisition of reading and writing. In the adaptation of Bliss to Hebrew, the decision was to write Bliss right-to-left, like the written form of the spoken language. This forced not only writing a sequence in the opposite direction, but also changing the direction of the symbols. However, not all symbols were mirrored, and this lack of uniformity caused problems in attempts to adapt Bliss software to Hebrew. Most commonly, Bliss symbols were used on cardboard displays, but [Waller and Jack, 2002] point to two kinds of electronic devices recently used by Bliss symbol users: dedicated devices such as Dynavox, and Bliss communication board software such as WinBliss or Bliss For Windows with Clicker. However, none of these devices generates full sentences; they only pronounce the names of the symbols.

The Bliss Language

Bliss was designed as a “complete pictorial symbol language” [McDonald, 1982]. Bliss symbols are meaning-referenced (as opposed to the sound-referenced symbols of spoken language). Each symbol represents a thing, an action, an evaluation, or an abstract meaning. Symbols are composed from a relatively small number of atomic symbols (“symbol elements”) of several types (see Figure 7.6). Following BCI’s publication on the fundamental rules of Bliss [(BCI), 2004], there are two main types of Bliss symbols: Bliss-characters, which are the building blocks of the language and are indivisible (such as book or medical), and Bliss-words, which can be a Bliss-character with a particular meaning or a sequence of Bliss-characters separated from each other by a Blissymbolic quarter space (Bliss-words are separated from each other by a Blissymbolic full space, or a half-space away from punctuation).

Figure 7.6: Example of Bliss symbol types

It is important to note that there are no font or character variations (such as italic or sans-serif), since the meaning of a symbol can change with such small variations. The traditional types of Bliss symbol words [(BCI), 2004] [McDonald, 1982] are:

Arbitrary symbols – symbols with no pictorial relationship between form and meaning (such as a/an, the, that, the digits 1, 2, .. and mathematical signs +, -, ×). Some of the arbitrary symbols that Bliss invented were rationalized: for example, action is reminiscent of a volcano shape.

Ideographs – symbols that create a graphic association between the symbol and the concept it represents (such as before, after, in, on, down, up).

Pictographs – symbols whose drawing resembles what they intend to symbolize (such as house, animal, flag) and usually refer to concrete objects.

Compound Symbols – groups of symbols arranged to represent objects or ideas (such as home, happy, angry, sad, school, university, teacher).

Figure 7.7: Usages of Pointers for Meaning Selection

The meaning of a symbol depends on four main parameters: its shape, size, position, and configuration, the last composed of orientation and spacing. All are relative to a square with a grid (which can be of any size). The base of the square is the earthline and its top is the skyline. Each symbol can appear in three sizes: full size, half size, and quarter size. Size changes meaning, as in the case of a circle: the full-size circle represents sun and the half-size one a mouth (see Figure 7.9.A). Position is also meaningful, as in the case of belongs to, and/also, and with. The configuration of symbols consists of direction (forward, backward, down, up, for instance) and spacing (far, near, high, low). An important ideograph is the pointer, which is part of a symbol and is used to point to a specific attribute of the whole meaning, i.e., a selector; for instance body, chest, waist, crotch, shoulder (see Figure 7.7). Symbols may be grouped together to form meanings. The two main modes of grouping are superimposing symbols (wheelchair, rain) and sequential positioning, either separated or touching (aunt, school). Indicators are special symbols that Charles Bliss invented to mark certain qualities of the represented words, aimed at reducing possible grammatical ambiguity. Although indicators can be identified as part-of-speech markers, they were not intended to be interpreted as such. Indicators are symbols of quarter size located above the skyline of the square.

Thing Indicator refers to a chemical thing, as Charles Bliss defined it: an object that can be seen, touched, or weighed, i.e., the symbol corresponds to a concrete noun. In practice, the thing indicator is not required unless it is essential to distinguish the symbol from a competing abstract noun (time vs. clock).

Action Indicator - a quarter-sized action symbol indicating actions taking place in the present

Figure 7.8: Example: mind, minds, brain, thoughtful, think, thought, will think.

(i.e., these symbols correspond to verbs of activity).

Past Action Indicator – a quarter-sized past symbol.

Future Action Indicator – a quarter-sized future symbol.

Description (evaluation) Indicator – evaluations or judgments of qualities (that may change in time).

Plural Indicator – a quarter-sized multiplication symbol to indicate a plural number of things.

Figure 7.8 exemplifies the change of meaning through the use of indicators. Several symbols function as modifiers and are prefixed or suffixed to the meaning-carrying symbols. Such symbols are the multiplier, which is used for augmentation, the opposite symbol, and the intensifier (see Figure 7.9.B). Indicators are not placed above modifiers. There is not necessarily a one-to-one mapping between symbols and words; a symbol may have more than one meaning, depending on context. For instance, the meaning of the symbol to speak may also be to say, to tell, to narrate, to talk, to report.

7.2.2 The Design of the Bliss Lexicon

We have designed and implemented a lexicon of Bliss-Hebrew-English words that takes into account the special characteristics of the Bliss symbol language. The lexicon can be searched by keyword (doctor) or by semantic/graphic component: searching for all words in the lexicon that contain both person and medical returns the symbols aiding tool, artificial insemination, dentist, doctor, nurse, etc. The design of the lexicon enables easy manipulation of the symbols (graphical editing, adding new synonyms) and an easy way to insert new symbols (by combining existing symbols or by drawing a new one). It contains both Hebrew and English words and adjusts the representation according to the language. In addition to its structure and its component symbols, each word in the lexicon is assigned one or more domain tags, ordered in a hierarchy. This addition was required for a more efficient structuring of the communication board itself: if a user chooses the school context, the system uses the subset of words labelled with the school tag in its dynamic presentation of symbols. A somewhat similar approach was taken in the development of BlissWord [Andreasen et al., 1998], where symbols are represented in a picture format augmented with the symbol’s name, synonyms, ISO number, basic shapes (e.g., wavy line), key components (e.g., water), indicators, and categorization (e.g., Quick → being alive → things we do → Moving and Staying still → Moving). A symbol can be retrieved by specifying shapes or components contained in the desired symbol, by searching the hierarchy of symbol categories, or by a combination of these, as well as by the English reference or from the most frequent symbols list. When Bliss symbols were adapted to Hebrew, it was decided that they would be presented from right to left, a decision that is valid for most but not all symbols. This forced us to add a marker to each symbol to indicate whether it has to be drawn mirrored. Figure 7.10 shows the diversity in representation: sequences may be mirrored, but so may superimposed symbols and atoms.

Figure 7.9: Semantic modifiers: much, intensifier, opposite.
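The component-based search and the domain tags described above can be sketched as follows. This is a minimal illustration only, not the thesis implementation: the data model, the entries, and the Hebrew transliterations are invented for the example.

```python
# Hypothetical data model for the Bliss lexicon: each entry records the
# atomic Bliss-characters the symbol is built from and its domain tags.

from dataclasses import dataclass, field

@dataclass
class BlissEntry:
    english: str
    hebrew: str               # transliteration, for illustration only
    components: frozenset     # atomic Bliss-characters composing the symbol
    domains: set = field(default_factory=set)  # hierarchy tags, e.g. "school"

LEXICON = [
    BlissEntry("doctor", "rofe", frozenset({"person", "medical"}), {"health"}),
    BlissEntry("nurse", "achot", frozenset({"person", "medical"}), {"health"}),
    BlissEntry("hospital", "beit-cholim", frozenset({"building", "medical"}), {"health"}),
    BlissEntry("teacher", "moreh", frozenset({"person", "give", "knowledge"}), {"school"}),
]

def by_components(*components):
    """All entries whose symbol contains every requested component."""
    wanted = set(components)
    return [e.english for e in LEXICON if wanted <= e.components]

def by_domain(tag):
    """The vocabulary subset shown when the user picks a context."""
    return [e.english for e in LEXICON if tag in e.domains]

print(by_components("person", "medical"))  # ['doctor', 'nurse']
print(by_domain("school"))                 # ['teacher']
```

The same component sets double as a description of the symbol's graphic structure, which is what makes the graphic/semantic search described above possible with a single index.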

Figure 7.10: Hebrew vs. English Representation of Symbols

The Bliss lexicon is available as a Web application: users can connect to the site and search for words by drawing parts of a symbol or by their English or Hebrew translation, and group words according to topic. It is also possible to insert new words, including drawing the Bliss symbol.8 (See Figure 7.12.)

7.2.3 Bliss Lexicon Software Development

Since the Bliss lexicon includes graphic symbols and we decided to make it available online as a Web application, its development required specific attention. The Bliss lexicon library9 is written in Java 1.5; the front-end is implemented using JSP and applets. The visualization of symbols is done using SVG, an XML-based language for the description of vector graphics. The mappings of symbols to words in natural language and back to symbols, the basic relations between symbols, and the visual representation of basic symbols are stored in XML files. The more complex relations between symbols and the visual representation of complex symbols are inferred programmatically.
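As an illustration of SVG-based visualization, the following sketch composes an SVG document for a symbol sequence, with a flag for the Hebrew right-to-left mirroring discussed earlier. The atom shapes, names, and spacing values are invented for the example; the actual library is written in Java and infers complex representations from the stored XML.

```python
# Illustrative sketch: building an SVG document for a Bliss symbol sequence
# from stored atom shapes, optionally reversing the sequence for Hebrew.

import xml.etree.ElementTree as ET

ATOMS = {  # hypothetical stored shapes of atomic symbols
    "person": '<path d="M10 30 L10 10 M5 15 L15 15"/>',
    "medical": '<path d="M5 20 L15 20 M10 15 L10 25"/>',
}

def render_sequence(atom_names, mirror=False, quarter_space=5, width=20):
    """Return an SVG string; mirror=True reverses the drawing order."""
    svg = ET.Element("svg", xmlns="http://www.w3.org/2000/svg")
    order = list(reversed(atom_names)) if mirror else list(atom_names)
    x = 0
    for name in order:
        group = ET.SubElement(svg, "g", transform=f"translate({x},0)")
        group.append(ET.fromstring(ATOMS[name]))
        x += width + quarter_space  # Bliss-characters separated by a quarter space
    return ET.tostring(svg, encoding="unicode")

print(render_sequence(["person", "medical"]))
print(render_sequence(["person", "medical"], mirror=True))
```

A per-symbol mirror marker, as described in Section 7.2.2, would simply decide whether `mirror` is set when the symbol is drawn for Hebrew.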

8The lexicon is available at http://www.cs.bgu.ac.il/∼bliss 9The Bliss lexicon was implemented by Yoav Goldberg, Department of Computer Science, Ben-Gurion University.

Figure 7.11: Hierarchy of Bliss Objects

The vocabulary of our lexicon contains approximately 2,200 symbols, as found in the Hebrew and English Bliss lexicons [Shalit et al., 1992] [Hehner, 1980]. For the implementation, all symbols in the Bliss lexicon were entered into a database according to their structure: symbols are either atoms (ideographs), pictographs, superimposed symbols, or a sequence of symbols or symbols touching each other. The database was then checked for coherency and revised accordingly. In the written lexicons (for both Hebrew and English), symbols are represented and interpreted by their components. However, not all components are present in the lexicon, and these had to be added to preserve coherency. Each symbol was implemented as an object with a unique ID. All symbols (whether atoms or composite) can be manipulated in the same manner. The object includes information about the graphic representation (but not the graphics itself), information about the Hebrew and English words, and relatedness to other symbols. Visualization is done by a separate module.

7.3 Using Lexical Resources for the System Lexical Chooser

For the acquisition of the concepts/relations database, we use two main sources: VerbNet [Kipper et al., 2000] and WordNet [Miller et al., 1990]. WordNet was chosen as the source of information for the concepts that are linguistically realized as nouns, adjectives, and adverbs, since it provides hierarchy information (using the hypernym relations of synsets). WordNet’s information on verbs was enriched with VerbNet’s data. VerbNet was chosen as the source of knowledge for the realization of processes for the following reasons:

Figure 7.12: A snapshot of the Bliss Lexicon Web Application

• It refers both to WordNet senses of verbs and to Levin’s alternations. These two sources of information are easily mapped into the form of the lexical chooser we wanted to implement.

• Thematic roles – the coding of selectional restrictions in VerbNet relies on a feature hierarchy where, for instance, animate subsumes animal and human, and concrete subsumes both animate and inanimate. This description of selectional restrictions fits the concept of the ontology as we constructed it.

We use the information in WordNet and VerbNet to bootstrap the concept and relation hierarchies. For each word of the Bliss lexicon, we manually choose the word’s sense (synset) according to WordNet. For all nouns, we induce the hypernym hierarchy from WordNet, resulting in a tree of concepts, one for each synset appearing in the list of words (see Figure 5.3). In addition to the concept hierarchy, we derive relations among the concepts and predicates by using the VerbNet lexical database. VerbNet supplies information on the conceptual level, in the form of selectional restrictions for the thematic roles (see Figure 5.4). These relations allow us to connect the concepts and relations in the derived ontology to nouns, verbs, and adjectives. The ontology is used as the basis for the CG construction and supplies the selectional restriction information that is needed in the authoring process. The concept hierarchy contains both objects (nouns) and events (verbs). Separate from, but strongly connected to, the ontology, a lexical chooser is structured to include specific lexical information on the concepts that have to be lexicalized: their lexemes and syntactic information (such as subcategorization, and gender for nouns). For verbs, we use the integrated lexicon (see below). Information on nouns is retrieved from WordNet; for Hebrew it is hand-coded.
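The hypernym-based induction step can be sketched as follows. The sense chains here are hard-coded toy stand-ins for the manually selected WordNet synsets; the real system reads the chains from WordNet itself.

```python
# Toy illustration of bootstrapping the concept hierarchy: merge each
# word's hypernym chain into a single tree of concepts.

HYPERNYM_CHAINS = {  # word -> chain from its chosen synset up to the root
    "doctor": ["doctor.n.01", "health_professional.n.01", "person.n.01", "entity.n.01"],
    "nurse": ["nurse.n.01", "health_professional.n.01", "person.n.01", "entity.n.01"],
    "hospital": ["hospital.n.01", "institution.n.01", "entity.n.01"],
}

def build_concept_tree(chains):
    """Map each concept to the set of its direct subconcepts."""
    tree = {}
    for chain in chains.values():
        for child, parent in zip(chain, chain[1:]):
            tree.setdefault(parent, set()).add(child)
    return tree

tree = build_concept_tree(HYPERNYM_CHAINS)
print(sorted(tree["entity.n.01"]))               # ['institution.n.01', 'person.n.01']
print(sorted(tree["health_professional.n.01"]))  # ['doctor.n.01', 'nurse.n.01']
```

Shared hypernyms (here health_professional.n.01) are merged automatically, which is what yields one concept node per synset rather than one per word.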

7.4 Integrating a Large-scale Reusable Lexicon for NLG

The lexicon of an NLG system is a significant component, since it links the semantic content to its final syntactic representation. Verbs determine the clause structure by constraining the arguments: their number, order, and selectional restrictions. Nouns affect the selection of collocational adjectives (e.g., strong tea and not powerful tea, and not strong juice). In most NLG systems, lexical knowledge is hand-coded anew for the specific domains of the applications. We have integrated a large-scale, reusable lexicon with the FUF/SURGE [Elhadad and Robin, 1996] syntactic realization system. The integration of the lexicon with FUF/SURGE has various benefits, including the possibility of accepting semantic input at the level of WordNet synsets, the production of lexical and syntactic paraphrases, the prevention of non-grammatical output, reuse across applications, and wide coverage. Natural language generation starts from semantic concepts and then finds words to realize those concepts. Most existing lexical resources, however, are indexed by words rather than by semantic concepts. Such resources, therefore, cannot be used directly for generation. Moreover, generation needs different types of knowledge, which are typically encoded in different resources, and the different representation formats used by these resources make it impossible to use

them simultaneously in a single system. To overcome these limitations, we built a large-scale, reusable lexicon for generation by combining multiple existing resources: WordNet, Levin’s English Verb Classes and Alternations (EVCA), and COMLEX. In combining these resources, we focused on verbs. The combined lexicon includes rich lexical and syntactic knowledge for 5,676 verbs. It is indexed by WordNet synsets, as required by the generation task. The knowledge in the lexicon includes:

• A complete list of subcategorizations for each sense of a verb.

• A large variety of alternations for each sense of a verb.

• The frequency of lexical items and verb subcategorizations in a version of the Brown corpus tagged with WordNet synsets.

• Rich lexical relations between words.

A sample entry for the verb appear is shown in Figure 7.13. It shows that the verb appear has eight senses (the sense distinctions come from WordNet). For each sense, the lexicon lists all the subcategorizations applicable to that particular sense of the verb. The subcategorizations are represented in the same format as in COMLEX. For each sense, the lexicon also lists the applicable alternations, which we encoded based on the information in EVCA. In addition, for each subcategorization and alternation, the lexicon lists the semantic category constraints on the verb arguments. In the figure, we omitted the frequency information derived from the Brown corpus and the lexical relations (the lexical relations are encoded in WordNet). The construction of the lexicon is semi-automatic. First, COMLEX and EVCA were merged, producing a list of syntactic subcategorizations and alternations for each verb. These syntactic restrictions are differentiated according to each sense of a verb in the second stage, where WordNet is merged with the result of the first step. Finally, corpus information is added, complementing the static resources with actual usage counts for each syntactic pattern. For a detailed description of the combination process, refer to [Jing and McKeown, 1998].
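The two merge stages described above can be sketched on invented toy data. Both the entries and the sense-to-subcategorization mapping below are illustrative stand-ins for the actual COMLEX, EVCA, and WordNet data, not excerpts from them.

```python
# Rough sketch of the semi-automatic merge: stage 1 attaches COMLEX
# subcategorizations and EVCA alternations to each verb; stage 2 splits
# that syntactic information across the verb's WordNet senses.

COMLEX = {"appear": ["PP-TO-INF-RS", "TO-INF-RS", "INTRANS"]}
EVCA = {"appear": ["there-insertion"]}
WORDNET_SENSES = {"appear": {1: "give an impression", 2: "become visible"}}

# hypothetical sense-level filter (in the real process this distinction
# is derived when WordNet is merged with the stage-1 result)
SENSE_SUBCATS = {("appear", 1): {"PP-TO-INF-RS", "TO-INF-RS"},
                 ("appear", 2): {"PP-TO-INF-RS", "INTRANS"}}

def merge(verb):
    syntactic = {"subcats": COMLEX[verb], "alternations": EVCA[verb]}  # stage 1
    entry = {}                                                         # stage 2
    for sense, gloss in WORDNET_SENSES[verb].items():
        valid = SENSE_SUBCATS[(verb, sense)]
        entry[sense] = {"gloss": gloss,
                        "subcats": [s for s in syntactic["subcats"] if s in valid],
                        "alternations": syntactic["alternations"]}
    return entry

print(merge("appear")[2]["subcats"])  # ['PP-TO-INF-RS', 'INTRANS']
```

The result is indexed by sense, mirroring the structure of the entry in Figure 7.13, where each sense carries only the subcategorizations valid for it.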

appear:
  sense 1  give an impression
    ((PP-TO-INF-RS :PVAL ("to") :SO ((sb, −)))
     (TO-INF-RS :SO ((sb, −)))
     (NP-PRED-RS :SO ((sb, −)))
     (ADJP-PRED-RS :SO ((sb, −) (sth, −))))
  sense 2  become visible
    ((PP-TO-INF-RS :PVAL ("to") :SO ((sb, −) (sth, −)))
     ...
     (INTRANS THERE-V-SUBJ :ALT there-insertion :SO ((sb, −) (sth, −))))
  ...
  sense 8  have an outward expression
    ((NP-PRED-RS :SO ((sth, −)))
     (ADJP-PRED-RS :SO ((sb, −) (sth, −))))

Figure 7.13: Lexicon entry for the verb appear

((SENSE 1)
 (RALT STRUCTS-VERB-WATCH-SENSE-1
  (((STRUCT NP-ING-OC)
    (ARGS ((ALT SELECTIONAL-WATCH-1-NP-ING-OC
            (((1 ((ANIMATE YES))) (2 ((ANIMATE NO))))
             ((1 ((ANIMATE YES))) (2 ((ANIMATE YES)))))))))
   ((STRUCT NP-NP-PRED)
    (ARGS ((ALT SELECTIONAL-WATCH-1-NP-NP-PRED
            (((1 ((ANIMATE YES))) (2 ((ANIMATE NO))))
             ((1 ((ANIMATE YES))) (2 ((ANIMATE YES)))))))))
   ((STRUCT NP)
    (ARGS ((ALT SELECTIONAL-WATCH-1-NP
            (((1 ((ANIMATE YES))) (2 ((ANIMATE NO))))
             ((1 ((ANIMATE YES))) (2 ((ANIMATE YES))))))))))))

Figure 7.14: VerbNet make - build-26.1

7.5 Summary

In this chapter we have presented the three lexicons that were constructed for the system: the Bliss lexicon, the ontology derived from lexical resources, and an NLG resource for the lexical choice of English verbs. The common ground for all lexicons is the use of WordNet senses: in the Bliss lexicon, in the verbs lexicon, and in the ontology. Referring to the knowledge that LKBs encode (Section 7.1.2), our system includes morphological information (part of speech, irregularities), syntactic information (subcategorization), and semantic information (selectional restrictions, relationships with other words). We have not yet encoded phonological or pragmatic knowledge. Pragmatics in the system is expressed in the choice of context on the communication board and in the choice of defaults that are later encoded in the SR. The Bliss lexicon contains the graphic information of the symbols and their POS (which is needed for finding the right sense of a word when creating the ontology). The ontology connects meaning to possible syntactic structure, as it controls the possible symbols through selectional constraints and communicates with the lexical chooser. The next chapter presents SAUT, the semantic authoring system, and the communication board that uses SAUT as its processing engine.

Chapter 8

Communication Boards

This chapter describes the overall generation process in the AAC system as a form of semantic authoring. We present the SAUT system as a general prototype for semantic authoring, and its adaptation to a dynamic communication board for Bliss. Section 8.1 first describes SAUT as a general tool. Section 8.2 presents the NLG-AAC communication board, based on the authoring tool; the overall layout of the display is presented.

8.1 The SAUT Semantic Authoring Tool

SAUT [Biller, 2005] [Biller et al., 2005] is an authoring system for logical forms encoded as conceptual graphs (CGs). The system belongs to the family of WYSIWYM (What You See Is What You Mean) [Scott et al., 1998] text generation systems: logical forms are entered interactively, and the corresponding linguistic realization of the expressions is generated in several languages. The system maintains a model of the discourse context corresponding to the authored documents. The user edits a specific document by entering utterances in sequence, while the system maintains a representation of the context. As the user enters data, the system performs the standard steps of text generation on the basis of the authored logical forms: reference planning, aggregation, lexical choice, and syntactic realization, in several languages (we have implemented English and Hebrew, and discuss Bliss below). The feedback in natural language is produced in real time for every modification performed by the author. The architecture of the system is depicted in Figure 8.1.

Figure 8.1: Architecture of the SAUT System

The two key components of the system are the knowledge acquisition system and the editing component. The knowledge acquisition system is used to derive an ontology from sample texts in a specific domain (see Section 7.3). In the editing component, users enter logical expressions on the basis of the ontology.
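The real-time feedback loop can be sketched as a pipeline over the authored logical forms. All stages below are simplistic stand-ins invented for illustration, not the FUF/SURGE implementation; aggregation is omitted for brevity.

```python
# Toy sketch of the generation pipeline that runs after every edit:
# reference planning, then lexical choice, then realization.

def reference_planning(cgs, context):
    # use a pronoun for a participant already present in the context
    return [{**cg, "agent": "he" if cg["agent"] in context else cg["agent"]}
            for cg in cgs]

def lexical_choice(cgs):
    # pick an inflected verb for each process (toy one-entry lexicon)
    return [{**cg, "verb": {"eat": "eats"}[cg["process"]]} for cg in cgs]

def realize(cgs):
    return " ".join(f"{cg['agent'].capitalize()} {cg['verb']} {cg['patient']}."
                    for cg in cgs)

def feedback(cgs, context=()):
    """Re-generate the feedback text from the authored logical forms."""
    return realize(lexical_choice(reference_planning(cgs, context)))

print(feedback([{"process": "eat", "agent": "John", "patient": "bread"}]))
# John eats bread.
```

Passing a non-empty context, e.g. `feedback(..., context=("John",))`, yields the pronominalized "He eats bread.", which is the kind of context-sensitive reference planning the editor performs on every modification.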

8.1.1 Conceptual Graphs

Overview

Conceptual graphs are a logical knowledge representation formalism developed by John Sowa [Sowa, 1984]. CGs are an understandable model that can express natural language utterances, unlike, for instance, first-order logic predicates. CGs enable authors to model linguistic phenomena such as quantification and determination in a formal way. Sowa based his work on the existential graphs of Charles S. Peirce [Roberts, 1973] and on the semantic networks of artificial intelligence [Sowa, 1987]. Conceptual graphs are widely used in various research fields such as information retrieval, NLP, and expert systems. A conceptual graph is a directed bipartite graph with two kinds of nodes:

• Concepts

• Relations

[Cat] -> (On) -> [Mat].

Figure 8.2: Linear representation of a Conceptual Graph

In a graphical representation, concepts are drawn as squares and relations as circles. In a linear representation of a graph, square brackets are used for concepts and curved parentheses for relations.1 Concepts represent objects, events, and abstract entities, while relations represent the relationships among concepts. Concepts and relations are typed; types are taken from an ontology and are structured in a hierarchy. Concept types are ordered in a lattice, with Entity (the supertype, or universal type) at the top and Absurdity (the absurd type) at the bottom. In a conceptual graph, a concept node can represent either an entire class (a type) or a referent to a particular instance of the class. The # symbol represents a definite article, i.e., [Cat:#] means the cat. A node containing only [Cat] means the indefinite a cat and represents a generic type. A node can also contain a referent to a named entity, as in [Cat:Mitzi]. A concept in a graph may contain additional features, and the value of a feature may itself be a conceptual graph. Each relation has a type, which determines its arity,2 expressed as the number of concepts it is connected to. The arcs between the nodes are directed, and the direction is determined by the meaning of the relation. A graph of the form

[Con1] -> (Rel) -> [Con2] is to be read: the Rel of Con1 is Con2 [Mann, 1996]. Monadic relations such as (NOT) are attached to one concept only, but most relations have a larger arity. The type of a relation also determines the types of concepts to which it connects. Sowa [Sowa, 1984] defined the basic operations that form new graphs from existing conceptual graphs: restrict, join, simplify, and copy, as well as higher operations such as projection and unification (maximal join).

1Examples are taken from http://www.jfsowa.com/cg/cgexampw.htm 2Arity is the number of arguments to a term
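A minimal concept node and the restrict operation can be sketched as follows. The tiny type hierarchy is invented for the example; a full CG implementation would also model relations as graph nodes and support join, simplify, and copy.

```python
# Sketch of a typed concept node with Sowa's restrict operation
# (specializing a concept's type to a subtype), plus the linear notation.

SUBTYPES = {"Animal": "Entity", "Cat": "Animal", "Mat": "Entity", "Entity": None}

def is_subtype(t, ancestor):
    """Walk up the toy type hierarchy."""
    while t is not None:
        if t == ancestor:
            return True
        t = SUBTYPES[t]
    return False

class Concept:
    def __init__(self, ctype, referent=None):
        self.ctype, self.referent = ctype, referent
    def restrict(self, new_type):
        """Restrict-by-type: replace the type with one of its subtypes."""
        assert is_subtype(new_type, self.ctype)
        self.ctype = new_type
    def __str__(self):
        return f"[{self.ctype}:{self.referent}]" if self.referent else f"[{self.ctype}]"

def linear(src, rel, dst):
    """Linear notation: square brackets for concepts, parentheses for relations."""
    return f"{src} -> ({rel}) -> {dst}"

c = Concept("Animal")
c.restrict("Cat")          # [Animal] restricted to its subtype [Cat]
print(linear(c, "On", Concept("Mat")))  # [Cat] -> (On) -> [Mat]
```

Restricting to a non-subtype (say, restricting [Mat] to Animal) fails the subtype check, which is exactly the lattice constraint described above.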

8.1.2 Authoring Tools

The input data to an NLG system can either be derived from an existing application database or be authored specifically to produce documents. Applications where the data are available in a database include report generators (e.g., ANA [Kukich, 1983], PlanDoc [Shaw et al., 1994], Multimeteo [Coch, 1998], FOG [Goldberg et al., 1994]). In other cases, researchers identified application domains where some of the data are available, but not in sufficient detail to produce full documents. The WYSIWYM approach was proposed ([Power and Scott, 1998], [Paris and Vander Linden, 1996]) as a system design methodology in which users author and manipulate an underlying logical form through a user interface that provides feedback as natural language text. The effort invested in authoring logical forms, either from scratch or from a partial application ontology, is justified when the logical form can be reused. This is the case when documents must be generated in several languages. When documents must be produced in several versions, adapted to various contexts or users, the flexibility resulting from generation from logical forms is valuable.

WYSIWYM

In an influential series of papers [Power and Scott, 1998], WYSIWYM (What You See Is What You Mean) was proposed as a method for authoring semantic information through direct manipulation of structures rendered in natural language text. A WYSIWYM editor enables the user to edit information at the semantic level. The semantic level is the directly controlled feature, and all lower levels, which are derived from it, are considered presentation features. While editing content, the user gets feedback text and a graphic representation of the semantic network. These representations can be edited interactively, as the visible data is linked back to the underlying knowledge representation. Using this method, a domain expert produces data by editing the data itself in a formal way, using a tool that requires only knowledge of the writer’s natural language. Knowledge editing requires less training, and the natural language feedback strengthens the confidence of users in the validity of the documents they prepare. The semantic authoring system we have developed belongs to the WYSIWYM family. The key aspect of the WYSIWYM method we investigate is the editing of semantic information. Text is generated as feedback for every single editing operation. Specifically, we evaluate how ontological information helps speed up semantic data editing.

Figure 8.3: Snapshot of an editing state in the SAUT system

8.1.3 The SAUT Editor

To describe the SAUT editor, we detail the process of authoring a document using the tool. When the authoring tool is initiated, the following windows are presented (see Figure 8.3):

• Input window

• Global context viewer

• Local context viewer

• CG feedback viewer

• Feedback text viewer

• Generated document viewer

The user operates in the input window. This window includes three panels:

98 • Defaults: rules that are enforced by default on the rest of the document. The defaults can be changed while editing. Defaults specify attribute values which are automatically copied to the authored CGs according to their type.

• Participants: a list of objects to which the document refers. Each participant is described by an instance (or a generic) CG, and given an alias. The system provides an automatic identifier for participants, but these can be changed by the user to a meaningful identifier.

• Utterances: editing information proposition by proposition.

The system provides suggestions to complete expressions according to the context, in the form of pop-up windows. In these suggestion windows, the user can scroll, choose with the mouse, or enter the first letters of the desired word; when the right word is marked by the system, the user can continue, and the word is automatically completed by the system. For example, when creating a new participant, the editor presents a selection window with all the concepts in the ontology that can be instantiated. If the user chooses the concept type ”Dog”, the system creates a new object of type dog with the given identifier. The user can further enrich this object with different properties. This is performed using the ”.” notation to modify a concept with an attribute. While the user enters the instance specification and its initial properties, feedback text and a conceptual graph in linear form are generated simultaneously. When the user moves to the next line, the new object is updated in the global context view. Each object is placed in a folder corresponding to its concept type, and includes its instance name and its description in CG linear form. In the Utterances panel, the author enters propositions involving the objects declared in the Participants section. To create an utterance, the user first specifies the object which is the topic of the utterance. The user can choose one of the participants declared earlier from an identifier list, or choose a concept type from a list; choosing a concept type results in creating a new instance of that concept type. Every instance created in the system is shown in the context viewer. After choosing an initial object, the user can add expressions in order to add information concerning this object. After entering the initial object in an utterance, the user can press the dot key, which indicates that he wants to enrich this object with information.
The system will show the user a list of expressions that can add information to this object. In CG terms, the system will fill the list with items which fall in one of the following three categories:

• Relations that can be created by the system, whose selectional restrictions allow the modified object to be a source for the relation.

• Properties that can be added to the concept object, such as name and quantity.

• Concept types that expect relations, the first of which can connect to the new concept. For example, the concept type "Eat" expects a relation "Agent" and a relation "Patient." The selectional restriction on the destination of "Agent" will be, for example, "Animate". Therefore the concept "Eat" will appear on the list of an object of type "Dog".
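As a rough illustration, the three categories above can be computed from a toy ontology as follows; the concept hierarchy, relations, and properties shown are invented for the example and do not reproduce the actual SAUT ontology:

```python
# Hedged sketch: computing the pop-up list of modifications for an
# active object. The tiny ontology below (IS_A, RELATIONS, PROPERTIES,
# EXPECTED) is an illustrative assumption, not the SAUT data structures.

IS_A = {"Dog": "Animate", "Animate": "Entity", "Eat": "Action"}

def subsumes(general, specific):
    """True if `specific` is `general` or a descendant of it."""
    while specific is not None:
        if specific == general:
            return True
        specific = IS_A.get(specific)
    return False

# Relations with a selectional restriction on their source concept.
RELATIONS = {"Possession": "Animate", "Color": "Entity"}

# Properties any concept object can receive.
PROPERTIES = ["name", "quantity"]

# Concepts whose *first* expected relation can attach to the object.
EXPECTED = {"Eat": [("Agent", "Animate"), ("Patient", "Entity")]}

def modification_list(concept):
    items = [r for r, src in RELATIONS.items() if subsumes(src, concept)]
    items += PROPERTIES
    items += [c for c, rels in EXPECTED.items()
              if rels and subsumes(rels[0][1], concept)]
    return sorted(items)

# "Eat" appears for "Dog" because its first relation, Agent, accepts
# an Animate concept, and a Dog is Animate.
print(modification_list("Dog"))
# ['Color', 'Eat', 'Possession', 'name', 'quantity']
```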

The author can modify and add information to the active object by pressing the dot key. An object which itself modifies a previously entered object can be modified with new relations, properties, and concepts in the same manner. The global context is updated whenever a new instance is created in the utterances. When the author has finished composing the utterance, the system updates the local context and adds this information to the generated natural language document. The comma operator (",") is used to define sets in extensions. For example, in Figure 8.3, the set "salt and pepper" is created by entering the expression #sa,#pe. The set itself becomes an object in the context and is assigned its own identifier. The dot notation combined with named variables allows for easy and intuitive editing of the CG data. In addition, the organization of the document as defaults, participants, and context (local and global) provides an intuitive manner of organizing documents. Propositions, after they are entered as utterances, can also be named, and can therefore become arguments for further propositions. This provides a natural way to cluster large conceptual graphs into smaller chunks. The text generation component proceeds from this information in the following steps:

• Pronouns are generated when possible, using the local and global context information.

• Referring expressions are planned using the competing expressions from the context information, excluding and including information and features of the object in the generated text, so the object's identity can be resolved by the reader, but without adding unnecessary information.

• Aggregation of utterances which share certain features, using the aggregation algorithm described in [Shaw, 1995].

Consider the example cooking recipe in Figure 8.3. The author uses the participants section to introduce the ingredients needed for this recipe. One of the ingredients is "six large eggs". The author first chooses an identifier name for the eggs, for example, "eg". From the initial list of concept types proposed by the system, we choose the concept type "egg". Pressing the dot key indicates that we want to provide the system with further information about the newly created object. We choose "quantity" from a given list by typing "qu" and seeing that the word "quantity" is automatically marked in the list. Pressing the space key automatically opens brackets, which indicates that we have to provide the system with an argument. A tool-tip text pops up to explain the function of the required argument to the user. After entering a number, we hit the space bar to indicate we have no more information to supply about the "quantity"; the brackets are automatically closed. After the system has been told that no more modifications will be made to the quantity, the "egg" object becomes the active one again. The system marks the active object at any given time by underlining the related word in the input text.

Pressing the dot causes the list box to pop up with the possible modifications for the object. We now choose "attribute". Again the system opens brackets, and a list of possible concepts appears. The current active node in the graph is "attribute". Among the possible concepts we choose the "big" concept, and continue by pressing the enter key (the lexical chooser will map the "big" concept to the collocation "large", appropriate for "eggs"). A new folder titled "egg" is added to the global context view, containing the new instance with its identifier and its description as a CG in linear form.
Each time a dot or an identifier is entered, the system converts the current expression to a CG, maps the CG to a FUF Functional Description which serves as input to the lexical chooser; lexical choice and syntactic realization are performed, and feedback is provided in both English and Hebrew. The same generated sentence is shown without context (in the left part of the screen), and in context (after reference planning and aggregation). When generating utterances, the author can refer to an object from the context by clicking on the context view. This enters the corresponding identifier in the utterance graph.
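The conversion of a dot expression into a nested, CG-like structure can be sketched as follows. The surface syntax (Concept.relation(argument)) follows the examples given in this chapter, but the parser itself is an illustrative reconstruction, not the SAUT implementation:

```python
# Hedged sketch of turning a SAUT-style dot expression into a nested
# structure resembling a conceptual graph. The grammar covered here
# (concepts, "." modifiers, "(...)" arguments, "," sets, "#" references)
# is reconstructed from the examples in the text.
import re

TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|[.(),=#]")

def parse(expr):
    tokens = TOKEN.findall(expr)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def node():
        # A node is a concept (or number/identifier) plus .modifiers.
        head = take()
        if head == "#":                 # reference to a participant
            head = "#" + take()
        mods = []
        while peek() == ".":
            take()
            rel = take()
            args = []
            if peek() == "(":
                take()
                args.append(node())
                while peek() == ",":    # comma builds a set of nodes
                    take()
                    args.append(node())
                take()                  # closing ")"
            mods.append((rel, args))
        return {"concept": head, "modifiers": mods}

    return node()

cg = parse("egg.quantity(6).attribute(big)")
print(cg["concept"])                    # egg
print([m[0] for m in cg["modifiers"]])  # ['quantity', 'attribute']
```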

8.2 Bliss Communication Board

The description of the Bliss communication board involves three aspects which are described separately in this work:

1. The input language - in this case Bliss symbols (presented above in section 7.2.1).

2. The overall layout of the display (section 8.3).

3. The processing method - adjusting the SAUT methods to the communication board (section 8.4).

8.3 Implementing a Communication Board

The main objective in the design of a communication board is efficiency: reducing the number of selections (especially when selection of symbols is not direct) while preserving a logical order of selection, keeping the user's attention tuned, and allowing wide expressive capability. There are several strategies to achieve this design:

1. Displaying most frequent symbols first – symbols that are rarely used should be reachable but not be placed in main or initial displays, to avoid overload and to reduce the list of choices.

2. Designing displays by categories – conversations are conducted in different contexts, and therefore distinct vocabularies may be used. Specifying the context in which the conversation is conducted again reduces the number of symbols to be displayed, and is therefore desirable.

3. Displaying symbols through paradigmatic relations – the paradigmatic axis – displaying symbols which are possible in the current context (for instance, using selectional restrictions).

4. Displaying symbols through syntagmatic relations – progress on the syntagmatic axis, i.e., displaying the symbols according to syntactic context.

5. Hierarchical view of ideas – taking into consideration the structure of a conversation; for instance, focus first on the representation of an event, then provide more specific details about it.

6. Hierarchical view of form – identifying a rhetorical structure and the manner in which it affects possible syntactic structures.

7. User and context dynamic adjustment – learning the user’s preferences both of vocabulary and grammatical structures.

Taking advantage of the special characteristics of Bliss symbols, our system provides an answer to the first four strategies in this list. Frequency lists of words are available online³ or can be computed by keeping track in log files (as was done, for example, by [Copestake and Flickinger, 1998]). An initial (main) display is set in advance for each particular user.

There are three main methods of display arrangement. The first is the typical AAC method of pre-defined boards (dynamic boards, following [Burkhart, 2005]). The second uses the unique characteristics of Blissymbols: a virtual keyboard with atomic shapes displayed, such that when a symbol is selected, all connected symbols are retrieved (i.e., if the money symbol is chosen, then bank, business, cheap, clerk, coin, convenience store, expensive, fee, poor, price, prostitution, rich, shekel, store, to buy, to earn, to finance, to pay, to sell, and wallet are displayed). Both methods are implementable with the tools built in this work. The third method dynamically changes the symbols on the display following the SAUT authoring method. We now provide details about this approach. As in the regular SAUT system, the display is divided into four main areas:

1. A list of participants and of defaults.

2. Buttons (as will be elaborated below)

3. A text pane where the chosen symbols are displayed as the sequence is entered and the (possibly partial) sentence generated.

4. Text in context.

When initializing a conversation, the display can be set to the participants of the conversation (to allow quick reference) and defaults can be set, such as the tense or mood of the conversation. These contexts can be saved to a file and reloaded as needed.

³For instance, http://www.aacinstitute.org/Resources/ProductsandServices/PeRT/040615GeneralCoreVocabulary.txt

The display contains the set of buttons (or keys) which are further divided into three types:

1. function keys

2. hyperlink to other displays

3. symbols

Function keys control editing functions such as delete, back (previous screen), and reset. Pressing a hyperlink button leads the user to other displays: context displays (such as home, food, and school), properties for modifying symbols or utterances (with adjectives or adverbs), and sentence starters (following the dynamic displays approach), which allow generation from pre-defined sentence structures (represented as CGs at the authoring level) that can be filled as templates. The symbol buttons display Bliss symbols (with the Hebrew/English word written below). To allow control over grammatical factors, each display presents a constant set of language function symbols (Bliss indicators, such as past/future/present indicators) as well as mood symbols (to indicate whether the sentence is a question or an imperative).
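The second display-arrangement method described in this section, a virtual keyboard of atomic shapes from which all connected symbols are retrieved, can be sketched as follows; the miniature lexicon is an illustrative stand-in for the full Bliss lexicon:

```python
# Hedged sketch of the "virtual keyboard" display method: when an
# atomic Bliss shape is selected, all composite symbols built from it
# are retrieved and displayed. The mapping below is a tiny invented
# sample, not the real Bliss lexicon built in this work.

BLISS_LEXICON = {
    "bank":      {"money", "building"},
    "to buy":    {"money", "action", "hand"},
    "expensive": {"money", "evaluation"},
    "house":     {"building"},
}

def connected_symbols(atomic_shape):
    """Return every composite symbol containing the chosen shape."""
    return sorted(word for word, parts in BLISS_LEXICON.items()
                  if atomic_shape in parts)

print(connected_symbols("money"))
# ['bank', 'expensive', 'to buy']
```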

8.4 The Processing Method - Adopting the SAUT Technique

Our objective is to view the operation of selecting Bliss symbols in context as a form of semantic authoring. We aim to adapt the general semantic authoring method implemented in SAUT to the context of symbol selection in Bliss. However, SAUT is a textual system and follows conventions that are common in computer language editors, such as the IntelliSense feature in Microsoft Visual Studio. Since a Bliss communication board is not a textual system, these conventions must be adjusted to the symbol set. In SAUT, the dot key is used when the user wants to add information about an entity he has chosen, for instance Boy.Attribute(American) for generating the phrase "An American boy" or Boy.Plural for "Boys". SAUT will offer in a pop-up menu all possible relations that can have "boy" as their argument and all concepts which can stand in relation with the given word. In the case where the chosen symbol refers to a concept that requires one or more arguments, the SAUT system opens brackets and offers in the pop-up menu all items that can be inserted as

the argument (following the selectional restrictions and the argument order given in the ontology). In the case of a communication board, neither the dot key nor the space bar is used. When a symbol is chosen and looked up in the ontology, the same two possibilities exist: if the symbol requires arguments, the next display will show symbols that are compatible with the selectional restrictions. Problems arise when an argument needs to be modified, as in the sentence The boy I met yesterday lives here. The SAUT input representation is Live(Boy.Meet(I).Time(yesterday) Here). If dot/space are not used, and the display changes according to the arguments, the display generated once the symbol "boy" is chosen will contain location symbols (as the selectional restrictions of the verb live require). This problem can be solved in three ways:

1. By using a dot-like button to add properties to the last chosen symbol

2. By semantic parsing

3. Using editing options in the presentation of the symbols in the text pane

The second option was rejected since we did not want to use parsing in the process of text generation. This could have been done by building partial conceptual graphs for each symbol inserted, then using unification to find the best possible assembly and generating the sentence following the existing process. The third option, editing on the text pane, requires additional keystrokes and is inefficient for the purpose. The compromise solution we have adopted is to add a properties hyperlink button to the display, which can be chosen after the symbol that needs to be modified. The properties that are displayed are determined in the same manner in which the dot key offers the possible complementizers in the SAUT system. This method is also compatible with existing dynamic displays which use a properties hyperlink button that links to pages with qualities (such as adjectives and adverbs). Once the main verb and its arguments are chosen, the system offers possible modifiers, implemented as circumstantials and adverbials in the SURGE/HUGG syntactic realizers [Elhadad and Robin, 1996] and represented as relations in the ontology.

Symbols that were used in previous sentences of the current conversation are added to the participants list and can be referenced again with a direct selection. Using these symbols enables the generation of referring expressions and aggregation where needed.

8.5 Summary

Computerized AAC devices are characterized by four aspects [Hill and Romich, 2002] [McCoy and Hershberger, 1999]:

1. Selection method

2. Input language

3. Processing method

4. Output medium

In this chapter we have discussed our implementation of the processing method used in the display. We surveyed the process of NLG through semantic authoring and the methods used in the SAUT system for that purpose. The adoption of the SAUT technique for the processing method of the communication board, together with the overall properties of the Bliss dynamic display, distinguishes our work from previous NLG-AAC systems. The next chapter compares such systems along these two aspects (input language and processing method) and compares the overall techniques to the current work.

Chapter 9

Comparison with Existing NLG-AAC Systems

This chapter surveys existing NLG-AAC systems. We highlight the common architecture underlying the various systems, and compare the elements in which our system differs from existing ones.

9.1 Blisstalk

A first attempt to generate text from a sequence of Bliss symbols was made by Sheri Hunnicutt [Hunnicutt, 1986], in a system called Blisstalk. Blisstalk is a dedicated communication board with a grid of 504 squares. Most of the squares are dedicated to lexical items (in Swedish), and a few are reserved for general system functions and tuning. The symbols are arranged according to their part of speech. Names and words without a known symbol can be added to the display. Additional symbols on the board refer to functions such as Bliss indicators (that can be used to modify the part of speech of a chosen symbol), syntactic functions such as tense or number, and a special set of symbols that can add information to the concepts on the board, such as combination or similar-to. In addition, the display includes letters and digits. The underlying lexicon includes pronunciation information for the words represented by the symbols on the board, their part of speech, and additional morphological information. The strategy taken in Blisstalk was to adopt the speaker's syntax as the input language. Blisstalk uses a phrase structure grammar to parse the given sequence of symbols.

The parsing is done gradually by introducing phrase markers, grouping the input symbols into verb or noun phrases. Each phrase type determines which words it can contain. Noun phrases can be further processed, using ordering conventions, into double objects, subject-object pairs, or both. Delimiting the phrases is intended to avoid ambiguity in the processing of the complete sentence: if a symbol that represents a noun is located in a verb slot, it is inflected as a verb by the morphological rules. Blisstalk relies on a syntactic parsing solution to complete and revise the input sequence and make it more fluent. This approach suffers from two limitations. First, syntactic parsing on noisy input can only have limited success; in particular, because semantic information is not used (only part-of-speech data for each symbol), there is not enough knowledge to recover from parsing errors. Second, the revision approach works only after the input sequence has been composed, and as a result can only improve fluency, but cannot improve the input and selection rate. In contrast, our approach relies on semantic authoring, and provides tools to both assist input composition and produce more fluent output. Semantic authoring avoids the complexity of parsing (syntactic or semantic) by controlling the input composition process.

9.2 Compansion

Compansion (Compression-Expansion) was developed to expand un-inflected sequences of content words (in other words, telegraphic text) into syntactically and semantically well-formed sentences [McCoy, 1997]. For example, John go store yesterday is transformed into the well-formed sentence John went to the store yesterday [Pennington and McCoy, 1998]. The system was originally developed to enhance the communication rate of people who use telegraphic/iconic input, but it can also be used for supporting literacy skills or for correcting writing errors. The main difficulties in accurately transforming an ill-formed sentence into a well-formed paraphrase are detecting multiple errors in one utterance and resolving the possible ambiguity of interpretation. For example, John gone to the store can be interpreted as John went to the store or John had gone to the store. A possible solution to this problem is to generate all possible sentences

for a given input. The selection of the best suggestion can rely on the history of the inputs produced by the user. The solution applied in Compansion is to perform semantic parsing of the input sequence. The process begins with a "word order parser" which groups words into sentence-sized chunks and indicates each word's part of speech. Modifiers are attached to the words they most likely modify. The semantic parser of the Compansion system is based on the use of case frames, i.e., conceptual structures that represent the meaning of the content words and the relationships among them. More specifically, the parser builds the case frame structure of the verb in the utterance, filling the slots with the rest of the given content words. The case frames are similar in spirit and definition to the structures encoded in FrameNet (see section 7.1.2). A list of semantic roles was chosen for this purpose:

AGEXP - Agent/Experiencer (no intentionality required) - John is happy

THEME - object acted upon

INSTR - object used to perform the action of the verb

GOAL - a receiver of the action

BENEF - the beneficiary of the action, e.g., John gave the book to Mary for Jane

LOC - event location (can further be decomposed into TO-LOC, FROM-LOC, and AT-LOC).

TIME - time of event and tense.
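A rough sketch of how such a case frame might be filled and scored is given below; the frame, the animacy test, and the scores are invented for illustration and do not reproduce Compansion's actual heuristics or cut-off values:

```python
# Hedged sketch of case-frame filling with scoring heuristics in the
# spirit of Compansion. The ANIMATE set, the frame for "eat", and the
# numeric scores are illustrative assumptions only.
from itertools import permutations

ANIMATE = {"john", "mary", "dog"}

# The verb "eat" prefers an animate AGEXP and requires a THEME.
FRAME = ["AGEXP", "THEME"]

def score(assignment):
    s = 0
    if assignment.get("AGEXP") in ANIMATE:
        s += 2                     # prefer animate agents for actions
    if "THEME" in assignment:
        s += 1                     # an important slot is filled
    return s

def best_interpretation(content_words):
    candidates = []
    for order in permutations(content_words):
        assignment = dict(zip(FRAME, order))
        candidates.append((score(assignment), assignment))
    candidates.sort(key=lambda c: -c[0])
    return candidates[0][1]

# "Apple eat John": John is preferred as the (animate) agent.
print(best_interpretation(["apple", "john"]))
# {'AGEXP': 'john', 'THEME': 'apple'}
```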

The semantic parser constructs the most likely interpretation of the given input, using a set of scoring heuristics based on the semantic types of the input words (such as preferring animate agents for actions), and a preference set of the most important slots that should or must not be filled for a verb (for distinguishing transitive from intransitive verbs, for example). The parser generates all possible structures, which are then scored. Some options are discarded if their score does not reach a pre-defined cut-off; the remaining candidates are ordered by scoring rules (e.g., "prefer an animate agent for the verb eat"). A further improvement of the system is handling the choice of the most likely syntactic structure for a given input. The input Apple eat John can be generated as John ate the apple or John was

eating an apple, etc. For this purpose the system was combined with statistical information from corpora, frequencies of subcategorization from [Ushioda et al., 1993], and lexical information from WordNet [Miller, 1995]. In the absence of a verb in a telegraphic message, the verb slot is filled with either to be or to have, and in the absence of an agent, the pronouns I/You are inserted. Once the semantic structure is determined, a translator/generator generates the compatible sentence in English. The approach of semantic parsing investigated by Compansion has much higher relevance to parsing the style of telegraphic input characteristic of AAC situations than syntactic parsing. As in our approach, the system relies on a semantic representation to re-generate fluent text, relying on lexical resources and NLG techniques. Our approach differs in that, with the model of semantic authoring, we intervene during the process of composing the input sequence, and thus can provide early feedback (in the form of communication board composition and partial text feedback). We have not yet performed a user evaluation to assess the difference in performance between semantic parsing and semantic authoring. Eventually, we expect that even semantic authoring in the context of an AAC application will require some semantic parsing (to avoid introducing even the "simple" addition of syntax upon which we relied in SAUT for the non-AAC application – the use of disambiguating operators such as the dot, comma, or parentheses in the SAUT input language). [McCoy et al., 1998] further investigated the integration of the Compansion technique into Minspeak© [Baker, 1984], and more specifically into Communic-Ease™, one of Minspeak's Application Programs (MAPs™), which contains vocabulary for children. It contains 580 words classified into 38 general categories, which are coded in the traditional Minspeak method (see section 2.2).
In addition, it handles some morphology, for example, by adding plural marks. The system runs on the PRC Liberator™ dedicated AAC device, with an Interface Display to present the textual output. The method takes advantage of the Icon Prediction of the Minspeak device,

(70 DECL (VERB (LEX EAT)) (AGEXP (LEX JOHN)) (THEME (LEX APPLE)) (TENSE PRES))

Figure 9.1: The preferred semantic structure for the input Apple eat John

but adds an engine with a simplified version of Compansion as an Intelligent Parser/Generator (IPG). The IPG works incrementally and in parallel, as the icons are selected, and provides further constraints on the Icon Prediction process. Based on an analysis of logged transcripts of Communic-Ease™ users, it was found that mainly very simple sentence structures are used. A set of transformation rules was developed (which can later be tailored for individuals). If several interpretations are found for a given input, all possible realizations are offered on the display. The pros and cons of incremental and non-incremental processing are discussed in the paper. In the case of incremental processing, if the system generates, for instance, a definite article instead of an indefinite one as was intended by the user, this can be fixed before the rest of the sentence is entered. In contrast, if the process parses the complete message, the revision can be done only in retrospect. On the other hand, constraints enforced on the Icon Prediction can become a burden to the user, especially children (who form the target population of the system): the assumption is that it is unlikely that the intended user will be able to keep the sentence in his mind word-by-word, select icons, and evaluate the system's output; therefore, the proposed system uses the non-incremental method. Our system does use incremental processing, but as a controlled process and without parsing. We have not tested it on any particular AAC population; however, we checked the usability of the SAUT system, and the semantic authoring approach was found to be easily learned. In addition, we have put much effort into developing a wide-coverage vocabulary, represented in both the ontology and the lexical choice modules. The results of the Compansion evaluation certainly need to be pursued in the context of our proposed semantic authoring approach.

9.3 Transforming Telegraphic Language to Greek

The system presented in [Karberis and Kouroupetroglou, 2002] generates full sentences from telegraphic input, possibly from a sequence of Blissymbols or the MAKATON symbol set. The system includes the following components:

• An input device for telegraphic input (either text or symbols)

• The TtFS (Telegraphic-to-Full-Sentence) module – the main component of this system, which transforms a compressed, incomplete, and ill-formed Greek text into a full grammatical Greek

sentence

• The output device - either a text-to-speech or a written device (e-mail, printing device)

The two main components of the TtFS module are:

1. Pre-processor: assigns each word its part-of-speech and adds function words to the sequence. The output of this component is a full but agrammatical sentence. This process itself has three substeps:

(a) dividing the sentence into sub-clauses if it includes conjunction words

(b) adding missing articles to nouns located before the verb

(c) adding missing articles to nouns that follow the verb, according to their semantics

Verb transitivity features and noun semantics are encoded in the lexicon, as well as each word’s morphological data.

2. Translator/Generator: applies a set of syntactic and semantic rules to the ungrammatical input and generates a well-formed Greek sentence.

This process has the following substeps:

(a) The lexicon is consulted to assign five features for each word: tense, case, gender, person, and number.

(b) A set of syntactic patterns is checked to find the syntactic function of each word, or to add words if they are missing. For instance, in the absence of a subject, the pronoun I is added. For subordinate clauses with no subjects, the subject of the main clause is assumed.

(c) After the assessment of syntactic functions, the five features attached to each word are processed to form the right structure of agreement (for each part of speech some particular features are assigned, and the rest are NULL).

(d) Finally, words are inflected according to their features.
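The two-stage idea (a pre-processor adding function words, followed by a translator/generator that inflects the verb and completes missing constituents) can be sketched as follows, transposed to English for readability; the actual system works on Greek with a full morphological lexicon, and the toy lexicon and rules below are illustrative only:

```python
# Hedged sketch of the TtFS two-stage pipeline. LEXICON and the rules
# are tiny invented stand-ins; they do not reproduce the real system's
# Greek morphology, agreement features, or syntactic patterns.

LEXICON = {
    "go":    {"pos": "verb", "past": "went"},
    "store": {"pos": "noun"},
    "john":  {"pos": "propn"},
}

def preprocess(words):
    """Stage 1: tag words and add missing function words (articles)."""
    out = []
    for w in words:
        entry = LEXICON.get(w, {"pos": "unknown"})
        if entry["pos"] == "noun":
            out.append(("the", "art"))   # add a missing article
        out.append((w, entry["pos"]))
    return out

def generate(tagged, tense="past"):
    """Stage 2: inflect the verb; add a subject pronoun if absent."""
    has_subject = any(pos in ("propn", "pron") for _, pos in tagged)
    words = [] if has_subject else ["I"]
    for w, pos in tagged:
        if pos == "verb" and tense == "past":
            w = LEXICON[w].get("past", w)
        words.append(w)
    return " ".join(words)

print(generate(preprocess(["go", "store"])))
# I went the store   (a fuller system would also insert "to")
```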

The system assumes a fixed word order (SVO) and the presence of all content words in the input. This approach combines Bliss symbols with syntactic and semantic parsing. It is quite similar in scope and intention to the system we have presented. The method is quite different in that it does not rely on an explicit NLG framework, relying instead on specific pattern-matching rules for syntactic and semantic parsing. As in the previously reviewed systems, it does not attempt to intervene during the composition and selection stage; thus it does not attempt to improve the input rate, but only the output fluency.

9.4 PVI Intelligent Voice Prosthesis

The PVI system ([Vaillant and Checler, 1995], [Vaillant, 1997]) is a communication tool that aims to expand a sequence of Bliss symbols into sentences in French. The underlying assumption of the system design is that a direct icon-to-word mapping is not sufficient to represent the meaning of the desired message, nor can a context-free grammar distinguish, without semantic parsing, the different structures that should be generated from very similar inputs, for example:

boat to eat (I eat in a boat)
steak to eat (I eat a steak)

Therefore, in order to achieve good automatic interpretation and re-generation in natural language from a sequence of symbols, a semantic analysis must be performed to find the best words that convey the meaning of the icons, and a syntactic realizer is required to produce the full utterance in a natural language. To prepare the system, a thorough corpus analysis was performed. The corpus was collected from speech acts of children with Cerebral Palsy in the Kerpape Rehabilitation Centre in France, in distinctive pragmatic situations (spontaneous speech, training situations, supervised communication sessions). The set of chosen icons became the basis of the lexicon. The words were then analyzed along their paradigmatic dimension, and divided into taxemes of semantic domains such as food, alimentation, movement, game, and more. A syntagmatic analysis was also conducted, i.e., an analysis of the occurrence of words in a sequence. For example, this analysis identified that the to eat symbol subcategorizes for two case functions, the agent and the object. Each icon was assigned a semantic content which includes its taxeme (classification item) and the elementary features that are specific and unique to its taxeme. Each icon was also assigned

its own features that distinguish it from other icons, mostly binary features which constitute the semantic primitives of the system. The set of features that belong to an icon and those of the word that may represent it are not necessarily identical. Once a sequence of icons is entered into the system, a semantic analysis process attempts to recover its meaning by building a meaning representation of the utterance. First, the analyzer scans the input sequence from left to right, searching for a predicative icon. Then, the rest of the sequence is searched for an icon that can fill one of the free predicate slots, using a process of unification that conforms the semantic features of the functors with those of the identified arguments. A recursive situation may arise when an icon identified as a functor of one predicative icon is predicative by itself; unification therefore proceeds until all possible free slots are instantiated. This process continues until the entire sequence is processed. The compatibility of features during the unification process is computed in one of two ways:

1. Inclusion, if the semantic constraint expressed by the case feature is mandatory: C(a,b) = 1 if all features of a are present in the features of b with the same value, and 0 otherwise.

2. A scaled product of the two sets of features, if more or less acceptable solutions are found. In that case, C(a,b) = (the number of features of a with the same value as in b) / (the total number of features of a).
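The two compatibility measures can be sketched directly; features are modelled here as dictionaries of binary values, and the feature names and example entries are invented for illustration:

```python
# Hedged sketch of the two feature-compatibility measures described
# above. The feature inventories (animate, concrete, ...) are toy
# assumptions, not PVI's actual semantic primitives.

def inclusion(a, b):
    """C(a,b) = 1 if every feature of a appears in b with the same
    value, 0 otherwise (mandatory semantic constraint)."""
    return 1 if all(b.get(k) == v for k, v in a.items()) else 0

def scaled_product(a, b):
    """C(a,b) = matching features of a / total features of a."""
    if not a:
        return 0.0
    matches = sum(1 for k, v in a.items() if b.get(k) == v)
    return matches / len(a)

agent_slot = {"animate": 1, "concrete": 1}
boat = {"animate": 0, "concrete": 1, "vehicle": 1}
steak = {"animate": 0, "concrete": 1, "edible": 1}

print(inclusion(agent_slot, boat))        # 0: a boat is not animate
print(scaled_product(agent_slot, steak))  # 0.5
```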

During the parsing process, all possible solutions are evaluated, each is scored using the above-mentioned measures, and the best solution is eventually chosen. The output of this process is a linear form of a semantic network, based on the order given by the input utterance. Since there is no one-to-one mapping between icons and words, or between the structure of the semantic network and the syntactic structure, a lexical choice module is crucial before syntactic realization. The output of the parsing process consists of semantic networks (semioms in the author's terminology), which are clusters of semantic features; these clusters may not match any linguistic entity. PVI's lexical choice strategy is either short-circuit, i.e., distinct semantic entities are unified into a single word, or derivation, i.e., icons with rich content are expressed with more than one word. Once lexical choice is accomplished, three mechanisms of syntactic realization are applied:

• word order determination

• inflection

• insertion of functional morphemes (determiners, prepositions, etc.)

The lexicon contains elementary syntactic trees representing possible phrase constructions; each tree contains information about the lexeme corresponding to the scheme and the morphosyntactic structure expressing its case structure. The semantic network is traversed following the semantic links. For every node (sememe), a corresponding elementary tree is selected; the trees are later assembled using:

• substitution (for mandatory functors)

• adjunction (for optional functors)

These two operations define a Tree Adjoining Grammar (TAG) [Joshi, 1987], eventually generating the output in French, which is then pronounced by a voice synthesis device. The PVI display (communication boards) includes the symbols presented with their corresponding word and possibly a sound. In addition to the symbols, the display includes action buttons that refer to the display parameters. PVI is designed for multiple access options: direct selection through pointing devices or a keyboard, or scanning with switches. The system was evaluated [Vaillant, 1997] with a lexicon of only 300 icons, which is a very limited set. The results show 80% success as measured by levels of correctness of both semantic and syntactic analysis. PVI is the system most similar to ours, differing mainly in the following aspects: we rely on a standard meaning representation framework (the Conceptual Graph formalism) and the operations CG provides, instead of the specific mechanism used in PVI; we use existing NLG resources (lexical chooser, syntactic realization) and, in particular, a large-scale lexicon both for Bliss symbols and for English and Hebrew realization. Finally, as in the case of all the systems discussed above, the approach of semantic authoring, as opposed to semantic parsing, allows us to intervene in the input construction process.
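A minimal sketch of the two assembly operations (substitution at marked leaf sites, adjunction at internal nodes) is given below; the tree shapes and labels are invented for illustration and do not reproduce PVI's grammar:

```python
class Node:
    """A TAG tree node; subst marks a substitution site, foot a foot node."""
    def __init__(self, label, children=None, subst=False, foot=False):
        self.label, self.children = label, children or []
        self.subst, self.foot = subst, foot

def substitute(tree, label, initial):
    """Replace the first matching substitution site with an initial tree."""
    for i, c in enumerate(tree.children):
        if c.subst and c.label == label:
            tree.children[i] = initial
            return True
        if substitute(c, label, initial):
            return True
    return False

def find_foot(t):
    """Locate the foot node of an auxiliary tree."""
    if t.foot:
        return t
    for c in t.children:
        f = find_foot(c)
        if f:
            return f
    return None

def adjoin(tree, label, aux):
    """Splice an auxiliary tree at an internal node; the displaced
    subtree is re-attached under the auxiliary tree's foot node."""
    for i, c in enumerate(tree.children):
        if c.label == label and c.children:
            find_foot(aux).children = c.children
            tree.children[i] = aux
            return True
        if adjoin(c, label, aux):
            return True
    return False

def leaves(t):
    return [t.label] if not t.children else [w for c in t.children for w in leaves(c)]

# Elementary tree for a transitive verb: S(NP↓, VP(eats, NP↓))
s = Node("S", [Node("NP", subst=True),
               Node("VP", [Node("eats"), Node("NP", subst=True)])])
substitute(s, "NP", Node("NP", [Node("John")]))     # mandatory functor
substitute(s, "NP", Node("NP", [Node("apples")]))   # mandatory functor
adjoin(s, "VP", Node("VP", [Node("often"), Node("VP", foot=True)]))  # optional functor
" ".join(leaves(s))   # → 'John often eats apples'
```

Substitution fills the mandatory argument slots, while adjunction inserts the optional modifier without disturbing the rest of the derived tree.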

9.5 Cogeneration

Cogeneration [Copestake, 1996] [Copestake, 1997] was developed as a tool to enhance the communication of people who suffer from ALS (Amyotrophic Lateral Sclerosis) and tend to prefer using their language with textual AAC means rather than symbolic communication.

The system combines template-based sentences, statistical information, and NLG techniques. A set of pre-defined templates is stored and categorized by particular dialogue situation labels. A user chooses a template, and a list of slots to fill is offered; some slots have default values (tuned by previous inputs), and some slots are optional. Word or phrase prediction is possible while instantiating a slot. The cogenerator combines the constraints on the given slots with syntactic constraints and statistical information, and eventually generates the desired output utterance. For instance, in a template for a request, the user may enter the sequence open kitchen window so that the generated text will be Please, could you open the window, or, if an urgent label is chosen, the output will be Open the kitchen window! Information about the underlying structure can, later on, instruct the voice synthesizer about the appropriate intonation of the utterance. The knowledge that is required for this process includes:

1. a set of application dependent and independent templates

2. statistical information about collocations and preferred items

3. a syntactic realizer, a lexicon, and syntactic structures
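The template-and-slot flow can be sketched as follows; the template strings, slot names, and default values are invented for illustration and are not taken from the Cogeneration system:

```python
# Hypothetical templates keyed by dialogue-situation label.
TEMPLATES = {
    "request": "Please, could you {action} the {object}?",
    "urgent":  "{action} the {object}!",
}
# Default slot values, of the kind tuned by a user's previous inputs.
DEFAULTS = {"object": "window"}

def cogenerate(label, **slots):
    """Fill the chosen template's slots, falling back to stored defaults."""
    filled = {**DEFAULTS, **slots}
    text = TEMPLATES[label].format(**filled)
    return text[0].upper() + text[1:]

cogenerate("urgent", action="open", object="kitchen window")
# → 'Open the kitchen window!'
cogenerate("request", action="open")   # default object fills the gap
# → 'Please, could you open the window?'
```

The real system layers syntactic and statistical constraints over this skeleton rather than plain string formatting.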

While input is being entered into the template, a word prediction program uses statistical information to find the most probable word and to offer completions. Compounds and collocations (such as kitchen window) are recognized so that they are not split, and the right stress is given in vocalization. However, in order to recognize collocations or compounds that were not seen earlier in a corpus, statistical information is backed off with lexical-semantic information. The cogeneration system addresses the objective of speeding up entry rate using NLG and machine-learning techniques. The techniques presented are mostly orthogonal to the approach of semantic authoring we investigate; integrating them with our approach seems a promising avenue of research.
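A word prediction component of this kind can be approximated with bigram statistics; the following is a toy sketch, not the predictor actually used in Cogeneration:

```python
from collections import Counter, defaultdict

class Predictor:
    """Bigram-frequency word prediction with prefix-based completion."""
    def __init__(self, corpus):
        self.bigrams = defaultdict(Counter)
        words = corpus.split()
        for w1, w2 in zip(words, words[1:]):
            self.bigrams[w1][w2] += 1

    def complete(self, prev, prefix):
        """Completions of `prefix` given the previous word, most probable first."""
        return [w for w, _ in self.bigrams[prev].most_common()
                if w.startswith(prefix)]

p = Predictor("open the kitchen window close the kitchen door "
              "open the kitchen window")
p.complete("kitchen", "w")   # → ['window']
```

In the real system such frequency estimates are backed off to lexical-semantic information for unseen compounds.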

9.6 Summary

The four systems described above share the common ground of using NLG techniques for message generation. Our system, PVI, and the Greek system were implemented for Bliss symbols (and also claim compatibility with other symbolic languages). compansion was implemented for textual telegraphic input, and the Cogeneration system possesses characteristics of systems with prestored sentences, but since it combines NLG techniques with templates, it has evolved into an NLG system. Although they differ in the above-mentioned ways, all systems share a common architecture with a typical information flow:

• Insertion of iconic/telegraphic input

• Identification of the internal semantic representation (through parsing or unification)

• Re-generation in natural language - lexical choice and syntactic realization

Our system differs from this approach in the processing method: the semantic representation is built during the insertion of the symbols, and therefore no parsing is conducted. As discussed in previous chapters (see Chapter 3), parsing telegraphic text, especially with a free vocabulary, causes a variety of problems. Our system imposes a less natural manner of symbol choice and insertion, but the method of semantic authoring we investigate offers the potential to improve both input rate and output fluency, while avoiding most of the difficulty of semantic parsing inherent in post-processing approaches.

Chapter 10

Evaluation

The hardest task we encountered in evaluation was to determine what should be evaluated. For both aspects of the system, NLG and AAC, the definition of measures for evaluation remains the topic of ongoing discussion in both research communities. An AAC system must address three main functions: allowing a user sufficient expressive power, enhancing the rate of communication, and improving the ease of communication [Cornish and Higginbotham, 2000a]. Expressive power depends on the set of symbols that are used and the vocabulary that is offered, and depends very much on the cognitive and physical abilities of the user. In addition, the scope of the vocabulary offered depends on the device and its limitations. Rate enhancement is required since the average rate of communication of an AAC user is very low relative to that of a speaking person.

Since our system is constructed from distinct modules such as HUGG, the Bliss lexicon, the integrated verbs lexicon, the ontology, and the SAUT authoring system, it would have been desirable to evaluate each constituent by itself. Instead, we focused on an evaluation of the integration of these tools into a single AAC application, which is the main focus of this work. As evaluation of the AAC application proceeds, we intend to pursue more component evaluation as well.

The next two sections present the difficulties in evaluation of NLG and AAC systems. Section 10.3 discusses the aspects that are to be evaluated in our system. Section 10.4 presents an experiment that was conducted to evaluate the SAUT system - as the closest simulation of the full AAC scenario we intend to eventually support.

10.1 Evaluation of NLG systems

Evaluation of an NLG system is a difficult task - and there are still no definite criteria for doing so. The difficulties are due to several aspects of the nature of an NLG system, such as the various forms of inputs, which are affected by the purpose, domain, and knowledge sources to which the system refers. It is also unclear what to evaluate in the output, as evaluation in terms of quality and coverage is not always appropriate. [Dale and Mellish, 1998] discuss the main questions NLG evaluation raises, and primarily divide the evaluation task into three main categories:

1. Evaluation of properties of the theory – measuring properties of the underlying linguistic theory, such as coverage and domain-independence

2. Evaluation of properties of the system – comparing characteristics of a system such as speed, coverage and correctness with similar systems

3. Application potential – evaluating the system in an appropriate environment to determine whether NLG provides a better solution than alternative systems.

[Bangalore et al., 1998] distinguish between:

• Intrinsic evaluation: judging quality criteria of the generated text and its adequacy relative to the input. This is usually performed by asking human judges to evaluate these criteria and assessing agreement among the judges. The key criteria tested are accuracy, fluency, and coverage.

• Extrinsic (task-based) evaluation: judging the way the generated text helps people perform specific tasks. For example, in our case, an extrinsic evaluation would consist of measuring the time it takes an AAC user to order goods over a chat conversation with an on-line store.

• Comparative evaluation: comparing the performance of the system with similar systems, by comparing the output (one system’s output is used as a benchmark or gold standard) and the performance of the systems.

These three aspects were measured in [Miliaev et al., 2003] for a system producing technical instructions on how to operate electronic equipment. A similar large-scale evaluation was performed in the AGILE system [Hartley et al., 2000]. When available, a corpus of data, and parallel text representing it, can serve as a basis for comparison, but this is most often not available. Even in a corpus-based technique, differences between the computer-generated texts and human-written ones can occur at the various levels of generation (such as misinterpretation of data, wrong choices in lexicalization, or usage of other syntactic structures). Evaluation in this case can be measured by human judges or in terms of post-editing [Sripada et al., 2005].

Stochastic methods of generation such as [Bangalore et al., 2000] enable evaluation methods that are more similar to the evaluation of natural language understanding systems. Two methods for a quantitative evaluation were defined: string-based metrics and tree-based metrics; however, these metrics are only possible when a corpus is available and dependency trees of the target sentences are structured.

[Callaway, 2003] [Callaway, 2005] provides a thorough analysis of the coverage and performance of the SURGE realization grammar. To this end, the author used the parsed corpus of the Penn Treebank and automatically converted the syntactic parse trees to SURGE input FDs. SURGE was then used to regenerate the sentences from these input FDs, and the generated sentences were compared with the original sentences. The analysis measures a coverage of 98.5% for SURGE; 69.3% of the sentences were generated with an exact match. In approximately 50% of the sentences without an exact match, the errors were caused by the transformation process of the inputs.
The main errors caused by the syntactic realizer itself are syntactic (handling inversions, missing verb tenses, mixed conjunctions, mixed types of NP modifiers, direct and indirect questions, mixed-level quotations, complex relative pronouns, and topicalization). The rest of the errors are due to mistaken ordering of sentence constituents or wrong punctuation.

10.2 Evaluation of AAC systems

Evaluating an AAC system is also challenging. As in an NLG system, an AAC system is composed of several components, and it is not always clear which component is responsible for the results of the evaluation [McCoy and Hershberger, 1999]. Moreover, an AAC system is defined by several aspects (such as the input language, selection method, processing technique, and its output,

see section 2.1.4), while a specific prototype AAC system may have to focus on one particular aspect. As a consequence, it is often not possible to evaluate the full system performance. In compansion, for example, a system that focused on processing telegraphic input, the evaluation was in terms of keystroke savings. Each root word was considered a keystroke and inflection morphemes were considered additional keystrokes. The measure of the system's performance was the ratio between the number of words in the full sentence and the number of words inserted in the telegraphic message [McCoy and Hershberger, 1999]. However, it was recognized that the evaluation must also refer to the quality of the text generated and the adequacy of its meaning. This kind of evaluation was performed for the PVI system. The evaluation in [Vaillant, 1997] refers to the number of utterances that were interpreted correctly and does not refer to the enhancement of communication rate. [Vaillant, 1997] showed 80.5% acceptability of sentences generated from a set of 300 Bliss icons. Sentences were considered acceptable if they managed to represent the meaning correctly (i.e., correct semantic analysis), but were not necessarily realized correctly (including clumsy generation).

An additional point to consider is whether evaluation should be performed by AAC users. In [Higginbotham, 1995], it is argued that the use of nondisabled subjects in AAC research evaluations, when appropriately employed, can be easier and cheaper, and in some cases is viable and even preferred. Higginbotham's claims are that the methods of message production are not unique to AAC users and that analogous communication situations can be found for both disabled and nondisabled users. Nondisabled subjects can contribute to the understanding of the cognitive processes underlying the acquisition of symbol and device performance competencies.
On the other hand, when the PVI system was tried on AAC subjects (who suffered from cerebral palsy), some problems were encountered that could not have been detected with nondisabled subjects:

1. Frustration from error in generation

2. Lack of vocabulary

3. Interface adjustability

[McCoy and Hershberger, 1999] analyzed AAC user-therapist conversations to identify the variety of cases to be considered and processed in a message production system, as a basis for further evaluation. However, the limitations of this method are also listed: previous knowledge of the therapists affects sentence production; people may change their behavior when interacting with a computer; telegraphic style may be intentional, or it may be caused by a lack of syntactic skills or by the absence of syntactic markers on the display. [McCoy and Hershberger, 1999] conclude that evaluation must be made by choosing a specific population, and the interface must be tailored to each individual in order to assure smooth usability of the system. If a system claims to enhance communication rate, it must be realized that the rate may be reduced (or enhanced) by factors other than the novel processing method. Moreover, a system may be found to slow communication but to increase literacy skills. As a possible solution, the authors suggest taking advantage of system components that were proven useful for a given population and changing only the tested component.

Rate measures have been expressed in terms of the number of selections, switch activations, or linguistic units per time unit (minute or second), mostly for typing tasks [Cornish and Higginbotham, 2000b] and in non-interactive experimental environments. [Cornish and Higginbotham, 2000b] offer a segmentation method to distinguish omissions of small and large units in an utterance, in order to calculate rate measures with reference to the full sentence. Big units (BU) are full phrases (such as a transitive verb with both subject and object, an adjunct prepositional phrase, or idiomatic expressions) and small units (SU) are unique function words such as determiners or prepositions. The proposed analysis is to use BUs to determine communication efficiency by calculating BUs per time unit per user in an interaction or in a turn of interaction.
SUs can be used to calculate message complexity as a BU/SU ratio. Omissions can be calculated per time unit as well. Calculating the efficiency gained by omission can consider the number of BUs and SUs in the message against its full sentence interpretation [Cornish and Higginbotham, 2000b]. [Cornish and Higginbotham, 2000a] define a selection savings measurement and compare four systems. In addition, four linguistic metrics to measure the quality of message generation are defined. Test utterances are taken from natural speech corpora. The metrics are:
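The BU-based measures reduce to simple ratios over segmented counts; a sketch (function and field names are ours, not from the cited work):

```python
def rate_metrics(big_units, small_units, minutes):
    """Efficiency and complexity measures from BU/SU counts over a
    stretch of interaction, per the segmentation-based analysis."""
    return {
        "bu_per_minute": big_units / minutes,   # communication efficiency
        "bu_su_ratio": big_units / small_units  # message complexity
                       if small_units else float("inf"),
    }

rate_metrics(big_units=12, small_units=8, minutes=4)
# → {'bu_per_minute': 3.0, 'bu_su_ratio': 1.5}
```

The counts themselves come from manually segmenting the utterances into big and small units as defined above.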

1. Surface match – number of shared words between the corpus and the generated message. Articles are excluded, and lexical match is calculated with another measure.

20:37:00 "I need "
20:37:05 "*[VOLUME UP]*"
20:37:06 "*[VOLUME UP]*"
20:37:07 "*[VOLUME UP]*"
20:37:14 "something "
20:37:16 "to drink "
20:37:19 "i"
20:37:20 "m"
20:37:28 "ediately"

Figure 10.1: Output of LAM [Hill et al., 2001]

2. Pragmatic function match – measures the ability of the generated message to fulfill the same communicative goal as the utterance from the corpus. Utterances were tagged with speech act tags such as statement, reply, and answer, and then the percentage of matches was calculated.

3. Lexical item match – measures relatedness between words in the source and produced utterances, through relations such as synonymy, hypernymy, hyponymy, and coordinate terms (hyponyms of the same hypernym).

4. Perceived match – criteria given to nondisabled judges to rate (i) surface form match, (ii) pragmatic function, and (iii) an overall match.
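The first of these metrics can be sketched as a word-overlap score; the exact normalization used by [Cornish and Higginbotham, 2000a] may differ:

```python
ARTICLES = {"a", "an", "the"}

def surface_match(reference, generated):
    """Surface match: fraction of (non-article) reference words that
    also appear in the generated message.  Articles are excluded, since
    lexical relatedness is handled by a separate measure."""
    ref = [w for w in reference.lower().split() if w not in ARTICLES]
    gen = set(generated.lower().split())
    return sum(1 for w in ref if w in gen) / len(ref)

surface_match("I need something to drink immediately",
              "I need a drink immediately now")   # 4 of 6 words shared
```

The remaining metrics (pragmatic, lexical, perceived match) require tagged data or human judges and do not reduce to a simple formula.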

Measuring communication rate is done in many cases with the automated Language Activity Monitoring (LAM) performance tool, which is used to collect quantitative language data (see Figure 10.1). Fourteen measures, such as average communication rate, peak communication rate, and selection rate, were defined [Hill et al., 2001] [Hill and Romich, 2001]. Measuring the usability of a system, à la [Cornish and Higginbotham, 2000a], can be done by (i) asking subjects to produce a set of utterances, (ii) giving subjects a general task which requires the generation of a message, (iii) giving a device to be used as the communication tool in simulated situations, and (iv) use in real situations.
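From a LAM-style log such as the one in Figure 10.1, an average communication rate can be computed by pairing timestamps with word counts; the parsing details below are our assumption, not the LAM specification:

```python
import re
from datetime import datetime

def communication_rate(log):
    """Average words per minute from a log of timestamped selections.
    Control events such as *[VOLUME UP]* are skipped."""
    events = re.findall(r'(\d\d:\d\d:\d\d) "([^"]*)"', log)
    times = [datetime.strptime(t, "%H:%M:%S") for t, _ in events]
    words = sum(len(s.split()) for _, s in events if not s.startswith("*["))
    minutes = (times[-1] - times[0]).total_seconds() / 60
    return words / minutes

log = '20:37:00 "I need " 20:37:14 "something " 20:37:16 "to drink "'
communication_rate(log)   # 5 words in 16 seconds → 18.75 wpm
```

A full implementation would also need to merge letter-by-letter entries (like the "i" / "m" / "ediately" sequence in the figure) before counting words.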

10.3 Evaluating our System

We evaluate our system as an AAC application for message generation from communication boards. From an NLG evaluation perspective, this corresponds to an intrinsic evaluation. Since the prototype of our system is not yet adapted to interact with alternative pointing devices, we could not test it on actual Bliss users, and could not perform a full extrinsic (task-based) evaluation.

Moreover, the approach we offer for the generation of messages is novel and requires a user to plan a sentence in advance; it cannot be directly compared with existing NLG-AAC systems. In any case, as we have shown above, there is no uniformity in the evaluation techniques of such systems. [McCoy et al., 1998] discuss the possible usability of incremental generation of a message in a system designed for children, and assume that the need to plan a message in advance and the cognitive load of possible icon prediction will be too much of a burden on the user. We have evaluated the use of semantic authoring on nondisabled subjects and can give an approximation of the possible learning curve and usability of the system in general. Section 10.4 presents an evaluation of the SAUT system which provides a good indicator of the usability potential of our AAC system. Section 10.5 defines a detailed evaluation scenario for calculating efficiency (or enhancement rate) to be carried out in the future.

10.4 Evaluating SAUT

The objectives of the SAUT authoring system are to provide the user with a fast, intuitive, and accurate way to compose semantic structures that represent the meaning s/he wants to convey, and then to present that meaning in various natural languages. Therefore, an evaluation of these aspects (speed, intuitiveness, accuracy, and coverage) is required, and we have conducted an experiment with human subjects to measure them. The experiment measures a snapshot of these parameters at a given state of the implementation. In the error analysis, we have isolated parameters which depend on specifics of the implementation and those which require essential revisions to the approach followed by SAUT.

10.4.1 User Experiment

We conducted a user experiment in which ten subjects were given three to four recipes in English (all taken from the Internet) from a total pool of ten. The subjects had to compose semantic documents for these recipes using SAUT.1 The ontology and lexicon for the specific domain of cooking recipes were prepared in advance, and we tested the tool by composing these recipes with the system. The documents the authors prepared are later used as a 'gold standard' (we refer to them as reference documents).

1All subjects were computer science students.

Document #   Average time to author
1            36 mn
2            28 mn
3            22 mn
4            14 mn

Table 10.1: Learning time measures of recipe writing in SAUT

The experiment was managed as follows: first, a short presentation of the tool (20 minutes) was given. Then, each subject received a written interactive tutorial which took approximately half an hour to complete. Finally, each subject composed a set of 3 to 4 documents. The overall time taken for each subject was 2.5 hours.

10.4.2 Evaluation

We have measured the following aspects of the system during the experiment.

Coverage – answers the questions "can I say everything I mean?" and "how much of the possible meanings that can be expressed in natural language can be expressed using the input language?" To check the coverage of the tool, we examined the reference documents. We compared the text generated from the reference documents with the original recipes and checked which parts of the information were included, excluded, or expressed only partially with respect to the original. We counted each of these in number of words in the original text, and expressed the three counts as percentages of the words in the original recipe. We summed the result as a coverage index which combines the three counts (correct, missing, partial), weighting the partial count by a factor of 70%. The results were checked by two experts independently, and we report here the average of these two verifications. On a total of 10 recipes, containing 1024 words overall, the coverage of the system was 91%. Coverage was uniform across recipes and judges. We performed error analysis for the remaining 9% of uncovered material, as described below.

Intuitiveness – to assess the ease of use of the tool, we measured the learning curve for users first using the system, recording the time it took to author each successive document (1st, 2nd, 3rd, 4th). For the 10 users first facing the tool, the time it took to author the documents is shown in Table 10.1. The time distribution among the 10 users was extremely uniform. We did not find variation in the quality of the authored documents across users or across document numbers. The tool is mastered quickly by users with no prior training in knowledge representation or natural language processing. Composing the reference documents (approximately 100-word recipes) took the authors an average of 12 minutes.

Speed – we measured the time required to compose a document as a semantic representation, and compared it to the time taken to translate the same document into a different language. We compared the average time for trained users to author a recipe (14 minutes) with that taken by two trained translators to translate 4 recipes from English to Hebrew (see Table 10.2).

Semantic Authoring Time   Translation Time
14 minutes                6 minutes

Table 10.2: Translation vs. semantic authoring time.
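The coverage index combines the three word counts, weighting partial coverage by 70%; a sketch (the per-category counts in the example are invented, only the 1024-word total comes from the experiment):

```python
def coverage_index(correct, partial, missing, partial_weight=0.7):
    """Coverage index over word counts from the original recipes:
    fully covered words count 1.0, partially covered words 0.7,
    missing words 0, normalized by the original word count."""
    total = correct + partial + missing
    return (correct + partial_weight * partial) / total

# Hypothetical split of the 1024 words across the three categories.
coverage_index(correct=900, partial=80, missing=44)   # ≈ 0.93
```

The reported 91% coverage is the average of two independent expert verifications of this index over all ten recipes.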

The comparison is encouraging – it indicates that a tool for semantic authoring could become cost-effective if it is used to generate two or three languages. The rate of data entry with semantic structures is about half the rate of data entry in natural language for non-disabled users. While this factor-of-two slowdown may sound severe for an AAC context, it is in fact very small compared to the other factors that slow down disabled users when selecting symbols. Since the method of semantic authoring focuses on checking the validity of input structures at data entry time, it may in fact speed up selection time – as is investigated specifically in the section below.

Accuracy – we analyzed the errors in the documents prepared by the 10 users according to the following break-down:

• Words in the source document not present in the semantic form

• Words in the source document presented inaccurately in the semantic form

• User errors in semantic form that are not included in the former two parameters

We calculated the accuracy for each document produced by the subjects during the experiment. Then we compared each document with the corresponding reference document (used here as a gold standard). Relative accuracy of this form estimates a form of confidence – "how sure can the user be that s/he wrote what s/he meant?" This measurement depends on the preliminary assumption that, for a given recipe, any two readers (in the experimental environment – including the authors) will extract similar information. This assumption is warranted for cooking recipes. The measure takes into account the limitations of the tool and reflects the success of users in expressing all that the tool can express.

Document #   Accuracy
1            93%
2            92%
3            95%
4            90%

Table 10.3: Accuracy percentage of four documents written in SAUT

As Table 10.3 shows, accuracy is quite consistent across the experiment sessions, i.e., it does not change as practice increases. The average 92.5% accuracy is quite high.

Error category      Share of errors
User error          44%
Ontology deficit    23%
Tool limitations    33%

Table 10.4: Error analysis in subjects' generated documents.

We have categorized the errors found in subjects' documents in the following manner (see Table 10.4):

• Content that can be accurately expressed with SAUT (user error)

• Content that would be accurately expressed with changes to SAUT's lexicon and ontology (ontology deficit)

• Content that cannot be expressed in the current implementation and requires further investigation of the concept (implementation and conceptual limitations)

This breakdown indicates that the tool can be improved by investing more time in the GUI and feedback quality and by extending the ontology. The difficult conceptual issues (those which will require major design modifications, or put in question our choice of formalism for knowledge encoding) represent 33% of the errors – overall accounting for 2.5% of the words in the word count of the generated text.

10.5 Evaluating Efficiency

Since the system is not yet in a position to be tested with monitoring tools, it is possible to measure only selection savings. We can, however, estimate the keystroke savings of the system (a full evaluation will be done in the future) by defining a detailed evaluation scenario. For this estimation, we have collected a set of sentences written in Bliss, found at http://www.blissymbolics.org/canada/readingroom/english/text/filip (available September 2005). This site has a collection of sentences written in Bliss and English (and vocalized). Table 10.5 shows a set of 19 sentences as they appear on the Internet site, together with the SAUT input specification language representation as we have authored it. The second column shows the number of words in the original sentence, and the fourth one shows the number of steps needed for generating the parallel representation in the SAUT language.2

As can be seen, the total number of choice steps is 133, while the total number of words in the sentences is 122. However, counting the number of words does not include morphology, which in Bliss symbols requires additional choices. We have counted the words in the sentences considering morphology markers of inflections as additional words, summing to 138, as was suggested in [McCoy and Hershberger, 1999]. This simple ratio shows no improvement in keystroke savings using our system. Savings, therefore, must be calculated in terms of narrowing the choice possibilities in each step of the process. Assuming a display with 50 symbols (and additional keys for functions), a vocabulary of 2,500 symbols requires 50 different screens, whether symbols are organized by frequency (first screens present the most frequently used words) or by semantic domain. The overall number of selections is reduced using our communication board since the selectional restrictions narrow the number of possible choices that can be made at each step.
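The estimate above can be reproduced from the reported totals; the 50-symbols-per-screen display is from the scenario, while the narrowed-vocabulary figure below is an invented example:

```python
import math

# Totals reported for the 19 Bliss sentences (see Table 10.5 and text).
WORDS_WITH_MORPHOLOGY = 138   # direct Bliss entry, counting inflection markers
SAUT_STEPS = 133              # choice steps in the SAUT representations

ratio = WORDS_WITH_MORPHOLOGY / SAUT_STEPS   # ≈ 1.04: no saving in raw counts

def screens_needed(vocab_size, symbols_per_screen=50):
    """Screens needed to expose a vocabulary on a 50-symbol display."""
    return math.ceil(vocab_size / symbols_per_screen)

screens_needed(2500)   # full vocabulary: 50 screens
screens_needed(120)    # hypothetical narrowed choice set: 3 screens
```

The potential saving thus lies not in the step count but in how many screens must be browsed per step once selectional restrictions prune the candidates.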
The extent to which selection time can be reduced at each step depends on the application domain and the ontology structure. We cannot evaluate it in general, but expect that a well-structured ontology could support efficient selection mechanisms by grouping semantically related symbols in dedicated displays. This point raises two issues: it is unclear to what extent selection speed is affected by physical disability and by cognitive factors. An ontologically motivated selection mechanism needs to be adapted both to the cognitive processes of the user and to his/her physical disabilities. Further progress on this issue will require empirical tests with disabled users in the context of a task-based evaluation.

2Each step is a choice point, i.e., either a dot, comma, or space functionality of the SAUT system.

10.6 Summary

In this chapter, we have reviewed evaluation strategies for both NLG and AAC systems. Both fields struggle with similar issues in defining evaluation metrics that can be reproduced and can drive system improvement in a predictable manner. We have presented two aspects of the evaluation of the AAC system we developed: we first performed a user evaluation of the coverage, efficiency, and usability of the semantic authoring approach as implemented in the SAUT system. This evaluation was performed with non-disabled users in the domain of cooking recipes, and shows that authoring of semantic expressions, which can then be used for multilingual generation, requires about twice as much time as writing text in a natural language; usability is high, even on the rough software prototype we have implemented; and coverage was good, given a domain-specific ontology. We then established a detailed evaluation scenario of the potential rate of data entry of the system by analyzing a small corpus of Bliss sentences. We compared a direct Bliss data entry process with our semantic authoring approach and counted the selection steps required; the two approaches require essentially the same number of selection steps. The semantic authoring approach, however, can generate fluent output in other languages (English and Hebrew, beyond the Bliss sequence) without requiring noisy translation. We also hypothesize that an ontologically motivated grouping of symbols could speed up each selection step – but this claim must be assessed empirically in a task-based extrinsic evaluation, which remains to be done in the future.

#    Words  Morph  Source sentence                               Steps  SAUT representation
1    5      5      I live in a house                             3      Live(#I, House)
2    7      8      In the house there are many rooms             7      Exists(Location.In(#house), Room.Quantity(many))
3    8      8      In the kitchen we make food and eat           9      Make(#we food), Eat(#we #food), Location.In(kitchen)
4    7      8      The kitchen is yellow with blue doors         10     #kitchen.Attribute.Color(yellow). Have(door.plural.Attribute.Color(blue))
5    5      6      Pablo and I are playing                       3      Play(#Pablo, #I)
6    4      5      We are watching television                    4      Watch(#Pablo,#I Television)
7    5      7      We are eating chocolate buns                  7      Eat(#Pablo,#I Bun.Plural.Attribute(chocolate))
8    6      7      The bed stands in the bedroom                 3      Stand(Bed, Bedroom)
9    6      6      The bedroom has a green floor                 5      Have(#bedroom, Floor.Color(green))
10   9      9      In the bedroom I sleep and play with Pablo    8      Sleep(#I), Play(#I #Pablo).Location.In(Bedroom)
11   5      5      We have a special playroom                    5      Have(#we, Playroom.Attribute(special))
12   9      10     The playroom has blue walls and a blue floor  6      Have(#Playroom walls,floor.Color(blue))
13   6      7      The computer stands in the playroom           3      Stand(computer, #Playroom)
14   7      8      On the veranda we have many flowers           8      Have(#we, Flower.Quantity(many)).Location.On(Veranda)
15   6      6      In the autumn the flowers die                 5      Die(Flower.Plural).Time(autumn)
16   5      6      I watched football on television              4      Watch(#I football TV)
17   8      9      Today I am going to Heikleivvegen by taxi     7      Go(#I, Location.Name(Heikleivvegen)).Manner(taxi)
18   9      10     On Tuesday I played with Pablo in our room    10     Play(#I #Pablo).Location.In(room.Possessor(#we)).Time(Tuesday)
19   5      6      We played on the swing                        6      Play(#I #Pablo).Location.On(swing)
Sum  122    137                                                  133

Table 10.5: Sentences vs. SAUT representation, number of words

Chapter 11

Contributions and future work

The design of an NLG system for AAC purposes must consider the special characteristics of an augmentative communication device: it is a domain-independent system whose vocabulary is determined by the symbol set that is used. The graphic design must consider possible selection methods (direct or via scanning). Since not all symbols in the vocabulary can be accessed directly, there should be efficient ways to make them accessible to the user when needed. Moreover, since a symbol can refer to a concept, and therefore be realized by more than one specific word, the lexical chooser should find the most appropriate word with the use of collocational data. The user should be able to control syntactic structures with minimum effort, and the system should allow the possibility of doing so. The system was implemented for the set of approximately 2500 symbols found in the Hebrew Bliss lexicon [Shalit et al., 1992], but the automated tools enable changes and expansions of the vocabulary (and possibly a change of symbol set to PCS or Rebus). The first steps of the process are language-independent and can be used, in our case, for both Hebrew and English. The core of this work is an integration of available resources into a new approach for generation from symbolic input, while considering multilingual generation. The considerations in the overall compilation are manifold:

1. Implementing a Bliss dynamic display for AAC purposes, while enhancing communication rate.

2. Reducing errors in a symbols-to-text process that originate from parsing telegraphic text.

3. A wide-coverage, domain-independent lexicon for generation.

4. Hebrew generation of text, with reference to English generation (as a basis of a multilingual generation system).

This work presents an NLG-AAC system that generates text from a sequence of symbols without the need for parsing. For the development of the communication board, we implemented several interrelated systems, each of which can also be re-used in other novel systems.

11.1 Bliss symbols lexicon

We have designed and implemented a Bliss lexicon for both Hebrew and English, which can be used either as a stand-alone lexicon for reference or as part of an application. In this work, it is used for representing symbols in our communication board, but in the future it will also be combined in an editor (in the "writing with symbols" style). The design of the lexicon takes advantage of the unique properties of the language. Technically, only a set of atomic shapes is physically drawn, while combined symbols are generated automatically, following the symbol's entry in a database that was constructed from the Hebrew and English Bliss dictionaries. The lexicon was implemented in a way that allows searching either by text (a word), by semantic components (e.g., "all symbols that contain a wheel"), or by forms (e.g., "all symbols that contain a circle"). As a byproduct, this implementation allows a visual inspection of words' connectivity, and in the future we will compare word relatedness as it can be concluded from the Bliss lexicon vs. connectivity in other lexical knowledge bases such as WordNet.
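The search facilities described above can be sketched as a component-indexed lexicon. The entries and component names below are illustrative stand-ins, not the actual database schema:

```python
# Sketch of a component-indexed Bliss lexicon (illustrative data, not the real
# schema). Each combined symbol is stored as a list of atomic component names,
# so searches by word or by semantic/graphic component are simple lookups.

from collections import defaultdict

class BlissLexicon:
    def __init__(self):
        self.by_word = {}                     # gloss -> list of components
        self.by_component = defaultdict(set)  # atomic component -> set of glosses

    def add(self, word, components):
        self.by_word[word] = components
        for c in components:
            self.by_component[c].add(word)

    def symbols_with(self, component):
        """All symbols that contain a given atomic component (e.g. 'wheel')."""
        return sorted(self.by_component[component])

lex = BlissLexicon()
lex.add("car", ["wheel", "enclosure"])
lex.add("bicycle", ["wheel", "wheel"])
lex.add("house", ["enclosure"])

print(lex.symbols_with("wheel"))  # ['bicycle', 'car']
```

The same index can hold either semantic components or purely graphic shapes, which is what makes the two search modes symmetric.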

11.2 HUGG

HUGG is the only syntactic realizer (SR) written for Hebrew generation. HUGG is implemented with FUF, and its inputs are designed to be as similar as possible to the inputs of the English SR SURGE.

The grammar, in its current state, is designed to generate simple clauses, with special care given to the realization of relations (possessives, existentials, attributives, and locatives).

11.3 Integration of a large-scale, reusable lexicon with a natural language generator

We have integrated a large-scale, reusable lexicon with FUF/SURGE as a tactical component, so that the knowledge encoded in the lexicon can be reused, and to automate to some extent the development of the lexical realization component in a generation application. The integration of the lexicon with FUF/SURGE also brings other benefits to generation, including the possibility of accepting semantic input at the level of WordNet synsets, the production of lexical and syntactic paraphrases, the prevention of non-grammatical output, reuse across applications, and wide coverage. We have presented the process of integrating the lexicon with FUF/SURGE, including how to represent the lexicon in FUF format, how to unify input with the lexicon incrementally to generate more sophisticated and informative representations, and how to design an appropriate semantic input format so that the integration of the lexicon and FUF/SURGE can be done easily.
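The incremental unification step can be illustrated with a minimal sketch. The real system performs this with FUF over the actual lexicon; the feature names and the lexicon entry below are hypothetical:

```python
# Minimal feature-structure unification sketch: a synset-level semantic input
# is enriched by a (hypothetical) lexicon entry that contributes lexical and
# syntactic information. FUF's unification is far richer than this toy.

def unify(fd1, fd2):
    """Dicts unify recursively; atoms unify only if equal. None on failure."""
    if isinstance(fd1, dict) and isinstance(fd2, dict):
        out = dict(fd1)
        for key, value in fd2.items():
            if key in out:
                sub = unify(out[key], value)
                if sub is None:
                    return None
                out[key] = sub
            else:
                out[key] = value
        return out
    return fd1 if fd1 == fd2 else None

# Semantic input given at the level of a WordNet synset (illustrative names).
inp = {"process": {"synset": "own.v.01"},
       "args": {"owner": "I", "owned": "house"}}
# Hypothetical lexicon entry adding a lexeme and a subcategorization pattern.
entry = {"process": {"synset": "own.v.01", "lex": "have", "pattern": "transitive"}}

enriched = unify(inp, entry)
print(enriched["process"]["lex"])  # 'have'
```

Each successful unification leaves the input more informative, which is the sense in which the lexicon is applied "incrementally" before syntactic realization.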

11.4 SAUT

SAUT [Biller, 2005, Biller et al., 2005] is an authoring system for logical forms encoded as conceptual graphs (CG). The system belongs to the family of WYSIWYM (What You See Is What You Mean) text generation systems: logical forms are entered interactively, and the corresponding linguistic realization of the expressions is generated in several languages. The system maintains a model of the discourse context corresponding to the authored documents and helps users author documents formulated in the CG format. In a first stage, a domain-specific ontology is acquired by learning from example texts in the domain. The ontology acquisition module builds a typed hierarchy of concepts and relations derived from WordNet and VerbNet. The user can then edit a specific document by entering utterances in sequence while maintaining a representation of the context. While the user enters data, the system performs the standard

steps of text generation on the basis of the authored logical forms: reference planning, aggregation, lexical choice, and syntactic realization – in several languages (we have implemented English and Hebrew, and are exploring an implementation using the Bliss graphical language). The feedback in natural language is produced in real-time for every modification performed by the author.
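The per-utterance loop can be sketched as follows; the logical-form encoding, the one-rule reference planner, and the realizer are toy stand-ins for the real SAUT components:

```python
# Sketch of SAUT's per-utterance loop: a discourse context is maintained, and
# feedback is re-realized after each entered utterance. All stages are toys.

def reference_planning(lf, context):
    # Toy rule: an element already mentioned in the discourse is pronominalized.
    return ["PRON" if token in context else token for token in lf]

def realize(lf):
    # Stand-in for lexical choice + syntactic realization.
    return " ".join(lf)

context = set()
realized = []
for lf in [["I", "live-in", "house"], ["house", "has", "rooms"]]:
    realized.append(realize(reference_planning(lf, context)))
    context.update(lf)  # update the discourse model after each utterance
print(realized)  # ['I live-in house', 'PRON has rooms']
```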

11.5 Communication Board

The purpose of this work was to design an NLG symbols-to-text system for AAC purposes. In the design of an AAC system, the main motivation is to provide the user with a communication tool that enables as high a communication rate and as wide an expressive power as possible. Using NLG techniques for this purpose is motivated when a telegraphic text is considered as the input for the generation system, saving the user avoidable keystrokes on function words (determiners, prepositions) and on morphology (such as inflections and plural markers). The display we designed was inspired both by the semantic authoring technique as implemented in SAUT and by dynamic displays as studied by [Burkhart, 2005]. The symbols displayed on the screen at each step of symbol insertion depend on the context of the symbols previously seen. For example, if the previous symbol was a verb that requires an instrumental theme, only symbols that can function as instruments are presented on the current display. A general context for each utterance or conversation can be determined by the user, thereby narrowing the diversity of symbols displayed.
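The filtering behavior described above can be sketched as follows, assuming toy verb frames and symbol types in place of the VerbNet/WordNet-derived ontology actually used:

```python
# Sketch of context-dependent symbol filtering for the dynamic display.
# The verb frames and ontology types below are illustrative stand-ins.

VERB_FRAMES = {"cut": {"instrument"}, "eat": {"food"}}
SYMBOL_TYPES = {"knife": "instrument", "scissors": "instrument",
                "bread": "food", "table": "furniture"}

def next_display(previous_symbol, vocabulary):
    """Show only symbols whose ontology type fills a role of the previous verb."""
    roles = VERB_FRAMES.get(previous_symbol)
    if roles is None:  # previous symbol is not a verb: no filtering
        return sorted(vocabulary)
    return sorted(s for s in vocabulary if SYMBOL_TYPES.get(s) in roles)

print(next_display("cut", SYMBOL_TYPES))  # ['knife', 'scissors']
```

A user-selected conversation topic can be implemented the same way, as one more filter over the candidate set before it is rendered on the display.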

11.6 Future Work

The system presented here is a prototype, and there are various issues that still need to be investigated and developed.

Lexicons Since there are not yet fully implemented lexical resources such as WordNet, VerbNet, or Comlex for Hebrew, the lexical data is hand-coded and cannot be as comprehensive as the English data. An ongoing project in Hebrew computational linguistics (the Knowledge Center for Processing Hebrew of the Ministry of Science in Israel - http://mila.cs.technion.ac.il/) includes a Hebrew lexicon of words. However, this

lexicon was designed for morphological analyzers, and the information does not always answer the needs of text generation. We intend to develop a VerbNet database for Hebrew verbs.
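A VerbNet-style entry for such a Hebrew verb database might be organized as in the following sketch; the class name, thematic roles, frame, and member verb are hypothetical placeholders:

```python
# Sketch of a VerbNet-style class entry for a planned Hebrew verb lexicon.
# All names below (class id, roles, frame, transliterated member) are
# illustrative placeholders, not entries from an existing resource.

HEBREW_VERBNET = {
    "give-13.1": {
        "roles": ["Agent", "Theme", "Recipient"],
        "frames": ["NP V NP PP.recipient"],
        "members": ["natan"],  # 'to give', transliterated
    },
}

def verbs_with_role(role):
    """All verb classes whose frames include the given thematic role."""
    return sorted(c for c, e in HEBREW_VERBNET.items() if role in e["roles"])

print(verbs_with_role("Recipient"))  # ['give-13.1']
```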

Another lexical issue is the standardization of the meaning of symbol sets such as Blissymbols, PCS, and Rebus with reference to lexical knowledge bases such as WordNet. From a practical point of view, to use this system with another set of symbols, such as the more common PCS, the ontology, which is based on the synsets of the Bliss symbols, will have to be re-built and adjusted to the PCS symbols. Moreover, since the mapping between Bliss symbols and WordNet senses was done by the author, it could be judged differently by other subjects.

Bliss Symbols and Communication Board In the Bliss symbols language, an indicator can change the part of speech of the word that a symbol refers to. For instance, adding an evaluation indicator to the symbol for electricity will shift the meaning of the symbol to electric. In the current version of the lexicon, these two possible meanings of the symbol must be hard-coded. However, adding to the system a morphological module that can perform derivations (and not only inflections) will enable a more creative use of the symbols.
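Such a derivational module could be sketched as a mapping from (lexeme, indicator) pairs to derived forms; the fallback suffix rule below is a toy illustration, not a real English or Hebrew derivation rule:

```python
# Sketch of indicator-driven derivation: a Bliss indicator maps a base
# symbol's lexeme to a derived form, replacing the hard-coded double entries.

DERIVATIONS = {("electricity", "evaluation"): "electric"}  # listed exceptions

def apply_indicator(lexeme, indicator):
    if (lexeme, indicator) in DERIVATIONS:
        return DERIVATIONS[(lexeme, indicator)]
    if indicator == "evaluation" and lexeme.endswith("y"):
        return lexeme[:-1] + "ic"  # toy fallback rule, for illustration only
    return lexeme  # no derivation applies

print(apply_indicator("electricity", "evaluation"))  # 'electric'
```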

An additional application of the Bliss lexicon is an editor, in the Writing In Symbols style, where Hebrew text is inserted and the Bliss symbols are displayed above it. This kind of application requires morphological analysis of Hebrew in order to identify suffixes and prefixes and to find the root of a verb, its tense, and other possible inflections.

The display will be tuned and tested for access with existing selection devices. In the current state of the system, we have not implemented any alternative for access except direct selection with a mouse. In addition, we did not address voice output, which is a very important component of a communication board. NLG text-to-speech systems use the deep information of the sentence structure to determine intonation. Moreover, the complexity of morphological analysis in Hebrew, when text is processed into synthesized speech, can be avoided if the information on the words does not have to be inferred but is given explicitly.

Processing Techniques As works on prestored sentences have shown ([Waller et al., 2000b], [Vanderheyden and Pennington, 1998]), using prestored messages is efficient in several contexts. Integrating techniques of prestored sentences (and logging utterances online for future

use) can make the system more usable. Moreover, applying machine learning techniques to the history of text generation of a single user can make the prediction more accurate (by updating frequencies, for instance).
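Frequency updating from a user's history can be sketched as follows; a real predictor would exploit richer context (n-grams, the semantic ontology) than these unigram counts:

```python
# Sketch of user-adaptive symbol prediction: logged selections update unigram
# frequencies, which in turn reorder the display. Illustrative only.

from collections import Counter

class Predictor:
    def __init__(self, vocabulary):
        self.freq = Counter({s: 0 for s in vocabulary})

    def log_selection(self, symbol):
        self.freq[symbol] += 1  # update from the user's own history

    def ranked(self):
        """Symbols ordered by how often this user has selected them."""
        return [s for s, _ in self.freq.most_common()]

p = Predictor(["house", "eat", "play"])
for s in ["play", "play", "eat"]:
    p.log_selection(s)
print(p.ranked()[0])  # 'play'
```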

Evaluation The discussion on the evaluation of NLG and AAC systems in Chapter 10 surveyed several possible evaluation methods. We have evaluated our system in AAC terms, while evaluating it as an NLG system will require separate measures for each component, such as the syntactic realizer or the lexicon coverage. In addition, a subject-based evaluation of use and communication rate should be conducted with real AAC users.

There are several evaluation measures that are common to the two research areas – such as the pragmatic function match that was defined in [Cornish and Higginbotham, 2000a] and the extrinsic evaluation that was offered by [Bangalore et al., 1998] – so an evaluation that satisfies both sets of criteria is possible.

Bibliography

[Andreasen et al., 1998] Andreasen, P., Waller, A., and Gregor, P. (1998). Blissword – full access to blissymbols for all users. In Proceedings of the 8th Biennial Conference of the Int. Society for AAC, pages 167–168, Dublin, Ireland. ISAAC.

[ASHA, 1991] ASHA (1991). Report: Augmentative and alternative communication. American Speech-Language-Hearing Association, 33(Suppl. 5):9–12.

[Baker, 1984] Baker, B. (1984). Semantic compaction for sub-sentence vocabulary units compared to other encoding and prediction systems. In Proceedings of the 10th Conference on Rehabilitation Technology, pages 118–120, San Jose, California. RESNA.

[Baker et al., 1998] Baker, C. F., Fillmore, C. J., and Lowe, J. B. (1998). The Berkeley FrameNet project. In Proceedings of COLING-ACL, Montreal, Canada.

[Baldwin, 1995] Baldwin, F. B. (1995). CogNIAC: A Discourse Processing Engine. PhD thesis, University of Pennsylvania, Department of Computer and Information Sciences.

[Bangalore et al., 2000] Bangalore, S., Rambow, O., and Whittaker, S. (2000). Evaluation metrics for generation. In Proceedings of the First International Natural Language Generation Conference (INLG2000), Mitzpe Ramon, Israel.

[Bangalore et al., 1998] Bangalore, S., Sarkar, A., Doran, C., and Hockey, B.-A. (1998). Grammar and parser evaluation in the XTAG project. In Proceedings of Workshop on Evaluation of Parsing Systems, Granada, Spain.

[Barzilay et al., 1999] Barzilay, R., McKeown, K., and Elhadad, M. (1999). Information fusion in the context of multi-document summarization. In Proceedings of ACL'99, Maryland. ACL.

[Bateman, 1997] Bateman, J. (1997). KPML Development Environment: multilingual linguistic resource development and sentence generation. GMD, IPSI, Darmstadt, Germany. www.darmstadt.gmd.de/publish/komet/kpml.html.

[BCI, 2004] Blissymbolics Communication International (BCI) (2004). The fundamental rules of Blissymbolics: creating new Blissymbolics characters and vocabulary. Technical report, BCI.

[Bentur et al., 1992] Bentur, E., Angel, A., and Segev, D. (1992). Computerized analysis of Hebrew words. Hebrew Linguistics, 36:33–38. in Hebrew.

[Beukelman and Mirenda, 1998] Beukelman, D. R. and Mirenda, P. (1998). Augmentative and Alternative Communication - Management of Severe Communication Disorders in Children and Adults. Paul H. Brookes Publishing Co., second edition.

[Biller, 2005] Biller, O. (2005). Semantic authoring for multilingual text generation. Master’s thesis, Department of Computer Science, Ben Gurion University, Israel.

[Biller et al., 2005] Biller, O., Elhadad, M., and Netzer, Y. (2005). Interactive authoring of logical forms for multilingual generation. In Proceedings of the 10th European workshop on natural language generation, Aberdeen, Scotland.

[Bliss, 1965] Bliss, C. K. (1965). Semantography (Blissymbolics). Semantography Press, Sydney.

[Boissiere, 2003] Boissiere, P. (2003). An overview of existing writing assistance systems. In French-Spanish Workshop on Assistive Technology.

[Burkhart, 2005] Burkhart, L. J. (2005). Designing dynamic displays for the beginning communicator. http://www.lburkhart.com/.

[Callaway, 2003] Callaway, C. (2003). Evaluating coverage for large symbolic NLG grammars. In Proceedings of the 18th International Joint Conference on Artificial Intelligence (IJCAI 2003), pages 811–817, Acapulco, Mexico.

[Callaway, 2005] Callaway, C. (2005). The types and distributions of errors in a wide coverage surface realizer evaluation. In Proceedings of the 10th European Workshop on Natural Language Generation, Aberdeen, Scotland.

138 [Callaway et al., 1999] Callaway, C. B., Daniel, B., and Lester, J. C. (1999). Multilingual natural language generation for 3d learning environments. In Argentine Symposium on Artificial Intelligence, Buenos Aires, Argentina. (to appear).

[CallCentre, 1998] CallCentre (1998). Augmentative Communication in Practice: Scotland - An Introduction. http://callcentre.education.ed.ac.uk, second edition.

[Canning et al., 2000] Canning, Y., Tait, J., Archibald, J., and Crawley, R. (2000). Cohesive regeneration of syntactically simplified newspaper text. In 1st Workshop on Robust Methods in Analysis of Natural language Data.

[Carroll et al., 1998] Carroll, J., Minnen, G., Canning, Y., Devlin, S., and Tait, J. (1998). Practical simplification of English newspaper text to assist aphasic readers. In AAAI-98 Workshop on Integrating Artificial Intelligence and Assistive Technology, Madison, Wis- consin. preliminary research report.

[Claypool et al., 1998] Claypool, T., Ricketts, I., Gregor, P., Booth, L., and Palazuelos, S. (1998). Learning rates of a tri-gram based Gaelic word predictor. In Proceedings of the 8th Biennial Conference of the International Society for Augmentative and Alternative Communication, pages 177–178, Dublin, Ireland. ISAAC.

[Coch, 1998] Coch, J. (1998). Interactive generation and knowledge administration in Mul- tiMeteo. In Proceedings of the 9th INLG Workshop, pages 300–303, Canada.

[Copestake, 1996] Copestake, A. (1996). Applying natural language processing techniques to speech prostheses. In Working Notes of the 1996 AAAI Fall Symposium on Developing Assistive Technology for People with Disabilities.

[Copestake, 1997] Copestake, A. (1997). Augmented and alternative NLP techniques for augmentative and alternative communication. In Proceedings of the ACL workshop on Natural Language Processing for Communication Aids, Madrid.

[Copestake and Flickinger, 1998] Copestake, A. and Flickinger, D. (1998). Evaluation of NLP technology for AAC using logged data. In Loncke, F., Clibbens, J., and Lloyd, L., editors, ISAAC 98 research symposium proceedings, London. Whurr Publishers.

[Cornish and Higginbotham, 2000a] Cornish, J. and Higginbotham, D. J. (2000a). AAC device testing. http://www.cadl.buffalo.edu/download/DeviceTesting.pdf. CADL Working papers (2000:2, rev 1.).

[Cornish and Higginbotham, 2000b] Cornish, J. L. and Higginbotham, D. J. (2000b). Tool for evaluating communication rate in interactive contexts. http://www.cadl.buffalo.edu/download/BigUnits1.pdf. CADL Working papers (2000:2, rev 1.).

[Dahan-Netzer, 1997] Dahan-Netzer, Y. (1997). Design and evaluation of a functional input specification language for the generation of bilingual nominal expressions (Hebrew/English). Master's thesis, Department of Computer Science, Ben Gurion University, Beer-Sheva, Israel. (in Hebrew).

[Dahan-Netzer and Elhadad, 1998a] Dahan-Netzer, Y. and Elhadad, M. (1998a). Generating determiners and quantifiers in Hebrew. In Proceedings of the Workshop on Computational Approaches to Semitic Languages, Montreal, Canada. ACL.

[Dahan-Netzer and Elhadad, 1998b] Dahan-Netzer, Y. and Elhadad, M. (1998b). Generation of noun compounds in Hebrew: Can syntactic knowledge be fully encapsulated? In Proceedings of INLG'98, pages 168–177, Niagara-on-the-Lake, Canada.

[Dahan-Netzer and Elhadad, 1999] Dahan-Netzer, Y. and Elhadad, M. (1999). Bilingual Hebrew-English generation of possessives and partitives: Raising the input abstraction level. In Proceedings of the 37th Annual Meeting of ACL.

[Dale and Mellish, 1998] Dale, R. and Mellish, C. (1998). Towards the evaluation of natural language generation. In Proceedings of the First International Conference on Evaluation of Natural Language Processing Systems, Granada, Spain.

[Delin et al., 1994] Delin, J., Hartley, A., Paris, C. L., Scott, D., and Linden, K. V. (1994). Expressing Procedural Relationships in Multilingual Instructions. In Proceedings of the 7th International Workshop on NLG, pages 61–70.

[Dorr, 1994] Dorr, B. (1994). Machine translation divergences: A formal description and proposed solution. Journal of Computational Linguistics, 20(4):597–663.

[Dorr et al., 1998] Dorr, B. J., Habash, N., and Traum, D. (1998). A thematic hierarchy for efficient generation from lexical-conceptual structure. Technical Report CS-TR-3934, Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland.

[Ducrot and Todorov, 1983] Ducrot, O. and Todorov, T. (1983). Encyclopedic Dictionary of the Sciences of Language. The Johns Hopkins University Press, Maryland.

[Elhadad, 1991] Elhadad, M. (1991). FUF user manual - version 5.0. Technical Report CUCS-038-91, Columbia University.

[Elhadad, 1992] Elhadad, M. (1992). Using argumentation to control lexical choice: a unification-based implementation. PhD thesis, Computer Science Department, Columbia University.

[Elhadad, 1993] Elhadad, M. (1993). FUF – the Universal Unifier. Department of Computer Science, Ben Gurion University, 5.2 edition. http://www.cs.bgu.ac.il/˜yaeln/fufman.

[Elhadad and Robin, 1996] Elhadad, M. and Robin, J. (1996). An overview of SURGE: a re-usable comprehensive syntactic realization component. In Proceedings of INLG’96, Brighton, UK. (demonstration session).

[Faber and Usón, 1999] Faber, P. B. and Usón, R. M. (1999). Constructing a Lexicon of English Verbs. Number 23 in Functional Grammar Series. Mouton de Gruyter, Berlin, New York.

[Fawcett, 1987] Fawcett, R. P. (1987). The semantics of clause and verb for relational processes in English. In Halliday, M. A. and Fawcett, R. P., editors, New Developments in Systemic Linguistics, volume 1. Frances Pinter, London.

[Garay-Vitoria and Abascal, 1997] Garay-Vitoria, N. and Abascal, J. G. (1997). Word prediction for inflected languages. Application to Basque. In Proceedings of the ACL workshop on Natural Language Processing for Communication Aids, Madrid.

[Garay-Vitoria and Abascal, 2004] Garay-Vitoria, N. and Abascal, J. G. (2004). A comparison of prediction techniques to enhance the communication rate. In Stary, C. and Stephanidis, C., editors, Proceedings of the 8th ERCIM Workshop on User Interfaces for All, Vienna, Austria. User-Centered Interaction Paradigms for Universal Access in the Information Society, Springer. Lecture Notes in Computer Science 3196.

[Gatti and Matteucci, 2005] Gatti, N. and Matteucci, M. (2005). CABA2L: a Bliss predictive composition assistant for AAC communication software. In Seruca, I. and Filipe, J., editors, Enterprise Information Systems VI. Kluwer Publisher, Amsterdam, The Netherlands.

[Goldberg et al., 1994] Goldberg, E., Driedger, N., and Kittredge, R. (1994). Using natural- language processing to produce weather forecasts. IEEE Expert, 9(2):45–53.

[Grishman and Sterling, 1989] Grishman, R. and Sterling, J. (1989). Analyzing telegraphic messages. In Proceedings of DARPA Speech and Natural Language Workshop, pages 204– 208, Philadelphia.

[Halliday, 1994] Halliday, M. A. K. (1994). An Introduction to Functional Grammar. Edward Arnold, London, second edition.

[Hartley et al., 2000] Hartley, A., Scott, D., Kruijff-Korbayouva, I., Sharoff, S., Teich, E., Sokolova, L., Staykova, K., Dochev, D., Cmajrek, M., and Hana, J. (2000). Evaluation of the final prototype. Technical report, Brighton University.

[Hehner, 1980] Hehner, B. (1980). Blissymbols for use. Blissymbolics Communication Institute. Contributors: Jinny Storr, Peter Reich, Shirley McNaughton and Don Mills.

[Henkin, 1994] Henkin, R. (1994). There is this too. Hebrew Linguistics, (38):41–54. In Hebrew.

[Higginbotham, 1995] Higginbotham, D. J. (1995). Use of nondisabled subjects in AAC research: Confessions of a research infidel. AAC Augmentative and Alternative Communication, 11. AAC Research forum.

[Hill and Romich, 2001] Hill, K. and Romich, B. (2001). AAC clinical summary measures for characterizing performance. In Proceedings of Technology and Persons with Disabilities CSUN. CSUN. http://www.csun.edu/cod/conf2001/proceedings/0098hill.html.

[Hill and Romich, 2002] Hill, K. and Romich, B. (2002). A rate index for augmentative and alternative communication. International Journal of Speech Technology, (5):57–64.

[Hill et al., 2001] Hill, K., Romich, B., and Holko, R. (2001). AAC performance: The elements of communication rate. In ASHA, New Orleans.

[Hourcade et al., 2004] Hourcade, J., Pilotte, T. E., West, E., and Parette, P. (2004). A history of augmentative and alternative communication for individuals with severe and profound disabilities. Focus on Autism and Other Developmental Disabilities, 19(14):235–245.

[Hovy and Lin, 1998] Hovy, E. and Lin, C. (1998). Automated text summarization in SUMMARIST. In Maybury, M. and Mani, I., editors, Automatic Text Summarization. MIT Press, Cambridge.

[Hunnicutt, 1986] Hunnicutt, S. (1986). Bliss symbol-to-speech conversion: Blisstalk. Journal of the American Voice I/O Society, 3:19–38.

[Jing et al., 2000] Jing, H., Dahan-Netzer, Y., Elhadad, M., and McKeown, K. (2000). Integrating a large-scale, reusable lexicon with a natural language generator. In Proceedings of the 1st INLG conference, pages 209–216, Mitzpe Ramon, Israel.

[Jing and McKeown, 1998] Jing, H. and McKeown, K. (1998). Combining multiple, large-scale resources in a reusable lexicon for natural language generation. In 36th Annual Meeting of the Association for Computational Linguistics and the 17th International Conference on Computational Linguistics (COLING-ACL'98), Montreal.

[Joshi, 1987] Joshi, A. K. (1987). An introduction to tree adjoining grammars. In Manaster-Ramer, A., editor, Mathematics of Language. John Benjamins, Amsterdam.

[Karberis and Kouroupetroglou, 2002] Karberis, G. and Kouroupetroglou, G. (2002). Transforming spontaneous telegraphic language to well-formed Greek sentences for alternative and augmentative communication. In SETN '02: Proceedings of the Second Hellenic Conference on AI, pages 155–166, London, UK. Springer-Verlag.

[Kharitonov, 1999] Kharitonov, M. (1999). CFUF: A fast interpreter for the functional unification formalism. Master's thesis, Ben Gurion University, Israel.

[Kipper et al., 2000] Kipper, K., Dang, H. T., and Palmer, M. (2000). Class-based construction of a verb lexicon. In Proceedings of AAAI-2000.

[Kukich, 1983] Kukich, K. (1983). Knowledge-based report generation: A technique for automatically generating natural language reports from databases. In Proceedings of the 6th International ACM SIGIR Conference.

[Langer and Newell, 1997] Langer, S. and Newell, A. (1997). Alternative routes to communication. The Newsletter of the European Network in Language and Speech.

[Langkilde and Knight, 1998] Langkilde, I. and Knight, K. (1998). The practical value of n-grams in generation. In Proceedings of INLG’98, pages 248–255, Niagara-on-the-Lake, Canada.

[Lavoie and Rambow, 1997] Lavoie, B. and Rambow, O. (1997). A fast and portable realizer for text generation systems. In ANLP’97, Washington, DC. www.cogentex.com/systems/realpro.

[Lee et al., 1997] Lee, Y.-S., Weinstein, C., Seneff, S., and Tummala, D. (1997). Ambiguity resolution for machine translation of telegraphic messages. In Proceedings of the eighth conference on European chapter of the Association for Computational Linguistics, pages 120–127, Morristown, NJ, USA. Association for Computational Linguistics.

[Levin, 1993] Levin, B. (1993). English Verb Classes and Alternations: A Preliminary Investigation. University of Chicago Press, Chicago, Illinois.

[Liben-Nowell, 2000] Liben-Nowell, D. (2000). Text Simplification. MPhil thesis, Computer Speech and Language Processing, University of Cambridge, Churchill College.

[Macleod and Grishman, 1995] Macleod, C. and Grishman, R. (1995). COMLEX Syntax Reference Manual. Proteus Project, NYU.

[Mann, 1996] Mann, G. (1996). Control of a Navigating, Rational Agent by Natural Language. PhD thesis, School of Computer Science and Engineering, University of New South Wales. http://www.it.murdoch.edu.au/˜mann/NL/BEELINE.html.

[Mann, 1983] Mann, W. C. (1983). An overview of the Penman text generation system. pages 261–265. Also appears as USC/Information Sciences Institute Tech Report RR-83-114.

[McCoy et al., 1994] McCoy, K., McKnitt, W., Peischl, D., Pennington, C., Vanderheyden, P., and Demasco, P. (1994). AAC-user therapist interactions: Preliminary linguistic observations and implications for Compansion. In Proceedings of the RESNA '94 17th Annual Conference, Nashville, Tennessee.

[McCoy et al., 1998] McCoy, K., Pennington, C., and Badman, A. L. (1998). Compansion: From research prototype to practical integration. Natural Language Engineering, (4):41–55. Cambridge University Press.

[McCoy, 1997] McCoy, K. F. (1997). Simple NLP techniques for expanding telegraphic sentences. In Proceedings of the workshop on NLP for Communication Aids, Madrid. ACL/EACL.

[McCoy et al., 2001] McCoy, K. F., Bedrosian, J. L., and Hoag, L. A. (2001). Pragmatic trade-offs in utterance-based systems: Uncovering technological implications. ASHA (American Speech-Language-Hearing Association), Division 12 Newsletter. Guest Editor: Jeff Higginbotham.

[McCoy and Hershberger, 1999] McCoy, K. F. and Hershberger, D. (1999). The role of evaluation in bringing NLP to AAC: A case to consider. In Loncke, F. T., Clibbens, J., Arvidson, H. H., and Lloyd, L. L., editors, Augmentative and Alternative Communication: New Directions in Research and Practice, pages 105–122. Whurr Publishers, London.

[McDonald, 1982] McDonald, E. T. (1982). Teaching and Using Blissymbolics. Blissymbolics Communication Institute.

[Mel'cuk and Pertsov, 1987] Mel'cuk, I. and Pertsov, N. (1987). Surface Syntax of English - a Formal Model within the Meaning-Text Framework. John Benjamins, Amsterdam/Philadelphia.

[Miliaev et al., 2003] Miliaev, N., Cawsey, A., and Michaelson, G. (2003). Applied NLG system evaluation, FlexyCAT. In Proceedings of the 9th European Workshop on Natural Language Generation (in conjunction with EACL 2003), Budapest, Hungary.

[Miller et al., 1990] Miller, G., Beckwith, R., Fellbaum, C., Gross, D., and Miller, K. (1990). Introduction to WordNet: an on-line lexical database. International Journal of Lexicography (special issue), 3(4):235–312.

[Miller, 1995] Miller, G. A. (1995). WordNet: a lexical database for English. Commun. ACM, 38(11):39–41.

[Moulton et al., 1999] Moulton, B. J., Lesher, G. W., and Higginbotham, D. J. (1999). A system for automatic abbreviation expansion. In Proceedings of the RESNA ’99 Annual Conference, pages 55–57, Arlington, VA. RESNA Press.

[Nir, 2005] Nir, M. (2005). Bliss - is it really a second language? ISAAC-Israel Annual. (in Hebrew).

[Ordan and Wintner, 2005] Ordan, N. and Wintner, S. (2005). Representing natural gender in multilingual lexical databases. International Journal of Lexicography, 18(3).

[Paris and Linden, 1996] Paris, C. and Linden, K. V. (1996). Building knowledge bases for the generation of software documentation. In Proceedings of the 16th International Conference on Computational Linguistics (COLING-96).

[Paris and Vander Linden, 1996] Paris, C. and Vander Linden, K. (1996). DRAFTER: An interactive support tool for writing multilingual instructions. IEEE Computer, 29(7):49–56.

[Pennington and McCoy, 1998] Pennington, C. A. and McCoy, K. F. (1998). Providing intelligent language feedback for augmentative communication users. In Mittal, V. O. et al., editors, Assistive Technology and AI, number 1458 in LNAI, pages 59–72. Springer-Verlag, Berlin Heidelberg.

[Pennington et al., 1998] Pennington, C. A., McCoy, K. F., Bedrosian, J. L., and Hoag, L. A. (1998). Important issues for effectively using prestored text in augmentative communication. In 1998 AAAI Workshop on Integrating Artificial Intelligence and Assistive Technology, pages 48–54, Madison, Wisconsin.

[Pianta et al., 2002] Pianta, E., Bentivogli, L., and Girardi, C. (2002). MultiWordNet: developing an aligned multilingual database. In Proceedings of the First International Conference on Global WordNet, Mysore, India.

[Pollard and Sag, 1987] Pollard, C. and Sag, I. (1987). Information-based Syntax and Semantics - Volume 1, volume 13 of CSLI Lecture Notes. University of Chicago Press, Chicago, Il.

[Porter, 2000] Porter, G. (2000). Low-tech dynamic displays: User friendly multi-level communication books. In Proceedings of ISAAC Ninth Biennial Conference, Washington, DC.

[Power and Scott, 1998] Power, R. and Scott, D. (1998). Multilingual authoring using feedback texts. In Proceedings of COLING-ACL 98, Montreal, Canada.

[Quinlan, 1992] Quinlan, P. (1992). The Oxford Psycholinguistic Database. Oxford University Press.

[Quirk et al., 1985] Quirk, R., Greenbaum, S., Leech, G., and Svartvik, J. (1985). A comprehensive grammar of the English language. Longman.

[Reiter and Dale, 1992] Reiter, E. and Dale, R. (1992). A fast algorithm for the generation of referring expressions. In Proceedings of the 14th COLING, pages 232–238, Nantes, France.

[Reiter and Dale, 2000] Reiter, E. and Dale, R. (2000). Building Natural-Language Generation Systems. Cambridge University Press.

[Roberts, 1973] Roberts, D. D. (1973). The Existential Graphs of Charles S. Peirce. Mouton and Co.

[Rosner and Stede, 1994] Rosner, D. and Stede, M. (1994). Generating multilingual documents from a knowledge base: The TECHDOC project. In Proceedings of the 15th COLING, pages 339–346, Kyoto, Japan.

[Ruppenhofer et al., 2005] Ruppenhofer, J., Ellsworth, M., Petruck, M. R. L., and Johnson, C. R. (2005). FrameNet: Theory and practice. Online Book. http://framenet.icsi.berkeley.edu/book/book.html.

[Schabes et al., 1988] Schabes, Y., Abeille, A., and Joshi, A. K. (1988). Parsing strategies with Lexicalized Grammars: Application to tree adjoining grammars. In Proceedings of the 12th COLING, pages 578–583, Budapest, Hungary.

[Scott et al., 1998] Scott, D., Power, R., and Evans, R. (1998). Generation as a solution to its own problem. In Proceedings of INLG’98, pages 256–265, Niagara-on-the-Lake, Canada.

[Shalit et al., 1992] Shalit, A., Wine, J., and Yaniv, K. (1992). Hebrew Blissymbols Lexicon. ISAAC-Israel.

[Shaw, 1995] Shaw, J. (1995). Conciseness through aggregation in text generation. In Proceedings of the 33rd conference on ACL, pages 329–331, Morristown, NJ, USA.

[Shaw et al., 1994] Shaw, J., Kukich, K., and Mckeown, K. (1994). Practical issues in automatic documentation generation. In Proceedings of the 4th ANLP, pages 7–14.

[Shieber and Baker, 2003] Shieber, S. M. and Baker, E. (2003). Abbreviated text input. In IUI’03, Miami, Florida, USA.

[Sowa, 1984] Sowa, J. F. (1984). Conceptual Structures: Information Processing in Mind and Machine. Addison-Wesley.

[Sowa, 1987] Sowa, J. F. (1987). Semantic networks. In Shapiro, S. C., editor, Encyclopedia of Artificial Intelligence 2. John Wiley & Sons, New York.

[Sripada et al., 2005] Sripada, S. G., Reiter, E., and Hawizy, L. (2005). Evaluating an NLG system using post-edit data: Lessons learned. In Proceedings of ENLG-2005, pages 133–139, Aberdeen, Scotland.

[Stede, 1996] Stede, M. (1996). Lexical semantics and knowledge representation in multilingual sentence generation. PhD thesis, Graduate Department of Computer Science, University of Toronto.

[Temizsoy and Cicekli, 1998] Temizsoy, M. and Cicekli, I. (1998). A language-independent system for generating feature structures from interlingua representations. In Proceedings of INLG’98, pages 188–197, Niagara-on-the-Lake, Canada.

[Ushioda et al., 1993] Ushioda, A., Evans, D. A., Gibson, T., and Waibel, A. (1993). Frequency estimation of verb subcategorization frames based on syntactic and multidimensional statistical analysis. In Proceedings of the 3rd International Workshop on Parsing Technologies (IWPT3), Tilburg, The Netherlands.

[Vaillant, 1997] Vaillant, P. (1997). A semantic-based communication system for dysphasic subjects. In Proceedings of the 6th conference on Artificial Intelligence in Medicine (AIME’97), Grenoble, France.

[Vaillant and Checler, 1995] Vaillant, P. and Checler, M. (1995). Intelligent voice prosthesis: converting icons into natural language sentences. Computation and Language E-print Archive. http://xxx.lanl.gov/abs/cmp-lg/9506018.

[Vanderheyden et al., 1996] Vanderheyden, P., Demasco, P., and McCoy, K. (1996). A preliminary study into schema-based access and organization of reusable text in AAC. In Langton, A., editor, Proceedings of the RESNA ’96 Annual Conference, Salt Lake City, UT.

[Vanderheyden and Pennington, 1998] Vanderheyden, P. B. and Pennington, C. A. (1998). An augmentative communication interface based on conversational schemata. In Assistive Technology and Artificial Intelligence, Applications in Robotics, User Interfaces and Natural Language Processing, pages 109–125, London, UK. Springer-Verlag.

[VanderLinden and Scott, 1995] VanderLinden, K. and Scott, D. (1995). Raising the interlingual ceiling in multilingual text generation. In the Multilingual Natural Language Generation Workshop, International Joint Conference in Artificial Intelligence (IJCAI’95), pages 95–109, Montreal.

[Waller and Jack, 2002] Waller, A. and Jack, K. (2002). A predictive Blissymbolic to English translation system. In Proceedings of ASSETS 2002, pages 186–191, Edinburgh, Scotland.

[Waller et al., 2000a] Waller, A., O’Mara, D., Tait, L., Booth, L., Hood, H., and Brophy-Arnott, B. (2000a). The development and evaluation of a narrative-based AAC approach. In Proceedings of the Ninth Biennial Conference of the International Society for Augmentative and Alternative Communication, pages 232–234, Washington D.C. ISAAC.

[Waller et al., 2000b] Waller, A., O’Mara, D., Tait, L., Hood, H., Booth, L., and Brophy-Arnott, B. (2000b). Integrating a story-based aid within curriculum. AAC 2000 Practical Approaches to Augmentative and Alternative Communication.

[Woltosz, 1997] Woltosz, W. (1997). Dynamic vs. static displays: What are the issues? In CSUN 1997 Conference. CSUN Center On Disabilities. http://www.dinf.ne.jp/doc/english/Us_Eu/conf/csun_97/csun97_072.htm.
