
Constraint Solving and Language Processing

International Workshop, CSLP 2004
Roskilde University, 1–3 September 2004
Proceedings

Edited by
Henning Christiansen
Peter Rossen Skadhauge
Jørgen Villadsen

Preface

The purpose of the workshop is to provide an overview of activities in the field of Constraint Solving with special emphasis on Natural Language Processing, and to provide a forum for researchers to meet and exchange ideas. Constraint Solving (CS), in particular Constraint Logic Programming (CLP), is a promising platform, perhaps the most promising present platform, for bringing forward the state of the art in language processing. The data subjected to processing via constraint solving may include written and spoken language, formal and semiformal language, and even general input data to multimodal and pervasive systems. CLP and CS have been applied in projects for shallow and deep analysis and generation of language, and to different sorts of languages. The view of grammar as a set of conditions that simultaneously constrain, and thus define, the set of possible utterances has influenced formal linguistic theory for more than a decade. CLP and CS provide flexibility of expression and the potential for interleaving the different phases of language processing, including the handling of pragmatic and semantic information, e.g. ontologies.

This volume contains papers accepted for the workshop based on an open call, contributions from the invited speakers, and the abstract of a tutorial. We are very honoured that a selection of highly distinguished researchers in the field has accepted our invitation to speak at the workshop: Philippe Blache, Veronica Dahl, Denys Duchier, and Gerald Penn.

Following the workshop, an edited volume of selected and revised contributions will be produced for wider publication, possibly with other invited contributions in order to cover the field.

We want to thank the program committee, which is listed below, the invited speakers, and all researchers who submitted papers to the workshop. The workshop is supported by the CONTROL project (CONstraint based Tools for RObust Language processing), funded by the Danish Natural Science Research Council; CMOL, Center for Computational Modelling of Language at Copenhagen Business School; and the Computer Science Section at Roskilde University, which also hosts the workshop.

The editors
Roskilde, August 2004

Organizers

Henning Christiansen, Roskilde University (chair)
Peter Skadhauge, Copenhagen Business School
Jørgen Villadsen, Roskilde University

Program committee

Troels Andreasen, Roskilde, Denmark
Philippe Blache, Aix-en-Provence, France
Henning Christiansen, Roskilde, Denmark (chair)
Veronica Dahl, Simon Fraser University, Canada
Denys Duchier, LORIA, France
John Gallagher, Roskilde, Denmark
Claire Gardent, LORIA, France
Daniel Hardt, Copenhagen Business School, Denmark
Peter Juel Henrichsen, Copenhagen Business School, Denmark
Jørgen Fischer Nilsson, Technical University of Denmark
Kiril Simov, Bulgarian Academy of Sciences
Peter Skadhauge, Copenhagen Business School, Denmark
Jørgen Villadsen, Roskilde, Denmark

Contents

Invited Talks

Syntactic Structures as Constraint Graphs
Philippe Blache ...... 1

An Abductive Treatment of Long Distance Dependencies in CHR
Veronica Dahl ...... 4

Invited Talk (no title provided at time of printing)
Denys Duchier ...... 16

The Other Syntax
Gerald Penn ...... 17

Contributed Papers

Gradience, Constructions and Constraint Systems
Philippe Blache and Jean-Philippe Prost ...... 18

Problems of Inducing Large Coverage Constraint-Based Dependency Grammar for Czech
Ondřej Bojar ...... 29

Metagrammar Redux
Benoît Crabbé and Denys Duchier ...... 43

Multi-dimensional Graph Configuration for Natural Language Processing
Ralph Debusmann, Denys Duchier, and Marco Kuhlmann ...... 59

An Intuitive Tool for Constraint Based Grammars
Mathieu Estratat and Laurent Henocque ...... 74

A Broad-Coverage Parser for German Based on Defeasible Constraints
Kilian A. Foth, Michael Daum, and Wolfgang Menzel ...... 88

The Role of Animacy Information in Human Sentence Processing Captured in Four Conflicting Constraints
Monique Lamers and Helen de Hoop ...... 102

An Exploratory Application of Constraint Optimization in Mozart to Probabilistic Natural Language Processing
Irene Langkilde-Geary ...... 114

A Constraint-Based Model for Preposition Choice in Natural Language Generation
Véronique Moriceau and Patrick Saint-Dizier ...... 124

Rapid Software Prototyping of an Arabic Morphological Analyzer in CLP
Hamza Zidoum ...... 139

Tutorials

A Tutorial on CHR Grammar
Henning Christiansen ...... 148

Contributed Short Presentation/Poster

Representing Act-Topic-based Dialogue Phenomena
Hans Dybkjær and Laila Dybkjær ...... 154

Multi-dimensional Type Theory: Rules, Categories, and Combinators for Syntax and Semantics
Jørgen Villadsen ...... 160

Syntactic Structures as Constraint Graphs

Philippe Blache

LPL-CNRS, Université de Provence
29 Avenue Robert Schuman
13621 Aix-en-Provence, France
[email protected]

The representation of syntactic information usually makes use of tree-like structures. This is very useful, both for theoretical and computational reasons. However, while such structures are adequate for the representation of simple languages (typically formal ones), they are not expressive enough for natural language, and several difficulties in the processing of NL come from this, for different reasons. First, the idea that a complete and homogeneous syntactic structure can be associated with any input is false. In many cases there is simply no way to do this, as illustrated in example (1), taken from a spoken language corpus.

(1) monday washing tuesday ironing wednesday rest

This input is a succession of nouns, with no relation given at the syntactic level. It would be very artificial to structure it into a tree. In this example, the information that makes the interpretation possible comes from the lexical level, possibly from prosody, but not from syntax. Similar situations occur frequently in spoken language: phenomena such as hesitations, repairs, phatics, etc., have to be taken into account and represented, but trees fail to do this (non-connectivity, crossing relations, etc.).

From a theoretical perspective, the conception of linguistic information in terms of hierarchized structures has deep consequences, especially concerning relations between different domains (phonetics, phonology, syntax, semantics, etc.). In most cases, such relations are given in terms of correspondences between structures. For example, prosody–syntax interaction is explained by means of relations between syntactic trees and prosodic units. This means that both structures have to be built separately before it is possible to implement such relations. The same problem arises in the syntax–semantics interface: the semantic structure is usually built starting from the syntactic tree, as is typically the case in Montague grammars. This option imposes a compositional conception of this interface in which, again, the syntactic structure has to be built first. Such a conception does not fit with the fact that the interpretation of an utterance consists in bringing together pieces of information coming from different domains.

An alternative approach consists in making it possible to represent such spread-out and partial information by means of structures that are not necessarily homogeneous, stable and strictly hierarchized. Constraints can play an interesting role in this perspective. Everyone now agrees that linguistic information can be represented by means of constraints: all modern linguistic

theories make use, at one moment or another, of constraints. We propose to develop this idea by representing all linguistic information by means of constraints. In this way, it becomes possible to exploit constraints not only as a filtering process making it possible, for example, to eliminate unwanted structures (as in OT), but as an actual system: a grammar becomes a constraint system together with a satisfaction process. What is interesting in this perspective is that hierarchized structures are no longer needed. Unlike in HPSG, for example, in which almost all information is stipulated in terms of the mother/head connection, relations are expressed directly between different objects (features, categories, sets of categories, etc.), whatever their function or role in the structure.

This solution is proposed in the formalism of Property Grammars¹ (noted hereafter PG), in which all syntactic information is represented by means of different types of constraints (also called properties): linearity, exclusion, requirement, uniqueness, dependency and obligation. In our approach, the objects taken into consideration in the grammar are constructions. A construction, as proposed in the Construction Grammar theory of Fillmore, can be any kind of syntactic object: a category (e.g. Det, NP), a clause, a specific turn (e.g. interrogative, subject–auxiliary inversion), etc. Each construction is described by a set of properties that can come from different linguistic domains, thereby directly implementing the interaction between them, without needing to build separate structures. In this approach, a grammar, which is a set of constructions, then consists in a set of constraints.

The basic mechanism consists, for each construction, in verifying which constraints are satisfied and which are violated. We thereby obtain a precise description, which we call a characterization, for any kind of category, whatever the form of the input to be parsed. A complete characterization is built in two stages. The first consists, starting from the set of categories corresponding to the input, which is considered as an assignment, in going through the entire constraint system, evaluating all constraints that can be applied to this assignment. More precisely, different assignments are built, corresponding to the possible subsets of categories. A characterization is associated with each assignment. The process then consists in checking whether such assignments correspond to constructions described in the grammar. This amounts to verifying that the characterization (i.e. the set of violated and satisfied constraints) is included in the description of the construction (i.e. the set of constraints describing it). When this condition is verified, the construction is activated and the subsystem of constraints corresponding to the construction is in its turn evaluated. At this second stage, constraints specific to a construction can be evaluated and, when the construction corresponds to a category, it is instantiated.

The result of this process is a set of characterizations, in other words, a set of relations between different objects (at different levels). This structure corresponds to a network of constraints, or a graph, not necessarily connected. What is interesting is that all constraints are at the same level (even though it is possible to weight them), which means that it is always possible to characterize an object or, in other words, that all constraints can be evaluated independently from the others and without any priority order. This is a non-holistic way of representing information, which makes it possible to merge constraints coming from different domains (which is the main interest of constructions).

1 For a short presentation of this formalism, see [Blache & Prost 04] in this volume.

An Abductive Treatment of Long Distance Dependencies in CHR

Veronica Dahl

Logic and Functional Programming Group
Department of Computing Science
Simon Fraser University
Burnaby, B.C., Canada
[email protected]

Abstract. We propose Abductive Concept Formation Grammars, a CHR methodology for treating long distance dependencies by abducing the missing elements while relating the constituents that must be related. We discuss our ideas both from a classical analysis point of view and from a property-grammar based perspective. We exemplify them through relative clauses first, and then through the more challenging case of natural language coordination.

1 Introduction

One of the main challenges in computational linguistics is the relating of constituents that can be arbitrarily far apart within a sentence. For instance, to understand "logic, we love", we must understand that "logic" functions as the (dislocated) direct object of "love" (in the canonical subject-verb-object order, we would say: we love logic). The dislocated phrase, "logic", can be arbitrarily far away from the point at which the direct object it represents would normally appear, cf. "logic, he thinks we love", "logic, they told me that he thinks we love", and so on.

Making sense of sentences with long distance dependencies, in which constituents must be related that may be separated by an arbitrarily long sequence of intervening words, is one of the most difficult computational linguistic problems. Not only must the relationship be established despite arbitrary distances between the related elements, but this type of construction often involves "guessing" material that is left implicit. For instance, in "the workshop was interesting and the talks inspiring", we must understand the verb of the second conjunct to be "were" (i.e., the talks were inspiring), even though it does not appear explicitly.

The advantages of bottom-up approaches to NL processing aimed at flexibility have been demonstrated within declarative paradigms, for instance for parsing incorrect input and for detecting and correcting grammatical errors [2,3,6,21]. Most of these approaches use constraint reasoning of some sort, sometimes

blended with abductive criteria. For the specific problem of coordination in natural language, datalog approaches with constraints placed upon word boundaries have been shown to be particularly adequate [15,26].

In this article we introduce Abductive Concept Formation Grammar rules, a CHR-based methodology for treating long distance dependencies by abducing the missing elements while relating the constituents that must be related. We provide the necessary background, assuming some knowledge of CHR but relating it particularly to language processing. Our approach is discussed both from a classical analysis point of view and from a property-grammar based perspective. We exemplify our ideas through relative clauses first, and then through the more challenging case of natural language coordination.

2 Background

2.1 CHR and parsing; CHRG

Parsing problems can be expressed in CHR [19] by explicitly manipulating input and output strings: a string to be analyzed, such as "the house collapsed", is entered as a sequence of constraints {token(0,1,the), token(1,2,house), token(2,3,collapsed)} that comprises an initial constraint store. The integer arguments represent word boundaries, and a grammar for this intended language can be expressed in CHR as follows.

token(X0,X1,the)       ==> det(X0,X1,sing).
token(X0,X1,house)     ==> n(X0,X1,sing).
token(X0,X1,collapsed) ==> v(X0,X1,sing).
n(X0,X1,Num), v(X1,X2,Num) ==> s(X0,X2,Num).
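As a minimal sketch of how this fragment might be loaded and run, assuming SWI-Prolog's CHR library (the chr_constraint declarations and the query are ours, not from the paper):

:- use_module(library(chr)).
:- chr_constraint token/3, det/3, n/3, v/3, s/3.

token(X0,X1,the)       ==> det(X0,X1,sing).
token(X0,X1,house)     ==> n(X0,X1,sing).
token(X0,X1,collapsed) ==> v(X0,X1,sing).
n(X0,X1,Num), v(X1,X2,Num) ==> s(X0,X2,Num).

% ?- token(0,1,the), token(1,2,house), token(2,3,collapsed).
% The final store additionally contains det(0,1,sing), n(1,2,sing),
% v(2,3,sing) and s(1,3,sing).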

CHRGs [10,12] include automatic insertion and handling of word boundaries. Their rules can be combined with rules of CHR and with Prolog, which is convenient for defining the behaviour of non-grammatical constraints. As well, CHRGs provide a straightforward implementation of assumptions [16,14] and have recently been shown to be very well suited for abductive logic programming [9]. They are in particular useful for datalog renditions of language: as shown in [6], the datalog part requires very little implementation machinery when using CHRGs; basically, grammar rules in CHRGs can be made to work either top-down or bottom-up according to the order in which their left and right hand sides are given. CHRG also includes notation for gaps and for parallel matching, which we neither describe nor use in the present work. CHR/CHRG applications to natural language processing include [1], which flexibly combines top-down and bottom-up computation in CHR; [6], which uses CHR to diagnose and correct grammatical errors; and [13], which implements property grammars using CHRG.
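For readers unfamiliar with the notation, a CHRG rule looks like a CHR propagation rule with the word-boundary arguments left implicit. A hypothetical sketch (the categories are ours, for illustration; ::> is the CHRG arrow used later in this paper):

np(Num), vp(Num) ::> s(Num).
% roughly corresponds to the explicit CHR rule
% np(X0,X1,Num), vp(X1,X2,Num) ==> s(X0,X2,Num).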

2.2 Extending CHR with Abduction and Assumptions

In [9], a negation-free abductive logic program is defined as one which has no application of negation, whose abductive predicates are distinguished by prefix exclamation marks (so, e.g., p and !p refer to different predicates), and whose integrity constraints are written as CHR propagation rules whose head atoms are abducibles and whose body atoms are abducibles or built-ins (or possibly fail). In this approach, abducibles are viewed as constraints in the sense of CHR; the logic program is executed by the Prolog system, and whenever an abducible is called it is automatically added by CHR to the constraint store, and CHR activates integrity constraints whenever relevant. The complete implementation in SICStus Prolog is provided by including the following lines at the start of the program file.¹

:- use_module(library(chr)).
:- op(500,fx,!).
handler abduction.
constraints ! /1.

Assumptions [16,14] can also interact with abduction. These are basically backtrackable assertions which can serve, among other things, to keep somewhat globally accessible information (an assertion can be used, or consumed, at any point during the continuation of the computation). The following version of append/3 exemplifies this. The addition and subtraction signs are used to indicate assumption and consumption, respectively:

append(X,Y,Z) :- +global(Y), app(X,Z).
app([],Y) :- -global(Y).
app([X|L],[X|Z]) :- app(L,Z).

Here the assumption global/1 keeps the second argument in store until the recursive process reaches the clause for the empty list, at which point it is consumed and incorporated at the end of the resulting list.
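For instance, under these definitions (the query and trace are ours):

% ?- append([1,2],[3,4],Z).
% +global([3,4]) is assumed; app/2 copies 1 and 2 into Z;
% when app([],Y) is reached, -global(Y) consumes the assumption,
% binding Y=[3,4], so Z = [1,2,3,4].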

2.3 Concept Formation Grammars and our basic parsing methodology

In [7], we introduced the cognitive model of Concept Formation, which has been used for lung cancer diagnosis [8] and specialized into grammatical concept formation, with applications to property grammar parsing [13]. The grammatical counterpart of Concept Formation, namely Concept Formation Grammars, or CFGs, are extensions of CHRGs which dynamically handle properties between grammar constituents and their relaxation, as statically defined by the user.

1 The prefix exclamation mark is used as a "generic" abducible predicate.

An Example. Let's first exemplify within the traditional framework of rewrite rules, which implicitly define a language. Whereas in CHRG we would write the following toy grammar²

[a] ::> determiner(singular).
[boy] ::> noun(singular).
[boys] ::> noun(plural).
[laughs] ::> verb(singular).
determiner(Number), noun(Number), verb(Number) ::> sentence(Number).

to parse correct sentences such as "a boy laughs", in CFGs we can replace its last rule by the following CFG rules, so that the system will also accept sentences which do not agree in number, while pointing out the error as a side effect of the parse:

determiner(Ndet), noun(Nn), verb(Nv) =>
    acceptable(prop(agreement,[Ndet,Nn,Nv,N]),_) | sentence(N).
prop(agreement,[Ndet,Nn,Nv,N],true) :- Ndet=Nn, Nn=Nv, !, N=Nv.
prop(agreement,[Ndet,Nn,Nv,N],mismatch).
relax(agreement).

The binary predicate acceptable is a primitive predicate of the Concept Formation system, which succeeds if its second argument has been bound to true, or if it has been bound to something else but the property has been declared relaxable by the user. In the case of our example, the agreement property will appear in the list of violated properties automatically constructed as a result of the parse.
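A hedged illustration of the effect (the input and the trace are ours):

% input: "a boys laughs"
% lexicon: determiner(singular), noun(plural), verb(singular)
% prop(agreement,[singular,plural,singular,N],R) yields R = mismatch;
% agreement is declared relaxable, so sentence(N) is still produced
% and agreement is recorded among the violated properties.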

3 Treating Long distance dependencies: through assumptions, through abductive concept formation

Typically, treatments of long distance dependencies involve either enriching the linguistic representation using explicit rules of grammar (the grammatical approach) or adding special mechanisms to the parsing algorithms (the metagrammatical approach). The latter is more concise, in that, for instance, explicit rules for coordination are not needed in the grammar itself; sentence coordination can instead be inferred from a metagrammatical component plus the user's grammar (see for instance Woods [27], Dahl and McCord [17], Haugeneder [20] and Milward [23]).

2 Terminal symbols are noted between brackets, as in DCGs.

3.1 Through Assumption Grammars

The traditional grammar rule approach can be exemplified by the following (assumption) grammar rules for relativization, where we omit all other arguments to focus on the information to be carried long distance, namely the variable X:

noun_phrase(X) --> det, noun(X), {+antecedent(X)}, relative, {!}.
noun_phrase(X) --> {-antecedent(X)}.
relative --> relative_pronoun, sentence.
sentence --> noun_phrase(X), verb_phrase(X).

The first rule uses an assumption to store the antecedent for a noun phrase that will go missing in the relative clause. The second rule picks up a missing noun phrase's representation by consuming the antecedent left by an overt noun phrase. The third rule defines a relative clause as a relative pronoun followed by a sentence. Note that any missing noun phrase inside that sentence can now be reconstructed from the assumption of its antecedent. The same rules therefore serve for relativizing on a subject (as in "the house that fell"), on an object (as in "the house that Jack built"), on an indirect object ("the student that Mary gave a book to"), etc.
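As a sketch of how the assumption threads through a parse of "the house that Jack built" (the trace is ours, under the simplified rules above):

% det = "the", noun(X) = "house": +antecedent(house) is assumed
% relative: relative_pronoun = "that", then sentence = "Jack built <gap>"
% the missing object NP inside the sentence is parsed by the second
% noun_phrase rule, whose -antecedent(X) consumes the assumption,
% binding the gap to "house".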

3.2 Through Abductive Concept Formation

We now present our Abductive Concept Formation methodology through the example of abducing the missing noun phrase in a relative clause. Because in bottom-up mode we cannot deal with empty rules³, we must pay the price of making separate rules for each type of relativization, but the resulting grammar does not need to use assumptions. All we need to do is to retrieve the end point of the previous constituent and reconstruct (abduce) the missing noun phrase at that point, e.g.:

relative_pronoun, verb_phrase(X):(P1,P2) ::>
    !noun_phrase(X):(P2,P2), relative.
relative_pronoun, noun_phrase, verb:(P1,P2) ::>
    !noun_phrase(Y):(P2,P2), relative.

These rules produce a relative clause while abducing its missing noun phrase. The fact that the abduced phrase is syntactically marked as abduced (as opposed to the previous example, in which noun phrases look the same whether they have

3 Notice that the second rule above is, grammatically speaking, empty, because its right-hand side is not a grammar symbol but a Prolog test.

been reconstructed or are explicit) makes it easier to reconstruct further implicit meanings for more sophisticated examples, as we shall see next. Note that the start and end points of the abduced noun phrase are the same, as befits a non-overt noun phrase.

4 A non-trivial example: coordination in natural language

In this section we briefly present property-based grammars [4,5]; next we summarize our parsing methodology for them (see complete details in [13]); and we then extend it with our abductive concept formation, metagrammatical treatment of coordination.

4.1 Parsing Property Grammars

Property-based Grammars [4,5] define any natural language in terms of a small number of properties: linear precedence (e.g. within a verb phrase, a transitive verb must precede the direct object); dependency (e.g. a determiner and a noun inside a noun phrase must agree in number); constituency (e.g. a verb phrase can contain a verb, a direct object, ...); requirement (e.g. a singular noun in a noun phrase requires a determiner); exclusion (e.g. a superlative and an adjectival phrase cannot coexist in a noun phrase); obligation (e.g. a verb phrase must contain a verb); and unicity (e.g. a prepositional phrase contains only one preposition). The user defines a grammar through these properties instead of defining hierarchical rewrite rules as in Chomskyan-based models. In addition, properties can be relaxed by the user in a simple, modular way. For instance, we could declare precedence as relaxable, with the effect of allowing ill-formed sentences where precedence is not respected, while pointing out that they are ill-formed. The result of a parse is then not a parse tree per se (although we do provide one, just for convenience, even in the case of ill-formed input), but a list of satisfied and a list of unsatisfied properties.
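To make this concrete, a miniature property-based description of the noun phrase might look as follows (our illustration, in the notation of the PG literature; it is not taken verbatim from this paper):

NP:  linearity    Det ≺ N
     requirement  N[sing] ⇒ Det
     exclusion    most ⇔ Adj[super]
     unicity      at most one Det
     obligation   N (the head) must be present
     dependency   AP ; N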

4.2 The basic parser

Our basic methodology relies upon a single rule which successively takes two categories (one of which is a phrase or a phrase head), checks the properties between them, and constructs a new category by extending the phrase with the other category, until no more categories can be inferred. Lists of satisfied and unsatisfied properties are created by the rule, using property inheritance (a detailed analysis of which can be seen in the original paper). Its form is described in Fig. 1. This rule first tests that one of the two categories is of type XP (a phrase category) or obligatory (i.e., the head of an XP), and that the other category is an allowable constituent for that XP. It then successively tests each of the PG properties among those categories, incrementally building as it goes along the lists of satisfied and unsatisfied properties. Finally, it infers a new category of

cat(Start1,End1,Cat,Features1,Graph1,Sat1,Unsat1),
cat(End1,End2,Cat2,Features2,Graph2,Sat2,Unsat2) ==>
    xp_or_obli(Cat2,XP), ok_in(XP,Cat),
    precedence(XP,Start1,End1,End2,Cat,Cat2,Sat1,Unsat1,SP,UP),
    dependency(XP,Start1,End1,End2,Cat,Features1,Cat2,Features2,SP,UP,SD,UD),
    build_tree(XP,Graph1,Graph2,Graph,ImmDaughters),
    unicity(Start1,End2,Cat,XP,ImmDaughters,SD,UD,SU,UU),
    requirement(Start1,End2,Cat,XP,ImmDaughters,SU,UU,SR,UR),
    exclusion(Start1,End2,Cat,XP,ImmDaughters,SR,UR,Sat,Unsat) |
    cat(Start1,End2,XP,Features2,Graph,Sat,Unsat).

Fig. 1. New Category Inference

type XP spanning both these categories, with the finally obtained Sat and Unsat lists as its characterization. In practice, we need another rule, symmetric to this one, in which the XP category appears before the category Cat which is to be incorporated into it.
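A sketch of that symmetric variant (our reconstruction; the paper does not print it, and only the order of the two head constraints changes):

% cat(Start1,End1,Cat1,Features1,Graph1,Sat1,Unsat1),   Cat1 is the XP or its head
% cat(End1,End2,Cat2,Features2,Graph2,Sat2,Unsat2) ==>  Cat2 is incorporated into it
%    xp_or_obli(Cat1,XP), ok_in(XP,Cat2),
%    ... (same property checks as in Fig. 1) ...
%    | cat(Start1,End2,XP,Features1,Graph,Sat,Unsat).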

4.3 Extending the parser for abducing implicit elements

Our Abductive Concept Formation methodology imposes two requirements on the user's grammar. First, semantics must be defined compositionally. Second, semantic material must be isolated into one specific argument, so that the rules for virtual coordination can easily identify and process it. For instance, if we use the second argument for semantics, a rule such as

name(X) -> np(X^Sem^Sem).

should be coded in CHR and CHRG as

CHR:  category(name,X,P0,P1) ==> constituent(np,X^Sem^Sem,P0,P1).
CHRG: category(name,X) ::> constituent(np,X^Sem^Sem).

We assume that there are two (implicit or explicit) coordinating constituents, C1 and C2, surrounding the conjunction, which must in general be of the same category⁴. As in Dahl's previous work [15], we adopt the heuristic that closer-scoped coordinations will be attempted before larger-scoped ones. Thus in Woods' well-known example [27], "John drove his car through and demolished a window", "vp conj vp" is tried before "sent conj sent". If both C1 and C2 are explicit, we simply postulate another constituent of the same category covering both, and conjoin their meanings. This is achieved through the rule:

constituent(C,Sem1,P0,P1), constituent(conj,_,P1,P2), constituent(C,Sem2,P2,P3) ==>
    constituent(C,Sem,P0,P3), conj(Sem1,Sem2,Sem).

Sem can take either the form and(Sem1,Sem2) or a more complex form.

4 This is a simplifying assumption: coordination can involve different categories as well, but in this paper we only address same-category coordination.

If either C1 or C2 is implicit, the CHRG engine will derive all possible partial analyses and stop with no complete sentence having been parsed. We can then resume the process after dynamically adding the following symmetric rules, in charge of completing the target in parallel with the source. Once the target has been completed, the above rule for coordinating complete constituents can take over.

% The second conjunct is incomplete
constituent(C,Sem1,P0,P1), constituent(conj,_,P1,P2) ==>
    complete(C,Sem1,range(P0,P1),Sem2,P3) | constituent(C,Sem2,P2,P3).

% The first conjunct is incomplete
constituent(conj,_,P1,P2), constituent(C,Sem2,P2,P3) ==>
    complete(C,Sem2,range(P2,P3),Sem1,P0) | constituent(C,Sem1,P0,P1).

complete/5 generates the features of a constituent of category C that is incomplete between the given points, using the source (the portion of text indicated by range/2) as a parallel structure. The new constraint is placed in the constraint store as an abduced constraint, so that the rule for complete conjunct coordination can apply. This rule must therefore be modified to allow for one of the constituents to be abduced.

An Example. For "Philippe likes salsa and Stephane tango", we have the initial constraint store:

{philippe(0,1), likes(1,2), salsa(2,3), and(3,4), stephane(4,5), tango(5,6)}

to which the following "constraints" (in the sense of the CHR store) are successively added:

[name(0,1), verb(1,2), noun(2,3), conj(3,4), name(4,5), noun(5,6),
 np(0,1), vp(1,3), np(2,3), np(4,5), np(5,6), s(0,3)]

At this point, since the analysis of the entire sentence is not complete, the dynamically added rules will compare the data to the left and right of the conjunction, noting the parallelisms present and absent:

np(0,1)   parallel to np(4,5)
verb(1,2) parallel to ?
np(2,3)   parallel to np(5,6)

and postulate (abduce) a verb(5,5), with the same surface form ("likes") as the verb in the parallel structure. The addition of this element to the constraint store triggers the further inferences {vp(5,6), s(4,6)}. This in turn triggers the complete constituent coordination rule, resulting in {s(0,6)}.

Top-down Prediction. For cases in which the two parallel structures are different, the simple pairing of constituents in the source and target, as above, will not be enough. For example, for "John drove a car through and demolished a window", we have the constraint store:

{john(0,1), drove(1,2), a(2,3), car(3,4),

 through(4,5), and(5,6), demolished(6,7), a(7,8), window(8,9),
 name(0,1), verb(1,2), det(2,3), noun(3,4), prep(4,5), conj(5,6),
 verb(6,7), det(7,8), noun(8,9), np(0,1), np(7,9), vp(6,9)}

In such cases we examine the grammar rules in top-down fashion to determine which ones would warrant the completion of a constituent that appears on one side of the conjunction but not on the other. In our example, the candidate sources are prep(4,5), verb(6,7) and vp(6,9). Postulating a missing prep at point 6 or a missing verb at point 5 does not yield success, so we abduce a vp ending at point 5 and use vp rules top-down to predict any missing constituents. The pp rule is eventually found, which rewrites pp into a preposition plus a noun phrase, so we can abduce a missing noun phrase between point 5 and itself, to be filled in by the parallel noun phrase in the source, namely "a window" (we must avoid requantification, so the window driven through and the one demolished must be the same window). This new noun phrase triggers in turn the abduction of a verb phrase between points 1 and 5, which in turn allows us to conjoin the two verb phrases and complete the analysis.

Semantics. We now turn our attention to grammars which construct meaning representations for the sentences analysed. After having determined the parallel elements in source and target, we must void the source's meaning (using for instance higher-order unification) of those elements which are not shared with the target, and only then apply the resulting property to the representation of the target. For our example, we must, from the meaning representation of "Philippe likes salsa", reconstruct the more abstract property λy.λx.likes(x,y), which can then be applied to "tango" and "stephane" to yield likes(stephane,tango). We bypass this need by keeping copies of the abstract properties as we go along. While the original properties get instantiated during the analysis (so that the meaning of "likes" in the context "Philippe likes salsa" becomes likes(philippe,salsa)), their copies do not. It is these uninstantiated copies that are used as meanings for the reconstructed targets.
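A minimal sketch of how such copies might be kept in Prolog (our illustration, not the paper's code; we encode λy.λx.likes(x,y) with the ^ notation used in section 4.3):

keep_abstract(Sem, Sem-Copy) :- copy_term(Sem, Copy).

% ?- keep_abstract(Y^X^likes(X,Y), Inst-Abs),
%    Inst = salsa^philippe^likes(philippe,salsa).
% Inst is now fully instantiated, while Abs remains a fresh copy of
% Y^X^likes(X,Y), reusable as the meaning of the reconstructed conjunct.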

Syntactic Considerations. Some of the syntactic features carried around by typical grammars need special treatment by the coordination rules: the conjunction of two singular noun phrases (e.g. "the cat and the dog"), for instance, should result in a plural conjoined noun phrase.

5 Concluding remarks

We have discussed abduction as an appropriate method for treating long distance dependencies, both within traditional, hierarchical approaches, and within

property-based kinds of grammars, so that even incomplete or incorrect input will yield interesting, if partial, results.

The present work was inspired by [15], where we provided a left-corner plus charting datalog approach. This approach recorded parse-state constituents through linear assumptions to be consumed as the corresponding constituents materialize throughout the computation. Parsing-state symbols corresponding to implicit structures remained as undischarged assumptions, rather than blocking the computation as they would if they were subgoals in a query. They could then be used to glean the meaning of elided structures, with the aid of parallel structures. While quite minimalistic in the amount of code required, this approach involved sophisticated process synchronization notions, as well as the use of linear affine implication. In the present work we showed how the use of abduction helps retain the advantages of [15] while drastically simplifying our methodology.

Various authors, e.g. [22], discuss an alternative approach to anaphoric dependencies in ellipsis, in which the dependence between missing elements in a target clause and explicit elements in a source clause does not follow from some uniform relation between the two clauses, but follows indirectly from independently motivated discourse principles governing pronominal reference. While containing linguistically deep discussions, the literature on this discourse-determined analysis also focusses on ellipsis resolution, while still leaving unresolved (to the best of our knowledge) the problem of automatically determining which are the parallel structures.

On another line of research, Steedman's CCGs [25] provide an elegant treatment of a wide range of syntactic phenomena, including coordination, which does not resort to the notions of movement and empty categories, instead using limited combinatory rules such as type raising and functional composition. However, these are also well known to increase the complexity of parsing, giving rise to spurious ambiguity, that is, the production of many irrelevant syntactic analyses as well as the relevant ones. Extra work to get rid of such ambiguity seems to be needed, e.g. as proposed in [24].

With this work we hope to stimulate further research into the uses of abduction in constraint-based parsing.

Acknowledgements

This research was made possible by the author’s NSERC research grant.

References

1. Abdennadher, S. and Schütz, H. CHR∨: A Flexible Query Language. In International Conference on Flexible Query Answering Systems, FQAS'98, LNCS, Springer, Roskilde, Denmark (1998)
2. Balsa, J., Dahl, V. and Pereira Lopes, J. G. Datalog Grammars for Abductive Syntactic Error Diagnosis and Repair. In Proceedings of the Natural Language Understanding and Logic Programming Workshop, Lisbon (1995)

3. Blache, P. and Azulay, D. Parsing Ill-formed Inputs with Constraint Graphs. In A. Gelbukh (ed), Intelligent Text Processing and Computational Linguistics, LNCS, Springer, 220–229 (2002)
4. Bès, G. and Blache, P. Propriétés et analyse d'un langage. In Proceedings of TALN'99 (1999)
5. Bès, G., Blache, P., and Hagège, C. The 5P Paradigm. Research report, GRIL/LPL (1999)
6. Christiansen, H. and Dahl, V. Logic Grammars for Diagnosis and Repair. In Proceedings of the 14th IEEE International Conference on Tools with Artificial Intelligence, Washington D.C., 307–314 (2002)
7. Dahl, V. and Voll, K. Concept Formation Rules: An Executable Cognitive Model of Knowledge Construction. In Proceedings of the First International Workshop on Natural Language Understanding and Cognitive Sciences, INSTICC Press (2004)
8. Barranco-Mendoza, A., Persaud, D.R. and Dahl, V. A Property-Based Model for Lung Cancer Diagnosis. In Proceedings of the 8th Annual Int. Conf. on Computational Molecular Biology, RECOMB 2004, San Diego, California (accepted poster) (2004)
9. Christiansen, H. and Dahl, V. Assumptions and Abduction in Prolog. In Proceedings of MULTICPL'04 (Third International Workshop on Multiparadigm Constraint Programming Languages), Saint-Malo, France (2004)
10. Christiansen, H. Logical Grammars Based on Constraint Handling Rules (Poster abstract). In Proc. 18th International Conference on Logic Programming, Lecture Notes in Computer Science 2401, Springer-Verlag, p. 481 (2002)
11. Christiansen, H. Abductive Language Interpretation as Bottom-up Deduction. In Proc. NLULP 2002, Natural Language Understanding and Logic Programming, Wintner, S. (ed.), Copenhagen, Denmark, 33–48 (2002)
12. Christiansen, H. CHR as Grammar Formalism, a First Report. In Sixth Annual Workshop of the ERCIM Working Group on Constraints, Prague (2001)
13. Dahl, V. and Blache, P. Directly Executable Constraint Based Grammars. In Proc. Journées Francophones de Programmation en Logique avec Contraintes, Angers, France (2004)
14. Dahl, V. and Tarau, P. Assumptive Logic Programming. In Proc. ASAI 2004, Córdoba, Argentina (2004)
15. Dahl, V. On Implicit Meanings. In Computational Logic: From Logic Programming into the Future, F. Sadri and T. Kakas (eds), Springer-Verlag (2002)
16. Dahl, V., Tarau, P., and Li, R. Assumption Grammars for Processing Natural Language. In Fourteenth International Conference on Logic Programming, MIT Press, 256–270 (1997)
17. Dahl, V. and McCord, M. Treating Coordination in Logic Grammars. American Journal of Computational Linguistics 9, 69–91 (1983)
18. Dalrymple, M., Shieber, S., and Pereira, F. Ellipsis and Higher-Order Unification. Linguistics and Philosophy 14(4), 399–452 (1991)
19. Frühwirth, T. W. Theory and Practice of Constraint Handling Rules. Journal of Logic Programming 37, 95–138 (1998)
20. Haugeneder, H. A Computational Model for Processing Coordinate Structures: Parsing Coordination without Grammatical Specification. In ECAI 1992, 513–517 (1992)
21. Kakas, A.C., Michael, A., and Mourlas, C. ACLP: Abductive Constraint Logic Programming. Journal of Logic Programming 44, 129–177 (2000)
22. Kehler, A. and Shieber, S. Anaphoric Dependencies in Ellipsis. Computational Linguistics 23(3) (1997)

23. Milward, D. Non-Constituent Coordination: Theory and Practice. In Proceedings of COLING 94, Kyoto, Japan, 935–941 (1994)
24. Park, J.C. and Cho, H.J. Informed Parsing for Coordination with Combinatory Categorial Grammar. In COLING'00, 593–599 (2000)
25. Steedman, M. Gapping as Constituent Coordination. Linguistics and Philosophy (1990)
26. Voll, K., Yeh, T., and Dahl, V. An Assumptive Logic Programming Methodology for Parsing. International Journal on Artificial Intelligence Tools (2001)
27. Woods, W. An Experimental Parsing System for Transition Network Grammars. In R. Rustin (ed), Natural Language Processing, Algorithmic Press, New York, 145–149 (1973)

Invited Talk (no title provided at time of printing)

Denys Duchier

Équipe Calligramme, LORIA, Nancy, France
[email protected]

The Other Syntax

Gerald Penn

Department of Computer Science
University of Toronto
10 King's College Rd.
Toronto M5S 3G4
Ontario, Canada
[email protected]

Abstract. The ultimate goal of linguistics is to explain the relationship between the form of sound (or text) and the substance of meaning. Almost 50 years ago now, Artificial Intelligence promised that computers were capable of mimicking and possibly even learning this correspondence in a way that would revolutionize the manner in which we interact with them. To put it mildly, we have not fulfilled that promise yet, and the reason we have not has at least something to do with where most of us have been looking for the answer: to a view of syntactic phrase structure that has, at best, a circuitous connection to meaning. Even within the Chomskyan linguistic tradition, there has been an acknowledgement that a "logical form" quite unlike syntactic structure must exist. But logical form is still a "form," a different syntax for talking about meaning in the absence of any kind of inference that could justify viewing it as meaning. This talk will describe a research programme in progress that seeks to connect grammar, in the conventional sense that computational linguists use that term, with real semantic inference, and thus a real semantics. In our approach, this connection is made by viewing the syntactic, logical-form, and semantic components of grammar as instances of common methods in constraint logic programming.

Gradience, Constructions and Constraint Systems

Philippe Blache¹ and Jean-Philippe Prost²,¹

¹ LPL-CNRS, Université de Provence
29 Avenue Robert Schuman
13621 Aix-en-Provence, France
[email protected]

² Centre for Language Technology
Macquarie University
Sydney NSW 2109, Australia
[email protected]

Abstract. One important question to be addressed by modern linguistics concerns the variability of linguistic constructions. Some of them are very regular, some others quite rare. Some are easy to explain, some others hard. And finally, some are canonical whereas others are less grammatical. These different aspects have been addressed, usually separately, from a psycholinguistic perspective. Some elements of explanation are given in terms of sentence complexity and gradience (see for example [Gibson00], [Sorace04]). It is necessary to explain why some utterances can be interpreted more easily than others, or why some are less acceptable than others. Several linguistic theories explicitly address such questions, in particular from the perspective of dealing with ill-formed inputs. Some elements of an answer can be found for example in optimality theory (see [Prince93]), in the model-theoretic syntax approach (see [Pullum03]) or in construction grammar (see [Fillmore98], [Goldberg95]). One of the main challenges in these approaches is the characterization of gradience in linguistic data. The basic idea consists in hierarchizing the linguistic information according to some "importance" criterion. However, such importance is difficult to define. In some approaches, such as probabilistic grammars, it relies on frequency information (see [Keller03]): each rule is weighted according to its frequency, acquired from treebanks. The syntactic constructions are specified according to the weights of the different rules. In some other approaches, explored in this paper, the idea is to propose some objective information relying on a symbolic representation. This paper argues in favor of a fully constraint-based approach representing all kinds of information by means of constraints. Such an approach makes it possible to quantify the information and proposes an ordering relation among the different utterances relying on the interpretation of satisfied and violated constraints.

1 Constructions

Several modern approaches in linguistics argue in favor of a contextual description of linguistic phenomena. This way of representing information is typically proposed in construction grammar (see [Fillmore98] or [Kay99]), in property grammars (see

[Blache00]), etc. In these approaches, a specific phenomenon is characterized by a convergence of different properties. For example, taking into account the syntactic level alone, a dislocated construction is roughly characterized by the realization of an NP, before or after a main clause, together with an anaphoric clitic inside this clause. Other properties can be added to this framework, for example concerning the lexical selection that can be specific to this construction. The important point is that a construction is specified by a given set of properties, some of them being relevant only for this construction. In other words, given the fact that, as proposed in property grammars, a property can be conceived as a constraint, a construction is defined by a constraint system: what makes sense is not a property taken separately from the others, but the interaction of the constraints. Contextuality is then implemented by means of such interaction: defining a construction consists in specifying on the one hand a set of properties characterizing the construction and on the other hand some property subset that specifically entails some properties. For example, in the description of the passive in French, an accusative pronoun in the VP has to agree with the subject (it has to be reflexive).

(1) Je me le suis dit (I myself it aux-1st tell)
(2) *Je te le suis dit (I you it aux-1st tell)

Such an approach presents several advantages. First, it is possible to express constraints at very different granularity levels for the specification of a given construction. Moreover, different kinds of constraints, coming from different linguistic domains such as prosody, semantics, pragmatics, etc., can participate in the definition of a construction. This is one of the main arguments in favor of construction approaches. This aspect is illustrated in the following example (from [Mertens93]), illustrating a very specific construction in which only a little information is available at the syntactic level. In this case, prosody plays an important role in its interpretation:

(3) lundi lavage mardi repassage mercredi repos
    monday washing tuesday ironing wednesday rest

Finally, in this perspective a constraint can have a relative importance. For some constructions, a constraint can be obligatory, whereas the same constraint can easily be relaxed in other cases.

2 Constraints

Classically, constraints are used in linguistic theories as a filtering process. This is typically the case with constraint grammars, but also with most recent constraint-based approaches such as HPSG (see [Sag99]) or Optimality Theory (see [Prince93]). In HPSG for example, constraints are applied to a structure in order to verify its well-formedness. As a side effect, constraints can also implement feature value instantiation or propagation. The valence principle, for example, plays exactly this double role: ruling out the structures that don't satisfy the constraint and, in case of unification between a structure and a description in the valence list, instantiating some feature values of the structure. In this case, constraints are seen as structure descriptions; they don't implement information that can be evaluated independently from these structures. This means that structures have to be built before their values can be verified, and syntactic properties are expressed in terms of relations inside such hierarchized constructions.

Constraints are used in a completely different way in OT. They also constitute a filtering process, but the constraints belong to a system containing two main pieces of information: the basic information specified by the constraint itself, expressed in universal and imperative terms, and a second-level (rather implicit) piece of information expressed by ranking. In such a system, the fact that a constraint is satisfied or not is in the end less important than its position in the ranking. Moreover, all constraints are stipulated taking into account the fact that other, opposite constraints also belong to the system. This is a kind of negative way of using constraints, which are in fact stipulated so as to be violated.

There is a common basis to these different uses of constraints. In all cases, they need to be interpreted within a system. In other words, they cannot be evaluated by themselves, but only in reference to the entire system. This is what Pullum underlines as a holistic way of seeing syntactic information, in which a syntactic property cannot be interpreted in itself. This is a limitation in the perspective of finding syntactic characterizations of unrestricted material: in many cases, especially when parsing spoken language, syntactic information is sparse. For example in (3), the relation among the different elements is difficult to express at the syntactic level. In such cases, information comes from the interaction of different linguistic domains, in particular morphology and prosody more than other kinds of information. And in such cases, classical approaches fail to build a description.

There is also another drawback. In the case of OT for example, ranking makes it possible to order different candidates. Such a ranking expresses a level of well-formedness, according to the grammar. However, there is no direct relation between well-formedness and more general notions such as understandability, acceptability, sentence complexity, etc. What is important to explain, from a cognitive point of view, is what kinds of utterances are more easily interpretable and why.

We think that constraints can play an important role in this perspective, provided that they are expressed in a non-holistic manner. In such a perspective, each constraint must implement a syntactic property and be expressed independently from the others. Obviously, constraints have to interact, but they can always be evaluated. This characteristic is underlined by Pullum as being one of the interests of a model-theoretic approach in comparison with a deductive one: it is possible to give some information in all cases, whatever the form of the input.

2.1 Property Grammars

We briefly describe here a framework for such an approach, called property grammars (see [Blache00]). In this approach, all information is represented by means of constraints (also called properties). These constraints are relations among categories expressed as sets of features. A category, unlike in HPSG, doesn't contain hierarchical information among constituents, but only what is called in construction grammar intrinsic information. All constraints are expressed independently from the others, and each represents a specific kind of syntactic information:

– linear precedence, which is an order relation among constituents;
– subcategorization, which indicates the co-occurrence relations among categories or sets of categories;

– the impossibility of co-occurrence between categories;
– the impossibility for a category to be repeated;
– the minimal set of obligatory constituents (usually one single constituent), which is the head;
– the semantic relations among categories, in terms of dependency.

These different kinds of information correspond to different properties, respectively: linearity, requirement, exclusion, unicity, obligation, dependency. Such information can always be expressed in terms of relations among categories, as shown in the following examples:

– Linear precedence: Det ≺ N (a determiner precedes the noun)
– Dependency: AP ; N (an adjectival phrase depends on the noun)
– Requirement: V[inf] ⇒ to (an infinitive comes with to)
– Exclusion: most ⇔ Adj[super] (most cannot modify an adjective that is already a superlative)

Here is a more formal representation of such information: let K be a set of categories and A the ordered set of categories for a given input; let pos(C, A) be a function that returns the position of C in A; let card(C, A) be a function that returns the number of elements of type C in A; let C1, C2 ∈ K; and let comp(C1, C2) be a function that verifies the semantic compatibility of C1 and C2 and completes the semantic structure of C2 with that of C1.³

– LP: C1 ≺ C2 holds in A iff pos(C1, A) < pos(C2, A)
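The corresponding definitions for the remaining properties seem to have been lost in printing; a plausible reconstruction, using the functions defined above (our sketch, following the informal descriptions of the properties given earlier, not the authors' own text):

– Requirement: C1 ⇒ C2 holds in A iff card(C1, A) = 0 or card(C2, A) > 0
– Exclusion: C1 ⇔ C2 holds in A iff card(C1, A) = 0 or card(C2, A) = 0
– Unicity: unicity(C) holds in A iff card(C, A) ≤ 1
– Obligation: obligation(C) holds in A iff card(C, A) ≥ 1
– Dependency: C1 ; C2 holds in A iff comp(C1, C2) holds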

A grammar is, in this perspective, a set of constraints and nothing else. In particular, there is neither ranking in the OT sense nor any need to build a structure before being able to evaluate the properties (as in OT with the Gen function, or in HPSG with the need of first selecting a hierarchized structure type). Parsing in property grammars consists in evaluating the entire set of constraints for a given input. The characterization of an input is then formed by the set of satisfied constraints and the set of violated ones. More precisely, the grammar contains a subset of constraints for each category. It is then possible to specify a characterization (the set of evaluated constraints) for each category as well as for the entire input.

3 The semantic completion follows some schemas such as subject, complement or modifier. These schemas indicate what part of the semantic structure of the modified category must be completed with that of the dependent.

2.2 Property Grammars and the Constraint Solving Problem

Unlike a more standard way of seeing a constraint solving problem, which would consist in finding an assignment that satisfies the system of constraints, we start here with a partial assignment—corresponding to the utterance to be parsed—and we want to complete this assignment so as to satisfy the system of constraints—i.e. the grammar. In terms of the output of the solving process, we are not only interested in the qualitative aspect of the final assignment but equally in knowing which constraints are satisfied and which ones are not, both from a qualitative and a quantitative point of view. The knowledge of the assignment by itself is not enough. Actually, what we want is, given a system of constraints and a first partial assignment—representing the input to be analysed—a description of the final system and the complete assignment used to reach it. In a linguistic perspective, this means that we want to know the lists of properties which are satisfied and violated by the input utterance—i.e. the characterisation—when the input is analysed in a given way—i.e. for a given assignment.

2.3 Constructions and Constraints

A fully constraint-based view such as the one proposed by property grammars makes it possible to implement contextual fine-grained information. In the same way as categories are described by means of constraint subsets, other kinds of objects, such as constructions, can also be specified in the same way. The notion of construction is of deep importance, especially in the perspective of finding gradient judgements: the importance of a constraint can vary from one construction to another. In the remainder of this section, we describe how constructions can be represented by means of constraints in the PG framework, taking the example of the subject-auxiliary inversion (SAI) construction in English (see [Fillmore98]). In this construction, there are strong linear constraints, especially concerning adverbs, as shown in the following examples:

(4) Did you learn your lesson?
(5) Did you really learn your lesson?
(6) Didn't you learn your lesson?
(7) *Did really you learn your lesson?
(8) *Did not you learn your lesson?
(9) *Did you your lesson learn?

This construction is explained by the following SAI construction:

(Feature structure garbled in printing; it can be summarized as: the SAI construction inherits from VHP, has level [srs +], a head verb with [aux +, finite +], followed by an element with rel [gf subj] and then any number of elements with rel [gf ¬subj].)

In terms of PG, this construction can be detailed as a set of such constraints:

1. V[aux] ≺ NP[subj]
2. NP[subj] ≺ V[¬fin]
3. V[aux] ⇒ NP[subj]
4. V[¬fin] ≺ XP[¬sub]
5. NP[subj] ≺ Adv[neg,¬contraction]
6. NP[subj] ≺ Adv[¬neg]
7. NP ; V
8. Adv ; V

The subset of constraints {1,2,4,7,8} represents the information of the SAI construction represented above, and we can say that the two notations are equivalent. Adding new information simply consists in adding new constraints to the set describing the construction, at the same level. In this example, on top of these general constraints, it is necessary to specify some further constraints. For example, the negative form has to be contracted here. This constraint is imperative in this construction, whereas it can be optional in other cases. This constitutes one important interest in the perspective of associating constraints with an "importance" degree: such a degree may vary according to the construction. Using the terminology of [Sorace04], a constraint can be hard in some constructions and soft in others.

3 Gradience and Density

A fully constraint-based representation may also be helpful in identifying criteria for sentence complexity as well as acceptability. The idea is to make use of the information contained in characterizations in terms of satisfied and violated constraints. More precisely, some figures can be extracted from these characterizations, illustrating the difference in the realization of a given category. For example, the ratio of satisfied to violated constraints is obviously of main interest.

(10) Quelles histoires Paul a-t-il écrites ?
     What stories Paul did he write[fem-plu]? / What stories did Paul write?
(11) Quelles histoires Paul a-t-il écrit ?
     What stories Paul did he write[masc-sing]? / What stories did Paul write?
(12) Quelles histoires a-t-il écrites Paul ?
     What stories did he write[fem-plu] Paul? / What stories did he write Paul?
(13) Quelles histoires a-t-il Paul écrites ?
     What stories did he Paul write[fem-plu]? / What stories did he Paul write?

These examples are given in order of (un)acceptability, which corresponds in our hypothesis to a progressively greater number of violated constraints. The constraints are given here without taking into account specificities of the interrogative construction.

(11) NP[obj] ; VP[ppas]
(12) NP[subj] ≺ VP
(13) NP[subj] ≺ VP, V ⇔ NP[subj]

Even without a precise evaluation of the consequences of constraint violations type by type, this first criterion can constitute an objective element of estimation for acceptability: unacceptability increases with the number of constraint violations (Keller's property of Cumulativity). This indication seems trivial, but it follows directly from the possibility of representing the different types of syntactic information separately, by means of properties. Such an estimation is for example not possible with a phrase-structure representation, and is difficult even with classical constraint-based approaches such as HPSG.

However, it is necessary to make a finer-grained use of such information. In particular, the number of constraints may vary from one category to another. Some categories, such as adverbial phrases, are very static and are described with a limited number of properties. By contrast, the noun phrase, which can take many different forms, needs a large number of properties. It is then necessary to weigh the number of constraint violations in these cases: violating one constraint of an AdvP has more consequences on acceptability than violating one constraint of an NP. Again, this indication is purely quantitative and does not take constraint type into account. It is probably the case that some constraint types (for example exclusion) play a more important role for acceptability than others (for example dependency). Moreover, when interpretability is taken into consideration, a constraint that is hard with respect to acceptability, such as unicity, becomes soft for interpretation, as shown in the following examples:

(14) Paul reads a book
(15) Paul reads reads a book

The second example is obviously unacceptable but perfectly understandable. We therefore propose, as a first stage in the identification of gradience criteria, purely quantitative measures. This is the role played by the notion of density. This notion provides two figures: the number of satisfied properties with respect to the total number of properties that describe the object, and the same ratio for violated properties. We note these figures dens_sat and dens_unsat respectively, with the following definitions:

– dens_sat = number of satisfied properties / total number of properties
– dens_unsat = number of unsatisfied properties / total number of properties
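As a sketch (our own; the paper gives no implementation), the two figures can be computed directly from a characterization, provided we also know how many properties the grammar states for the category:

    def densities(n_satisfied, n_violated, n_total):
        """dens_sat and dens_unsat as defined above; n_total is the number
        of properties the grammar associates with the category, whether or
        not they could be evaluated on this particular realization."""
        return n_satisfied / n_total, n_violated / n_total

    # Example (16) below: category S with 16 properties in the grammar,
    # 8 satisfied and 2 violated -> (0.5, 0.125).
    print(densities(8, 2, 16))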

To some extent the notion of density can be compared to that of recall, as used in evaluation.

Density in itself, unlike the ratio of satisfied to violated properties, gives some indication about the quantity of information carried by a given object. A high density of satisfied properties means that a large number of the syntactic characteristics contained in the grammar are realized in the object; in other words, the object contains, with respect to the grammar, a lot of syntactic information. Reciprocally, a low density of satisfied properties can have different interpretations. In case of a high density of violated constraints, the object is clearly ill-formed and we can suspect a low probability for its acceptability. But it can also be the case that the density of violated constraints is low as well. This situation indicates that the object contains little syntactic information. In the following example, extracted from a corpus analysis, the category, a sentence, is formed with a PP and a VP:

(16) En renforçant le projet, avançons vers le succès.
     in reinforcing the project, let's go toward the success

Cat   dens_sat   dens_unsat   construction
S     0.5        0.125        PP; VP

Such a construction is not frequent, but some information can be given, according to the grammar. Concerning the violated properties, the non-null density comes from the fact that there is a dependency relation between the VP and an NP subject which is not realized. The level of satisfied-property density comes from the fact that even if the

properties involving the PP and the VP are satisfied, many properties describing S involve an NP. There is then a high number of properties for S that cannot be evaluated, explaining a low dens_sat.

These different ratios thus constitute a first tool providing some indication of acceptability and interpretability. Acceptability primarily depends on the ratio of satisfied constraints to violated ones. Interpretability can be illustrated by the densities of satisfied and violated constraints. Low densities, as shown above, indicate a low level of syntactic information. More generally, there is a correlation between the quantity of information contained in an object and its interpretability: a high density of satisfied constraints comes with an easier interpretation. In case of low densities, it is necessary to obtain information from other domains such as prosody.

Using these different indications makes it possible to give information about any kind of input, without any restriction to well-formed ones. Moreover, it becomes possible to propose quantitative elements towards gradience in linguistic data concerning both acceptability and interpretability. Such elements of information also give some indication about domain interaction. For some utterances, it is necessary to extract information from the different linguistic domains such as morphology, syntax, prosody or pragmatics. In other cases, the morpho-syntactic level alone contains enough information to make an utterance interpretable. In other words, there is a balance among the different domains. Each domain can be characterized with densities such as the one described here for syntax, the balance status being a function of the different densities. A high density of satisfied properties for one domain is an indication of a high level of information. The hypothesis stipulates that in this case, other domains can contain a low level of information without consequence for interpretability. For example, for some constructions, if there is a high density in syntax and semantics, then the prosody density is not constrained and can take any value. Concretely, this means that intonation is no longer constrained and can be realized in various ways. On the contrary, when syntactic and semantic densities are not high enough, the prosody density has to be high and the intonation is less variable. This is the case in example (3), for which prosody plays an important role in the interpretation.

4 Experiment

In the following, we give some indications from different French corpora, calculated from the output of a deterministic property grammar parser (see [Blache01] for a description of this parser). One important restriction is that, insofar as the parser used is deterministic, the number of violated constraints is kept to a minimal level. In particular, only the linearity, exclusion and unicity constraints have to be satisfied. The density of violated constraints is therefore not relevant for our discussion; we take into account only the density of satisfied constraints. The aim of this experiment is to extract some figures from different data, for a given grammar and a given parser. It cannot be considered a parser (or grammar) evaluation.

Three different corpora have been used: the first from the newspaper "Le Monde", with 15,420 words; the two others are transcribed spoken language corpora containing respectively 523 and 1,923 words. These corpora are very small, which is justified by

the difficulty in parsing such data. Moreover, they have been filtered: incomplete words, for example, have been eliminated. However, all repetitions are kept.

The first observation in this experiment is that, even if most of the categories have a null density (for the reasons explained above), there are huge differences among the densities of satisfied constraints. The following table indicates some figures concerning the noun phrase in the written text corpus:

Density     Construction     Density     Construction
0.034483    PPro             0.310345    Det AP PP
0.068966    Clit             0.379310    Det N Rel
0.103448    N                0.413793    Det AP N
0.1724138   ProP AP          0.413793    Det N PP
0.206897    Det AP           0.517241    Det N Rel PP
0.241379    Det PP Rel       0.551724    Det N AP PP
0.275862    Det N            0.655172    Det N PP AP Rel

In these figures, one can observe that density does not grow systematically with grammaticality. For example, the two lowest densities correspond to grammatical constructions (personal pronoun and clitic). This comes from the fact that the noun phrase, which is the most frequent category, has many different constructions and needs a lot of constraints to describe them. In all cases, even when a given construction satisfies all its corresponding constraints, insofar as the total number of constraints for the NP is high, the density is necessarily low. Moreover, the realizations observed here contain only one category. The total number of satisfied properties is then by definition very low, without any consequence on grammaticality (which should be indicated by the ratio of satisfied to violated constraints). The same explanation holds when comparing the realization /Det N/ with /Det N PP/: the first has a lower density, whereas one would expect a high one for this basic construction. Frequency information therefore plays an important role in the use of the density notion. The following table indicates the mean density with respect to the frequency of the category in the different corpora:

Cat     Frequency      Density
S       0.069917582    0.4733535
AP      0.108379121    0.408556
AdvP    0.048139361    1
NP      0.302047952    0.204571
PP      0.1003996      0.31331
VP      0.218981019    0.341995
Circ    0.064360639    0.718518
Coord   0.071978022    0.4821425
Rel     0.015796703    0.3543475

We can see in this table that the most frequent categories (NP, VP and PP) are also those with the lowest mean density, whereas the less frequent ones (Circ, AdvP and Coord) are associated with high densities. The arguments given above concerning the number of constituents, the number of properties and the number of different constructions can explain these differences.

This density parameter thus has to be modulated by the frequency of the construction. In all cases, an increase in density comes with an increasing quantity of

information. It is important to notice that the density notion is not directly useful in the identification of sentence complexity. For example, one can consider that the realization of an NP with a relative clause is more complex than the construction /Det N/; however, the first has a higher density than the second, for the reasons explained above. But from the interpretability point of view, these aspects are disconnected. For example, a cleft construction, which is identified as being complex, is easily understandable because of the high number of constraints describing it. The following examples illustrate the density difference between a construction with few constraints to be satisfied and another containing more information:

     Example                                            Density
(17) celui rouge                                        0.1724138
     that red one
(18) Le contenu de la future convention qui devrait     0.6551724
     permettre de régler les problèmes de fond
     the content of the future convention that may
     allow to solve problems in depth

5 Further Works

Intuitively, the notion of density could be refined by weighting the constraints according to their importance. The hard/soft distinction ([Sorace04]), for instance, is not currently accounted for by the density, whereas we have seen previously that constraints play roles of different importance when it comes to acceptability. Some sort of constraint ranking would also let us model the cumulativity and ganging-up effects (i.e. when multiple violations of soft constraints may be more unacceptable than a single violation of a hard constraint) described by [Sorace04].

Another object of further investigation concerns the use of weighted densities during the parsing process as an element of disambiguation. Indeed, when faced with different possible assignments, heuristics could be based on the measure of density for each possibility in order to rank the structures by preference. A deterministic approach could subsequently use this preferred structure to reduce the search space at different stages in the solving process.
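A possible shape for such a refinement (purely illustrative; the weights and the constraint-type names are our assumptions, not values from the paper):

    # Hypothetical weighted variant of dens_sat: each constraint type
    # carries a weight reflecting its hard/soft status, so that violating
    # a hard constraint (e.g. linearity) lowers the density more than
    # violating a soft one (e.g. dependency).
    WEIGHTS = {"linearity": 3.0, "exclusion": 3.0, "unicity": 2.0,
               "requirement": 1.5, "dependency": 1.0}  # illustrative values

    def weighted_dens_sat(satisfied_types, all_types):
        """satisfied_types / all_types: constraint types of the satisfied
        and of all evaluated constraints, respectively."""
        total = sum(WEIGHTS[t] for t in all_types)
        return sum(WEIGHTS[t] for t in satisfied_types) / total if total else 0.0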

6 Conclusion

The constraint-based approach proposed in this paper presents several advantages. In particular, all information is represented by means of constraints. Moreover, these constraints are at the same level (there is no ranking between them) and the grammar is formed by the constraint system. In contrast with other holistic approaches, each constraint can then be evaluated separately. Concretely, this means that syntactic information can be built for any input, whatever its form. Moreover, such information can be quantified and gives some indication about acceptability (a high ratio of satisfied to violated constraints) and interpretability (a high density of satisfied constraints). One of the interests of this approach is that such information can be modulated. For example, it has

been shown that the importance of some constraints may vary from one construction (represented in our approach as a subset of constraints) to another.

From a cognitive point of view, the use of a fully constraint-based system may be of interest in terms of modeling. Constraints play the role of a filtering process: linguistic information does not consist in defining all and only the possible constructions, but in indicating, for a given construction, what kind of information can be extracted. This means that in some cases (at least theoretically), little (or no) information can be extracted from one domain. But even in this case, such utterances can be treated.

Acknowledgements

We would like to acknowledge the support from an International Macquarie University Research Scholarship (iMURS) for JPP, from the CNRS and from a Macquarie University Research Development Grant (MURDG).

References

[Blache01] Blache P. & J.-M. Balfourier (2001). "Property Grammars: a Flexible Constraint-Based Approach to Parsing", in Proceedings of IWPT-2001.
[Blache00] Blache P. (2000). "Constraints, Linguistic Theories and Natural Language Processing", in Natural Language Processing, D. Christodoulakis (ed.), Lecture Notes in Artificial Intelligence 1835, Springer-Verlag.
[Croft03] Croft W. & D. Cruse (2003). Cognitive Linguistics, Cambridge University Press.
[Fillmore98] Fillmore C. (1998). "Inversion and Constructional Inheritance", in Lexical and Constructional Aspects of Linguistic Explanation, Stanford University.
[Gibson00] Gibson T. (2000). "Dependency locality theory: a distance-based theory of linguistic complexity", in Marantz & al. (eds), Image, Language and Brain, MIT Press.
[Goldberg95] Goldberg A. (1995). Constructions: A Construction Grammar Approach to Argument Structure, Chicago University Press.
[Kay99] Kay P. & C. Fillmore (1999). "Grammatical Constructions and Linguistic Generalizations: the what's x doing y construction", Language.
[Keller03] Keller F. (2003). "A Probabilistic Parser as a Model of Global Processing Difficulty", in Proceedings of ACCSS-03.
[Langacker99] Langacker R. (1999). Grammar and Conceptualization, Walter de Gruyter.
[Mertens93] Mertens P. (1993). "Accentuation, intonation et morphosyntaxe", in Travaux de Linguistique 26.
[Pollard94] Pollard C. & I. Sag (1994). Head-driven Phrase Structure Grammars, CSLI, Chicago University Press.
[Prince93] Prince A. & P. Smolensky (1993). Optimality Theory: Constraint Interaction in Generative Grammars, Technical Report RUCCS TR-2, Rutgers Center for Cognitive Science.
[Pullum03] Pullum G. & B. Scholz (2003). Model-Theoretic Syntax Foundations – Linguistic Aspects, ESSLLI lecture notes, Vienna University of Technology.
[Sag99] Sag I. & T. Wasow (1999). Syntactic Theory: A Formal Introduction, CSLI.
[Sorace04] Sorace A. & F. Keller (2004). Gradience in Linguistic Data, to appear in Lingua.
[Vasishth03] Vasishth S. (2003). "Quantifying Processing Difficulty in Human Sentence Parsing", in Proceedings of Eurocogsci-2003.

Problems of Inducing Large Coverage Constraint-Based Dependency Grammar for Czech

Ondřej Bojar

Center for Computational Linguistics, MFF UK
Malostranské náměstí 25, CZ-118 00 Praha 1, Czech Republic
[email protected]

Abstract. This article describes an attempt to implement a constraint-based dependency grammar for Czech, a language with rich morphology and free word order, in the formalism Extensible Dependency Grammar (XDG). The grammar rules are automatically inferred from the Prague Dependency Treebank (PDT) and constrain dependency relations, modification frames and word order, including non-projectivity. Although these simple constraints are adequate from the linguistic point of view, their combination is still too weak and allows an exponential number of solutions for a sentence of n words.

1 Introduction

Czech is a thoroughly studied Slavonic language with extensive language data resources available. Traditionally, most of the research on Czech is performed within the framework of Functional Generative Description (FGD, [1]). This dependency-based formalism defines both a surface syntactic (analytic) and a deep syntactic (tectogrammatical, syntactico-semantic) level of language description. Language data sources for Czech include the Prague Dependency Treebank (PDT, [2, 3]) and the Czech valency lexicon (VALLEX, [4]). Specific properties of Slavonic languages, and of Czech in particular, make the task of syntactic analysis significantly more difficult than parsing English. Available parsers of Czech ([5], [6] and an adapted version of [7]) are statistical, aimed at surface syntactic analysis, and there is no simple way to extend them to include deep syntactic analysis. Up to now, no attempt has been made to approach large-coverage syntactic analysis with a constraint-based technology.

Extensible Dependency Grammar (XDG, [8]) is a promising relational framework aimed at multi-dimensional constraint-based analysis of languages. So far, only small-scale grammars have been implemented in XDG. These grammars illustrated efficient and elegant treatment of various complex syntactic and semantic phenomena in XDG [9, 10]. However, the grammars were always tailored to a few test sentences, and the constraints implemented in XDG never had to cope with the syntactic ambiguity of a grammar inferred from a larger amount of data. This paper describes a first experiment in inducing a large-coverage XDG grammar for Czech from PDT.1

1 A more detailed description is given in [11].

1.1 Properties of the Czech Language

Table 1 summarises some of the well-known properties of the Czech language.2 Czech is an inflective language with rich morphology and relatively free word order allowing non-projective constructions. However, there are important word order phenomena restricting this freedom. One of the most prominent examples are clitics, i.e. pronouns and particles that occupy a very specific position within the whole clause. The position of clitics is very rigid and global within the sentence. Locally rigid is the structure of (non-recursive) prepositional phrases or coordination. Other elements, such as the predicate, subject, objects or other modifications, may be nearly arbitrarily permuted. Such permutations correspond to the topic-focus articulation of the sentence. Formally, the topic-focus articulation is described at the deep syntactic level. Moreover, like other languages with relatively free word order, Czech allows non-projective constructions (crossing dependencies). Only about 2% of the edges in PDT are non-projective, but this is enough to make nearly a quarter (23.3%) of all the sentences non-projective. The task of parsing languages with relatively free word order is much more difficult than parsing English, for example, and new approaches still have to be sought. Rich morphology is a factor that makes parsing more time- and data-demanding.

                        Czech                    English
Morphology              rich                     limited
                        ≥ 4,000 tags             50 used
                        ≥ 1,400 actually seen
Word order              free with rigid          rigid
                        global phenomena
Known parsing results
  Edge accuracy         69.2–82.5%               91%
  Sentence correctness  15.0–30.9%               43%

Table 1. Properties of Czech compared to English.

1.2 Overview of the Intended Multi-dimensional Czech Dependency Grammar

Figure 1 summarises the data sources available for Czech grammar induction. PDT contains surface syntactic (analytic, AT) as well as deep syntactic (tectogrammatical, TG) sentence annotations. The Czech valency lexicon is under

2 Data by [5], [12], Zeman (http://ckl.mff.cuni.cz/~zeman/projekty/neproj), [13] and [14]. Consult [15] for measuring word order freeness.

development; alternatively, the valency lexicon collected while annotating the tectogrammatical level of PDT could be used.

A grammar in the formalism of XDG could be inferred from these sources addressing the immediate dominance (ID), linear precedence (LP) and predicate-argument (PA) dimensions.

Only a part of this overall picture has been implemented so far. First, the correspondence between the tectogrammatical and analytic levels is quite complicated: some nodes have to be deleted, others have to be added. Second, the tectogrammatical valency information from Vallex is mostly useful only if a tectogrammatical structure is considered; only then can the constraints addressing surface realization be fully exploited. Therefore, as a first approach, the current grammar implementation focuses only on the ID and LP dimensions.

[Figure 1: diagram of the available data sources. PDT's 127,000 analytic trees feed the induction of dependency constraints and requirements and the induction of ordering patterns; Vallex (1,000 verbs) and PDT's 55,000 tectogrammatical trees are also available. Vallex requires TG (PA-ID) and then constrains AT; the TG-AT correspondence is complex. Together, the ID, PA and LP dimensions form the XDG grammar.]

Fig. 1. Czech data sources available for XDG grammar.

2 Description of the Grammar Parts

The experimental XDG grammar induced from PDT utilizes basic principles that are linguistically motivated and traditionally used in many varieties of dependency grammars, including XDG. The current XDG grammar extracted from PDT consists of the following parts: ID Agreement, LP Direction, Simplified ID Frames and ID Look Right. For every part independently, the properties of individual lexical entries (with an arbitrary level of lexicalization) are collected from the training data. The contributions are then combined into XDG lexical entries and classes in a conjunctive manner: when parsing, every input word must match one of the observed configurations in each of the grammar parts.

31 For practical reasons (memory and time requirements), the grammar finally used in the XDG parser is restricted to the word forms of the test sentences only. Figure 2 summarizes the pipeline of grammar extraction and evaluation.

[Figure 2: pipeline diagram. From the training data, four extraction steps run in parallel: ID agreement, simplified ID frames, ID look right and LP direction, yielding a generic ID+LP grammar. Test data pass through optional morphological analysis; the generic grammar is restricted to cover only the tested word forms (while modelling the whole observed syntactic ambiguity of those forms) and is handed to the XDG parser.]

Fig. 2. XDG grammar parts and evaluation.

2.1 Grammar Non-lexicalized in General

XDG is designed as a lexicalized formalism: most syntactic information is expected to come from the lexicon. Accordingly, to make the most of this approach, the information in an XDG grammar should be as lexicalized as possible. Despite the size of PDT (1.5 million tokens), there is not enough data to collect syntactic information for individual word forms or even lemmas. All the grammar parts described below are therefore based on simplified morphological tags only (part and subpart of speech, case, number and gender). Table 2 justifies this simplification. Theoretically, full morphological tags could be used, but we would face a sparse data problem as soon as pairs (such as head-dependent pairs) or n-tuples of tags were examined.

After having observed           20,000    75,000 sentences
a new ... comes every:
  lemma (i.e. word)             1.6       1.8    test sent.
  full morphological tag        110       290    test sent.
  simplified tag                280       870    test sent.

Table 2. Lack of training data in PDT for full lexicalization.

2.2 ID Agreement

The ID Agreement part of the grammar licenses a specific edge type between a father and a daughter. The edge type is cross-checked in both directions: from the father and from the daughter.

Technically, the lexical entry of a father (with known morphological properties) contains a mapping from edge labels to morphological requirements on any possible daughter. If a daughter is connected via a particular edge label to this father, the daughter's morphology must match at least one of the requirements. Conversely, the daughter's lexical entry contains a mapping to restrict the morphology of the father.

This approach ensures grammatical agreement between the father and the daughter and also helps to reduce the morphological ambiguity of nodes: for every node, only those morphological analyses remain allowed which fit the intersection of the requirements of all the connected nodes. During parsing, the ambiguous morphology of the node is reduced step by step, as more and more edges are assigned.
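A minimal sketch of this mechanism (our reconstruction in Python; the tag strings and the dictionary layout are assumptions about the data, not the actual XDG encoding):

    def assign_edge(label, father_tags, daughter_tags, father_lex, daughter_lex):
        """Reduce both nodes' morphological ambiguity when an ID edge is assigned.

        father_tags / daughter_tags: sets of still-possible simplified tags.
        father_lex[label]: tags the father's entry allows for a daughter
        under this label; daughter_lex[label]: tags the daughter allows for
        its father."""
        new_daughter = daughter_tags & father_lex[label]  # father constrains daughter
        new_father = father_tags & daughter_lex[label]    # daughter constrains father
        if not new_daughter or not new_father:
            raise ValueError(f"edge {label!r} violates ID agreement")
        return new_father, new_daughter

    # A subject edge forces the finite-verb reading of the father and the
    # nominative reading of the daughter:
    father_lex = {"Sb": {"N-nom-sg-fem", "N-nom-sg-masc"}}
    daughter_lex = {"Sb": {"V-fin"}}
    print(assign_edge("Sb", {"V-fin", "V-inf"},
                      {"N-nom-sg-fem", "N-acc-sg-fem"},
                      father_lex, daughter_lex))
    # -> ({'V-fin'}, {'N-nom-sg-fem'})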

2.3 LP Direction

The LP Edge Direction part describes simplified linear precedence rules and handles non-projectivity. In the original design of XDG grammars, motivated by German, the LP dimension is used to describe topological fields [16]. Unfortunately, the word order of Czech and other Slavonic languages does not exhibit similar word order restrictions in general. (To a very limited extent, one could think of three fields in a clause: a preclitic, a clitic and a postclitic field.) However, there is often an important distinction between dependencies to the left and dependencies to the right. In this first attempt, the LP constraints of the grammar ensure only an acceptable direction (left/right) of an edge between a father and a daughter. The constraints do not model the acceptability of different mutual orderings of several daughters of a father.

Technically, the checking of edge direction is implemented by means of topological fields, but these are extremely simplified. Every father at the LP dimension offers three fields: the left and right fields of unlimited cardinality3 and the head field containing only the father itself. The left field is offered to all the daughters to the left, the head field is used for the father itself and the right field is offered to all the daughters to the right. There is no restriction on the mutual ordering of the left or right daughters; the only thing ensured is that every left daughter must precede the father and every right daughter must follow the father.

The LP edge direction is coupled with the ID label of the corresponding ID edge. Given a daughter connected to the father with an ID edge of a particular label, the corresponding LP edge is in certain cases allowed to have only the label

3 In other words, an unlimited number of outgoing LP edges can have the label left, and all edges labelled left must come first in the left-to-right ordering of nodes.

left, in other cases only the label right, and sometimes both labels (i.e. both directions) are allowed. As illustrated in Figure 3, under the preposition about, an (ID) edge labelled ATR can go to the right only, so the corresponding LP edge must have the label right. On the other hand, adverbs can be found both before and after the governing verb, and therefore the verb was accepts outgoing (ID) edges labelled ADV in both the left and right fields.

[Figure 3: two trees for sentence #33, "O dálnici již byla řeč" ("About the highway there already was a talk"). The upper (ID) tree carries the edge labels PRED, SB, ADV, AUXP and ATR; the AUXP edge from řeč ("talk") to O ("about") is non-projective. The lower (LP) tree uses the labels left and right, with the climbed edge relabelled AUXP-climbed-1 and attached to byla ("was"), making the tree projective.]

Fig. 3. LP dimension to handle edge direction and non-projectivity.

An intuitive approach to handling non-projectivities in Czech is to require projective analyses in general but to allow non-projective edges in specific observed cases.4 My XDG grammar expresses this requirement in the LP tree only; the ID tree is allowed to be non-projective in general. The LP tree is required to be projective and the exceptions are handled by the so-called climbing principle. In order to obtain a projective LP tree from a non-projective one, the tree is "flattened" by climbing. For example, the AUXP edge is non-projective in the ID tree in Figure 3. Moving the corresponding LP edge one step up, from the governor talk to the governor was, makes the LP edge projective.

To distinguish LP edges that had to climb from LP edges directly corresponding to ID edges, a set of extra LP labels is introduced: AUXP-climbed-1, ATR-climbed-1, etc. These additional LP labels encode the ID label as well, because the

4 Consult [17] for a more advanced approach to restricting non-projectivity.

syntactic role of the daughter is important with respect to allowing or denying the non-projective realization. The nodes where a climbed edge may land (such as the word was in Figure 3) offer not just the left, head and right fields, but also the required number of specific X-climbed-Y fields. There is no restriction on the mutual linear ordering of the left/right and *-climbed-* edges.5

This way, sentences are analyzed projectively in general, but specific known non-projectivities (based on the simplified morphological tags of the father and the daughter and the ID label of the non-projective edge) are modelled, too.

The current model still lacks the restrictive power to control the clitic position. Similarly, coordination is not modelled properly yet, because the cardinality of the left and right fields is unrestricted in general (for example, both members of a coordination are allowed to appear on the same side of the conjunction). More adequate handling of these phenomena remains open for further research.
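To make the climbing principle concrete, the step itself can be sketched as follows (our reconstruction, not the XDG parser's code; the tree encoding and function names are ours):

    def is_projective(head, dep, parent, order):
        """True if every word strictly between head and dep is dominated by head."""
        lo, hi = sorted((order[head], order[dep]))
        by_pos = {p: n for n, p in order.items()}
        for p in range(lo + 1, hi):
            node = by_pos[p]
            while node is not None and node != head:
                node = parent.get(node)
            if node is None:
                return False
        return True

    def climb(dep, id_label, parent, order):
        """LP head and label for dep: climb until the LP edge is projective."""
        head, steps = parent[dep], 0
        while not is_projective(head, dep, parent, order):
            head, steps = parent[head], steps + 1
        return head, id_label if steps == 0 else f"{id_label}-climbed-{steps}"

    # The sentence of Figure 3: the non-projective AUXP edge from "řeč"
    # to "O" climbs one step to "byla" and is relabelled AUXP-climbed-1.
    order = {"O": 0, "dálnici": 1, "již": 2, "byla": 3, "řeč": 4}
    parent = {"byla": None, "řeč": "byla", "již": "byla",
              "O": "řeč", "dálnici": "O"}
    print(climb("O", "AUXP", parent, order))   # -> ('byla', 'AUXP-climbed-1')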

2.4 Simplified ID Frames

One of the crucial principles restricting available sentence analyses in XDG is the valency principle: every father node allows only specific combinations and cardinalities of outgoing (ID) edges.

The Simplified ID Valency Frames ensure that a word does not accept implausible combinations of modifiers. Only rarely do they ensure that a word has all its "modification requirements" saturated, because most of the modifiers are deletable anyway. Current approaches6 aim at distinguishing complements vs. adjuncts, i.e. modifications that are typically required vs. optional. However, this distinction is of little use once the deletability of modifications is taken into account (in real Czech sentences, complements are often omitted). Any consistent grammar must reflect this optionality of complements. The restrictive power of valency frames in XDG should therefore come from the interdependencies of modifications (e.g. if a secondary object or a specific type of adjunct was observed, a primary object must be present). The set of allowed combinations and cardinalities must be explicitly enumerated in the current XDG implementation. Future versions of this principle might accept a constraint network (for example a set of implications) of interdependencies.

To my knowledge, no published approach so far aims at discovering such interdependencies of particular modifications. On the other hand, there are too

5 This is due to technical limitations of the XDG parser: currently it is not possible to impose a partial ordering on topological fields, only a linear ordering is supported. Either the *-climbed-* fields are not mentioned in the ordering at all, or one must specify a full linear ordering among them. It is not possible to express only that all *-climbed-* fields precede the head field without specifying the ordering among them.

6 See [18] for comparison and references.

many unique frames observed under a given node type, so it is impossible to enumerate all of them.7 Therefore, I implemented a naive algorithm to infer simplified modification frames: the algorithm automatically simplifies the treatment of adjuncts and stores the complexity of the interdependencies of other modifications by enumerating them.

As sketched in Figure 4, the set of observed modification frames of a specific word class can be simplified by removing different modification types. When an adverbial is removed under a verb, the set of modification frames shrinks to half its size. When the subject is removed instead, the set does not shrink at all. This indicates that an adverbial has no effect on the interdependencies of other modifications: an adverbial may be present or not; half of the frames were observed with an adverbial, half of the frames had no adverbial.

[Figure 4: example. Observed under a verb: 4 unique frames. Removing SB leaves 4 unique frames; removing ADV leaves 2 unique frames. ⇒ ADV is more optional than SB.]

Fig. 4. Identifying optional modifications in order to simplify the set of allowed modification frames.

This simplification is applied iteratively, until the number of unique frames is acceptable. The removed modifications are added back to all the frames as optional. A short example in Figure 5 illustrates the optionality order of modifications observed under infinite verbs (POS=V, SUBPOS=f). In a sample of 2,500 sentences, there were 727 occurrences of infinite verbs. Disregarding the mutual order of the modifications of the verbs but respecting the number of modifications of a particular kind (i.e. representing the modification frame as a multiset, a bag

7 Enumerating all seen modification frames would face a severe sparse data problem anyway, as the number of unique modification frames steadily grows. In 81,000 sentences, there were 89,000 unique frames observed when describing the frames as lists of the simplified tags of all the daughters of a node.

of modification labels), there were 132 unique frames observed. The order of optionality of the different modification types is estimated by the described algorithm. The most optional modification (AUXP8, a prepositional phrase) is torn off in the first step, reducing the set size from 132 to 95. Equally optional is an adverbial (ADV), because tearing off the adverbial alone would also lead to a set size of 95. In the cumulative process, the set size after removing AUXP and ADV is 64. The third most optional modification is an object (OBJ), so we arrive at 46 core frames plus three modifications marked as optional. The resulting frames are shown in Figure 5, too. Each resulting frame contains some fixed members, strictly required at least once, plus the optional modifications with cardinality 0 up to the highest observed cardinality of that particular modification. For instance, the first resulting frame {AUXP(0-3), ADV(0-3), OBJ(0-2)} contains no fixed members at all; AUXP and ADV are allowed at most 3 times and OBJ at most twice. Finally, the XDG grammar requires every finite verb in a sentence to satisfy at least one of the allowed modification frames, i.e. to have exactly the allowed number of outgoing ID edges of a particular kind, as the frame prescribes.

Unique observed modification frames: 132
Set sizes when removing specific modifiers:
AUXP(95), ADV(95), OBJ(96), AUXC(113), AUXV(119), AUXT(119), AUXX(122), COORD(123), SB(126), AUXZ(126), AUXR(126), ATVV(127), PNOM(128), AUXG(128), AUXY(129), APOS(129), EXD(130), COORD PA(131), ATR(131), PRED PA(132), EXD PA(132)
Cumulative simplification: 132 →(AUXP)→ 95 →(ADV)→ 64 →(OBJ)→ 46.
Resulting frames:
{AUXP(0-3), ADV(0-3), OBJ(0-2)}
{APOS(1), EXD(1), AUXP(0-3), ADV(0-3), OBJ(0-2)}
{APOS(1), AUXP(0-3), ADV(0-3), OBJ(0-2)}
{ATR(2), COORD(1), AUXP(0-3), ADV(0-3), OBJ(0-2)}
{ATVV(1), AUXC(1), AUXP(0-3), ADV(0-3), OBJ(0-2)}
{ATVV(1), AUXT(1), AUXP(0-3), ADV(0-3), OBJ(0-2)}
(... 46 resulting frames altogether)

Fig. 5. Simplifying modifications of infinite verbs.
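The naive simplification algorithm can be sketched as follows (our reconstruction in Python; the greedy tie-breaking and the data layout are assumptions):

    from collections import Counter

    def as_frame(*labels):
        """A modification frame as a multiset, frozen for hashability."""
        return frozenset(Counter(labels).items())

    def simplify_frames(frames, target_size):
        """Iteratively tear off the most optional modification label, i.e.
        the one whose removal shrinks the set of unique frames the most,
        until at most target_size core frames remain. Returns the core
        frames and the labels made optional, in removal order."""
        def drop(frame, label):
            return frozenset((l, c) for l, c in frame if l != label)

        frames, optional = set(frames), []
        while len(frames) > target_size:
            labels = {l for f in frames for l, _ in f}
            if not labels:
                break
            best = min(labels, key=lambda l: len({drop(f, l) for f in frames}))
            frames = {drop(f, best) for f in frames}
            optional.append(best)
        return frames, optional

    # The situation of Figure 4: removing ADV halves the frame set while
    # removing any other label does not shrink it, so ADV is torn off first.
    observed = {as_frame("SB", "OBJ", "ADV"), as_frame("SB", "OBJ"),
                as_frame("SB", "PNOM", "ADV"), as_frame("SB", "PNOM")}
    print(simplify_frames(observed, 2))   # optional == ['ADV']

The full procedure additionally re-attaches each removed label to every core frame as optional, with cardinality 0 up to its highest observed count, as in the frames listed above.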

It should be noted that the described solution is by no means a final one. The tasks of inducing modification frames and employing the frames to constrain syntactic analysis are very complex and deserve much deeper research.

8 See [2] for explanation of the labels.

2.5 ID Look Right

The generally accepted idea in dependency analysis is that head-daughter dependencies model syntactic analysis best. [19] doubt this assumption and document that for German, sister-sister dependencies (in the lexicalized case) are more informative.

Context used   Head   Neighbours       Sisters
                      Left    Right    Left    Right
Entropy        0.65   1.20    1.08     1.14    1.15

Table 3. Difficulty of predicting an edge label based on the simplified tag of a node and of a node from the close context.

Table 3 gives an indication for Czech: once the structure has been assigned, choosing the edge label is easiest when looking at the morphological properties of the node and its head (lowest entropy). Contrary to Dubey and Keller's findings, Czech, with its very strong tendency towards grammatical agreement, confirms the generally accepted view. The ID Agreement principle is crucial in Czech and it is already employed in the grammar.

Table 3 also indicates which context gives the second best hint: the right neighbour, i.e. the following word. Therefore, a new principle was added, ID Look Right: an incoming ID edge to a word must be allowed by the word class of its right neighbour. The differences among the sisters' and neighbours' contributions to the prediction of the edge label are not very significant, so adding more constraints of this kind is still under consideration.
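Entropy figures of this kind can be reproduced from treebank counts with a few lines (our sketch; the sampling of context triples is an assumption):

    import math
    from collections import Counter

    def label_entropy(samples):
        """H(label | node_tag, context_tag) over (node, context, label)
        triples, e.g. with the context tag taken from the head or from the
        right neighbour. Lower values mean the context is more predictive."""
        pairs = Counter((n, c) for n, c, _ in samples)
        triples = Counter(samples)
        total = sum(pairs.values())
        return -sum((cnt / total) * math.log2(cnt / pairs[(n, c)])
                    for (n, c, _), cnt in triples.items())

On such a computation, the head column's 0.65 in Table 3 being the lowest value matches the discussion above.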

3 Results

To evaluate the grammar, only the first fixed point in constraint solving is searched. Given a sentence, the XDG parser propagates all relevant and applicable constraints to reduce the number of analyses and returns an underspecified solution: some nodes may have unambiguously found a governor, while for other nodes several structural assignments may still remain applicable. At the first fixed point, none of the constraints can be used to tell anything more.9

9 At fixed points, also called choice points, the constraint solver of the underlying system Mozart-Oz makes an arbitrary decision for one of the still underspecified variables and starts propagating constraints again. Other fixed points are reached and eventually a fully specified solution can be printed. Different solutions are obtained by making different decisions at the fixed points. The parser can be instructed to perform a complete search, but in our case there is no point in enumerating so many available solutions.

Two grammars were evaluated: first a version without the Look Right principle, second a version that also included the new principle. The grammars were trained on sentences from the training part of PDT and evaluated on 1,800 to 2,000 unseen sentences from the standard evaluation part of PDT (devtest). The results are displayed in Table 4. Note that the number of training sentences was relatively low (around 2 to 5% of PDT), which explains the relatively high number of unsolved sentences (around 10 to 20%). A wider coverage of the grammar is achieved by training on more data, but this immediately leads to a significant growth in the number of available solutions. This scalability problem can be solved only by providing the grammar with more constraints of various kinds.

As indicated in the row Avg. ambiguity/node, a node has 8 to 9 possible governors (regardless of the edge label). Compared with the average sentence length of 17.3 words, the grammar reduces the theoretically possible number of structural configurations to a half. At the first fixed point, the parser has enough information to establish only 3 to 5% of the edges; an edge with a label can be assigned to only 2 to 4% of the nodes. Of the assigned structural edges, around 82% are correct; of the assigned labelled edges, around 85% are correct. Based on other experiments, training on more data leads to a lower error rate but fewer securely established edges.

Contrary to our expectations, adding the new Look Right principle did not help the analysis. The average ambiguity per node became even higher. There were slightly more edges securely assigned, but the correctness of this assignment dropped. One possible explanation comes from the word order freedom of Czech: the Look Right principle probably helps to establish rigid structures such as dates but leads to wrong decisions in general, because it faces a serious sparse data problem. A deeper analysis is necessary to confirm this explanation.

4 Discussion and Further Research

The presented results indicate several weak points in the described approach to constraint-based dependency parsing. All these points remain open for further research.

First, the current grammar relies on very few types of constraints. More constraints of different kinds have to be added to achieve both a better scalability of the grammar and a more effective propagation of the constraints.10 The current grammar especially lacks constraints that bind information together; the current constraints are too independent of each other to achieve strong propagation. A related problem is the locality of the constraints. All the current constraints rely on too local a context. There are too many analyses available, because the local constraints are not powerful enough to check invariant properties of clauses or sentences as a whole.

10 As [20] observed for English, purely syntactic constraints are too weak to analyse Czech. The deep syntactic level of PDT and the Czech valency lexicon provide a promising source of additional constraints.

Training sentences                                     2,500    5,000
Unsolved sentences (%)        Without Look Right       21.1     11.9
                              With Look Right          25.6     15.4
Avg. ambiguity/node           Without Look Right       8.09     8.91
                              With Look Right          8.17     9.05
Assigned structural edges (%) Without Look Right       4.4      3.3
                              With Look Right          4.7      3.5
Correct structural edges (%)  Without Look Right       82.3     82.5
                              With Look Right          81.9     81.0
Assigned labelled edges (%)   Without Look Right       3.4      2.3
                              With Look Right          3.6      2.5
Correctly labelled edges (%)  Without Look Right       85.9     85.9
                              With Look Right          85.0     83.5

Table 4. Results of underspecified solutions.

Second, there are several kinds of expressions that in fact have no dependency structure, such as names, dates and other multi-word expressions. Coordination should be handled specifically, too. The "dependency" analysis of such expressions in PDT reflects the annotation guidelines more than any linguistic motivation. Separate treatment of these expressions by means of a sub-grammar would definitely improve the overall accuracy. This expectation comes from my analysis of the sources of structural ambiguity modelled by the grammar: given the set of all trees assigned by the grammar to a string of words, punctuation symbols, cardinals, adverbs and conjunctions (in this order) are the parts of speech with the most different governors.

Third, the tested version of the XDG parser could not make any use of the frequency information contained in PDT.11 [21] attempt to guide the XDG parser by frequency information, which should help to find a plausible solution sooner, but the research is still in progress.12

11 In an experiment, frequency information was used as a threshold to ignore rare edge assignments. The thresholding resulted in lower coverage and lower precision.

12 A similar constraint-based dependency parsing approach by [22] inherently includes weights of constraints, but no directly comparable results have been published so far. [23] report an edge accuracy of 96.63% on a corpus of 200 sentences with an average length of 8.8 words, significantly shorter than in our data.

5 Conclusion

I described an experiment with constraint-based dependency parsing of a language with rich morphology and relatively free word order. Although the constraints are linguistically adequate and serve well when employed on small-scale corpora, they face a serious problem when trained on large data sets: the constraints are too local and too weak to restrict the number of available solutions.

6 Acknowledgement

I’m grateful to Ralph Debusmann for his explanatory and immediate implementation support of new features needed in the XDG parsing system for this experiment. The work could not have been performed without the support of Programming Systems Lab headed by Gert Smolka (Universitat¨ des Saarlandes) and without the insightful guid- ance by Geert-Jan Kruijff and Denys Duchier. This work has been partially supported by the Ministry of Education of the Czech Republic, project LN00A063.

References

1. Sgall, P., Hajičová, E., Panevová, J.: The Meaning of the Sentence and Its Semantic and Pragmatic Aspects. Academia/Reidel Publishing Company, Prague, Czech Republic/Dordrecht, Netherlands (1986)
2. Hajič, J., Panevová, J., Buráňová, E., Urešová, Z., Bémová, A.: A Manual for Analytic Layer Tagging of the Prague Dependency Treebank. Technical Report TR-2001-, ÚFAL MFF UK, Prague, Czech Republic (2001) English translation of the original Czech version.
3. Hajičová, E., Panevová, J., Sgall, P.: A Manual for Tectogrammatic Tagging of the Prague Dependency Treebank. Technical Report TR-2000-09, ÚFAL MFF UK, Prague, Czech Republic (2000) In Czech.
4. Žabokrtský, Z., Benešová, V., Lopatková, M., Skwarská, K.: Tektogramaticky anotovaný valenční slovník českých sloves. Technical Report TR-2002-15, ÚFAL/CKL, Prague, Czech Republic (2002)
5. Collins, M., Hajič, J., Brill, E., Ramshaw, L., Tillmann, C.: A Statistical Parser of Czech. In: Proceedings of the 37th ACL Conference, University of Maryland, College Park, USA (1999) 505–512
6. Zeman, D.: Can Subcategorization Help a Statistical Parser? In: Proceedings of the 19th International Conference on Computational Linguistics (Coling 2002), Taipei, Taiwan, Zhongyang Yanjiuyuan (Academia Sinica) (2002)
7. Charniak, E.: A maximum-entropy-inspired parser. In: Proceedings of the 1st Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL-2000), Seattle, Washington, USA (2000) 132–139 http://www.cs.brown.edu/people/ec/papers/.
8. Debusmann, R., Duchier, D., Koller, A., Kuhlmann, M., Smolka, G., Thater, S.: A relational syntax-semantics interface based on dependency grammar. In: Proceedings of COLING 2004, Geneva, Switzerland (2004)

9. Duchier, D., Debusmann, R.: Topological dependency trees: A constraint-based account of linear precedence. In: 39th Annual Meeting of the Association for Computational Linguistics (ACL 2001). (2001)
10. Debusmann, R., Duchier, D.: A meta-grammatical framework for dependency grammar (2003)
11. Bojar, O.: Czech Syntactic Analysis Constraint-Based, XDG: One Possible Start. Prague Bulletin of Mathematical Linguistics (2004)
12. Holan, T.: K syntaktické analýze českých(!) vět. In: MIS 2003, MATFYZPRESS (2003)
13. Yamada, H., Matsumoto, Y.: Statistical dependency analysis with support vector machines. In: Proceedings of the International Workshop on Parsing Technologies (IWPT 2003), Nancy, France (2003)
14. Bojar, O.: Towards Automatic Extraction of Verb Frames. Prague Bulletin of Mathematical Linguistics (2003) 101–120
15. Kruijff, G.J.M.: 3-phase grammar learning. In: Proceedings of the Workshop on Ideas and Strategies for Multilingual Grammar Development. (2003)
16. Bech, G.: Studien über das deutsche Verbum infinitum. (1955) 2nd unrevised edition published 1983 by Max Niemeyer Verlag, Tübingen (Linguistische Arbeiten 139).
17. Holan, T., Kuboň, V., Oliva, K., Plátek, M.: Two Useful Measures of Word Order Complexity. In Polguère, A., Kahane, S., eds.: Proceedings of the Coling '98 Workshop: Processing of Dependency-Based Grammars, Montreal, University of Montreal (1998)
18. Sarkar, A., Zeman, D.: Automatic Extraction of Subcategorization Frames for Czech. In: Proceedings of the 18th International Conference on Computational Linguistics (Coling 2000), Saarbrücken, Germany, Universität des Saarlandes (2000)
19. Dubey, A., Keller, F.: Probabilistic parsing for German using sister-head dependencies. In: Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics, Sapporo (2003) 96–103
20. Harper, M.P., Helzerman, R.A., Zoltowski, C.B., Yeo, B.L., Chan, Y., Stewart, T., Pellom, B.L.: Implementation Issues in the Development of the PARSEC Parser. SOFTWARE – Practice and Experience 25 (1995) 831–862
21. Dienes, P., Koller, A., Kuhlmann, M.: Statistical a-star dependency parsing. In Duchier, D., ed.: Prospects and Advances of the Syntax/Semantics Interface, Nancy (2003) 85–89
22. Heinecke, J., Kunze, J., Menzel, W., Schröder, I.: Eliminative parsing with graded constraints. In: Proceedings of the COLING-ACL Conference, Montreal, Canada (1998)
23. Foth, K., Menzel, W., Schröder, I.: Robust parsing with weighted constraints. Natural Language Engineering (2004) in press.

Metagrammar Redux

Benoit Crabbé and Denys Duchier

LORIA, Nancy, France

Abstract. In this paper we introduce a general framework for describing the lexicon of a lexicalised grammar by means of elementary descriptive fragments. The system described hereafter consists of two main components: a control device that determines how fragments are to be combined together into meaningful lexical descriptions, and a composition system that resolves how elementary descriptions are to be combined.

1 Introduction

This paper is concerned with the design of large-scale grammars for natural language. It presents an alternative language of grammatical representation to the classical languages used for this purpose, such as PATR II. The need for a new language is motivated by the development of strongly lexicalised grammars based on tree structures rather than feature structures, and by the observation that, for tree-based formalisms, lexical management with lexical rules raises non-trivial practical issues [1].

In this paper we revisit a framework – the metagrammar – designed in particular for the lexical representation of tree-based syntactic systems. It is articulated around two central ideas: (1) a core grammar is described by elementary tree fragments, and (2) these fragments are combined by means of a control language to produce an expanded grammar. Throughout the paper, we illustrate the features of the framework using Tree Adjoining Grammar (TAG) [2] as a target formalism.

The paper is structured as follows. First (Section 2) we introduce the key ideas underlying grammatical representation, taking PATR II as an illustration. We then provide the motivations underlying the design of our grammatical representation framework. The core metagrammatical intuition, lexical representation by manipulating fragments made of tree descriptions, is presented in Section 3. The motivations concerning the setup of an appropriate tree representation language are provided in Section 4. The fragment manipulation language is then developed in Section 5. Section 6 introduces further questions concerning global conditions on model admissibility. Finally, the computational treatment of our description language is detailed in Section 7.

2 Lexical organisation

In this section we introduce the issue and the main ideas concerning the lexical organisation of tree-based syntactic systems. We begin by investigating the core

ideas developed in PATR II; we then highlight the inadequacies of PATR II for representing the lexicon of tree-based syntactic systems such as Tree Adjoining Grammar.

An historical overview: PATR II Since the very first works in computational linguistics [3], lexical description roughly consists of specifying lexical entries together with a subcategorisation frame, as in PATR II:

love :
  <cat> = v
  <subject cat> = np
  <object cat> = np

where we specify that the verb love takes two arguments: a subject noun phrase and an object noun phrase. This lexical entry, together with an appropriate grammar, is used to constrain the set of sentences in which love may be inserted. For instance, this lexical entry is meant to express that love is used transitively, as in John loves Mary, but not intransitively, as in John loves, nor with a prepositional object, as in John loves to Mary. PATR II offers two devices to facilitate lexical description: templates and lexical rules. Templates are described by [3] as macros, and make it easy to state that love and write are transitive verbs by writing:

love : transitiveVerb
write : transitiveVerb

transitiveVerb :
  <cat> = v
  <subject cat> = np
  <object cat> = np

where transitiveVerb is a macro called in the descriptions of love and write. On the other hand, lexical rules are used to describe multiple variants of verbs. For instance, to express that a transitive verb such as love may be used in its active or passive variant, we may add the following lexical rule to our lexicon:

passive :
  <out cat> = <in cat>
  <out subject cat> = <in object cat>
  <out object cat> = pp

This rule says that a new lexical entry out is to be built from an initial lexical entry in, where the category of out is identical to the category of in, the category of the object becomes the category of the subject, and the subject category now becomes prepositional phrase. Lexical rules are meant to allow a dynamic expansion of related lexical variants. So for the verb love, the application of the passive lexical rule to its base entry generates a new, derived, passive lexical entry, meaning that both active and passive variants are licensed by the lexical entries.
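To illustrate the dynamic expansion, here is a small Python sketch (ours; PATR II is a unification formalism, reduced here to nested dictionaries) applying the passive rule to the base entry of love:

    # Base entry for "love", mirroring the PATR II description above.
    base_love = {"cat": "v",
                 "subject": {"cat": "np"},
                 "object": {"cat": "np"}}

    def passive(entry):
        """Derived entry: the object's category moves to the subject slot
        and the demoted argument is realized as a prepositional phrase."""
        return {"cat": entry["cat"],
                "subject": dict(entry["object"]),
                "object": {"cat": "pp"}}

    # The lexicon licenses both variants after rule application.
    lexicon = {"love": [base_love, passive(base_love)]}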

Variants and improvements of this classical system have been (and still are) used for describing the lexicon in various syntactic frameworks such as LFG [4] or HPSG [5]. Whatever the differences, two leading ideas remain to this day: lexical description aims both at factorising information (templates) and at expressing relationships between variants of the same lexical unit (lexical rules).

Tree Adjoining Grammar: a case study Tree Adjoining Grammar (TAG)1 is a tree composition system aimed at describing natural language syntax [2] which is strongly lexicalised. In other words, a tree adjoining grammar consists of a lexicon, the elementary trees, each of them associated with a lexical unit, and two operations used for combining the lexical units: adjunction and substitution. Following the conventions used in TAG implementations such as XTAG [6], we work with tree schemata (or templates) such as these2:

[Three elementary tree schemata, labelled (1): a canonical transitive tree for "Jean voit Marie" (John sees Mary), a wh-extraction tree for "Quelle fille Jean voit" (Which girl John sees), and a relative tree for "(Jean) qui voit Marie" ((John) who sees Mary).]

(1)

where the appropriate lexical word is inserted dynamically by the parser as a child of the anchor node. The nodes depicted with an arrow (↓) are the substitution nodes and those depicted with a star (∗) are the foot nodes.

Strikingly, works concerning the lexical organisation of strongly lexicalised syntactic systems often try to provide alternative solutions to that of Shieber. The main reason is that the number and the variety of lexical units is much greater, and therefore the number of templates and lexical rules to be used increases strongly. In the context of the development of large-sized grammars, this situation requires the grammar writer to design complicated ordering schemes, as illustrated by [1]. To overcome this, we take up an idea first introduced in [8] for Construction Grammar. Roughly speaking, they describe the lexicon using a dynamic process: given a manually described core lexicon, they build up an expanded lexicon by combining elementary fragments of information.

Besides strong lexicalisation, setting up a system representing a TAG lexicon raises another problem, that of the structures used. In Construction Grammar, [8] combine elementary fragments of information via feature structure unification. When working with TAG, however, one works with trees.

1 Strictly speaking, we mean here Lexicalised Tree Adjoining Grammar (LTAG). Indeed, the system is usually used in its lexicalised version [6].

2 The trees depicted in this paper are motivated by the French grammar of [7], who provides linguistic justifications, in particular for the non-utilisation of the VP category and the use of N at multiple bar levels instead of introducing the category NP in French.

45 3 Introduction to the framework

In this section we sketch the idea of describing the lexicon by controlling combinations of elementary fragment descriptions.

This idea stems from the following observation: the design of a TAG grammar consists of describing trees made of elementary pieces of information (hereafter: fragments). For instance, the following tree is defined by combining a subtree representing a subject, another subtree representing an object, and finally a subtree representing the spine of the verbal tree:

[Diagram: the fragments CanonicalSubject (S over N↓ and V, "Jean ... voit"), ActiveVerb (S over V, "voit") and CanonicalObject (S over V and N↓, "voit ... Marie") combine into the transitive tree S over N↓ V N↓ for "Jean voit Marie" (John sees Mary).]

Of course, we also want convenient means of expressing variants of the above tree; for example, one where the subject, instead of being realized in canonical position, is realized as a questioned subject (wh) or a relative subject.

More generally, while designing a grammar one wants to express general statements for describing sets of trees: for instance, a transitive verb is made of a subject, an object and an active verbal spine. In short, we would like to write something like:

TransitiveVerb = Subject ∧ ActiveVerb ∧ Object

where Subject and Object are shortcuts for describing sets of variants:

Subject = CanonicalSubject ∨ RelativeSubject
Object = CanonicalObject ∨ WhObject

and where CanonicalSubject, WhObject, etc. are defined as the actual fragments of the grammar:

[Fragment definitions: CanonicalSubject = S over N↓ V; CanonicalObject = S over V N↓; ActiveVerb = S over V; RelativeSubject = N over N∗ S, with N↓ V below the S; WhObject = S over N↓ S, with V below the lower S.]

Given the above definitions, a description such as TransitiveVerb is intended to describe the tree schemata depicted in (1)3. That is, each variant

3 The combination of a relative subject and a questioned object is rejected by the principle of extraction unicity (see Section 6).

description of the subject embedded in the Subject clause is combined with each variant description of the object in the Object clause and with the description in the ActiveVerb clause.

As it stands, the representation system we have introduced so far requires setting up two components: first we investigate which language to use for describing tree fragments and combining them (Section 4); second we detail the language which controls how fragments are to be combined (Section 5).
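To make the combinatorics concrete, here is a minimal Python sketch — our own illustration, not part of the framework — in which plain strings stand in for tree fragments; each choice of one variant per clause yields one candidate combination (the principles of Section 6 would later filter some of these out):

from itertools import product

# Hypothetical fragment names standing in for tree descriptions.
SUBJECT = ["CanonicalSubject", "RelativeSubject"]
ACTIVE_VERB = ["ActiveVerb"]
OBJECT = ["CanonicalObject", "WhObject"]

# TransitiveVerb = Subject /\ ActiveVerb /\ Object: every way of picking
# one variant per clause yields one candidate tree description.
for combination in product(SUBJECT, ACTIVE_VERB, OBJECT):
    print(" + ".join(combination))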

4 A language for describing tree fragments

In this section, we consider two questions: (1) how to conveniently describe tree fragments, (2) how to flexibly constrain how such tree fragments may be combined to form larger syntactic units. We first introduce a language of tree descriptions, and then show how it can be generalized to a family of formal languages parametrized by an arbitrary constraining decoration system that further limits how elements can be combined.

The base language L. Let x, y, z, . . . be node variables. We write ◁ for immediate dominance, ◁∗ for its reflexive transitive closure (dominance), ≺ for immediate precedence (or adjacency) and ≺+ for its transitive closure (strict precedence). We let ℓ range over a set of node labels, generally intended to capture the notion of categories. A tree description φ has the following abstract syntax:

φ ::= x ◁ y | x ◁∗ y | x ≺ y | x ≺+ y | x : ℓ | φ ∧ φ   (2)

L-descriptions are, as expected, interpreted over first-order structures of finite, ordered, constructor trees. As usual, we limit our attention to minimal models. Throughout the paper we use an intuitive graphical notation for representing tree descriptions. Though this notation is not sufficient to represent every expression of the language, it nonetheless generally suffices for the kind of trees typically used in natural language syntax. Thus, the description (D0) on the left is graphically represented by the tree notation on the right:

D0 = x ◁∗ w ∧ x ◁ y ∧ x ◁ z ∧ y ≺ z ∧ z ≺+ w ∧ x : X ∧ y : Y ∧ z : Z ∧ w : W   (3)

[Tree notation for (D0): X immediately dominates adjacent Y and Z; a dashed (dominance) line leads from X down to W, with Z ≺+ W.]

where immediate dominance is represented by a solid line, dominance by a dashed line, precedence by the symbol ≺+ and adjacency is left unmarked.

A parametric family of languages. It is possible to more flexibly control how tree fragments may be combined by adding annotations to nodes together with stipulations for how these annotations restrict admissible models and interpretations. In this manner, we arrive at the idea of a family of languages L(C) parametrized by a combination schema C. In the remainder of this section we discuss three instantiations of L(C) that have been used for describing the lexicon of Tree Adjoining Grammars. The first

one, L(∅), is used by Xia [9]; the second one, L(names), is used by Candito [10]. We show that neither L(∅) nor L(names) is appropriate for describing the lexicon of a French TAG grammar. We then introduce L(colors), which we have used successfully for that purpose.

Language L(∅). This first instantiation of L(C) is used by F. Xia [9]. This language does not use any combination constraint; the combination schema C is thus empty. Equipped with such a language we can independently describe fragments such as these4:

[Diagrams (4): D0, a relative construction (S over NP↓ S), and D1, a transitive construction (S over NP↓ VP, with VP over V NP).]

where D0 describes a relative NP and D1 a transitive construction. Their combination leads to the following two models:

[Diagrams (5): M0 and M1, the two models obtained by combining D0 and D1.]

However, this language faces an expressivity limit since, for the purpose of lexical organisation, linguists want to constrain combinations more precisely. For instance, in the French grammar the following fragment composition is handled badly:

[Diagrams (6): D0, a cleft construction — S over V, N↓ and an embedded S over Cl↓ V C V, anchored by "qui" ("C'est Jean qui mange" / "That is John who eats") — and D1, a fragment S over V with the constraint V ≺+ N↓ ("... la pomme" / "... the apple").]

4 These fragments and the related models are those used by F. Xia in the context of the XTAG English Grammar.

Combining these fragments yields, among others, the following results:

[Models (7): M0, M1 and M2, the three trees obtained by combining D0 with D1, differing in where the N↓ required by D1 is placed.]

where only M0 is normally deemed linguistically valid.

Language L(names). In her thesis, M.-H. Candito [10] introduces an instance of L(C) that constrains combinations to avoid cases such as the one outlined above. The combination schema C is as follows: (1) there is a finite set of names, and each node of a tree description is associated with such a name; (2) two nodes sharing the same name are interpreted as denoting the same entity, hence when merging descriptions, only nodes with the same name are merged. In other words, a model is valid if (1) every node has exactly one name and (2) there is at most one node with a given name5.

[Diagrams (8): the cleft fragments of (6) with named nodes — D0 has S named extr over V named vbar, N↓ named arg-subj and S named m, the latter over Cl↓ named ce-Cl, V named cleft-V, C named comp (anchoring "qui", named complex) and V named anchor; D1 has S named m over V named anchor, with Vanchor ≺+ N↓ named arg-obj.]

The model resulting from merging D0 with D1 is only M0 as depicted in (7). In such a case, L(names) corrects the shortcomings of L(∅). However, during the development of a non-trivial grammar using this language, it turned out that L(names) was eventually unsatisfactory, for two main reasons. The first is practical and rather obvious: the grammar writer has to manage naming by hand, and must handle the issues arising from name collisions. The second is more tricky: the grammar writer may need to use the same tree fragment more than once in the same description. For example, such an

5 To be complete, M.-H. Candito uses additional operations to map multiple names onto a single node. However, this does not change the substance of our discussion.

occasion arises in the case of double PP complementation:

[Diagrams (9): the target model M0 — S over N V PP PP, "Jean parle de Marie à Paul" / "John tells Paul about Mary" — and the fragment D0, PP named pp over P named prep and N↓ named pparg, which would have to be used twice.]

where one cannot use the fragment (D0) more than once to yield M0, since identical names must denote the very same nodes.

A language with colored nodes: L(colors). We used this language in the development of a large-scale French TAG patterned after the analysis of [7]. L(colors) was designed to overcome the shortcomings of the languages L(∅) and L(names): we want (1) to be able to constrain the way fragments combine together more precisely than with language L(∅), and (2) to eschew the explicit naming management of language L(names). To do this, the combination schema C used in L(colors) decorates all nodes with colors: black (•b), white (◦w), red (•r) or failure (⊥). The additional condition on model admissibility is that each node must be either red or black. When combining tree descriptions, nodes are merged and their colors combined. The table below specifies the result of combining two colors. For instance, combining a white node with a black node yields a black node; combining a white node with a red node is illegal and produces a failure.

      •b   •r   ◦w   ⊥
•b    ⊥    ⊥    •b   ⊥
•r    ⊥    ⊥    ⊥    ⊥
◦w    •b   ⊥    ◦w   ⊥
⊥     ⊥    ⊥    ⊥    ⊥

As a matter of illustration, the following color-enriched descriptions yield only the desired model (M0) for example number (7)6:

[Descriptions (10): the fragments of (6) enriched with colors — D0: S•r over V•r, N↓•r and S•b, the latter over Cl↓•r, V•r, C•r (anchoring "qui"•r) and V•b; D1: S◦w over V◦w, with V◦w ≺+ N↓•r.]

Intuitively, the colors have a semantics similar to that of resource and requirement systems such as Interaction Grammars [11]. A tree is well formed if it is saturated. The colors representing saturation are red and black; the color representing non-saturation is white; and we have a color representing failure.
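Purely as an illustration, the color-combination table can be rendered as a small Python function — a sketch under the assumption that colors are encoded as the strings "r", "b", "w" and "⊥":

FAIL = "⊥"   # hypothetical encoding of the failure color

def combine(c1, c2):
    """Combine the colors of two merged nodes, following the table above."""
    if FAIL in (c1, c2):
        return FAIL                      # failure absorbs everything
    if {c1, c2} == {"w"}:
        return "w"                       # white + white stays white
    if {c1, c2} == {"w", "b"}:
        return "b"                       # white + black is saturated: black
    return FAIL                          # red or black may never merge again

def admissible(c):
    """A node of a final model must be saturated: red or black."""
    return c in ("r", "b")

print(combine("w", "b"), combine("w", "r"))   # b ⊥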

6 We let the reader figure out how to express double PP complementation (9). It requires a description similar to (D1) depicted here, but patterned for describing a prepositional phrase.

Though L(colors) turned out to be satisfactory for designing a large-scale French TAG, it might not be similarly adequate for other frameworks or languages.7 However, alternative instances of L(C) might be suitable. For example, a combination schema based on polarities seems a very reasonable foundation for interaction grammars [11] and even for polarity-based unification grammars [12].

5 Controlling fragment combinations

In Section 3 we identified a number of desirable requirements for a metagrammar language: (1) it should support disjunctions to make it easy to express diathesis (such as active, passive), (2) it should support conjunction so that complex descriptions can be assembled by combining several simpler ones, (3) it should support abstraction so that expressions can be named to facilitate reuse and avoid redundancy. In this section, we introduce the language LC to control how fragments can be combined in our proposed lexical representation framework, and show how LC satisfies all the above requirements.

Clause ::= Name → Goal   (11)
Goal ::= Goal ∧ Goal | Goal ∨ Goal | φ | Name   (12)

This language allows us to manipulate fragment descriptions (φ), to express the composition of statements (Goal ∧ Goal), to express nondeterministic choices (Goal ∨ Goal), and finally to name complex statements for reuse (Name → Goal). The main motivation for the control language is to support the combination and reuse of tree fragments. Instead of manipulating tree descriptions directly, the language allows us to define abstractions over (possibly complex) statements. Thus, the clause:

CanonicalSubject → [the tree S over N↓ V]   (13)

defines the abstraction CanonicalSubject to stand for a tree description which can subsequently be reused via this new name, while the clause:

TransitiveVerbActive → Subject ∧ ActiveVerb ∧ Object   (14)

states that a lexical tree for an active transitive verb is formed from the composition of the descriptions of a subject, of an object and of an active verb. Disjunction is interpreted as a nondeterministic choice: each of the alternatives describes one of the ways in which the abstraction can be realized. As

7 The current framework is not restricted to the specific case of Tree Adjoining Gram- mars. It should be straightforward to adapt it to other cases of tree based syntactic systems such as Interaction Grammars.

illustrated by lexical rules as used e.g. in PATR II [3], a system of lexical representation needs to be equipped with a way to express relationships between lexical items, as a passive lexical rule does when it relates an active and a passive lexical entry. In our approach, similar relations are expressed with disjunctions. Thus the following statement expresses the fact that the various realisations of the subject are equivalent:

Subject → CanonicalSubject ∨ WhSubject ∨ RelativeSubject ∨ CliticSubject   (15)

As surely has become evident, the language presented in this section has very much the flavor of a logic programming language. More precisely, it can be understood as an instance of the Definite Clause Grammar (dcg) paradigm. dcgs were originally conceived to express the production rules of context-free grammars: they characterize the sentences of a language, i.e. all the possible ways words can be combined into grammatical sequences by concatenation. Here, instead of words, we have tree fragments, and instead of concatenation we have a composition operation. In other words, LC allows us to write the grammar of a tree grammar, which surely justifies the name metagrammar.
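The actual system interprets LC by reduction to a logic program; purely as an illustration of the evaluation strategy, the same idea can be sketched in a few lines of Python, with goals encoded as nested tuples and fragment descriptions reduced to strings (all encodings here are hypothetical, not the authors' implementation):

def solve(goal, clauses):
    """Enumerate the fragment multisets denoted by a goal."""
    kind = goal[0]
    if kind == "frag":                       # a description phi
        yield [goal[1]]
    elif kind == "and":                      # composition
        for left in solve(goal[1], clauses):
            for right in solve(goal[2], clauses):
                yield left + right
    elif kind == "or":                       # nondeterministic choice
        yield from solve(goal[1], clauses)
        yield from solve(goal[2], clauses)
    elif kind == "name":                     # abstraction lookup
        yield from solve(clauses[goal[1]], clauses)

clauses = {
    "Subject": ("or", ("frag", "CanonicalSubject"), ("frag", "WhSubject")),
    "TransitiveVerb": ("and", ("name", "Subject"),
                       ("and", ("frag", "ActiveVerb"),
                               ("frag", "CanonicalObject"))),
}
for fragments in solve(("name", "TransitiveVerb"), clauses):
    print(fragments)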

6 Principles of well-formedness

So far, the current system assumes that one can describe grammatical information by combining fragments of local information. There are however cases where the local fragments interact when realised together. To handle these interactions in an elegant way, the system allows the formulation of additional global constraints on tree admissibility, called the principles. Let us express in the control language the fact that a transitive verb is made of a subject, an object and a verb in the active form:

TransitiveVerb → Subject ∧ ActiveVerb ∧ Object   (16)
Subject → CanonicalSubject ∨ CliticSubject   (17)
Object → CanonicalObject ∨ CliticObject   (18)

Clitic ordering. According to the subject and object clauses, a description of a transitive verb is, among others, made of the composition of a clitic subject and a clitic object8, whose definitions are as follows:

CliticSubject → [V over Cl↓[case = nom] and V, with Cl↓ ≺+ V]   CliticObject → [V over Cl↓[case = acc] and V, with Cl↓ ≺+ V]   (19)

8 In French, clitics are small non-tonic pronominal particles realized in front of the verb and ordered according to a fixed order. The problem of clitic ordering is a well-known case of such an interaction. It was already described as problematic in the generative literature in the early 70's [13].

When realized together, neither of the clitic descriptions says how these clitics are ordered relative to each other, so a merge of the two descriptions yields the following two models:

[Models: M0 — V over Cl↓[case = nom], Cl↓[case = acc], V — and M1 — V over Cl↓[case = acc], Cl↓[case = nom], V.]

where M1 is an undesirable solution in French. French clitic ordering is thus rendered by a principle of tree well-formedness: sibling nodes of category Clitic have to be ordered according to the respective order of their ranking property. So, if we take the case feature of the descriptions in (19) to be the ranking property, and the order defined over that property to constrain (inter alia) nominative to precede accusative, then in every tree where both a nominative and an accusative clitic are realised, the principle ensures that only M0 is a valid model.
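Operationally the principle is easy to state; the following Python fragment — a sketch of ours, with a hypothetical rank table over the case feature — checks that clitic siblings respect the ranking order:

RANK = {"nom": 0, "acc": 1, "dat": 2}   # hypothetical ranking over case

def clitic_order_ok(siblings):
    """siblings: (category, case) pairs in surface order; clitics must
    appear in increasing rank order."""
    ranks = [RANK[case] for cat, case in siblings if cat == "Cl"]
    return ranks == sorted(ranks)

print(clitic_order_ok([("Cl", "nom"), ("Cl", "acc"), ("V", None)]))  # True  (M0)
print(clitic_order_ok([("Cl", "acc"), ("Cl", "nom"), ("V", None)]))  # False (M1)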

Extraction unicity. Another principle, presented hereafter (Section 7), is that of extraction unicity. We assume that, in French, only one argument of a given predicate may be extracted9. Following this, the extraction principle is responsible for ruling out tree models where more than one node would be associated with the property of extraction. Two other principles have actually been used in the implementation of the French grammar: a principle ensuring clitic unicity and a principle expressing island constraints10. The expression of an additional principle of functional unicity is currently under investigation.

7 A constraint satisfaction approach

As mentioned earlier, the control language LC of Section 5 can be regarded as an instance of the Definite Clause Grammar (dcg) paradigm. While dcgs are most often used to describe sentences, i.e. sequences of words, here we apply them to the description of formulae in the language L(colors), i.e. conjunctions of colored tree fragments. A consequence of regarding a metagrammar, i.e. a program expressed in the language LC, as a dcg is that it can be reduced to a logic program and executed as such using well-known techniques. What remains to be explained is how, from a conjunction of colored tree fragments, we derive all complete trees that can be formed by combining these fragments together. For this task, we propose a constraint-based approach that builds upon and extends the treatment of dominance constraints of Duchier and Niehren [15]. We begin by generalizing slightly the language introduced in Section 4 to make

9 Actually, cases of double extraction have been observed in French, but they are so rare and so unnatural that they are generally ruled out of grammatical implementations.
10 This principle is related to the way one formalises island constraints in TAG [14].

it more directly amenable to the treatment described in [15]; then we show how we can enumerate the minimal models of a description in that language by translating this description into a system of constraints involving set variables, and solving that instead.

Tree description language. In order to account for the idea that each node of a description is colored either red, black or white, we let x, y, z range over 3 disjoint sets of node variables: Vr, Vb, Vw. We write ◁ for immediate dominance, ◁+ for its transitive closure, i.e. strict dominance, ≺ for immediate precedence, and ≺+ for its transitive closure, i.e. strict precedence. We let ℓ range over a set of node labels. A description φ has the following abstract syntax:

φ ::= x R y | x ◁ y | x ≺ y | x : ℓ | φ ∧ φ   (20)

where R ⊆ {=, ◁+, ▷+, ≺+, ≻+} is a set of relation symbols whose intended interpretation is disjunctive; thus x {=, ◁+} y is more conventionally written x ◁∗ y. In [15], the abstract syntax permitted a literal of the form x : ℓ(x1, . . . , xn) that combined (1) an assignment of the label ℓ to x, (2) immediate dominance literals x ◁ xi, (3) immediate precedence literals xi ≺ xi+1, (4) an arity constraint stipulating that x has exactly n children. Here we prefer a finer granularity and admit literals for immediate dominance and immediate precedence. For simplicity of presentation we omit an arity constraint literal.

Enumerating minimal models. We now describe how to convert a description into a constraint system that uses set constraints, such that the solutions of the latter are in bijection with the minimal models of the former. Such a constraint system can be realized and solved efficiently using the constraint programming support of Mozart/Oz. Our conversion follows very closely the presentation of [15]. The general intuition is that a literal x R y should be represented by a membership constraint y ∈ R(x), where R(x) is a set variable denoting all the nodes that stand in relation R with x. We write V^φ for the set of variables occurring in φ. Our encoding consists of 3 parts:

[[φ]] = ⋀{A1(x) | x ∈ V^φ} ∧ ⋀{A2(x, y) | x, y ∈ V^φ} ∧ B[[φ]]   (21)

A1(·) introduces a node representation per variable, A2(·, ·) axiomatizes the treeness of the relations between these nodes, and B(·) encodes the problem-specific restrictions imposed by φ.

7.1 Representation

[Figure: the five regions around a node — Up, Down, Left, Right and Eq.]

When observed from a specific node x, the nodes of a solution tree (a model), and hence the variables which they interpret, are partitioned into 5 regions: the node denoted by x itself, all nodes below, all nodes above, all nodes to the left, and all nodes to the right. The main idea is to introduce corresponding set variables Eq_x, Up_x, Down_x, Left_x, Right_x to encode the sets of variables that are interpreted by nodes in the model which are respectively equal, above, below, left, and right of the node interpreting x. First, we state that x is one of the variables interpreted by the corresponding node in the model:

x ∈ Eq_x   (22)

Then, as explained above, we have the following fundamental partition equation:

V^φ = Eq_x ⊎ Up_x ⊎ Down_x ⊎ Left_x ⊎ Right_x   (23)

where ⊎ denotes disjoint union. We can (and in fact must, as proven in [15]) improve propagation by introducing shared intermediate results Side, Eqdown, Equp, Eqdownleft, Eqdownright:

Side_x = Left_x ⊎ Right_x      Eqdownleft_x = Eqdown_x ⊎ Left_x   (24)
Eqdown_x = Eq_x ⊎ Down_x      Eqdownright_x = Eqdown_x ⊎ Right_x   (25)
Equp_x = Eq_x ⊎ Up_x   (26)

which must all be related to V^φ:

V^φ = Eqdown_x ⊎ Up_x ⊎ Side_x      V^φ = Eqdownleft_x ⊎ Up_x ⊎ Right_x   (27)
V^φ = Equp_x ⊎ Down_x ⊎ Side_x      V^φ = Eqdownright_x ⊎ Down_x ⊎ Left_x   (28)

We define A1(x) as the conjunction of the constraints introduced above.
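As a sanity check of A1, the partition equation (23) can be tested directly on candidate region sets. In the Python sketch below — an illustration of ours, not the solver's code — ordinary sets stand in for the Mozart/Oz finite-set variables:

def a1_holds(V, x, eq, up, down, left, right):
    """Check x in Eq_x and the partition V = Eq + Up + Down + Left + Right."""
    regions = [eq, up, down, left, right]
    pairwise_disjoint = sum(map(len, regions)) == len(set().union(*regions))
    return x in eq and pairwise_disjoint and set().union(*regions) == V

V = {"x", "y", "z", "w"}
print(a1_holds(V, "x", eq={"x"}, up=set(),
               down={"y", "z", "w"}, left=set(), right=set()))   # True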

7.2 Well-formedness

Posing Rel = {=, ◁+, ▷+, ≺+, ≻+}, in a tree, the relationship that obtains between the nodes denoted by x and y must be one in Rel: the options are mutually exclusive. We introduce a variable C_xy, called a choice variable, to explicitly represent it, and contribute a well-formedness clause A3[[x r y]] for each r ∈ Rel.

A2(x, y) = C_xy ∈ Rel ∧ ⋀{A3[[x r y]] | r ∈ Rel}   (29)

A3[[x r y]] ≡ (D[[x r y]] ∧ C_xy = r) ∨ (C_xy ≠ r ∧ D[[x ¬r y]])   (30)

For each r ∈ Rel, it remains to define D[[x r y]] and D[[x ¬r y]], encoding respectively the relationships x r y and x ¬r y by set constraints on the representations of x and y.

D[[x = y]] = Eq_x = Eq_y ∧ Up_x = Up_y ∧ . . .   (31)
D[[x ¬= y]] = Eq_x ∥ Eq_y   (32)
D[[x ◁+ y]] = Eqdown_y ⊆ Down_x ∧ Equp_x ⊆ Up_y ∧ Left_x ⊆ Left_y ∧ Right_x ⊆ Right_y   (33)
D[[x ¬◁+ y]] = Eq_x ∥ Up_y ∧ Down_x ∥ Eq_y   (34)
D[[x ≺+ y]] = Eqdownleft_x ⊆ Left_y ∧ Eqdownright_y ⊆ Right_x   (35)
D[[x ¬≺+ y]] = Eq_x ∥ Left_y ∧ Right_x ∥ Eq_y   (36)

where ∥ represents disjointness.

7.3 Problem-specific constraints

The third part B[[φ]] of the translation forms the problem-specific constraints that further restrict the admissibility of well-formed solutions, accepting only those which are models of φ. The translation is given by case analysis following the abstract syntax of φ:

B[[φ ∧ φ′]] = B[[φ]] ∧ B[[φ′]]   (37)

A rather nice consequence of introducing the choice variables C_xy is that any dominance constraint x R y can be translated as a restriction on the possible values of C_xy. For example, x ◁∗ y can be encoded as C_xy ∈ {=, ◁+}. More generally:

B[[x R y]] = C_xy ∈ R   (38)

A labeling literal x : ℓ simply restricts the label associated with variable x:

B[[x : ℓ]] = Label_x = ℓ   (39)

An immediate dominance literal x ◁ y not only states that x ◁+ y but also that there are no intervening nodes on the spine that connects the two nodes:

B[[x ◁ y]] = C_xy = ◁+ ∧ Up_y = Equp_x   (40)

An immediate precedence literal x ≺ y not only states that x ≺+ y but also that there are no intervening nodes horizontally between them:

B[[x ≺ y]] = C_xy = ≺+ ∧ Eqdownleft_x = Left_y ∧ Right_x = Eqdownright_y   (41)
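The case analysis of B[[·]] can be mimicked with a toy choice-variable store; in this Python sketch (ours, with ASCII names standing in for the relation symbols and sets standing in for variable domains), each literal simply intersects the domain of C_xy:

REL = {"eq", "dom+", "dom-", "prec+", "prec-"}   # stand-ins for =, etc.

def restrict(C, x, y, allowed):
    """B[[x R y]] = C_xy in R: intersect the choice variable's domain."""
    C[(x, y)] &= allowed

C = {("x", "y"): set(REL)}
restrict(C, "x", "y", {"eq", "dom+"})   # encodes x dom* y
print(C[("x", "y")])                    # a subset of {'eq', 'dom+'}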

7.4 Well-coloring

While distinguishing left and right is a small incremental improvement over [15], the treatment of colors is a rather more interesting extension. The main question is: which nodes can or must be identified with which other nodes? Red nodes cannot be identified with any other nodes. Black nodes may be identified with white nodes. Each white node must be identified with a black node. As a consequence, for every node, there is a unique red or black node with which it is identified. We introduce the (integer) variable RB_x to denote the red or black node with which x is identified. For a red node, x is identified only with itself:

x ∈ Vr ⇒ RB_x = x ∧ Eq_x = {x}   (42)

For a black node, the constraint is a little relaxed (it may also be identified with white nodes):

x ∈ Vb ⇒ RB_x = x   (43)

Posing V^φ_b = V^φ ∩ Vb, each white node must be identified with a black node:

x ∈ Vw ⇒ RB_x ∈ V^φ_b   (44)

Additionally, it is necessary to ensure that RB_x = RB_y iff x and y have been identified. We can achieve this simply by modifying the definition (32) of D[[x ¬= y]] as follows:

D[[x ¬= y]] = Eq_x ∥ Eq_y ∧ RB_x ≠ RB_y   (45)
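The well-coloring conditions (42)–(44) amount to simple checks once RB_x is known; the following Python sketch (our illustration, with dictionaries for color, RB and Eq) verifies them on a finished assignment:

def well_colored(color, RB, eq):
    for x, c in color.items():
        if c == "r" and not (RB[x] == x and eq[x] == {x}):   # (42)
            return False
        if c == "b" and RB[x] != x:                          # (43)
            return False
        if c == "w" and color[RB[x]] != "b":                 # (44)
            return False
    return True

print(well_colored({"n1": "b", "n2": "w"},
                   RB={"n1": "n1", "n2": "n1"},
                   eq={"n1": {"n1", "n2"}, "n2": {"n1", "n2"}}))  # True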

7.5 Extraction Principle

As an illustration of how the framework presented so far can be extended with linguistically motivated principles to further constrain the admissible models, we now describe what we have dubbed the extraction principle. The description language is (somewhat) extended to make it possible to mark certain nodes of a description as representing an extraction. The extraction principle then makes the additional stipulation that, to be admissible, a model must contain at most one node marked as extracted. Let V^φ_xtr be the subset of V^φ of those node variables marked as extracted. We introduce a new boolean variable Extracted_x to indicate whether the node denoted by x is extracted:

Extracted_x = (Eq_x ∩ V^φ_xtr ≠ ∅)   (46)

Posing V^φ_rb = V^φ ∩ (Vr ∪ Vb), and freely identifying the boolean values false and true with the integers 0 and 1 respectively, the extraction principle can be enforced with the following constraint:

Σ{Extracted_x | x ∈ V^φ_rb} < 2   (47)
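Operationally the principle is just a cardinality constraint; a minimal Python rendering of (46)–(47), for illustration only:

def extraction_ok(eq, V_xtr, V_rb):
    """At most one red-or-black node carries the extraction property."""
    extracted = [int(bool(eq[x] & V_xtr)) for x in V_rb]   # (46)
    return sum(extracted) < 2                              # (47)

print(extraction_ok(eq={"n1": {"n1"}, "n2": {"n2"}},
                    V_xtr={"n1"}, V_rb={"n1", "n2"}))      # True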

8 Conclusion

This paper introduces a core abstract framework for representing the grammatical information of tree-based syntactic systems. Grammatical representation is organised around two central ideas: (1) the lexicon is described by means of elementary tree fragments that can be combined; (2) fragment combinations are handled by a control language, which turns out to be an instance of a dcg. The framework described here generalises the TAG-specific approaches of [9, 10]. We have provided a parametric family of languages for tree composition as well as constraints on tree well-formedness. Besides its non-TAG-specific tree composition language, it differs from the TAG instantiations mostly in that it introduces a control language allowing one to express explicitly the composition of fragments as well as variants of related lexical

entries. The two existing systems of [10] and [9] rely mostly on an algorithmic device for expressing variants, namely a crossing algorithm for [10], and an external module of lexical rules for [9]. The introduction of the control language (1) avoids having to work with different modules and (2) introduces more flexibility in expressing variants, avoiding the "shadow" classes that turn out to be needed in [10]. The framework presented here has been extensively tested through the development of a large-sized French TAG based on [7]. This grammar covers most of the phenomena related to the syntax of French verbs.

References

1. Prolo, C.: Generating the XTAG English grammar using metarules. In: Proc. COLING 2002, Taiwan (2002)
2. Joshi, A.K., Schabes, Y.: Tree adjoining grammars. In Rozenberg, G., Salomaa, A., eds.: Handbook of Formal Languages. Springer Verlag, Berlin (1997)
3. Shieber, S.M.: The design of a computer language for linguistic information. In: Proceedings of the Tenth International Conference on Computational Linguistics, Stanford University, Stanford, California (1984) 362–366
4. Kaplan, R.M., Maxwell, J.T.: LFG grammar writer's workbench. Technical report, Xerox PARC (1996)
5. Meurers, W.D.: On implementing an HPSG theory – aspects of the logical architecture, the formalization, and the implementation of head-driven phrase structure grammars. In: Erhard W. Hinrichs, W. Detmar Meurers, and Tsuneko Nakazawa: Partial-VP and Split-NP Topicalization in German – An HPSG Analysis and its Implementation. Arbeitspapiere des SFB 340 Nr. 58, Universität Tübingen (1994)
6. XTAG Research Group: A lexicalized tree adjoining grammar for English. Technical Report IRCS-01-03, IRCS, University of Pennsylvania (2001)
7. Abeillé, A.: Une grammaire d'arbres adjoints pour le français. Editions du CNRS, Paris (2002)
8. Koenig, J.P., Jurafsky, D.: Type underspecification and on-line type construction in the lexicon. In: Proceedings of WCCFL94 (1995)
9. Xia, F.: Automatic Grammar Generation from two Different Perspectives. PhD thesis, University of Pennsylvania (2001)
10. Candito, M.H.: Organisation Modulaire et Paramétrable de Grammaires Électroniques Lexicalisées. PhD thesis, Université de Paris 7 (1999)
11. Perrier, G.: Les grammaires d'interaction. Université Nancy 2 (2003). Habilitation à diriger des recherches.
12. Kahane, S.: Grammaires d'unification polarisées. In: Proc. TALN 2004, Fès (2004)
13. Perlmutter, D.: Surface structure constraints in syntax. Linguistic Inquiry 1 (1970) 187–255
14. Kroch, A., Joshi, A.K.: The linguistic relevance of tree adjoining grammar. Technical report, IRCS, Philadelphia (1985)
15. Duchier, D., Niehren, J.: Dominance constraints with set operators. In: Proceedings of the First International Conference on Computational Logic (CL2000). Volume 1861 of Lecture Notes in Computer Science, Springer (2000) 326–341

Multi-dimensional Graph Configuration for Natural Language Processing

Ralph Debusmann1, Denys Duchier2, and Marco Kuhlmann1

1 Programming Systems Lab, Saarland University, Saarbrücken, Germany {rade,kuhlmann}@ps.uni-sb.de
2 Équipe Calligramme, LORIA, Nancy, France [email protected]

Abstract. We introduce the new abstract notion of multi-dimensional lexicalized graph configuration problems, generalising over many important tasks in computational linguistics such as semantic assembly, surface realization and syntactic analysis, and integrating them. We present Extensible Dependency Grammar (XDG) as one instance of this notion.

1 Introduction

A variety of tasks in computational linguistics can be regarded as configuration problems (cps) [1]. In this paper, we introduce the notion of lexicalised, multi-dimensional cps, a particular class of configuration problems that both has a wide range of linguistic applications, and can be solved in a straightforward way using state-of-the-art constraint programming technology. Linguistic modeling based on multi-dimensional cps brings two major benefits: complete modularity of the developed resources, and tight integration of the individual modules.

We first consider configuration problems where the input is a set of components, and the output is a valid assembly of these components that satisfies certain problem-specific constraints. We then broaden the perspective to permit ambiguity as to the choice of each component. We call a configuration problem lexicalized when, as is the case in typical linguistic applications, this ambiguity is stipulated by a lexicon. Finally, to permit modeling with multiple levels of linguistic description (e.g. syntax, linear precedence, predicate/argument structure . . . ), we introduce the notion of a multi-dimensional configuration problem where we must simultaneously solve several configuration problems that are not independent, but rather constrain each other.

We then provide an introduction to Extensible Dependency Grammar (xdg), which is a development environment for linguistic modeling embracing the approach of lexicalised, multi-dimensional cps. The methodology based on lexicalized, multi-dimensional configuration problems is attractive for several reasons: for linguistic modeling, it is possible, for each level of linguistic description, to design a dimension that is especially well suited to it. For constraint-based processing, an inference on any dimension may help disambiguate another.

2 Configuration Problems in NLP

In this section, we develop the point that many tasks in computational linguistics can be usefully regarded as instances of configuration problems. We illustrate this point with three representative examples for which we have developed constraint-based processing techniques: semantic assembly, surface realization, and syntax analysis.

2.1 Semantic Assembly

We first turn to the task of assembling the semantic representation of a sentence from the individual fragments of representation contributed by its words, in the context of scope ambiguity. Consider the following sentence:

(1) Every researcher deals with a problem.

This sentence has two readings, which may be disambiguated by the following continuations: (1a) . . . Some of these problems may be unsolvable. (1b) . . . This problem is his funding. If represented in terms of Montague-style semantics, the two readings could be rendered as follows:

∀x : researcher(x) → ∃y : problem(y) ∧ (deal with)(x, y)   (1a)
∃y : problem(y) ∧ ∀x : researcher(x) → (deal with)(x, y)   (1b)

Notice that both these terms are made up of exactly the same “material”:

∀x : (researcher(x) → . . .)   ∃y : (problem(y) ∧ . . .)   (deal with)(x, y)

The only difference between the two is the relative ordering of these term fragments: in (1a), the universal quantifier takes scope over the existential quantifier; in (1b), it is the other way round. Formalisms for scope underspecification [2–4] aim for a compact representation of this kind of ambiguity: they can be used to describe the common parts of a set of readings, and to express constraints on how these fragments can be "plugged together". The left half of Fig. 1 shows an underspecified graphical representation of the two readings of (1) in the formalism of dominance constraints [4]. The solid edges in the picture mark the term fragments that are shared among all readings. Two fragments can combine by "plugging" one into an open "hole" of the other. The dashed edges mark dominance requirements, where dominance corresponds to ancestorship. For instance, the fragments for "every researcher" and for "a problem" dominate the "deal with" fragment, i.e. both must be ancestors of the latter. With the given dominance requirements, exactly two configurations of the fragments are possible (shown schematically in the right half of Fig. 1); these two configurations correspond to the readings (1a) and (1b).


Fig. 1. A dominance constraint for the two readings of (1), and its two solutions

2.2 Surface realisation

Surface realisation is the sub-task of natural language generation that maps a semantic representation to a grammatical surface string. More specifically, for some given grammar, surface realisation takes as its input a bag of semantic descriptions φ, and returns as its output a syntax tree containing the verbalisation of φ. Here we discuss surface realisation for Tree Adjoining Grammar (tag) [5]. One of the underlying design principles of many tag grammars is semantic minimality: each lexical entry (elementary tree) of a tag grammar corresponds to an atomic semantics. Surface realisation can then be reduced to the problem of selecting for each semantic atom a matching elementary tree, and assembling these trees into a derivation tree using the standard tag operations that combine grammatical structures, namely substitution and adjunction [6].

We illustrate this by means of an example. Assume that we want to realise the following (event-based) input semantics using some given tag grammar G: {x = Peter, see(e, x, y), some(x), fat(x), rabbit(x)}. (Note that all atoms are considered to be ground.) In a first step, we need to choose for each of the semantic atoms an elementary tree from G that verbalises its semantics. A sample selection of trees is shown in the left half of Fig. 2. The dashed arrows in the figure indicate a way to compose the chosen elementary trees by means of substitution and adjunction. For example, the tree realising the semantic atom fat(x) can adjoin into the root node (labelled with N) of the tree realising the semantics of rabbit(x). The right half of Fig. 2 shows the resulting derivation tree for the sentence. In a post-processing step, this derivation tree can be transformed into a derived tree, whose yield is a possible realisation of the intended semantics.

2.3 Syntactic Analysis

Our final example is the parsing of dependency grammars. As we have seen, surface realisation can be reduced to the configuration of a labelled tree in which



Fig. 2. Generation

the nodes are labelled with lexical entries (elementary trees), and the edges are labelled with sites for substitution and adjunction. It has often been noted that these derivation trees closely resemble dependency trees. A dependency tree for a sentence s is a tree whose nodes are labelled with the words of s, and whose (directed) edges are labelled with antisymmetric grammatical relations (like subject-of or adverbial-modifier-of). Given an edge u —ρ→ v in a dependency tree, u is called the head of v, and v is called the dependent of u; ρ is the grammatical relation between the two. A dependency grammar consists of a lexicon and a valency assignment for the lexical entries that specifies the grammatical relations a given entry must or may participate in as head and as dependent. A dependency tree is licensed by a given grammar if for every edge u —ρ→ v, the relation ρ is a grammatical relation licensed by both the lexical entries for u and v.

[Dependency tree: hat is the root, with edges hat —subj→ Student, hat —vpp→ gelesen, gelesen —obj→ Buch, Buch —det→ ein, and Student —det→ der.]

Lexical entry   incoming       outgoing
ein             {det}          {}
Buch            {subj, obj}    {det}
hat             {}             {subj, vpp}
der             {det}          {}
Student         {subj, obj}    {det}
gelesen         {vpp}          {obj}

Fig. 3. A dependency tree for the German sentence (2), and a valency assignment licensing it

The left half of Fig. 3 shows a dependency tree for the sentence

(2) Ein Buch hat der Student gelesen.

The right half of the figure shows a valency assignment with which this tree would be licensed: it specifies possible incoming edges and required outgoing edges. For example, the lexical entry for Student can act both as a subject and as an object dependent of its head, and itself requires a dependent determiner.
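To make the licensing condition concrete, here is a small Python sketch — our illustration, using the entries of Fig. 3 — that checks every edge of a candidate dependency tree against the incoming and outgoing valency sets:

LEXICON = {                 # word -> (incoming labels, outgoing labels)
    "hat":     (set(),           {"subj", "vpp"}),
    "gelesen": ({"vpp"},         {"obj"}),
    "Student": ({"subj", "obj"}, {"det"}),
    "Buch":    ({"subj", "obj"}, {"det"}),
    "der":     ({"det"},         set()),
    "ein":     ({"det"},         set()),
}

def licensed(edges):
    """edges: (head, label, dependent) triples of a candidate tree."""
    return all(label in LEXICON[dep][0] and label in LEXICON[head][1]
               for head, label, dep in edges)

print(licensed([("hat", "vpp", "gelesen"), ("hat", "subj", "Student"),
                ("gelesen", "obj", "Buch"), ("Student", "det", "der"),
                ("Buch", "det", "ein")]))   # True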

3 Graph Configuration Problems

The problems described in the previous section have this in common: the task is always one where we are given a number of tree (or graph) fragments, and we must plug them together (under constraints) to obtain a complete assembly. We call this class of problems graph configuration problems. For such problems, it is often convenient to represent the plugging of a fragment w′ into another w by means of a directed labeled edge w —ℓ→ w′, which makes explicit that a resource of type ℓ supplied by w is being matched with a corresponding need in w′. We are thus led to the notion of (finite) labeled graphs. Consider given a finite set L of labels. An L-labeled graph (V, E) consists of a set V of nodes and a set E ⊆ V × V × L of directed labeled edges between them. We can interpret each label ℓ as a function from a node to a set of nodes, defined as follows:

ℓ(w) = {w′ | w —ℓ→ w′ ∈ E}

ℓ(w) represents the set of immediate successors of w that can be reached by traversing an edge labeled ℓ. Duchier [7] developed this set-based approach and showed e.g. how to formulate a constraint system that precisely characterizes all labeled trees which can be formed from the finite set of nodes V and the finite set of edge labels L. This system is formulated in terms of the finite set variables ℓ(w), daughters(w), down(w), eqdown(w) and roots:

V = roots ⊎ ⊎{daughters(w) | w ∈ V} ∧ |roots| = 1
∧ ∀w ∈ V : (∀ℓ ∈ L : ℓ(w) ⊆ V)   (3)
  ∧ eqdown(w) = {w} ⊎ down(w)
  ∧ down(w) = ∪{eqdown(w′) | w′ ∈ daughters(w)}
  ∧ daughters(w) = ⊎{ℓ(w) | ℓ ∈ L}

By taking advantage of the constraint programming support available in a language such as Mozart/Oz, this formal characterization of finite labeled trees can straightforwardly be given a computational interpretation. Using this approach as a foundation, we can encode graph configuration problems with additional constraints. For example, each node typically offers a specific subset of resources. Such a restriction can be enforced by posing constraints on the cardinality of ℓ(w) (for ℓ ∈ L), e.g. |ℓ(w)| = 0 for a resource not offered by w, |ℓ(w)| = 1 for a resource offered exactly once, etc. This is how we can model subcategorization in the application to dependency parsing.
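For illustration, the characterization can be checked on a finished graph with plain Python sets standing in for the finite-set variables — a simplified sketch of ours that tests in-degrees, a unique root, and reachability rather than posting system (3) to a solver:

def is_labeled_tree(V, E):
    """E: set of (w, label, w2) edges. Check treeness of the labeled graph."""
    daughters = {w: {w2 for (h, _, w2) in E if h == w} for w in V}
    targets = [w2 for (_, _, w2) in E]
    roots = V - set(targets)
    if len(set(targets)) != len(targets) or len(roots) != 1:
        return False                 # one mother per node, a single root
    seen, stack = set(), list(roots)
    while stack:                     # every node reachable from the root
        w = stack.pop()
        seen.add(w)
        stack.extend(daughters[w] - seen)
    return seen == V

print(is_labeled_tree({"a", "b", "c"},
                      {("a", "subj", "b"), ("a", "obj", "c")}))   # True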

4 Lexicalized Configuration Problems

The view presented so far, where we are given a fixed number of fragments to be plugged together into a fully configured assembly, is not yet sufficient to capture realistic tasks. In the surface realization problem, there may be several alternative ways to verbalize the same semantic element. Similarly, in the syntax analysis problem, one word may have several readings, alternative subcategorization frames, or alternative linearization constructions. We generalize the previous view by replacing each fragment with a finite collection of fragments from which one must be selected. Since the mapping from nodes to collections of alternative fragments is often stipulated by a lexicon, we call such problems lexicalized configuration problems. The challenge is now to adapt the constraint-based approach outlined earlier to gracefully handle lexical ambiguity.

4.1 Selection constraints

Let's consider again the dependency parsing application. As described in the previous section, for a given set V of words, we can (a) model the possible dependency trees as the solutions of constraint system (3), and (b) enforce subcategorization frames using cardinality constraints on ℓ-daughter sets ℓ(w). If we now assume that w has k lexical entries, each one may stipulate a different cardinality constraint for ℓ(w). Clearly, in order to avoid combinatorial explosion, we do not want to try all possible combinations of selections for the given words. Instead, we would like to take advantage of constraint propagation to avoid having to make non-deterministic choices. What we want is an underspecified representation of the lexical entry that is ultimately chosen, in a form that integrates well into a constraint-based formulation. This is the purpose of the selection constraint:

X = ⟨Y_1, . . . , Y_n⟩[I]

Its declarative semantics is X = Y_I. All of X, Y_i and I may be variables, and propagation takes place in both directions: into X in a manner similar to constructive disjunction, and into I whenever a Y_k becomes incompatible with X (and thus k can be removed from the domain of I). We have implemented support for selection constraints for integer variables (FD) and integer set variables (FS). This support can easily be lifted over feature structures as follows:

⟨[f_1 = v_1^1, . . . , f_p = v_p^1], . . . , [f_1 = v_1^k, . . . , f_p = v_p^k]⟩[I] = [f_1 = ⟨v_1^1, . . . , v_1^k⟩[I], . . . , f_p = ⟨v_p^1, . . . , v_p^k⟩[I]]   (4)

and, in this manner, it can apply to complex lexical entries. Notice how the features f_1 through f_p are constrained by concurrent selection constraints which are all covariant because they share the same selector I.
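The actual FD/FS implementation lives in Mozart/Oz; the following Python loop (ours, with sets as finite domains and 1-based indices in I) merely illustrates the two propagation directions alternating to a fixpoint:

def propagate_selection(X, Ys, I):
    """X and each Ys[k] are finite domains (sets); I holds 1-based indices."""
    changed = True
    while changed:
        old = (set(X), set(I))
        I &= {k for k in I if Ys[k - 1] & X}   # drop Yk incompatible with X
        union = set().union(*(Ys[k - 1] for k in I)) if I else set()
        X &= union                             # X must lie in some selected Yk
        changed = (set(X), set(I)) != old
    return X, I

X, I = propagate_selection({1, 2, 3}, [{1}, {4}, {2, 5}], {1, 2, 3})
print(X, I)   # {1, 2} {1, 3} -- Y2 was pruned, and X narrowed accordingly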

5 Multi-dimensional Configuration Problems

So far, we have informally introduced the notion of lexicalized graph configuration problems and suggested how they can be encoded into systems of constraints that are adequately supported by corresponding constraint technology. However, when modeling language, we must contend with multiple descriptive levels (e.g. syntax, linear precedence, predicate/argument structure, information structure, etc.). Each is concerned with a different aspect of linguistic description, yet they are not independent: they interact and constrain each other. Our notion of lexicalized configuration problems is an excellent tool for modeling each individual descriptive level, but, as we said, these levels are not independent. For this reason, we now propose a generalized approach where we have several configuration problems, all simultaneously constrained by the same lexical entries. This is what we call a multi-dimensional configuration problem. For each descriptive level, there is one dedicated dimension. Every lexical entry now contains a sub-lexical entry for each dimension. In this manner, each lexical entry simultaneously constrains all dimensions. Furthermore, we also permit inter-dimensional principles, i.e. global constraints that relate one dimension with another. In this manner, the dimensions are not merely coupled through the lexicon, but also by linguistically motivated well-formedness relations. Figure 4 illustrates the selection constraint operating simultaneously on 3 dimensions over 3-dimensional lexical entries.


Fig. 4. Multi-dimensional selection

An important methodological motivation for the generalisation to multi-dimensional graph configuration problems is the desire to obtain a modular and scalable framework for modeling linguistic phenomena. We now illustrate the idea with the dependency analysis of word order phenomena in German. Consider again the parsing example sentence from Section 2:

(5) Ein Buch hat der Student gelesen.

It illustrates object topicalisation in German. Starting from the "canonically" ordered sentence Der Student hat ein Buch gelesen, it may be construed as the result of the fronting of ein Buch (the object of the verb), and a successive movement of der Student (the subject) to the now vacant object position – but a far more natural and perspicuous account is obtained when separating constituency and linear precedence, and describing word order variation as a relation between those two structures. Fig. 5 shows the analysis of the sentence using Topological Dependency Grammar (tdg, [8]), which dedicates one dimension to immediate dominance (id) and another to linear precedence (lp). Each dimension has its own set of well-formedness conditions (principles): both id and lp structures are required to be trees, but lp structures must also be ordered and projective. Moreover, a multi-dimensional principle called climbing constrains the lp tree to be a flattening of the id tree.


(a) Immediate Dominance (b) Linear Precedence

Fig. 5. Topicalisation

The parsing task of Topological Dependency Grammar is an example of a multi-dimensional graph configuration problem according to the characterisation above: to find the structures licensed by a given input, one has to find all triples (V, T_id, T_lp) in which T_id is a licensed id tree, T_lp is a licensed lp tree, the two trees share the same node set V, and the climbing relation holds between them. Section 6 describes xdg, which is an extended version of this model with additional dimensions for deep syntax, predicate/argument structure, and scope. How does the multi-dimensional graph model relate to other approaches? With various multi-stratal formalisms, like Lexical Functional Grammar (lfg) [9] and Meaning-Text Theory (mtt) [10], it shares the idea of distinguishing several representational structures. In contrast to these formalisms, however, it does not presuppose a layered representation (where only adjacent layers can share information directly), nor does it assume that information can only flow in one direction: multi-dimensional principles can connect arbitrary dimensions. However, given that all dimensions share the same set of nodes, multi-dimensional

graphs also exhibit many of the benefits that arise through the tight integration of information as it is obtained in mono-stratal formalisms like hpsg [11].

6 Extensible Dependency Grammar

In this section, we introduce Extensible Dependency Grammar (xdg) [12], our flagship instance of a lexicalized multi-dimensional configuration problem. xdg is a generalization of tdg. It supports the use of arbitrarily many dimensions of linguistic representation, and of arbitrary principles3 regulating the well-formedness conditions. xdg was devised to be maximally general, and to allow the grammar writer the freedom to decide how to split up linguistic modeling into separate dimensions.

6.1 Example

As an illustrative example, we propose a five-dimensional grammar and illustrate it on the following example, an English passive construction paraphrasing the ubiquitous linguistic example "Every man loves a woman":

By every man, a woman is loved. (6)

In the following, we give a quick tour through the five dimensions of our example grammar to see what aspects of the linguistic analysis of this sentence they cover.

ID dimension The id dimension (where id stands for immediate dominance) was already introduced for tdg, and represents the linguistic aspect of grammatical function. We display the id analysis of the example below:

[ID analysis (7): edges is —subj→ woman, is —vpp→ loved, loved —pobj→ by, by —pcomp→ man, man —det→ every, woman —det→ a, over "By every man a woman is loved".]

Here, "woman" is the subject (edge label subj) of the finite verb "is". "a" is the determiner (edge label det) of "woman". "loved" is the verbal past participle (vpp) of "is". Moreover, "by" is the prepositional object (pobj) of "loved", "man" the prepositional complement (pcomp) of "by", and finally "every" the determiner of "man".

3 Currently instantiated from a library of parametric principles.

LP dimension The lp dimension (lp stands for linear precedence) represents the linguistic aspect of word order, and has also already been introduced in the preceding section for tdg. On the lp dimension, the edge labels are names for word positions4:

[LP analysis (8): "is" is the root; the edge labels topf, subjf, vppf, pcompf and detf name word positions, and the node labels (pf, detf, nounf, finf, infinf) are displayed on the dotted projection edges, over "By every man a woman is loved".]

Here, the finite verb "is" is the root. "by" is in the topicalization position (topf), "woman" in the subject position (subjf), and "loved" in the verbal past participle position (vppf). Furthermore, "man" is in the prepositional complement position of "by" (pcompf), and "every" in the determiner position of "man" (detf). Similarly, "a" is in the determiner position of "woman" (detf). On the lp dimension, we additionally annotate the nodes with node labels (displayed on the dotted projection edges connecting nodes and words). These are required for specifying the relative order of mothers and their daughters, e.g. that a determiner in the determiner position (edge label detf) of a noun must precede the noun (node label nounf).

DS dimension The ds dimension (ds stands for deep syntax) represents an intermediate structure between syntax and semantics. On this dimension, function words such as the preposition "by" or "to"-particles are not connected, since they have no impact on the semantics. Also, constructions such as control, raising and passive are already resolved on the ds dimension, to enable a more seamless transition to semantics. Below is an example ds analysis5:

[DS analysis (9): edges is —subd→ loved, loved —subjd→ man, loved —objd→ woman, man —detd→ every, woman —detd→ a; the preposition "By" is left unconnected.]

Here, “loved” is subordinated to “is” (edge label subd). “man” is the deep subject (subjd) of “loved”, and “woman” the deep object (objd). “every” is the

4 Following work on tdg, we adopt the convention to suffix lp edge labels with “f” for “field” to better distinguish them from id edge labels. 5 We adopt the convention to suffix ds edge labels with “d” for “deep” to better distinguish them from id edge labels.

68 determiner of “man”, and “a” the determiner of “woman”. Notice that in this example, the relations of deep subject and deep object do not match the relations of subject and prepositional object on the id dimension, due to the passive construction. Whereas in the id analysis, “woman” is the subject of the auxiliary “is”, and “by” the prepositional object of “loved”, the ds analysis mirrors the underlying predicate-argument structure much more closely. Here, “woman” is the deep object of “loved”, whereas “man” is its deep subject.

PA dimension The pa dimension (pa for predicate-argument structure) represents semantic predicate-argument structure, or, in terms of logic, variable binding. The idea is that quantifiers such as the determiners "every" and "a" introduce variables, which can then be bound by predicates (e.g. common nouns or verbs):

[PA analysis (10): arg edges link "man" to "every" and "woman" to "a"; arg1 and arg2 edges link "loved" to "every" and to "a", respectively.]

In the example analysis, think of the quantifier “every” as introducing variable x and “a” as introducing the variable y. Now, “man” binds the x, and “woman” the y. Finally, the x is the first argument of the predicate expressed by “loved”, and y is the second argument. I.e. the pa analysis represents a flat semantics in the sense of [13], where the semantics of a sentence is represented by a multiset of first order predicates, and scopal relationships are completely omitted:

{every(x), man(x), a(y), woman(y), love(x, y)} (11)

SC dimension The sc dimension (sc for scope structure) represents semantic scope structure. Contrary to [13], who advocates a semantic representation which completely omits scopal relationships, we follow Minimal Recursion Semantics (MRS) [3] and have both: the pa dimension represents a flat semantics, and the sc dimension the scopal relationships. Here, e.g. quantifiers such as "every" and "a" have a restriction and a scope:

[SC analysis (12): "a" has "woman" in its restriction (r) and "every" in its scope (s); "every" has "man" in its restriction and "loved" in its scope.]

69 “a” has “woman” in its restriction (edge label r), and “every” in its scope (edge label s), and “every” has “man” in its restriction, and “loved” in its scope. Notice that this is only one of the two possible scope readings of the sentence — the “strong” reading where the existential quantifier outscopes the universal quantifier. The sc analysis representing the other, “weak” reading is depicted below:

[SC analysis (13): the "weak" reading, in which the universal quantifier outscopes the existential one.]

6.2 Principles and the lexicon

XDG describes the well-formedness conditions of an analysis by the interaction of principles and the lexicon. The principles stipulate restrictions on one or more of the dimensions, and are controlled by the feature structures assigned to the nodes from the lexicon. The principles are drawn from an extensible principle library. The principle library already contains the necessary principles to model the syntax and semantics of large fragments of German and English, and smaller fragments of Arabic, Czech and Dutch. We present a representative subset of it below.

Tree principle. Dimension i must be a tree. In the example above, we use this principle on the id, lp and sc dimensions.

DAG principle. Dimension i must be a directed acyclic graph. We use this principle on the ds and pa dimensions, which need not necessarily be trees.

Valency principle. For each node on dimension i, the incoming edges must be licensed by the in specification, and the outgoing edges by the out specification. This is a key principle in xdg, and used on all dimensions. It is also lexicalized (cf. the lexical entries for tdg in the preceding section).

Order principle. For each node v on dimension i, the order of the daughters depends on their edge labels. We use this principle on the lp dimension to constrain the order of the words in a sentence. We can use it e.g. to require that determiners (“a”) precede nouns (“woman”).

Projectivity principle. Dimension i must be a projective graph. We use this principle on the lp dimension to ensure that lp trees do not have crossing edges.

70 Climbing principle. The climbing principle is two-dimensional, and allows us to constrain the relation between two dimensions. It stipulates that dimension i must be flatter than dimension j. We use it to state that the lp dimension (8) must be flatter than the id dimension (7).

Linking principle. The linking principle relates two dimensions i and j, and in particular allows us to specify how semantic arguments must be realized in the syntax. It is lexicalized, and we use it to stipulate e.g. that the first argument (arg1) of "loved" on the pa dimension (10) must be realized by the deep subject (subjd), and the second argument (arg2) by the deep object (objd) on the ds dimension (9).

Contra-dominance principle. The contra-dominance principle is also two-dimensional. We use it to constrain the relation between the two semantic dimensions pa and sc, in particular to stipulate that the semantic arguments of verbal predicates (on the pa dimension) must dominate them on the sc dimension. For instance, the first semantic argument of "loved" on the pa dimension (in (10), this is the determiner "every" of the NP "every man") must dominate (be an ancestor of) "loved" on the sc dimension, (12) and (13).

6.3 Parsing and generation

Parsing and generation with XDG grammars are done using constraint solving by the XDG solver. XDG solving has a natural reading as a constraint satisfaction problem (CSP) on finite sets of integers, where well-formed analyses correspond to the solutions of the CSP [7]. We have implemented the XDG solver with the Mozart/Oz programming system [14, 15]. XDG solving operates on all dimensions concurrently. This means that the solver can infer information about one dimension from information on any other, e.g. by the multi-dimensional principles (climbing, linking, contra-dominance). For instance, syntactic information can trigger inferences in semantics, and vice versa. Because XDG allows us to write grammars with completely free word order, XDG solving is an NP-complete problem [6]. This means that the worst-case complexity of the solver is at present exponential. However, the behaviour of the solver on NL grammars is excellent in practice. Constraint propagation is both fast and very effective, and permits enumerating solutions with few or no failures.

6.4 Underspecification

Similar to MRS and also CLLS, xdg supports the use of underspecification. An underspecified xdg analysis is a partial xdg dependency graph where not all of the edges are fully determined. We show an underspecified xdg analysis for the

71 sc dimension below:

[Underspecified SC analysis (14): the r edges from "every" to "man" and from "a" to "woman" are already determined; a dotted dominance edge records that both quantifiers dominate "loved", while their relative scope remains open.]

In this analysis, the edges from "every" to "man" (labeled r) and from "a" to "woman" (also labeled r) are already determined, i.e. we know already that "man" is in the restriction of "every", and that "woman" is in the restriction of "a". However, the scopal relationship between the two quantifiers is yet unknown. Still, the xdg constraint solver has already inferred that both dominate the verb "loved" (as indicated by the dotted "dominance edge"). This partial analysis abstracts over both fully specified analyses (12) and (13) above. Whereas in MRS and CLLS only scopal relationships can be underspecified, xdg goes one step further and allows any of its dimensions to be underspecified, i.e. not only the scopal but also the syntactic dimensions. This can be used e.g. to compactly represent PP-attachment ambiguities.

7 Conclusion

We proposed the notion of lexicalized multi-dimensional configuration problems as a metaphor and a practical constraint-based approach for a wide range of tasks in computational linguistics, including semantic assembly, surface realization and syntactic analysis, and showed how it can be used to integrate them. We then presented XDG as an instance of this approach, and showed how to use it to handle the syntax and semantics of natural language in an integrated fashion. We think that multi-dimensional lexicalized configuration problems can play an important role in the future, as an overarching framework for computational linguistics research, on the theoretical and on the algorithmic side. For instance, research on configuration problems for semantic assembly has already yielded highly efficient algorithms for satisfiability and enumeration of dominance constraints [16]. For surface realization, an efficient algorithm was presented in [6]. At the moment, we are working on making the integrated processing of syntax and semantics in XDG more efficient.

References

1. Mittal, S., Frayman, F.: Towards a generic model of configuration tasks. In: Proceedings of the International Joint Conference on Artificial Intelligence, Morgan Kaufmann (1989) 1395–1401
2. Bos, J.: Predicate logic unplugged. In: Proceedings of the 10th Amsterdam Colloquium. (1996) 133–143

3. Copestake, A., Flickinger, D., Pollard, C., Sag, I.: Minimal recursion semantics. An introduction. Journal of Language and Computation (2004) To appear.
4. Egg, M., Koller, A., Niehren, J.: The constraint language for lambda structures. Journal of Logic, Language, and Information (2001)
5. Abeillé, A., Rambow, O.: Tree Adjoining Grammar: An Overview. In: Tree Adjoining Grammars: Formalisms, Linguistic Analyses and Processing. CSLI Publications (2000)
6. Koller, A., Striegnitz, K.: Generation as dependency parsing. In: Proceedings of ACL 2002, Philadelphia/USA (2002)
7. Duchier, D.: Configuration of labeled trees under lexicalized constraints and principles. Research on Language and Computation 1 (2003) 307–336
8. Duchier, D., Debusmann, R.: Topological dependency trees: A constraint-based account of linear precedence. In: Proceedings of ACL 2001, Toulouse/FRA (2001)
9. Bresnan, J., Kaplan, R.: Lexical-functional grammar: A formal system for grammatical representation. In Bresnan, J., ed.: The Mental Representation of Grammatical Relations. The MIT Press, Cambridge/USA (1982) 173–281
10. Mel’čuk, I.: Dependency Syntax: Theory and Practice. State Univ. Press of New York, Albany/USA (1988)
11. Pollard, C., Sag, I.A.: Head-Driven Phrase Structure Grammar. University of Chicago Press, Chicago/USA (1994)
12. Debusmann, R., Duchier, D., Koller, A., Kuhlmann, M., Smolka, G., Thater, S.: A relational syntax-semantics interface based on dependency grammar (2004)
13. Trujillo, I.A.: Lexicalist Machine Translation of Spatial Prepositions. PhD thesis, University of Cambridge, Cambridge/UK (1995)
14. Smolka, G.: The Oz Programming Model. In van Leeuwen, J., ed.: Computer Science Today. Lecture Notes in Computer Science, vol. 1000. Springer-Verlag, Berlin (1995) 324–343
15. Mozart Consortium: The Mozart-Oz website (2004) http://www.mozart-oz.org/.
16. Fuchss, R., Koller, A., Niehren, J., Thater, S.: Minimal recursion semantics as dominance constraints: Translation, evaluation, and analysis. In: Proceedings of ACL 2004, Barcelona/ESP (2004)

An intuitive tool for constraint based grammars

Mathieu Estratat and Laurent Henocque

Université d’Aix-Marseille III, Laboratoire des Sciences de l’Information et des Systèmes, Avenue Escadrille Normandie-Niemen, 13397 Marseille cedex 20, France {mathieu.estratat, laurent.henocque}@lsis.org

Abstract. Many recent linguistic theories are feature-based and rely heavily upon the concept of constraint. Several authors have pointed out the similarity between the representation of features in feature-based theories and the notions of objects or frames. Object-oriented configuration allows us to deal with these modern grammars. We propose here a systematic translation of the concepts and constraints introduced by two linguistic formalisms, the widely used HPSG and the recent property grammars, into configuration problems representing specific target languages. We assess the usefulness of these translations by studying first a part of the grammar for English proposed by the HPSG authors, then a natural language subset with lexical ambiguities, using property grammars.

1 Introduction

Many recent linguistic theories are feature-based and rely heavily upon the concept of constraint. Several authors have pointed out the similarity between the representation of features in feature-based theories and the notions of objects or frames. Object-oriented configuration [8] allows us to deal with these modern grammars. A configuration task consists in building (a simulation of) a complex product from components picked from a catalog of types. Neither the number nor the actual types of the required components are known beforehand. Components are subject to relations (this information is called “partonomic”), and their types are subject to inheritance (this is the taxonomic information). Constraints (also called well-formedness rules) generically define all the valid products. A configurator expects as input a fragment of a target object structure, and expands it to a solution of the problem constraints, if any. This problem is undecidable in the general case. Such a program is well described using an object model (as illustrated by figures 8 and 10), together with well-formedness rules. Technically, solving the associated enumeration problem can be done using various formalisms or technical approaches: extensions of the CSP paradigm [9, 5], knowledge-based approaches [14], terminological logics [10], logic programming (using forward or backward chaining, and non-standard semantics) [13], and object-oriented approaches [8, 14]. Our experiments were conducted using the object-oriented configurator Ilog JConfigurator [8].

More precisely, a configurator is a constraint-based solver. Technically, its constituents are a catalog of types of components and a set of constraints over these components. An object-oriented configurator like Ilog JConfigurator represents its catalog of types through an object model, containing classes, attributes and relations between classes. These relations can be inheritance, composition or aggregation relations. The configurator's user has to build a generic (object) model of the knowledge field he wants to implement. This model alone is usually not sufficient to represent all the relations between elements. For instance, to represent a PC, we can state the model described in figure 1. This model represents one (or more) PCs and their components. We can see that a PC must have one and only one MotherBoard1, one and only one Supply, and one and only one Monitor, but it can have one to four Disks2. The MotherBoard can have one or two Processors and one to four Memories. Any PC can be represented by an instance of this model. Some attributes need

[Figure: UML object model for a PC. Classes: ShoppingItem (price: int); Device (powerUsed: int), with subclasses Disk (capacity: int), Memory (capacity: int), Monitor (size: int) and Processor (speed: int); MotherBoard (totalPower: int, totalPrice: int), linked to 1..2 Processors and 1..4 Memories; Supply (power: int); PC (totalPrice: int), linked to 1..4 Disks.]

Fig. 1. A generic object model for a PC configuration

1 Simple arrows with no label represent a relation of cardinality one.
2 Simple arrows with label X, Y represent a relation of cardinality X to Y.

some other constraints to be instantiated correctly. For example, the attribute totalPrice of an instance of the class PC is calculated with the constraint:

PC.totalPrice = SUM(MotherBoard.totalPrice, Supply.price, Monitor.price, Disk 1.price, Disk 2.price, Disk 3.price, Disk 4.price)3

JConfigurator is implemented in Java. The user draws the object model with a built-in graphical tool and states constraints either with another built-in tool or via a Java program. We want to show that a configurator can deal with other constraint-based grammars. Hence, we propose a systematic translation of the concepts and constraints introduced by two linguistic formalisms, the widely used HPSG [11] and the recent property grammars [1], into configuration problems representing specific target languages. We assess the usefulness of these translations by studying first a part of the grammar for English proposed by the HPSG authors, then a natural language subset with lexical ambiguities, using property grammars. We have already presented the translation of property grammars into configuration problems in [3, 4]. Section 2 describes a mapping from feature structures to an object model. Section 3 presents the translation of a grammar based on HPSG into a configuration problem. Section 4 presents a mapping from a property grammar to a configuration problem. Section 5 shows an application of the previous translation to a subset of the French grammar proposed in [1], with ambiguous words. Section 6 concludes and presents ongoing and future research.
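As a concrete illustration of the PC example above, here is a minimal sketch in plain Python (not the JConfigurator Java API; the component prices are invented). It enumerates the candidate configurations allowed by the cardinalities and shows the totalPrice constraint holding on each, with absent disks contributing the default value 0 as in footnote 3.

PRICES = {"MotherBoard": 120, "Supply": 40, "Monitor": 150, "Disk": 60}

def pc_configurations():
    """Exactly one MotherBoard, Supply and Monitor; one to four Disks."""
    for n_disks in range(1, 5):
        disks = [PRICES["Disk"]] * n_disks + [0] * (4 - n_disks)  # absent disks cost 0
        total = (PRICES["MotherBoard"] + PRICES["Supply"]
                 + PRICES["Monitor"] + sum(disks))
        yield {"disks": n_disks, "totalPrice": total}

for pc in pc_configurations():
    print(pc)   # every enumerated candidate satisfies the SUM constraint

A real configurator would propagate such constraints while expanding a partial instance, rather than enumerating complete candidates as this sketch does.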

2 Feature structures as an object model

Many recent theories are based on feature structures to represent constituents of the grammar and information about syntax, semantics, phonology, and other levels. A feature structure is a set of (attribute, value) pairs used to label a linguistic unit, as illustrated in figure 2(2), where son is a masculine singular noun, 3rd person. This definition is recursive: a feature value can be another feature structure, or a set of features. Feature structures are used in several linguistic theories, for example GPSG [6], HPSG [11], property grammars [1] and dependency grammars [2]. Several authors have pointed out the similarity between the representation of features in feature-based theories and the notions of objects or frames. We will see how such structures can be represented with the object-oriented paradigm via the Unified Modeling Language (UML) [7]. Functionally, a feature can be mapped to a CSP variable, and a feature structure can be seen as a binding of values to an aggregate of feature variables. A feature value can be a constant from a specific domain (for instance an enumeration such as {Singular, Plural}, or an integer such as {1(st), 2(nd), 3(rd)}). A feature value can also be a (set of, list of) feature structure(s) (as the Agreement feature in figure 2).

3 The usual dotted notation is used to access attributes of a class, and also classes linked to another class. Disk 1 to Disk 4 represent the possible instances of the class Disk in relation with the class PC. Each of them may be present or not (only one is mandatory); if a disk is not present, the default value 0 is used.

(1) General feature structure for a noun:
[ Cat: N
  Phon: list(String)
  Agreement: [ Gen: {masc, fem, neutral}, Num: {sing, plur}, Per: {1st, 2nd, 3rd} ]
  Case: {common, proper} ]

(2) Instance of (1) for the noun son:
[ Cat: N
  Phon: son
  Agreement: [ Gen: masc, Num: sing, Per: 3rd ]
  Case: common ]

Fig. 2. General feature structure for a noun and one of its instances (son as example)

Hence standard finite-domain CSP variables cannot be used to model features, and a notion of relations, or set variables, must be used (as in [8, 14, 2]). It is worth pointing out that feature structures are available as a language construct in Oz [12] and support feature constraints. Feature structures are aggregates well modeled using classes in an object model. A feature structure, following the studied formalism, naturally maps to a class in an object model, inserted in a class hierarchy involving inheritance (possibly multiple [11]). For instance, the category in figure 2 is translated into a class in an object model, as illustrated by figure 3. In this translation, the feature structure is represented by the class N. This class inherits from the class TerminalCat, representing all terminal categories. So N inherits the attribute phon (representing the lexical entry) from TerminalCat, along with the relation to the class Agreement (representing the feature agreement in figure 2; each of its attributes represents one of the features of the composed feature agreement). For the noun son, the instance of the model represents this feature structure exactly. We have seen that feature structures may be represented with an object model (which itself can be represented using a UML diagram). This UML diagram maps straightforwardly to the object model used in JConfigurator.
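The mapping of figure 3 can be sketched in a few lines of Python, with dataclasses standing in for the UML/JConfigurator object model (the class and attribute names follow the figure; this is our own illustration, not the paper's implementation):

from dataclasses import dataclass

@dataclass
class Agreement:
    gen: str
    num: str
    pers: int

@dataclass
class TerminalCat:
    phon: str            # the lexical entry

@dataclass
class N(TerminalCat):    # N inherits phon from TerminalCat
    case: str
    agreement: Agreement

# The instance corresponding to figure 2(2), the noun "son":
son = N(phon="son", case="common", agreement=Agreement("masc", "sing", 3))
print(son)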

3 The HPSG structure and principles as a configuration problem

Feature structures are named signs in HPSG. These signs map to all syntactic units, such as words or sentences. The principles state the available values for some attributes of coexistent signs.

(1) Generic object model representing the feature structure of figure 2: the class N (attribute case: String) inherits from TerminalCat (attribute phon: String) and is linked to the class Agreement (attributes gen: String, num: String, pers: int).

(2) Instance of (1): N_X (phon: son, case: common) linked to Agreement_X (gen: masc, num: sing, pers: 3rd).

Fig. 3. A generic object model for the category N and an instance of it

3.1 The signs

Signs are implemented as classes in the object model, where composition and inheritance relations are explicit. Some signs are not specific to linguistics, such as list and set (see figure 4 for the sort hierarchy of these feature structures). Lists are

Partition of list(σ): nonempty-list(σ) (nelist(σ)), empty-list (elist or ⟨⟩)
Partition of set(σ): nonempty-set(σ) (neset(σ)), empty-set (eset or {})

Fig. 4. Part of the sort hierarchy presented in [11], used to represent list and set relations

translated using a specific class named list, which contains an attribute FIRST (of type object) and is linked to itself by the relation REST of cardinality [0,1]. This cardinality means that an element of the list can have at most one following element. Figure 5 presents the object model associated with this idea. The order of the elements is preserved. It is necessary to state a constraint specifying that all elements of a same list are of the same type, so we state the following constraint:

∀l ∈ list, l.REST.FIRST.sort = l.FIRST.sort

The empty list is represented using the value 0 for the cardinality of the relation. Sets are translated by specifying in the object model that the relation is of sort set. This relation is presented in figure 9, where the star over the relation specifies that it is a set relation. The feature QSTORE is used in the grammar for

[Figure: the class sign is linked by the relation PHON to the class list (attribute FIRST: object), which is linked to itself by the relation REST of cardinality 0..1.]

Fig. 5. Translation of list

English, as we will see below. An empty set is represented using the value 0 for the cardinality of the relation. Figure 6 presents some feature structure declarations of the grammar for English proposed in [11], while figure 7 presents part of its sort hierarchy. The information contained in these two figures allows us to define the object model presented in figure 8. Note that these elements are subparts of the grammar, limited in space for readability; the grammar is too extensive to be represented in full here. Nevertheless, these elements permit us to explain how to translate elements of the HPSG theory into an object model. The sort hierarchy is translated with the inheritance relations of the object model. For example, we can see in figure 7 that sign is partitioned into two sorts, phrase and word; accordingly, in figure 8 an inheritance relation exists between the classes word and phrase on one hand and sign on the other hand. The translation of the feature PHON of the feature structure sign is shown in figure 5: the class sign is linked to the class list by the relation PHON. This relation allows an instance of the class sign to be linked to the first element of the list, and to access the following elements of this list easily through the relation REST of the class list. In order to specify the type of the elements in this list, we have to state the following constraint: ∀s ∈ sign, s.PHON.FIRST.sort = string. For each feature whose value is of list sort, a similar constraint, setting the sort of the list elements, has to be stated. The translation of the feature SYNSEM of the feature structure sign is implemented directly as a relation in the object model (figure 6). This relation links the classes sign and synsem. The QSTORE feature has a set of quantifiers as value. As seen previously, this relation of sort set is directly translated into a relation in the model, without using a dedicated feature structure as in [11]. In the HPSG theory we also need a set of constraints, named principles, specifying the allowed arrangements of the feature values. We now present some of these principles and their translations.
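As a small illustration of the list translation (our own Python rendering, not the JConfigurator relations themselves; the class name Cons is ours), the following sketch models the FIRST/REST encoding and the same-sort constraint stated above:

class Cons:
    def __init__(self, first, rest=None):
        self.FIRST, self.REST = first, rest   # REST = None encodes cardinality 0

def same_sort(l):
    """∀l: l.REST.FIRST.sort = l.FIRST.sort, checked along the whole list."""
    while l.REST is not None:
        if type(l.REST.FIRST) is not type(l.FIRST):
            return False
        l = l.REST
    return True

phon = Cons("the", Cons("door"))   # a two-element PHON list of strings
print(same_sort(phon))             # True: both elements have the same sort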

3.2 Principles

A headed phrase, following the authors' notation, is a phrase whose DAUGHTERS value is of sort headed-struc. As a start, we present a translation of the

sign: [ PHONOLOGY list(phonstring), SYNSEM synsem, QSTORE set(quantifier), RETRIEVED list(quantifier) ]
category: [ HEAD head, SUBCAT list(synsem), MARKING marking ]
synsem: [ LOCAL local, NONLOCAL nonlocal ]
phrase: [ DAUGHTERS con-struc ]
local: [ CATEGORY category, CONTENT content, CONTEXT context ]

Fig. 6. Some feature structures presented in [11]

Partition of sign: word, phrase
Partition of con-struc: headed-structure (head-struc), coordinate-structure (coord-struc)

Fig. 7. Part of the sort hierarchy presented in [11]

Head Feature Principle. The Head Feature Principle4: In a headed phrase (a phrase whose DAUGHTERS value is of sort headed-structure), the values of SYNSEM|LOCAL|CATEGORY|HEAD and DAUGHTERS|HEAD-DTR|SYNSEM|LOCAL|CATEGORY|HEAD are token-identical. Along with the partial object model presented in figure 8, we have to state the following constraint, using the usual dotted notation:

∀p ∈ phrase, p.SYNSEM.LOCAL.CATEGORY.HEAD = p.DAUGHTERS.HEAD-DTR.SYNSEM.LOCAL.CATEGORY.HEAD

This constraint specifies that if two feature values are equal, then only one instance of the class is generated: if the two values are equal, we only need to generate one instance of the class and link it to the relevant instances. Some principles force a feature value to be the union of some other feature values (not necessarily, and generally not, of the same feature structure), for example the Subcategorization Principle5. These principles are translated using a union constraint, implemented to build a set from other sets. As we have seen, the configuration paradigm appears to be genuinely useful for implementing a rich linguistic theory such as HPSG. We now show how to translate a more recent theory, property grammars [1], which also relies heavily upon constraints.

4 Stated in [11], p. 34.
5 Stated in [11], p. 34: In a headed phrase, the list value of DAUGHTERS|HEAD-DAUGHTER|SYNSEM|LOCAL|CATEGORY|SUBCAT is the concatenation of the list value of SYNSEM|LOCAL|CATEGORY|SUBCAT with the list consisting of the SYNSEM values (in order) of the elements of the list value of DAUGHTERS|COMPLEMENT-DAUGHTERS.
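Token identity, as opposed to mere equality of copies, is the crux of this translation. A minimal Python sketch (with a generic Node class of our own invention, not HPSG or JConfigurator code) shows what the configurator guarantees by generating a single instance and linking it twice:

class Node:
    def __init__(self, **features):
        self.__dict__.update(features)

head = Node(cat="verb")                                   # the single shared head
phrase = Node(
    SYNSEM=Node(LOCAL=Node(CATEGORY=Node(HEAD=head))),
    DAUGHTERS=Node(HEAD_DTR=Node(
        SYNSEM=Node(LOCAL=Node(CATEGORY=Node(HEAD=head))))))

# `is` checks object identity, the analogue of token identity:
assert (phrase.SYNSEM.LOCAL.CATEGORY.HEAD
        is phrase.DAUGHTERS.HEAD_DTR.SYNSEM.LOCAL.CATEGORY.HEAD)
print("token-identical heads")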

[Figure: partial object model. The class sign has subclasses word and phrase; sign is linked to synsem by the relation SYNSEM; synsem is linked to local by LOCAL; local is linked to category by CATEGORY; category is linked to head by HEAD; phrase is linked to con-struc by DAUGHTERS; con-struc has subclasses head-struc and coord-struc; head-struc is linked to sign by HEAD-DTR.]

Fig. 8. A partial object model for an English grammar

[Figure: the class sign is linked to the class quantifier by a relation marked with a star, indicating a set relation.]

Fig. 9. Translation in the object model of a feature value of sort set

4 Property grammars

4.1 Another feature-based theory

Property grammars, as their author says, rely heavily upon constraints. The syntactic units are represented with feature structures named categories. These categories are translated into an object model. For each phrase, like NP or VP, some well-formedness rules are stated. These rules, or constraints, are called properties and are translated either directly into the object model (using relation cardinalities) or through constraints added to the model, as was done for the HPSG paradigm in section 3.2.

4.2 Object model constraints for properties

Properties define both object model relations and constraints, adjoined to the object model built from a given property grammar. We use uppercase symbols to denote categories (e.g. S, A, B, C, ...). We also use the following notations: when an anonymous relation exists between two categories S and A, we denote as s.A the set of As linked to a given S instance s, and as |s.A| their number. For simplicity, and wherever possible, we will use the notation ∀S F(S) (where F is a formula involving the symbol S) rather than ∀s ∈ S F(s). Class attributes are denoted using the standard dotted notation (e.g. a.begin represents the begin attribute of object a). I_A denotes the set of all available indexes for the category A.

– Constituents:

Const(S) = {A_m}_{m∈I_A} specifies that an S may only contain elements from {A_m}. This property is described by using relations between S and all the A_m, as shown in the object model presented in figure 10.

– Heads:

The Heads(S) = {A_m}_{m∈I_A} property lists the possible heads of the category S. The head element is unique and mandatory. For example, Heads(NP) = {N, Adj}: the word “door” is the head of the NP “the door”. The Head relation is a subset of Const. Such properties are implemented using relations, as for constituency, plus adequate cardinality constraints.

– Unicity:

The property Unic(S) = {A_m}_{m∈I_A} specifies that an instance of the category S can have at most one instance of each A_m, m ∈ I_A, as a constituent. Unicity can be accounted for using cardinality constraints such as ∀S |{x : S.Const | x ∈ A_m}| ≤ 1, which for simplicity we shall note |S.A_m| ≤ 1 in the sequel. For instance, in an NP, the determiner Det is unique.

– Requirement:

{A_m}_{m∈I_A} ⇒_S {{B_n}_{n∈I_B}, {C_o}_{o∈I_C}} means that any occurrence of all the A_m implies that all the categories of either {B_n} or {C_o} are represented as constituents. As an example, in a noun phrase, if a common noun is present, then so must be a determiner (“door” does not form a valid noun phrase, whereas “the door” does). This property maps to the constraint

∀S, (∀m ∈ I_A, |S.A_m| ≥ 1) ⇒ ((∀n ∈ I_B, |S.B_n| ≥ 1) ∨ (∀o ∈ I_C, |S.C_o| ≥ 1))

– Exclusion:

The exclusion property between {A_m}_{m∈I_A} and {B_n}_{n∈I_B} declares that the two category groups mutually exclude each other, which can be implemented by the constraint:

∀S, ((∀m ∈ I_A, |S.A_m| ≥ 1) ⇒ (∀n ∈ I_B, |S.B_n| = 0)) ∧ ((∀n ∈ I_B, |S.B_n| ≥ 1) ⇒ (∀m ∈ I_A, |S.A_m| = 0))

For example, an N and a Pro cannot cooccur in an NP. (Note that in the formulation of these constraints, ⇒ denotes logical implication, and not the

requirement property.)

– Linearity:

The property {A_m}_{m∈I_A} ≺_S {B_n}_{n∈I_B} specifies that any occurrence of an A_m precedes any occurrence of a B_n. For example, in an NP a Det must precede an N (if present). Implementing this property requires inserting into the representation of categories in the object model two integer attributes begin and end, which respectively denote the position of the first and last word in the category. This property is translated as the constraint:

∀S, ∀m ∈ I_A, ∀n ∈ I_B, max({i ∈ S.A_m • i.end}) ≤ min({i ∈ S.B_n • i.begin})

– Dependency: This property states specific relations between distant categories, in relation with text semantics (so as to denote, for instance, the link existing between a pronoun and its referent in a previous sentence). For instance, in a verb phrase, there is a dependency between the subject noun phrase and the verb. This property is adequately modeled using a relation.

Properties can therefore be translated as independent constraints. It is however often possible to factor several properties within a single modeling construct, most often a relation and its multiplicity. For instance, constituency and unicity can be grouped together in some models where one relation is used for each possible constituent (we made this choice in the forthcoming example, in figure 10). A small sketch of these properties as executable checks follows.
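The miniature checker below (our own Python illustration, not the JConfigurator translation) renders the unicity, exclusion, requirement and linearity properties of an NP as tests over (category, position) pairs:

def np_ok(tokens):
    """tokens: list of (category, position) pairs for one candidate NP."""
    cats = [c for c, _ in tokens]
    if cats.count("Det") > 1:                      # Unicity of the determiner
        return False
    if "N" in cats and "Pro" in cats:              # Exclusion: N and Pro
        return False
    if "N" in cats and "Det" not in cats:          # Requirement: N needs Det
        return False
    det_pos = [p for c, p in tokens if c == "Det"]
    n_pos = [p for c, p in tokens if c == "N"]
    if det_pos and n_pos and max(det_pos) > min(n_pos):  # Linearity: Det < N
        return False
    return True

print(np_ok([("Det", 0), ("N", 1)]))   # True:  "the door"
print(np_ok([("N", 0)]))               # False: requirement violated ("door")
print(np_ok([("N", 0), ("Pro", 1)]))   # False: exclusion violated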

5 Parsing a lexically ambiguous natural language

Figure 10 represents a fragment of the constrained object model for a subset of French, where constituency and unicity are made explicit. Figure 11 illustrates some well-formedness constraints, and figure 12 defines a small example lexicon. The language accepted by this constrained object model is made of phrases constructed around a subject, a verb and a complement, where both the subject and the verb are mandatory, and both the subject and the complement are noun phrases. Both constraints are stated straightforwardly within the object model via relations and their cardinalities (as can be seen in figure 11). However, more constraints are required, like the constraints stating that a Head is a constituent, or the constraints ruling the values of the begin and end attributes in syntagms. The program runs as follows: its root object (the object from which the search is launched) is stated to be an instance of Sentence. From this root, the solver tries to complete the relations and attributes of each component (in accordance

[Figure: UML object model with classes Sentence (relations firstWord, syntax, wordList 1..*), Word (position: int; relations theWord, next), Cat with subclasses TerminalCat (Pro, Det, N, Adj, V) and Phrase (begin: int, end: int; subclasses NP, VP), Aggreement (gen: string, numb: string, person: int; relation agreement 0..1), NP with 0..1 relations thePro, theDet, theN, theAdj and head relations HeadN, HeadAdj, and VP with relations theV, subject and complement.]

Fig. 10. Object model used to parse our language

with the adjoined constraints). Moreover, for each word it generates the instances of terminalCat associated with this word. For instance, for the French determiner/pronoun “la”, the program will create a Word object named La, and two terminalCat instances, LaDet and LaPro. Each terminalCat is linked to a Word; for example, LaDet will be linked to the word La. A sketch of this expansion step follows.
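The expansion step can be sketched as follows (illustrative Python; the real program creates JConfigurator objects rather than strings, and the lexicon data is taken from figure 12):

LEXICON = {"la": ["Det", "Pro"], "porte": ["N", "V"],
           "ferme": ["N", "Adj", "V"], "mal": ["N", "Adj"]}

def expand(sentence):
    """One candidate terminalCat per lexicon entry of each word."""
    return {w: [w.capitalize() + cat for cat in LEXICON[w]] for w in sentence}

print(expand(["la", "porte", "ferme", "mal"]))
# {'la': ['LaDet', 'LaPro'], 'porte': ['PorteN', 'PorteV'], ...}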

5.1 Experimental results

We tested the object model with sentences involving variable levels of lexical ambiguity, drawn from the lexicon listed in figure 12. Sentence (1), “la porte ferme mal” (the door doesn't close well), is fully ambiguous. In this example, “la” can be a pronoun (i.e. it as in “give it to me!”) or

Head: |NP.headN| + |NP.headAdj| = 1;
Linearity: Det < N; Det < Adj;
Exclusion: (|NP.N| >= 1) ⇒ (|NP.Pro| = 0) and (|NP.Pro| >= 1) ⇒ (|NP.N| = 0);
Requirement: (|NP.N| = 1) ⇒ (|NP.det| = 1)

Fig. 11. Some NP constraints

WORD   CAT  GEN   NUM   PERS
ferme  N    fem   sing  3
ferme  Adj  -     sing  -
ferme  V    -     sing  1,3
la     Det  fem   sing  3
la     Pro  fem   sing  3
mal    N    masc  sing  3
mal    Adj  -     -     -
porte  N    fem   sing  3
porte  V    -     sing  1,3

Fig. 12. A lexicon fragment

a determiner (like the in “the door”); “porte” can be a verb (i.e. to carry) or a noun (door); “ferme” can be a verb (to close), a noun (farm) or an adjective (firm); and “mal” can be an adjective (badly) or a noun (pain). Our program produces a labeling for each word and the corresponding syntax tree (figure 13).

[Figure: syntax tree. The root SV spans “la porte ferme mal”, with subject NP (Det “la”, N “porte”), verb V “ferme”, and complement NP (Adj “mal”).]

Fig. 13. Syntax tree for the French sentence “la porte ferme mal”

Sentence (2) is “la porte bleue possède trois vitres jaunes” (the blue door has three yellow windows). Here “bleue” and “jaunes” are adjectives, “vitres” is a noun and “trois” is a determiner. The last sentence (3), “le moniteur est bleu” (the monitor is blue), involves no ambiguous word. Table 1 presents the results obtained for sentences (1), (2) and (3). These results show that the

Table 1. Experimental results for French phrases

p    #fails  #cp  #csts  #vars  #secs
(1)  4       40   399    220    0.55 s
(2)  3       50   442    238    0.56 s
(3)  1       35   357    194    0.52 s

correct labeling is obtained after very few backtracks (fails). The number of fails depends on the number of ambiguous words in the sentence; the execution time also depends on the sentence length. Explanation of these three samples:

– (1): The solver first associates the wrong verb with the sentence: it states porte as being the verb. Hence it cannot create a complement to this verb, because “ferme mal” is not an NP, and the program backtracks on the instantiation SV.comp = SN1. As no other instantiation is available, it backtracks on the instantiation porteV.theWord = porte and also on the instantiation SV.theV = porteV. At this point, the verb is stated to be fermeV, and the program goes on and instantiates all possible variables. The solver's heuristic is to try the smallest value first. As a consequence it first states the value 0 for the cardinality of the relation SN.noyauAdj, but this leads to a fail, so it backtracks again and changes the value to 1. The remaining variables are instantiated with no further fail.
– (2): The three fails, as for (1), are due to the instantiation of SV.theV to porteV. As in (1), three backtracks are needed.
– (3): The single fail is due to the solver's value-minimization heuristic: it first tries to set the cardinality of the relation SN.noyauAdj to 0.

We have not yet implemented an analyser efficient enough to compare our results with those of other approaches; we have only shown a translation of linguistic theories into a configuration problem. This work is the first part of a larger project to deal with syntax and semantics in the same object model.

6 Conclusion

We showed that the data structures and inner mechanisms (the principles) of an established linguistic theory such as HPSG can be translated into a configuration problem and solved with a general solver. We also showed that another linguistic theory can be translated into a configuration problem, and we presented (an embryo of) a parser that can disambiguate sentences. Other grammars could be translated into configuration problems as well; for example, dependency grammar [2] is clearly described by its author as a configuration problem. Our intuition leads us to think that where there are feature structures there is an object model, and where there are constraints on these feature structures, there is a configuration problem. A configurator appears to be an intuitive tool for linguistic theories based on constraints. The first step of our work was to represent grammar concepts in terms of a configuration problem. The second will be to upgrade this simple parser into a more general one, and in this way use configurators to deal with syntax and semantics in the same object model, as configurators are already used to deal with the semantics of some knowledge fields: they represent a world which can be described with a description language (PCs can be represented as a configuration problem, and we can describe PCs with a description language). Ongoing research involves the implementation of a parser for a natural language subset of French dealing with the semantics of three-dimensional scene descriptions.

References

1. P. Blache, Les Grammaires de Propriétés : des contraintes pour le traitement automatique des langues naturelles, Hermès Sciences, 2001.
2. Denys Duchier, ‘Axiomatizing dependency parsing using set constraints’, in Sixth Meeting on Mathematics of Language, Orlando, Florida, pp. 115–126, (1999).
3. Mathieu Estratat and Laurent Henocque, ‘Application des programmes de contraintes orientés objet à l'analyse du langage naturel’, Traitement Automatique du Langage Naturel, TALN 2004, 163–172, (2004).
4. Mathieu Estratat and Laurent Henocque, ‘Parsing languages with a configurator’, European Conference on Artificial Intelligence, ECAI 2004, (2004). To appear.
5. G. Fleischanderl, G. Friedrich, A. Haselböck, H. Schreiner, and M. Stumptner, ‘Configuring large-scale systems with generative constraint satisfaction’, IEEE Intelligent Systems - Special issue on Configuration, 13(7), (1998).
6. G. Gazdar, E. Klein, G.K. Pullum, and I.A. Sag, Generalized Phrase Structure Grammar, Blackwell, Oxford, 1985.
7. Object Management Group, ‘UML v. 1.5 specification’, OMG, (2003).
8. D. Mailharro, ‘A classification and constraint based framework for configuration’, AI-EDAM: Special issue on Configuration, 12(4), 383–397, (1998).
9. Sanjay Mittal and Brian Falkenhainer, ‘Dynamic constraint satisfaction problems’, in Proceedings of AAAI-90, pp. 25–32, Boston, MA, (1990).
10. B. Nebel, ‘Reasoning and revision in hybrid representation systems’, Lecture Notes in Artificial Intelligence, 422, (1990).
11. C. Pollard and I.A. Sag, Head-Driven Phrase Structure Grammar, The University of Chicago Press, Chicago, 1994.
12. Gert Smolka and Ralf Treinen, ‘Records for logic programming’, The Journal of Logic Programming, 18(3), 229–258, (April 1994).
13. Timo Soininen, Ilkka Niemelä, Juha Tiihonen, and Reijo Sulonen, ‘Representing configuration knowledge with weight constraint rules’, in Proceedings of the AAAI Spring Symp. on Answer Set Programming: Towards Efficient and Scalable Knowledge, pp. 195–201, (March 2001).
14. Markus Stumptner, ‘An overview of knowledge-based configuration’, AI Communications, 10(2), 111–125, (June 1997).

A broad-coverage parser for German based on defeasible constraints

Kilian A. Foth, Michael Daum, and Wolfgang Menzel

Natural Language Systems Group, University of Hamburg

Abstract. We present a parser for German that achieves a competitive accuracy on unrestricted input while maintaining a coverage of 100%. By writing well-formedness rules as declarative, defeasible constraints that integrate different sources of linguistic knowledge, very high robustness is achieved against all sorts of language error.

1 Introduction

Although most linguists agree that natural language to a large degree follows well-defined rules, it has proven exceedingly difficult to actually define these rules well enough that a computer could reliably assign the intuitively correct structure to natural language input. Therefore, almost all current approaches to parsing constitute compromises of some kind:

1. Often the goal of full syntactic analysis is abandoned in favour of various kinds of shallow parsing, which is all that is needed for many applications. Instead of a full syntax tree, e.g., only the boundaries of major constituents [Schmid and Schulte im Walde2000] or topological fields [Becker and Frank2002] are computed.
2. Instead of casting the language description into linguistically motivated rules, a probability model is induced automatically from a large amount of past utterances, which is then used to maximize the similarity to previously seen structures [Collins1999]. This approach is robust and efficient, but relies on a large amount of existing data, and the resulting model is difficult to comprehend and extend.
3. Formalisms that do perform deep analysis by following explicit rules typically have to be restricted to particular subsets of linguistic constructions or to particular domains. Also, their robustness and coverage (the performance for ungrammatical and extragrammatical input, respectively) are often rather low on realistic data.

We present a parsing system that tries to avoid all three compromises to the extent possible at the current time.

Instead of a derivation grammar, we employ a declarative formalism in which well-formedness conditions are expressed as explicit constraints (which take the form of logical formulas) on word-to-word dependency structures. Since there is no limit to the complexity of these formulas, every conceivable well-formedness condition can be expressed as a constraint. Although the formal power of this approach would theoretically make the parsing problem intractable, we have found that approximative solution methods yield good results in practice. All grammar rules are ordered by a preference measure that distinguishes e.g. less important rules of style from more important fundamental grammar rules. This not only ensures robust behaviour against all kinds of deviant input, but also allows us to integrate the information from external shallow parsers, such as a part-of-speech tagger, without becoming dependent on their results. The output of the system consists of labelled dependency trees rather than constituent-based structures. Although this is mainly a consequence of casting the parsing problem in the form of a constraint optimization problem, it also has benefits for the processing of languages like German with a relatively free word order.

2 The WCDG parsing system

[Figure: labelled dependency tree with edges S, AUX, OBJD, OBJA, REL, SUBJ, DET, ADV and a REF link for the sentence “man hat uns ein Angebot gemacht, das wir nicht ablehnen konnten.” (they have us a offer made which we not refuse could)]

Fig. 1. Dependency analysis of a German sentence.

Weighted constraint dependency grammar (WCDG) [Schröder2002] is an extension of the CDG formalism first described by [Maruyama1990].

It describes the structure of natural language as a set of labelled subordinations: each word is either subordinated to exactly one other word or considered the root of the syntax tree (also called a NIL subordination). See Figure 1 for an example. Each subordination is annotated with one of a fixed set of labels such as ‘subject’, ‘direct object’ or ‘indirect object’, but no larger groups of words carry labels. This means that there is no direct equivalent to the constituents postulated by phrase structure grammar. Since there are no constituents, there are also no generative rules along the lines of ‘S → NP VP’; these are often problematic for languages with free or semi-free word order, since they mingle dominance principles with linear precedence rules. Instead, the grammar rules take the form of declarative constraints about permissible subordinations. These constraints can reference the position, reading and lexical features of the concerned word forms, as well as features of neighbouring dependency edges. Every subordination that is not forbidden by any constraint is considered valid. The goal of the parser is to select a set of subordinations that satisfies all constraints of the grammar.1 As an example of a constraint, consider the rule that normal nouns of German require a determiner of some kind, either an article or a nominal modifier, unless they are mass nouns. This rule can be formulated as a constraint as follows:2

{X:SYN} : 'missing determiner' : 0.2 :
  X@cat = NN -> exists(X@mass_noun)
              | has(X@id, DET)
              | has(X@id, GMOD);

It states that for each subordination on the syntax level (SYN), a word with the category ‘normal noun’ (NN) must either bear the feature ‘mass noun’ or be modified by a determiner (label DET) or a genitive modifier (label GMOD). Each constraint bears a score between 0.0 and 1.0 that indicates its importance relative to other constraints. The acceptability of an analysis is defined as the product of the scores of all instances of constraints

1 Categorial and morphological ambiguity must also be resolved for each word in a sentence. While often glossed over, this task is quite difficult in languages with a rich morphology; the average German adjective form has 10 morphosyntactic variants that are relevant for agreement.
2 This constraint is considerably simplified for exposition purposes, since the rule actually has many systematic exceptions.

that it violates. This means that constraints with a score of 0.0 must be satisfied if at all possible, since violating them would yield the worst possible score.3 Note that the score of the determiner constraint is 0.2, which means that missing determiners are considered wrong but not totally unacceptable. In fact, many other constraints are considered more important than the determiner rule. The selection of real-valued weights for all constraints of a grammar is a difficult problem because of the virtually unlimited number of possibilities. A score for a newly written constraint can be calculated by counting how often it holds and fails on an annotated corpus, or by measuring the overall performance when different weights are assigned to it. (This method usually shows very little variation; better results are achieved when only sentences that actually contain the described phenomenon are analysed.) In general the exact score of a rule is less important than the relative score of two rules that might interact [Foth2004]. By assigning values other than 0.0 to all rules that might conceivably be broken, a prescriptive grammar can easily be converted to one that is robust against all kinds of language errors, while still preferring conforming over deviant structures wherever possible. In fact, a coverage of 100% can be guaranteed with an appropriately written grammar. This is achieved by giving a higher value to those constraints which could otherwise forbid all structures for a given utterance.
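The multiplicative scoring scheme is easy to state concretely. In the following sketch (our own Python illustration; the violation scores are invented), the acceptability of an analysis is the product of the scores of its violated constraint instances, and a violated hard constraint (score 0.0) forces the worst possible score:

from math import prod

violated = [0.2,   # e.g. one 'missing determiner' violation
            0.9]   # e.g. one minor style-rule violation
print(prod(violated))          # 0.18: wrong, but not totally unacceptable

# Violating a hard constraint (score 0.0) yields the worst possible score:
print(prod(violated + [0.0]))  # 0.0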

[Figure: dependency forest of three fragments for “Bush : New initiative a success”, with edges ATTR and DET and three S roots.]

Fig. 2. Analysis of an elliptical utterance.

One easy way of ensuring this property is to allow surplus words to form the roots of isolated subtrees. This is possible because WCDG by itself does not enforce a unique NIL subordination. Note that in some cases a set of fragments actually is the most faithful rendition that is possible in a dependency formalism. Consider the typical elliptical headline “Bush: New initiative a success”, where the verbs “said” and “is” have

3 However, the parser will still return such a structure if no analysis with a positive score can be found.

been elided. This should actually be modelled as a forest of three tree fragments (cf. Figure 2). Assigning a heavy penalty to fragments that are not finite verbs ensures that they will be avoided unless absolutely necessary. Partial parsing can thus be achieved simply by not constraining the number of NIL subordinations. Since there are no generative rules, WCDG does not have a derivation component. Instead, parsing a sentence defines a multidimensional optimization problem that can be solved by all known methods of constraint optimization. Where the problem instance is too large to be solved via complete search, good success has been achieved with transformation-based heuristic solution methods [Daum and Menzel2002] [Foth and Menzel2000] that approximate the acceptability model defined by the constraints. In a large-scale grammar with many defeasible constraints, different analyses of the same sentence usually carry at least slightly different scores, so that full disambiguation is achieved. If different analyses happen to have the same score, the one that the transformation process finds first is usually selected as the output of the parser. By default, analysis terminates when it has repaired, or failed to repair, all important constraint violations (those whose score lies below a configurable threshold).

3 A comprehensive grammar of German

We have developed a grammar for the WCDG formalism that is intended to cover the entire realm of modern German. As an example of our dependency representation, consider again the analysis of the sentence “Man hat uns ein Angebot gemacht, das wir nicht ablehnen konnten.” (They made us an offer we could not refuse.) in Figure 1. Note that there are two instances of nonprojective structure in this sentence, i.e. the vertical projection lines must cross dependency edges in two places no matter how the tree is drawn. WCDG does not require its structures to be projective, although constraints to that effect can easily be written and then have a considerable disambiguating effect. We can thus represent phenomena such as German auxiliary groups or extraposed relative clauses faithfully by writing constraints that enforce projectivity in general, but make exceptions for edges with these particular labels. The referential relationship between the relative pronoun ‘das’ and its antecedent ‘Angebot’ cannot be represented on the syntax level, since both words already function as direct objects. Instead, it is represented by an additional edge which connects the two words on an extrasyntactical

‘reference’ level. The agreement of gender and number that must hold between the two words can thus be checked directly.4 The grammar currently consists of about 700 handwritten constraints, although much information is lexicalized, in particular the valence information of verbs and prepositions. An extensive lexicon of full forms is used that includes all closed-class forms of German. Among the open-class forms, around 6,000 verb stems and 25,000 noun stems are covered; compound forms can be deduced from their base forms on the fly to deal with the various types of German compounds. As a last resort, lexicon templates provide skeleton entries for totally unknown words; these only contain the syntactic category and underspecified values for case, number etc. In the experiment reported here, 721 of 16,649 tokens had to be hypothesized from such templates, such as personal names and foreign-language material.5 So far we have considered phenomena from about 28,000 sentences from various text types, such as newspaper text, appointment scheduling dialogues, classical literature, law texts, and online newscasts. Although obscure extragrammatical constructions are still occasionally encountered, the great majority of new input is covered accurately, and we do not expect major changes to the existing rules to become necessary. Many problem cases involve constructions that are probably impossible to represent accurately in a word-based dependency tree, such as complicated conjunctions or heavy ellipsis. We estimate the overall effort for our grammar of German at about 5 work-years. A large part of this work, however, consisted in creating manual syntax annotations for testing the predictions of the constraint model; this corresponds to the effort needed to create the treebank that a stochastic parser is trained on. Another very time-consuming task was to collect and classify open-class lexicon entries with respect e.g. to their inflection and valence features. While helpful for disambiguation in many cases, this information is not strictly necessary for running the parser.

4 Although other pronouns, and possibly nouns, can refer to other words, these relationships are usually outside the scope of a sentence-based parser because they transcend sentence boundaries. Therefore, this grammar describes reference edges only for the relative pronouns.
5 In theory, an unknown word could be of any open class, and thus introduce very great ambiguity; but most of these alternatives are usually discarded by the POS-tag preprocessing.

4 Evaluation

Exact solution of the optimization problem posed by a large constraint grammar is often computationally infeasible for long input sentences, so that incomplete solution methods must usually be employed. Therefore, in addition to model errors, where the linguistic intuition contradicts the model defined by the actually implemented grammar, search errors can occur, where the parser output in turn differs from the modelled optimum because the heuristic optimization algorithm terminated in a local peak. In the case of the earlier example sentence, no problems occur: the grammar assigns the highest score to the desired analysis shown in Figure 1, and the solution algorithm actually finds it. In general, this is not always the case, particularly for long sentences. Although all individual cases we have investigated suggest that the accuracy of the model established by our grammar is higher than the actual performance of the parser (i.e., search errors decrease rather than increase the overall accuracy), this is not of practical use because the better solutions cannot be efficiently computed. Therefore, all results reported in this paper use the strictest possible measure: only the accuracy of the computed structures with respect to the intended (annotated) ones is reported. Since our parser always returns exactly one complete syntax structure, recall and precision are both identical to the percentage of correct versus incorrect dependency edges. According to this measure, our parser typically computes between 80% and 90% of correct dependency attachments for written German. This is near the performance of the Collins parser for English [Collins1999], but somewhat below the results of [Tapanainen and Järvinen1997], who report precisions above 95% (also for English), but do not always attach all words. More relevant are the results described by [Dubey and Keller2003], who analysed 968 sentences (with at most 40 words) from the NEGRA corpus of German newspaper text. Reimplementing Collins' parser for German, they only achieved a labelled precision and recall of 66.0%/67.9%. This suggests that German syntax is considerably more difficult to analyse for this kind of parser than English. [Dubey and Keller2003] improved the parsing model to 70.9%/71.3% by considering sister-head dependencies instead of head-head dependencies. For purposes of comparison we analysed the same section of the NEGRA treebank with our own parser. We first translated the phrase treebank to dependencies automatically, with only few edges corrected to conform to our annotation guidelines [Daum et al.2004]. The accuracy of

parsing is then measured by counting the number of correctly computed dependency edges. Input is first annotated with part-of-speech tags by the statistical trigram tagger TnT, and some typical errors of the trigram model are automatically corrected by about 20 handwritten rules. The resulting scores are integrated into the constraint grammar with a constraint that disprefers word forms with improbable categories. We employ a heuristic transformation-based solution method that first constructs a complete syntax tree in linear time, and then tries to transform it so as to remove errors that were diagnosed by violated constraints. Three passes are made over each sentence; in the first phase, only subordinations between nearby words are allowed, and the resulting partial trees are recombined by their roots in the second phase. A third pass then tries to repair any remaining errors; only this phase investigates the entire space of possible subordinations. We have found that the phase-based approach yields better results than tackling the full optimization problem to begin with [Foth and Menzel2003]. Table 1 gives the results; altogether 89.0% of all dependency edges are attached correctly by the parser (87.0% if we also demand the correct edge label). Since the previous work reported constituent-based precision and recall figures, a direct comparison is not possible, but it seems safe to say that parsing accuracy is somewhat higher in our system. A more telling comparison would be achieved by transforming the phrase structure results of the sister-head-dependency model into dependency structures with the same tool used for transforming the treebank, and measuring the edge accuracy directly. Sentence length correlates with a slowly increasing number of parsing errors; to a certain degree this simply reflects the greater number of plausible but wrong subordinations in long sentences. However, the incompleteness of the solution algorithm also contributes to this: as the search space increases, more alternatives are overlooked, so that the method effectively becomes less and less complete.

Corpus         # of sent.  edges  lab. edges
all sentences  1000        89.0%  87.0%
<60 words      998         89.1%  87.1%
<40 words      963         89.7%  87.7%
<20 words      628         92.3%  90.1%
<10 words      300         93.4%  91.0%

Table 1. Parsing results for NEGRA sentences.

5 Further experiments

A WCDG provides remarkable flexibility with respect to aspects of parsing behaviour. In particular, although the exact solution of the optimization problem that it poses would theoretically require exponential runtime, good results can also be achieved with much fewer resources. In fact, an arbitrary time limit can be set for the parsing process, because the algorithm exhibits the anytime property: processing can be interrupted at any time if necessary, and the current analysis returned. This allows a trade-off to be made between parsing time and quality of results, since improvements to the score of an analysis generally coincide with a better accuracy. For the results in Table 1, we allowed the parser to spend up to 600 seconds on each sentence (on 1533-MHz Athlon PC processors), and it actually terminated within 68 seconds on average. If the time limit is lowered, accuracy slowly decreases, as shown in Table 2.

Time limit   Actual time  edges  lab. edges
600 seconds  68.0s        89.0%  87.0%
400 seconds  59.3s        88.7%  86.8%
200 seconds  44.9s        88.2%  86.2%
100 seconds  31.8s        87.1%  85.0%
50 seconds   21.6s        84.6%  82.3%

Table 2. Parsing results with reduced runtime limit.
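The anytime behaviour can be sketched as a best-so-far repair loop. The following is a deliberately naive Python illustration: the toy score function and the random repair step are our own stand-ins for the grammar's acceptability measure and the parser's transformation rules, and are not the actual WCDG algorithm.

import random, time

def score(analysis):
    return -abs(analysis - 42)        # pretend 42 is the optimal analysis

def repair(analysis):
    return analysis + random.choice([-3, -1, 1, 3])

def parse_anytime(initial, time_limit):
    best, best_score = initial, score(initial)
    deadline = time.monotonic() + time_limit
    while time.monotonic() < deadline:      # obey the user's time limit
        cand = repair(best)
        if score(cand) > best_score:
            best, best_score = cand, score(cand)
    return best                             # some analysis is always available

print(parse_anytime(0, 0.05))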

We also investigated the generality of our model of German, originally developed to parse the text of online technical newscasts, by parsing a variety of other text types. Table 3 gives the results of analysing various corpora under the same conditions as those in Table 1. In general, performance is measurably affected by text type as well as sentence length, but the accuracy is comparable to that on the NEGRA corpus. Classical literature was notably more difficult to parse, since it employs many constructions considered marked or extragrammatical by our current model of German. Transcriptions of spontaneous speech also pose some problems, since they contain many more sentence fragments than written text and no disambiguating punctuation. A particular advantage of a language model composed of many cooperating constraints is that no single rule is vital to the operation of

Text type            sentences  avg. length  edges  lab. edges
Trivial literature   9547       14 words     93.1%  91.1%
Law text             1145       19 words     88.8%  86.7%
Verbmobil dialogues  1316       8 words      90.3%  86.3%
Online news          1894       23 words     89.8%  88.1%
Serious literature   68         34 words     78.0%  75.4%

Table 3. Parsing results on different text types.

the grammar. In fact, since constraints forbid rather than license particular structures, a missing rule can never cause parsing to fail completely; at most it can increase the search space for a given sentence. This means that a grammar can be applied and tested even while it is still under construction, so that a realistic grammar for an entire language can be developed step by step. To demonstrate this robustness against omitted rules, we deliberately disabled entire groups of constraints of the grammar, one group at a time, and repeated the comparison experiment of Section 4. Table 4 shows the results. For each group of constraints, the general purpose is given as well as an informal version of an example constraint. The importance of each constraint group is given as the ratio between the structural correctness achieved with and without the constraints from this group.

Class    Purpose                   Example                                    Importance
agree    rection and agreement     subjects have nominative case              1.02
cat      category cooccurrence     prepositions do not modify each other      1.13
dist     locality principles       prefer the shorter of two attachments      1.01
exist    valency                   finite verbs must have subjects            1.04
init     hard constraints          appositions are nominals                   3.70
lexical  word-specific rules       “entweder” requires following “oder”       1.02
order    word order                determiners precede their regents          1.11
pos      POS tagger integration    prefer the predicted category              1.77
pref     default assumptions       assume nominative case by default          1.00
proj     projectivity              disprefer nonprojective coordinations      1.09
punc     punctuation               subclauses are marked with commas          1.03
root     NIL subordinations        only verbs should be tree roots            1.72
sort     sortal restrictions       “sein” takes only local predicatives       1.00
uniq     label cooccurrence        there can be only one determiner           1.00
zone     crossing of marker words  conjunctions must be leftmost dependents   1.00

Table 4. Measuring the effect of different constraint classes.

As expected, the most important type of constraints (in that their omission degrades performance the most) is the “init” group, into which most hard constraints fall. Parsing here suffers from a vastly increased initial ambiguity that often prevents the parser from finding the correct solution even if it is still theoretically optimal. POS preprocessing turns out to be very important as well, again because it quickly closes huge but implausible search spaces. Rules that disprefer multiple NIL subordinations effectively prevent the parser from taking the ‘easy’ solution and treating words that it cannot attach as additional roots. Each other group of constraints has comparatively little effect on its own.

6 Related Work

Except for [Dubey and Keller2003], there seems to be no other attempt to evaluate a syntactic parser of German on a gold standard annotation using the common PARSEVAL methodology. Accordingly, no standard setting is available so far which could facilitate a direct parser comparison, similar to the established Penn Treebank scenario for English. However, there have been evaluation efforts for more shallow types of syntactic information, e.g. chunks or topological fields. Parsing into topological fields like Vorfeld, Mittelfeld, and Nachfeld is usually considered a somewhat easier task than the construction of a full-fledged constituency analysis, because it clearly profits from the easily identifiable position of the finite verb in a German sentence and avoids any decision about the assignment of syntactic functions to the constituents. [Braun2003] presents a rule-based approach to topological parsing of German and reports a coverage of 93.0%, an ambiguity rate of 1.08, and a precision/recall of 86.7% and 87.3% respectively on a test set of 400 sentences. [Becker and Frank2002] trained a stochastic topological parser on structures extracted from the NEGRA treebank and tested it on 1058 randomly selected sentences with a maximum length of 40 words. They achieved a labelled precision/recall of 93.4% and 92.1%. Chunking, i.e. the segmentation of a sentence into pieces of a particular type (usually NPs and PPs), is an approach to shallow sentence processing sufficient for many practical purposes. [Brants1999] used cascaded Hidden Markov Models to chunk sentences into a hierarchy of structurally embedded NPs and PPs with a maximum tree depth of nine. He achieved a precision/recall of 91.4% and 84.8% in a cross evaluation on the 17,000 sentences of the NEGRA treebank. [Schmid and Schulte im Walde2000] trained a probabilistic context-free grammar on unlabelled data and used

it to segment sentences into NP chunks. They evaluated the system on 378 sentences from newspaper text and obtained a precision/recall of 93% and 92% if only the range of a chunk is considered, which decreased to 84% and 83% if the case of an NP also has to be determined.

Special evaluation methodologies have been designed for several parsers of German. Among them is the one used to evaluate the quality improvement during the development of the Gramotron parser [Beil et al.2002]. The method is partly motivated by lexical semantic clustering, the ultimate goal of the project, although the authors admit that a direct evaluation of head-to-head dependencies as proposed by [Lin1995] would be more appropriate. Besides avoiding overtraining by monitoring the cross-entropy on held-out data, a linguistic evaluation was carried out focussing on selected clause types. It was based on a randomly sampled subcorpus of 500 clauses and measured the quality of determining noun chunks and subcategorization types. The precision of chunking reached 98% if only the range was considered and decreased to 92% if the chunk type was also included. The subcategorization type of verbs was computed with a precision between 63% and 73% depending on the kind of clauses under consideration.

[Langer2001] evaluated a GPSG parser with the capability of additional processing rules (of unlimited generative power) on 62,073 sentences from German newspaper text. Of the sentences with 60 words or less, 34.5% could be analysed, with an average of 78 solutions per sentence. On a self-selected test corpus with an average length of 6 words, coverage rose to 83%. Only an indirect measure of accuracy is given: in 80.8% of the newspaper sentences, giving the parser access to partially annotated target structures (chunks and some POS information) did not change its output.

[Neumann and Flickinger2002] evaluated their HPSG parser based on the DOP model with 1000 sentences from the Verbmobil domain. They measured coverage (70.4% if combined) and the runtime performance of the system.

[Wauschkuhn1995] developed a two-step parser computing a clause-level structure which is kept completely flat clause-internally. The system is evaluated on 1097 newspaper sentences only with respect to its coverage (86.7% if run on manually tagged input and 65.4% for automatically tagged text) and the achieved degree of disambiguation (76.4% for manually and 57.1% for automatically tagged input).

7 Conclusions

A competitive parsing system for unrestricted German input has been described that is largely independent of domain and text type. Total coverage is ensured by means of defeasible, graded constraints rather than hard grammar rules. The method of using individual constraints for different grammatical principles turns out to be useful for ongoing grammar development, as the parser can be run normally from the beginning and aspects of the working language can be modelled one at a time. We find that using a powerful formalism in which all conceivable rules can actually be used allows for better parsing than using a tractable but less expressive one: even a theoretically intractable formalism is useful in practice if its model can be approximated well enough. While casting syntax analysis as a multidimensional optimization problem leads to high demands on processing time, this inefficiency is somewhat mitigated by the anytime property of the transformational optimizer, which allows the user to limit the desired processing time. A greater concern is certainly the great amount of expert knowledge needed to create a complete constraint grammar. We propose that for deep analysis of German syntax, the greater precision achievable through handwritten rules justifies the extra effort compared to approaches based purely on machine learning. Although automatic extraction of constraints from annotated corpora remains a goal of further research, so far we have achieved greater success by hand-selecting and scoring all grammar rules, since this allows the grammar writer to make conscious decisions about the relative importance of different principles. An online demonstration of our system can be accessed at http://nats-www.informatik.uni-hamburg.de/Papa/ParserDemo. The source code of the parser and the grammar is available under the General Public License at http://nats-www.informatik.uni-hamburg.de/download.
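The anytime behaviour mentioned above admits a compact reading: a transformational optimizer always holds a complete analysis, so the search can simply be cut off when a user-supplied time budget expires. The Python sketch below illustrates only this control regime; initial_analysis, repairs, and score are hypothetical stand-ins, not the parser's actual interfaces.

    import time

    def anytime_optimize(sentence, seconds, initial_analysis, repairs, score):
        # Transformational optimization with an anytime property: a complete
        # analysis exists at every step, so the best one found so far can be
        # returned whenever the deadline is reached.
        best = current = initial_analysis(sentence)
        deadline = time.monotonic() + seconds
        while time.monotonic() < deadline:
            candidates = repairs(current)         # local transformations of the tree
            if not candidates:
                break
            current = max(candidates, key=score)  # greedy step on graded constraints
            if score(current) > score(best):
                best = current
        return best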

© Kilian A. Foth, Michael Daum, Wolfgang Menzel, 2004.

References

[Becker and Frank2002] Markus Becker and Anette Frank. 2002. A stochastic topological parser for German. In Proc. 19th Int. Conf. on Computational Linguistics, Coling-2002, Taipeh, Taiwan.
[Beil et al.2002] Franz Beil, Detlef Prescher, Helmut Schmid, and Sabine Schulte im Walde. 2002. Evaluation of the Gramotron parser for German. In Proceedings of the LREC Workshop: Beyond PARSEVAL, Las Palmas, Gran Canaria.

[Brants1999] Thorsten Brants. 1999. Cascaded Markov models. In Proc. 9th Conf. of the European Chapter of the ACL, EACL-1999, Bergen, Norway.
[Braun2003] Christian Braun. 2003. Parsing German text for syntactico-semantic structures. In Prospects and Advances in the Syntax/Semantics Interface, Proc. of the Lorraine-Saarland Workshop, Nancy.
[Collins1999] Michael Collins. 1999. Head-Driven Statistical Models for Natural Language Parsing. PhD thesis, University of Pennsylvania, Philadelphia, PA.
[Daum and Menzel2002] Michael Daum and Wolfgang Menzel. 2002. Parsing natural language using guided local search. In Proc. 15th European Conference on Artificial Intelligence, ECAI-2002, pages 435–439, Lyon, France.
[Daum et al.2004] Michael Daum, Kilian Foth, and Wolfgang Menzel. 2004. Automatic transformation of phrase treebanks to dependency trees. In Proc. 4th International Language Resources and Evaluation Conference, Lisbon, Portugal.
[Dubey and Keller2003] Amit Dubey and Frank Keller. 2003. Probabilistic parsing for German using sister-head dependencies. In Proc. 41st Annual Meeting of the Association of Computational Linguistics, ACL-2003, Sapporo, Japan.
[Foth and Menzel2000] Kilian Foth and Wolfgang Menzel. 2000. A transformation-based parsing technique with anytime properties. In Proc. International Workshop on Parsing Technologies (IWPT-2000), pages 89–100, Trento, Italy.
[Foth and Menzel2003] Kilian Foth and Wolfgang Menzel. 2003. Subtree parsing to speed up deep analysis. In Proc. 8th Int. Workshop on Parsing Techniques, pages 91–102, Nancy, France.
[Foth2004] Kilian A. Foth. 2004. Writing weighted constraints for large dependency grammars. In Proc. Recent Advances in Dependency Grammars, COLING-Workshop 2004, Geneva, Switzerland.
[Langer2001] Hagen Langer. 2001. Parsing-Experimente: praxisorientierte Untersuchungen zur automatischen Analyse des Deutschen. Peter Lang, Frankfurt am Main.
[Lin1995] Dekang Lin. 1995. A dependency-based method for evaluating broad-coverage parsers. In IJCAI, pages 1420–1427.
[Maruyama1990] Hiroshi Maruyama. 1990. Constraint dependency grammar. Technical Report RT0044, IBM Research, Tokyo Research Laboratory.
[Neumann and Flickinger2002] Günter Neumann and Dan Flickinger. 2002. HPSG-DOP: Data-oriented parsing with HPSG. In Proc. of the 9th Int. Conf. on HPSG, HPSG-2002, Seoul, South Korea.
[Schmid and Schulte im Walde2000] Helmut Schmid and Sabine Schulte im Walde. 2000. Robust German noun chunking with a probabilistic context-free grammar. In Proc. 18th Int. Conf. on Computational Linguistics, Coling-2000, Saarbrücken, Germany.
[Schröder2002] Ingo Schröder. 2002. Natural Language Parsing with Graded Constraints. PhD thesis, Hamburg University, Hamburg, Germany.
[Tapanainen and Järvinen1997] Pasi Tapanainen and Timo Järvinen. 1997. A non-projective dependency parser. In Proc. 5th Conference on Applied Natural Language Processing, Washington, D.C.
[Wauschkuhn1995] Oliver Wauschkuhn. 1995. The influence of tagging on the results of partial parsing in German corpora. In Proc. 4th Int. Workshop on Parsing Technologies, IWPT-1995, pages 260–270, Prague/Karlovy Vary, Czech Republic.

The Role of Animacy Information in Human Sentence Processing Captured in Four Conflicting Constraints

Monique Lamers & Helen de Hoop

Department of Linguistics, Radboud University Nijmegen1
{H.deHoop, M.Lamers}@let.kun.nl

Abstract. To formalize and analyze the role of animacy information in on-line sentence comprehension, results of several on-line studies are compared and analyzed according to a new model of incremental optimization of interpretation. This model makes use of violable ranked constraints. To analyze the use of animacy information a set of four constraints is needed, namely Case, Selection, Precedence, and Prominence. It is shown that the pattern of violations of these four constraints provides sufficient information to reflect the on-line effects of language comprehension studies in which animacy information played a crucial role. More specifically, the evaluation of sentences in which either case information or animacy information (in combination with the selection restrictions of the verb) was used showed that the model can account for the ambiguity resolution with both sorts of information. The model was also successfully applied to the on-line processing of a more complex object relative structure in English.

1 Introduction

The different sorts of information that become available incrementally during natural language comprehension are very diverse, varying from morphosyntactic information (e.g. case marking, word order, number) to semantic/conceptual information (e.g. animacy, specificity). In contrast to the morphological marking of case and number, which incorporates both semantic and syntactic information, animacy is merely semantic (or conceptual) in nature. It plays a role of utmost importance in fulfilling the selectional criteria of the verb, and as such, it is quintessential for the incremental interpretation of a sentence. To investigate the role of animacy information in sentence comprehension a newly developed model of incremental optimization of interpretation is used [7]. This model uses a set of ranked interpretive constraints that are violable on a word-by-word basis. These constraints are derived by analyzing characteristics of Dutch and English that are relevant to the processing of animacy information.

1 The research reported here was supported by the Netherlands Organisation for Scientific Research (grants #220-70-003 to the PIONIER project Case Cross-linguistically and #051-02-070 to the Cognition project Conflicts in Interpretation), which is gratefully acknowledged.

This paper presents a first attempt to map the patterns of constraint violations onto the on-line effects found in studies in which the use of animacy information was investigated. Thus, the model of incremental optimization of interpretation that is adopted in this paper can be shown to predict the kind of processes elicited on-line on the basis of the pattern of constraint violations. By applying this model, which is based on principles of the time-insensitive model of Optimality Theoretic Semantics [6], to on-line human sentence processing studies, we bridge the gap between theoretical linguistic models and natural language processing.

2 Four Constraints on Interpretation: Case, Precedence, Prominence, and Selection

In languages with a case system such as German, morphosyntactic information can be used to interpret the syntactic and semantic relations. In German, not only personal pronouns, but also articles and adjectives are overtly case marked. If a sentence starts with an accusative case marked NP, as is illustrated in (1), it will not only be identified as the object of the sentence, but it also triggers the expectation of a suitable subject and predicate. Thus case information helps to determine the interpretation.

1. Den Zaun hat der Junge zerbrochen.
   [the fence]ACC,3rd,sg has3rd,sg [the boy]NOM,3rd,sg broken
   “The fence, the boy broke.”

There are, however, also languages with a poor case system such as Dutch and English. In these languages, case is only visible on pronouns. Hence, the eventual interpretation of many sentences is dependent on other sorts of information. Even in languages with poor case marking there are hardly any sentences that are difficult to interpret. In English, this is not all that surprising because of the strict word order that constrains the interpretation in such a way that the first argument in a main clause is almost always the subject of the sentence.2 Because Dutch has a relatively free word order and no case marking on full noun phrases, many sentences are ambiguous. In these cases, in Dutch as well as in German (where the same kind of ambiguities occur due to the morphological poverty of certain phrases), the interpretation will be based on the strong preference for the canonical word order with its concomitant (subject precedes object) reading. This is also known as word order freezing (cf. [11], [20]). In a sentence such as (2), there is a strong preference for interpreting the first NP as the subject. In other words, in the absence of conflicting information, the preferred interpretation is one in which the subject precedes the object (cf. [4], [5], [12]).

2. De patiënt heeft de arts bezocht. [the patient]SUBJ/OBJ has [the doctor]SUBJ/OBJ visited “The patient visited the doctor.”

2 As can be seen in the English translation of example (1), it is possible for the object to precede the subject in English. This, however, can only occur as a form of topicalisation.

Most experiments that found evidence for the subject-before-object preference dealt with sentences in which the subject and the object were both animate and were used in combination with agentive transitive verbs. However, several off-line and on-line experiments in Dutch and English showed that besides case marking and word order, animacy information is an important source of information for the comprehension process (cf. [9], [14], [15], [19]). For instance, McDonald [15] compared the validity of three cues (word order, noun animacy and case inflection on pronouns) in choosing the controller (agent) of transitive sentences and relative clauses. Experiments in which subjects had to assign the agent role after listening to the sentence showed that, in Dutch, animacy was a better cue than word order, whereas, in English, the reverse was the case. MacWhinney, Bates and Kliegl [13] found that in German, too, animacy was a better cue than word order. Hence, there seems to be a correlation between word order freedom and the role of animacy information in sentence processing in different languages. Where word order preferences might help to identify the subject of a sentence, animacy information provides us with information about the potential control over the action expressed by the verb, as well as the prominence relation between the arguments. That is, given the fact that for many verbs it is the subject (which in most sentences is the first argument of the sentence) that is in control of the action, it is expected that, given an animate and an inanimate NP in a sentence, it is more likely that the animate NP is the subject and the inanimate NP is the object of the sentence. In other words, the first argument outranks the second argument in control or prominence, and thus, in most cases, in animacy. Note, however, that in sentence processing the information becomes available incrementally. Therefore, the animacy of a possible second NP is not yet available at the moment the first NP is being processed. Moreover, it might be unclear whether a transitive or intransitive verb comes in. If animacy information of the initial NP is used as soon as it becomes available, it cannot be used in relation to the animacy of the other argument(s). What can be taken into account is whether the argument that is being processed is animate and is potentially in control and prominent. As we have seen in example (1), it is very well possible that the word order preference is not followed. This also holds for the expectation of the prominence relationship between the arguments and the control characteristics of the subject. Although not as common as an animate subject, it is not only possible that the subject of a sentence is inanimate, it might also be that, given an animate and an inanimate NP, the inanimate NP has to be the subject of the sentence and the animate NP the object. It is the verb that imposes these selection restrictions onto the arguments, as is illustrated in (3a,b). Verbs such as please take an experiencer (animate) object, while verbs such as like take an experiencer (animate) subject.

3. a. The holiday pleased the boy. b. The boy liked the holiday.

So far, four possible constraints that play an important role in language processing follow from the above discussion. The first constraint is based on morphological case- marking, the second on a general preference of word order, whereas the other two

constraints are more directly related to the role of animacy information. In (4) the self-explanatory constraint CASE is defined.

4. CASE: the subject is nominative case marked, the object is accusative case marked

In (5) the word order constraint PRECEDENCE is defined which is based on a strong preference for the subject-before-object word order. If this constraint is satisfied, it leads directly to the interpretation in which the subject precedes the object.

5. PRECEDENCE: the subject (linearly) precedes the object

Note that PRECEDENCE, as it is formulated above, can be considered an instantiation of a linearity constraint as proposed within the framework of Property Grammars [2]. However, the use of constraints in Property Grammars (PG) radically differs from our use of constraints. In PG constraints are used to encode syntactic information and although constraint evaluation takes place on the basis of grammatical as well as ungrammatical inputs, a grammatical input should not violate any constraints. In our Optimality Theoretic model of interpretation, the input is the form (be it grammatical or not) and the output (which is built up incrementally) is the (optimal) interpretation of that form. The constraints are potentially conflicting, which is one of the basic characteristics of Optimality Theory [16], and as a consequence optimal outputs often violate numerous constraints (in order to satisfy stronger ones). The third constraint, PROMINENCE, concerns the potential control characteristics of the subject and the prominence relationship between the subject and the object. As it is defined in (6), it is expected that the subject of a sentence has potential control over the action expressed in the sentence. Additionally, we assume that in case of a transitive relationship the subject outranks the object in PROMINENCE. In terms of animacy, it is expected that the subject is animate, and when a second argument is present, the constraint is clearly satisfied if the object is inanimate.

6. PROMINENCE: a) the subject is animate and thus has potential control; b) the subject outranks the object in prominence

The fourth constraint is called SELECTION and reflects the inherent semantic relation between the verb and the subject or the object. This constraint comes into play if a verb always selects an animate NP as the object and/or an animate NP as the subject.

7. SELECTION: the verb selects an animate object, and/or an animate subject

At this point, note that in sentence (3a) where the initial NP is the inanimate subject, the word order constraint PRECEDENCE is satisfied, but PROMINENCE is violated. This violation of PROMINENCE is directly related to the satisfaction of SELECTION. In sentence (1) on the other hand, PRECEDENCE is violated, but PROMINENCE is satisfied. Here, the satisfaction of PROMINENCE goes hand in hand with the satisfaction of CASE. The interplay of the different constraints will be further explained in the following section.

3 The Model of Incremental Optimization of Interpretation of Animacy Information

In this section, the constraints will be ranked so that they can be used in the model of incremental optimization of interpretation. The ranking of the constraints is established by principles taken from the theoretical perspective of Optimality Theory (cf. [16]) and Optimality Theoretic Semantics (cf. [6]). As already pointed out above, within the framework of Optimality Theory constraints are violable rules that are potentially conflicting. These constraints are reflections of linguistic regularities. A constraint is never violated without any reason, but only in order to satisfy another, stronger constraint. This basic characteristic of Optimality Theory (OT) originates from its predecessor, Harmonic Grammar, a linguistic theory that uses the connectionist well-formedness measure Harmony to model linguistic well-formedness [18]. As the name of our model indicates, we assume that during human sentence processing the optimal interpretation of a sentence (form) is built up incrementally. Hence, we assume the process of optimization itself to be incremental. Optimality Theoretic Semantics [6] gives a straightforward tool for analyzing processing in this way. OT semantics [6] takes as a point of departure free generation of interpretations in combination with the parallel evaluation of violable constraints. The integration of pragmatic, semantic, and syntactic information in a system of ranked constraints is proposed to correctly derive the optimal interpretations for inputs that consist of utterances, i.e., forms in context. Thus, in OT semantics, the direction of optimization is from form to meaning, that is, it is optimization from the hearer’s point of view. To use this approach for our purpose of analyzing experimental results of processing requires an incremental approach to optimization. That is, the process of optimization of interpretation proceeds while the information comes in word-by-word, or constituent-by-constituent. Before incremental optimization of interpretation can be used to analyze the role of animacy information in sentence processing, the four relevant constraints that were defined above have to be ranked. We determine the ranking by examining the optimal output interpretation in case of a conflict between constraints. Such a situation was illustrated in example (3a), in which an inanimate NP is interpreted as the subject of the sentence. This indicates that SELECTION must outrank PROMINENCE, since the optimal reading will be an animate object reading, despite the fact that this leads to an inevitable violation of PROMINENCE. If the animate object precedes the inanimate subject, there is not only a conflict between SELECTION and PROMINENCE, but also between SELECTION and PRECEDENCE. This is illustrated by the Dutch example in (8).

8. De jongen beviel de vakantie. the boy pleased the holiday “The holiday pleased the boy”

The optimal reading for the sentence in (8) is the one in which the initial NP is the object of the sentence. Thus, PRECEDENCE is violated to satisfy SELECTION; hence

SELECTION not only outranks PROMINENCE, but also PRECEDENCE. This leaves us with the issue of ranking PRECEDENCE and PROMINENCE. Because of much processing evidence that in case of ambiguity a sentence-initial NP is interpreted as the subject even if it is inanimate, we assume that PRECEDENCE outranks PROMINENCE. Finally, we assume that CASE is the strongest of the four constraints. This can be derived from the fact that we obtain a sometimes pragmatically odd yet optimal interpretation when SELECTION must be violated in order to satisfy CASE, as in a German sentence such as Der Zaun hat den Jungen zerbrochen, glossed as ‘the fence-NOM has the boy-ACC broken’. In this sentence, despite the fact that transitive break normally selects an animate subject, the inanimate NP der Zaun ‘the fence’ is interpreted as the subject because of its nominative case. That is, the sentence can only mean that the fence broke the boy (see [17] for more evidence on the important role of case in human sentence processing). In (9) the ranking of the four constraints is given.

9. CASE >> SELECTION >> PRECEDENCE >> PROMINENCE
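To make the use of this ranking concrete, the following Python sketch picks the most harmonic of several candidate readings by comparing their violation profiles constraint by constraint, strongest first. It is a minimal illustration under assumed, hand-coded violation counts; the function names and example profiles are ours, not part of the published model.

    RANKING = ["CASE", "SELECTION", "PRECEDENCE", "PROMINENCE"]  # strongest first

    def more_harmonic(a, b):
        # a and b map constraint names to violation counts; the profile with
        # fewer violations on the highest ranked distinguishing constraint wins.
        for c in RANKING:
            if a.get(c, 0) != b.get(c, 0):
                return a.get(c, 0) < b.get(c, 0)
        return False

    def optimal(candidates):
        # candidates: mapping from reading name to its violation profile.
        best = None
        for name, profile in candidates.items():
            if best is None or more_harmonic(profile, candidates[best]):
                best = name
        return best

    # Example (8) once the verb is known: the subject-initial reading violates
    # SELECTION ('bevallen' wants an animate object), while the object-initial
    # reading violates only the weaker PRECEDENCE, so the latter wins.
    print(optimal({"subject-initial": {"SELECTION": 1},
                   "object-initial": {"PRECEDENCE": 1}}))   # -> object-initial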

4 Applying Incremental Optimization of Interpretation to the On-line Use of Animacy in Dutch Sentence Comprehension

Having established the ranking of the constraints, incremental optimization of interpretation can be used to evaluate, word by word, the sentences used in on-line studies in which animacy information was manipulated. We will apply incremental optimization to three reading studies in which event-related brain potentials (ERPs) were measured. ERPs are small changes in spontaneous activity of the brain that occur in response to certain sensory events. Because of the multidimensionality of the signal, differences in the effects are related to differences in the nature of the involved processes [8]. This makes these studies extremely suitable as test cases for the application of incremental optimization of interpretation.3 In the first two studies animacy information is used to resolve subject-object ambiguities in Dutch, as was reported by Lamers [9], [10]. Additionally, object relative clauses taken from a study of Weckerly and Kutas [19] will be evaluated. Lamers [9], [10] investigated sentences such as those given in (10), (11) and (12) in two ERP studies.

10. De oude vrouw in de straat verzorgde hij …
    The old woman in-the-street took-care-of he …
    “He took care of the old woman in the street …”

11. Het oude park in de straat verzorgde hij …
    The old park in-the-street took-care-of he …
    “He took care of the old park in the street …”

3 For a similar application of incremental optimization of interpretation to the processing of sentences in which the distinguishability of two arguments used in a transitive relation was addressed, the reader is referred to de Hoop & Lamers [7].

12. De oude vrouw in de straat verzorgde hem …
    The old woman in-the-street took-care-of him …
    “The old woman in the street took care of him …”

Notice that sentence (11) is disambiguated by the animacy information of the initial NP because of the selection restrictions of the verb. That is, the verb selects an animate subject and as a consequence, when the verb comes in, the sentence-initial NP can no longer be interpreted as the subject of the sentence. The sentences in (10) and (12) are clearly disambiguated at the time the case-marked pronoun is encountered (which is unambiguously nominative in (10) and accusative in (12)). Lamers reports early and late positivities at the verb for sentence (11), which starts with the inanimate NP, in comparison to sentence (10), as well as at the nominative case-marked pronoun in sentence (10) in comparison to the accusative case-marked pronoun in (12). The effects were interpreted as structure building problems, possibly involving reassignment of syntactic functions and thematic roles. Strikingly, the effects are similar although different sorts of information are used for disambiguation (the verbal selection criteria and the case marking of the pronouns, respectively). We claim that this similarity can be explained within the incremental optimization model. The evaluation of these sentences against the set of constraints will show that the pattern of constraint violations is the same in the two different contexts. In the left panel of Table 1 a schematic overview is given of the constraint violation pattern of the optimal interpretation of the incoming words of sentence (10).4 In the right panel the evaluation of the object-initial sentence (11) is given. As can be seen, PROMINENCE is violated as soon as the initial inanimate NP becomes available, in order to satisfy the higher ranked constraint PRECEDENCE. In that stage, the optimal interpretation is the subject-before-object reading. Up to the verb SELECTION cannot play a role in the parsing process, since no relevant information is available. At the verb, it becomes clear that the subject has to be animate. Hence, the optimal interpretation of the initial inanimate NP changes from subject to object. As a consequence, PRECEDENCE is violated, but PROMINENCE is satisfied. It is at this point in the sentence that Lamers found the significant ERP effects (early and late positivities) for sentence (11) compared to (10) (Table 1).

4 In contrast to tableaux normally used to present constraint satisfaction patterns in Optimality Theory, in this paper a table shows only the pattern of constraint violations of the optimal interpretation at time t. Obviously, what is the optimal interpretation at time t may vary through time. For example, the optimal interpretation of the sentence in (11) will be the subject-initial interpretation until the verb comes in. Then the optimal interpretation switches to the object-initial interpretation, due to the fact that Selection is ranked above Precedence.

Table 1. An overview of the pattern of violations of CASE, SELECTION (SEL), PRECEDENCE (PREC), and PROMINENCE (PROM) for the sentences from examples (10) and (11) up to the verb. In this table, as in all the following tables, a ✓ indicates that at this word the optimal interpretation satisfies the constraint, whereas an * indicates a violation of the constraint; Pos. = positivity; the crucial words are in bold; grey columns indicate the point in the sentence at which ERP differences were reported.

        De oude vrouw…   verzorgde…      Het oude park…   verzorgde…
CASE    ✓                ✓               ✓                ✓
SEL                      ✓                                ✓
PREC    ✓                ✓               ✓                *
PROM    ✓                ✓               *                ✓
ERP                                                       early, late Pos.

Table 2 is a schematic overview of the constraint violation patterns of the crucial words of sentences (12) (left panel) and (10) (right panel). The ERP effect that is illustrated was found at the moment the case-marked pronoun was encountered.

Table 2. An overview of the pattern of violations of CASE, SELECTION (SEL), PRECEDENCE (PREC), and PROMINENCE (PROM) for the sentences (12) and (10) up to the disambiguating case-marked pronoun.

        De oude vrouw…   verzorgde…   hem…     De oude vrouw…   verzorgde…   hij…
CASE    ✓                ✓            ✓        ✓                ✓            ✓
SEL                      ✓                                      ✓
PREC    ✓                ✓            ✓        ✓                ✓            *
PROM    ✓                ✓            ✓        ✓                ✓            ✓
ERP                                                                          early, late Pos.

Let us now compare the pattern of constraint violations found at the nominative case marked pronoun to the pattern observed at the verb of the sentence starting with the inanimate NP (Table 2 versus Table 1). At the nominative case marked pronoun information becomes available such that the object-initial interpretation overrules the subject-initial interpretation, which was optimal until that point. At that time, PRECEDENCE must be violated (the object precedes the subject), whereas PROMINENCE is still satisfied, given that a pronoun is more prominent in a discourse prominence hierarchy than a full NP (cf. [1], [3], [11]). The resulting pattern is basically the same pattern as the one created at the verb of the inanimate condition (11). Thus, the similarity of the constraint violation patterns indeed reflects the similarity in ERP effects reported in the on-line studies. This indicates that the four constraints of the incremental optimization of interpretation model successfully

capture the role of animacy information in subject-object ambiguity resolution in Dutch.

5 Extension of the Approach to English

Although English has a strict word order, relative clauses can either be subject-initial or object-initial. Weckerly and Kutas [19] investigated the influence of animacy on the processing of object relative clauses with a local structural ambiguity. The sentences used in this study differed in the word order of an animate and an inanimate NP, as exemplified in (13) and (14).

13. The novelist that the movie inspired praised the director …

14. The movie that the novelist praised inspired the director …

In the initial NP, the subject of the main clause is either animate (as in 13) or inanimate (as in 14). Hence, there is already a difference in the pattern of constraint violations at the first argument, as is illustrated in Table 3. Since it concerns a violation of the lower-ranked constraint in order to satisfy PRECEDENCE, it is not expected that this violation pattern will lead to strong on-line effects. Nevertheless, Weckerly and Kutas do report an enhanced negative shift for the sentence with an initial inanimate NP, indicating that the processing of an initial inanimate NP is more costly than that of an animate NP. At the word that it is clear that a relative clause has to be processed, although it is not yet known whether it concerns a subject or an object relative clause. Since that refers to the initial NP, the pattern of constraint violations is the same as was found at the previous word. As soon as the determiner comes in, it becomes obvious that an object relative clause is being processed (because the incoming NP must be the subject of the relative clause). We assume that at this point in the sentence the comprehension process is mainly concerned with integrating incoming information in the relative clause, while the main clause information is stored in verbal working memory. Thus, focusing on the relative clause, at this point PRECEDENCE is violated in both sentences. Because both sentences have the same structure, this violation will not be reflected in the ERP waveform. If, however, a comparison between subject relative clauses and object relative clauses had been possible, we would expect this violation to be reflected as early and late positivities, corroborating the findings of Lamers [9], [10]. Since subject relative sentences were not part of the study of Weckerly and Kutas [19], such a comparison is unfortunately not possible. In sentence (13) the object refers to an animate NP, and thus PROMINENCE is violated (Table 3, left panel). In (14) the object is inanimate and thus it is still very well possible that the incoming subject of the relative clause outranks the object in prominence and control. Since there is a difference in the pattern of constraint

violations between the two sentences, differences in ERP waveforms are expected. However, at the determiner of the incoming subject of the relative clause, no differences in ERP waveforms were reported. This might be due to the fact that a determiner is a high-frequency closed-class word, which has to be followed by the semantically more important noun, the subject of the relative clause.

Table 3. An overview of the violations of SELECTION (SEL), PRECEDENCE (PREC), and PROMINENCE (PROM) for the sentences (13) and (14). We have left out the constraint CASE, because it does not play a role here. Neg. = negativity; LAN = left anterior negativity.

        The novelist   that   the movie    inspired    praised
SEL                                        ✓
PREC                          *            *
PROM                          *            *
ERP                           N400         late Pos.   LAN, late Pos.

        The movie    that         the novelist   praised   inspired
SEL                                               ✓
PREC                              *               *
PROM    *            *
ERP     Neg. shift   Neg. shift

In sentence (13) the subject of the relative clause is inanimate. Consequently, PROMINENCE is violated. In (14), however, the subject is animate and outranks the inanimate object in prominence. The comparison of the two ERP waveforms showed an N400 for the inanimate NP.5 Weckerly and Kutas explain this N400 in terms of the violation of the expectancy of an animate noun. Since animacy is clearly involved in the violation of PROMINENCE, the explanation of Weckerly and Kutas corresponds with the constraint violation pattern (in accordance with the claims made in [7]). The subject of the relative clause is followed by the verb. In sentence (13) the verb inspire comes in, which selects an animate object. Hence, SELECTION is satisfied at the cost of a violation of PROMINENCE. Praise, in (14), can only be used in sentences with an animate subject, and thus SELECTION and PROMINENCE can both be satisfied. A late positivity (P600) was found at the verb that satisfies SELECTION at the cost of PROMINENCE. Weckerly and Kutas argue that this positivity is caused by more difficult processing, probably involving problems in thematic role assignment. This explanation corroborates our discussion above that so-called experiencer verbs, which assign the role of experiencer to the (necessarily animate) object of the sentence rather than the more frequently used roles of theme or patient (cf. [9]), inherently combine the satisfaction of SELECTION with a violation of PROMINENCE. More puzzling are the ERP effects found at the verb of the main clause. As can be seen in Table 3, ERP effects are found at a verb that does not cause any constraint violation. This contrasts with the correspondence between ERP effects and constraint violations at every other word position of the sentences. A possible explanation for this discrepancy between the incremental optimization of interpretation analysis and the ERP effects might be that the complexity of the structures at issue in this study is not accounted for by just the three constraints used for our present purposes. The

5 The N400 is normally taken as an indication of problems in the comprehension process of semantic information, possibly including ease of lexical access and lexical/discourse integration processes (cf. [8], [21]).

constraints that were used in the model of incremental optimization are directly derived from rules and mechanisms of the language faculty. It would, however, be implausible to assume that the effects found in the ERP waveforms are only reflections of language-specific processes. As a matter of fact, Weckerly and Kutas explain the finding of a left anterior negativity in terms of differences in memory load caused by the difficulties of the preceding object relative clause with the inanimate subject. A similar reasoning is given for the late positivity, which is interpreted as a reflection of the complexity of the object relative clause. Further research is necessary to determine whether a pattern of constraint violations such as the one found for the sentences with an inanimate subject (Table 3, left panel) is an indication of high demands on working memory and general processing complexity. Nevertheless, we think that finding a correspondence between constraint violations and ERP effects at almost every word position indicates that the model of incremental optimization of interpretation can also successfully be used for the analysis of the human processing of complex structures.

6 Conclusion

If an argument is processed, animacy information becomes available. In this paper we have shown that a newly developed model of incremental optimization of interpretation can analyze the role of animacy information in language comprehension. In this model four violable constraints were defined, namely Case, Selection, Precedence, and Prominence. By applying the constraints incrementally in a word-by-word fashion, patterns of constraint violations come about that largely correspond to the differences in ERP waveforms found in the relevant ERP studies. We conclude that the incremental optimization model provides a useful tool in psycholinguistic research that bridges the gap between theoretical linguistic models and natural language processing.

References

1. Aissen, J.: Differential Object Marking: Iconicity vs. Economy. Natural Language and Linguistic Theory 21 (2003) 435-483
2. Blache, P.: Constraints, Linguistic Theories and Natural Language Processing. In: Christodoulakis, D. (ed.): Lecture Notes in Artificial Intelligence. Springer (2000)
3. Comrie, B.: Language Universals and Linguistic Typology. University of Chicago Press, Chicago (1989)
4. Frazier, L.: Resolution of Syntactic Category Ambiguities: Eye Movements in Parsing Lexically Ambiguous Sentences. Journal of Memory and Language 26 (1987) 505
5. Gibson, E.: Linguistic complexity: Locality of syntactic dependencies. Cognition 68 (1998) 1-76
6. Hendriks, P., de Hoop, H.: Optimality Theoretic Semantics. Linguistics and Philosophy 24 (2001) 1-32

7. de Hoop, H., Lamers, M.: Incremental distinguishability of subject and object. In: Kulikov, L., Malchukov, A., de Swart, P. (eds.): Case, Valency, and Transitivity. Benjamins, Amsterdam (to appear)
8. Kutas, M., Van Petten, C.K.: Psycholinguistics electrified. In: Gernsbacher, M.A. (ed.): Handbook of Psycholinguistics. Academic Press, New York (1994) 83-143
9. Lamers, M.J.A.: Sentence processing: using syntactic, semantic, and thematic information. PhD dissertation, University of Groningen (2001)
10. Lamers, M.J.A.: Resolving subject-object ambiguities with and without case: evidence from ERPs. In: Amberber, M., de Hoop, H. (eds.): Competition and variation in natural languages: the case for case. Elsevier (to appear)
11. Lee, H.: Parallel Optimization in Case Systems. In: Butt, M., King, T. (eds.): Nominals: Inside and Out. CSLI, Stanford (2003)
12. MacWhinney, B.: Models of the emergence of language. Annual Review of Psychology 49 (1998) 199
13. MacWhinney, B., Bates, E., Kliegl, R.: Cue Validity and Sentence Interpretation in English, German, and Italian. Journal of Verbal Learning and Verbal Behavior 23 (1984) 127
14. Mak, W.M., Vonk, W., Schriefers, H.J.: The influence of animacy on relative clause processing. Journal of Memory and Language 47 (2002) 50-68
15. McDonald, J.: Sentence interpretation processes: The influence of conflicting cues. Journal of Memory and Language 26 (1987) 100-117
16. Prince, A., Smolensky, P.: Optimality: From neural networks to universal grammar. Science 275 (1997) 1604-1610
17. Schlesewsky, M., Bornkessel, I.: On incremental interpretation: Degrees of meaning accessed during sentence comprehension. Lingua 114 (2004) 1213-1234
18. Smolensky, P.: Information Processing in Dynamical Systems: Foundations of Harmony Theory. In: Rumelhart, D.E., McClelland, J. et al.: Parallel Distributed Processing. Explorations in the Microstructure of Cognition, Vol. 1. MIT Press, Cambridge, MA (1986)
19. Weckerly, J., Kutas, M.: An electrophysiological analysis of animacy effects in the processing of object relative sentences. Psychophysiology 36 (1999) 559-570
20. Zeevat, H.: Freezing and Marking. Manuscript, University of Amsterdam, Amsterdam (2004)
21. Garnsey, S.M. (ed.): Event-related brain potentials in the study of language. Special issue of Language and Cognitive Processes (1993)

An Exploratory Application of Constraint Optimization in Mozart to Probabilistic Natural Language Processing

Irene Langkilde-Geary

Brigham Young University, Provo UT 84602, USA, [email protected]

Abstract. This paper describes an exploratory implementation in Mozart applying constraint optimization to basic subproblems of parsing and generation. Optimization is performed on the probability of a sentence using dependency-style syntactic representation, which is computed using an adaptation of the English Penn Treebank as data. The same program solves both parsing and generation subproblems, providing the flexibility of a general architecture combined with practical efficiency. We show results on a sample sentence that is a classic in natural language processing.

1 Introduction

Although many people have long recognized that natural language can be quite naturally characterized using constraints, the question of how to organize a computer system to effectively deal with interdependent constraints in the face of exponential complexity continues to be an active area of research. This issue of system architecture has probably been most extensively addressed in the context of generation, where the problem is quite prominent [1]. In a recent attempt to generalize across the architectures of 19 representative generator systems, Cahill et al. proposed a whiteboard as a reference design [2,3]. They argued that a whiteboard was general enough to subsume the variety of architectures surveyed and could serve as a standard to facilitate the sharing and reuse of processing resources. A whiteboard architecture divides processing into knowledge-centric functional modules (such as lexicalization, aggregation, rhetorical structuring, referring expression generation, ordering, segmentation, and coherence) and models communication between modules with a single repository of data called a whiteboard. The data stored in a whiteboard is typed and relational, and added cumulatively by extending or copying existing data, but not changing or removing anything. A whiteboard thus generalizes further what had previously been considered the most general architecture for generation, namely a blackboard architecture, in which changes to the data repository are destructive, not cumulative.

Although there does not seem to be much debate on the theoretical adequacy of a whiteboard as a general-purpose architecture, existing large-scale practical systems to date have actually used something simpler (typically a pipeline) for the sake of efficiency. The new development of concurrent constraint programming, however, seems to offer the potential to combine both theoretical adequacy and practical efficiency in the same system. A concurrent constraint program (CCP) is a flexible, general, and principled framework for managing convoluted dependencies and combinatorial complexity such as that which exists in natural language processing. A concurrent constraint program’s behavior can be analogized to that of a whiteboard architecture even though the specific representation and processing details may differ from those proposed by [4].

In this paper we present an exploratory implementation of a finite domain constraint optimization program in the Mozart programming system to solve some basic subproblems of both parsing and generation. Our motivation is determining how well this basic program might serve as a suitable foundation for a whiteboard-like system capable eventually of handling the full range of generation tasks with reasonable run-time efficiency.

We are interested in using a single program to solve both parsing and generation problems for more than theoretical or academic reasons. We find that parsing can often be a subproblem of general-purpose generation. Our ongoing work in machine translation frequently requires parsing of phrasal fragments that must be massaged to fit into a larger whole. In other applications of generation it is often helpful to combine processing of canned text and templates with traditional generation from abstract representations of meaning [5,6,7]. One potential way to seamlessly process all three kinds of input is by integrating parsing and generation.

A whiteboard-style architecture without a predefined processing sequence could clearly facilitate a tighter integration of the two tasks. A constraint program seems to be a natural fit for this problem. If natural language is represented declaratively as a set of variables associated with each word, then parsing and generation are simply (almost) inverse divisions of the variables into “given” versus “to be solved for”. This declarative framework makes it easy to blur the distinction between parsing and generation and address hybrid problems with nearly arbitrary combinations of known versus unknown variables, resulting in an NLP system with wider applicability.

To see how well a CCP might model a whiteboard architecture and solve hybrid problems we perform experiments with different combinations of given versus unknown variables. The paper is organized as follows: we first describe how we formulate our exploratory subproblem of parsing/generation as a concurrent constraint program (Section 2). Then in Section 3 we discuss how the program performs on a sample input sentence with different combinations of known and unknown variables. Finally, Section 4 concludes with a discussion of related work and future directions.

2 Problem Formulation

In our formulation of the problem we use a dependency-style syntax and represent a sentence as a group of words where each word is associated with a set of linguistically-motivated features. We define a probability distribution over a subset of the features, optimizing for the most probable assignment of values to features given a few hard constraints. We obtain the probabilities from an adaptation of the English Penn Treebank. Note that our constraint optimization approach to the problem does not require or involve an explicit grammar (or lexicon) at all.

2.1 Variables

In this exploratory implementation, we restrict our attention to just a handful of features per word plus some needed auxiliary variables. Table 1 summarizes the main features associated with each word. They are node ID, head ID, functional role, group type, direction, relative position, and absolute position. To keep things simple at this stage and because we don’t yet have a morphological component implemented, we do not directly use the actual words of the sentence, only the features listed. The ID is an arbitrary number associated with a word, and is used together with the HeadID feature to represent the dependency structure of a sentence. We define the actual value of the ID to have no linguistic significance, and assume that it is either given in the input or internally assigned during initialization. Each word has exactly one head, except the top word in the sentence. The number 0 is reserved as the value of the HeadID feature when there is no head. The role feature represents the functional relationship of a child with respect to its head. We currently distinguish 23 possible values, listed in Table 2. The roles are derived from the original annotation on the Penn Treebank and/or motivated by probabilistic modeling needs. In some cases, the role feature correlates with the part-of-speech of a word, for example: determiner and right-punc. In other cases, it identifies prominent parts of a clause, such as the subject or the function of an auxiliary verb. The role “role-marker” designates prepositions and complementizers, which we treat as left dependents (not heads) in

Feature                  Value
ID                       a word id
HeadID                   id of head word
Role                     syntactic function, see Table 2
Group type (GT)          clause, np, other
Direction (DIR)          +/- from head
Relative position (RP)   tree distance from head
Absolute position (AP)   distance from start of sent

Table 1. Word features

constituents typically labelled in constituency grammar as PP and SBAR. The Top role identifies the top of the sentence as a whole. “Adjunct” serves as a catch-all miscellaneous role.

Adjunct          Determiner       LGS-adjunct   Polarity         Role-marker     Top
Aspect           Emphatic         Modal         Pre-determiner   Subject         Topic
Closely-related  Junction-marker  Object        Predicate        Taxis           Voice
Dative           Left-punc        Particle      Right-punc       To-infinitive

Table 2. Roles

The group-type (gt) feature is a generalization of constituent-style non-terminal labels associated with the head word of the constituent. We distinguish just three coarse categories: clause, noun phrase (np), and other (where clauses include SBAR phrases, and noun phrases include PPs). The direction (dir) feature indicates whether a child comes to the left or right of its head. It is partially redundant with the relative position feature, but useful as a generalization of it. The relative position (rp) indicates that a word is the nth dependent attached to one side of a head, with the sign indicating which side. The value used for n is actually offset by 1000 to keep the domain positive in the implementation. Finally, absolute position (ap) designates the linear order of words in a sentence, with the first word of the sentence assigned position 1.
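For concreteness, the feature bundle of Table 1 can be pictured as a simple record per word; the Python sketch below mirrors the table (the field names are ours), including the reserved head id 0 and the offset-by-1000 convention for relative position.

    from dataclasses import dataclass

    @dataclass
    class Word:
        id: int       # arbitrary node ID; no linguistic significance
        head_id: int  # ID of the head word, 0 if there is no head (Top)
        role: str     # functional role, one of the 23 values in Table 2
        gt: str       # group type: 'clause', 'np', or 'other'
        dir: str      # direction: '-' left of its head, '+' right of it
        rp: int       # relative position, offset by 1000
        ap: int       # absolute position; the first word has position 1

    # The top word of "Time flies like an arrow ." (cf. Table 4), with the
    # relative position written in its offset form:
    flies = Word(id=1, head_id=0, role="top", gt="clause", dir="+", rp=1000, ap=2)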

2.2 Constraints

Besides domain constraints for each variable we use the following non-basic constraints that define the relationships between the features of words, the tree structure of a sentence, and the probabilistic score of the sentence as a whole. Auxiliary feature variables used include NumLeftCs (number of left children) and NumRightCs. The notation ==> designates a logical implication, and <==> a logical equivalence. The dot notation A.f refers to feature f of node A. Node names “Head” and “Child” refer to confirmed head and child nodes, respectively.

Single Node Constraints

Dir= ’-’ <==> RP < 1000 Dir= ’+’ <==> RP >= 1000 HeadID=0 <==> RP=1000 HeadID=0 ==> Dir= ’+’

Head-Child Constraints

Head.ap > Child.ap <==> Child.dir = '-'
Head.ap < Child.ap <==> Child.dir = '+'
A.numLeftCs = Number of B nodes for which A.id = B.headID AND B.dir = '-'.
A.numRightCs = Number of B nodes for which A.id = B.headID AND B.dir = '+'.

Relationship between the number of children on a side and the set of valid relative position values:

A.id = B.headID AND B.dir = '-' ==> B.rp >= 1000 - A.numLeftCs
A.id = B.headID AND B.dir = '+' ==> B.rp =< 1000 + A.numRightCs
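Read operationally, the single-node and head-child constraints above are simple tests on a complete assignment of features. The Python sketch below states them directly over the Word records suggested earlier; it is a checker written by us for illustration, not the Mozart propagators themselves.

    def node_ok(w):
        # Single node constraints: direction, relative position and head
        # must be mutually consistent.
        if (w.dir == '-') != (w.rp < 1000):
            return False
        if (w.head_id == 0) != (w.rp == 1000):
            return False
        return w.head_id != 0 or w.dir == '+'

    def head_child_ok(words):
        # Head-child constraints, checked over a whole sentence.
        by_id = {w.id: w for w in words}
        for child in words:
            if child.head_id == 0:
                continue
            head = by_id[child.head_id]
            if (head.ap > child.ap) != (child.dir == '-'):
                return False
            same_side = sum(1 for w in words
                            if w.head_id == head.id and w.dir == child.dir)
            if child.dir == '-' and child.rp < 1000 - same_side:
                return False
            if child.dir == '+' and child.rp > 1000 + same_side:
                return False
        return True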

Other Definitional Constraints

ForAll AP, AllDistinct(AP)

Rule out impossible heads based on absolute position:

A.dir= ’-’ AND A.ap > B.ap ==> A.headID <> B.id} A.dir= ’+’ AND A.ap < B.ap ==> A.headID <> B.id} |A.ap - B.ap| < A.rp ==> A.headID <> B.id}

Among siblings, relative positions are unique. Also, smaller absolute positions correspond to smaller relative positions and conversely, larger absolute positions to larger relative positions:

A.id <> B.id AND A.headID = B.headID ==>
  [A.rp <> B.rp AND
   [[A.rp < B.rp AND A.ap < B.ap] OR [A.rp > B.rp AND A.ap > B.ap]]]

No head cycles: Let NoHeadCycle(Child, A) be defined recursively as
  Child.id <> A.headID;
  if A.headID > 0 then NoHeadCycle(Child, A.headID);
then impose NoHeadCycle(Child, Child).

Structural Constraint

The structural constraint for a sentence uses both projectivity and probabilistic information to define well-formedness, disallowing structures likely to be misinterpreted. It can be summarized as follows: for any node A positioned between a node Child and its head Head, it must be more likely that Child attaches to Head than to A (for otherwise A would likely be misinterpreted as the head of Child by a reader), and Head must be an ancestor of A. More precisely, let IsAncestor(A,Child) be a predicate that returns true if A is an ancestor of Child and returns false otherwise. Also let JointHCLikelihood(Head,Child) be a function that computes the joint probability of Head and Child. Then the constraint is:

Let Left = Min(Child.ap, Head.ap) and Right = Max(Child.ap, Head.ap); then
Left < A.ap < Right ==>
  JointHCLikelihood(Head, Child) > JointHCLikelihood(A, Child)
  AND IsAncestor(Head, A)

This well-formedness constraint still needs to be tested against a corpus of actual sentences, and would likely benefit from some refinements, not to mention extensions for non-projectivity. The JointHCLikelihood function also needs further refinement. Currently we just define it over the role and group-type features of both head and child, which was adequate for the experiment in this paper.
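In the same checker style, the structural constraint can be written as a test over complete trees; JointHCLikelihood is kept abstract here (a function argument), since the paper defines it only over the role and group-type features. Again, this is our illustrative restatement, not the implementation.

    def is_ancestor(a, node, by_id):
        # Follow head links upward from node; True if a is encountered.
        while node.head_id != 0:
            node = by_id[node.head_id]
            if node.id == a.id:
                return True
        return False

    def structure_ok(words, joint_hc_likelihood):
        by_id = {w.id: w for w in words}
        for child in words:
            if child.head_id == 0:
                continue
            head = by_id[child.head_id]
            left, right = sorted((child.ap, head.ap))
            for a in words:
                if left < a.ap < right:   # a lies between child and its head
                    if joint_hc_likelihood(head, child) <= joint_hc_likelihood(a, child):
                        return False
                    if not is_ancestor(head, a, by_id):
                        return False
        return True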

Feature Scores and Sentence Cost

With each of the features group type, position direction, relative position, and role of every node we associate a conditional log probability score. This score is equal to 0 (the log of 1) if the value of the feature is already given in the input. Otherwise it is equal to logprob(Feature | Parents). “Parents” in this context refers to statistical parents, which often, but not necessarily always, include features associated with the structural head of a node. To comply with Mozart’s implementation constraints, the log probabilities are converted to negative log likelihoods, multiplied by 10000, rounded, and then converted to integers. These feature scores are combined to compute a likelihood score for a whole sentence. The score for the whole sentence is interpreted as a cost to be minimized, and is the main constraint used to determine the optimal solution for an input. The potential parent features of each feature f in a node have been provisionally defined as shown in Table 3. To provide flexibility in the order in which features can be determined, the parents of a feature are adjusted at runtime according to the relative order in which they are determined. So the features actually used as parents of f are a subset of those in the table. Whether a parent feature is used for conditioning depends on whether the feature is associated with the head of the node or with the node itself.

Feature              Interdependencies
group type           head: group type; self: role, position direction, relative position
role                 head: role; self: group type, position direction, relative position
position direction   self: role, group type
relative position    self: role, group type; head: group type

Table 3. Statistical Feature Dependencies

The features associated with the head are always used if the ID of the head is known (which it is in the generation task but not the parsing task). The features associated with the node itself are used only if they are determined before f itself is. The table is defined symmetrically so that for each parent feature p of f, f is also listed as one of the parents of p. Thus, this procedure amounts to dynamic variation in the way a node’s joint probability is decomposed, but since all the possible decompositions are mathematically equivalent, the node’s score remains consistently defined. For example, suppose that for a particular node its role feature is given in the input (as it is in our experiments described later), and that the three other statistical features are determined in this order: position direction, relative position, and group type. Then the score for that node is computed as

NodeScore = FeatScore(role) + FeatScore(pd) + FeatScore(rp) + FeatScore(gt)
          = 0 + logprob(pd | role) + logprob(rp | role, pd, h_gt) + logprob(gt | h_gt, role, pd, rp)
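As a concrete rendering of this computation, here is a hedged Prolog sketch; given/2, determined_parents/3 and logprob/4 are assumptions standing in for the input specification, the runtime parent bookkeeping, and the treebank-derived frequency database, none of which are spelled out above.

:- use_module(library(apply)).   % foldl/4

% The score of a node is the sum of its per-feature scores.
node_score(Node, Score) :-
    foldl(add_feat_score(Node), [role, pd, rp, gt], 0, Score).

add_feat_score(Node, Feature, Acc, Acc) :-
    given(Node, Feature), !.                 % given in the input: log(1) = 0
add_feat_score(Node, Feature, Acc, Out) :-
    determined_parents(Node, Feature, Parents),
    logprob(Node, Feature, Parents, LP),     % conditional log probability
    Out is Acc + LP.

% Conversion to the integer cost used on the Mozart side:
% negate, scale by 10000, round.
score_to_cost(LogProb, Cost) :-
    Cost is round(-LogProb * 10000).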

2.3 Distribution Strategy and Search Algorithm

In Mozart the distribution strategy is independent of the search strategy. In other words, the shape of the search space is specified separately from the order in which subparts of the space are searched. For distribution, we define a procedure that selects, from among the variables associated with statistical scores, the variable with the greatest number of suspensions. When those are all determined, it then selects absolute position or head id variables based on smallest current domain size. The reason for this two-stage method is that absolute position and head id can be almost completely determined from the other features. Delaying them until last reduces the search space considerably. (Treating them the same as the others results in "out of virtual memory" errors.) Distribution on the domain values of variables that are associated with probabilistic scores is done in order of decreasing conditional likelihood of the values, given the same context that is used for calculating feature scores. In other words, the most likely value is searched first. Values that have not been seen in the given context are pruned, as long as there was at least one seen value for that context. The values of head id variables are ordered to try the nearer nodes (in terms of absolute position) first. Absolute position is distributed naively. For search we use the built-in branch-and-bound algorithm provided by the Explorer program. The solution-ordering constraint used to prune the search space is simply that the total cost of a new solution must be better than or equal to the cost of the previous best solution.
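An illustrative Prolog rendering of the two-stage variable selection (again, the actual implementation is in Oz/Mozart); undetermined/1, suspensions/2 and domain_size/2 are assumed reflection hooks of the underlying finite-domain solver.

:- use_module(library(apply)).   % include/3, foldl/4

% Stage 1: among undetermined score-bearing variables, pick the one with
% the most suspensions; once none remain, stage 2 picks the absolute
% position / head id variable with the smallest current domain.
select_variable(ScoreVars, LateVars, Var) :-
    include(undetermined, ScoreVars, Open),
    (   Open = [_|_]
    ->  best(Open, suspensions, max, Var)
    ;   include(undetermined, LateVars, Late),
        best(Late, domain_size, min, Var)
    ).

% Pick the element optimising Measure according to Mode (min or max).
best([V|Vs], Measure, Mode, Best) :-
    foldl(better(Measure, Mode), Vs, V, Best).

better(Measure, Mode, X, Acc, Out) :-
    call(Measure, X, MX),
    call(Measure, Acc, MA),
    (   Mode == max
    ->  ( MX > MA -> Out = X ; Out = Acc )
    ;   ( MX < MA -> Out = X ; Out = Acc )
    ).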

3 Experiments

We performed our experiments using the classic sentence “Time flies like an arrow.” The complete solution for this sentence with respect to the features we are using is shown in Table 4. Note that there is no element of our program that is tied/hardwired to this particular input.

Tok    Role         Group type  Absolute position  Relative position  Position direction  ID  Head id
time   subject      np          1                  -1                 -                   4   1
flies  top          clause      2                  0                  +                   1   0
like   role-marker  other       3                  -2                 -                   6   3
an     determiner   other       4                  -1                 -                   5   3
arrow  adjunct      np          5                  1                  +                   3   1
.      right-punc   other       6                  2                  +                   2   1

Table 4. Solution Sentence

Table 5 summarizes the experiments we performed on the sample sentence as well as the results. The first four experiments represent subproblems of parsing, and the last four are subproblems of generation. Each line of the table has a different combination of given versus unknown variables, organized in order of increasing difficulty within each type of subproblem. The time figure is the one reported by Mozart’s Explorer, but excludes garbage collection and time spent on database queries. (Our implementation includes three levels of caching for database queries, and the actual real time varied from less than a second to less than half a minute for the same input depending on the contents of the cache.) The third column reports the size of the search tree in terms of search nodes.

Given vs. Unknown               Time   Search Tree Size  Depth  Correct
role, ap, dir, rp vs. head      20ms   12                6      yes
role, ap, dir, rp vs. head, gt  30ms   28                10     yes
role, ap, dir vs. rp, head, gt  210ms  209               15     yes
role, ap vs. dir, rp, head, gt  320ms  259               17     yes
role, head, dir, rp vs. ap      10ms   3                 2      yes
role, head, dir, rp vs. ap, gt  40ms   21                6      yes
role, head, dir vs. rp, ap, gt  130ms  71                9      yes
role, head vs. dir, rp, ap, gt  420ms  194               11     yes

Table 5. Exploratory Experiments

The Depth column is the depth of the search tree. The last column shows that our program always arrived at the correct solution. This fact is our most interesting result. These experiments also help indicate how well the constraint program scales as the complexity of the problem is increased. From the times and search tree sizes, the program seems to scale well enough to merit further exploration of the CCP framework, especially since our implementation can quite probably be tuned to improve these figures. To give more of the flavor of how our approach works, Table 6 shows the relative likelihood of the top values in the domains of each of the probabilistic variables, given just the role. The values are listed in order of decreasing likelihood, so the most likely one is first. A value that does not appear in the table was not among the top seven values for that feature. The flag points out cases where the first value listed was not the correct answer for our sample sentence. Note that during runtime the actual conditioning features used can result in a different relative ordering than what is shown in the table.

4 Future Directions and Related Work

The approach described in this paper is still quite preliminary. Work remains to add a fuller linguistic representation, analyze the true statistical relationships between features, and refine various parts of our formulation. In particular, the propagation in our current representation is weaker than it could be. An approach using selection constraints as in [8] would be better. Our structural well-formedness constraint is also a prominent and very interesting area for further exploration. We wish to extend it to handle non-projectivity using probabilistic information. Our distribution and search strategies could also benefit from further study, as well as (of course) our overall linguistic model in terms of feature/value sets. Finally, an empirical evaluation against a test corpus of sentences is also needed. However, the initial results are very encouraging: we produce the correct result in roughly real time on a sample sentence using less than 400 lines of Mozart code (excluding comments). Note that probabilities are applied in three different ways in our formulation: as a hard constraint in defining legal dependency structures; as a soft constraint on the cost of a sentence, used both to find the optimal solution and to prune the search space; and thirdly, implicitly as part of the search strategy when choosing which value to try first for probabilistic variables. This goes beyond any other work we know of in leveraging probabilistic information. The work of [9] is the most closely related in this aspect: it describes using an A* heuristic for search in a constraint program, although the implementation does not seem to have been done yet.

Feature             Word/Role         Values                       Flag
position direction  time/subject      '-', '+'
                    flies/top         '+'
                    like/role-marker  '-'
                    an/determiner     '-'
                    arrow/adjunct     '-', '+'                     *
                    ./right-punc      '-'
relative position   time/subject      -1, -2, -3, +1, -4, -5, +2
                    flies/top         0
                    like/role-marker  -2, -1, -3, -4, -5, -6, -7
                    an/determiner     -1, -2, -3, -4, -5, -6, -7
                    arrow/adjunct     -1, +1, -2, +2, -3, +3, -4   *
                    ./right-punc      +2, +1, +3, +4, +5, +6, +7
group type          time/subject      np, other, clause
                    flies/top         clause, np, other
                    like/role-marker  other, np, clause
                    an/determiner     other
                    arrow/adjunct     other, np, clause            *
                    ./right-punc      other

Table 6. Relative likelihood of probabilistic domain values

Also related is the work on graded constraints in [10]. Since we use the same program to solve subproblems of both parsing and generation, our work is similar to research on bi-directional grammars. In contrast to that research, however, our formulation of the problem does not involve an explicit grammar (or lexicon) at all. Instead it relies on a database of annotated sentences derived from the English Penn Treebank [11,12]. From this database, our program can readily obtain frequency information for a variety of syntactic patterns, and use this information to solve for unknown feature values. The benefit of our approach is avoidance of the laborious process of developing linguistic resources manually. A constraint programming formulation of the parsing and generation problem makes the relationship between the two inverse faces of NLP very clear. It even makes it possible to blend the two types of problems in ways that are more useful in practice, making an NLP system overall more versatile than traditional parsers and generators. Doing so requires a flexible architecture that does not predetermine the order in which processing is done. Our initial work in this paper gives us hope that concurrent constraint programming can be the foundation for an efficient architecture with this flexibility.

5 Acknowledgements

Many thanks go to the anonymous reviewers, who provided many helpful comments.

References

1. Smedt, K.D., Horacek, H., Zock, M.: Architectures for natural language generation. In: Trends in Natural Language Generation. Springer, Berlin (1996)
2. Cahill, L., Doran, C., Evans, R., Mellish, C., Paiva, D., Reape, M., Scott, D., Tipper, N.: In search of a reference architecture for NLG systems. In: Proc. EWNLG (1999)
3. Cahill, L., Doran, C., Evans, R., Kibble, R., Mellish, C., Paiva, D., Reape, M., Scott, D., Tipper, N.: Enabling resource sharing in language generation: an abstract reference architecture. In: Proc. LREC (2000)
4. Cahill, L., Evans, R., Mellish, C., Paiva, D., Reape, M., Scott, D.: The RAGS reference manual. Technical report, Univ. of Brighton and Univ. of Edinburgh (2002)
5. Reiter, E.: NLG vs. templates. In: Proc. ENLGW '95 (1995)
6. Busemann, S., Horacek, H.: A flexible shallow approach to text generation. In: Proc. INLG Workshop (1998)
7. Pianta, E., Tovena, L.M.: Mixing representation levels: The hybrid approach to automatic text generation. In: Proc. of the AISB Workshop on "Reference Architectures and Data Standards for NLP" (1999)
8. Duchier, D.: Configuration of labeled trees under lexicalized constraints and principles. Journal of Research on Language and Computation (2003)
9. Dienes, P., Koller, A., Kuhlmann, M.: Statistical A-star dependency parsing. In: Prospects and Advances in the Syntax/Semantics Interface (2003)
10. Schroder, I.: Natural Language Parsing with Graded Constraints. PhD thesis, University of Hamburg (2002)
11. Marcus, M., Santorini, B., Marcinkiewicz, M.: Building a large annotated corpus of English: the Penn Treebank. Computational Linguistics 19 (1993)
12. Marcus, M., Kim, G., Marcinkiewicz, M., MacIntyre, R., Bies, A., Ferguson, M., Katz, K., Schasberger, B.: The Penn Treebank: Annotating predicate argument structure. In: ARPA Human Language Technology Workshop (1994)

A Constraint-Based Model for Preposition Choice in Natural Language Generation

Véronique Moriceau and Patrick Saint-Dizier

Institut de Recherches en Informatique de Toulouse, IRIT, 118 route de Narbonne, 31062 Toulouse Cedex, France. [email protected],[email protected]

Abstract. In this paper, we show how a constraint-based approach influences the modelling of preposition lexicalization in natural language generation. We concentrate on the linguistic description, which is the most challenging part; the CSP procedures themselves are then rather straightforward. Preposition choice depends on the verb and its requirements, on the one hand, and on the characteristics of the NP the preposition heads, on the other hand. These dependencies may induce somewhat contradictory expectations. A constraint-based approach introduces a declarative model of this complex relation, making it possible to identify all the lexical choices which can be predicted from lexical descriptions.

1 Introduction

With the development of WEBCOOP (Benamara et al. 2003), a system that produces cooperative answers to queries, the production of accurate natural language (NL) responses becomes essential. This implies great care in taking into account both user parameters (e.g. the terms used in the query) and linguistic constraints, in order to produce a response which is accurate in its contents and as fluid and well-formed as possible from a language point of view.

1.1 WebCoop output forms

The outputs of the WEBCOOP reasoning component are logical formulas that encode conceptual representations at a fine-grained level. These representations are true decompositional semantic representations; they are language independent, thus allowing the NL generator considerable freedom, so that it can take into account a variety of parameters, such as: the user's preferred terms, homogeneity of style, complexity of structures, etc. Representations are a conjunction of several types of predicates:

– those that type objects or variables (for example, flight(X), city(paris)),
– those that denote relations (such as prepositions: from(loc,X,Y)), and
– those that describe events (visit(E,X,Y)).

As an illustration, the output of WEBCOOP saying that all hotels in Cannes have a swimming pool is:
hotel(X) ∧ in(loc, X, cannes) ∧ equipment-of(X, Y) ∧ swimmingpool(Y)
Within such a decompositional approach, preposition choice is, in general, really difficult to handle. The choice of the preposition in in the above example is not straightforward. Most prepositions are indeed heavily polysemic, making lexical choices highly contextual. Elaborating constraints to model the lexical choices made by humans requires a detailed analysis of the parameters at stake and of how they interact. This is the main aim of this contribution.
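As a small illustration of how a generator might hold such a formula, here is a hedged Prolog sketch; the list-of-predicates representation is an assumption for exposition, not WEBCOOP's actual data structure (the predicate names mirror the example above).

% The Cannes example as a conjunction of typing predicates, relations
% and properties, held as a Prolog list.
webcoop_output([hotel(X), in(loc, X, cannes),
                equipment_of(X, Y), swimmingpool(Y)]).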

1.2 Facets of lexicalization

Lexicalization is the operation that associates a word or an expression to a concept. It is a major parameter in response production (Reiter and Dale, 1997); a good synthesis can be found in (Cahill, 1999). Lexicalization is often decomposed into two different stages: lexical choice, which occurs during content determination, where the lexical term chosen, which may still be underspecified, is dependent on reasoning procedures, the knowledge base contents, and grammatical constraints; and lexical variation, which is the choice of a particular word or form among possible synonyms or paraphrases. Lexical variation occurs in the surface realizer, and may have some pragmatic connotations, for example an implicit evaluation via the language level chosen (e.g. an argotic register entails a low evaluation). In this paper, we are mainly concerned with lexical choice. Some simple forms of lexical variation occur when two or more prepositions have exactly the same behavior w.r.t. the lexical description, which is rather infrequent.

1.3 Scope of the investigation

In this paper, we propose a declarative model that handles the complex relation verb-preposition-NP and a correlate, the relation deverbal noun-preposition-NP, which has a quite similar behavior. We concentrate on the linguistic description, which is by far the most challenging. This modelling and the subsequent NL performances are obviously heavily dependent on the accuracy and the adequacy of the linguistic descriptions provided, in particular lexical ones, including a number of usage variations like metaphors, which abound in preposition uses. The power of our model mainly lies in the way linguistic descriptions are organized and processed, in a way that avoids making too early and definitive choices. Thus, in our approach, CSP brings a new way (1) to organize and formulate lexical structures so that they can be adequately used, and (2) to state co-occurrence constraints between sets of potential candidates. We investigate PP constructions which are compositional. We also consider verb constructions where the verb incorporates the preposition, as, e.g., climb, which incorporates the preposition up by default; this verb requires an explicit preposition otherwise, as in climb down. We consider that in this type of construction the preposition is both part of the verb (as a particle) and of the head of the localization NP that follows. We exclude true verb-particle constructions, which are not common in Romance languages but frequent in Germanic languages (e.g. move up, clear up), as well as fixed forms (e.g. boil down), which are best treated as single units. Our study being carried out on French, most of our examples are in French with approximate English glosses.

2 The conflictual relation verb-preposition-NP

2.1 A motivational example

Let us first illustrate the problem by a simple example. The linguistic notions involved being quite complex, they will just be presented here, without any deep analysis or justification. A more detailed analysis is given gradually in the next sections. The formula person(jean) ∧ sea(X) ∧ to(loc, jean, X), which is a simplified form of the Lexical Conceptual Structure (Jackendoff 1990) used in WEBCOOP, conveys the idea of someone (jean) going to somewhere (the sea). If we want to construct a proposition out of this formula, then the expression R = to(loc, jean, X) is selected as the only predicate that can be realized as a verb. This form is quite abstract: it characterizes a priori any verb from the class 'movement verb to a certain location', movement being induced from the conceptual domain 'loc'. Verbs of this class are, for example: aller, marcher, se déplacer, monter, descendre, pousser, etc. (go, walk, move, ascend, descend, push), most of which incorporate manners and/or means. The same expression R also originates the production of the preposition required by the verb for its object, of type fixed localization (Cannesson et al. 2002). The two arguments required by the verb are a priori quite neutral, typed 'human' for jean and 'place' for sea. The choice of the preposition depends on the verb, but also on more pragmatic factors. Let us examine the above example in more depth. Since there are no explicit manners or means in the formula, let us select a verb which is as neutral as possible. In the lexicon, a verb such as aller (go) basically requires a preposition of type 'fixed position', since aller already incorporates the notion of movement, but not all such prepositions are acceptable: e.g. à is acceptable but not pour. In contrast, a verb like partir accepts both. Marcher would rather select sur, dans, à côté de as fixed positions, characterizing the space in which movement occurs. This situation is not specific to movement verbs: in (Mari and Saint-Dizier 2003), we show that the same kind of problems are raised by prepositions denoting instrumentality.

2.2 Stability over deverbals

Note that, although there is quite a large stability, prepositions acceptable for the relation verb-preposition-NP are not systematically acceptable for the relation deverbal noun-preposition-NP, although the semantics of the phrase is the same. For example, the preposition pour is not acceptable in *l'avion vole pour Paris (the aircraft flies to Paris), whereas it is acceptable in le vol pour Paris est complet (the flight to Paris is full) - the semantics of both phrases conveyed by the preposition semantic representation being the same. Conversely, the preposition à (to) is acceptable in le vol va à Paris (the flight goes to Paris), whereas it is not in *le vol à Paris est complet (the flight to Paris is full). In fact, the verb aller (go) incorporates a notion of destination, which explains why the preposition pour, denoting a notion of trajectory towards a goal, is not acceptable: among other considerations, because of the redundancy it introduces. The preposition à is more neutral in this respect and can be combined with aller. In the case of the relation deverbal noun-preposition-NP, the preposition used must denote a destination; this is why only le vol pour Paris is acceptable. Here vol simply denotes the event of a flight; it only weakly incorporates a notion of trajectory, and this notion needs some further lexicalization, as also shown in: a climb up the hill.

2.3 Derived uses

In addition, prepositions that indicate a direction, which can be interpreted (as a kind of metonymy) as denoting an area relatively well delimited in the distance, are also acceptable: aller vers la mer (go towards the sea). We call this phenomenon semantic coercion. Finally, a number of movement verbs also accept a more complex object structure, of type trajectory, including source and vias, but this structure often needs to be fully instantiated. Note that depending on the object type and on the type of movement induced by the verb, other fixed position prepositions are possible: monter sur la chaise (to go on the chair), aller sous le pont (to go under the bridge). The choice of the most prototypical preposition (which does not exclude others) crucially depends on geometrical parameters of the object, or on prototypical uses. Finally, pragmatic factors may interfere (pragmatic coercion). For example, if Jean is far from the sea, a preposition denoting a destination, interpreted as defining an area, is possible: marcher vers la mer (walk towards the sea), whereas it is not so appropriate if Jean is close to the sea. Movement verbs, in particular, are subject to a large number of metaphorical uses on their object PP: rentrer dans un projet (to enter into a project), passer sur un problème (to pass over a problem), monter dans les sondages (to rise in polls). In that case, the preposition choice heavily depends on the metaphorical interpretation of the object: for example, a project as an activity to be in, similarly to a place, or polls as a ladder to climb. We treat metaphors as a type coercion operation that changes the type of the object NP into a type appropriate for the preposition, and that translates the meaning of the PP (Moriceau et al. 2003). In this paper, we consider the result of such type coercion operations, assuming they are specified in the system, in conjunction with the lexicon.

2.4 Co-composition

A final point in the interaction verb-preposition-NP is the mutual compositional influence of the verb on the PP and vice-versa, called 'co-composition' in the generative lexicon (Pustejovsky 1995). This phenomenon remains compositional in its essence even if it requires non-monotonic treatments of semantic representations. A case such as aller contre un principe (to go against a principle) preserves the idea of progression of the verb, while contre overrides the fixed position naturally expected as the PP of the verb, to introduce a complex meaning which can be schematized as Opposition to a fixed, abstract position (here, a principle). Co-composition is still quite an open research field, essentially linguistic, which we will not discuss here.

3 Constraint domains

In this section, we present in more depth an analysis of the complex relation verb-preposition-NP, modelling the constraints imposed by each protagonist in terms of domains of potential realizations. The basic idea is to produce sets of structures in which each element includes all the parameters necessary for expressing constraints and for making appropriate lexicalizations. In section 4, we show how constraints expressed as equations can be stated, so that a priori all potentially acceptable preposition lexicalizations can be selected. Lexicalization decisions are thus delayed as much as possible, avoiding useless backtracking and also allowing long-distance dependencies to be taken into account. Starting from semantic representations, let us first investigate the domains associated with each element in the formula, which correspond to the following constructs to be determined and lexicalized: the verb (or the deverbal), the preposition and the NP. In section 4, we show how they interact.

3.1 Verbs

Our description of French verbs (Saint-Dizier 1998) heavily relies on the conceptual categories defined in WordNet. We implemented a three-level classification which seems to be sufficiently accurate to deal with preposition choice, as far as the verb is concerned. For example, for movement verbs, we have the following hierarchy:
1. 'local movement' (dance, vibrate)
2. 'movement': specified arrival, specified departure, trajectory (go)
3. 'medium involved': land, water, air/space, other types of elements (swim, skate)
4. 'movement realized by exerting a certain force': willingly or unwillingly (push)
5. 'movement with a certain direction': upwards, downwards, forward, backward, horizontal (rise, ascend, climb)
6. 'movement accompanied' (carry)
7. 'beginning/stopping/resuming/amplifying a movement' (stop, accelerate)
8. 'activity related to a movement' (drive).

We have a total of 292 verb classes organized around 17 main families borrowed from WordNet (e.g. verbs of body care, possession, communication, change, etc.). Each lower-level class is associated with a by-default subcategorization frame (corresponding to the prototypical verb of the class) that describes the structure of the verb arguments, in particular the type of the preposition and of the argument normally expected. Each class also receives a by-default LCS representation, which can then be adapted to each verb, to capture some sense variations or specializations. For example, the movement verb with specified arrival subclass is represented as follows:
[ class: movement with specified arrival
  prototypical verb: aller (go)
  subcat: [X: NP(human), Y: PP(localization)]
  lcs: to(loc, X, Y) ]
As defined in (Cannesson et al. 2001), to is a primitive that potentially covers a number of surface realizations. Verbs often occur in different classes, even for a given sense: within a generative approach like ours, this characterizes the different facets the verb may take, as well as some of its frequent metaphoric uses. In fact, some very frequent metaphoric uses may get the status of a specific sense, with its own autonomy. Verbs that incorporate a preposition by default (e.g. climb, enter) are marked as such in their subcategorization frame, informally:
enter: [NP, PP(prep: type: 'inside', incorporated)]
resulting, in by-default situations, in the non-surface realization of the preposition. This phenomenon is implemented as a verb-local constraint, and does not alter our model, since it is a lexical constraint. From an NL generation point of view, starting from a semantic representation such as to(loc, jean, X), we get a set of verb lexicalizations (or equivalent deverbal nouns), or a more intentional specification such as a verb class. This is obtained via unification or subsumption between the representations in individual verb lexical entries, or the semantic representation specified at the verb class level, and the semantic representation R. The domain induced is a set of tuples verb(A,B,C,D) that meet the subsumption constraint. The domain is the following:

Domain(verb(R)) = {∪ verb(VClass, RR, scat(Type-Prep, Type-PP), Lexicalization) ∧ subsumes(RR, R)}

where, in the verb lexical entry, RR is the verb semantic representation in the formula subsumed by R (subsumption in LCS is defined in (Saint-Dizier 2001)), which originates the construction of the domain. VClass is the verb class, Type-PP is the type normally expected for the object PP, and scat contains the subcategorization information. We have, in our example, the abstract form R = to(loc, Y, X) and the set of verbs of the class 'movement, specified arrival', associated with their subcategorization frame, which is constructed via unification

(see the example above). The same situation occurs with the corresponding deverbal nouns.
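A hedged Prolog sketch of this domain construction follows; the verb/4 facts and the use of subsumes_term/2 (available in, e.g., SWI-Prolog) as the LCS subsumption test are simplifications of the lexicon and the subsumption relation described above, not the system's actual encoding.

% verb(VClass, RR, scat(TypePrep, TypePP), Lexicalization)
verb(movement_specified_arrival, to(loc, _, _),
     scat(fixed_position, fixed_position), aller).
verb(movement_specified_arrival, to(loc, _, _),
     scat(fixed_position, fixed_position), arriver).

verb_domain(R, Domain) :-
    findall(verb(Class, RR, Scat, Lex),
            ( verb(Class, RR, Scat, Lex),
              subsumes_term(RR, R) ),        % RR at least as general as R
            Domain).

For example, verb_domain(to(loc, 5564, paris), D) collects the aller and arriver tuples, exactly the "set of tuples meeting the subsumption constraint" defined above.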

3.2 Nouns and NPs

The head noun of the NP must in some way meet the expectations of the verb subcategorization frame. The domain of the head noun is its semantic type (and all its descendants, via subsumption in the domain ontology). Derived types may come from possible metaphorical and metonymic uses, implemented in our framework (Moriceau et al. 2003) via type coercion rules.

Type coercion Type coercion rules are modelled globally by means of constrained rewriting rules. They implement various forms of metaphors and metonymies. Let Σ be the set of such coercion rules σ, functions of the verb class and of the semantic types expected for the object PP:
derived-type = σ(verb(VClass, Type-PP)),
and let TC be the transitive closure of this operation on the verb at stake (i.e. the set of all elements which can be derived). TC is finite and is defined via recursive enumeration. In our approach, derived types are not so numerous: there is no recursive application of type coercion, resulting in a quite small and stereotyped set of derived types. For example, a frequently encountered metaphorical construction such as entrer dans un projet (to enter in a project), where project is typed as 'epistemic artefact', requires a type coercion rule of the form:
σi: epistemic artefact = σ(verb(enter, localization)).
The coercion rule is here restricted to the subclass of 'enter' verbs, since the metaphor is not fully regular for all movement verbs with specified arrival. This means that an accurate description of metaphors involves a large number of rules if one wants to avoid overgeneration, a major problem in NL generation. Then, if R1 is the semantic representation of the noun, the domain of the NP is a set of triples of the form:
Domain(NP(R1)) = {∪ noun(R1, head-noun-type, Lexicalization), noun(_, derived-type, Lexicalization)}
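The following is a hedged sketch of how one coercion rule of the Σ family and the NP domain could be encoded in Prolog; coercion/3 and the type names are illustrative stand-ins for the rule set and the ontology, not the system's actual rules.

% For 'enter'-class verbs expecting a localization, an epistemic
% artefact (e.g. a project) is an admissible derived type
% ("entrer dans un projet").
coercion(enter_class, localization, epistemic_artefact).

% The NP domain: the direct entry plus one entry per applicable coercion.
np_domain(R1, HeadType, Lex, VClass,
          [noun(R1, HeadType, Lex) | Derived]) :-
    findall(noun(R1, DerivedType, Lex),
            coercion(VClass, _Expected, DerivedType),
            Derived).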

3.3 Prepositions

Most prepositions are highly polysemic and are often involved in a large number of derived, unexpected or metaphorical uses. Analyzing and representing the semantics of prepositions, and generating appropriate prepositions in natural language generation, is a rather delicate task, but one of much importance for any application that requires even a simple form of understanding. Spatial and temporal prepositions have received a relatively in-depth study for a number of languages (e.g. (Verkuyl et al. 1992)). The semantics of the other types of prepositions, describing manner, instrument (Mari and Saint-Dizier 2003), amount or accompaniment, remains largely unexplored. In this section, for readability purposes, we introduce some background on preposition semantics, mainly a general classification and semantic description we carried out a few years ago (Cannesson et al. 2002). Our work is on French, but a number of elements are stable over a variety of languages, in particular Romance languages.

Identifying preposition senses Although prepositions have a number of idiosyncratic usages (probably much fewer in French than in English), most senses are relatively generic and can be characterized using relatively well-known and consensual abstract labels, as shown in (Cannesson et al. 2002) for French. To illustrate this point, let us consider the case of par. Par has the following 6 senses, which seem to be all approximately at the same level of abstraction:
– distribution: il gagne 1500 Euros par mois (he earns 1500 Euros per month),
– causality: as in passives, but also e.g. in par mauvais temps, je ne sors pas (by bad weather I don't go out),
– origin: je le sais par des amis (I know it from friends),
– via: je passe par ce chemin (I go via this path),
– tool or means: je voyage par le train (I travel by train),
– 'approximate' value: nous marchons par 3500m d'altitude (we hike at an altitude of 3500 m).
In a second stage, we have provided a conceptual representation of these senses, based on the Lexical Conceptual Structure (LCS, Jackendoff 90), which relies on a language of primitives, viewed as linguistic macros, which can be interpreted in various frameworks, such as Euclidean geometry. The LCS also introduces a decompositional approach to meaning which allows for the development of an accurate and abstract theory of lexicalization, in particular for prepositions, which it represents particularly well.

A general typology for prepositions Here is a first classification proposal for French prepositions (see also (Cannesson et al. 2002)). We have identified three levels of decomposition which are quite regular and have an internal stability: family, facet and modality. Only the first two levels are introduced here. Labels for semantic classes are intuitive and quite often correspond to thematic role names (examples are direct translations from French); these are the labels specified in verb subcategorization frames:
– Localization, with facets: source, destination, via/passage, fixed position. Destination may be decomposed into destination reached or not (possibly vague), but this is often contextual. Fixed position can either be vague (he is about 50 years old) or precise. From an ontological point of view, all of these senses can, a priori, apply to spatial, temporal or abstract arguments.
– Quantity, with facets: numerical or referential quantity, frequency and iterativity, proportion or ratio. Quantity can be either precise (temperature is 5 degrees above 0) or vague. Frequency and iterativity: he comes several times per week.
– Manner, with facets: attitudes, means (instrument or abstract), imitation or analogy. Imitation: he walks like a robot; he behaves according to the law.
– Accompaniment, with facets: adjunction, simultaneity of events, inclusion, exclusion. Adjunction: flat with terrace / steak with French fries / tea with milk; exclusion: they all came except Paul.
– Choice and exchange, with facets: exchange, choice or alternative, substitution. Substitution: sign for your child; choice: among all my friends, he is the funniest one.
– Causality, with facets: causes, goals and intentions. Cause: the rock fell under the action of frost.
– Opposition, with two ontological distinctions: physical opposition and psychological or epistemic opposition. Opposition: to act contrary to one's interests.
– Ordering, with facets: priority, subordination, hierarchy, ranking, degree of importance. Ranking: at school, she is ahead of me.
– Minor groups: about, in spite of, comparison. The terms given here are abstract, and cover several prepositions. About: a book about dinosaurs.

Each of the subsenses described above is associated with a number of preposition senses, clearly distinct from other senses. Here is a brief description of the Ordering class:

Fig. 1 - prepositions of the Ordering family

Facet                 Prepositions
Priority              après (after), avant (before)
Subordination         sous (under), sur (above)
Hierarchy             devant (in front of), derrière (behind), avant (before), après (after)
Ranking               devant (in front of), derrière (behind)
Degree of importance  à côté de, auprès de (close to, near), par rapport à, pour, vis-à-vis de (compared to)

A general conceptual semantics for prepositions Each preposition facet or modality has a unique representation. For example, two major senses of the preposition avec (with) are:
– accompaniment, represented, in the simplified LCS representation we developed, as: with(loc, I, J), where 'loc' indicates a physical accompaniment (I go to the movies with Maria), while 'psy' instead of 'loc' would metaphorically indicate a psychological accompaniment (Maria investigated the problem with Joana).
– instrument, represented as: by-means-of(manner, I, J) (they opened the door with a knife). This is, in fact, a generic representation for most preposition senses introducing instruments; a more refined analysis can be found in (Mari et al. 2003).

In our framework, we defined 65 primitives encoding 170 preposition senses, which seem to cover most senses found in standard texts over a number of languages. Language being essentially generative and creative, this obviously does not exclude other senses, for which, most probably, a few additional primitives, or the composition of already defined ones, will be necessary.

Semantic and pragmatic coercion Semantic and pragmatic coercion on the type of preposition expected by the verb can be modelled as follows. Let Υ be the set of such coercion rules τ, functions of the verb class and of the semantic class of the preposition:
derived-subclass = τ(verb(VClass, Type-Prep)).
For example, parler sur le vague (litt. to talk on vagueness) has a PP characterizing a topic of conversation. The default preposition in French, specified in the lexical entry of parler, is de (about). The preposition sur introduces the idea of a general discussion; we then have the following pragmatic coercion rule:
sur = τ(verb(talk, about)).
We limit here preposition classes to a precise preposition, to avoid potential overgeneration. From a more formal point of view and more generally, let TT be the transitive closure of this operation on the verb at stake (i.e. the set of all subclasses which can be derived). TT is finite and is defined via recursive enumeration from coercion rules and lexical descriptions. TT defines the semantic and pragmatic expansion of the verb-preposition compound. Back to our example, the preposition is primarily induced by the semantic representation R2 = to(loc, Y, X). More generally, the domain of potential lexicalizations is characterized by a set of triples as follows:

Domain(prep(R2)) = {∪ (prep(Subclass, RestrNP, RR2, Lexicalization) ∧ subsumes(RR2, R2)), prep(derived-subclass, _, _, Lexicalization)}

where, in the lexical entry 'prep', Subclass designates the modality or the facet level as exemplified above, RestrNP are the selectional restrictions imposed on the NP headed by the preposition, RR2 is the semantic representation which must be subsumed by R2, and Lexicalization is the surface realization(s) of RR2 (this implements lexical variation, when there is a strict lexical equivalence). The second set of triples prep is the transitive closure (produced by recursive enumeration) of all potential type coercions, given the verb identified or its class.

4 Preposition Lexicalization as a CSP

In the previous section, we showed how domains of lexical entries and associated lexicalizations can be constructed, via unification, subsumption, and type, semantic and pragmatic coercion. The challenge is now to express and manage the constraints of the triple verb-preposition-NP, so that all possible preposition lexicalizations can be made explicit, allowing for the generation of the VP.

4.1 Constraints between domains as equations

Let us first recall the three domains defined above:
Domain(verb(R)) = {∪ verb(VClass, RR, scat(Type-Prep, Type-PP), Lexicalization) ∧ subsumes(RR, R)}
Domain(np(R1)) = {∪ noun(R1, head-noun-type, Lexicalization), noun(_, derived-type, Lexicalization)}
Domain(prep(R2)) = {∪ (prep(Subclass, RestrNP, RR2, Lexicalization) ∧ subsumes(RR2, R2)), prep(derived-subclass, _, _, Lexicalization)}

From these specifications, it is possible to state constraints in the form of equations, from which pairs (verb lexicalization, preposition lexicalization) can be produced, assuming the noun has a fixed type and a limited number of lexicalizations which cannot a priori be revised. We have the following equations:

1. Type-PP subsumes (a) head-noun-type or (b) an element s among the derived types obtained via type coercion. In more formal terms:
   subsumes(Type-PP, head-noun-type) ∨ (∃σ ∈ Σ, s = σ(Type-PP) ∧ subsumes(Type-PP, s)).
2. RestrNP subsumes head-noun-type or s:
   subsumes(RestrNP, head-noun-type) ∨ subsumes(RestrNP, s).
3. Type-Prep subsumes (a) Subclass in direct usages or (b) a derived Subclass obtained via semantic or pragmatic coercion:
   subsumes(Type-Prep, SubClass) ∨ (∃τ ∈ Υ, t = τ(Type-Prep) ∧ subsumes(Type-Prep, t)).

The satisfaction of these equations produces the lexicalization set, composed of pairs (verb, preposition). Lexical choice can operate on these pairs to select one of them, possibly on the basis of user preferences. Preposition incorporation into verbs is also handled at this level. As explained in the introduction, provided the restrictions in the various lexical entries are adequately described, and the type, semantic and pragmatic coercion rules are appropriate, our system does not overgenerate. Overgeneration is solely due to incomplete or inadequate linguistic descriptions. A priori, our system is sound and complete and produces all the admissible solutions w.r.t. lexical specifications and coercion operations.
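A hedged, self-contained Prolog sketch of equations 1-3 over a toy lexicon follows (the coercion disjuncts are omitted for brevity); the verb/4 and prep/4 facts, the subsumes_type/2 ontology, and the use of subsumes_term/2 for LCS subsumption are illustrative assumptions, not the system's actual encoding.

% Toy lexicon, following the domain definitions above.
verb(movement_to_destination, to(loc, _, _),
     scat(fixed_position, fixed_position), aller).
prep(fixed_position, fixed_position, to(loc, _, _), 'à').
prep(destination,    fixed_position, to(loc, _, _), pour).

% Toy type ontology: subsumes_type(General, Specific).
subsumes_type(T, T).
subsumes_type(fixed_position, city).

% Pairs (Verb, Prep) satisfying the three equations for the semantic
% representation R and head-noun type HType.
lexicalization_pairs(R, HType, Pairs) :-
    findall(VLex-PLex,
            ( verb(_, RR, scat(TypePrep, TypePP), VLex),
              subsumes_term(RR, R),
              subsumes_type(TypePP, HType),         % equation 1
              prep(SubClass, RestrNP, RR2, PLex),
              subsumes_term(RR2, R),
              subsumes_type(RestrNP, HType),        % equation 2
              subsumes_type(TypePrep, SubClass) ),  % equation 3
            Pairs).

On this toy lexicon, lexicalization_pairs(to(loc, 5564, paris), city, P) yields only aller-'à'; pour is rejected by equation 3, mirroring the flight example worked out in the next subsection.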

4.2 A direct illustration

Let us illustrate the satisfaction of these constraints by a simple example: flight(5564) ∧ to(loc, 5564, Paris) ∧ city(Paris), where to(loc, 5564, Paris) is at the same time R (the semantic representation of the verb) and R2 (the semantic representation of the preposition); city(Paris) is R1 (the semantic representation of the noun of the PP). The three domains are:
Domain(verb(R)) = {
  verb(movement-verb-to-destination, to(loc, X, Y), scat(fixed-position, fixed-position), aller),
  verb(movement-verb-to-destination, to(loc, X, Y), scat(fixed-position, fixed-position), monter),
  verb(movement-verb-to-destination, to(loc, X, Y), scat(fixed-position, fixed-position), arriver),
  ... ∧ subsumes(to(loc, X, Y), to(loc, 5564, Paris))}

Domain(np(R1)) = { noun(city(Paris), city, Paris) }
Domain(prep(R2)) = {
  prep(fixed-position, fixed-position, to(loc, X, Y), à),
  prep(destination, fixed-position, to(loc, X, Y), pour),
  prep(destination, fixed-position, to(loc, X, Y), en),
  ∧ subsumes(to(loc, X, Y), to(loc, 5564, Paris)) }
Let us apply the constraints:

1. subsumes(Type-PP, head-noun-type) → subsumes(fixed-position, city)
2. subsumes(RestrNP, head-noun-type) → subsumes(fixed-position, city)
3. subsumes(Type-Prep, SubClass) → subsumes(fixed-position, fixed-position)

The solution domain is: {(aller, à), (monter, à), (arriver, à), ...}.

With respect to the lexical descriptions, these three solutions are acceptable, although monter à is less natural for a flight than, e.g., for a car.

4.3 More complex situations

Our model deals with compositional (or co-compositional) forms. It naturally discards constructions which do not meet the selection and possibly the coercion constraints. For example, let person(jean) ∧ to(loc, jean, Y) ∧ wall(Y) be the semantic representation of the metaphorical semi-fixed form Jean part dans le mur (Jean goes into the wall = Jean fails (in an enterprise), as a car going into a wall is going to be broken). In this example, partir is a movement verb with specified destination; dans can be acceptable provided the NP has an inside into which the subject can go, which is not the case for mur (wall). The three domains are (without type coercion):
Domain(verb(R)) = {
  verb(movement-verb-to-destination, to(loc, X, Y), scat(destination, fixed-position), partir),
  verb(movement-verb-to-destination, to(loc, X, Y), scat(destination, fixed-position), courir),
  ... ∧ subsumes(to(loc, X, Y), to(loc, jean, Y))}
Domain(np(R1)) = { noun(wall(Y), compact-object, mur) }
Domain(prep(R2)) = {
  prep(destination, fixed-position, to(loc, X, Y), vers),
  prep(destination, fixed-position, to(loc, X, Y), pour),
  prep(destination, fixed-position, to(loc, X, Y), à destination de),
  ... ∧ subsumes(to(loc, X, Y), to(loc, jean, Y)) }

Let us now apply the equations between domains:
1. subsumes(Type-PP, head-noun-type) → subsumes(fixed-position, object)
2. subsumes(RestrNP, head-noun-type) → subsumes(fixed-position, object)
3. subsumes(Type-Prep, SubClass) → subsumes(destination, destination)

The solution domain is: {(partir, vers), (partir, pour), (partir, à destination de), ...}. This solution domain does not contain the expected metaphoric solution partir dans le mur, because of the type of wall, which is not an object with an inside. This is due to the fact that the preposition used, dans (in), does not belong to the preposition class destination. If we now add the domain engendered by the corresponding type coercion rule, dans = τ(verb(partir, destination)), assuming destination is the type of preposition normally expected by the verb, the solution domain now includes the form (partir, dans), mur (wall) being correctly typed as a fixed position. In fact, this metaphor applies to most movement verbs with specified destination, each of them making more explicit a manner of going into a wall. Let us also note that while (partir, vers) and (partir, pour) can be combined with a number of metaphors in abstract domains, (partir, à destination de) essentially expects an object NP of type physical location. Finally, this approach allows us to treat the verb-preposition-NP constraints in a way that is independent of their syntactic realization. For example, the above example with a left extraposition of the PP (due to a stressed focus) and a nested proposition, C'est pour Paris que part ce vol (it is for Paris that this flight leaves), is treated in exactly the same way. This allows us to describe the lexical choice of the pair verb-preposition at the highest level of abstraction, independently, a priori, of its syntactic realizations.

5 Conclusion

In this paper, we presented a model that describes in a declarative way the complex interactions between the verb, the preposition and the object NP. We focussed on the lexicalization of the preposition, which is a major difficulty in language generation. Based on the notion of domain, we introduced equations that manage these interactions and which, after resolution, propose a set of lexicalizations for the pair verb-preposition. We considered regular cases as well as derived ones, in particular a number of metaphors, modelled by means of type, semantic and pragmatic coercion rules.
We proposed in this paper a declarative model which has a priori completeness and soundness properties. These properties, as for e.g. grammars, entirely depend on the quality of the linguistic descriptions and restrictions, without which the system is just an empty shell. In terms of implementation, this work is being integrated into a larger set of tools for language generation (including, among others: NP, PP, proposition and syntactic alternation generation). Such a set of tools, based on a decompositional representation of meaning, is of much interest and importance in language generation, where there are very few generic tools available. This work will be used and validated in the WEBCOOP project, where short statements with a large number of PPs need to be generated. Implementations are under study. A simple solution is to manage set intersections, sets being viewed as finite domains. It is possible, furthermore, to define some of those sets (in particular those related to coercion) a priori, for each verb, by means, e.g., of partial evaluation techniques. We would like, on top of these set intersection constraints, to be able to introduce linguistic preferences, be they general or related to a user. However, these types of heuristics can only operate after construction of the set of admissible lexicalizations. It would be of much interest to have them interact as early as possible in the resolution process, in order to limit complexity. At a linguistic level, this study involves only knowledge from lexical descriptions. It is abstract and not committed a priori to any grammatical form. This means that nothing prevents it from being integrated, with, obviously, some customization, into a linguistic theory such as HPSG, PP (via the projection principle), or LFG, and possibly TAGs, since these theories are centered around lexical descriptions.
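As a hedged illustration of the set-intersection idea just mentioned (ordered sets standing in for finite domains; the per-equation candidate sets are assumed to have been computed beforehand):

:- use_module(library(ordsets)).
:- use_module(library(apply)).

% Admissible (verb, preposition) pairs: the intersection of the
% candidate sets contributed by each equation.
admissible_pairs([S0|Sets], Admissible) :-
    foldl(ord_intersection, Sets, S0, Admissible).

% ?- admissible_pairs([[aller-'à', partir-'à'], [aller-'à']], P).
%    P = [aller-'à'].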

Acknowledgements. This project is partly supported by the CNRS TCAN program; we are grateful to its members for their advice. We also thank the anonymous reviewers, who helped improve this work.

References

1. F. Benamara and P. Saint-Dizier. WEBCOOP: a Cooperative Question-Answering System on the Web. EACL project notes, Budapest, Hungary, April 2003.
2. E. Cannesson, P. Saint-Dizier. Defining and Representing Preposition Senses: a Preliminary Analysis. In ACL02-WSD, Philadelphia, July 2002.
3. L. Cahill. Lexicalisation in Applied NLG Systems. Research report, ITRI-99-04, 1999.
4. R. Jackendoff. Semantic Structures. MIT Press, 1990.
5. V. Moriceau and P. Saint-Dizier. A Conceptual Treatment of Metaphors for NLP. ICON, Mysore, India, December 2003.
6. J. Pustejovsky. The Generative Lexicon. MIT Press, 1995.
7. E. Reiter, R. Dale. Building Applied Natural Language Generation Systems. Journal of Natural Language Engineering, volume 3, number 1, pp. 57-87, 1997.
8. P. Saint-Dizier. Alternations and Verb Semantic Classes for French. In Predicative Forms for NL and LKB, P. Saint-Dizier (ed), Kluwer Academic, 1998.
9. P. Saint-Dizier and G. Vazquez. A Compositional Framework for Prepositions. IWCS4, Springer, lecture notes, Tilburg, 2001.
10. H. Verkuyl and J. Zwarts. Time and Space in Conceptual and Logical Semantics: the Notion of Path. Linguistics 30: 483-511, 1992.
11. A. Mari and P. Saint-Dizier. A Conceptual Semantics for Prepositions Denoting Instrumentality. ACL-SIGSEM Workshop: The Linguistic Dimensions of Prepositions and their Use in Computational Linguistics Formalisms and Applications, Toulouse, France, 2003.

Rapid Software Prototyping of an Arabic Morphological Analyzer in CLP

Hamza Zidoum

Department of Computer Science, SQU University, PO BOX 36, Al Khod PC 123, Oman [email protected]

Abstract. This paper presents an Arabic Morphological Analyzer and its implementation in ECLiPSe, a constraint logic programming language. The Morphological Analyzer (MA) represents a component of an architecture which can process unrestricted text from a source such as the Internet. The morphological analyzer uses a constraint-based model to represent the morphological rules for verbs and nouns, a matching algorithm to isolate the affixes and the root of a given word-form, and a linguistic knowledge base consisting of lists of markers. The morphological rules fall into two categories: the regular morphological rules of the Arabic grammar and the exception rules that represent the language's exceptions. ECLiPSe is particularly suitable for rapid prototyping of a morphological analyzer for Arabic thanks to its double reasoning: symbolic reasoning expresses the logic properties of the problem (linguistic rules) and facilitates the implementation of the linguistic knowledge base and heuristics, while constraint satisfaction reasoning on finite domains uses constraint propagation to keep the search space manageable while stemming the tokens.

1 Introduction

Morphology is the study of meaningful units in language (morphemes) and how they combine to form words. Hence, a morphological analysis module is inherent to the architecture of any system that is intended to allow a user to query a collection of documents, process them, and extract salient information, as for instance systems that handle Arabic texts and retrieve information expressed in Arabic over the Internet [11].

In this paper we present the implementation of a prototype Arabic morphological analyzer, intended to be a component of an architecture that can process unrestricted text, within a Constraint Logic Programming (CLP) framework [12], [13], [14]. CLP is particularly suitable for the implementation of such a system thanks to its double reasoning: symbolic reasoning expresses the logic properties of the problem and facilitates the implementation of the linguistic knowledge base and heuristics, while constraint satisfaction reasoning on finite domains uses constraint propagation to keep the search space manageable. Constraints present an overwhelming advantage: declarativity. Constraints describe what the solution is and leave the question of how to solve them to the underlying solvers. A typical constraint-based system has a two-level architecture consisting of (1) a programming module, i.e. a symbolic reasoning module that expresses the logic properties of the problem, and (2) a constraint module that provides a computational domain, such as reals, booleans, sets, finite domains, etc., and reasoning about the properties of the constraints, such as satisfiability and reduction, via algorithms known as solvers. Constraints thus reduce the gap between the high-level description of a problem and the implemented code.

The objective is to process a text in order to facilitate its usage by a wide range of further applications, e.g. text summarization, translation, etc. The system is intended to process unrestricted text. Hence, the criteria of robustness and efficiency are critical and highly desirable. To fulfill these criteria, we made the following choices:
1. Avoid the use of a dictionary, as is the case in classical morphological analyzers. Indeed, the coverage of such a tool is limited to a given domain and cannot cope with unrestricted texts from dynamic information sources such as the Internet.
2. Deal with unvoweled texts, since most Arabic texts available on the Internet are written in modern Arabic, which usually does not use diacritical marks.
In order to implement the morphological analyzer, we used the contextual exploration method [1]. It consists of scanning a given linguistic marker and its context (surrounding tokens in a given text), looking for linguistic clues that guide the system to make the suitable decision. In our case, the method scans an input token and tries to find the required affixes in order to associate the root-form and the corresponding morpho-syntactic information. Arabic is known to be a highly inflectional language; its famous pattern model using the CV (Consonant, Vowel) analysis has widely been used to build computational morphological models [2, 3, 4]. During the last decade, there has been increasing interest in implementing Arabic morphological analyzers [5, 6, 7, 8, 9, 10]. Almost all systems developed, in industry as well as in research, make use of a dictionary to perform morphological analysis. For instance, in France, the Xerox Centre developed an Arabic morphological analyzer [7] using the finite-state technique. The system uses an Arabic dictionary containing 4930 roots that are combined with patterns (an average of 18 patterns for every root). This system analyses words that may include full diacritics, partial diacritics, or no diacritics; if no diacritics are present, it returns all the possible analyses of the word. The remaining sections of this paper are organized as follows. Section 2 describes the system design and its architecture. Section 3 is dedicated to the description of the regular rules. The matching algorithm is described in Section 4. Finally, Section 5 concludes the paper with future directions to extend this work.

2 System Design and Implementation

Since the intended system is concerned with the morphology of verbs and particles only, we prefer the part-of-speech classification, in which an Arabic word can be a verb, a noun or a particle (Fig. 1).

Arabic Word -> Noun | Verb | Particle

Fig. 1. A classification of Arabic words according to the part of speech

A great deal of the system's processing is based on this view.

One possible way of viewing the structure of a token, among several possible ones, and a convenient one, is the following. The main part of the token is the stem (whether noun, verb or particle). It is the inner part, surrounded (possibly) by: the prefix part, positioned before the main part, and the suffix part, positioned after the main part. In the remainder of the paper, we concentrate on automating verb morphology and leave particle and noun morphology for further work.

The following example shows the analysis of an Arabic sentence. This is similar to what the system should do in principle. The analyzed text is verse 186 of the second chapter of the Holy Qur'an.

Table 1 shows a segmentation of each word in the previous verse. The segmentation follows the affix and body/stem schemes discussed in this paper. Notice that the example shows some nouns for explanation purposes, but the system deals with verbs and particles only. The segmentation process shown in the table is not the only task the system is designed to do; extracting useful information from these segments is another important function of the system. These functions do not depend on and make no use of the diacritical marks appearing in the example.

[Table 1 lists, for each Arabic token of the verse, its prefix, body, root, measure and suffix; the Arabic-script entries did not survive text extraction.]

Table 1. Segmentation/Stemming of Words Example

The morphological analyzer finds all word root forms and associates the morpho- syntactic information to the tokens. Within the text representation, a token includes the following fields. 1. Name, which contains the name of the token. Every token is assigned a name that allows the system to identify it. 2. Value, which is the word-form as it appears in the text before any processing. 3. Root, stores the root of the word that is computed during morphological analysis. 4. Category, Tense, Number, Gender and Person are the same fields as the regular- rules. 5. Rule applied, stores the identifier of the rule that has been fired to analyze the token. 6. Sentence, a reference to the object "Sentence" containing the token in the text. This is a relationship that holds between the token and its corresponding sentence. 7. Order, stores the rank of the token in sequence from the beginning of the text. It is used to compare the sequential order of tokens in the text. 8. Positions, correspond to the offset positions of the token in the text. It is used to highlight a relevant token when displaying the results to the user. 9. Format, is the associated format (boldface, italics…) applied to a token in the text. The morphological analyzer (Fig.2) includes two kinds of rules: regular morphologi-

The morphological analyzer (Fig. 2) includes two kinds of rules: regular morphological rules, and exception rules that represent the morphological exceptions. Lists of exceptions contain all the markers that do not fall under the regular rules category. When analyzing input tokens, the matching algorithm attempts to match the affixes of the token against a regular rule. If it does not succeed, it attempts to apply an exception rule by looking into the exception lists.


Fig. 2. The morphological analyzer architecture.

In view of a rapid-prototyping software implementation strategy, only the regular rule specifications have been considered for implementation so far; after a test phase, the implementation of the exceptions is underway. We used the Constraint Logic Programming language ECLiPSe [14] to build the Arabic morphological analyzer, in order to benefit from several advantages. CLP is particularly suited to rapid prototyping because it provides a high level of abstraction. The reason lies in the neat separation between the declarative model (how to express the problem to be solved) and the operational model (how the problem is actually solved). A typical constraint-based system has a two-level architecture: a programming module, i.e. a symbolic reasoning module, expresses the logic properties of the problem and facilitates the implementation of the linguistic knowledge base and heuristics, while constraint satisfaction reasoning on (finite) domains uses constraint propagation to keep the search space manageable [12, 13, 14].

3 Regular Rules

Regular Arabic verb and noun forms have a fixed pattern of the form "prefix+root+suffix"; thus they can be handled by automatic procedures, since the identification of the affixes is enough to extract the root form and associate the morpho-syntactic information. A regular rule models a spelling rule for adding affixes. The structure of a regular rule consists of nine fields that can be grouped into three classes: i) Name and Class identify the object in the system; ii) Prefix and Suffix store the prefix and suffix that are

attached to a given token; iii) Category, Tense, Number, Gender, and Person store the morpho-syntactic information inferred from a token. For instance, consider the Arabic word "يكتبون" (in the active mode: 'they write'), which is composed of the three following morphemes: the root of the verb "كتب" (/ktb/, the notion of writing), the prefix "يـ" that denotes both the present tense and the third person, and the suffix "ون" that denotes the masculine and the plural. The rule that analyses this word is represented in Fig. 3 (a). In Fig. 3 (b) the token is shown before matching, and in Fig. 3 (c) the token attributes are updated:

(a) Regular rule:
    Name: V28
    Class: regular-rule
    Prefix: يـ
    Suffix: ون
    Category: verb
    Tense: present
    Number: plural
    Gender: masculine
    Person: third

(b) Token before matching:
    Name: T1123
    Class: Token
    Value: يكتبون
    Root:
    Category:
    Tense:
    Number:
    Gender:
    Person:
    Rule-applied:
    Sentence: S052
    Order: 1123
    Positions: (3382, 3388)
    Format:

(c) Token after update:
    Name: T1123
    Class: Token
    Value: يكتبون
    Root: كتب
    Category: verb
    Tense: present
    Number: plural
    Gender: masculine
    Person: third
    Rule-applied: V28
    Sentence: S052
    Order: 1123
    Positions: (3382, 3388)
    Format:

Fig. 3. Matching regular rules

The structure of the regular-rule class is detailed below:

1. Name: uniquely identifies a rule
2. Class: the class of the object
3. Prefix: a sequence of characters at the beginning of a token
4. Suffix: a sequence of characters at the end of a token
5. Category: the part of speech to which the token belongs; it can hold two possible values, verb or noun
6. Tense: the tense associated to the token in case of a verb
7. Number: the cardinality of the token: singular, dual or plural
8. Gender: the gender associated to the token: either masculine or feminine
9. Person: valid only for verbs; it represents the first, second or third person.

Thus, the rules are represented as predicates, which implement the attributes as discussed above:

regularRule(Pref, Suff, Cat, Tens, Num, Gend, Per, Name, Class) :-
    Pref = "y", Suff = "wn",
    Cat = verb, Tens = present, Num = plural,
    Gend = masculine, Per = third,
    Name = v28, Class = regular_rule.
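Called with both affixes instantiated, as the matching algorithm of Section 4 does, the predicate simply succeeds and binds the morpho-syntactic fields:

?- regularRule("y", "wn", Cat, Tens, Num, Gend, Per, Name, Class).
Cat = verb, Tens = present, Num = plural, Gend = masculine,
Per = third, Name = v28, Class = regular_rule
Yes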

4 Matching Algorithm

The tokens extracted from the source text are represented in the following fact base, whose distinctive arguments are the value of the token itself and its corresponding root classification inferred by the algorithm:

token(name, value, root, cat, tense, number, gender, person, format, ruleApplied, sentence, order, position).
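As an illustration, the token of Fig. 3 could be stored as the following fact; this instance is hypothetical on our part: the Arabic value and root are romanized, and the constant names simply follow the field order above:

token(t1123, "yktbwn", "ktb", verb, present, plural,
      masculine, third, none, v28, s052, 1123, pos(3382, 3388)).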

The aim of the token-to-rule matching algorithm, implemented in the predicate tokenToRule/1 (shown below), is to fetch the rule that extracts the root of a given token T and, consequently, to associate the morpho-syntactic information to the token. A rule is identified if it matches a given pair of prefix and suffix. Thus, the affixes are first extracted from the token's value V through getPrefix/3 and getSuffix/3, and the matching operation is then performed by regularRule/9.

tokenToRule(T) :-
    T = token(_, V, _, _, _, _, _, _, _, Rule, _, _, _),
    tokenToRule(V, Rule, _Pref, _Suff, 1, 3).

tokenToRule(_V, Rule, _Pref, _Suff, I, _J) :-
    I #< 0, Rule = null.
tokenToRule(_V, Rule, _Pref, _Suff, _I, J) :-
    J #< 0, Rule = null.
tokenToRule(V, Rule, _Pref, _Suff, I, J) :-
    length(V, L), L - I - J #=< 1, Rule = null.
tokenToRule(V, Rule, Pref, Suff, I, J) :-
    getPrefix(V, I, Pref),
    getSuffix(V, J, Suff),
    regularRule(Pref, Suff, _Cat, _Tens, _Num, _Gend, _Pers, Name, _Class),
    Rule = Name.
tokenToRule(V, Rule, Pref, Suff, I, J) :-
    J1 #= J - 1,                % no rule matched: try a shorter suffix
    tokenToRule(V, Rule, Pref, Suff, I, J1).
tokenToRule(V, Rule, Pref, Suff, I, J) :-
    I1 #= I - 1,                % then try a shorter prefix
    tokenToRule(V, Rule, Pref, Suff, I1, J).

Note that the length of a prefix for regular rules is at most one character, and the length of a suffix is limited to three characters. The matching algorithm gives priority to the longest affixes first. Hence, the arguments for affix extraction are initialized to 1 and 3, respectively, and are gradually decremented as long as no rule matches and the root still contains at least two characters (the constraint length(V)-I-J > 1), due to the fact that no regular word has a root shorter than two characters:

getPrefix(V, I, Pref) :-
    getPrefix1(V, I, [], Pref).

getPrefix1(_, 0, Acc, Pref) :-
    reverse(Acc, Pref).
getPrefix1([C|V1], I, Acc, Pref) :-
    I1 #= I - 1,
    getPrefix1(V1, I1, [C|Acc], Pref).

getSuffix(V, I, Suff) :-
    reverse(V, V1),
    getSuffix1(V1, I, [], Suff).

getSuffix1(_, 0, Suff, Suff).
getSuffix1([C|V1], I, Acc, Suff) :-
    I1 #= I - 1,
    getSuffix1(V1, I1, [C|Acc], Suff).
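For instance, with the romanized value used above (a hypothetical example, and assuming token values are represented as character lists), affix extraction and matching chain together as follows:

?- getPrefix("yktbwn", 1, Pref), getSuffix("yktbwn", 2, Suff),
   regularRule(Pref, Suff, Cat, _, _, _, _, Name, _).
Pref = "y", Suff = "wn", Cat = verb, Name = v28
Yes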

5 Conclusion

A morphological analyzer is one of the essential components in any natural language processing architecture. The morphological analyzer presented in this paper is implemented within a CLP framework and is composed of three main components: a linguistic knowledge base comprising the regular and irregular morphological rules of the Arabic grammar, a set of linguistic lists of markers containing the exceptions handled by the irregular rules, and a matching algorithm that matches the tokens to the rules. The complete implementation of the system is underway. In a first phase, we have considered only the regular rules for implementation. Defining a strategy to match the regular and irregular rules, and extending the linguistic lists of markers, are the future directions of this project.

References

1. J-P. Desclès, Langages applicatifs, Langues naturelles et Cognition, Hermès, Paris, 1990.
2. A. Arrajihi, The Application of Morphology, Dar Al Maarefa Al Jameeya, Alexandria, 1973. (in Arabic)
3. F. Qabawah, Morphology of Nouns and Verbs, Al Maaref Edition, 2nd edition, Beirut, 1994. (in Arabic)
4. G. A. Kiraz, "Arabic Computational Morphology in the West", in Proceedings of the 6th International Conference and Exhibition on Multi-lingual Computing, Cambridge, 1998.
5. B. Saliba and A. Al Dannan, "Automatic Morphological Analysis of Arabic: A Study of Content Word Analysis", in Proceedings of the Kuwait Computer Conference, Kuwait, March 3-5, 1989.
6. M. Smets, "Paradigmatic Treatment of Arabic Morphology", in Workshop on Computational Approaches to Semitic Languages, COLING-ACL98, August 16, Montreal, 1998.
7. K. Beesley, "Arabic Morphological Analysis on the Internet", in Proceedings of the International Conference on Multi-Lingual Computing (Arabic & English), Cambridge G.B., 17-18 April, 1998.
8. R. Alshalabi and M. Evens, "A Computational Morphology System for Arabic", in Workshop on Computational Approaches to Semitic Languages, COLING-ACL98, Montreal, 1998.
9. R. Zajac and M. Casper, "The Temple Web Translator", 1997. Available at: http://www.crl.nmsu.edu/Research/Projects/tide/papers/twt.aaai97.html
10. T. A. El-Sadany and M. A. Hashish, "An Arabic Morphological System", IBM Systems Journal, Vol. 28, No. 4, 600-612, 1989.
11. J. Berri, H. Zidoum, Y. Attif, "Web-based Arabic Morphological Analyser", in A. Gelbukh (ed.): CICLing 2001, pp. 389-400, Springer-Verlag, 2001.
12. K. Marriott and P. Stuckey, Programming with Constraints: An Introduction, MIT Press, 1998.
13. P. Codognet and D. Diaz, "Compiling Constraints in clp(FD)", Journal of Logic Programming, 27(3):185-226, 1996.
14. A. M. Cheadle, W. Harvey, A. J. Sadler, J. Schimpf, K. Shen and M. G. Wallace, ECLiPSe: An Introduction, IC-Parc, Imperial College London, Technical Report IC-Parc-03-1, 2003.

A Tutorial on CHR Grammar

Henning Christiansen

Roskilde University, Computer Science Dept. P.O.Box 260, DK-4000 Roskilde, Denmark E-mail: [email protected]

Abstract. Constraint Handling Rules (CHR) is a declarative language for writing constraint solvers, available as an extension of Prolog; CHR Grammars (CHRG) provide a grammar formalism added on top of CHR, analogously to the way Definite Clause Grammars (DCG) are added on top of Prolog. CHRG inherits the underlying execution model, which immediately provides robust bottom-up parsers that evaluate all possible parses given by an ambiguous grammar in parallel. Compared with DCG, CHRG provides a high degree of flexibility and expressibility, including a sort of context-sensitive rules and a straightforward implementation of abduction. The system includes different tools for optimization inherited from CHR, so that most grammars can perform with a reasonable speed. This tutorial gives a short introduction to CHRG and its applications, mainly in terms of examples.

1 Introduction

The language of Constraint Handling Rules (CHR) was introduced originally by T. Frühwirth [1] as a declarative language for writing constraint solvers for constraint logic programming, such as finite domain solvers and constraint programming over real and rational numbers. The language consists of rewrite rules over constraint stores: if a set of constraints matched by the head of a given rule is identified in the constraint store, the rule may fire, resulting in the addition of new constraints to the store, possibly removing those matched by the head (depending on the sort of rule applied); CHR rules may include a guard to determine whether or not a rule can apply, and the procedural semantics is committed-choice so that no backtracking is possible. The fundamental assumption underlying the introduction of a grammar notation on top of CHR is that language analysis is a process of constraint solving, involving many different conceptual layers (lexical, morphological, syntactic, pragmatic, etc.) possibly executing in parallel. In the following we present the system of CHR Grammars, or CHRG for short, mainly in terms of examples. CHRG supports parsing by an encoding of an input string by means of word boundaries (given by integer numbers) attached to each syntactic node, which is represented as a CHR constraint. Grammar rules are translated in a straightforward way into CHR rules that identify sequences of tokens by comparing word boundaries. This encoding makes it possible also to represent context-sensitive

rules and a notion of gaps, shown in the following. CHRG includes the full power of the underlying CHR system, so that parsing can integrate with arbitrary constraint solvers written in CHR. Abduction in language analysis, for example, is realized by having grammar rules cast "abducibles" into the constraint store, further processed by integrity constraints written directly as CHR rules. Applications of CHRG until now have concerned mainly syntactic aspects, scratching the surface of semantic and contextual analysis; current research is trying to produce a model in which the pragmatic meaning of sentences is produced right away as a product of language analysis (i.e., avoiding the construction of huge parameterized terms representing de-contextualized semantics). A comprehensive treatment of CHRG is given in [2], which also gives an overview of related work; the system is available at the CHRG web site [3]. We do not consider the implementation of CHRG here; see [2]. In the following, we introduce the main features of the system.

2 An introductory example

The following example shows a source text as it is given to the CHRG system (with a few trivial reminiscences of the underlying CHR system retouched away).

grammar_symbols np/0, verb/0, sentence/0.

np, verb, np ::> sentence.
[peter] ::> np.
[mary] ::> np.
[likes] ::> verb.

As it appears, the notation is inspired by DCG so, for example, terminal strings are indicated by square brackets. However, the roles of head and body (lhs resp. rhs) are switched in order to indicate the bottom-up nature of the system and to mimic the underlying CHR syntax. The arrow ::> indicates a propagation rule, meaning that the rule only adds new grammar nodes to the store without removing anything (analogous to CHR's notion of the same name). So-called simplification rules are also available, indicated by <:>, which remove the nodes matched by the head. A list of tokens can be analyzed by a predicate parse that translates it into a set of constraints with explicit word boundaries and starts the (compiled) grammar rules, which apply until no more applications are possible. Consider for example the following dialogue with the system.

?- parse([peter,likes,mary]).
<0> peter <1> likes <2> mary <3>
np(0,1), verb(1,2), np(2,3), sentence(0,3),
token(0,1,peter), token(1,2,likes), token(2,3,mary) ?

The system echoes the string with indication of boundaries for easy evaluation of the result, given as the final constraint store which, in this example, shows all grammar nodes constructed during parsing. (It is, of course, possible to suppress details so that only the most essential information is given as answer.)

3 Context-sensitive rules and an example of coordination

The head of a rule may include both a left and a right context; the general shape of a head is as follows:

left-context -\ core /- right-context

where the three indicated components are strings of grammar symbols (context parts may be empty, in which case the corresponding marker is left out). The meaning is that the rule can apply, recognizing the core to be whatever is specified by the body of that rule, provided the strings of grammar symbols indicated by the context parts are observed immediately to the left and right of the core. In the following excerpt of a grammar for simple sentences, each grammar node has an attribute (analogous to DCG) that gives a symbolic representation of the sentence analyzed. We consider simple coordinating sentences such as "Peter likes and Mary detests spinach", where the incomplete first sentence inherits the object from the following complete sentence.

subj(A), verb(V), obj(B) ::> sent(s(A,V,B)).
subj(A), verb(V) /- [and], sent(s(_,_,B)) ::> sent(s(A,V,B)).

The first rule is to be understood in the usual way: a complete subj-verb-obj sequence can be reduced to a sent node. The second rule is an example of a context-sensitive rule: it applies to a subj-verb sequence only when followed by the terminal symbol "and" and another sent node, and in this case the incomplete sentence takes its object, matched by variable B, from this following sent node.
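To run this excerpt, lexical rules for the words involved are needed; the following completion is hypothetical on our part (the attribute values peter, like, etc. are our own), but shows the intended result:

[peter] ::> subj(peter).     [mary] ::> subj(mary).
[likes] ::> verb(like).      [detests] ::> verb(detest).
[spinach] ::> obj(spinach).

?- parse([peter,likes,and,mary,detests,spinach]).
% expected among the resulting constraints (word boundaries omitted):
%   sent(s(mary,detest,spinach)), sent(s(peter,like,spinach))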

4 Simplification rules and context parts for disambiguation

Context parts can be used for a multitude of purposes; in [4], for example, they are used for resolving ambiguities of the highly compact and syntactically ambiguous language of hieroglyph inscriptions. Here we show a grammar that uses right context parts to resolve ambiguities in an otherwise highly ambiguous grammar of arithmetic expressions (computer scientists will recognize the standard follow items in the context parts, cf. [5]). This grammar consists of simplification rules, so that any symbol, once matched in a rule application, cannot give rise to other rule applications.

e(E1),[+],e(E2) /- Rc <:> e(plus(E1,E2)) where Rc = ([’+’];[’)’];[eof]).

e(E1),[*],e(E2) /- Rc <:> e(times(E1,E2)) where Rc = ([*];[+];[’)’];[eof]).

e(E1),[^],e(E2) /- [X] <:> X \= ^ | e(exp(E1,E2)).

[’(’], e(E), [’)’] <:> e(E).

[N] <:> integer(N) | e(N).

Notice also that some rules include a guard, separated from the body by a vertical bar, and some auxiliary notation: a semicolon in a context means different alternatives, and where indicates straightforward textual substitution.
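As an illustration of the intended effect (our reading of the rules, assuming the input list is terminated by an explicit eof token, to which the context parts refer), the follow items give * priority over +:

?- parse([2,+,3,*,4,eof]).
% e(3),[*],e(4) may reduce when followed by eof, while e(2),[+],e(3)
% may not reduce when followed by *; the expected result is
%   e(plus(2,times(3,4)))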

5 An application of gaps for a simplified microbiologic example

Context parts as well as the core of the head of a rule may contain gaps. The symbol "..." indicates a string of arbitrary length, and it can be given extra arguments to indicate restrictions on the length of acceptable strings, so that "5...10" can match any string of length between 5 and 10. Gaps may be useful in analyzing chemical formulas for proteins in order to predict properties of their physical shape. A loop, for example, may be indicated by a certain substring that appears later in reversed form. The following grammar rule will recognize any such loop where the "gluing" substrings are of length exactly 6 and the loop itself of length between 10 and 20.

[A1,A2,A3,A4,A5,A6], 10...20, [A6,A5,A4,A3,A2,A1] ::> loop.

In practice, more rules are needed in order to extend the "glue" as far as possible, and the match should be made between a substring and a reversed version of complementary symbols (representing opposite electrical charges).
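For instance (a made-up token list, with single letters standing for residue symbols), a string consisting of a 6-symbol glue, a 10-symbol middle part, and the reversed glue would be recognized:

?- parse([a,b,c,d,e,f, g,g,g,g,g,g,g,g,g,g, f,e,d,c,b,a]).
% expected among the resulting constraints: loop(0,22)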

6 Abduction and assumptions

The CHRG notation makes it possible for a grammar rule to work with arbitrary atomic hypotheses (CHR constraints) in addition to grammar symbols. Curly brackets can be used in both head and body of a rule to indicate such hypotheses, which should not be confused with grammar symbols. So, for example,

{h(1)}, [a] ::> b, {h(2)}.

indicates that the terminal symbol "a" can be recognized to be of category "b" provided that hypothesis "h(1)" is present in the store, in which case an

additional hypothesis "h(2)" is created. This provides a potentiality for grammars to interact with a semantic (or pragmatic) context1 for the given discourse. The behaviour of the analyzer may depend on the world knowledge recognized so far ("h(1)") and may add new world knowledge ("h(2)"). In addition, general knowledge of the world can be expressed as rules of CHR (which can be mixed with CHRG), so for example

h(1), h(2) ==> h(3).

indicates that if "h(1)" and "h(2)" are known, then we also have "h(3)". The CHRG system is prepared for two kinds of applications of such extra-grammatical hypotheses: abduction, and assumptions in the style of [6]. More details and examples can be found in [2]; the embedding of abduction in the system in this straightforward fashion is based on a technical trick observed in [7]. The newest version of the CHRG system includes facilities to explore all possible abductive explanations of a given discourse in parallel (which is problematic in the system as described above, since it may mix up alternative hypothesis sets).
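For instance (our own sketch, assuming that hypotheses may be posted as ordinary CHR constraints in the same query before parsing), the two rules above interact as follows:

?- h(1), parse([a]).
% expected final store: h(1), h(2), h(3), token(0,1,a), b(0,1)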

7 Perspectives

A constraint-based grammar formalism has been presented which provides a flexible medium for expressing a variety of linguistic phenomena and which is based on the computational paradigm of Constraint Handling Rules. The formalism (and implemented system) seems well suited for experiments with language analysis that integrate different layers of analysis. However, experience is still limited to toy examples, but the system is freely available on the World Wide Web for anyone who may be inspired to make experiments.

Acknowledgements

This research is supported in part by the CONTROL project, funded by the Danish Natural Science Research Council, the OntoQuery project, funded by the Danish Research Councils, and the IT-University of Copenhagen.

References

1. Frühwirth, T.: Theory and practice of constraint handling rules, special issue on constraint logic programming. Journal of Logic Programming 37 (1998) 95–138
2. Christiansen, H.: CHR Grammars. Int'l Journal on Theory and Practice of Logic Programming (2005) To appear.

1 Notice that this usage of “context” is unrelated to the syntactic contexts considered above

3. Christiansen, H.: CHR Grammar web site; released 2002. http://www.ruc.dk/~henning/chrg (2002)
4. Hecksher, T., Nielsen, S.T.B., Pigeon, A.: A CHRG model of the ancient Egyptian grammar. Unpublished student project report, Roskilde University, Denmark (2002)
5. Aho, A., Sethi, R., Ullman, J.: Compilers. Principles, techniques, and tools. Addison-Wesley (1986)
6. Dahl, V., Tarau, P., Li, R.: Assumption grammars for processing natural language. In Naish, L., ed.: Proceedings of the 14th International Conference on Logic Programming, Cambridge, MIT Press (1997) 256–270
7. Christiansen, H.: Abductive language interpretation as bottom-up deduction. In Wintner, S., ed.: Natural Language Understanding and Logic Programming. Volume 92 of Datalogiske Skrifter, Roskilde, Denmark (2002) 33–47

Representing Act-Topic-based Dialogue Phenomena

Hans Dybkjær1 and Laila Dybkjær2

1Prolog Development Center A/S (PDC), H. J. Holst Vej 3C-5C, 2605 Brøndby, Denmark, [email protected] 2Nislab, University of Southern Denmark (SDU), Campusvej 55, 5230 Odense M, Denmark, [email protected]

Abstract. We examine phenomena in spoken human computer dialogues and suggest possible formalisations. The work is a step towards automating spoken dialogue systems assessment. Keywords. Spoken dialogue, act-topic, structure analysis, transactions.

1 Introduction

Dialogue smoothness and transaction success rate are important usability evaluation criteria for spoken dialogue systems (SDSs), but they are costly to measure manually. Automating the measurement process would be of great benefit to the SDS community. We suggest a two-step approach to the automatic annotation of act-topics and, eventually, of transactions. First, basic, context-independent act-topic annotation is added to all system and user utterances (Figure 1, left). Basic acts include inform, accept, reject. Second, basic acts are combined into composite acts and then further combined into transaction segments tagged with success or failure (Figure 1, right). This paper investigates the second step: what is needed to automate act-topic based transaction structure annotation of dialogues between users and SDSs, such as a frequently asked questions (FAQ) system [Dybkjær and Dybkjær 2004], a flight ticket reservation system [Bernsen et al. 1998], and a train timetable information system [Aust et al. 1995]. As the formal vehicle for deriving composite acts we use rewrite rules with unification and supplementary constraints.

Left (basic annotation):
.s: .inform {N.employee, N.leave …}
    "You can … choose between: 'employee' 'on leave' …"
.u: .inform {N.student}
    "I'm a student"
.s: .inform {N.menu}
    "Did you ask for – Main menu?"
.u: .inform {N.student}
    "Student"
.s: .inform {V.student, V.su …}
    "If you are a student and receive SU, you may …"

Right (composite annotation):
.s: .success {N.student} <- success2
  .u: .request {N.student} <- sequenceRequest
    .u: .request {N.student} <- request2
      .s: .inform {N.employee, N.leave …}
      .u: .inform {N.student}
    .u: .request {N.student} <- request2
      .s: .inform {N.menu}
      .u: .inform {N.student}
  .s: .inform {V.student, V.su …}

Figure 1: Example dialogue, annotated with acts and topics.

2 Dialogues, turns, and moves

A dialogue is a sequence of moves where each move corresponds to one act and a set of topics for one speaker (Figure 2). An utterance is a sequence of moves of one speaker. A turn is an utterance that further satisfies that if other moves occur at the ends, then these moves belong to other speakers. In Figure 2, e.g., the example has one user utterance that is also a turn, and there are two system moves that may make one or two utterances and precisely one turn.

dialogue    = move*
move        = who : act topics ["text"]
who         = .u | .s
act         = .identifier
topics      = { topic* }
topic       = distinction.identifier
distinction = T | N | V

Example:
.s: .pause {}
.s: .inform {T.more} "Do you want more?"
.u: .accept {} "yes"

Figure 2: Formal structure of a dialogue.

3 Dialogue structure via rewrite rules

Rewrite rules define an acyclic graph which may be seen as the dialogue structure. Each rule takes a move pattern and produces a new sequence of moves. A pattern is a list of utterances but may contain variables for who, act, topics, and topic, cf. Figure 3.

rule      = rule identifier
            move* <- move*
            [ where condition* ]
            end rule
move M    = who : act topics
who W     = varVal
act A     = varVal
topics Ts = { topic* } | varVal
topic T   = varVal
varVal    = [Type]var | [Type]val
var       = _identifier
val       = .identifier
condition = varVal operator varVal
operator  = = | != | in | not-in | <

Example:
rule select1
  _y: .select Ts_a
  <-
  _x: .inform Ts_a
  _y: .accept {}
  where
  _x != _y
end rule

Result when applied to the example of Figure 2:
.s: .pause {}
.u: .select {T.more} <- select1
  .s: .inform {T.more} "Do you want more?"
  .u: .accept {} "yes"

Figure 3: Formal structure of rules and their application.
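As a minimal executable sketch of such a rewrite step (our own illustration in Prolog; the representation move(Who, Act, Topics) and the predicate name rewrite/2 are assumptions, not part of the formalism above), rule select1 becomes:

% select1: an inform by X immediately followed by an accept by Y,
% with X different from Y, rewrites to a select by Y on X's topics.
rewrite([move(X, inform, Ts), move(Y, accept, []) | Rest],
        [move(Y, select, Ts) | Rest]) :-
    X \= Y.

% Example: rewrite([move(s, inform, ['T.more']), move(u, accept, [])], R)
% binds R = [move(u, select, ['T.more'])].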

4 Analysing dialogue phenomena

The minimum expressiveness of the above rules would allow for one speaker, one act, one topic, and no constraints. The table below explains and exemplifies the different extensions to the minimum expressiveness that are needed, based on various dialogue phenomena. Rule references are to Appendix A.

Different topics. Allowing several topics enables detection of two moves including the same topic. See rule segment.

Different acts.
  x: Are you going to Copenhagen?
  y: Yes
Allowing different acts instead of only one enables distinction among certain patterns, such as a select pattern which may consist of an inform act telling about an option followed by an accept act of this option. See rule select0.

Differentiating speakers.
  s: Are you going to Copenhagen?
  u: Yes
To distinguish that moves are by different speakers, an inequality operator is needed. See rule select1.

More topics in a move.
  x: Do you want to know about when the money is paid, transfer of money, or payment in general?
  y: Payment in general.
A select rule matching this example must express that speaker y states one of the topics offered by speaker x. A member operator "T in Ts" is added as a constraint. See rule select2.

More topics in different moves.
  .s: .inform {T.paymentWhen, T.moneyTransfer, T.paymentGeneral}
      "Do you want to know about when the money is paid, transfer of money, or payment in general?"
  .u: .inform {T.paymentGeneral, T.friend}
      "Payment in general, my friend."
Here we need to express that the same topic occurs in two different lists. We do so by allowing variables to be introduced in the constraints, too, and not only in the pattern. See rule select2a.

Rejecting topics.
  x: Do you want to know about when the money is paid, transfer of money, or payment in general?
  y: My employer is bankrupt.
Symmetrically to selecting a topic, a speaker may reject the offered topics by requesting a new topic, so we add the "T not-in Ts" operator. See rule request.

Distinguishing names and values.
  .u: .inform {N.phone} "Your phone number?"     .u: .inform {N.phone} "Your phone number?"
  .s: .inform {N.phone} "Phone number"           .s: .inform {V.phone} "Phone 48204910"
In order to distinguish the analyses of these two examples, we must distinguish topic names (N) and values (V). By a topic name we understand the mentioning of a topic, e.g. a user requesting information about a certain topic. By a topic value we understand details about a certain topic, e.g. the system informing about a topic name selected by the user. See rules select2, answer, success1 and success2.

Patterns across turns not using all the turn moves.
  0 .u: .inform {V.aalborg, V.tomorrow} "I want to go to Aalborg tomorrow."
  1 .s: .inform {V.aalborg, V.aarhus} "Did you say to Aalborg or to Aarhus?"
  2 .u: .inform {V.aalborg} "To Aalborg."
  3 .s: .inform {V.tomorrow}
  4 .s: .inform {V.aalborg} "Are you leaving tomorrow for Aalborg?"
  5 .u: .accept {} "Yes."
The dialogue fragment above contains a success in selecting travel destination and date. The moves 1+2 may be reduced to select {V.aalborg}, which potentially enables us to apply the success1 rule. However, the subsequent utterance is annotated with the moves in the wrong order for this. There is no inherent reason why the order of moves within an utterance or turn should be important (at the level of analysis we do), and the equivalent formulation of move 4 above, "Aalborg. Are you leaving tomorrow?", would naturally have made the annotator list the moves in the reverse order. So we will allow patterns to match moves within turns in any order, leaving unused moves if the turns are at the ends of the pattern, and otherwise dropping the unused moves.

Ontological relations.
  .s: .inform {N.travel, N.from} "Where does the travel start?"
  .u: .inform {V.place} "Copenhagen"
When annotating each move independently of the context, it becomes ambiguous what a topic value refers to. E.g. the place "Copenhagen" may be a departure or a destination city. To be able to automatically relate the question and the reply in such situations we need to introduce the sub-topic relation T1 < T2. See rule selectSub1.

Meta-communication and multi-level rule applications. In Figure 1 it seems fair to count a success and no failures, and to count one meta-exchange, which is negative for the smoothness. If the two requests had been divided by several exchanges or even a full transaction on another topic, it would be less obvious that the two requests should be counted as one, leading to one success. Part of a solution to handling meta-communication involves the division of rules into several sets that are applied successively. This reflects that one would naturally first do simple rewrites like detecting request and select, then handle meta-communication, then transactions, and finally issues like summarising feedback. See rule sequenceRequest.

Summarising feedback.
  .s: .inform {V.from, V.to, V.hour} "Es gibt die folgende Verbindung: Mit der S-Bahn Abfahrt in Berlin Hauptbahnhof um fuenfzehn Uhr vierundzwanzig Ankunft in Berlin-Zoo um fuenfzehn Uhr einundvierzig dort weiter mit … dort weiter mit Intercity sechs fuenf zwei Abfahrt um zwanzig Uhr einundfuenfzig Ankunft in Darmstadt Hauptbahnhof um einundzwanzig Uhr neun"
  (There is the following connection: take the S-Bahn departing Berlin Hauptbahnhof at fifteen twenty-four, arriving Berlin-Zoo at fifteen forty-one; from there continue with … from there continue with Intercity six five two, departing at twenty fifty-one, arriving Darmstadt Hauptbahnhof at twenty-one oh nine.)

We call such information summarising feedback. It is fairly common in information and booking systems. Often systems implementing this have "one call – one task" dialogues, and a simple measure of transaction success is to call it a success if the system reaches this state, and otherwise a failure. However, this is problematic on at least two points:
• The user may disagree with the summarisation, claiming something to be wrong.
• It provides no information on dialogue smoothness up until this point.
By instead using rules like those in Appendix A, and assuming the reject3 rule is applied before the successSummary rule, we may get a success/failure annotation that deals with the above two bullet points (reject3 will block successSummary). See rules reject3 and successSummary.

5 Conclusion

By applying subsequent levels of act-topic rewrite rules we can analyse a set of dialogue phenomena occurring in typical SDSs, eventually leading to automatic detection of non-smoothness and of transaction successes and failures. Compared to other work, the two key distinguishing features of our analysis are automation and act-topic structures. Many other papers discuss how to find acts and/or topics [Heeman et al. 1998, Jurafsky et al. 1997], often based on statistical methods, but they are not concerned with the further structural analysis of the dialogue. Probably the most dominant discourse structure theory, RST (Rhetorical Structure Theory, [Mann and Thompson 1987]), is not aimed at computational analysis. Much work remains to be done. There are unanalysed issues regarding in particular smoothness and summarising feedback, common to which is that they concern phenomena distributed over large parts of the dialogue instead of being locally (and continuously) defined. Other issues also needing further analysis include task dependence of rules, summaries not including all information, information stated in disguise, and inexact matches. The rules must be tested on larger sets of dialogues of different types. An automatic act-topic annotation parser must be made in order to achieve full automation. Note that our approach only considers structure. For instance, the correctness of summarising feedback is not considered.

References

[Aust et al. 1995] Harald Aust, Martin Oerder, Frank Seide, and Volker Steinbiss: The Philips Automatic Train Timetable Information System. Speech Communication, 17, 249-262, 1995.
[Bernsen et al. 1998] Niels Ole Bernsen, Hans Dybkjær and Laila Dybkjær: Designing Interactive Speech Systems. From First Ideas to User Testing. Springer Verlag, 1998.
[Dybkjær and Dybkjær 2004] Hans Dybkjær and Laila Dybkjær: Modeling Complex Spoken Dialog. Computer, IEEE, August 2004, 32-40.
[Heeman et al. 1998] Peter A. Heeman, Donna Byron, and James F. Allen: Identifying Discourse Markers in Spoken Dialog. AAAI Spring Symposium on Applying Machine Learning and Discourse Processing, Stanford, March 1998.

[Jurafsky et al. 1997] Daniel Jurafsky, Rebecca Bates, Noah Coccaro, Rachel Martin, Marie Meteer, Klaus Ries, Elizabeth Shriberg, Andreas Stolcke, Paul Taylor, and Carol van Ess-Dykema: Automatic Detection of Discourse Structure for Speech Recognition and Understanding. Proc. of IEEE Workshop on Speech Recognition and Understanding, 1997, 88-95.
[Mann and Thompson 1987] William C. Mann and Sandra A. Thompson: Rhetorical Structure Theory: Description and Construction of Text Structures. In Gerard Kempen (ed.): Natural Language Generation. New results in artificial intelligence, psychology and linguistics. NATO ASI series E 135, Chapter 7. The Netherlands, Martinus Nijhoff Publishers, 1987.

Appendix A  Example rules

rule segment
  _y: .any {T_a}
  <-
  _x: .any {T_a}
  _y: .any {T_a}
end rule

rule select0
  _y: .select {T_a}
  <-
  _x: .inform {T_a}
  _y: .accept {}
end rule

rule select1
  _y: .select {T_a}
  <-
  _x: .inform {T_a}
  _y: .accept {}
  where
  _x != _y
end rule

rule select2
  _y: .select {T_b}
  <-
  _x: .inform Ts_a
  _y: .inform {T_b}
  where
  _x != _y
  _b in _a
end rule

rule select2a
  _y: .select {T_c}
  <-
  _x: .inform Ts_a
  _y: .inform Ts_b
  where
  _c in _a
  _c in _b
  _x != _y
end rule

rule request
  _y: .request {T_b}
  <-
  _x: .inform Ts_a
  _y: .inform {T_b}
  where
  _x != _y
  _b not-in _a
end rule

rule answer
  _y: .request {N_b}
  _x: .inform Vs_a
  <-
  _y: .inform {N_b}
  _x: .inform {V_b}
  where
  _x != _y
end rule

rule success1
  _y: .success {N_b}
  <-
  _x: .select {N_b}
  _y: .inform Vs_a
  where
  _b in Vs_a
  _x != _y
end rule

rule success2
  _y: .success {N_b}
  <-
  _x: .request {N_b}
  _y: .inform Vs_a
  where
  _b in Vs_a
  _x != _y
end rule

rule request2
  _y: .request {V_b}
  <-
  _x: .inform Ts_a
  _y: .inform {V_b}
  where
  _x != _y
  _b not-in _a
end rule

from-place < place

rule selectSub1
  _y: .select {N_a}
  <-
  _x: .inform {N_a}
  _y: .inform {T_b}
  where
  _a < _b
  _x != _y
end rule

rule sequenceRequest
  _y: .request {T_b}
  <-
  _y: .request {T_b}
  _y: .request {T_b}
end rule

rule reject3
  _u: .reject {V.toPlace, V.fromPlace, V.departureTime}
  <-
  _s: .inform {V.toPlace, V.fromPlace, V.departureTime}
  _u: .reject
end rule

rule successSummary
  _y: .success {N.travel}
  <-
  _u: .success {V.departureTime}
  _u: .success {V.fromPlace}
  _u: .success {V.toPlace}
  _s: .inform {V.toPlace, V.fromPlace, V.departureTime}
end rule

Multi-dimensional Type Theory: Rules, Categories, and Combinators for Syntax and Semantics

Jørgen Villadsen

Computer Science, Roskilde University Building 42.1, DK-4000 Roskilde, Denmark

[email protected]

Abstract. We investigate the possibility of modelling the syntax and semantics of natural language by constraints, or rules, imposed by the multi-dimensional type theory Nabla. The only multiplicity we explicitly consider is two, namely one dimension for the syntax and one dimension for the semantics, but the general perspective is important. For example, issues of pragmatics could be handled as additional dimensions. One of the main problems addressed is the rather complicated repertoire of operations that exists besides the notion of categories in traditional Montague grammar. For the syntax we use a categorial grammar along the lines of Lambek. For the semantics we use so-called lexical and logical combinators inspired by work in natural logic. Nabla provides a concise interpretation and a sequent calculus as the basis for implementations.

. . . Lambek originally presented his type logic as a calculus of syntactic types. Semantic interpretation of categorial deductions along the lines of the Curry-Howard correspondence was put on the categorial agenda in J. van Benthem (1983) The semantics of variety in categorial grammar, Report 83-29*, Simon Fraser University, Canada. This contribution made it clear how the categorial type logics realize Montague's Universal Grammar program — in fact, how they improve on Montague's own execution of that program in offering an integrated account of the composition of linguistic meaning and form. Montague's adoption of a categorial syntax does not go far beyond notation: he was not interested in offering a principled theory of allowable 'syntactic operations' going with the category formalism. * Revised version in [1]

M. Moortgat (1997) Categorial Type Logics, in J. van Benthem & A. ter Meulen (eds.) Handbook of Logic and Language, Elsevier.

Full paper in Computing Research Repository http://arXiv.org (Computation and Language). This research was partly sponsored by the IT University of Copenhagen and the CONTROL project: CONstraint based Tools for RObust Language processing http://control.ruc.dk

1 Introduction

We investigate the possibility of modelling the syntax and semantics of natural language by constraints, or rules, imposed by the multi-dimensional type theory Nabla [14]. The only multiplicity we explicitly consider here is two, namely one dimension for the syntax and one dimension for the semantics, but we find the general perspective to be important. For example, issues of pragmatics could be handled as additional dimensions by taking into account direct references to language users and, possibly, other elements of the situation in which expressions are used. We note that it is possible to combine many dimensions into a single dimension using Cartesian products. Hence there is no theoretical difference between a one-dimensional type theory and a multi-dimensional type theory. However, we think that in practice the gain can be substantial. Nabla is a linguistic system based on categorial grammars [1] with so-called lexical and logical combinators [15] inspired by work in natural logic [12]. The original goal was to provide a framework in which to do reasoning involving propositional attitudes like knowledge and beliefs [16,17].

2 The Rules

We define a multi-dimensional type theory with the two dimensions syntax and semantics. We use a kind of the so-called Lambek calculus with the two type constructors / and \, which are right- and left-looking functors [7,8,10]. We assume a set of basic types $\mathcal{T}_0$, where $\bullet \in \mathcal{T}_0$ is interpreted as truth values. The set of types $\mathcal{T}$ is the smallest set of expressions containing $\mathcal{T}_0$ such that if $A, B \in \mathcal{T}$ then $A/B, B\backslash A \in \mathcal{T}$. A structure consists of a vocabulary and a set of bases, $\mathcal{S} \equiv \langle V, B \rangle$, where $V$ is finite and $B(A) \neq \emptyset$ for all $A \in \mathcal{T}_0$. We define three auxiliary functions on types, here written $\overline{A}$ for the syntactic dimension and $\underline{A}$ for the semantic dimension (the symbol $\doteq$ is used for such "mathematical" definitions, in contrast with $\equiv$ for literal definitions):

$\overline{A} \doteq V^+$, for $A \in \mathcal{T}$
$\underline{A} \doteq B(A)$, for $A \in \mathcal{T}_0$
$\underline{A/B} \doteq \underline{B\backslash A} \doteq \underline{B} \to \underline{A}$
$|A| \doteq \overline{A} \times \underline{A}$

By $V^+$ we mean the set of (non-empty) sequences of elements from $V$ (such sequences correspond to strings, and for the sake of simplicity we call them strings). The universe is $\bigcup_{A \in \mathcal{T}} |A|$ (which depends only on the structure $\mathcal{S}$). With respect to $\mathcal{S}$ we extend a basic type interpretation $[\![A]\!] \subseteq |A|$ ($A \in \mathcal{T}_0$) to a type interpretation $[\![A]\!] \subseteq |A|$ ($A \in \mathcal{T}$) as follows (the concatenation of the strings $x$ and $x'$ is written $x \,\hat{}\, x'$):

$[\![A/B]\!] \doteq \{\langle x, y\rangle \mid \text{for all } x', y'\text{: if } \langle x', y'\rangle \in [\![B]\!] \text{ then } \langle x \,\hat{}\, x',\ y\,y'\rangle \in [\![A]\!]\}$
$[\![B\backslash A]\!] \doteq \{\langle x, y\rangle \mid \text{for all } x', y'\text{: if } \langle x', y'\rangle \in [\![B]\!] \text{ then } \langle x' \,\hat{}\, x,\ y\,y'\rangle \in [\![A]\!]\}$

We use a so-called sequent calculus [11] with an explicit semantic dimension and an implicit syntactic dimension. The implicit syntactic dimension means that the antecedents form a sequence rather than a set and that the syntactic component for the succedent is the concatenation of the strings for the antecedents. It should be observed that all rules work unrestricted on the semantic component from the premises to the conclusion. We refer to the resulting sequent calculus as the Nabla calculus. We use Γ (and ∆) for sequences of categories A1 ... An (n > 0). The rules have sequents of the form Γ ⊢ A. The sequent means that if a1, ..., an are strings of categories A1, ..., An, respectively, then the string that consists of the concatenation of the strings a1, ..., an is a string of category A. Hence the sequent A ⊢ A is valid for any category A. Rules are displayed starting with the conclusion and the premises indented below. There are two rules for / (a left and a right rule) and two rules for \ too. The left rules specify how to introduce a / or a \ at the left side of the sequent symbol ⊢, and vice versa for the right rules (observe that the introduction is in the conclusion and not in the premises). The reason why we display the rules in this way is that sequents tend to get very long, often as long as a whole line, and hence the more usual tree format would be problematic. Also, the conclusion is usually longer than each of the premises. We note that only the right rule of λ (where α ;λ α′ is λ-conversion) is possible, since only variables are allowed on the left side of the sequent symbol.

x : A ⊢ x : A    (=)

∆ ⊢ α : A    where α ;λ α′    (λ)
    ∆ ⊢ α′ : A

∆[Γ] ⊢ β[x → α] : B    (Cut)
    Γ ⊢ α : A
    ∆[x : A] ⊢ β : B

∆[Γ  z : B\A] ⊢ γ[x → (z β)] : C    (\L)
    Γ ⊢ β : B
    ∆[x : A] ⊢ γ : C

Γ ⊢ λy α : B\A    (\R)
    y : B  Γ ⊢ α : A

∆[z : A/B  Γ] ⊢ γ[x → (z β)] : C    (/L)
    Γ ⊢ β : B
    ∆[x : A] ⊢ γ : C

Γ ⊢ λy α : A/B    (/R)
    Γ  y : B ⊢ α : A

3 The Categories

As basic categories for the lexicon we have N, G, S and the top category • corresponding to the whole argument (do not confuse the basic category N with the constant N for ‘Nick’ and so on). Roughly we have that N corresponds to “names” (proper nouns), G corresponds to “groups” (common nouns) and S to “sentences” (discourses). Consider the following lexical category assignments:

John Nick Gloria Victoria : N
run dance smile : N\S
find love : (N\S)/N
man woman thief unicorn : G
popular quick : G/G
be : (N\S)/N
be : (N\S)/(G/G)
a every : (S/(N\S))/G    ((S/N)\S)/G
not : (N\S)/(N\S)
nix : S/S
and or : S\(S/S)
and or : (N\S)\((N\S)/(N\S))    (G/G)\((G/G)/(G/G))
ok : S
also : S\(S/S)
so : S\(•/S)

4 The Combinators

We introduce the following so-called logical combinators [13]:

Q̇ ≡ λxy(x = y)    Equality
Ṅ ≡ λa(¬a)    Negation
Ċ ≡ λab(a ∧ b)    Conjunction
Ḋ ≡ λab(a ∨ b)    Disjunction
Ȯ ≡ λtu∃x(tx ∧ ux)    Overlap
İ ≡ λtu∀x(tx ⇒ ux)    Inclusion
Ṫ ≡ ⊤    Triviality
Ṗ ≡ λab(a ⇒ b)    Preservation

After having introduced the logical combinators we introduce the so-called lexical combinators. There is one or more combinator for each token in the vocabulary, for example the combinator John⋆ for the token John, and be⋆ and be⋆⋆ for be, and so on (tokens and combinators are always spelled exactly the same way except for the ⋆ (possibly repeated) at the end).

In order to display the lexicon more compactly we introduce two place-holders (or "holes") for combinators and constants, respectively: □ is the place-holder for logical combinators (if any) and ◦ is the place-holder for (ordinary and predicate) constants (if any); the combinators and constants to be inserted are shown after the | as in the following lexicon:

John⋆ Nick⋆ Gloria⋆ Victoria⋆ ≡ ◦ | J N G V
run⋆ dance⋆ smile⋆ ≡ λx(◦x) | R D S
find⋆ love⋆ ≡ λyx(◦xy) | F L
man⋆ woman⋆ thief⋆ unicorn⋆ ≡ λx(◦x) | M W T U
popular⋆ quick⋆ ≡ λtx(□(◦x)(tx)) | Ċ | P Q
be⋆ ≡ λyx(□xy) | Q̇
be⋆⋆ ≡ λfx(fλy(□xy)x) | Q̇
a⋆ every⋆ ≡ λtu(□tu) | Ȯ İ
not⋆ ≡ λtx(□(tx)) | Ṅ
nix⋆ ≡ λa(□a) | Ṅ
and⋆ or⋆ ≡ λab(□ab) | Ċ Ḋ
and⋆⋆ or⋆⋆ ≡ λtux(□(tx)(ux)) | Ċ Ḋ
ok⋆ ≡ □ | Ṫ
also⋆ ≡ λab(□ab) | Ċ
so⋆ ≡ λab(□ab) | Ṗ

5 Examples: Syntax and Semantics

Consider the tiny argument (where √ indicates that the argument is correct):

John is a popular man.
√ John is popular.

The lexical category assignments to tokens give us the following string / formula association using the sequent calculus:

John be a popular man so John be popular
;  so⋆ (a⋆ (popular⋆ man⋆) λx(be⋆ x John⋆)) (be⋆⋆ popular⋆ John⋆)
;  Ṗ (Ȯ λx(Ċ (Px)(Mx)) λx(Q̇ Jx)) (Ċ (PJ)(Q̇ JJ))
;  PJ ∧ MJ ⇒ PJ

It is really an impressive undertaking, since not only does the order of the combinators not match the order of the tokens, but there is also no immediate clue in the string on how to get the structure of the formula right ("the parentheses"). As expected, the resulting formula is valid.
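For orientation, a smaller instance can be computed by hand from the lexicon alone (our own example; M and R are the predicate constants for man and run):

\[
\textit{a man run} \;\leadsto\; \dot{O}\,\lambda x(Mx)\,\lambda x(Rx) \;\leadsto\; \exists x\,(Mx \wedge Rx)
\]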

6 Conclusions and Future Work

The multi-dimensional type theory Nabla provides a concise interpretation and a sequent calculus as the basis for implementations. Of course, other calculi are possible for the same interpretation. The plans for future work include:

– Investigations of further type constructions for a larger natural language coverage, cf. the treatment of propositional attitudes in [16,17], which also replaces the classical logic with a paraconsistent logic.
– Implementations using constraint solving technologies, cf. recent work on glue semantics [3], XDG (Extensible Dependency Grammar) [5], CHRG (Constraint Handling Rules Grammar) [2], and categorial grammars [4,6,9].

References

1. W. Buszkowski, W. Marciszewski, and J. van Benthem, editors. Categorial Grammar. John Benjamins Publishing Company, 1988.
2. H. Christiansen. Logical grammars based on constraint handling rules. In P. J. Stuckey, editor, 18th International Conference on Logic Programming, page 481. Springer-Verlag, 2002. LNCS 2401.
3. M. Dalrymple, editor. Semantics and Syntax in Lexical Functional Grammar: The Resource Logic Approach. MIT Press, 1999.
4. P. de Groote. Towards abstract categorial grammars. In 39th Annual Meeting of the Association for Computational Linguistics, pages 148–155, 2001.
5. R. Debusmann, D. Duchier, A. Koller, M. Kuhlmann, G. Smolka, and S. Thater. A relational syntax-semantics interface based on dependency grammar. In 20th International Conference on Computational Linguistics, 2004.
6. M. Kuhlmann. Towards a constraint parser for categorial type logics. Master's thesis, Division of Informatics, University of Edinburgh, 2002.
7. J. Lambek. The mathematics of sentence structure. American Mathematical Monthly, 65:154–170, 1958. Reprinted in [1].
8. M. Moortgat. Categorial Investigations: Logical and Linguistic Aspects of the Lambek Calculus. Foris Publications, 1988.
9. R. Moot. Grail: An interactive parser for categorial grammars. In R. Delmonte, editor, VEXTAL, pages 255–261. Venice International University, 1999.
10. G. Morrill. Type Logical Grammar. Kluwer Academic Publishers, 1994.
11. D. Prawitz. Natural Deduction. Almqvist & Wiksell, 1965.
12. V. Sánchez. Studies on Natural Logic and Categorial Grammar. PhD thesis, University of Amsterdam, 1991.
13. S. Stenlund. Combinators, λ-terms and Proof Theory. D. Reidel, 1972.
14. J. Villadsen. Nabla: A Linguistic System based on Multi-dimensional Type Theory. PhD thesis, Department of Computer Science, Technical University of Denmark, 1995.
15. J. Villadsen. Using lexical and logical combinators in natural language semantics. Consciousness Research Abstracts, pages 51–52, 1997.
16. J. Villadsen. Combinators for paraconsistent attitudes. In P. de Groote et al., editors, Logical Aspects of Computational Linguistics, pages 261–278. Springer-Verlag, 2001. LNCS 2099.
17. J. Villadsen. Paraconsistent assertions. In J. Denzinger et al., editors, Multiagent System Technologies. Springer-Verlag, 2004. To appear in LNCS 3187, 15 pages.
