
From Shallow to Deep Parsing Using Constraint Satisfaction Jean-Marie BALFOURIER, Philippe BLACHE & Tristan VAN RULLEN Laboratoire Parole et Langage 29, Avenue Robert Schuman 13621 Aix-en-Provence, France {balfourier, blache, tristan.vanrullen}@lpl.univ-aix.fr Abstract We present in this paper a formalism relying on constraints that constitutes a possible answer to We present in this paper a technique allowing to this problem. This approach allows the use of a choose the parsing granularity within the same approach relying on a constraint-based formalism. same linguistic resource (i.e. a unique grammar) Its main advantage lies in the fact that the same that can be used fully or partially by the parser. linguistic resources are used whatever the This approach relies on the fact that (1) all granularity. Such a method is useful in particular linguistic information is represented by means of for systems such as text-to-speech that usually need constraints and (2) the constraints are of regular a simple bracketing, but in some cases requires a types. The idea consists then in implementing a precise syntactic structure. We illustrate this technique that can make use of some constraints method in comparing the results for three different in the case of shallow parsing, and the entire set granularity levels and give some figures about of them for deep analysis. In our formalism, their respective performance in parsing a tagged constraints are organized into different types. corpus. Tuning the granularity of the parse consists then in selecting the types of constraints to be Introduction verified. Some NLP applications make use of shallow In the first part of this paper, we present the parsing techniques (typically the ones treating property grammar formalism, its main large data), some others rely on deep analysis advantages both in terms of representation and (e.g. machine translation). The respective implementation. In the second part, we describe techniques are quite different: the former usually the parsing technique and the different relies on stochastic methods where the later uses approaches used for shallow and deep parsing. symbolic ones. However, this can constitute a We address in particular in this section some problem for applications relying on shallow complexity aspects illustrating the properties of parsing techniques and needing in some the parsing techniques and we propose an occasions deep analysis. This is typically the evaluation over a corpus. In the third part, we case for text-to-speech systems. Such illustrate the respective characteristics of the applications usually rely on shallow parsers in different approaches in describing for the same order to calculate intonative groups on the basis example the consequences of tuning the parse of syntactic units (or more precisely on chunks). granularity. We conclude in presenting some But in some cases, such a superficial syntactic perspectives for such a technique. information is not precise enough. One solution would then consist in using a deep analysis for 1 Property Grammars some constructions. No system exists The notion of constraints is of deep importance implementing such an approach. This is in in linguistics, see for example Maruyama particular due to the fact that this would require (1990), Pollard (1994), Sag (1999). Recent two different treatments, the second one redoing theories (from the constraint-based paradigm to the entire job. More precisely, it is difficult to the principle and parameters one) rely on this imagine in the generative framework how to notion. One of the main interests in using implement a parsing technique capable of constraints comes from the fact that it becomes calculating chunks and, in some cases, phrases possible to represent any kind of information with a possible embedded organization. (very general as well as local or contextual one) Linearity Det < N; Det < AP; by means of a unique device. We present in this AP < N; N < PP section a formalism, called Property Grammars, Requirement N[com] Þ Det Exclusion N Pro; N[prop] Det described in Bès (1999) or Blache (2001), that Dependency Det ® N; AP ® N; PP ® N makes it possible to conceive and represent all Obligation Oblig(NP) = {N, Pro, AP} linguistic information in terms of constraints over linguistic objects. In this approach, In this description, one can notice for example a constraints are seen as relations between two (or requirement relation between the common noun more) objects: it is then possible to represent and the determiner (such a constraint information in a flat manner. The first step in implements the complementation relation) or this work consists in identifying the relations some exclusion that indicate cooccurrency usually used in syntax. restriction between a noun and a pronoun or a proper noun and a determiner. One can notice This can be done empirically and we suggest, the use of sub-typing: as it is usually the case in adapting a proposal from Bès (1999), the set of linguistic theories, a category has several following constraints: linearity, dependency, properties that can be inherited when the obligation, exclusion, requirement and description of the category is refined (in our uniqueness. In a phrase-structure perspective all example, the type noun has two sub-types, these constraints participate to the description of proper and common represented in feature based a phrase. The following figure roughly sketches notation). All constraints involving a noun also their respective roles, illustrated with some hold for its sub-types. Finally, the dependency examples for the NP. relation, which is a semantic one, indicates that the dependent must combine its semantic Constraint Definition features with the governor. In the same way as Linearity (<) Linear precedence constraints HPSG does now with the DEPS feature as Dependency (®) Dependency relations between categories described in Bouma (2001), this relation Obligation (Oblig) Set of compulsory and unique concerns any category, not necessarily the categories. One of these categories governed ones. In this way, the difference (and only one) has to be realized in a between a complement and an adjunct is that phrase. Exclusion () Restriction of cooccurrence between only the complement is selected by a sets of categories requirement constraint, both of them being Requirement (Þ) Mandatory cooccurrence between constrained with a dependency relation. This sets of categories also means that a difference can be done Uniqueness (Uniq) Set of categories which cannot be between the syntactic head (indicated by the repeated in a phrase oblig constraint) and the semantic one (the governor of the dependency relation), even if in In this approach, describing a phrase consists in most of the cases, these categories are the same. specifying a set of constraints over some Moreover, one can imagine the specification of categories that can constitute it. A constraint is dependencies within a phrase between two specified as follows. Let R a symbol categories other than the head. representing a constraint relation between two (sets of) categories. A constraint of the form a R One of the main advantages in this approach is b stipulates that if a and b are realized, then the that constraints form a system and all constraints constraint a R b must be satisfied. The set of are at the same level. At the difference of other constraints describing a phrase can be approaches as Optimality Theory, presented in represented as a graph connecting several Prince (1993), there exists no hierarchy between categories. them and one can choose, according to the needs, to verify the entire set of constraints or a The following example illustrates some subpart of it. In this perspective, using a constraints for the NP. constraint satisfaction technique as basis for the parsing strategy makes it possible to implement the possibility of verifying only a subpart of this compiling linear precedence, requirement and constraint system. What is interesting is that exclusion properties described in the previous some constraints like linearity provide sections together with, indirectly, that of indications in terms of boundaries, as described constituency. for example in Blache (1990). It follows that The result is a compiled grammar which is used verifying this subset of constraints can constitute by the parser. Two stacks, one of opened a bracketing technique. The verification of more categories and a second of closed categories, are constraints in addition to linearity allows to completed after the parse of each new word: we refine the parse. In the end, the same parsing can open new categories or close already opened technique (constraint satisfaction) can be used ones, following some rules. This algorithm both for shallow and deep parsing. More being recursive, the actions opening, continuing precisely, using the same linguistic resources and closing are recursive too. This is the reason (lexicon and grammar), we propose a technique why rules must have a strict definition in order allowing to choose the granularity of the parse. to be sure that the algorithm is deterministic and always terminates. This shallow parsing 2 Two techniques for parsing technique can be seen as a set of Property Grammars production/reduction/cutting rules. · Rule 1: Open a phrase p for the current We describe in this paper different parsing category c if c can be the left corner of p. techniques, from shallow to deep one, with this · Rule 2: Do not open an already opened originality that they rely on the same formalism, category if it belongs to the current phrase or is described in the previous section. In other its right corner. Otherwise, we can reopen it if words, in our approach, one can choose the the current word can only be its left corner. granularity level of the parse without modifying · linguistic resources Rule 3: Close the opened phrases if the more recently opened phrase can neither continue 2.1 Shallow parsing one of them nor be one of their right corner.
Details
-
File Typepdf
-
Upload Time-
-
Content LanguagesEnglish
-
Upload UserAnonymous/Not logged-in
-
File Pages7 Page
-
File Size-