
PartML: Final Paper

Tuan Do N., Nick Moran, Mikel Mcdaniel

May 11, 2013

Abstract

Detecting meronymic relationships is of great use in many natural language processing domains, such as processing biomedical text, question-answering systems, text summarization, and text understanding (8)(10). We describe PartML, an annotation specification and guideline for annotating part-whole relationships and related entity and relationship attributes that are useful for applying machine learning. We also provide a brief overview of previous works and their approaches for defining and automatically finding part-whole relationships in text, then discuss the strengths and weaknesses of our annotation schemes, including inter-annotator agreement scores, and finally end with results of machine learning methods applied to our gold-standard corpus.

1 Introduction

PartML is an annotation scheme designed to capture meronymic, or part-whole, relationships and related attributes that are useful for applying machine learning to automatically detect these relationships in unannotated texts. To cover a large knowledge domain that is both wide and deep, PartML's guidelines are tuned to work well with annotated plain-text versions of English Wikipedia articles, but should be directly applicable or easily translatable to other languages and types of text.

While different sources have varying views on the classes of part-whole relationships, we use the three basic types defined by WordNet:

1. part of (hand and arm)

2. member of (senator and senate): member-collection
   ex: I attend Brandeis ...

3. substance of (gold and ring): substance-whole
   ex: Air contains nitrogen ...

Our specification captures each of these three types with the goal of differentiating the exact type of meronym when one is detected. Our findings include differences in the agreement of annotation for these different types, as well as preliminary results from machine learning efforts. We describe potential improvements to our specification and guideline to improve agreement, to bolster future machine learning, and to expand the scope of the task to related areas.

2 Previous works

Part-whole or meronymy relations are an important binary semantic relation that has been worked on from the time of the atomists (Plato, Aristotle, and the Scholastics) and further investigated in the beginning of the 20th century. Though it is always considered a fundamental relation, there is no consensus on the formal definition, representation, and classification of meronymy relations in previous works. Studies in the logic and philosophy of meronymy relations also disagree on its basic properties, especially the transitivity property (whether the relation is transitive or not).

In regards to formal representation and formalization of meronymy relations, there is a simple approach that models the relations as a transitive binary relation in Description Logic. That includes the works of Artale et al. (1996) (1) and Sattler (1995) (2). For example, saying that a car has wheels, and that in turn the wheels have tires as their parts, could be represented as:

    Car ⊑ ∃part.(Wheel ⊓ ∃part.Tires)

which, by transitivity of the part relation, leads to:

    Car ⊑ ∃part.Tires

Priss, in his discussion on the construction of WordNet (3), used a formal definition of meronymy relations, together with other semantic relations (i.e., hypernymy, synonymy), based on Relational Concept Analysis. The relation is defined among synsets, which differentiates it from lexical relations (i.e., antonymy). His work characterizes the relation with three attributes: irreflexive, antisymmetric, and acyclic. The relation is parameterized by quantificational tags, which are simple quantifiers, such as "for all", "exactly 1", and "some":

    c1 R^r[Q1,Q2;] c2        :⟺  Q1_{g1 ∈ Ext(c1)} Q2_{g2 ∈ Ext(c2)} : g1 r g2
    c1 R^r[;Q3,Q4] c2        :⟺  Q3_{g2 ∈ Ext(c2)} Q4_{g1 ∈ Ext(c1)} : g1 r g2
    c1 R^r[Q1,Q2;Q3,Q4] c2   :⟺  c1 R^r[Q1,Q2;] c2  and  c1 R^r[;Q3,Q4] c2

He also suggests a hierarchical categorization of semantic relations, giving a distinction between meronymic relations and some other 'part-of' relations, including spatial inclusion, class membership, attributes, and possession.

There is also no agreement on the classification of part-whole relations. Cruse (4), using the same notion as Priss, described four subclasses, considering the cardinality of the two word concepts in the relation. Iris et al. (5) specified four other classes: functional component (exactly one whole), segmented whole/mass nouns/is-substance-of (multiple homogeneous parts, one whole), membership relation (multiple wholes), and individual concepts (one part, one whole). Although this approach captures the cardinality of the involved objects in the extent of the concepts, which is fundamental in part-whole relationships, it ignores other important attributes, such as the physical existence of these objects (discrete or abstract), that could also provide insight on the differentiation of meronymy relations. It also doesn't explicitly describe the inherently dependent or functional relation between the involved objects in each subclass.

Winston et al. (1987) (6) described six types of meronymic relations: (1) COMPONENT–INTEGRAL, (2) MEMBER–COLLECTION, (3) PORTION–MASS, (4) STUFF–OBJECT, (5) FEATURE–ACTIVITY, and (6) PLACE–AREA. In addition, they proposed three attributes of relations, called relation elements: functional, homeomerous, and separable. For example, PORTION–MASS is homeomerous because portions are similar to each other, and separable because a portion can be disconnected from the mass (cutting a slice from a pie).

Keet et al. (2007) distinguishes between the two terms part-whole relation and meronymic relation. They include the meronymic relation as a subclass of the part-whole relation, while the other subclass is a mereological part-of relation. The latter is a transitive relation that encompasses a wide range of functional and spatial part-of relations.

Most previous works apply a predefined set of lexical-syntactic patterns, such as the following patterns from (8), to construct the meronymic corpus:


    Cluster              Patterns        Freq.  Coverage  Examples
    C1. genitives,       NPX of NPY      282    52.71%    eyes of the baby;
        verb "to have"   NPY's NPX                        girl's mouth;
                         NPY have NPX                     The table has four legs.
    C2. noun compounds   NPXY            86     16.07%    door knob;
                         NPYX                             turkey pie
    C3. preposition      NPY PPX         133    24.86%    A without wings cannot fly;
                         NPX PPY                          A room in the house.
    C4. other            others          34     6.36%     The Supreme Court is a
                                                          branch of the Government.
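As a rough sketch (not the authors' implementation), pattern clusters like C1 and C3 can be approximated with regular expressions over raw text. The patterns and helper below are illustrative simplifications, since real systems of this kind match over parsed noun phrases rather than bare words:

```python
import re

# Illustrative approximations of two pattern clusters (C1 genitives, C3 prepositions).
# Single words stand in for the NP chunks a real extractor would use.
PATTERNS = {
    "C1_of": re.compile(r"\b(\w+) of (?:the )?(\w+)"),   # "eyes of the baby"
    "C1_poss": re.compile(r"\b(\w+)'s (\w+)"),           # "girl's mouth"
    "C3_in": re.compile(r"\b(\w+) in (?:the )?(\w+)"),   # "a room in the house"
}

def find_candidates(sentence):
    """Return (cluster_name, matched_groups) for every pattern hit."""
    hits = []
    for name, pat in PATTERNS.items():
        for m in pat.finditer(sentence):
            hits.append((name, m.groups()))
    return hits

hits = find_candidates("The eyes of the baby and the girl's mouth")
```

Matches found this way are only candidates; as the next paragraph notes, many true part-whole relations involve longer structures that such shallow patterns miss.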

This rule-based approach to the extraction of meronymy relations suffers from insufficient coverage, because many patterns that signify part-whole relations are complicated and involve longer structures.

3 Task

While many previous works focus on identifying meronymic relationships via rule-based and pattern-matching methods which have been bootstrapped from known meronymic pairs, our goal was to use subtler syntactic clues to the existence of these relationships, with the hope of powering a learner able to handle a wider variety of structures indicating the sought-after relationships. To this end, we frame meronym detection as a classification problem in which pairs of entities either are or are not in a meronymic relationship. It is important to note that this relationship is inherently directional, because we need to distinguish the part from the whole.

In order to keep a focused and well-defined task, we exclude certain relationships that other authors have included as meronyms, or which are tangentially related to meronyms. These include relationships of classification (often termed is-a relationships), the containment relationship (a form of has-a), geographic and location-based relationships, mass-portion relationships, activity-based meronyms (we focus on nouns as entities, while these are based on verbs), possession and ownership relationships, and others. These are left open as possible extensions of the specification in future work.

Although our initial machine learning efforts focus on simple detection of meronymic relationships, our specification is set up to also provide features for further classification of a detected relationship as one of the three types mentioned earlier. As such, the distinction between the three is important to our task, and its effective communication to annotators in the guideline is critical.

The first type of meronym is the simple part-whole relationship. In this sort of relationship, the entity playing the part role is a distinct component of the whole, in which it plays some structural role. While this is a very common type of meronym for physical objects, especially those with mechanical or anatomical structure, it can also occur in abstract entities, in which there is some sort of underlying pattern structure for the part to participate in.

The second type of meronym is the substance relationship. This is differentiated from the part-whole type by the fact that the entity in the part role is often homogeneous and is the material out of which the whole is made, rather than playing a specific structural role. It is possible for an entity to be made of multiple substances, as is often the case for mixtures and conglomerates. Mass nouns are common in the part role for these substance-based relationships.

The third type is the member-group relationship. This type is similar to the part-whole relationship, but member entities do not have a direct structural role in the whole. Instead, this type of relationship is defined by the presence of an associative relationship, often based on physical proximity or social connections.

As an aid to machine learning, we include explicitly negative meronymic relationships in our specification. These often show very similar syntactic structure to positive instances, except for the presence of a negation word. If these were simply excluded, they would present confusing instances for training.

Because our learning effort is focused on syntactic clues, we annotate only those relationships which are made clear by the text. We therefore do not annotate pairs of entities which world knowledge tells us form a meronymic relationship, but which appear unrelated in the text. We also exclude those relationships which are not explicitly indicated, but could be derived by transitivity.

To further elucidate syntactic clues, annotators are asked to note signal terms which they feel indicate the presence of the meronymic relationship. Common signal words and phrases include "made of", "part of", "contains", etc. These tagged signals can then be used as powerful features for machine learning.

Finally, we include a few simple attributes for our entities which, while not directly related to the target phenomenon, are hoped to be useful as additional features for machine learning. We focus on the type of object described by the entity, as well as its count (that is, whether it is singular or plural), as potentially beneficial, but future work could extend this to include many other attributes.

4 Specification

The core extent tag of our specification is the ENTITY tag. This tag is applied to nouns which participate in a meronymic relationship, and is used for both the meronym and the holonym in the relationship. Only those nouns for which a meronymic relationship is syntactically indicated by the text are tagged; pairs which form a part-whole relationship, but for which that relationship is not specifically indicated by the text (that is, pairs for which the only indication of the relationship is prior world knowledge), are excluded.

Example: Tunicate [larvae]ENTITY have both a [notochord]ENTITY and a nerve [cord]ENTITY which are lost in adulthood.

Each ENTITY tag has two attributes, type and reference count. These attributes are not directly related to the phenomenon of meronymy, but are included as potential features for machine learning. We suspect that the nature of the entities involved in a meronymic relationship is relevant to determining the nature of that relationship. To that end, we annotate whether the entity is physical or abstract (covered by the type attribute), and whether the entity is singular, plural, a mass noun, or has zero count (covered by the reference count attribute). The zero value is to be used in instances where a negative relation is expressed. For example, in the sentence from our corpus "Speedway motorcycles use only one gear and have no brakes...", we would tag 'brakes' as having zero count.

Our other extent tag does not participate directly in the meronymic relationships, but captures SIGNAL words which the annotator feels give syntactic clues about the presence of such a relationship. Certain terms or phrases, such as "contains", "made of", "consists of", and the obvious "part of", often indicate that one entity is part of another. The goal of this extent tag is to capture these indicative terms for use as features during machine learning. Because they are not directly involved in our target phenomenon, we did not include any additional attributes for this tag.

Example: The bat is [made of]SIGNAL wood.

In addition to our extent tags, we use a link tag, PARTOF, to capture which pairs of entities are participating in a meronymic relationship, and what the directionality of that relationship is. Each link joins two ENTITY tags, as well as any number of SIGNAL tags which are relevant to that link. One of the two linked ENTITY tags is identified as the 'to' member of the link, which is the object that constitutes the 'whole', and the other is the 'from' member, which is, of course, the 'part'.

Example: [Carousels]ENTITY are populated with [horses]ENTITY. (A PARTOF link joins 'horses', the part, to 'Carousels', the whole.)

Further, we include an attribute on the PARTOF tag which handles our secondary task beyond identification of meronyms: their classification. The relationship type attribute codifies the link as being one of our three types (part-whole, substance, and member). We also include a rarely-used 'other' option as a convenience for annotators to indicate uncertainty. However, the intent would then be to resolve this issue, rather than leave an 'other' link in the corpus for learning.
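The paper's DTD is not reproduced here, but an annotated example might be serialized along the following lines. The XML layout, IDs, offsets, and attribute spellings are our illustrative assumptions; only the tag names (ENTITY, SIGNAL, PARTOF) and attribute concepts come from the spec:

```python
import xml.etree.ElementTree as ET

# Hypothetical standoff-style serialization of one annotated sentence.
DOC = """
<PartML>
  <TEXT>The bat is made of wood.</TEXT>
  <ENTITY id="E0" start="4" end="7" text="bat" type="physical" reference_count="singular"/>
  <ENTITY id="E1" start="19" end="23" text="wood" type="physical" reference_count="mass"/>
  <SIGNAL id="S0" start="11" end="18" text="made of"/>
  <PARTOF id="P0" fromID="E1" toID="E0" signals="S0" relationship_type="substance"/>
</PartML>
"""

root = ET.fromstring(DOC)
# Resolve entity IDs, then read each PARTOF link as a (part, whole, type) triple.
entities = {e.get("id"): e.get("text") for e in root.iter("ENTITY")}
links = [(entities[l.get("fromID")], entities[l.get("toID")], l.get("relationship_type"))
         for l in root.iter("PARTOF")]
```

Here the 'from' member ("wood") is the part and the 'to' member ("bat") is the whole, matching the directionality convention described above.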

5 Corpus

5.1 Collection

Our corpus was collected from a variety of English Wikipedia articles, stripped of formatting and non-textual elements. Wikipedia was chosen because it provides several key features that we felt were important in a corpus. First, Wikipedia articles provide coverage of many domains, and detailed information is available for each domain. Additionally, Wikipedia's articles are written in a consistent, formal style which features an indicative voice and grammatically sound complete sentences. This provides clear and accurate text for our annotators, and consistent features for machine learning.

Gathering of the corpus was split between two approaches. The first was automatic collection loosely modeled on that seen in prior works, in which articles were searched for a sufficient density of known signal terms. While this provided a number of useful articles, it was also prone to generating false positives that did not have any instances of our target phenomenon.

The second method was human-driven, and used Wikipedia's built-in random article feature to generate a selection of candidates. Although many had to be discarded for being extremely short, of low quality, or lacking meronyms, we were able to salvage candidates by navigating to a related but more general article. These articles were often more detailed and of higher quality, as they were more frequently used and edited. For example, although an article about a specific type of fly might be a mere stub containing no useful text, the related article about insects would have a wealth of useful information.

5.2 Statistics

Our corpus (counting only those documents which were annotated) contains 40 documents, with an average of roughly 425 words per document. Document length varies significantly, from only 71 words for an article about the Tarbell Building to nearly 1,000 words for an article about carousels. Some articles are quite dense with entities and links, with one article containing nearly 70 links, while others are extremely sparse or, in a few cases, entirely devoid of links. The frequencies of various numbers of links are shown below:

    Links   Count
    0-10    20
    11-20   14
    21-30   3
    31-40   2
    > 40    1

One of the reasons for this wide variety of densities is the need to find articles representing all three types of meronyms. Articles which deal with mechanical or anatomical parts often have long lists of components, because they aim to explain the parts of the subject. This results in many entities and many links. By contrast, subjects that are more abstract rarely have this sort of detailed breakdown of their components, and so entities and links are rare in those articles.

As a result of the tendency for different types of meronym to show up with different frequencies, the number of each type in our corpus is not balanced. Part-of relations are much more common, while substance-based meronyms are quite rare. Often, items have a single homogeneous substance (as in a ring made of gold, for example), while they might have many mechanical parts. The relative frequencies of the different types identified by annotators are shown below (the frequencies in the adjudicated gold standard are quite similar):

    Type                  Percentage
    Part-whole            63%
    Member                27%
    Substance             9%
    Other/uncategorized   <1%

Because articles that have strong examples of non-physical meronymic relationships are comparatively rare, and comparatively sparser, our examples of these were generally of lower quality. It was easy to find a wealth of good examples for physical meronyms, but for many abstract topics we had to settle for samples that were perhaps more ambiguous and prone to annotator disagreement. This does bring some benefit, however, as these difficult examples provide a richer training set for future machine learning.

6 MAMA

Our original specification differed from our current one in two important ways. First, instead of annotating only those entities which actually participate in a meronymic relationship, we annotated all nouns in the text which were eligible to participate in such a relationship, even if they did not. This resulted in marking nearly every noun as an entity, with the only ones excluded being those that could only participate in a type of meronymic relationship which we were not focusing on, such as gerunds (which would be in activity-based relationships) or locations (which would be in geographic-based relationships). The goal of this was to ensure that we had an accurate representation of not only the positive examples of our phenomenon, but also the negative examples.

However, we found that the number of non-participating nouns in the text vastly outweighed the number of actual meronyms, and so annotators spent almost all of their time marking negative examples. Although these negative instances may be somewhat helpful for machine learning, we felt it was more important to get good positive examples, so the annotators' time would be better spent on those. As such, we revised our specification to its current form, in which only those entities actually in a meronymic relationship should be tagged. By simplifying the task in this way, we allowed annotators to generate a much greater volume of useful data.

Ultimately, we were still able to generate a sufficient number of negative examples by taking pairs of tagged entities which participated in meronymic relationships, but not with each other. Because the number of possible pairings from a set is quite large, this resulted in a very large number of potential negative examples. Further negative examples could be generated by applying a part-of-speech tagger to the corpus and extracting nouns which were not tagged as entities. This second method may be more useful in the long run, as it would mirror the process required to apply a classifier to completely untagged text.

The second important change to our specification was the way we dealt with noun phrases as entities. We originally felt that, in some cases, modifiers of the head of the phrase significantly alter its meaning, such that it would be necessary to have those modifiers as part of the entity in order to ensure a coherent part-whole pair. To this end, we asked annotators to tag not only the head, but also any surrounding text which was essential to the meaning of the head.

Unfortunately, we found that this distinction was very difficult to make in practice. The annotators struggled with many ambiguous instances, and we struggled to formulate a coherent and consistent standard for when to mark what. While the toy examples presented in our guideline were clear, the actual corpus had a very wide variety of potentially important modifiers. If we were to capture them, we would likely be trading away a significant amount of inter-annotator agreement.

Instead, we decided to always annotate only the head of the noun phrase. While some detail is lost by this change, we felt it was more important to be able to give the annotators a clear and concise standard to follow. Further, it may be possible to recover many of these lost modifiers in postprocessing if necessary.

7 IAA

Each document in our corpus (with a few exceptions) was annotated twice. We used these pairs of duplicate documents to calculate Cohen's Kappa as a measure of inter-annotator agreement on our extents and attributes. Cohen's Kappa is calculated using two terms: Pr(a), which is the observed rate of agreement, and Pr(e), which is the rate of agreement expected by chance. The calculation of Pr(e) is somewhat complicated, as a truly accurate measurement would require knowing the exact probability of a particular annotator marking each word, which is of course not consistent across words (all annotators are more likely to mark nouns as entities than they are prepositions, for example). However, since this sort of information is obviously unavailable, we made the simplifying assumption that each word in a document was equally probable to be marked. That probability was taken to be the number of words actually tagged by a given annotator divided by the total number of words in the document.

We also allowed for somewhat generous matching when deciding if two annotators had marked the same extent. While the most rigorous method would be to count a match only if the two extents had exactly the same start and end offsets, this would exclude many cases in which annotators accidentally included a space or punctuation in their extents. Instead, we count any pair of overlapping extents as a match.

Across our documents, we saw a wide range of kappa scores for both extents and attributes. The scores for ENTITY agreement are shown below:

    File                    ENTITY Kappa
    khoisan languages       -0.08
    tynwald                 -0.07
    statistics              -0.02
    prudential center       -0.01

    mathematics             0.00
    computer science        0.00
    ark of the covenant     0.07
    paint                   0.11
    plot                    0.13
    parliamentary system    0.16
    spotify                 0.17
    republics of russia     0.18

    madden nfl 08           0.22
    republic                0.25
    tarbell building        0.30
    english                 0.32
    single market           0.36
    rock music              0.38
    bojjannakonda           0.38
    gravel road             0.39

    computing               0.41
    algorithm               0.41
    ore                     0.43
    chemical element        0.44
    insect                  0.49
    palaeeudyptes           0.49
    physics                 0.51
    telephone               0.53
    sea snail               0.53
    computer network        0.59
    city                    0.59

    atomic number           0.73
    yacht                   0.79
    motorcycle speedway     0.79

Although there is some disagreement about how to interpret Cohen's Kappa, Landis & Koch (9) suggested that a score < 0 indicates no agreement, 0-.2 slight agreement, .2-.4 fair agreement, .4-.6 moderate agreement, .6-.8 substantial agreement, and .8-1 nearly perfect agreement. The table is divided into sections to indicate these ranges.

As can be seen, the majority of our documents fell into the fair to moderate range, with many documents clustered around .4. We also had a few documents with strong agreement, and a significant minority with poor agreement.
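Under the equal-probability simplification described above, extent agreement reduces to a per-word binary comparison. A minimal sketch (which glosses over the overlap-matching of multi-word extents) might be:

```python
def cohens_kappa(tags_a, tags_b, n_words):
    """tags_a, tags_b: sets of word indices each annotator marked as an ENTITY.
    n_words: total words in the document."""
    # Pr(a): fraction of words on which the two annotators made the same decision.
    agree = sum(1 for i in range(n_words) if (i in tags_a) == (i in tags_b))
    pr_a = agree / n_words
    # Pr(e): chance agreement, assuming each annotator marks any word with uniform
    # probability (words tagged / total words), per the simplifying assumption above.
    p1, p2 = len(tags_a) / n_words, len(tags_b) / n_words
    pr_e = p1 * p2 + (1 - p1) * (1 - p2)
    return (pr_a - pr_e) / (1 - pr_e)

k = cohens_kappa({1, 4, 7}, {1, 4, 9}, 20)
```

In this toy document of 20 words, the annotators agree on 18 per-word decisions (Pr(a) = 0.9) against a chance rate of Pr(e) = 0.745, giving a kappa in the "substantial" band.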
We see that, with some exceptions, the most successful files are those dealing with physical items, often with mechanical or anatomical parts. Files like yacht, sea snail, and insect had many very clear meronymic relationships dealing with this sort of physical object. On the other hand, files dealing with more abstract concepts, such as language, government, and general fields of study, saw much poorer agreement.

We suspect that this is because the physical part-whole relationship is the most familiar, whereas abstract relationships are more distant conceptually. As such, annotators were able to use their existing skill at identifying the physical relationships to do so more consistently. The poorer results on abstract concepts suggest a need for improvement of our guideline with respect to this sort of example.

Additionally, there can be a tendency when annotating to look for more examples to tag when a file seems sparse. The natural assumption when given a file would be that there are some good examples of the target phenomenon in that file. If none or few are found, the annotator may expand their definition to fit ambiguous cases that otherwise would have been rejected. Because documents about abstract concepts often have many fewer meronyms than those about physical objects, this tendency may have impacted results for abstract topics more severely.

Another factor which may have hurt agreement on some files was our exclusion of some types of meronyms. For example, many sources consider classifications to be meronyms (such as "Wolves belong to the genus Canis"). We, however, exclude this relationship entirely. This means that if annotators find an ambiguous case between this type of relationship and a member-group relationship, they may simply decline to mark it entirely. The same is true for other types of meronyms which we exclude, such as geographic meronyms (which can be ambiguous when dealing with a noun that is both a location and a political entity) and activity-based meronyms (which can be ambiguous in fields of study, which can be viewed both as an activity and a body of knowledge).

It may be worth reconsidering the decision to exclude these other types of meronyms, because including them would relegate much of the ambiguity to the type attribute instead of the extent itself. This would give us a clearer picture of the top-level phenomenon of meronyms as a whole, and make it easier to resolve disagreements over type.

The SIGNAL tag showed similar patterns of agreement to the ENTITY tag:

    File                    SIGNAL Kappa
    statistics              -0.01
    spotify                 -0.01

    prudential center       0.00
    khoisan languages       0.00
    mathematics             0.00
    computer science        0.00
    ark of the covenant     0.08
    parliamentary system    0.11
    paint                   0.13
    chordate                0.18
    plot                    0.18

    rock music              0.22
    insect                  0.23
    republic                0.23
    algorithm               0.25
    tarbell building        0.25
    english                 0.28
    republics of russia     0.33
    madden nfl 08           0.39
    physics                 0.39

    palaeeudyptes           0.40
    city                    0.44
    bojjannakonda           0.45
    telephone               0.45
    computing               0.45
    ore                     0.49
    gravel road             0.50
    tynwald                 0.50
    chemical element        0.55
    single market           0.59

    computer network        0.62
    motorcycle speedway     0.66
    atomic number           0.66
    sea snail               0.69
    yacht                   0.74

It should not be surprising that the agreement for SIGNAL closely follows the agreement for ENTITY, because a SIGNAL word can only be present when a meronym is identified. Agreement is generally slightly lower, because agreement on a SIGNAL word requires agreement both on the underlying relationship and on which word signals it.

8 We also calculated agreement for the at- other is our lax treatment of these attributes in tributes of the ENTITY tag (type and reference our guidelines. count). When we include cases on which annto- When designing our DTD, we noted that tators disagree about the underlying extents, the most entities in our corpus were physical, and results relatively closely follow the scores for EN- so made that the default value of the type at- TITY. To get a clearer picture of the agreement tribute. However, this can lead annotators to for- on these attributes alone, we considered only the get to change the default when dealing with the subset on which annotators agreed on the under- rarer abstract case, especially in a file with many lying extent. Obviously, this requires dropping physical entities and few abstract ones. This is some of our tagged ENTITIES, and so may not exacerbated by the fact that identifying entity be completely representative of the data set, but characteristics is not the main focus of the task, we feel it gives a better understanding of the per- and so is more easily overlooked. formance on the attributes alone. Note that a file is excluded entirely if no ENTITY tags were Many files had perfect agreement on whether agreed upon. an entity was physical or abstract, but a surpris- ing number had very poor agreement. There are File TYPE Kappa a few contributing factors to this. One is the tynwald -1.17 danger of default values for attributes and an- insect -1.17 other is our lax treatment of these attributes in republic -1.14 our guidelines. republics of russia -0.33 computing -0.18 When designing our DTD, we noted that spotify -0.13 most entities in our corpus were physical, and algorithm -0.03 so made that the default value of the type at- telephone 0.03 tribute. 
File                     Kappa
chemical element         0.03
single market            0.19
computer network         0.2
yacht                    0.36
rock music               0.71
ark of the covenant      1
plot                     1
sea snail                1
gravel road              1
motorcycle speedway      1
palaeeudyptes            1
atomic number            1
chordate                 1
english                  1
physics                  1
city                     1
ore                      1
parliamentary system     1
paint                    1
bojjannakonda            1
madden nfl 08            1
tarbell building         1

Many files had perfect agreement on whether an entity was physical or abstract, but a surprising number had very poor agreement. There are a few contributing factors to this. One is the danger of default values for attributes: a default can lead annotators to forget to change it when dealing with the rarer abstract case, especially in a file with many physical entities and few abstract ones. This is exacerbated by the fact that identifying entity characteristics is not the main focus of the task, and so is more easily overlooked. One option to alleviate this problem would be to integrate preprocessing to determine likely values for the attribute, and alert the annotator if a mismatch was found between these values and their annotation. However, this would require significant modifications to the annotation environment.

Because this attribute was not part of the phenomenon which we are trying to identify, our explanation of how to properly annotate it was somewhat brief in our guidelines. As can be seen from our Kappa scores, this turned out to be a task with many non-trivial cases, and we would have done well to provide a more thorough explanation of it to our annotators. We underestimated its complexity, and assumed it would be quite easy to annotate.

In fact, there are many cases where a noun can have both physical and abstract interpretations. For example, in our document about Khoisan languages, we see Africa used to refer to both the geographic location (a physical entity) and the cultural amalgamation of its people (an abstract entity). This sort of double meaning contributed to many of the poor kappa scores.

Our scores for the reference count attribute saw very similar kappa scores, and we suspect the same causes. Our guideline again provided only a brief explanation, but we later encountered significantly more complex examples in our corpus, especially when dealing with the mass noun and zero count values.

8 Machine Learning

8.1 Problem Formulation

We formulate part-whole relation extraction as a simple classification problem. The first step is to detect all entities within a document that could participate in a part-whole relationship. In the corpus, this is done by simply using the annotated entities while disregarding other information, while for unlabeled corpora, this would be done by noun or noun-phrase chunking. After all of the entities are found, each pair is considered to potentially participate in a part-whole relationship. Since most part-whole relationships occur within a single sentence or adjacent sentences, a sentence distance limit could (and should) be imposed, but we did not do that for our results. Then, for each pair of entities in the document, we create a 'sample' for the machine learning that consists of features for the individual entities and the combination of entities; the class label is a binary "True" or "False" corresponding to the presence or absence of a part-whole relationship, respectively.

8.2 Features

From the beginning, we wanted to choose features that were, like prior works, based on syntax and patterns. All of our features were derived using the NLTK (Natural Language Toolkit) package for Python and the Java MaltParser program. NLTK provides a wide range of natural language processing functions, some of which are described here, and MaltParser is a library for parsing sentences into their dependency tree structures. Our first feature was part of speech, which we derived using NLTK, after also using NLTK to split the raw Wikipedia articles into sentences, then tokens. Part of speech is an important feature because, in general, all entities are nouns; but even amongst nouns there are a few forms, and the specific subtypes give subtle clues to the existence, or lack thereof, of part-whole relations. The second feature is the entity's dependency role. This feature and all other dependency-tree-related features were extracted by running and parsing the output of the MaltParser from within Python. Intuitively, the dependency role is important because it describes the role played by the entity in some relationship that the sentence describes. The next features are the dependency tree path distance between entities and the dependency tree path distances to the root of the dependency tree. These are useful because they describe how "close" their dependency relationships are and how close each entity's role is to the main or overall information the sentence conveys. We also included slight variations of the above numerical features, particularly the floored log (base 2), since the distribution of tree distances in our corpus tends to follow a logarithmic distribution. This also helps counteract our small corpus size.

8.3 Results

On top of the tools used to perform feature extraction, we used Weka 3.6.9 to perform the actual classifications. Weka is an open-source collection of many popular classifiers that allows for a great deal of classifier-specific options and automation. One weakness of Weka is its inability to handle large data sets such as ours. Though we had few part-whole relationships in our corpus, totaling 525, a very large number of negative samples were created after pairing every entity in each document with every other entity in that document. The total number of samples was over 55,000. Because of the limitations of Weka, we ran a few of the faster, well-known, and popular classifiers and summarize the results below, showing various statistics (true positive rate, false positive rate, precision, recall, and F1-measure) and confusion matrices. As with prior work, our main focus was to get the true positive rate for the True class (indicating a part-whole relationship) as large as possible. This true positive rate corresponds to the ratio of samples that are correctly recognized as part-wholes among all of the part-whole samples.
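These per-class statistics can all be read off a confusion matrix. A minimal sketch in Python, where the helper name `class_metrics` is ours (for illustration only) and the example matrix is the Naive Bayes confusion matrix from Section 8.3.1:

```python
def class_metrics(cm, cls):
    """Per-class TP rate (recall), precision, and F1 from a
    confusion matrix laid out as cm[actual][predicted]."""
    classes = list(cm)
    tp = cm[cls][cls]
    fn = sum(cm[cls][p] for p in classes) - tp   # actual cls, predicted otherwise
    fp = sum(cm[a][cls] for a in classes) - tp   # predicted cls, actually otherwise
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return recall, precision, f1

# Naive Bayes confusion matrix from Section 8.3.1 (rows = actual class)
cm = {"False": {"False": 52936, "True": 1698},
      "True":  {"False": 202,   "True": 315}}
recall, precision, f1 = class_metrics(cm, "True")
# recall ~ 0.609, precision ~ 0.156, f1 ~ 0.249
```

The True-class recall here is the true positive rate we aim to maximize: of the 517 actual part-whole samples, only 315 are recovered.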

We found that Naive Bayesian classifiers outperform the other classifiers used by a surprising 40% with respect to the true positive rate of True samples.

8.3.1 Naive Bayes (Updatable)

TP      FP      Prec.   Recall  F1      Class
0.969   0.391   0.996   0.969   0.982   False
0.609   0.031   0.156   0.609   0.249   True
0.966   0.387   0.988   0.966   0.975   Weighted Avg.

False   True    ← (classified as)
52936   1698    False
202     315     True

8.3.2 Bayes Network

TP      FP      Prec.   Recall  F1      Class
0.969   0.383   0.996   0.969   0.982   False
0.617   0.031   0.159   0.617   0.252   True
0.966   0.38    0.988   0.966   0.976   Weighted Avg.

False   True    ← (classified as)
52943   1691    False
198     319     True

8.3.3 Voted Perceptron

TP      FP      Prec.   Recall  F1      Class
1       0.992   0.991   1       0.995   False
0.008   0       0.308   0.008   0.015   True
0.991   0.983   0.984   0.991   0.986   Weighted Avg.

False   True    ← (classified as)
54625   9       False
513     4       True

8.3.4 Random Forest

TP      FP      Prec.   Recall  F1      Class
0.998   0.793   0.993   0.998   0.995   False
0.207   0.002   0.476   0.207   0.288   True
0.99    0.786   0.988   0.99    0.989   Weighted Avg.

False   True    ← (classified as)
54516   118     False
410     107     True

8.4 Future Work

For machine learning, the core of future work is to add more features, especially features based on syntax trees, and to balance the number of positive and negative samples.

Some key features we'd like to add are the number of occurrences of signal words identified in the annotated data near or between two entities, the syntax tree path distance, the dependency head of each entity, and a boolean for whether the entity heads match. The signal word count should help the classifier because these are words that have been manually identified as indicators of meronymic relationships. Syntax tree path distance, and other syntactic features, can contribute to machine learning because the relationships are expressed syntactically, and syntax tree distance is generally correlated with expressed relationships in a sentence. Finally, the dependency heads of the entities directly indicate a relationship that the sentence explicitly states the entities participate in, and what type of relationship that is. If this is a part-whole relationship, then we could rely on that feature alone, and if the two heads match, that means the sentence explicitly states that they participate in the same relationship.

To address balancing the corpus, there are two general approaches: removing negative samples and 'duplicating' positive samples (in the training data). Removing negative samples is straightforward and can be done by simply removing a random subset of the negative samples or ruling out 'obvious' negative samples such as entities that have a large sentence distance. In general, part-whole relationships that are directly expressed (like the ones we are interested in for machine learning) occur within a sentence or two adjacent sentences. Duplicating samples is similarly straightforward, though we came up with a method we call "fuzzing" that may be more beneficial than duplicating the samples exactly. Fuzzing consists of taking numerical attributes, such as tree distances, and adding a small amount, such as -1 or 1. Since numerical features such as tree distance should have some tolerance, we believe this strategy will allow a small corpus like ours to cover a larger feature space and give more positive samples, thereby improving classification, while not harming classification, since we expect that the unannotated data the classifier would be used on would contain small variations that do not indicate a change in the presence or absence of part-whole relationships.
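The fuzzing step can be sketched in a few lines of Python; the function name and the dict-based sample representation here are illustrative, not part of our pipeline:

```python
import random

def fuzz(samples, numeric_keys, copies=2, seed=0):
    """Oversample by keeping each sample and adding `copies` duplicates
    whose numeric features are jittered by -1 or +1 ("fuzzing").
    Labels and non-numeric features are left untouched."""
    rng = random.Random(seed)
    out = []
    for s in samples:
        out.append(s)
        for _ in range(copies):
            dup = dict(s)  # shallow copy of the feature dict
            for k in numeric_keys:
                dup[k] = s[k] + rng.choice((-1, 1))
            out.append(dup)
    return out
```

Applied only to the positive samples in the training data, this yields several positives per annotated relationship while spreading them over neighboring feature values, rather than stacking exact duplicates on a single point.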

9 Future Work

9.1 Anaphora

A potential route for expanding the specification is the handling of anaphora. Because we tag the entity which is most directly syntactically involved in the meronymic relationship, we often tag anaphora instead of the original referent object. In the case of lengthy descriptions of an object's constituent parts, this can often mean that the original referent appears several sentences prior.

On the one hand, knowing that "it" is part of "that" is much less helpful than having the referents resolved, so we would like to have a way of indicating what those referents are. If a corpus has many instances of anaphora, even an otherwise strong machine learning algorithm that does not account for resolving them will produce many useless results like the example above.

However, the alternative of simply tagging the original referent as the entity creates a much more difficult learning environment, because those referents may be far removed from the syntactic clues to the existence of the meronym. Without some understanding of anaphora, a machine learning algorithm would be able to make little use of an example where one entity is separated by a full paragraph from the other.

Unfortunately, anaphora resolution is a non-trivial problem, so introducing it alongside meronymy detection would greatly complicate the task. We would, in effect, be asking the computer to learn two complex phenomena at once, as well as their interaction. So, while a successful system which handled anaphora would almost surely produce much better results than an otherwise successful system which did not, the cost of that extension would be quite significant. As a result, we opted to keep the task focused on meronyms alone.

9.2 Additional types

Another avenue for expansion, as mentioned earlier, would be to handle other types of meronyms which we originally excluded for simplicity. Some of these would be easily incorporated into the existing framework, such as geographic meronyms and classification meronyms, which have roughly the same structure and components as our current types.

Other types, especially activity-based meronyms, would require changes or additions to our specification. Whereas the existing types use nouns as entities, these meronyms often have verbs as their constituent parts. This means the addition of an entirely new part of speech to our specification, along with new attributes for these verbal entities.

As we expand the types of meronyms covered, the machine learning component may become more difficult, as each type will have its own common structure and signals, which may differ significantly from the others. This might muddle the feature set and make learning less successful.

9.3 Robust Signals

Finally, we might consider a more robust handling of signals. Presently, we focus on words as the only form of signal. However, sometimes it is only a part of a word that signals a meronymic relationship, as in the case of possessives. While it is possible to simply tag the entire possessive word as a signal, it would be more accurate to view the fact of the possessive form as the signal, rather than the word which is in that form. This could be accomplished by adding attribute(s) to the SIGNAL tag to allow the annotator to specify what about the word makes it a signal.

Similarly, we sometimes see adjectives that indicate a meronymic relationship, especially for the substance type (for example, "a wooden bat" or "a golden ring"). While these cases clearly indicate a substance relationship, they don't fit well into our current specification and guideline, which want entities to be nouns rather than adjectives. A change to handle adjectives would also incorporate the above concept of word form as a signal, because the fact of the adjective form is serving as the signal, but the underlying root is serving as the entity.

A third unconventional form of signal is word compounding, where neither word is itself a signal, but rather their placement next to each other. For example, in the compound "tuna salad", we can tag tuna and salad as entities, but lack a way to indicate that their relationship is signaled by compounding. This type of signal may be especially difficult to capture using extent-based annotation, because no particular string of text is acting as the signal. Handling it effectively may require a change to the annotation paradigm which can sensibly handle signals as structures instead of just as extents.

10 Conclusion

Overall, our guideline and specification saw moderate success in enabling our annotators to accurately capture our target phenomenon. While we were able to achieve strong agreement for certain domains (physical objects in traditional part-whole relationships), our guideline fell short for more complicated domains, especially those involving abstract objects. However, we feel that our areas of success indicate the potential of significant improvement in those areas where we struggled. With revisions to the guideline, and potential revisions to the annotation procedure, we believe strong agreement on all domains would be possible.

Our machine learning efforts, though still preliminary, show promising results. We demonstrate an improvement over baseline random chance, and our annotation enables a richer feature set than simple bag-of-words and n-grams. Further work on enriching the feature set, corpus, and annotation would be expected to provide additional gains in classification accuracy.

The task of identifying and classifying meronymic relationships is non-trivial, and presents many challenges for both annotation and learning. While some of these challenges remain to be overcome, our current work and results form a foundation for future work, as well as valuable lessons learned for building a more effective annotation.

References

[1] Alessandro Artale, Enrico Franconi, Nicola Guarino. Open problems with part-whole relations. Applied Ontology 0 (2007).

[2] Ulrike Sattler. A concept language for an engineering application with part-whole relations. Proceedings of the International Workshop on Description Logics, pages 119-123.

[3] Uta E. Priss. The formalization of WordNet by methods of Relational Concept Analysis. 1996.

[4] Cruse, D. A. Lexical Semantics. Cambridge University Press, Cambridge and New York.

[5] Iris, Madelyn; Litowitz, Bonnie; Evens, Martha. Problems of Part-Whole Relations. In: Evens, Martha W. (ed.). Relational Models of the Lexicon. Cambridge University Press.

[6] Morton E. Winston, Roger Chaffin, Douglas Herrmann. A Taxonomy of Part-Whole Relations. Cognitive Science, Volume 11, Issue 4, pages 417–444, October 1987.

[7] C. Maria Keet, Alessandro Artale. Representing and Reasoning over a Taxonomy of Part-Whole Relations. Applied Ontology 0 (2007).

[8] Roxana Girju, Dan Moldovan, Adriana Badulescu. Automatic Discovery of Part–Whole Relations. Computational Linguistics, Volume 32, Issue 1, March 2006, pages 83-135.

[9] Landis, J.R.; Koch, G.G. The measurement of observer agreement for categorical data. Biometrics 33 (1), 1977, pages 159–174.

[10] Roberts, Angus. Learning Meronyms from Biomedical Text. In Proceedings of the ACL Student Research Workshop (June 2005), pp. 49-54.
