Extracting Entailing Words from Small Corpora for Ontology Building

Total Page:16

File Type:pdf, Size:1020Kb

Extracting Entailing Words from Small Corpora for Ontology Building Extracting Entailing Words from Small Corpora for Ontology Building Aurelie Herbelot Computer Laboratory University of Cambridge J.J. Thompson Avenue, Cambridge, United Kingdom [email protected] Abstract Pantel and Lin (2002) followed in her path using distributional similarity to extract large clusters of This paper explores the extraction of con- related words from corpora. ceptual clusters from a small corpus, given The main advantage of the clustering method is user-defined seeds. We use the distribu- that it allows users to find hyponymic relations that tional similarity hypothesis (Harris, 1968) are not explicitly mentioned in the corpus. One to gather similar terms, using semantic major drawback is that the extracted clusters must features as context. We attempt to pre- then be appropriately named - a task that Pantel and serve both precision and recall by using Ravichandran (2004) showed has no simple solu- a bootstrapping algorithm with reliability tion. Furthermore, although mining a corpus for all calculations proposed by Pantel and Pen- its potential clusters may be a good way to extract nacchiotti (2006). Precision of up to large amounts of information, it is not a good way to 78% is achieved for our best query over a answer specific user needs. For instance, if I wish to 16MB corpus. We find, however, that re- compile a list of all animals, cities or motion verbs in sults are dependent on initial settings and my corpus, I must mine the whole text, hope that my propose a partial solution to automatically query will be answered by one of the retrieved clus- select appropriate seeds. ters and identify the correct group. Finally, previous work has suggested that clustering is only reliable 1 Introduction for large corpora: Pantel and Pennacchiotti (2006) The entailment task can be described as finding pairs claim that it is not adequate for corpora under 100 of words so that one (the entailed word) can replace million words. the other (the entailing word) in some contexts. Or This paper proposes a user-driven approach to in other words, it consists in finding words in a hy- clustering where example seeds are given to the sys- ponym/hypernym relation. This is of course also tem, patterns extracted for those seeds and similar the task of ontology extraction when applied to the words subsequently returned, following the typical is-a relationship. Hence, entailment tools such as entailment scenario. The obvious difficulty, from an those based on the distributional similarity hypoth- ontology extraction point of view, is to overcome esis (Harris, 1968) can also be used in ontology ex- data sparsity without compromising precision (the traction, as shown by Pantel and Lin (2002) in a clus- original handful of seeds might not produce many tering task. accurate patterns). We therefore investigate the use The clustering method in ontology extraction was of bootstrapping on one hand in order to raise the pioneered by Caraballo (1999). She showed that number of extractions and of semantic features with semantically-related words could be clustered to- reliability calculations on the other hand to help gether by mining coordinations and conjunctions. maintain precision to an acceptable level. The next section reviews relevant previous work lationships which were filtered for taxonomic pairs. and includes a description of the piece of work (The filtering simply consisted in checking whether which motivated the research presented here. We the hyponym and hypernym were animal names, then describe our algorithm and experimental setup. using a list compiled from Wikipedia article titles.) Results for the whole corpus (16MB) and a 10% A careful evaluation was performed, both manually subset are presented and discussed. Problems re- on a subset of the results and automatically on the lating to initial setting sensitivity are noted, and a whole extracted file using the NCBI1 taxonomy. We partial solution proposed. We finally conclude with reported a precision of 88.5% and a recall of 20%. avenues for future work. This work highlighted the fact that the dictionaries of animal names that we had at our disposal (both 2 Previous Work and Motivation the list extracted from Wikipedia and the NCBI itself) were far from comprehensive and therefore The clustering method represents so far a marginal affected our recall. approach in a set of ontology extraction techniques In this paper, we attempt to remedy the short- dominated by the lexico-syntactic pattern-matching comings of our dictionaries and investigate a min- method (Hearst, 1992). Clustering was initially pro- ing algorithm which returns conceptual clusters out posed by Caraballo (1999) who used conjunction of a small, consistent corpus. The fairly convention- and coordination to cluster similar words. She ob- alised aspect of Wikipedia articles (the structure and tained a precision of 33% on her hyponymy extrac- vocabulary become standardised with usage) tends tion task. to produce good, focused contexts for certain types Pantel and Lin (2002), following from Cara- of words or relationships, and this partly overcomes ballo’s work, proposed their ‘clustering by com- the data sparsity problem. We therefore propose the mittee’ algorithm, using distributional similarity to realistic task of finding clusters of terms that a reader cluster similar words. Their algorithm distinguishes of biological texts might be interested in. Here, between various senses of a word. Pantel and specifically, we focus on animal names, geograph- Ravichandran (2004) report that the algorithm has ical areas (i.e. potential animal habitats) and parts a precision of 68% over a 3GB corpus (the figure of the animal body. We reuse the Wikipedia corpus is calculated over the relation between clusters and from our previous work - 16MB of plain text - and their automatically generated names.) apply to it distributional similarity using semantic On the entailment front, Geffet and Dagan (2005) features, with a bootstrapping algorithm proposed also used distributional similarity over an 18 million by Pantel and Pennacchiotti (2006). word corpus and obtained up to 74% precision with a novel feature weighting function (RFF) and an In- 3 The Algorithm clusion Testing algorithm which uses the the k char- The aim of the algorithm is to find words that are acteristic features common to two words in an en- similar to the seeds provided by the user. In order to tailment relation. achieve this, we use the distributional similarity hy- Our own investigation derives from a pothesis (Harris, 1968) which states that words that previous ontology extraction project (Her- appear in the same context are semantically related. belot and Copestake, 2006) on Wikipedia Our ‘context’ consists here of the semantic triples (http://www.wikipedia.org/). That in which a word appears, with the semantics of the project focused on uncovering taxonomic rela- text referring to its RMRS representation (Copes- tionships in a corpus consisting of over 12,000 take, 2004). So for instance, in the sentence ‘the Wikipedia pages on animals. We extracted a cat chased a green mouse’, the word ‘mouse’ has a semantic representation of the text in RMRS form context comprising two triples: (Copestake, 2004) and manually defined patterns lemma:chase pos:v arg:ARG2 var:mouse pos:n characteristic of the taxonomic relationship, also which indicates that ‘mouse’ is object of ‘chase’ and in RMRS format. Matching those patterns to the text’s semantics allowed us to return hyponymic re- 1www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Taxonomy lemma:green pos:j arg:ARG1 var:mouse pos:n P (f, i) is the probability that they appear together. which indicates that the argument of ‘green’ is Pointwise Mutual Information is known for pro- ‘mouse’. ducing scores in favour of rare events. In order Let’s assume that ‘mouse’ is one of the seeds to counterbalance this effect, our figures are multi- provided to the system. We can transform its plied by the discount factor suggested in Pantel and context into generic features by replacing the slots Ravichandran (2004): containing the seed with a hole: lemma:chase pos:v arg:ARG2 lemma:mouse pos:n becomes c min (c , c ) d = if ∗ i f (2) lemma:chase pos:v arg:ARG2 lemma:hole cif + 1 min (ci, cf ) + 1 pos:n. Then, every time a generic feature is encoun- where cif is the cooccurrence count of an instance tered in the text, we can hypothesise that whichever and a feature, ci the frequency count of instance i word fills the hole position is semantically similar to and cf the frequency count of feature f. our seed: if we encounter the triple lemma:chase We then find the reliability of the feature as: pos:v arg:ARG2 var:bird pos:n, we hypothe- P pmi(i,f) sise that ‘bird’ belongs to the same semantic class iI max ∗ ri r = pmi (3) as ‘mouse’. (We assume that a match on any one f |I| feature - as opposed to the whole context – is suffi- cient to hypothesise the presence of an instance.) where rf and ri are the reliabilities of the feature and We initially extract all the features that include of an instance respectively, and I is the total number one of the seeds presented to the system. We filter of instances extracted by f. those features so that semantically weak relations, Initially, the seeds have reliability 1 and all the such as the one between a preposition and its argu- other words reliability 0. We then select features ment, are discarded: triples containing a preposition with n-best reliabilities. Those features are used to or quantifier as the head or argument of the relation extract new instances, the reliability of which is cal- are deleted. At the moment, we are also leaving culated in the same fashion as for the initial patterns: conjunction aside.
Recommended publications
  • Research Project Summmary Main Research
    Research project summmary Main research fields: zoology and reproductive physiology specialization fields: Animal reproduction, Behavioural ecology, Behavioural physiology, sociobiology. My research interests focus mainly on the sociobiology of African rodent moles (Bathyergidae) and in particular the extrinsic and intrinsic factors that have led to the evolution of sociality in this remarkable endemic African family. My research group is fundamentally interested in elucidating the modes and mechanisms that are responsible for reproductive suppression in the non-reproductive females of the various species. Research is currently being directed at the neuroendocrine and molecular levels to elucidate the extent, nature and location of GnRH suppression. We are also interested in the photic and thermic input in the control of reproduction in seasonally reproducing mole-rats and the potential lack of a role in aseasonally breeding bathyergids. Long term population studies on social mole-rats from mesic and xeric environments are underway. These studies are providing empirical data on the spatial distribution of colonies, longevities, factors restricting and promoting dispersal, vagility, foraging methods and lifetime reproductive success. We are interested in the genetic relatedness of colonies and also the type of paternal skew operational in the social genera. This work is being carried out in collaboration with Dr Chris Faulkes at Queen Mary and Westfield College, London. Students currently under supervision MSc (research) 1. Ms. Kemba Butler Neuroendocrinology of induced ovulation in the highveld mole-rat (Cryptomys hottentotus pretoriae) 2. Mr. Andre Prins What makes a good helper? A behavioural study of cooperation in Damaraland mole-rats (Fukomys damarensis). 3. Mr. Josh Sarli Seasonal Reproductive Cycle and Parasite Burden of Two Small Mammals from Saudi Arabia.
    [Show full text]
  • K'2 108\3-2 Room 14-0551 77 Massachusetts Avenue Cambridge, MA 02139 Ph: 617.253.2800 Mitlibraries Email: [email protected] Document Services
    LABYRINTHOS by GREGORY PATRICK GARVEY B.S. University of Wisconsin, Madison (1975) M.F.A. University of Wisconsin, Madison (1980) Submitted to the Department of Architecture in Partial Fulfillment of the Requirements of the Degree of Master of Science in Visual Studies at the MASSACHUSETTS INSTITUTE OF TECHNOLOGY September 1982 Gregory Patrick Garvey 1982 The author hereby grants to M.I.T. permission to reproduce and to distribute copies of this thesis document in whole or in part. Signature of Author: 6ppftr hpt of Arcitec,4*e Certified by: Thesis Suiperysor, Otto Piene Accepted by: D partment of Architecture Nlicholas Negroponte, Chairman, Ro"artmental Committee on Graduate Students K'2 108\3-2 Room 14-0551 77 Massachusetts Avenue Cambridge, MA 02139 Ph: 617.253.2800 MITLibraries Email: [email protected] Document Services http://libraries.mit.eduldocs DISCLAIMER OF QUALITY Due to the condition of the original material, there are unavoidable flaws in this reproduction. We have made every effort possible to provide you with the best copy available. If you are dissatisfied with this product and find it unusable, please contact Document Services as soon as possible. Thank you. The images contained in this document are of the best quality available. -2- LABYRINTHOS by GREGORY PATRICK GARVEY Submitted to the Department of Architecture on August 6, 1982 in partial fulfillment of the requirements for the Degree of Master of Science in Visual Studies. Otto Piene, Thesis Supervisor ABSTRACT Composition, in time and space is discussed as a general problem in graphics, music, film/video, landscape architecture and archi- tecture.
    [Show full text]
  • Proceedings of the United States National Museum
    FIELD NOTES ON VERTEBRATES COLLECTED BY THE SMITHSONIAN - CHRYSLER EAST AFRICAN EXPEDI- TION OF 1926 By Arthur Loveridge, Of the Museum of Comparative Zoology, Cambridge, Mass. In 1926 an expedition to secure live animals for the United States National Zoological Park at Washington was made possible through the generosity of Mr. Walter Chrysler. Dr. W. M. Mann, the director of the Zoological Park, has already published a report on the trip; ^ the following observations were made by the present writer, who was in charge of the base camp at Dodoma during three and a half of the four months that the expedition was in the field. The personnel of the party consisted of Dr. W. M. Mann, leader of the expedition; F. G. Carnochan, zoologist; Stephen Haweis, artist; Charles Charlton, photographer; and the writer. Several local hunters assisted the party in the field for longer or shorter periods, and Mr. Le Mesurier operated the Chrysler car. The expedition landed at Dar es Salaam, capital and chief port of entry for Tanganyika Territory (late German East Africa), on Thursday, May 6, and left on the following Monday by train for Dodoma, which had been selected as headquarters. The expedition sailed from Dar es Salaam on September 9. Dodoma is situated on the Central Railway almost exactly one- third of the distance between the coast and Lake Tanganyika. It was primarily selected as being a tsetse-free area and therefore a cattle country where milk in abundance could be obtained for the young animals; it is also the center of a region unusually free from stock diseases.
    [Show full text]
  • Longevity Survey
    LONGEVITY SURVEY LENGTH OF LIFE OF MAMMALS IN CAPTIVITY AT THE LONDON ZOO AND WHIPSNADE PARK The figures given in the following tables are based on the records of the Zoological Society of London for the years 1930 to 1960. The number of specimens in each sample is given in the first column. The percentage of the sample that died in less than a year at the zoo is iven in second column. In this way ‘delicate’ zoo species can be noted at a glance. An average fife span of all those individuals living for more than a year at the zoo is given (in months) in the third column. In the fourth column the maximum individual life-span is noted for each species. It must be emphasized that the age of specimens arriving at the zoo is seldom known accur- ately and no allowance has been made for this. All figures refer to the period of time between arrival at the zoo and death at the zoo. Actual life-spans will, therefore, usually be longer than those given. Number o % dead in Average age Ma.life individual less than (in nronths) span in sample 12 month of those (in months) livi 12 man% or longer ORDER MONOTREMATA Tachyglossus aculeatirs Echidna 7 I00 10 Zuglossus bruijni Druijns Echidna 2 - 368 ORDER MARSUPIALIA Caluromys philander Philander Opossum 3 - 50 Philander opossum Quica Opossum I - 1s Lutreolina crussicaudufa Thick-tailed Opossum 6 50 1s Metachirus nudicuudatirs Rat-tailed Opossum 14 71 27 Didelphis marsupialis Virginian Opossum 34 82 26 Didelphis azarae Azara’s Opossuni IS 53 48 Dmyurus uiuerrinus Little Native Cat 2 I00 I1 Dasyurus maculatus
    [Show full text]
  • A Checklist of the Land Mammals Tanganyika Territory Zanzibar
    274 G. H. SWYNNERTON,F.Z.S., Checklist oj Land Mammals VOL. XX A Checklist of the Land Mammals OF mE Tanganyika Territory AND mE Zanzibar Protectorate By G. H. SWYNNERTON, F.Z.S., Game Warde:z, Game Preservation Department, Tanganyika Territory, and R. W. HAYMAN, F.Z.S., Senior Experimental Officer, Department of Zoology, British Museum (Natural History) 277278·.25111917122896 .· · 4 . (1)(3)(-)(2)(5)(9)(3)(4)280290281283286289295288291 280. .. CONTENTS· · · No. OF FORMS* 1. FOREWORDINSECTIVORA ErinaceidaM:,gadermatidaEmballonuridaSoricidt:eMacroscelididaMarossidaNycteridaHipposideridaRhinolophidaVespertilionida(Shrews)(Free-tailed(Hollow-faced(Hedgehogs)(Horseshoe(Leaf-nosed(Sheath-tailed(Elephant(Simple-nosed(Big-earedBats)Bats)Shrews)BatsBats)Bats) Pteropodida (Fruit-eating Bats) 2.3. INTRODUCTIONSYSTEMATICLIST OF SPECIESAND SUBSPECIES: PAGE CHIROPTERA Chrysochlorida (Golden" Moles to) ···302306191210.3521. ·2387 . · 6 · IAN. (1)(2)1951(-)(4)(21)(1)(6)(14)(6)(5),(7)(8)333310302304306332298305309303297337324325336337339211327 . SWYNNERTON,. P.Z.S.,·· ·Checklist··· of·Land 3293Mammals52 275 PItIMATES G. It. RhinocerotidaPelidaEchimyidaHyanidaPongidaCercopithecidaHystricidaMuridaHominidaAnomaluridaPedetidaCaviidaMustelidaGliridaSciuridaViverrida(Cats,(Mice,(Dormice)(Guinea-pigs)(Apes)(Squirrels)(Spring(Hyaenas,(Genets,(Man)(Polecats,(Cane(porcupines)(Flying(Rhinoceroses)Leopards,(Monkeys,Rats,Haas)Rats)Civets,Arad-wolf).Weasels,Squirrels)Gerbils,Lions,Baboons)Mongooses)Ratels,etc.)•Cheetahs)..Otters) ProcaviidaCanidaLeporidaElephantidaLorisidaOrycteropodidaEquidaBathyergidaManida
    [Show full text]
  • Fukomys Damarensis – Damaraland Mole-Rat
    Fukomys damarensis – Damaraland Mole-rat its population is unlikely to be declining. It is locally common and is frequently found at high population densities. Regional population effects: This species is naturally fragmented, but no distinct barriers to dispersal have been identified, and thus a rescue effect is possible. Distribution Endemic to sub-Saharan Africa, this species is widespread across the central regions of southern Africa, occurring from central and northern Namibia, across Hannah Thomas western Zambia, and throughout the majority of Botswana (with the exception of the extreme east), into western Zimbabwe. The habitat of this species is contiguous, Regional Red List status (2016) Least Concern although naturally fragmented. The southern portion of its National Red List status (2004) Least Concern range extends into the Northern Cape and North West provinces of South Africa, where it occurs in the Kgalagadi Reasons for change No change Transfrontier Park, Tswalu Game Reserve, Hotazel, Blackrock and Winton (Figure 1). Its distribution is Global Red List status (2016) Least Concern associated with red Kalahari arenosols but it also occurs TOPS listing (NEMBA) None in coarse sandy soils (Bennett 2013). CITES listing None The species is said to be sympatric with the Common Mole-rat (Cryptomys hottentotus) where soil sandiness Endemic No ensures local niche differentiation (Skinner & Chimimba 2005), which may be the case in the North West Province This species is locally abundant in suitable habitat where such conditions exist (Power 2014). Power (2014) and there is little conflict with agriculture due to its surmises that the Mafikeng Bushveld vegetation type is a arid distribution (Bennett 2013).
    [Show full text]
  • Příbuzenská Struktura a Paternita V Přírodní Populaci Solitérního Rypoše
    Masarykova univerzita v Brně Přírodovědecká fakulta Ústav botaniky a zoologie Příbuzenská struktura a paternita v přírodní populaci solitérního rypoše Heliophobius argenteocinereus Diplomová práce 2007 Bc. Hana Patzenhauerová Vedoucí DP: Mgr. et. Mgr. Josef Bryja, PhD . Poděkování : Na prvním místě bychchtěla poděkovat svémuvedoucímuJosefuBryjovi nejenza nesmírnou pomoc při psanítétopráce,aleiza jejízajímavétéma.DěkujitakéRadimuŠumberovizasběr materiálu vMalawi a za nekonečnou ochotu odpovídat na všetečné otázky týkající se rypošíhoživota.Dále bychráda poděkovala všem lidem,kteří patří klaboratořím Oddělení populační biologie ÚBO ve Studenci,kde byla tatopráce zpracována, za vytvoření úžasného kolektivu. 2 OBSAH 1 Abstrakt....................................................................................................................4 Abstract....................................................................................................................5 2 Úvod..........................................................................................................................6 2.1 Rypošovitía jejichsociálnísystémy..........................................................6 2.2 Dosavadní poznatkyobiologii rypošestříbřitého...................................10 2.3 Genetickémarkerypoužívanékanalýzám příbuzenskéstrukturya párovacíchsystémů..................................................................................15 2.4 Genetické párovacísystémya příbuzenskástruktura populací podzemníchsavců....................................................................................17
    [Show full text]
  • Garamba!National!
    ! ! Mammalwatching!in!Garamba!National!Park! Democratic!Republic!of!Congo! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! Mathias!D’haen! Introduction* ! As!a!master!student!I!am!currently!based!in!Garamba!National!Park!to!collect!the!data!for!my! thesis! on! the! local! Kordofan! giraffe! population,! a! subspecies! of! the! Northern! giraffe! (G.! camelopardalis! antiquorum),! all! this! through! my! university! in! Czech! republic! (Czech! University!of!life!Sciences)!and!with!help!of!the!Giraffe!Conservation!Foundation!(GCF)!and! its!partners.! I!am!here!for!nearly!a!year!now!and!apart!for!the!research!I’ve!been!trying!to!focus!myself!on! getting!a!thorough!understanding!of!what’s!flying,!creeping!and!running!around!in!the!park!–! I!will!stick!to!mammals!in!this!report.! ! As! I’ve! been! living! here! it’s! difficult! to! give! a! dayJtoJday! itinerary! or! even! to! give! a! breakdown!of!places!to!visit!in!the!surrounding.!The!park’s!management,!a!joint!venture!of! Institut!Congolais!pour!la!Conservation!de!la!Nature!(ICCN)!and!African!Parks!(AP)!is!making! a!huge!progress!in!terms!of!securing!and!developing!the!area!but!real!hotspots!for!naturalists! haven’t!been!established!yet!–!for!now!species!are!to!be!found!in!equal!densities!everywhere! in!the!park.!! Therefore!it!makes!sense!that!I!limit!myself!to!giving!a!breakdown!of!the!species!that!I’ve! seen!so!far!and!what!more!species!could!be!expected!based!on!historical!records.! ! All!photos!and!observations!can!also!be!found!on:!https://observation.org/user/view/40842! ! Location*and*Vegetation*
    [Show full text]
  • Tanzania's Nature Reserves
    THE ARC JOURNAL ISSUE 30 Biannual Newsletter Issue No. 30 May 2017 This Edition of the Arc Journal is focused on Tanzania’s 12 Nature Forest Reserves. It includes descriptions of each of the 12 reserves as well as information on their biodiversity, threats, management and tourism SPECIAL opportunities. View of Mkingu Nature Reserve. ISSUE Photo by Rob Beechey. Tanzania’s Nature Reserves Introducing Tanzania’s Nature Reserves coastal forests in southern Tanzania (Rondo) and one encompasses the forests of a recently dormant Prof. Dos Santos, Head of the Tanzania Forest Services Agency volcano (Mount Hanang). The last one (Minziro) In Tanzania, the Forest Legislation of 2002 recognises includes areas of lowland swamp forest close to the Nature Forest Reserves as protected forest areas of Uganda border that is of similar composition to the particularly high importance for the conservation of forests of the Congo Basin. biodiversity. Formation of Nature Reserves started in These 12 Nature Forest Reserves are being managed the 1990s with the gazettement of the Amani Nature by dedicated conservators distributed at each site, Reserve in the East Usambara Mountains. supported by teams of professional staff. At the Since 2002, the government has been working to TFS headquarters the Nature Forest Reserves fall identify and upgrade the status of a network of key under the section that manages the natural forests sites for conservation that were already under the of Tanzania. management of the Tanzania Forest Services (TFS) TFS is committed to improving the management Agency. There are two portfolios of Nature Forest and conservation of these sites and aims to Reserve – firstly within the Eastern Arc Mountains, generate sustainable sources of revenue for both the whilst the other portfolio includes the best examples management of the reserves and also to help support of other forests in the country.
    [Show full text]
  • Mammals List
    CA, WEA Pan troglodytes Chimpanzee From S Senegal across the forested belt north Chimpanzé of the R. Zaire to W Uganda and Tanzania; Rainforest, forest gallerie, extending into savanna woodland; lowland and montane forest Gorilla gorilla Gorilla western population : from R. Cross to R. Zaire Gorille / R. Sangha, with a relict population at Bondo on R. Uelle; eastern : triangular wedge between R. Luala (R. Zaire), L. Edward and L. Tanganyika; Lowland tropical rainforest; mountain and subalpine environment Piliocolobus Central African Red CA north and east of R. Zaire to W Uganda oustaleti Colobus and W Tanzania; Colobe bai ranges between 300 and 2500 ms from levee, d'Ouganda swamp, lowland and mixed to montane forests Colobus angolensis Angola Pied CA, EA; Colobus Montane and lowland forests Colobe d'Angola Colobus angolensis Angola Pied CA, EA; Colobus Montane and lowland forests Colobe d'Angola Colobus guereza (= Guereza Colobus CA, EA; C. abyssinicus) Colobe guereza Montane and lowland forests Papio anubis Olive baboon northern and eastern CA, EA; Babouin doguera from Sahelian woodland to forest-mosaic habitats Lophocebus Grey-Cheeked CA to W EA; albigena Mangabey rainforest, esp. swamp forest Mangabé à joues grises Cercopithecus Patas Monkey northern CA, EA; (Erythrocebus) Singe rouge mostly Acacia-grassland patas Cercopithecus Grivet, Vervet, etc. northern and southern CA, EA, SA; aethiops Monkeys savannahs, woodlands, forest-grassland mosaic Singe vert Cercopithecus L'Hoest's Monkey Bioko, SW Cameroon, Central Gabon, eastern l'hoesti Cercopithèque de RDC, W Uganda, Rwanda, Burundi L'Hoest forest, both montane and lowland, thickets, swamp forest Cercopithecus De Brazza's Monkey S Cameroon and Equatorial Guinea to neglectus Cercopithèque de Ruwenzori and Mt.
    [Show full text]
  • Word-Markers: Toward a Morphosyntactic Description of the Gorwaa Noun
    SOAS Working Papers in Linguistics Vol. 19 (2018): 49-89 Word-markers: Toward a morphosyntactic description of the Gorwaa noun Andrew Harvey [email protected] Abstract Many languages are said to possess “gender”, that is, a morphosyntactic system in which nouns induce formal marking on other words beyond the noun itself (adjectives, verbs, etc.). Gorwaa (gow; South-Cushitic; Tanzania) possesses a gender system which is interrelated with number in a complex manner. Following the line of reasoning that biological (semantic) sex, grammatical (syntactic) gender, and (morphological) form-class are “interrelated but autonomous domains of linguistic generalization” set out in Harris’ (1991) examination of Spanish, and establishing that number and gender are interrelated in a complex manner, this paper considers the morphophonological word-markers of Gorwaa, a language whose nominal morphology is considerably different from that of Spanish. Following a discussion of gender and number in Gorwaa, all word-markers and their associated gender and number values are identified. In addition to being a useful exercise in arranging the empirical data, this paper sheds light on some surprising surface patterns of a little-studied language. Keywords Morphology, Afro-Asiatic, Gorwaa, Gender, Number, Word-markers 1. Introduction Gender is classically defined as a grammatical property which “determine[s] other forms beyond the noun” (Corbett 1991: 4). Because the nouns ‘wine’ and ‘cream’ in (1) below determine the forms of ‘a’, ‘good’, and ‘white’ in two different ways, these nouns are said to belong to two different genders. (1) a. un bon vin blanc (French) INDEF.M good.M wine white.M ‘a good white wine’ b.
    [Show full text]
  • Word-Markers: Toward a Morphosyntactic Description of the Gorwaa Noun
    SOAS Working Papers in Linguistics Vol. 19 (2018): 49-89 Word-markers: Toward a morphosyntactic description of the Gorwaa noun Andrew Harvey [email protected] Abstract Many languages are said to possess “gender”, that is, a morphosyntactic system in which nouns induce formal marking on other words beyond the noun itself (adjectives, verbs, etc.). Gorwaa (gow; South-Cushitic; Tanzania) possesses a gender system which is interrelated with number in a complex manner. Following the line of reasoning that biological (semantic) sex, grammatical (syntactic) gender, and (morphological) form-class are “interrelated but autonomous domains of linguistic generalization” set out in Harris’ (1991) examination of Spanish, and establishing that number and gender are interrelated in a complex manner, this paper considers the morphophonological word-markers of Gorwaa, a language whose nominal morphology is considerably different from that of Spanish. Following a discussion of gender and number in Gorwaa, all word-markers and their associated gender and number values are identified. In addition to being a useful exercise in arranging the empirical data, this paper sheds light on some surprising surface patterns of a little-studied language. Keywords Morphology, Afro-Asiatic, Gorwaa, Gender, Number, Word-markers 1. Introduction Gender is classically defined as a grammatical property which “determine[s] other forms beyond the noun” (Corbett 1991: 4). Because the nouns ‘wine’ and ‘cream’ in (1) below determine the forms of ‘a’, ‘good’, and ‘white’ in two different ways, these nouns are said to belong to two different genders. (1) a. un bon vin blanc (French) INDEF.M good.M wine white.M ‘a good white wine’ b.
    [Show full text]