Quantifying the Semantic Core of Gender Systems Adina Williams@ Ryan CotterellS,H Lawrence Wolf-SonkinS Damian´ BlasiP Hanna WallachZ @ Facebook AI Research, New York, USA SDepartment of Computer Science, Johns Hopkins University, Baltimore, USA HDepartment of Computer Science and Technology, University of Cambridge, Cambridge, UK PComparative Linguistics Department, University of Zurich,¨ Switzerland ZMicrosoft Research, New York City, USA
[email protected],
[email protected],
[email protected] [email protected],
[email protected] Abstract portion of the lexicon where this relationship is clear usually consists of animate nouns; nouns Many of the world’s languages employ gram- referring to people morphologically reflect the matical gender on the lexeme. For example, sociocultural notion of “natural genders.” This por- in Spanish, the word for house (casa) is feminine, whereas the word for paper (papel) tion of the lexicon—the “semantic core”—seems is masculine. To a speaker of a genderless to be present in all gendered languages (Aksenov, language, this assignment seems to exist 1984; Corbett, 1991). But how many inanimate with neither rhyme nor reason. But is the nouns can also be included in the semantic core? assignment of inanimate nouns to grammatical Answering this question requires investigating genders truly arbitrary? We present the first whether there is a correlation between grammatical large-scale investigation of the arbitrariness gender and lexical semantics for inanimate nouns. of noun–gender assignments. To that end, we use canonical correlation analysis to correlate Our primary technical contribution is demon- the grammatical gender of inanimate nouns strating that grammatical gender and lexical with an externally grounded definition of their semantics can be correlated using canonical lexical semantics.