Studies in Hispanic and Lusophone Linguistics 2017; 10(1): 39–66

David Eddington*

Dialectal variation in Spanish diminutives: A performance model

DOI 10.1515/shll-2017-0002

Abstract: While the diminutive form of most Spanish words is invariant, a great deal of variation is found in bisyllabic words that either contain /je, we/ in the stem (e. g., viejo ‘old’ > viejito/viejecito, pueblo ‘town’ > pueblecito/pueblito), or that end in /e/ (e. g., dulce ‘sweet’ > dulcecito/dulcito), or that end in /jo, ja/ (e. g., rubio ‘blonde’ > rubiecito/rubito). Data from the Corpus del Español indicate that in many cases both diminutive forms exist within a single country. This kind of variation has been accounted for in a number of competence-based studies. However, many of these studies, along with the entities and mechanisms they employ, are not designed to explain actual language processing. The purpose of the present study, on the other hand, is to present a performance model of diminutive formation that accounts for the observed variation. The model assumes that highly frequent diminutives have been lexicalized, and as a result, their production is a matter of lexical retrieval. In contrast, low frequency words are diminutivized based on analogy to the diminutive forms of words stored in the mental lexicon. A data set of existing diminutives in each country was extracted from the Corpus del Español. Using these data sets, a series of computational simulations was performed in order to predict the diminutive allomorphs. The model proved to be highly successful in correctly predicting the diminutives in each country.

Keywords: Spanish, diminutives, analogical modeling, performance model

1 Introduction

Perhaps the most studied aspect of Spanish diminutives is the allomorphy of -ito/a (i. e., the short forms -ito/a, and the long forms -cito/a, -ecito/a). One line of research is descriptive and documents how the allomorphs are distributed, some with an emphasis on different varieties of Spanish (e. g., Bradley and Smith 2011; Callebaut 2011; Castillo Valenzuela and Ortiz Ciscomani 2013; Fontanella 1962; Gaarder 1966; Horcajada 1988; Jaeggli 1980; Miranda 1999; Norrmann-Vigil 2012; Rojas 1977). Another focus has been on describing diminutive formation within

*Corresponding author: David Eddington, Brigham Young University, Provo, UT 84602, USA, E-mail: [email protected]

different theoretical frameworks: lexical phonology (Castro 1998), exemplar theory (Eddington 2002), optimality theory (Bradley and Smith 2011; Colina 2003; Elordieta and Carreiras 1996; Miranda 1999; Smith 2011; Stephenson 2004), maximum entropy modeling (Norrmann-Vigil 2012) and others (Ambadiang 1996; 1997; Bermúdez-Otero 2007; Crowhurst 1992; Prieto 1992).

Diminutivization in Spanish is a very robust process that can potentially apply to all nouns and adjectives, and occasionally to other words such as gerunds and adverbs. In the majority of Spanish words that undergo diminutivization, there is no cross-dialectal variation. For instance, the diminutive of mesa ‘table’ is mesita and the diminutive of zapato ‘shoe’ is zapatito throughout the Spanish-speaking world. However, there are certain small areas in the lexicon where diminutives vary a great deal from country to country, and possibly from speaker to speaker. The present paper will focus on three of these, the first being bisyllabic words with /je/ and /we/ in the stem (DIPH, e. g., viejo ‘old’ > viejito/viejecito, pueblo ‘town’ > pueblecito/pueblito). A great deal of variation is also found in bisyllabic words ending in /e/ (FINAL -E, e. g., dulce ‘sweet’ > dulcecito/dulcito, diente ‘tooth’ > dientecito/dientito). Finally, bisyllabic words ending in -io/a yield different diminutives in different regions (FINAL -IO/A, e. g., indio ‘Indian’ > indiecito/indito, rubio ‘blonde’ > rubiecito/rubito).

The central purpose of this paper is to provide a performance model that accounts for how Spanish speakers may determine the correct diminutive allomorph of a given base word, and more importantly, how dialectal variation may be explained. The paper begins by framing itself as a performance rather than a competence account of the issue.
The following section describes the wide variation in diminutive forms found both between and within Spanish-speaking countries. How such variation is handled in competence approaches is briefly reviewed, and analogy is suggested as a model that may explain linguistic performance. However, it is argued that the best model of diminutive variation is one in which both derivation by analogy and retrieval of lexicalized forms play a part.

2 Competence and performance perspectives

Many linguists have followed Chomsky’s lead by emphasizing the study of linguistic competence: the knowledge that a completely fluent, ideal speaker/hearer would have in a completely homogenous linguistic community (Chomsky 1965: 3). Performance, on the other hand, is the processing of language in real time by actual language speakers. Chomsky is quick to clarify the relationship between the formal mechanisms of an analysis of competence and linguistic performance:


Although we may describe the grammar G as a system of processes and rules that apply in a certain order to relate sound and meaning, we are not entitled to take this as a description of the successive acts of a performance model. (Chomsky 1968: 117)

Kager reflects this same sentiment when discussing optimality theory:

Explaining the actual processing of linguistic knowledge by the human mind is not the goal of the formal theory of grammar … a grammatical model should not be equated with its computational implementation. (Kager 1999: 26)

Performance approaches, in contrast, attempt to understand the way actual speakers learn and process linguistic information in real time. Since performance deals with actual behavior, it should be carried out empirically. That is, it must deal with entities that are observable in the speech signal or via the results of psycholinguistic experiments. Entities that are unobservable in the real world and mechanisms that are purely theory internal are not useful, and therefore not permitted in an empirical study. The principal reason for this is that such entities are not subject to potential falsification (Popper 1968). Finally, an empirical approach requires a hypothesis that makes predictions about linguistic behavior. This contrasts with competence approaches that essentially describe linguistic data, but are not designed to address linguistic behavior. The distinction between competence and performance approaches as they relate to the question of diminutive formation and dialectal variation will be discussed later on in the paper.

3 The corpus data

The purpose of the present paper is to consider variation in diminutive formation across different varieties of Spanish. Most extant studies of the Spanish diminutive do not consider dialect variation. Those that do emphasize how diminutive formation occurs in a particular country, but do not make cross-country comparisons. One exception is Prieto (1992), who gathered intuitions from one or two speakers from seven countries. Another is Callebaut (2011), who extracted diminutives from 14 countries using the CREA corpus¹ and other online sources. However, in many cases, even this 70 million word corpus only yielded a handful of instances of diminutives of a particular word in a particular country. The possibility of examining diminutive forms from a variety of countries has

1 http://corpus.rae.es/creanet.html

been aided by the recent release of the newly updated Corpus del Español.² This corpus contains 2 billion words of Spanish from 21 different countries.³ Roughly 60 % of the data come from blogs, meaning it covers informal registers quite well. Given its size, it should allow the process of diminutive formation to be more rigorously explored across the Spanish-speaking world.

Of course, corpora do have their limitations and drawbacks. The existence of typographical errors in the source documents, as well as errors introduced in the compilation and tagging process, are always an issue for corpora. The country of origin in this corpus was determined by Google’s algorithm,⁴ which is not foolproof, and may categorize a document into the wrong country. In like manner, the fact that a blog was written in one country does not exclude the possibility that its author may actually be from another. An example of this type of error is seen in the word gurises ‘boys,’ which is used exclusively in Uruguay and to a lesser extent in , yet the corpus shows scattered tokens of this word appearing in many other countries, which either were produced by Uruguayans abroad or come from miscategorized documents. Another potential problem is that the corpus is divided along national boundaries. Although the data are taken from individual countries, one country may house several dialects that differ in how they form diminutives. Additionally, if the corpus happens to incorporate a document in which one particular author uses a large number of diminutives, that author may skew the results for that country. In spite of these issues, the new Corpus del Español is currently the best source for looking at diminutive variation by country, and the variation found therein still warrants investigation.

3.1 Methodology

As already mentioned, the great majority of Spanish words that can undergo diminutivization demonstrate no variation. For example, the only extant diminutive of manzana ‘apple’ is manzanita, not *manzancita, *manzanacita or *manzanecita. The present study focuses on words in the DIPH, FINAL -E, and FINAL -IO/A categories due to the high degree of variation found there. However, it was

2 http://corpusdelespanol.org

3 Number of million words in the corpus by country: 169.4 Argentina, 39.3 Bolivia, 66.2 Chile, 166.4 Colombia, 29.5 Costa Rica, 63.2 Cuba, 33.6 Dominican Republic, 52.3 Ecuador, 426.5 Spain, 54.2 Guatemala, 35.1 Honduras, 245.9 México, 32.3 Nicaragua, 22.2 Panamá, 107.2 Peru, 32.1 Puerto Rico, 29.7 Paraguay, 36.4 El Salvador, 166.0 USA, 38.7 Uruguay, 98.1 Venezuela.

4 https://support.google.com/webmasters/answer/62399?hl=en Davies (2015) grappled with this issue when compiling the Global Web-based English Corpus.

not only imperative to include words that vary in their diminutive; the words also had to be frequent enough in the language that a substantial number of them occur in the corpus in each of the 21 countries studied. Nothing is learned if we find that the short diminutive form of word X appears in countries A, B, and C 3 times, while the long version of the same word appears in countries D, E, and F 5, 1, and 0 times respectively. For this reason, the initial goal was to collect at least ten instances of each word in each country. However, the corpus is not comprised of an equal number of words from each country. Ten instances of a particular diminutive form were often not present in underrepresented countries. In any event, these two requirements severely narrowed the number of test cases initially considered (see Appendix 1).

The Corpus del Español is not comprised solely of edited text from sources such as newspapers and books. It includes more colloquial data collected from sources such as blogs. As a result, there are many non-standard usages and spellings, and those needed to be included in the searches along with the standard forms. For this reason, the search for abuelecito, for example, included the feminine abuelecita and the plurals abuelecitos and abuelecitas, as well as forms with the other allomorphs (abuelito/a/s). However, it was important to search for non-standard forms such as avuelecito/a/s, abuelesito/a/s, avuelesito/a/s, and avuelito/a/s as well. Common spelling variants that were included in the corpus searches were s~c (ciego~siego ‘blind’), hie-~ye- (hierba~yerba ‘herb’), y~ll (cuello~cueyo ‘neck’), and b~v (vuelta~buelta ‘return’).
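The expansion of a standard form into its spelling variants can be sketched as follows. This is only an illustration under stated assumptions: the helper names (`spelling_variants`, `inflected_forms`, `search_forms`) are hypothetical, the alternations are limited to the four listed above, and the sketch is not the search procedure actually used with the Corpus del Español interface.

```python
import re
from itertools import product

# Alternations named in the text: b~v, s~c (only c before e/i), y~ll, hie-~ye-.
# Hypothetical table for illustration; the real searches may have covered more.
SWAPS = {
    "b": ("b", "v"),
    "v": ("v", "b"),
    "c": ("c", "s"),
    "ll": ("ll", "y"),
    "hie": ("hie", "ye"),
}
PATTERN = re.compile(r"(^hie|ll|c(?=[ei])|[bv])")


def spelling_variants(form):
    """Return the standard form plus every combination of the alternations."""
    pieces = PATTERN.split(form)
    # re.split with a capturing group puts the matched graphemes at odd indices.
    options = [SWAPS[p] if i % 2 else (p,) for i, p in enumerate(pieces)]
    return {"".join(combo) for combo in product(*options)}


def inflected_forms(masc_sg):
    """Masculine/feminine, singular/plural forms of a diminutive ending in -o."""
    stem = masc_sg[:-1]
    return [stem + ending for ending in ("o", "a", "os", "as")]


def search_forms(masc_sg):
    """All inflected forms of a diminutive, each with its spelling variants."""
    forms = set()
    for form in inflected_forms(masc_sg):
        forms |= spelling_variants(form)
    return forms


print(sorted(spelling_variants("abuelecito")))
print(len(search_forms("abuelecito")))
```

For abuelecito the sketch yields the four spellings mentioned above (abuelecito, avuelecito, abuelesito, avuelesito) in each of the four inflected forms.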

3.2 Results of the corpus search

Diminutives in the three categories being considered all have a long and short form. The data resulting from the corpus search include 20 DIPH words, and the individual results for each of these are found in Appendix 1. In addition to being phonologically similar, these 20 words have the same morphological composition and terminal element. Each of these words has a short diminutive form (e. g., diente > dientito ‘tooth’ and cuento > cuentito ‘story’) and a long diminutive form (e. g., diente > dientecito and cuento > cuentecito). Only five FINAL -E words were frequent enough in the corpus to be included. However, they all have the same morphological structure, so that they can be considered together: suave, dulce, carne, diente, and leche (‘soft, sweet, meat, tooth, milk’). Given the diphthong in diente it could be cross categorized as a bisyllabic word containing /je/ as well. Nevertheless, like the other words in this category that end in /e/ it has a long and short diminutive form. Finally, four FINAL -IO/A words resulted from the corpus search: indio/a


‘Indian,’ limpio/a ‘clean,’ novio/a ‘fiance/fiancee,’ and rubio/a ‘blond.’ Each has a long and short form as well (e. g., limpito, limpiecito). At first it appears that speakers could carry out diminutive formation solely on surface-apparent traits. For example, if all bisyllabic words that contain /je, we/ are diminutivized in the same fashion (either pueblo > pueblito or pueblo > pueblecito) no other information is needed to predict the correct diminutive. Eddington (2002) suggests that dialectal variation may be accounted for if one assumes that, in a given dialect, similar words always form diminutives in the same way. That is, in Dialects A, B, and C speakers produce the short form diminutives of all DIPH words, yielding pueblito, viejita, and cuentita. In contrast, speakers of Dialects D, E, and F produce the long diminutive versions of all words with similar bases, resulting in pueblecito, viejecita, and cuentecita. If this is true, then a dialect that has the long diminutive vueltecita ‘return’ would be expected to have the long diminutives tiempecito ‘time’ and pueblecito ‘town’ as well. Conversely, a dialect with the short diminutive vueltita would be expected to have tiempito and pueblito. This is a hypothesis that can be tested using the results of the corpus search.

3.3 Testing the relationship between the long and short forms of DIPH words

A cursory view of the results of the corpus search suggests that Eddington’s hypothesis does not hold. Fortunately, the corpus data provide an empirical way of testing it. The proportion of the long -ecito/a/s forms of each of these 20 words may be correlated with the proportion of the long diminutive forms of the other 19 words of this type. Figure 1 exemplifies the kind of positive correlation between two diminutives that the hypothesis predicts.⁵ It is clear that in Costa Rica and Uruguay no examples of the long forms piececita ‘piece’ or tiempecito were found in the corpus, while in Mexico and Spain, higher proportions of piececita are associated with higher proportions of tiempecito (r (5) = 0.984, p < 0.0005). However, in many other cases, such as nieto ‘grandson’ and viejo (Figure 2), no relationship is found. That is, higher proportions of the long form nietecito/a/s variants are not correlated with a higher proportion of the long form viejecito/a/s diminutives (r (14) = 0.08, p < 0.769). If the same diminutivization process applies to all words of this type, then correlating the long form of each word with the long form of all other words of this type would be expected to result in

5 When there were fewer than ten cases in the results for a particular country, those results were not included in the charts or in the statistics.


Figure 1: Proportion of diminutives of piececita and tiempecito by country.

Figure 2: Proportion of diminutives of viejecito/a/s and nietecito/a/s by country.

many significant correlations. To this end, 19 of these words⁶ were paired, which resulted in 171 correlations, and of these, only 48 (28 %) are statistically significant (p < 0.05).⁷

Another possibility that needs to be considered is that words with /je/ and /we/ in the stem should not be considered together. Speakers may process each subgroup differently. The following is found when words containing /je/ and /we/ are separated into different groups. Of the 55 pairings of the 11 words with /we/, only 14 (25 %) are significant at the p < 0.05 level.⁸ Of the 28 pairings of the 8 words with /je/, only 5 are significant (18 %).⁹ Regardless of whether they are considered together or separately, the lack of widespread correlation argues against the idea that the diminutives of words of this sort could be produced by merely assigning a long (or short) diminutive form to all DIPH words depending on the dialect. Clearly other factors need to be considered.
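The pairwise procedure described in this section can be sketched as follows. This is a minimal illustration, not the statistical code used in the study: the function names are hypothetical, the ten-case cutoff is assumed to apply to the combined long and short tokens of a word in a country, and the counts in `toy_counts` are invented for the example rather than taken from the corpus.

```python
from itertools import combinations
from math import sqrt

MIN_TOKENS = 10  # footnote 5: countries with fewer than ten cases are left out


def long_proportion(long_count, short_count):
    """Share of long-form tokens, or None when the country has too few cases."""
    total = long_count + short_count
    if total < MIN_TOKENS:
        return None
    return long_count / total


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def correlate(counts, word_a, word_b):
    """Correlate long-form proportions over countries where both words qualify."""
    xs, ys = [], []
    for country in sorted(counts[word_a].keys() & counts[word_b].keys()):
        prop_a = long_proportion(*counts[word_a][country])
        prop_b = long_proportion(*counts[word_b][country])
        if prop_a is not None and prop_b is not None:
            xs.append(prop_a)
            ys.append(prop_b)
    return pearson_r(xs, ys), len(xs)


def n_pairings(n_words):
    """Number of distinct word pairs that can be correlated."""
    return len(list(combinations(range(n_words), 2)))


# Invented (long, short) token counts, for illustration only; not corpus figures.
toy_counts = {
    "pieza": {"MX": (30, 10), "ES": (45, 5), "UY": (0, 20), "CR": (0, 12), "PA": (2, 3)},
    "tiempo": {"MX": (24, 16), "ES": (40, 10), "UY": (0, 30), "CR": (0, 15), "PA": (1, 1)},
}
r, n_countries = correlate(toy_counts, "pieza", "tiempo")
print(n_pairings(19), n_pairings(11), n_pairings(8))
```

The pair counts follow directly from the number of words: 19 words give 171 pairings, 11 give 55, and 8 give 28, which matches the figures reported above.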

3.4 Testing the relationship between the long and short forms of FINAL -E words

Do the five words ending in /e/ (suave, dulce, carne, diente, and leche ‘soft, sweet, meat, tooth, milk’) consistently take the long or short diminutive form in a given country? Correlations between the proportions of the long -cito/a/s diminutives of each of these five words were run, and none of the ten correlations were significant. This lack of relationship can be exemplified by Colombian speech, where the long forms dientecito and suavecito are the most frequent diminutives (92 % and 94 % respectively), but the short forms carnita and lechita are more common (84 % and 75 %) than carnecita or lechecita. Once again, this does not bode well for a model that, in a given country, predicts the same diminutive allomorph for all words with a similar phonological structure.

6 Hueco was not included because the corpus search encountered 10 or more instances of this word in only 5 countries.

7 Vuelta/cuento, vuelta/cuerpo, vuelta/pueblo, vuelta/puerta, vuelta/fiesta, vuelta/nieto, vuelta/pieza, vuelta/quieto, vuelta/tiempo, cuento/cuerpo, cuento/fiesta, cuerpo/nuevo, cuerpo/fiesta, cuerpo/nieto, hueco/hueso, hueco/pueblo, hueco/viejo, hueco/pieza, hueso/juego, hueso/pueblo, hueso/viejo, hueso/pieza, hueso/quieto, huevo/juego, huevo/viejo, huevo/nieto, huevo/tienda, juego/pueblo, juego/rueda, juego/viejo, juego/pieza, juego/pierna, juego/tiempo, nuevo/fiesta, nuevo/nieto, nuevo/quieto, pueblo/viejo, pueblo/pieza, pueblo/tiempo, puerta/fiesta, puerta/nieto, puerta/tiempo, rueda/pierna, viejo/pieza, fiesta/nieto, fiesta/quieto, pieza/tiempo, quieto/tiempo.

8 Vuelta/cuento, vuelta/cuerpo, vuelta/pueblo, vuelta/puerta, cuerpo/nuevo, hueco/hueso, hueco/pueblo, hueso/huevo, hueso/juego, hueso/pueblo, huevo/juego, huevo/pueblo, juego/pueblo, juego/rueda.

9 Viejo/pieza, fiesta/nieto, fiesta/quieto, pieza/tiempo, quieto/tiempo.


3.4.1 Testing the relationship between the long and short forms of FINAL -IO/A words

Four words of this type had diminutive forms that were frequent enough that they could be compared: indio/a ‘Indian,’ limpio/a ‘clean,’ novio/a ‘fiance/fiancee,’ and rubio/a ‘blond.’ When the proportions of the long -ecito forms were compared across countries, no significant correlations were found.

4 Accounting for dialectal variation in Spanish diminutives in competence models

To this point, the phonological and morphological constituents of the base forms on which diminutives are formed, along with information about the country of origin of the speaker, have been the only factors considered to play a part in diminutive formation. The difficulty with this information is that it is not enough to account for the variation observed in the corpus. Clearly, other elements need to be included.

The competence-based literature has suggested a number of mechanisms to account for dialectal differences in diminutivization. For example, in standard optimality theory (e. g., Colina 2003) it is assumed that different varieties have the same constraints, but that they rank them in different orders. Prieto (1992) and Crowhurst (1992) suggest that some varieties use a minimal word template composed of two bisyllabic feet, while others do not. If dialectal differences were merely expressed as different mechanisms in the theoretical apparatus operating differently in each country (in the form of rule orderings, constraint rankings, word templates, etc.) we would expect the same kind of diminutives to be produced for words with similar characteristics. However, judging by the high degree of within-country variability just described, this is not the case. The existence of diminutive doublets within a single country is also problematic. For example, consider the diminutives of vuelta. Although it is possible to split the countries into those that use vueltita in the majority and those that favor vueltecita (Figure 3), both the short and long forms exist in every country. Of course, it would not be difficult to employ these same kinds of mechanisms so that individual words, or groups of words, pass through the theoretical system in such a way that the correct diminutive of all or most words would be accounted for.
Early attempts at dealing with variation of this sort within the generative paradigm were to assume that rules were variable, and to incorporate a rate at which the rule would apply (e. g., Labov 1969). More recent theories, such as stochastic optimality theory (Boersma and Hayes 2001), incorporate usage data in order to construct a grammar that yields variation in output candidates. Another possible solution could be to invoke lexicalization of certain diminutive forms. The idea is that the diminutive forms of certain words are stored as wholes rather than being derived from their bases. In this scenario, variation occurs because the lexicalized diminutive of a base word is different from the diminutive derived from the base word by the generative system. Therefore, diminutivization may involve both retrieval from lexical storage and derivation.

Figure 3: Proportion of diminutives of vuelta by country.

Formal mechanisms such as rules, intermediate derivations, features, constraints, and orderings on constraints have been shown to be useful devices for investigating linguistic phenomena. One could argue that such mechanisms are adequate to describe the linguistic competence involved in Spanish diminutive formation. It is important to state that the purpose of the present paper is not to enter into a discussion of how valid such analyses may be within their own theoretical sphere. Speaking of competence approaches, Prince and Smolensky (2004: 233) emphasize that such theories should be free to use any kind of formal device regardless of whether it is computationally plausible or not. However, from the standpoint of modeling performance, many formal mechanisms simply do not relate to entities that exist in real space and real time, which makes them difficult to test empirically. Therefore, just as competence models should not be judged on their computational or psychological plausibility, performance models should not be evaluated with the criteria used in formal grammar building. In spite of the

differences between the two approaches to language, the question is whether it is possible to find some formal entity that may relate to real-world performance. The concept that appears to be able to make the jump from the competence to the performance realm is lexicalization.

5 Lexicalization

For the purposes of the present paper, a word is lexicalized if it is stored as a whole in the mental lexicon rather than being derived by some process of word formation. Sometimes the concept of lexicalization is invoked to explain counterexamples: if some model does not account for certain forms, they must be lexicalized. The problem with this use of lexicalization is that it allows the researcher to determine which items are lexicalized in an ad hoc fashion. In contrast, psycholinguistic research gives insight into a more principled way of determining lexicalization. There is a good deal of evidence that highly frequent morphologically complex words are stored as wholes in the mental lexicon while low frequency words are composed (Caramazza et al. 1988; Chialant and Caramazza 1995; Schreuder and Baayen 1995; Sereno and Jongman 1997; Stemberger and MacWhinney 1988). Bybee (2001) argues that lexicalization of high frequency words explains why they are accessed more quickly and are less likely to undergo analogical changes such as regularization. The model described below employs frequency as a way to include lexicalization in a performance model of diminutivization.

6 Modeling dialectal differences in Spanish diminutivization

The remainder of the paper lays out and tests a performance approach model of Spanish diminutive formation. A crucial component of this, or any other model, is that it can account for the kinds of variation already described. Furthermore, it needs to be psychologically plausible.

6.1 Analogical modeling

The idea that reference to stored exemplars may explain aspects of linguistic processing such as language change, gang effects, and frequency effects has

been examined by numerous researchers over the years (e. g., Aha et al. 1991; Bybee 2001; Medin and Schaffer 1978; Pierrehumbert 2001; Stemberger and MacWhinney 1988). One computationally explicit exemplar model is Skousen’s analogical modeling (AM; 1989, 1992, 1995), which Eddington (2002) used to model how diminutivization may take place in Spanish. In AM, when the behavior of a word needs to be predicted (in the present case the behavior is the diminutive allomorph of a base word), words are chosen from the mental lexicon that are similar to the word in question based on three derived properties (Skousen 1995: 217). The first is proximity, which means that the more similar a stored exemplar is to the word in question, the greater the chances of that exemplar being selected as the analogical model for the word in question. Second, gang effects arise if the exemplar is surrounded by other exemplars having the same behavior. When this occurs the probability of selecting these similarly behaving exemplars is substantially increased. Lastly, heterogeneity is the idea that an exemplar cannot be selected as the analogical model if there are more similar exemplars, with different behavior, closer to the word in question. Based on these properties, when predicting the diminutive form of a word, the model assembles an analogical set for the word in question. The set may contain a few or hundreds of words. The degree to which each word in the analogical set helps predict the diminutive allomorph is calculated and the overall effect of all the members of the analogical set is summarized (see Skousen 1989 for details). For example, the analogical set for balde may contain these words, where the phones they have in common with balde are underlined: dulce, golpe, baile, sueldo, banda. The predicted outcome for balde in a particular country may be 96 % baldecito and 4 % baldito.
When one wishes to model variation, this output may be taken at face value. In AM this is referred to as random selection. At other times a winner-takes-all approach can be assumed that simply declares baldecito as the winning prediction. In AM this is known as selection by plurality. The characteristics of the word in question are used to determine similarity. They are encoded as variables in the data set used to approximate speakers’ knowledge, as well as in the test set of words whose diminutive forms one wishes to predict.
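The two ways of reading a prediction can be illustrated with a short sketch. It starts from an already computed outcome distribution, here the 96 % / 4 % split given for balde, and does not implement AM itself (the construction of the analogical set is described in Skousen 1989). The function names are hypothetical, and random selection is approximated by sampling an outcome in proportion to its predicted share.

```python
import random


def select_by_plurality(prediction):
    """Winner-takes-all: return the outcome with the largest predicted share."""
    return max(prediction, key=prediction.get)


def select_randomly(prediction, rng):
    """Sample one outcome with probability equal to its predicted share."""
    outcomes = list(prediction)
    weights = [prediction[outcome] for outcome in outcomes]
    return rng.choices(outcomes, weights=weights, k=1)[0]


# Predicted outcome for balde in a particular country, as given in the text.
balde = {"baldecito": 0.96, "baldito": 0.04}

winner = select_by_plurality(balde)

# Repeated random selection reproduces the variation in the prediction.
rng = random.Random(0)  # fixed seed so the sketch is repeatable
draws = [select_randomly(balde, rng) for _ in range(10_000)]
share_long = draws.count("baldecito") / len(draws)
print(winner, round(share_long, 2))
```

Selection by plurality always returns baldecito, whereas repeated random selection returns baldito in a small share of the draws, which is how within-country doublets can arise in the model.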

6.2 The variables

In the present study, each diminutive is represented in the data sets as a series of 14 variables. The first variable describes the relationship between the base and diminutive form. Although there are other relationships in the data set

besides the six below (e. g., animal > animalito, pie > piececito) they do not relate to the three classes of diminutives the paper deals with and do not need to be defined. The six values of the first variable are:

A pueblo > pueblito. -ito appears on the end of the base word minus the final vowel.
B vieja > viejita. -ita appears on the end of the base word minus the final vowel.
C rubio > rubiecito. -ecito appears on the end of the base word minus the final vowel.
D india > indiecita. -ecita appears on the end of the base word minus the final vowel.
E diente > dientecito. -cito appears on the end of the base word.
F gente > gentecita. -cita appears on the end of the base word.

This variable is what the model attempts to predict. The values of this variable state the relationship between the base and diminutive. They say nothing about exactly how a speaker manipulates the base to derive the diminutive. If it were possible to view the actual steps in processing, they could in theory vary from one person to the next. What is important is that the model predict the correct relationship between a base and its diminutive. The remaining 13 variables describe the morphological and phonological makeup of the words. Consider the variables of the base arreglado (A, P m m e = gl a = d o = o arreglado):

(1) A: Relationship is arreglado > arregladito.
(2) P: The stress falls on the penultimate syllable (not the final, F; the antepenultimate, A; or M if the word is monosyllabic).
(3) m: The word is masculine (not feminine, f, or n if the word has no gender, as with adverbs).
(4) m: Repetition of variable (3) in order to weight this morphological variable more than the phonological ones.
(5) e: The nucleus of the antepenultimate syllable is /e/.
(6) = : There is no coda in the antepenultimate syllable. (The = indicates the variable is empty.)
(7) gl: The onset of the penultimate syllable is /gl/.
(8) a: The nucleus of the penultimate syllable is /a/.
(9) = : There is no coda in the penultimate syllable.
(10) d: The onset of the final syllable is /d/.
(11) o: The nucleus of the final syllable is /o/.
(12) = : There is nothing in the coda of the final syllable.
(13) o: The final phone in the word is /o/.
(14) arreglado: The base word is arreglado. (This ensures that any other word ending in -eglado is not treated as identical to arreglado.)
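The 14-variable representation can be made concrete with a small data structure. The sketch below is illustrative only: the field names and helper functions are hypothetical, the feminine exemplar arreglada is constructed here from the variable definitions rather than taken from the data sets, and counting shared variables is a simplification that stands in for, and is not equivalent to, the way AM evaluates similarity.

```python
from typing import NamedTuple


class Exemplar(NamedTuple):
    relation: str         # (1) base/diminutive relationship; the outcome to predict
    stress: str           # (2) P, F, A, or M
    gender: str           # (3) m, f, or n
    gender_repeat: str    # (4) copy of (3), which weights gender more heavily
    antepen_nucleus: str  # (5)
    antepen_coda: str     # (6) "=" marks an empty variable
    penult_onset: str     # (7)
    penult_nucleus: str   # (8)
    penult_coda: str      # (9)
    final_onset: str      # (10)
    final_nucleus: str    # (11)
    final_coda: str       # (12)
    final_phone: str      # (13)
    base: str             # (14) the base word itself


def parse_exemplar(line):
    """Turn a whitespace-separated line of 14 values into an Exemplar."""
    fields = line.replace(",", " ").split()
    if len(fields) != 14:
        raise ValueError(f"expected 14 variables, found {len(fields)}")
    return Exemplar(*fields)


def shared_predictors(a, b):
    """Count the predictor variables (2) through (14) on which two exemplars agree."""
    return sum(x == y for x, y in zip(a[1:], b[1:]))


# The example from the text.
arreglado = parse_exemplar("A, P m m e = gl a = d o = o arreglado")

# Constructed feminine counterpart (assumption: arreglada > arregladita, value B).
arreglada = arreglado._replace(
    relation="B", gender="f", gender_repeat="f",
    final_nucleus="a", final_phone="a", base="arreglada",
)
print(shared_predictors(arreglado, arreglada))
```

Because gender is entered twice, a mismatch in gender removes two shared variables instead of one, which is the weighting effect described under variable (4).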


6.3 The data sets

Analogy is based on the idea that linguistic processing relies on language experience. The goal of the paper is to account for dialectal differences in diminutive forms. Therefore, an approximation of what particular diminutive forms are used in each dialect is needed. The corpus search described in Section 3 yielded a number of diminutives that were used in the correlational analyses (Appendix 2). These were included in the data sets for the remaining simulations as well. However, those data were expanded by searching the Corpus del Español for nouns and adjectives ending in -ito/a/s in each of the 21 countries. The great majority of the resulting diminutives do not belong to the three classes of diminutives under investigation, yet it is important to include them since the data sets need to be a representative sampling of diminutives in each country. A diminutive was included when it appeared in at least two countries with a frequency of at least 0.01 instances per million. The existence of tagging errors in the corpus required a great deal of hand manipulation to remove words from the results of the search that were not actually diminutives, as well as diminutives of proper names, typographical errors, etc. This resulted in the following number of diminutives in the type data set of each country (Table 1). The numbers vary due to differences in corpus sizes across countries, differences in the frequency of use of diminutives, and differences in how many different diminutives were captured when the corpus was compiled. The resulting corpora are referred to from here on as the type data set of a given country.

Table 1: Number of diminutives in the type data set of each country.

Country   AR   BO   CL   CO   CR   CU   DO   EC   ES   GT   HN
Type
Test

Country   MX   NI   PA   PE   PR   PY   SV   US   UY   VE
Type
Test

Note: AR Argentina, BO Bolivia, CL Chile, CO Colombia, CR Costa Rica, CU Cuba, DO Dominican Republic, EC Ecuador, ES Spain, GT Guatemala, HN Honduras, MX Mexico, NI Nicaragua, PA Panama, PE Peru, PR Puerto Rico, PY Paraguay, SV El Salvador, US United States, UY Uruguay, VE Venezuela.

In addition to the type data set, a token data set was created. It contains one instance of a given diminutive for every 0.01 occurrences per million in a given country's section of the Corpus del Español. For example, the Guatemalan token data set contains 7 instances of pruebita ‘test’ since its frequency in the corpus is 0.07 per million. Variation in a single country is represented in these data sets in the following manner. In the Guatemalan type data set there is one instance of quietito ‘still’ and one of quietecito. In contrast, the token data set is based on frequency. It contains 2 instances of quietito and 9 of quietecito based on their per million frequencies of 0.02 and 0.09 in the Guatemalan section of the corpus.

Another data set was created that contains diminutives of the three types under consideration. This is referred to as the test set. It comprises 128 items: 58 DIPH words (e. g., sueldo ‘salary’), 52 FINAL -E words (e. g., bosque ‘forest’), 7 FINAL -E words that also have stem diphthongs (e. g., fuerte ‘strong’), and 11 FINAL -IO/A words (e. g., tibio ‘lukewarm,’ see Appendix 3). The test set for each country consists of the words from this test set of 128 (see footnote 10) that are attested in each country’s individual type data set (Table 1).
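The relationship between the type and token data sets can be illustrated with a short sketch. The Guatemalan frequencies are the ones cited in the text; the function names and the dictionary layout are assumptions made for illustration only.

```python
from collections import Counter

UNIT = 0.01  # one token instance per 0.01 occurrences per million


def token_counts(freq_per_million):
    """Number of instances each diminutive receives in the token data set."""
    # round() guards against floating-point noise such as 0.07 / 0.01 = 7.000000000000001
    return Counter({dim: round(freq / UNIT) for dim, freq in freq_per_million.items()})


def type_entries(freq_per_million):
    """The type data set holds each attested diminutive exactly once."""
    return sorted(freq_per_million)


guatemala = {"pruebita": 0.07, "quietito": 0.02, "quietecito": 0.09}

tokens = token_counts(guatemala)
types = type_entries(guatemala)
print(tokens)
print(types)
```

In this layout a doublet such as quietito/quietecito contributes two entries to the type data set but eleven entries to the token data set, which is how frequency enters the later simulations.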

6.4 Diminutivization by analogy

6.4.1 Simulation 1: Straight analogical derivation

Section 2.3 demonstrated that, in a given country, base words with similar structures do not necessarily form their diminutives with the same allomorph. In this section it will be shown that the same lack of correspondence is found when straight derivation by analogy is applied. In Simulation 1, the diminutive forms of the words in the test set for each country were predicted based on the words in each country’s type data set. Since each test word also appears in the type data set, the simulation excluded using a base word from the type data set as the analog for that same word. In processing terms this is like assuming that a speaker remembers all of the diminutives s/he has experienced, except the one that s/he needs to use in the moment. Every time s/he produces a diminutive it must be derived based on analogy to stored diminutives. Of course, this is a highly unlikely scenario, especially in the case of high frequency diminutives such as hijita and poquito, which are probably accessed as lexicalized units from memory.

For this simulation, the test set was modified so that when a word has two different diminutive forms, only the most frequent one was included, while the other was excluded. Here the success rate was calculated using a winner-take-all strategy. This means that the diminutive form that was predicted at the highest probability was taken as the outcome. As Table 2 indicates, only around half of the test items were correctly predicted by straight analogical derivation. Such a low success rate is clear evidence that straight analogical derivation is not a good model of how Spanish speakers carry out diminutive formation. It also serves to reiterate the need for a model that includes other information.

10 Spain has many doublets, and since each gets an entry in the test data set, there are more than 128 test words in its test set.

Table 2: Percent of test items correctly predicted in Simulation 1.

Country   AR   BO   CL   CO   CR   CU   DO   EC   ES   GT   HN
% corr.

Country   MX   NI   PA   PE   PR   PY   SV   US   UY   VE
% corr.
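The leave-one-out design of Simulation 1 can be sketched as follows. This is not the Analogical Modeling algorithm used in the study; a much simpler nearest-neighbour overlap measure is swapped in so that the exclusion step and the winner-take-all scoring are easy to follow. The two-variable feature vectors and the outcomes assigned to each word are toy values invented for illustration and are not taken from the corpus.

```python
from collections import defaultdict


def overlap(a, b):
    """Count the variables on which two feature vectors agree."""
    return sum(x == y for x, y in zip(a, b))


def predict(features, exemplars):
    """Return {outcome: probability} based on the most similar exemplars."""
    best = max(overlap(features, f) for _, f, _ in exemplars)
    votes = defaultdict(int)
    for _, f, outcome in exemplars:
        if overlap(features, f) == best:
            votes[outcome] += 1
    total = sum(votes.values())
    return {outcome: n / total for outcome, n in votes.items()}


def leave_one_out_accuracy(test_items, type_data):
    """Winner-take-all success rate; a word may never serve as its own analog."""
    correct = 0
    for base, features, attested in test_items:
        others = [e for e in type_data if e[0] != base]
        probs = predict(features, others)
        winner = max(probs, key=probs.get)
        correct += winner == attested
    return correct / len(test_items)


# Toy data: (base word, (final vowel, stem diphthong?), attested allomorph)
type_data = [
    ("dulce", ("e", "-"), "-cito"),
    ("golpe", ("e", "-"), "-cito"),
    ("balde", ("e", "-"), "-cito"),
    ("sueldo", ("o", "D"), "-ito"),
    ("caldo", ("o", "-"), "-ito"),
    ("cuerpo", ("o", "D"), "-ecito"),
]
test_items = [e for e in type_data if e[0] in ("balde", "cuerpo")]

accuracy = leave_one_out_accuracy(test_items, type_data)
print(accuracy)
```

In this toy run balde is predicted correctly from dulce and golpe, while cuerpo is pulled toward the allomorph of its closest neighbour sueldo, which mirrors the point that structurally similar bases do not always share an allomorph.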

6.4.2 Simulation 2: Analogy and lexical memory

Another way of approaching this problem is to adopt a full lexical storage approach. All experienced diminutives are stored as wholes or lexicalized units; no derivation is required. Processing in such a model is merely a matter of lexical retrieval from memory. But if morphological processing only entailed retrieval from memory, that is, if all diminutives were lexicalized, there would be no mechanism for handling novel items that have not been previously encountered. Surely some kind of derivation is needed in those cases. The model in Simulation 2 entails both retrieval from memory and derivation by analogy.

How does one determine whether a word has been lexicalized and is available to be accessed as a whole from memory, or if derivation from the base word applies? Rather than apply lexicalization in an ad hoc fashion, it is better to follow the psycholinguistic research which shows that high frequency words are more likely to be retrieved and low frequency words derived. This is achieved in the simulation by using the token data set, where low frequency diminutives have few entries and high frequency forms have many, depending on their frequency in the corpus. When the model attempts to predict the diminutive of a base word, it searches the token data set. If it finds the word there, it also finds the relationship that exists between it and its corresponding diminutive. This is equivalent to remembering the previously encountered diminutive.

A simulation that assumes that all forms are lexicalized is not particularly informative since it will merely predict the diminutive forms in the proportions that they appear in the data set. For example, if the only diminutive of cuerpo is cuerpito, the model will predict cuerpito at 100 %. On the other hand, if 33 % of the instances are cuerpecito and 67 % cuerpito, it will predict those as the probabilities. When matching of this sort is disallowed, as in Simulation 1, the result is that the diminutive form of a base is determined by derivation by analogy to similar words. We have already seen in Simulation 1 that total lack of lexicalization is not a very successful solution either. Hence, the most psychologically plausible scenario is for high frequency diminutives to be remembered more often, and low frequency ones to be derived by analogy more often.

Simulation 2 is actually a series of simulations that tests varying degrees of memory. This is done in the model by eliminating different percentages of the data in the token data set. For example, in the 50 % memory simulation, half of the instances in the token data set are randomly removed and the other half retained. Ten simulations are then run at 50 % memory, and in each case a new random half of the data is eliminated from consideration. In order to get a sense of how different degrees of memory affect the performance of the model, ten different levels of memory were tested (100 %, 90 %, and so on to 10 %). At each of those ten levels of memory, ten simulations were run. In those simulations, remembering the diminutive, that is, finding its base in the token data set, is allowed. All of the test items, including doublets in a single country (e. g., vueltita/vueltecita), are included. The success rate is again determined by winner-take-all. Figure 4 shows the percentage of diminutives in the test set that is correctly predicted by the model in three sample countries. (Graphing the outcome of all 21 countries is not feasible since it renders the graph unreadable.)

Figure 4: Success rates of Simulation 2 at different degrees of simulated memory.

It is not surprising that success rates are about 100 % when every diminutive is remembered. The fact that it is not 100 % in some cases is due to the existence of a doublet where there are equal numbers of each diminutive (e. g., 6 instances of indito and 6 of indiecito). In such cases the winner-take-all strategy must correctly predict one and therefore incorrectly predict the other. In this model, where memory and production both play a part, even when 90 % of the data are removed and predictions are made only on the remaining 10 %, the success rate drops only into the 80 % range. This condition simulates a state of affairs in which remembering high frequency forms is likely, while derivation by analogy is more likely for forms of low frequency.
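The interplay of memory and derivation in Simulation 2 can be sketched as below. The sketch assumes that token instances are stored as (base, diminutive) pairs and that a fixed fraction of them is sampled without replacement at each memory level. The analogical step is left as a placeholder function, since the actual derivation is carried out by AM; all names are hypothetical.

```python
import random
from collections import Counter


def retain(tokens, memory, rng):
    """Randomly keep a fraction `memory` of the token instances."""
    k = round(len(tokens) * memory)
    return rng.sample(tokens, k)


def predict(base, retained, derive):
    """Retrieve the diminutive if the base survives in memory; otherwise derive it."""
    remembered = Counter(dim for b, dim in retained if b == base)
    if remembered:
        total = sum(remembered.values())
        return {dim: n / total for dim, n in remembered.items()}
    return derive(base, retained)


def derive_placeholder(base, retained):
    """Stand-in for derivation by analogy over the retained instances."""
    return {"<derived by analogy>": 1.0}


# Two instances of cuerpito and one of cuerpecito, as in the 67 % / 33 % example.
tokens = [("cuerpo", "cuerpito")] * 2 + [("cuerpo", "cuerpecito")]
rng = random.Random(0)

full_memory = predict("cuerpo", retain(tokens, 1.0, rng), derive_placeholder)
no_memory = predict("cuerpo", retain(tokens, 0.0, rng), derive_placeholder)
print(full_memory)
print(no_memory)
```

Because a high frequency diminutive has many instances, at least one of them is likely to survive the random thinning and be retrieved, whereas a diminutive with one or two instances will usually fall through to derivation.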

6.4.3 Simulation 3: Analogy and variation

The results of Simulation 2 are interesting, yet they use a winner-take-all approach. On the one hand, this makes it easy to calculate success rates. On the other, it masks some of the variation that has been observed. Clearly, a way of comparing model predictions and corpus frequency is needed that does not eliminate the inherent variation. Consider the outcome of a simulation from the data from Paraguay (Table 3). There the diminutive of cuerpo is predicted. In the corpus data for this country cuerpecito appears 0.84 times per million and cuerpito 0.54 times per million. The model predicts an average of 53 % cuerpecito (across the ten runs) and 47 % cuerpito. Using the winner-take-all strategy essentially converts the probabilistic output into a binary classification, which in this case yields cuerpecito at 100 % and cuerpito at 0 %. A better approach, which does not eliminate the variation, is to correlate the model’s raw predictions for a test word with the frequencies of its corresponding diminutives in the corpus. This is the purpose of Simulation 3.

Table 3: Corpus frequencies and model predictions for cuerpecito and cuerpito in a simulation for Paraguay.

             Corpus frequency   Winner-take-all   Model predictions
cuerpecito   0.84               100 %             53 %
cuerpito     0.54               0 %               47 %
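The contrast between the winner-take-all outcome and the graded predictions for cuerpo in Paraguay can be worked through in a few lines. The frequencies and model percentages are those given in the text; converting the per-million frequencies to proportions is an added step for comparison, and the function names are hypothetical.

```python
def to_proportions(freqs):
    """Convert raw frequencies to proportions that sum to 1."""
    total = sum(freqs.values())
    return {form: f / total for form, f in freqs.items()}


def winner_take_all(probs):
    """Collapse graded predictions to 1.0 for the top form and 0.0 for the rest."""
    winner = max(probs, key=probs.get)
    return {form: float(form == winner) for form in probs}


corpus = to_proportions({"cuerpecito": 0.84, "cuerpito": 0.54})
model = {"cuerpecito": 0.53, "cuerpito": 0.47}
binary = winner_take_all(model)

for form in corpus:
    print(form, round(corpus[form], 3), model[form], binary[form])
```

The graded prediction differs from the corpus proportion of cuerpecito by less than ten percentage points, while the binary outcome discards the cuerpito share entirely.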

The advantage of using frequency is that it is determined by actual usage; as such, it is external to the model. That is, it is not a model-internal parameter that one can manipulate to improve the model’s success rate. How much memory versus derivation to allow is a valid concern, however. Of course, more correct predictions occur when memory is set at higher levels. However, the purpose of the present paper is not to find an ideal setting, but simply to demonstrate that a psychologically plausible account of interdialectal variation of diminutives results from a model that includes both memory and derivation, when memory is in proportion to the observed frequencies of the diminutives in question. It would be theoretically possible to run simulations with memory set at 5 %, 10 %, 15 % and so on up to 100 %, in each of the 21 countries. Ideally, ten simulations would be run at each memory setting, but what would the outcomes of those 4200 simulations tell us other than what we have already observed in Figure 4 (i. e., higher memory gives results closer to those observed in the corpus)? Rather than take this arduous path, it seems more sensible to choose one setting and run ten simulations at that setting in each country, for a much more manageable 210 simulations. Ten percent memory is fairly low and allows 90 % of the items in the data set to be excluded. This in turn makes derivation more likely for low frequency items.

An example of how predictions are made is useful for understanding the workings of the model. Consider the diminutive of balde ‘bucket’ in Argentina, where there are seven instances of baldecito and none of baldito in the token data set. The winner-take-all success rate of the ten simulations is 6/10 correct, while the average percent that baldecito was predicted across the ten runs was 66 % and for baldito was 34 %. On five of the runs, balde was found in the token data set. In one run, balde was not found, yet baldecito was correctly predicted. In this case, the analogical set was comprised of the words in Table 4. The number of pointers (see footnote 11) is a measure of how similar the word in the analogical set is to balde. The number of pointers is multiplied by the number of instances of the word in the token data set. The percent of influence of a particular word is its pointers by instances value divided by the total number of pointers. Here baldecito is predicted at 97 %, baldito at slightly below 2 %, and the unattested and incorrect *baldita at about 1 %. (Nonexistent diminutives were rarely predicted, and when they were they seldom exceeded 1 %.)

At the heart of Simulation 3 is the set of ten runs performed on the token data set for each country. For all the simulations the memory rate was set at 10 %. The overall success rate in each country was obtained by correlating the percentage of the diminutive forms of each word (as observed in the corpus) with the percentage of each diminutive form that the model predicts. The model’s predictions are then averaged across the ten runs on the token data set for each country. The r values of each correlation appear in Table 5 and range from 0.63 to 0.93 with a mean of 0.84. All correlations are significant at p < 0.05.

11 Exactly what pointers are, as well as details about the algorithm that AM uses, are beyond the scope of the present paper, but may be found in Skousen (1989).


Table 4: The analogical set for balde from one of the runs from the Argentina data.

Word      Diminutive type     # Pointers   # of instances   Pointers x instances   % of influence
Dulce     -cito
Golpe     -cito
Grande    -cito
Valse     -cito
Baile     -cito
Viaje     -cito
Bosque    -cito
Verde     -cito
Total     -cito, baldecito
Sueldo    -ito
Caldo     -ito
Total     -ito, baldito
Banda     -ita
Probada   -ita
Total     -ita, *baldita
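How an analogical set of this kind is turned into predicted percentages can be sketched as follows. The pointer and instance counts below are invented for illustration, not the values from the Argentina run, and normalizing by the sum of the pointers-by-instances values (so that the percentages total 100) is an assumption about how the percent of influence is computed.

```python
from collections import defaultdict


def influence_by_outcome(analogical_set):
    """analogical_set: list of (word, outcome, pointers, instances) tuples.

    Each word's weight is pointers * instances; weights are normalized by
    their overall sum (assumed) and pooled by diminutive type.
    """
    weights = [(outcome, pointers * instances)
               for _, outcome, pointers, instances in analogical_set]
    total = sum(w for _, w in weights)
    pooled = defaultdict(float)
    for outcome, w in weights:
        pooled[outcome] += w / total
    return dict(pooled)


# Hypothetical counts for a few of the words that appear in the analogical set.
toy_set = [
    ("dulce", "-cito", 10, 3),
    ("golpe", "-cito", 8, 2),
    ("sueldo", "-ito", 2, 1),
    ("banda", "-ita", 1, 1),
]

shares = influence_by_outcome(toy_set)
for outcome, share in shares.items():
    print(outcome, round(100 * share, 1))
```

Words that are both similar to the target and frequent in the token data set dominate the prediction, which is why a handful of FINAL -E words can outweigh closer-sounding but rarer neighbours.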

Table 5: Correlations of model predictions and corpus percentages from Simulation 3.

Country   AR   BO   CL   CO   CR   CU   DO   EC   ES   GT   HN
r

Country   MX   NI   PA   PE   PR   PY   SV   US   UY   VE
r
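The correlations reported for Simulation 3 are of the following kind. The sketch computes Pearson's r by hand between observed corpus proportions and the model's averaged predictions. Only the two words whose figures are quoted in the text (cuerpo in Paraguay, balde in Argentina) are used, purely to show the calculation; the study correlates all test words within a single country.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Order: cuerpecito, cuerpito, baldecito, baldito
observed = [0.84 / 1.38, 0.54 / 1.38, 1.0, 0.0]   # corpus proportions
predicted = [0.53, 0.47, 0.66, 0.34]              # model averages over ten runs

r = pearson_r(observed, predicted)
print(round(r, 3))
```

Unlike the winner-take-all score, this measure rewards a model for predicting a 53/47 split when the corpus shows a 61/39 split, rather than counting only the majority form.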

7 Discussion and conclusions

A model that combines memory and derivation is able to predict, to a high degree of accuracy, the kinds of variation in diminutive forms that occur across the Spanish speaking world. What is more, it does so by using real world variables such as phonemes and word frequency rather than more abstract units. Of course, one could counter that phonemes are abstract, but the Spanish alphabet is phonemic. The reality of phonemes is seen in literate speakers of languages with a phonemic writing system who have been shown to be able to manipulate phonemes in their language (Morais et al. 1986; Read et al. 1986).

One may ask about the particular algorithm used in the simulations. Are the steps in AM assumed to have correlates in speakers’ minds? No. AM, along with other models of analogy (see Section 5.1), incorporates a way of demonstrating how similarity to stored units can influence language processing. Each model achieves this in a unique way, but until we know more about the detailed workings of the human mind, it is impossible to compare them with the exact steps in a computer model. Suffice it to say that analogy is a productive way of explaining processes such as diminutivization.

The findings presented in this paper suggest a number of areas that merit further investigation. Given the nature of the corpora the data come from, it is not possible to determine if the observed variation is due to dialectal differences or different usages by speakers of the same dialect. That is, some members of a community may produce all (or most) of the diminutives in the three groups discussed in the same manner (e. g., dientito, pueblito, cuentito), while others in the same community may produce the opposite (e. g., dientecito, pueblecito, cuentecito). Another question that could not be addressed is whether individuals produce different diminutives of the same base word. When these kinds of variation exist, that raises suspicion that there may be other factors that govern it: social class, gender, educational level, age, etc. The literature is silent on this topic, but one piece of anecdotal evidence comes from a family of five who were asked the diminutive of leche. Although they were from the same country, were raised together, and learned their native tongue in each other’s company, three insisted it was lechecita while the remaining two were positive it was lechita. In like manner, there may be other characteristics of the words themselves that trigger variation within a community or an individual speaker. For instance, in words such as nieto some speakers pronounce the vowel sequence as a diphthong and others with hiatus (Hualde 2005). Perhaps this results in different diminutivization strategies for words of this sort. This is another hypothesis that warrants investigation.

Acknowledgments: Special thanks to Mark Davies, Theron Stanford, and Jonathan Junca for their help with some computational aspects of this paper as well as to Catie Ritchie and Ben Adamson for doing the tedious job of data entry. The input from the reviewers was invaluable in producing a much better research paper than it would have been otherwise.

References

Aha, David W., Dennis Kibler & Marc K. Albert. 1991. Instance-based learning algorithms. Machine Learning 6. 37–66.
Ambadiang, Théophile. 1996. La formación de diminutivos en español: ¿Fonología o morfología? Lingüística Española Actual 18. 175–211.
Ambadiang, Théophile. 1997. Las bases morfológicas de la formación de diminutivos en español. Verba 24. 99–132.
Bermúdez-Otero, Ricardo. 2007. Spanish pseudoplurals: Phonological cues in the acquisition of a syntax-morphology mismatch. Proceedings of the British Academy 145. 231–269.
Boersma, Paul & Bruce Hayes. 2001. Empirical tests of the gradual learning algorithm. Linguistic Inquiry 32. 45–86.
Bradley, Travis G. & Jason Smith. 2011. The phonology-morphology interface in Judeo-Spanish diminutive formation: A lexical ordering and subcategorization approach. Studies in Hispanic and Lusophone Linguistics 4. 247–300.
Bybee, Joan. 2001. Phonology and language use. Cambridge: Cambridge University Press.
Callebaut, Sien. 2011. Entre sistematización y variación. El sufijo diminutivo en España y en Hispanoamérica. Gent, Belgium: University of Ghent MA thesis.
Caramazza, Alfonso, Alessandro Laudanna & Cristina Romani. 1988. Lexical access and inflectional morphology. Cognition 28. 297–332.
Castillo Valenzuela, Rosario & Rosa María Ortiz Ciscomani. 2013. Diminutivo y aspecto nominal en español. Revista de Humanidades 27. 155–172.
Castro, Obdulia. 1998. La formación del diminutivo en español y en gallego: Procesos morfológicos simples; implicaciones teóricas complejas. In Andrés Acosta Félix, Zarina Estrada Fernández, Max Figueroa Estava & Gerardo López Cruz (eds.), IV Encuentro internacional de lingüística en el noroeste, 135–159. Sonora, México: Universidad de Sonora.
Chialant, Doriana & Alfonso Caramazza. 1995. Where is morphology and how is it represented? The case of written word recognition. In Laurie Beth Feldman (ed.), Morphological aspects of language processing, 55–76. Hillsdale, NJ: Erlbaum.
Chomsky, Noam. 1965. Aspects of the theory of syntax. Cambridge, MA: MIT Press.
Chomsky, Noam. 1968. Language and mind. New York: Harcourt Brace and World.
Colina, Sonia. 2003. Diminutives in Spanish: A morpho-phonological account. Southwest Journal of Linguistics 22. 45–88.
Crowhurst, Megan J. 1992. Diminutives and augmentatives in : A prosodic analysis. Phonology 9. 221–253.
Davies, Mark. 2015. Introducing the 1.9 Billion Word Global Web-Based English Corpus (GloWbE). 21st Century Text 5. 21centurytext.wordpress.com/introducing-the-1-9-billion-word-global-web-based-english-corpus-glowbe
Eddington, David. 2002. Spanish diminutive formation without rules or constraints. Linguistics 40. 395–419.
Elordieta, Gorka & María M. Carreiras. 1996. An optimality theoretic analysis of Spanish diminutives. In Lise M. Dobrin, Kora Singer & Lisa McNair (eds.), Proceedings from the main session of the Chicago Linguistic Society’s 32nd meeting, 49–60. Chicago: Chicago Linguistic Society.
Fontanella, Maria Beatriz. 1962. Algunas observaciones sobre el diminutivo en Bogotá. Thesaurus: Boletín del Instituto Caro y Cuervo, Bogotá 17. 556–573.
Gaarder, Bruce A. 1966. Los llamados diminutivos y aumentativos en el español de México. Publications of the Modern Language Association of America 81. 585–595.
Horcajada, Bautista. 1988. Morfonología de los diminutivos formados sobre bases consonánticos monosílabas. Filología Románica 5. 55–72.
Hualde, José I. 2005. The sounds of Spanish. Cambridge: Cambridge University Press.
Jaeggli, Osvaldo A. 1980. Spanish diminutives. In Frank Nuessel, Jr. (ed.), Contemporary studies in : Eighth annual linguistics symposium on Romance Languages, 145–158. Bloomington, IN: Indiana University Linguistics Club.
Kager, René. 1999. Optimality theory. Cambridge: Cambridge University Press.
Labov, William. 1969. Contraction, deletion, and inherent variability of the English copula. Language 45. 715–762.
Medin, Douglas L. & Marguerite M. Schaffer. 1978. Context theory of classification learning. Psychological Review 85. 207–238.
Miranda, Inés Miranda. 1999. An optimality theoretic analysis of Nicaraguan Spanish diminutivization: Results of a field survey. Seattle, WA: University of Washington dissertation.
Morais, José, Paul Bertelson, Luz Cary & Jesús Alegría. 1986. Literacy training and speech segmentation. Cognition 24. 45–64.
Norrmann-Vigil, Ingrid. 2012. Accounting for variation of diminutive formation in Porteño Spanish. Mester 41. 99–122.
Pierrehumbert, Janet. 2001. Exemplar dynamics: Word frequency, lenition, and contrast. In Joan Bybee & Paul Hooper (eds.), Frequency and the emergence of linguistic structure, 137–158. Amsterdam: John Benjamins.
Popper, Karl R. 1968 [1959]. The logic of scientific discovery, 2nd ed. New York: Harper and Row.
Prieto, Pilar. 1992. Morphophonology of the Spanish diminutive formation: A case for prosodic sensitivity. Hispanic Linguistics 5. 169–205.
Prince, Alan & Paul Smolensky. 2004. Optimality theory: Constraint interaction in generative grammar. Malden, MA: Blackwell Publishing.
Read, Charles, Zhang Yun-Fei, Nie Hong-Yin & Ding Bao-Qing. 1986. The ability to manipulate speech sounds depends on knowing alphabetic writing. Cognition 24. 31–44.
Rojas, Nelson. 1977. Aspectos de la morfonología del diminutivo –ito. In François Lopez, Joseph Pérez, Noël Salomon & Maxime Chevalier (eds.), Actas del Quinto Congreso Internacional de Hispanistas, vol. 2, 743–751. Instituto de Estudios Ibéricos e Iberoamericanos: University of Bordeaux.
Schreuder, Robert & R. Harald Baayen. 1995. Modelling morphological processing. In Laurie Beth Feldman (ed.), Morphological aspects of language processing, 131–156. Hillsdale, NJ: Erlbaum.
Sereno, Joan & Allard Jongman. 1997. Processing of English inflectional morphology. Memory and Cognition 25. 425–437.
Skousen, Royal. 1989. Analogical modeling of language. Dordrecht: Kluwer.
Skousen, Royal. 1992. Analogy and structure. Dordrecht: Kluwer.
Skousen, Royal. 1995. Analogy: A non-rule alternative to neural networks. Rivista di Linguistica 7. 213–232.
Smith, Jason Allen. 2011. Subcategorization and optimality theory: The case of Spanish diminutives. Davis, CA: University of California Davis dissertation.
Stemberger, Joseph P. & Brian MacWhinney. 1988. Are inflected words stored in the lexicon? In Michael Hammond & Michael Noonan (eds.), Theoretical morphology, 101–116. New York: Academic Press.
Stephenson, Tamina. 2004. Declensional-type classes in derivational morphology: Spanish diminutives revisited. Manuscript, Massachusetts Institute of Technology.

Appendix 1. Proportion of each diminutive form from the corpus

Test Word   Long dim./Total   All AR BO CL CO CR CU DO EC ES GT HN MX NI PA PE PR PY SV US UY VE

viejo/a     viejecito
fiesta      fiestecita
hierba      hierbecita
nieto/a     nietecito
pieza       piececita
pierna      piernecita
quieto/a    quietecito
tiempo      tiempecito
tienda      tiendecita
vuelta      vueltecita
cuento      cuentecito
cuerpo      cuerpecito
hueco/a     huequecito
hueso       huesecito
huevo       huevecito
juego       jueguecito
nuevo/a     nuevecito
pueblo      pueblecito
puerta      puertecita
rueda       ruedecita
carne       carnecita
diente      dientecito
dulce       dulcecito
leche       lechecita
suave       suavecito
indio/a     indiecito
limpio/a    limpiecito
novio/a     noviecito
rubio/a     rubiecito

Shaded proportions are based on fewer than ten instances.

AR Argentina, BO Bolivia, CL Chile, CO Colombia, CR Costa Rica, CU Cuba, DO Dominican Republic, EC Ecuador, ES Spain, GT Guatemala, HN Honduras, MX Mexico, NI Nicaragua, PA Panama, PE Peru, PR Puerto Rico, PY Paraguay, SV El Salvador, US United States, UY Uruguay, VE Venezuela

Appendix 2. Counts of each diminutive form derived from the corpus

Test Word   Diminutive   All AR BO CL CO CR CU DO EC ES GT HN MX NI PA PE PR PY SV US UY VE

viejo/a     viejito
viejo/a     viejecito
fiesta      fiestita
fiesta      fiestecita
hierba      hierbita
hierba      hierbecita
nieto/a     nietito
nieto/a     nietecito
pieza       piecita
pieza       piececita
pierna      piernita
pierna      piernecita
quieto/a    quietito
quieto/a    quietecito
tiempo      tiempito
tiempo      tiempecito
tienda      tiendita
tienda      tiendecita
vuelta      vueltita
vuelta      vueltecita
cuento      cuentito
cuento      cuentecito
cuerpo      cuerpito
cuerpo      cuerpecito
hueco/a     huequito
hueco/a     huequecito
hueso       huesito
hueso       huesecito
huevo       huevito
huevo       huevecito
juego       jueguito
juego       jueguecito
nuevo/a     nuevito
nuevo/a     nuevecito
pueblo      pueblito
pueblo      pueblecito
puerta      puertita
puerta      puertecita
rueda       ruedita
rueda       ruedecita
carne       carnita
carne       carnecita
diente      dientito
diente      dientecito
dulce       dulcito
dulce       dulcecito
leche       lechita
leche       lechecita
suave       suavito
suave       suavecito
indio/a     indito
indio/a     indiecito
limpio/a    limpito
limpio/a    limpiecito
novio/a     novito
novio/a     noviecito
rubio/a     rubito
rubio/a     rubiecito

Appendix 3. Words in the test set

DIPH: buena, bueno, ciego, cielo, cuello, cuenta, cuento, cuerno, cuero, cuerpo, cueva, diego, fierro, fiesta, fuego, gruesa, hierba, hueca, hueco, huella, huerta, hueso, huevo, juego, lluvia, luego, miedo, mierda, muerta, muerto, nieta, nieto, nueva, nuevo, piedra, pierna, pieza, prueba, pueblo, puerta, puesto, quieta, quieto, rueda, rueda, sueldo, suelta, suelto, sueño, tiempo, tienda, tierna, tierno, tuerca, vieja, viejo, viento, vuelta

FINAL -E: aire, ave, baile, balde, bebe m, bebe f, bloque, borde, bosque, bote, cable, café, calle, carne, chalé, chiste, coche, cofre, dulce m, dulce f, frase, gente, golpe, grande m, grande f, hombre, jefe, leche, llave, mate, molde, monte, nene m, nene f, noche, nube, padre, parque, parte, pobre m, pobre f, postre, pote, suave m, suave f, tarde, toque, torre, traje, trote, verde m, viaje

DIPH & FINAL -E: diente, fuente, fuerte m, fuerte f, mueble, muerte, puente

FINAL -IO/A: india, indio, limpia, limpio, media, novia, novio, rubia, rubio, tibia, tibio
