
Testing for Types in Spanish: An Application of SAS to Corpus Linguistics
Roberto Mayoral-Hernández & Lindsey Chen, University of Southern California

ABSTRACT
The last decade has seen an increase in the use of technology in linguistics research. The availability of online corpora and powerful statistical software has made it possible to handle large amounts of textual information and, crucially, to shed light on certain linguistic theories. In this paper, SAS was used to study the relation between verb type and subject position in Spanish. Applying SAS procedures to data obtained from CREA (Corpus de Referencia del Español Actual), we show that transitive, unergative and unaccusative verbs can be successfully differentiated by the position of their subjects. Furthermore, we conclude from our analysis that verbs of light and sound emission (VLSEs) should, contra Mendikoetxea (1999), be classified as unaccusative rather than unergative verbs.

INTRODUCTION
The literature (Hale & Keyser 2002, Levin 1993) recognizes three basic types of verbs: transitive, unergative and unaccusative. Transitive verbs take two arguments, a subject (S) and a direct object (DO); see (1a). Typically, the subject of a transitive verb has agentive properties: it is an agent because it functions as the initiator or causer of the action described by the verb. The direct object, on the other hand, has patient properties: it is the theme or causee affected by the action described by the verb (see Dowty 1991 for further discussion of the properties of the agent and patient thematic roles). The second type of verb is the intransitive, which occurs with only a single argument, the subject. Intransitives can be further divided into two kinds: unergative and unaccusative. Whereas unaccusatives select a subject with theme properties (1b), unergatives take agentive subjects (1c).

(1) a. John (S) eats apples (DO).
    b. The train (S) has already arrived.
    c. John (S) laughed happily.

The distinction between unaccusatives and unergatives has led some (cf. Perlmutter 1978) to claim that the subject of an unaccusative verb is an underlying object. Still, this remains an assumption. In spite of the abundant research on the subject (cf. Levin & Rappaport 1995 and the references therein), there is no universally accepted test for unaccusativity or unergativity. This is particularly true for languages such as Spanish that do not mark the distinction with specific morphology. In this paper, we show, via statistical analyses with SAS, that subject position in Spanish can successfully differentiate between the three verb types. We then apply this test to the controversial class of verbs of light and sound emission (VLSEs) to determine whether they should be classified as unaccusatives or unergatives.

ON SUBJECT POSITION IN SPANISH
The goal of this research is to demonstrate that a quantitative analysis of the position of overt subjects is an indicator of verb type in Spanish. Before proceeding, it is necessary to describe the properties of Spanish subjects and the positions in which they can occur. Spanish is a null-subject language, which means that an overt subject is not obligatory. When subjects do appear overtly, they must agree in number and person with the verb. This agreement relationship is marked by overt verbal and nominal morphology, and no other argument agrees with the verb. As far as subject position is concerned, agreeing subjects can appear in preverbal or postverbal position, irrespective of verb type. The following examples were extracted from the online Corpus de Referencia del Español Actual (CREA). Subjects and verbs are coded explicitly to make the examples easier to follow (S = subject, V = verb).

(2) Preverbal subject with unaccusative verb: En 1896, Picasso (S) llega (V) a Barcelona. “In 1896, Picasso arrives in Barcelona”

(3) Postverbal subject with unaccusative verb: En febrero llega (V) el grupo Queen (S) a la Argentina. “In February the band Queen arrives in Argentina”

(4) Preverbal subject with unergative verb: El minero (S) trabaja (V) durante toda la noche y… “The miner works during the whole night and…”


(5) Postverbal subject with unergative verb: En el coro trabaja (V) un maestro de tracerías y taraceas (S). “In the choir works a master of tracery and marquetry”

(6) Preverbal subject with transitive verb: Un niño chino (S) empuja (V) su carrito en la ciudad de Kunming. “A Chinese child pushes his cart in the city of Kunming”

(7) Postverbal subject with transitive verb: Sólo empuja (V) el libro desde el borde de la mesa, aquel que tensiona la espalda (S). “Only the one who tenses his back pushes the book from the edge of the table”

These examples show that all three verb types studied (transitive, unergative and unaccusative) can appear with both preverbal and postverbal subjects. This ordering is not completely random: ongoing research shows that the distribution is closely related to syntactic, semantic, sociolinguistic and pragmatic factors (cf. Mayoral Hernández in press).

VERB TYPES
Another goal of this study is to resolve the controversy surrounding the so-called verbs of light and sound emission. As illustrated earlier, intransitive verbs can be split into different types. Specifically, Perlmutter (1978) divided intransitive verbs into unaccusatives and unergatives on the basis of their syntactic and semantic properties. He noted that unergatives, but not unaccusatives, can appear in the impersonal passive construction. Participation in the impersonal passive was therefore taken as a test for unaccusativity. The following examples are taken from Perlmutter (1978).

(2) Impersonal passive with unergative verb: Er wordt hier veel geskied. (Dutch) “It is skied here a lot”

(3) Active sentence (a) and impersonal passive (b) with an unaccusative verb: a. Dat blok hout heeft goed gebrand. (Dutch) “That block of wood burned well” b. *Er werdt door dat blok hout goed gebrand. “By that block of wood it is burned well”

Later, working on Italian, Burzio (1981, 1986) showed that unaccusative and unergative verbs can be differentiated by other syntactic tests, including auxiliary selection and ne-cliticization. As with Italian, several tests have been proposed to differentiate unergative and unaccusative verbs in Spanish. However, some of the least controversial tests, like auxiliary selection, cannot be applied to Spanish, since only haber “have” can be used as an auxiliary in compound forms. Mendikoetxea (1999) discusses several tests that can be used to differentiate between unergative and unaccusative verbs in Spanish. She explains that unaccusative verbs, but not unergative verbs, can head absolute participial clauses, frequently participate in the causative alternation and allow bare postverbal subjects. However, even though the classification of the most studied verbs is clear, these tests have many exceptions, which has made them a source of controversy in the field. Among the controversial classes are the VLSEs, which have been classified both as unaccusative (by Perlmutter) and as unergative (by Mendikoetxea). Our statistical analyses indicate that Perlmutter is on the right track.

STATISTICAL ANALYSIS WITH SAS
Data obtained from the CREA corpus were coded and then analyzed using SAS. Specifically, we applied cross-tabulations to 450 tokens. The variables taken into consideration are subject position (preverbal and postverbal) and verb type (unergative, transitive, unaccusative; copular verbs, coded cop, are also included in the data set). The independent variable is subject position and the dependent variable is verb type. The following sample code creates the data set:

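data verbs1;
   input position $ type $ count @@;
   /* Values are lowercase: SAS character values are case-sensitive,  */
   /* so "Post" and "post" would otherwise be counted as two levels.  */
   datalines;
post unerg 6  post cop 12  post trans 23  post unacc 25
pre unerg 43  pre cop 66  pre trans 177  pre unacc 49
;
run;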

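Then PROC FREQ is applied to perform the chi-square test:

proc freq data=verbs1;
   /* cross-tabulate position by type, with expected counts and chi-square tests */
   tables position*type / expected chisq nocol norow nopercent;
   /* each observation stands for COUNT tokens */
   weight count;
   title 'Verb types and subject distribution';
run;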

EXPERIMENT 1: VERB TYPES
As suggested earlier, subject position might be a good indicator of verb class in Spanish, since it may be able to differentiate between unergative and unaccusative verbs. The idea is that unaccusative verbs will show a higher percentage of postverbal subjects, since their subjects resemble objects in both their semantic (patient) and syntactic properties. Our first objective is to test whether transitive, unergative and unaccusative verbs each have a distinct subject distribution.
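The hypothesis concerns the percentage of postverbal subjects for each verb type, and PROC FREQ can report that directly as a row percentage. The following is a minimal sketch, not the authors' code. It re-creates the verbs1 counts from the data step above, using lowercase position values, and puts type in the rows. Each row percentage is then the share of postverbal and preverbal subjects within one verb type.

```sas
/* Sketch (not the authors' code): subject-position shares within each verb type. */
/* Counts are those of the verbs1 data step; position values are lowercase.        */
data verbs1;
   input position $ type $ count @@;
   datalines;
post unerg 6  post cop 12  post trans 23  post unacc 25
pre unerg 43  pre cop 66  pre trans 177  pre unacc 49
;
run;

proc freq data=verbs1;
   /* type is the row variable, so the row percentage is the */
   /* distribution of position within each verb type         */
   tables type*position / nocol nopercent;
   weight count;
   title 'Subject position within each verb type (row percentages)';
run;
```

With these counts, the postverbal share is 25/74 ≈ 33.8% for unaccusatives. The corresponding shares are 6/49 ≈ 12.2% for unergatives, 23/200 = 11.5% for transitives and 12/78 ≈ 15.4% for copular verbs.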

RESULTS
Table 1 shows that there is a statistically significant difference (p = .0001) between transitive and unergative verbs as far as subject position is concerned. While both verb types clearly display a higher percentage of preverbal subjects, this preference is much more salient for transitive verbs.

Table 1. Table of position by type (frequency, with expected count in parentheses)

position    cop           trans          unacc         unerg         Total
post        12 (12.838)   23 (32.918)    25 (12.18)    6 (8.0648)    66
pre         66 (65.162)   177 (167.08)   49 (61.82)    43 (40.935)   335
Total       78            200            74            49            401

Statistics for Table of position by type
Statistic     DF    Value      Prob
Chi-Square    3     20.4289    0.0001
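The expected counts and the chi-square statistic in Table 1 follow the standard Pearson formulas. Here O_ij and E_ij are the observed and expected counts in row i and column j, R_i and C_j are the row and column totals, and N is the grand total. As a worked check, consider one cell, unaccusative verbs with postverbal subjects:

```latex
E_{ij} = \frac{R_i\,C_j}{N}, \qquad
\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c} \frac{(O_{ij}-E_{ij})^2}{E_{ij}}, \qquad
df = (r-1)(c-1)

E_{\text{post},\,\text{unacc}} = \frac{66 \times 74}{401} \approx 12.18, \qquad
\frac{(25 - 12.18)^2}{12.18} \approx 13.49, \qquad
df = (2-1)(4-1) = 3
```

The corresponding preverbal cell contributes (49 - 61.82)²/61.82 ≈ 2.66. Together, the unaccusative column accounts for about 16.2 of the total χ² = 20.43.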

In Table 2, a comparison of unergative and unaccusative verbs also indicates a statistically significant difference (p < .0001) between them. Unaccusative verbs show no clear preference for either preverbal or postverbal subjects while unergative verbs appear more frequently with preverbal subjects.

Table 2. Table of position by type (frequency, with expected count in parentheses)

position    trans           unerg           Total
post        5 (19.443)      32 (17.557)     37
pre         160 (145.56)    117 (131.44)    277
Total       165             149             314

Statistics for Table of position by type
Statistic     DF    Value      Prob
Chi-Square    1     25.6291    <.0001
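For a 2 × 2 table such as Table 2, the Pearson statistic reduces to a closed form in the four cell counts. Here a and b are the postverbal counts and c and d the preverbal counts, read left to right. Plugging in the counts from Table 2 reproduces the value reported by PROC FREQ:

```latex
\chi^2 = \frac{N\,(ad - bc)^2}{(a+b)(c+d)(a+c)(b+d)}

\chi^2 = \frac{314\,(5 \cdot 117 - 32 \cdot 160)^2}{37 \cdot 277 \cdot 165 \cdot 149}
       = \frac{314 \times 4535^2}{251\,971\,665}
       \approx 25.63, \qquad df = (2-1)(2-1) = 1
```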

Therefore, there seems to be enough evidence to conclude that a quantitative analysis of subject position in Spanish can successfully differentiate between unergative and unaccusative verbs when different factors are controlled.

EXPERIMENT 2: VERBS OF LIGHT AND SOUND EMISSION IN SPANISH
We have already observed the difficulties of classifying VLSEs as either unaccusative or unergative. These difficulties are highlighted by the lack of agreement among researchers dealing with this issue. Perlmutter classifies them as unaccusative because they have non-volitional subjects with patient properties and because they cannot appear in the impersonal passive construction (see example (3b), with the VLSE gebrand “burned”). Levin and Rappaport, on the other hand, treat them as unergative, because they select the auxiliary “have” for compound forms in languages like Dutch, a feature associated with unergativity in many languages. Mendikoetxea (1999) adopts Levin and Rappaport’s classification for these verbs in Spanish, although there is no reliable syntactic or morphological test for unaccusativity in Spanish. To shed light on the correct classification of emission verbs in Spanish, we added 274 more sentences to the corpus used in the previous experiment. The methodology was similar to that of Experiment 1: the sentences were also extracted from CREA, and SAS was used to compute the relevant statistics. The new sentences contained the verb forms resuena “it resounds” and resuenan “they resound”. The 3rd person plural form was added to increase the number of tokens, since the corpus contained few sentences with the singular form resuena. The research hypothesis of this new test is as follows. If emission verbs are unergative, they will show a higher percentage of preverbal subjects; if they are unaccusative, they will not.

RESULTS
Table 3 compares VLSEs that display the locative alternation (that is, those containing an argument PP) with non-alternating cases such as (17). The p-value (p = .384) indicates no statistically significant difference in subject placement between verbs with and without the alternation. Therefore, all 274 sentences were pooled into a single VLSE category for the remaining analyses.

(17) aún resuenan los improperios por una parte y las alabanzas por otra “the insults on the one hand and the praises on the other still resound”

Table 3. Table of position by type (frequency, with expected count in parentheses)

position    loc          noloc        Total
post        81 (84.5)    56 (52.5)    137
pre         88 (84.5)    49 (52.5)    137
Total       169          105          274

Statistics for Table of position by type
Statistic     DF    Value      Prob
Chi-Square    1     0.7566     0.3844
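The SAS sketch below shows the Table 3 test followed by the pooling step. It is an illustration, not the authors' code. The data set names vlse1 and vlse_pooled are hypothetical. The counts are the cell frequencies of Table 3, where loc marks VLSEs with an argument PP and noloc marks VLSEs without one.

```sas
/* Hypothetical sketch: Table 3 cell counts (loc = with argument PP, noloc = without) */
data vlse1;
   input position $ type $ count @@;
   datalines;
post loc 81  post noloc 56  pre loc 88  pre noloc 49
;
run;

/* Do alternating and non-alternating VLSEs differ in subject position? */
proc freq data=vlse1;
   tables position*type / expected chisq nocol norow nopercent;
   weight count;
   title 'Alternating vs. non-alternating VLSEs';
run;

/* No significant difference, so relabel both groups as a single VLSE category */
data vlse_pooled;
   set vlse1;
   type = 'vlse';
run;

/* One-way distribution of subject position for the pooled VLSEs */
proc freq data=vlse_pooled;
   tables position / nocum;
   weight count;
   title 'Pooled VLSEs: subject position';
run;
```

Because every observation is relabeled vlse, the final one-way table should show 137 postverbal and 137 preverbal tokens (50% each). These are the VLSE counts used in Tables 4 and 5.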

Table 4 indicates that, with respect to subject position, there is no significant difference (p = .400) between the two groups of verbs, VLSEs and unaccusative verbs.

Table 4. Table of position by type (frequency, with expected count in parentheses)

position    unacc         vlse            Total
post        62 (66.01)    137 (132.99)    199
pre         74 (69.99)    137 (141.01)    211
Total       136           274             410

Statistics for Table of position by type
Statistic     DF    Value      Prob
Chi-Square    1     0.7082     0.4000

By contrast, when unergative verbs are compared with VLSEs (Table 5), the two groups differ significantly in subject distribution (p < .0001). In fact, while unergative verbs clearly prefer preverbal subjects (78.5%), this preference is not observed with VLSEs, whose subjects are evenly split between preverbal (50%) and postverbal (50%) positions. Hence, we conclude that emission verbs should not be described as unergative in Spanish.
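The percentages cited above are the preverbal shares within each column of Table 5:

```latex
\frac{117}{149} \approx 0.785 = 78.5\% \quad \text{(unergative)}, \qquad
\frac{137}{274} = 0.50 = 50\% \quad \text{(VLSE)}
```

For comparison, the unaccusative verbs in Table 4 have 74/136 ≈ 54.4% preverbal subjects. This is close to the VLSE figure.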

Table 5. Table of position by type (frequency, with expected count in parentheses)

position    unerg          vlse            Total
post        32 (59.53)     137 (109.47)    169
pre         117 (89.47)    137 (164.53)    254
Total       149            274             423

Statistics for Table of position by type
Statistic     DF    Value      Prob
Chi-Square    1     32.7312    <.0001
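Both VLSE comparisons can also be run in a single PROC FREQ step using BY-group processing. In this hypothetical sketch, the data set vlse_compare and the grouping variable comparison are illustrative names, not taken from the paper. The counts are those of Tables 4 and 5.

```sas
/* Hypothetical sketch: Tables 4 and 5 as two BY groups of one data set */
data vlse_compare;
   input comparison $ position $ type $ count @@;
   datalines;
table4 post unacc 62  table4 post vlse 137  table4 pre unacc 74   table4 pre vlse 137
table5 post unerg 32  table5 post vlse 137  table5 pre unerg 117  table5 pre vlse 137
;
run;

/* BY-group processing requires the data to be sorted by the BY variable */
proc sort data=vlse_compare;
   by comparison;
run;

/* One 2 x 2 chi-square test per comparison */
proc freq data=vlse_compare;
   by comparison;
   tables position*type / expected chisq nocol norow nopercent;
   weight count;
   title 'VLSEs compared with unaccusative and unergative verbs';
run;
```

Each BY group should reproduce the corresponding chi-square value: 0.7082 for Table 4 and 32.7312 for Table 5. Keeping both comparisons in one data set also makes it explicit that the same 274 VLSE tokens are reused in each test.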

CONCLUSION
We have seen that the subject position test can successfully differentiate between the three verb classes. The test thus sheds light on the poorly understood distinction between intransitive verbs in Spanish. The subject position test also showed that verbs of light and sound emission pattern with unaccusative verbs, showing similar percentages of preverbal and postverbal subjects.

REFERENCES
Burzio, Luigi. 1981. Intransitive Verbs and Italian Auxiliaries. Ph.D. thesis, MIT, Massachusetts.
Burzio, Luigi. 1986. Italian Syntax. Dordrecht: Kluwer Academic Publishers.
Dowty, David. 1991. “Thematic proto-roles and argument selection”. Language 67(3): 547-619.
Hale, Ken & Samuel Jay Keyser. 2002. Prolegomenon to a Theory of Argument Structure. Cambridge, Massachusetts: MIT Press.
Levin, Beth. 1993. English Verb Classes and Alternations: A Preliminary Investigation. Chicago: University of Chicago Press.
Levin, Beth & Malka Rappaport-Hovav. 1995. Unaccusativity: At the Syntax-Lexical Semantics Interface. Cambridge, Massachusetts: MIT Press.
Mayoral Hernández, Roberto. In press. “A typological approach to the ordering of adverbials: weight, argumenthood and EPP”. International Journal of Basque Linguistics and Philology.
Mendikoetxea, Amaya. 1999. “Construcciones inacusativas y pasivas”. In Gramática Descriptiva de la Lengua Española, ed. by Ignacio Bosque and Violeta Demonte, vol. II, 1575-1629. Madrid: Espasa.
Perlmutter, David. 1978. “Impersonal passives and the unaccusative hypothesis”. In Proceedings of the 4th Annual Meeting of the Berkeley Linguistics Society, ed. by Jeri J. Jaeger, Anthony C. Woodbury, Farrell Ackerman, Christine Chiarello, Orin D. Gensler, John Kingston, Eve E. Sweetser, Henry Thompson, and Kenneth W. Whistler, 157-189. Berkeley: Berkeley Linguistics Society.
Real Academia Española. Banco de datos (CREA) [online]. Corpus de referencia del español actual.
Sorace, Antonella. 2000. “Gradients in auxiliary selection with intransitive verbs”. Language 76: 859-890.

ACKNOWLEDGMENTS
We would like to thank Carmen Silva-Corvalán, María Luisa Zubizarreta, Jean Roger Vergnaud, Emily Hinch, Ana Sánchez Muñoz and Rachael Sills for their invaluable contributions and comments.

CONTACT INFORMATION
Roberto Mayoral Hernández
University of Southern California

Lindsey Chen
University of Southern California
