Coding practices used in the project Optimal Typology of Determiner Phrases

Gregory Garretson

This document is one section of the coding manual used by the project Optimal Typology of Determiner Phrases, based at Boston University. It is being made available due to popular demand. If you have questions about the rest of the manual or the contents of this section, please contact Gregory Garretson at [email protected] or [email protected].

Please cite this document as follows:

Garretson, Gregory. 2004. Coding practices used in the project Optimal Typology of Determiner Phrases. Unpublished manuscript, Boston University, Boston, MA. http://npcorpus.bu.edu/html/documentation/index.html.

Contents

5.1 General practices
5.2 Construction type
5.3 Which examples to count?
5.4 Expression type
5.5 Definiteness
5.6 Animacy
5.7 Weight

5.0 Applying tags to the examples

This section of the coding manual details the policies we have adopted for adding tags to the examples once they have been bracketed. Because of the wondrous variety of language, coding a corpus is not nearly as straightforward as we might like. This section will help you make the often difficult decisions about which tags to apply when.

Section 5.1 gives general guidance for coding; Sections 5.2 onward discuss various classes of tags in detail. Starting in Section 5.2, each section begins with a list of the tags it discusses, so you can quickly find the discussion of a given tag by scanning these lines. They look like this:

[Tags discussed in this section: INCL, EXCL, PND, PWD, CMPD, IDIOM, NAME, PART, PART2, DINS, SORT]

Please remember that, in addition to the discussion in this section, each tag is briefly described in Section 2.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

5.1 General practices

This section gives some general advice for coding.
-------------------------------------

5.1.1 When you are not sure

You will frequently come across examples that you feel you need to discuss with others. In such cases, there are two procedures you can follow.

First, if you come across a very good example of something or other, an example that you would like to discuss with others, or one that you want to mark for any other reason, use the tag INTERESTING. This flags the example harmlessly as something to look at again. You can then search for all examples with that tag to create a list of examples to review.

Second, if you are trying to code an example for, say, animacy, and you are not sure which tag to apply, do not choose one without confidence; use the "other" tag instead. In this case, that would be OANIM_H or OANIM_M. To date, there are "other" tags for expression type, definiteness, and animacy:

OET_H, OET_M
ODEF_H, ODEF_M
OANIM_H, OANIM_M

To review the examples you were uncertain about, simply search for all examples with "other" tags. The rationale for using "other" tags liberally is simple: it is much easier to reexamine all examples coded with OANIM_H and OANIM_M than to reexamine all the examples!

Remember: When in doubt, choose "other"!

-------------------------------------

5.1.2 The importance of saving your work

Because the coder does not make backups *while* you work, you should get into the habit of quitting the program fairly frequently (every half-hour or so is good) and then reopening it on the resulting file. This ensures that you don't lose too much data if your computer crashes, you lose your Internet connection, etc. See Sections 3.2 and 3.3 for more information about working with files.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

5.2 Construction type

[Tags discussed in this section: XS, OFX, BOAD]

The easiest distinction to make is that between examples that receive the OFX tag and examples that receive the XS tag.
Examples with "of_PREP" are coded as OFX, and examples with a prenominal possessor (be it a noun with "'s", a possessive personal pronoun such as "his", or another pronoun such as "whose") are coded as XS. This is done automatically by the program autocoder_ct, so you will not need to do it.

Once this is done, there are several subordinate "construction type" tags that may need to be applied. Some of these can be added by autocoders, and others cannot. Section 5.3, "Which examples to count?", describes most of these tags. Of the subclasses of construction types identified so far, all result in the example being excluded from further coding except two: PWD and BOAD. The former is discussed in the next section, and the latter is discussed here.

-------------------------------------

5.2.1 "Boss of All Dogs" examples

The tag BOAD stands for "Boss Of All Dogs", which, although cryptic, is our favorite example from this class and so, for lack of a better name, was chosen to represent the class metonymically.

The BOAD examples have a very specific form: they are all OFX examples, and they all have in the Y position a singular count noun that is missing a (normally obligatory) determiner. They are found only in predicative and appositive positions. Here are some examples:

the Rev. J. D. Wickham , <"headmaster of Burr and Burton Seminary">
L. C. Orvis , <"manager of the Western Union Telegraph Company">
Amos C. Barstow , <"ex-mayor of Providence">
he was chosen <"president of the meeting">
William Hartman Woodin , who was <"Secretary of the Treasury">
Oliver Herford , artist , author , and <"foe of stupidity">
Rob Roy remained <"boss of all the dogs">

There are also tendencies within the class that are not prerequisites for inclusion. First, the Y tends to be a noun representing a role in some type of institution, and the X tends to represent that institution. Note, however, that this is not always the case.
Second, the BOAD example usually occurs as an appositive, as seen in several of the examples above.

BOAD examples are unusual in that they belong to one of the few classes of noun phrases in which singular count nouns, which normally require a determiner, can escape this restriction. We wish to keep track of these examples and see how they alternate with the XS form. Therefore, we do not exclude them from the set of counted examples.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

5.3 Which examples to count?

[Tags discussed in this section: INCL, EXCL, PND, PWD, CMPD, IDIOM, NAME, PART, PART2, DINS, SORT]

All of the following assumes that you are familiar with the discussion in Section 4 of which examples to include in our corpus of examples and which to exclude. As explained there, only of-examples of the form "NP of NP" are included. "Non-coded examples", consisting of an of-phrase not following a noun (e.g., "sing of", "careful of"), are excluded outright from our corpus.

However, even among the "NP of NP" examples, there are some that we do not wish to include in our statistical analysis of linguistic variation. This is because they are not tokens that could have been realized in a different way; that is, they were not potential loci of variation at the time of speaking. Thus "The Man of La Mancha", used as a proper name, could not be rendered as "La Mancha's Man".

We include these examples in the corpus because, while it is true that these "frozen" nominals could not have been produced differently by the speaker or writer, it is also true that, viewed diachronically, they could have been "frozen" in a different form. Thus, while we have proper names like "Schindler's List" and "The Double Life of Veronique", such names could often have been rendered differently (cf. "Veronique's Double Life", which, incidentally, is how the title comes out in Swedish).
The same may apply to expressions that are "frozen" grammatically, or idioms ("for the love of God" vs. "for Pete's sake", etc.). Because we may wish to look at these examples in the future, we are not excluding them from our corpus. Rather, we distinguish between the examples we wish to count at this stage and those we do not by applying the tags INCL and EXCL. The program "counter" counts only those examples marked with INCL.

As it happens, we do not have to apply these tags ourselves; they are added automatically on the basis of prior tags, applied by a human coder, that mark examples as belonging to certain classes of constructions.

ALL EXAMPLES TO WHICH "EXCL" HAS BEEN APPLIED DO NOT NEED TO BE CODED FOR ANYTHING FURTHER!

Because these constructions do not allow variation, we will not make further use of these examples in the present study. Therefore, we don't need to code them for definiteness, animacy, etc. Note that the autocoder will still add those tags to these examples, and we should leave them in place; however, you do not need to check them.

To reiterate: you should not apply the tags EXCL and INCL yourself; this is done automatically after you apply one of the tags discussed below.

As mentioned in Section 5.2, all examples are automatically coded with the tag OFX or XS. Some examples will receive no further "construction type" tags. However, some examples will need to be tagged further with one of the following:

PART ("partitive")
PART2 ("partitive 2")
DINS ("described instance")
SORT ("sort")
PWD ("preposition with determiner")
PND ("preposition, no determiner")
CMPD ("compound")
IDIOM ("idiom")
NAME ("name")

The tag EXCL is applied by autocoder_rev when one of the following is found on an example: PND, CMPD, IDIOM, NAME, PART, PART2, DINS, SORT, or NREV.
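The exclusion rule just described is simple enough to state as code. The following Python sketch is purely illustrative and is not the project's autocoder_rev or counter: the `Example` class, the function names, and the assumption that every example without an excluding tag receives INCL are all hypothetical. It shows how EXCL or INCL could be derived from the construction-type tags a human coder has applied, and how a count restricted to INCL examples would then work.

```python
from dataclasses import dataclass, field

# Construction-type tags that, per Section 5.3, cause autocoder_rev to apply EXCL.
# BOAD and PWD are deliberately absent: examples carrying them stay in the count.
EXCLUDING_TAGS = frozenset(
    {"PND", "CMPD", "IDIOM", "NAME", "PART", "PART2", "DINS", "SORT", "NREV"}
)


@dataclass
class Example:
    """Hypothetical stand-in for one bracketed example and its tag set."""
    text: str
    tags: set[str] = field(default_factory=set)


def apply_incl_excl(example: Example) -> None:
    """Tag the example EXCL if it carries any excluding tag, else INCL.

    Assumption: every example without an excluding tag receives INCL.
    Any earlier INCL/EXCL is dropped first so the rule can be rerun safely.
    """
    example.tags -= {"INCL", "EXCL"}
    if example.tags & EXCLUDING_TAGS:
        example.tags.add("EXCL")
    else:
        example.tags.add("INCL")


def count_included(examples: list[Example]) -> int:
    """Count only INCL examples, mirroring how "counter" is described."""
    return sum(1 for ex in examples if "INCL" in ex.tags)


# Illustrative tags for phrases quoted in this section.
corpus = [
    Example("The Man of La Mancha", {"OFX", "NAME"}),
    Example("president of the meeting", {"OFX", "BOAD"}),
    Example("for the love of God", {"OFX", "IDIOM"}),
]
for ex in corpus:
    apply_incl_excl(ex)

print(count_included(corpus))  # prints 1
```

Only the BOAD example survives the count here, which matches the policy in Section 5.2: BOAD (like PWD) is a construction-type tag that does not trigger exclusion.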