CHAPTER 2 Basic English Concepts

This chapter discusses some of the basics of English grammar that are relevant to the understanding of language analysis. Every language defines a basic alphabet, words, word categories and language formation rules called grammar rules. The word categories are made according to the role the words play as parts of speech. From the language analysis point of view, the structure of a language must be concretely defined in order to design a working parser for that language. Though there is no hard and fast rule for naming the formal categories, it is customary to give the various parts of speech their traditional names. The set of grammatical categories (like noun, verb, etc.) traditionally taught in English is very informal and is not defined precisely enough to serve as a formal grammar. In addition, many more distinctions have to be made in a real parser. Hence, for language processing by computer, the grammar writer should clearly understand the basic word categories of the language, the types of words and other constituents of the language, and the way in which they interact with each other.

In linguistic analysis, Chomsky did pioneering work in the 1960s. He formally defined various grammars, types of grammars, and the features and characteristics of grammars. These are described in detail later in this chapter. As a result of Chomsky’s work on transformational generative grammar, a vast amount of fairly descriptive linguistic analysis has been carried out, and a large repository of terminology has grown up which augments the informal set of old-fashioned terms. We now describe the elementary terminology of English grammar.

2.1 FUNDAMENTAL TERMINOLOGY OF ENGLISH GRAMMAR
The well-accepted English grammar terminology defines the following word categories:

(i) Noun: Traditionally, a noun is considered a naming word. Formally, it is defined as “the name of a person, place or thing”. However, a noun can also be

Chapter-2.p65 23 10/11/2010, 10:15 AM Basic English Concepts 25

Sometimes, an adverb modifies even a complete sentence or phrase. For example, consider the following sentences:
4. Probably you are wrong. (modifies the complete sentence)
5. I will not read all through this book. (modifies a phrase)
(ix) Adjective: It is a word which specifies a quality of a noun. It is a describing word. It can be attached to a noun to modify its meaning, or it can be used to assert some attribute of the subject of a sentence, e.g., blue, large, fake, main, etc.
(x) Verb phrase: A verb along with its object constitutes a verb phrase, e.g., gave a flower to the teacher in the sentence She gave a flower to the teacher.

2.2 SENTENCE
A group of words which makes complete sense is called a sentence. A sentence is created by joining words according to grammar rules, for example, Adwet is a good boy. Sentences are of four types:
(i) Assertive: A sentence that makes a statement or assertion, e.g., Humpty Dumpty sat on a wall.
(ii) Interrogative: A sentence that asks a question, e.g., Where do you live?
(iii) Imperative: A sentence that expresses a command or an entreaty, e.g., Be quiet.
(iv) Exclamatory: A sentence that expresses a strong feeling, e.g., How cold the night is! What a shame!

2.2.1 Parts of the Sentence
A sentence is divided into two parts, subject and predicate. The subject is the part which names the person or thing we are speaking about. The part which tells something about the subject is called the predicate. Normally the subject comes before the predicate. A sentence is made up of various constituents, known as parts of speech, which are classified according to the work they do in the sentence. The parts of speech are:
(i) Noun
(ii) Adjective
(iii) Pronoun
(iv) Verb
(v) Adverb
(vi) Preposition
(vii) Conjunction
(viii) Interjection

The basic terminology of English grammar has been described above. We now discuss some further details of these constituents.

Noun: Nouns are of the following types:
(i) Common noun: A name given in common to every person or thing of the same class or kind.
(ii) Proper noun: The name of a particular person, place or thing.
(iii) Collective noun: The name of a number (or collection) of persons or things taken together and spoken of as one whole, e.g., crowd, mob, team, flock, herd, army, etc.
(iv) Abstract noun: Usually the name of a quality, action or state considered apart from the object to which it belongs, e.g., the qualities goodness, kindness, whiteness, etc.
(v) Countable nouns: The names of objects that we can count, e.g., book, pen.
(vi) Uncountable nouns: The names of things which we cannot count, e.g., milk, oil, sugar, etc.

Adjective: A word used with a noun to describe or point out the person, animal, place or thing which the noun names, or to tell the number or quantity, is called an adjective. Adjectives can be of the following types:
1. Adjective of quality or descriptive adjective: It shows the kind or quality of a person or thing, e.g., He is an honest man.
2. Adjective of quantity: It shows how much of a thing is meant, e.g., I ate some rice.
3. Numeral adjective: It shows how many persons or things are meant, e.g., The hand has five fingers. Few cats like cold water.
4. Demonstrative adjective: It points out which person or thing is meant, e.g., This boy is stronger than Harry. Those mangoes are sweet.
5. Interrogative adjective: It is used with a noun to ask a question, e.g., What manner of man is he? Which way shall we go?
6. Emphasizing adjective: It is used to emphasize some concept, e.g., I saw it with my own eyes.
7. Exclamatory adjective: It is used to show exclamation, e.g., What a genius! What folly! What an idea!

Adjectives can have degrees. The degree indicates the extent of the quality expressed by the adjective. There are three degrees: positive, comparative and superlative. The positive degree is the simple form of the adjective. The comparative degree is used to indicate a comparison between concepts. The superlative degree denotes the highest degree of the quality, e.g., strong, stronger, strongest.


Article: The words a, an and the are called articles. They come before a noun. A and an are called indefinite articles because they usually leave indefinite the person or thing spoken of, as in a doctor, an orange. The is called the definite article because it normally points to some particular person or thing.

Pronoun: A word that is used instead of a noun is called a pronoun. Pronouns can be of various types, e.g., personal pronouns such as I, we, he, she, it, they and you, which indicate person. There are three persons: 1st person, 2nd person and 3rd person.

Verb: A word that tells or asserts something about a person or thing, for example, Harry laughs. The clock strikes. Verbs can be of two types: transitive and intransitive. A transitive verb denotes an action which passes over from the subject to an object. An intransitive verb denotes an action which does not pass over to an object, or which expresses a state or being, for example, He ran a long distance. Most transitive verbs take a single object. But transitive verbs such as give, ask, offer, promise, tell, etc. take two objects after them: an indirect object, which denotes the person to whom something is given or for whom something is done, and a direct object, which is usually the name of something. For example, His father gave him (indirect) a watch (direct). He told me (indirect) a secret (direct). Most verbs can be used both transitively and intransitively. It is, therefore, better to say that a verb is used transitively or intransitively rather than that it is transitive or intransitive. Some verbs, e.g., come, go, fall, die, sleep, lie, denote actions which cannot be done to anything; they can therefore never be used transitively.

2.3 ACTIVE AND PASSIVE VOICE
Voice is the form of a verb which shows whether what is denoted by the subject does something or has something done to it. Active and passive are two methods of framing an English sentence; they use different verb forms. In the active voice, the verb form shows that the person or thing denoted by the subject does something, that is, the subject is the doer of the action, e.g., Ram helps Hari. The active voice is so called because the person denoted by the subject acts. A verb is in the passive voice when its form shows that something is done to the person or thing denoted by the subject, e.g., Hari is helped by Ram. The passive voice is so called because the person or thing denoted by the subject is not active but passive, that is, suffers or receives some action.


Some sentences in active and passive form are given below:
(i) (a) Sita loves Savitri. (b) Savitri is loved by Sita.
(ii) (a) The mason is building a wall. (b) A wall is being built by the mason.
(iii) (a) The peon opened the gate. (b) The gate was opened by the peon.

The active and passive forms of a sentence convey the same meaning; hence, in the context of natural language processing, there are grammars (namely transformational grammars) which convert a sentence in the active voice to the passive voice. It should be noted that when the verb is changed from the active voice to the passive voice, the object of the transitive verb in the active voice becomes the subject of the verb in the passive voice. When verbs that take both direct and indirect objects in the active voice are changed to the passive voice, either object may become the subject of the passive verb, while the other is retained. An indirect object denotes the person to whom or for whom something is done, while a direct object usually denotes a thing.
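The object-to-subject swap described above can be illustrated with a small Python sketch. It is only an illustration under strong assumptions: the sentence is already split into subject, object, auxiliary and past participle, and the function names to_passive and lower_article are hypothetical. It is not a transformational grammar.

```python
# Illustrative sketch: the caller supplies the already-analysed parts of
# the active sentence; nothing here parses English.
ARTICLES = ("a ", "an ", "the ")

def lower_article(phrase):
    """Lower-case a leading article so the phrase can sit mid-sentence."""
    if phrase.lower().startswith(ARTICLES):
        return phrase[0].lower() + phrase[1:]
    return phrase

def to_passive(subject, obj, aux, participle):
    """The active object becomes the passive subject; the active subject
    is retained as the agent after 'by'."""
    new_subject = obj[0].upper() + obj[1:]
    agent = lower_article(subject)
    return f"{new_subject} {aux} {participle} by {agent}."

print(to_passive("Sita", "Savitri", "is", "loved"))
print(to_passive("The mason", "a wall", "is being", "built"))
print(to_passive("The peon", "the gate", "was", "opened"))
```

The auxiliary and participle are passed in explicitly because choosing them requires knowledge of the tense of the active verb, which this sketch does not attempt to work out.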

2.4 TENSES
Tense is the concept which indicates time. Time is conventionally divided into three parts:
(i) The time which is presently going on (present).
(ii) The time which is before the present, or the time which has passed (past).
(iii) The time which will come after the present, or the time which has not yet arrived (future).
To represent these three divisions of time, language incorporates the concept of tense. The tense of a verb shows the time of an action or an event. Corresponding to the three divisions there are three tenses: present tense, past tense and future tense. In English, different verb forms represent these tenses. A verb that refers to present time is said to be in the present tense, a verb that refers to past time is said to be in the past tense, and a verb that refers to future time is said to be in the future tense. For example:
(i) I write this letter to please you.
(ii) I wrote the letter in his very presence.
(iii) I shall write another letter tomorrow.

While performing language analysis, these verb forms are utilized to find the time of the event. However, there are many variations of these verb forms in English. Sometimes a past tense may refer to present time, and a present tense may express future time. For example,
I wish I knew the answer. (This sentence is equivalent to saying I am sorry I don’t know the answer. It is past tense, present time.)
Let’s wait till he comes. (present tense, future time)

Below we give the chief tenses (active voice, indicative mood) of the verb to love.

Present tense
              Singular number      Plural number
1st person    I love               We love
2nd person    You love             You love
3rd person    He loves             They love

Past tense
              Singular number      Plural number
1st person    I loved              We loved
2nd person    You loved            You loved
3rd person    He loved             They loved

Future tense
              Singular number      Plural number
1st person    I shall/will love    We shall/will love
2nd person    You will love        You will love
3rd person    He will love         They will love

In English, each tense is further divided into four forms: simple, continuous, perfect and perfect continuous. For the present tense, see the following sentences:
1. I love (Simple present)
2. I am loving (Present continuous)
3. I have loved (Present perfect)
4. I have been loving (Present perfect continuous)

The verbs in all of these sentences refer to present time, and are therefore said to be in the present tense. In sentence 1, however, the verb shows that the action is mentioned simply, without anything being said about its completeness or incompleteness. In sentence 2, the verb shows that the action is mentioned as incomplete or continuous, that is, as still going on. In sentence 3, the verb shows that the action is mentioned as finished, complete or perfect at the time of speaking. The tense of the verb in sentence 4 is said to be present perfect continuous because the verb shows that the action has been going on continuously and is not completed at the present moment.


Thus, we see that the tense of a verb shows not only the time of an action or event, but also the state of the action referred to. Just as the present tense has four forms, the past tense also has the following four forms:
1. I loved (Simple past)
2. I was loving (Past continuous)
3. I had loved (Past perfect)
4. I had been loving (Past perfect continuous)

Similarly, the future tense has the following four forms:
1. I shall/will love (Simple future)
2. I shall/will be loving (Future continuous)
3. I shall/will have loved (Future perfect)
4. I shall/will have been loving (Future perfect continuous)
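The twelve forms listed above follow a regular pattern: the tense selects the form of the auxiliary (be or have), and the continuous, perfect or perfect continuous form selects which auxiliary and which participle are used. A minimal Python sketch of this pattern is given below. It is restricted to the first person singular of love, it uses will where the text allows shall/will, and the names AUXILIARIES, SIMPLE and first_person_form are hypothetical.

```python
# Sketch for the first person singular of "love" only; a real system
# would look the participles up in the dictionary.
AUXILIARIES = {
    "present": {"be": "am", "have": "have"},
    "past": {"be": "was", "have": "had"},
    "future": {"be": "will be", "have": "will have"},
}
SIMPLE = {"present": "love", "past": "loved", "future": "will love"}

def first_person_form(tense, form):
    """Return the verb phrase for a tense and one of its four forms."""
    aux = AUXILIARIES[tense]
    if form == "simple":
        phrase = SIMPLE[tense]
    elif form == "continuous":
        phrase = f"{aux['be']} loving"
    elif form == "perfect":
        phrase = f"{aux['have']} loved"
    elif form == "perfect continuous":
        phrase = f"{aux['have']} been loving"
    else:
        raise ValueError(f"unknown form: {form}")
    return "I " + phrase

for tense in ("present", "past", "future"):
    for form in ("simple", "continuous", "perfect", "perfect continuous"):
        print(tense, form, "->", first_person_form(tense, form))
```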

According to English sentence formation rules, a verb agrees with its subject in number and person. There are different verb forms corresponding to different numbers and persons. This requirement of matching in number and person is utilized in language analysis to find out whether a sentence is syntactically valid or not. Besides the main verbs, English has certain verbs known as auxiliary verbs. The verbs be (am, is, was, etc.), have and do, when used with ordinary verbs to make tenses, passive forms, questions and negatives, are called auxiliary verbs. The verbs can, could, may, might, will, would, shall, should, must and ought are called modal verbs. They are used before ordinary verbs and express meanings such as permission, possibility, certainty and necessity. Need and dare can sometimes be used like modal verbs.
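The agreement check mentioned above can be sketched in a few lines of Python. The sketch assumes a pronoun subject, the simple present tense and a regular verb whose third person singular is formed by adding s; the table PRONOUNS and the functions present_form and agrees are hypothetical names introduced only for this illustration.

```python
# Hypothetical feature table: pronoun -> (person, number).
# "you" is marked "any" because it is both singular and plural.
PRONOUNS = {
    "i": (1, "singular"), "we": (1, "plural"),
    "you": (2, "any"),
    "he": (3, "singular"), "she": (3, "singular"), "it": (3, "singular"),
    "they": (3, "plural"),
}

def present_form(root, person, number):
    """Simple present of a regular verb: only the third person singular
    takes the ending 's' (assumption: the root does not need 'es')."""
    if person == 3 and number == "singular":
        return root + "s"
    return root

def agrees(subject, verb, root):
    """True if the verb form matches the subject in person and number."""
    person, number = PRONOUNS[subject.lower()]
    return verb == present_form(root, person, number)

print(agrees("He", "loves", "love"))    # valid: He loves
print(agrees("They", "loves", "love"))  # invalid: They loves
```

A parser can reject a sentence such as They loves at this point without any deeper analysis.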

2.4.1 Conjugation of the Verb
Any language has a well-defined syntax for its lexicon. The conjugation of a verb shows the various forms it can assume, either by inflection or by combination with parts of other verbs, to mark voice, mood, tense, number and person; to these must be added its infinitives and participles. The conjugation of the verb ‘love’ is given below.

(i) Tenses

Simple present
Active                  Passive
I love                  I am loved
You love                You are loved
He loves                He is loved
They love               They are loved


Present continuous
Active                  Passive
I am loving             I am being loved
You are loving          You are being loved
He is loving            He is being loved
We are loving           We are being loved
They are loving         They are being loved

Present perfect
Active                  Passive
I have loved            I have been loved
You have loved          You have been loved
He has loved            He has been loved
They have loved         They have been loved

Present perfect continuous
Active                  Passive
I have been loving      (none)
You have been loving    (none)
We have been loving     (none)
They have been loving   (none)

Simple past
Active                  Passive
I loved                 I was loved
You loved               You were loved
He loved                He was loved
They loved              They were loved

Past continuous
Active                  Passive
I was loving            I was being loved
You were loving         You were being loved
He was loving           He was being loved
They were loving        They were being loved

Past perfect
Active                  Passive
I had loved             I had been loved
You had loved           You had been loved
He had loved            He had been loved
They had loved          They had been loved


(iii) Non-finites
                        Active           Passive
Present infinitive      to love          to be loved
Continuous infinitive   to be loving     (none)
Perfect infinitive      to have loved    to have been loved
Present participle      loving           being loved
Perfect participle      having loved     having been loved

2.5 ADVERB
Words which modify the meaning of a verb, an adjective or another adverb are known as adverbs. For example, quickly, very and quite are adverbs in the following sentences:
(i) Rama runs quickly.
(ii) This is a very sweet mango.
(iii) Govind reads quite clearly.

Adverbs can be of the following types:
(i) Adverb of time (which shows when)
(ii) Adverb of frequency (which shows how often)
(iii) Adverb of place (which shows where)
(iv) Adverb of manner (which shows how or in what manner)
(v) Adverb of degree or quantity
(vi) Adverb of affirmation or negation
(vii) Adverb of reason

Besides these, there are many cue phrases, like however and anyway, which mark a change of theme in the discourse. These have special significance in linguistic analysis, where they are used to analyze the theme of the discourse.

2.6 DICTIONARY FEATURES
We all know that a dictionary provides definitions of words. From the computer storage viewpoint, however, the way definitions are stored differs somewhat. The definition of a word as stored in a computer database is important for linguistic analysis, and it is this definition that we describe in this chapter.

The definition of a word: A word is defined as
word (category root related-features)
The main objective of defining a word in this way is that the definition should provide everything that might help in parsing and understanding the sentence. A sentence contains different parts of speech, so words need to be classified into categories like noun, pronoun, etc.


In general, words are categorized into the following categories:
(1) Articles
(2) Nouns
(3) Pronouns
(4) Verbs
(5) Adverbs
(6) Adjectives
(7) Prepositions
(8) Conjunctions
(9) Numbers
(10) Punctuation marks
(11) Wh-words

Let us discuss these categories in a little more detail from the lexicon storage point of view.

Articles
This category contains only three words: a, an and the. The dictionary definitions of articles (ART) look like:
A (ART A)
AN (ART AN)
THE (ART THE)

Nouns
Nouns are classified as animate or inanimate, and further as singular or plural. Inanimate nouns are further classified into categories like place, conveyance, time, objects, etc., and animate nouns are further classified into male and female. Some examples are as follows:
RAM (NOUN RAM ANIMATE MALE SINGULAR)
BOY (NOUN BOY ANIMATE MALE SINGULAR)
CAR (NOUN CAR CONVEYANCE SINGULAR)
RESTAURANT (NOUN RESTAURANT PLACE SINGULAR)
SUNRISE (NOUN SUNRISE TIME SINGULAR)

Pronouns
Pronouns are classified along the largest number of criteria. The first criterion is person, which gives the categories first person, second person and third person. Further criteria are number, gender and role. Some examples of pronouns are as follows:
HE (PRONOUN HE THIRD PERSON MALE SINGULAR NOMINATIVE)
THEY (PRONOUN THEY THIRD PERSON MALE FEMALE NEUTER PLURAL NOMINATIVE)
YOU (PRONOUN YOU SECOND PERSON MALE FEMALE NOMINATIVE ACCUSATIVE SINGULAR PLURAL)
I (PRONOUN I FIRST PERSON MALE FEMALE SINGULAR NOMINATIVE)


Numbers
Numbers can also appear in a sentence, and a peculiar feature of them is that they have two representations, one in figures and the other in words. Example dictionary entries for number words may be:
SIX (NUMBER SIX)
TWENTY (NUMBER TWENTY)
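Entries of the form word (category root related-features) map naturally onto a small record type. The following Python sketch shows one possible in-memory representation, using a few of the entries given in this section; the class Entry, the table LEXICON and the helper has_feature are hypothetical names chosen for the illustration, not part of any particular system.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Entry:
    """One dictionary definition: category, root and related features."""
    category: str
    root: str
    features: tuple = ()

# A handful of the entries listed in the text, keyed by the word itself.
LEXICON = {
    "THE": Entry("ART", "THE"),
    "RAM": Entry("NOUN", "RAM", ("ANIMATE", "MALE", "SINGULAR")),
    "CAR": Entry("NOUN", "CAR", ("CONVEYANCE", "SINGULAR")),
    "HE": Entry("PRONOUN", "HE",
                ("THIRD PERSON", "MALE", "SINGULAR", "NOMINATIVE")),
    "SIX": Entry("NUMBER", "SIX"),
}

def has_feature(word, feature):
    """True if the word is in the lexicon and carries the given feature."""
    entry = LEXICON.get(word.upper())
    return entry is not None and feature in entry.features

print(LEXICON["RAM"].category)        # category of the word RAM
print(has_feature("car", "ANIMATE"))  # a car is not animate
```

Keeping the features as a flat tuple mirrors the flat lists in the entries above; a larger system might prefer named fields for person, number and gender.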

Structure of Dictionary
The dictionary should be structured so that a definition can be retrieved as quickly as possible, i.e., the search time should be reduced to a minimum. One possible method of reducing the search time is discussed below:
(i) Break up the whole dictionary according to the first letter of each word. This gives 26 sublists.
(ii) If the dictionary has just 1000 words, there will be on average about 40 words per sublist, so each search takes less time.
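The first-letter scheme can be sketched as follows in Python. The sketch assumes the dictionary is given as a plain list of words; each sublist is kept sorted so that it can be searched with binary search, which is an addition of this sketch rather than part of the method described above. The function names build_index and lookup are hypothetical.

```python
import bisect
from collections import defaultdict

def build_index(words):
    """Split the word list into sublists keyed by first letter."""
    index = defaultdict(list)
    for word in words:
        word = word.upper()
        index[word[0]].append(word)
    for sublist in index.values():
        sublist.sort()  # sorted sublists allow binary search
    return index

def lookup(index, word):
    """Search only the sublist for the word's first letter."""
    word = word.upper()
    sublist = index.get(word[0], [])
    position = bisect.bisect_left(sublist, word)
    return position < len(sublist) and sublist[position] == word

index = build_index(["ram", "restaurant", "car", "sunrise", "six"])
print(lookup(index, "restaurant"))  # found in the R sublist
print(lookup(index, "river"))       # R sublist searched, not found
```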

The lexicon serves the purpose of providing tokens to the parser. The words, along with their definitions, remain stored in the dictionary. The dictionary specifies, for each word, its part of speech, any non-default values for its features, and presumably something about its meaning. However, in English, as in all other languages, individual words can often be given different prefixes and suffixes. For example, the word “love” can appear in different guises, such as “loves”, “loved”, “loving”, “unloving”, etc. All of these words have one basic word and various derived forms. From the computer storage point of view, it would be wasteful if the dictionary had to include all of them. The better approach is to have the lexicon use explicit knowledge of the structure of words (their morphology) and have it figure out when a word is simply a variant of one that is already in the dictionary. However, the rules relating these variants to their roots must be specified, and care must be taken in applying them; e.g., the following analysis would be an error: “kiss” → “kis” + “s”. To some degree, such mistakes can be prevented by installing more stringent checks on which endings are allowed in which circumstances. For example, a singular noun ending in “s” will never form its plural by adding “s”, but rather by adding “es”. It also helps to ensure, first, that a word is not already in the dictionary before attempting to remove its endings and, second, that the end product, after the lexicon has removed all the supposed endings, is itself in the dictionary. If such a procedure is stored in the lexicon, then the dictionary can be made reasonably compact, as the morphological unit will take care of the standard cases. Furthermore, having default values for features means that for root words, like singular nouns, the dictionary need not even indicate that the word is singular, since this is the default case. Such routines are called
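The two checks described in this passage (do not strip endings from a word that is already in the dictionary, and accept a stripped form only if it is itself in the dictionary) can be illustrated with a short Python sketch. The tiny DICTIONARY and the list SUFFIX_RULES are hypothetical and deliberately incomplete; prefixes such as “un” are not handled.

```python
# Toy dictionary of root words (assumption: a real one would be far larger).
DICTIONARY = {"love", "kiss", "box", "walk"}

# (ending, replacement) pairs tried in order. "ing" and "ed" are tried with
# an "e" restored first, so that "loving" and "loved" map back to "love".
SUFFIX_RULES = [
    ("ing", "e"), ("ing", ""),
    ("ed", "e"), ("ed", ""),
    ("es", ""), ("s", ""),
]

def find_root(word):
    """Return the dictionary root of a word, or None if none is found."""
    word = word.lower()
    if word in DICTIONARY:
        # Check 1: the word is already a root, so strip nothing.
        # This prevents the faulty analysis "kiss" -> "kis" + "s".
        return word
    for ending, replacement in SUFFIX_RULES:
        if word.endswith(ending):
            candidate = word[:-len(ending)] + replacement
            if candidate in DICTIONARY:
                # Check 2: the end product must itself be in the dictionary.
                return candidate
    return None

for w in ("kiss", "kisses", "loves", "loved", "loving", "unloving"):
    print(w, "->", find_root(w))
```

With this rule set only the root forms need to be stored; “unloving” is left unresolved because no prefix rule is included in the sketch.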


name of the concept, information about the deep case structure of the concept and default values for those cases. For example, the case structure of a concept like “drink” might include the cases agent, object and instrument. The cases are meant to account for the fact that the concept “drink” includes an agent who performs the action, an object that is drunk and, sometimes, an instrument that is used to aid in drinking. So, given the event description “Jatin drank a can of beer”, the agent of the action is Jatin, the object is beer, and the instrument is the can. The default values associated with the cases are meant to be used as a tool for rejecting aberrant interpretations of text. Whereas the text “Jatin has a coke. He drank.” is interpreted to mean “Jatin drank a coke”, the text “Jatin has a kite. He drank.” is not interpreted to mean “Jatin drank a kite”, since the default value for the object of a drinking event is ‘liquid’ and a kite is not a type of ‘liquid’. The case structure and the default values associated with each case are stored in a 3-tuple, sometimes called the template for the event/state concept. The structure of the template is:

[TEMPLATE event/state-concept-name list-of-default-value-pairs]
The template for the concept drink is:
[TEMPLATE drink ((obj liquid) (instr container) (agt animal1) …..)]
In principle, the dictionary contains the case structure for each of its concepts, but in practice this is not necessary. Since NEXUS currently does not use the dictionary to parse sentences, its only use for the case information is to constrain the text interpretation process. Consequently, the default values are added to the dictionary on a “need to” basis. The case relations used in NEXUS are primarily derived from Simmons but also include pieces of the case systems of Fillmore, etc.
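The way default values constrain interpretation can be illustrated with a small Python sketch. It is not the NEXUS implementation: the table TEMPLATES mirrors the drink template above (with the agent default written simply as animal), while the type table ISA and the function acceptable are hypothetical and exist only for this example.

```python
# Default case values for each event/state concept, after the template above.
TEMPLATES = {
    "drink": {"obj": "liquid", "instr": "container", "agt": "animal"},
}

# Hypothetical type knowledge: what kind of thing each word denotes.
ISA = {
    "coke": "liquid",
    "beer": "liquid",
    "can": "container",
    "kite": "toy",
}

def acceptable(concept, case, filler):
    """True if the filler matches the default value for the given case.
    A case with no default value accepts any filler."""
    expected = TEMPLATES[concept].get(case)
    return expected is None or ISA.get(filler) == expected

print(acceptable("drink", "obj", "coke"))  # a coke can be drunk
print(acceptable("drink", "obj", "kite"))  # a kite is not a liquid
```

An interpreter would use such a check to accept “Jatin drank a coke” as a reading of “Jatin has a coke. He drank.” while rejecting the corresponding reading for a kite.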
