
e-Content Submission to INFLIBNET

Subject name: Paper name: Introduction to Linguistics

Principal Investigator Prof. Pramod Pandey Centre for Linguistics, SLL&CS, Jawaharlal Nehru University, New Delhi 110067 Phone: 011-26704226 (O), M- 9810979446 Email: [email protected]

Paper Coordinator Prof. Raghavachari Amritavalli Professor, English and Foreign Languages University, Hyderabad – 500 007 amritavalli@.com +91-40-27097512, 9490148757

Module Id Lings_P1_M26 Module name Historical Perspectives on : Language families

Content Writer Anish Koshy Email id [email protected] Phone +914027689643 Content Reviewer Prof. Aditi Mukherjee Professor (Retd), Department of Linguistics, Osmania University, Hyderabad 500007

Module 26: Language Families

1. Language families: an introduction

What do we mean when we talk of family relationships among languages? After all, our common experience tells us that family relationships can be postulated only for living organisms. We are familiar with the concept of genealogies drawn for families, where we place ourselves in a family tree which also contains our siblings, our parents, their siblings, their children, their parents and the like. By postulating that there is something like language families, what historical linguists and linguists in general are beginning to argue is that language is much more than a mere means of communication. Language has to be elevated to the level of an organism, part of our ecology, part of our environment. It is as natural a being as any other found in our natural surroundings. Like all things in our natural surroundings, languages as natural organisms also go through the stages of birth, growth and death. The idea of language families arises from the fact that one can relate languages by postulating that certain languages should have originated from a common ancestor or a parent language, called the protolanguage.

2. The discovery of Indo-European

Curiosity about the history of words is quite common in people. In fact, the serious study of familial relationships between languages began with an accidental discovery about shared word origins by a British Orientalist, Sir William Jones. William Jones was in charge of the Asiatic Society headquartered in Kolkata. He began studying Sanskrit during his time in India, and was astonished to find a number of similarities in the roots of words in Sanskrit, Greek and Latin.

Let us look at some words [i] from these three classical languages. Even a cursory glance at the sets of words below should impress on you the similarities among these languages:

No.  Sanskrit   Greek      Latin    Meaning
1.   as-ti      es-ti      es-t     ‘3SG’
2.   s-anti     (h)enti    s-unt    ‘3PL’
3.   yugam      zugon      iugum    ‘yoke’
4.   daśa       deka       decem    ‘ten’
5.   aṣṭau      oktō       octō     ‘eight’
6.   pitā       patēr      pater    ‘father’
7.   nábhas     nephélē    nebula   ‘fog/mist’
8.   bhrātā     phrātēr    frāter   ‘brother/clansman’
9.   rudhirás   eruthrós   ruber    ‘red/bloody’
10.  vásati     hestía     vesta    ‘lives/hearth’
11.  ájras      agrós      ager     ‘plain/field’

In a now very famous address to the Society in 1786, William Jones presented his findings in the following words:

The Sanscrit language, whatever be its antiquity, is of a wonderful structure; more perfect than the Greek, more copious than the Latin, and more exquisitely refined than either, yet bearing to both of them a stronger affinity, both in the roots of verbs and in the forms of grammar, than could possibly have been produced by accident; so strong indeed that no philologer could examine them all three, without believing them to have sprung from some common source, which, perhaps, no longer exists… [ii]

Jones’ address in 1786 comparing these languages of antiquity has had a significant influence on the history of linguistics. It provided the necessary blueprint for the discipline of comparative-historical linguistics, whose systematic comparison of languages has helped establish many language families, starting with the Indo-European family. It also marked the beginning of a systematic comparative Indo-European linguistics, which “became the most thoroughly investigated area of historical and comparative linguistics and which to the present day has remained the most important source for our understanding of linguistic change.” [iii]

3. Language families around the world

As noted earlier, many language families have been established through painstaking work by historical linguists over the years. The largest among the families, in terms of the number of speakers, is also the first to have been established: the Indo-European family. The size of this family can be inferred from the family tree given below (Figure 1). This family includes languages like English, Hindi, French, Spanish, Russian, and many others, spoken globally by more than a billion people across all the continents. With an estimated 45% of the global population speaking one of the Indo-European languages as a mother tongue, this is the largest of the language families known to us today.

INSERT Figure 1 (Indo European Tree).png

Figure 1: The Indo-European family [iv]

There are many other language families that have been postulated over the years. The classification and organization of language families is a constant challenge, as there is no one view on either the member languages of a family or the number of families to be postulated. This leads to different scholars coming up with different numbers of language families and different possible members for each family.

The following image (Figure 2) gives us a good idea of the spread of the different linguistic families around the globe. As the figure shows, the Indo-European languages are the most widespread of all families, spoken on almost all the continents. However, in terms of the number of languages in a family, it is not the Indo-European family that tops the list. According to scholars [v], the top five families in descending order of the number of member languages are the following, with the number in parentheses giving the estimated number of languages within the family:

• Niger-Congo (1545) – spoken in Sub-Saharan Africa
• Austronesian (1257) – spoken in Oceania, Madagascar,
• Trans-New Guinea (480) – spoken in Papua New Guinea and neighbouring islands
• Sino-Tibetan (460) – spoken in and Southeast Asia
• Indo-European (445) – spoken in Europe, Southwest to South Asia, , North America, South America, Oceania, South Africa

While some families represented in the map below are enormous in size, others are very small, as can be seen in the following list, with the number of languages within the family [vi] given in parentheses:

• Yukaghir (2) – spoken in
• Zamucoan (2) – spoken in Paraguay
• Aymaran (3) – spoken in Bolivia and Peru
• Tsimshian (3) – spoken in Canada
• Barbacoan (4) – spoken in Colombia and Ecuador
• Jivaroan (4) – spoken in Peru and Ecuador
• Yanomaman (4) – spoken in Venezuela and Brazil

INSERT Figure 2 (Language families).png

Figure 2: Prominent language families around the world [vii]

4. Language isolates

While linguists have been able to establish familial or genetic relationships among the vast majority of languages spoken around the globe today, there are certain languages whose relationship with any other language has not yet been established. They are sometimes postulated as unitary language families, that is, language families which consist of just that one language. Such languages are referred to as isolates, language isolates or isolated languages. As the name suggests, these are languages to which no other language has been found to be genetically related. The Indian subcontinent is home to two of these language isolates, Burushaski, spoken in modern Pakistan, and Nahali, spoken in Central India. Some other very famous language isolates include the , the , Japanese, the which was spoken in the ancient , and many other extinct languages whose records have been found but which cannot be related to any of the existing languages.

Does the existence of language isolates that cannot be related to any of the existing languages discredit our attempt to build language families? Does it discredit the attempt to trace all existing languages to one common source? Not really. Language isolates are not languages without any ancestors, but languages whose ancestors cannot be traced. This is because in the long history of the evolution of human language, many languages have been lost or completely transformed. Biologists tell us that the evolution of human language is closely connected to biological and cognitive changes that separate Homo sapiens from its predecessors, with the larynx having “evolved over a period of 300 million years to facilitate the production of sound at the expense of respiratory efficiency.” [viii] Most linguists accept that human speech dates back at least 10,000 years. But if one accepts that the first humans who migrated out of Africa already had a language, then the evolution of human language goes back at least 50,000-60,000 years. It is possible, in this long period, for many intermediary stages of languages as well as protolanguages to have completely died out without leaving any traces, resulting in language isolates as we know them now.

5. Language families of India/South Asia

India is one of the most linguistically diverse places in the world. Let us invite you to guess how many languages are spoken in India. You may be amazed to learn that the 2001 Census of India report, which is the last officially released data on the languages of India, recorded 1635 mother tongues in India! [ix] Of these, 22 are listed in Schedule VIII of the Indian Constitution, and are used as official languages in different states. These languages belong to different language families, including:

o Indo-European
o Dravidian
o Sino-Tibetan
o Austroasiatic
o Andamanese group of languages

The Indo-European family is represented by the Indo-Aryan subgroup (see Figure 3), spoken in Pakistan, Nepal, Bangladesh, Sri Lanka and in the northern, western and eastern regions of India, and includes languages like:

o Hindi-Urdu (in India and Pakistan)
o Panjabi (in India and Pakistan)
o Bangla (in India and Bangladesh)
o Nepali (in India and Nepal)
o Marathi, Rajasthani, Konkani, Oriya, Assamese, Kashmiri, and others in different states in India
o Sinhala (in Sri Lanka)

INSERT Figure 3 (Indo-Aryan).jpeg

Figure 3: The Indo-Aryan languages [x]

In terms of the number of speakers, the Indo-Aryan subgroup is the largest family/subgroup in the Indian sub-continent. Of the 22 languages listed in Schedule VIII of the Indian Constitution, 15 belong to this family/subgroup.

The Dravidian family (see Figure 4) is found only in the Indian sub-continent, and is represented by languages like:

o Tamil, Malayalam, Telugu, and others in the South of India
o Malto and Kurukh in the North of India
o Gadaba, Gondi, and others in Central provinces of India
o Brahui in Pakistan

INSERT Figure 4 (Dravidian).png

Figure 4: The Dravidian Languages [xi]

The Dravidian family is the only language family which is found only in the Indian sub-continent. Though there have been attempts in the past to explore its relationship with other families outside the sub-continent, they have largely been unsuccessful.

The Sino-Tibetan family, which includes within it the world’s most spoken language, Mandarin Chinese, is represented in the Indian sub-continent by the Tibeto-Burman subgroup (see Figure 5) and includes languages like:

o Mizo and others in Mizoram
o Meitei, Tangkhul Naga, Kuki, Hmar and others in Manipur
o Monpa, Miri, Adi, Nisi, and others in Arunachal Pradesh
o Garo in Meghalaya
o Sema, Tenyidie, Ao, Lotha, and others in Nagaland
o Bodo, Rabha and others in Assam
o Kokborok and others in Tripura
o Ladakhi in Jammu and Kashmir
o Dzongkha, and others in Bhutan
o Tamang, Lepcha and others in Nepal

INSERT Figure 5 (Tibeto-Burman).jpeg

Figure 5: The Tibeto-Burman Languages [xii]

The Tibeto-Burman subgroup is the most diverse of all language families/subgroups in South Asia, and in terms of the number of languages, is the largest subgroup in the sub-continent.

The Austroasiatic languages (see Figure 6) are spoken in the eastern states of Jharkhand, Bihar, West Bengal, Orissa and Chhattisgarh, in Meghalaya in the North-east, and in the Andaman and Nicobar islands, and are represented by languages like:

o Munda sub-group
   o Santali, Mundari, Ho, and others in Jharkhand, Bihar, Chhattisgarh and West Bengal
   o Sora, Kharia, Gtaʔ, and others in Orissa and Chhattisgarh
o Mon-Khmer sub-group
   o Khasi, Pnar, Amwi, and others in Meghalaya
   o Nicobarese in the Andaman and Nicobar islands

INSERT Figure 6 (Austroasiatic).jpeg

Figure 6: The Austroasiatic Languages

The Munda subgroup of the Austroasiatic family is spoken only in India, whereas the Mon-Khmer subgroup, except the Khasian languages, is mostly found outside India, in Southeast Asia.

The Andamanese group of languages (see Figure 7), spoken in the Andaman Islands by the Jarawa, Sentinelese, Onge and the Great Andamanese tribes, may belong to multiple families, and their connection with any other languages elsewhere is not known.

INSERT Figure 7 (Andamanese).jpeg

Figure 7: The Andamanese Languages [xiii]

The Andamanese languages are among the oldest languages spoken anywhere in the world. They are spoken by the indigenous people residing in the Andaman Islands, whose population has severely decreased over the years, which has led to the extinction of many of these languages. In the figure above (Figure 7), those languages marked with [Ø] are already extinct.

6. Methods of establishing family relationships

Family relationships are established primarily by two methods: (a) the comparative reconstruction method, and (b) the internal reconstruction method. Both these methods attempt to reconstruct ancestral languages based on evidence available in the daughter languages. We will now briefly look into these methods.

6.1 The Comparative method

If two languages are related, then they must have something in common. There must be shared properties that have survived the pressures of language change that each of these languages must have been subjected to. Historical linguists believe that what languages share is what they have inherited from their ancestor, that is, their mother language. These shared properties could be words, as well as grammatical rules (mainly phonological rules).

The comparative method begins the process of identifying shared vocabulary that must have had a common origin by collecting a set of words that are cognates. Cognates are words in two or more languages that share a sound-meaning correspondence by virtue of having inherited the word from the same parent language. For example, the forms ‘bhrātar’ in Sanskrit, ‘phrātēr’ in Greek, and ‘frāter’ in Latin all mean the same, that is, ‘brother’. They are cognates. They are words that have the same meaning, and almost the same pronunciation, in different languages.

The principle of comparative reconstruction of cognates is that the relationship between the sound and the meaning of a word is arbitrary. That is, there is nothing inherent in the sound of a word that gives it its meaning. Therefore, it cannot be merely a matter of chance that different languages contain words that sound the same, for the same or very similar meanings: that is, words that show the same, or very similar, sound-meaning relationship. This similarity is taken to be a sign that these languages have all inherited the word from a common ancestor language. (Moreover, as we explain below, any differences in the sound of these words, i.e. pronunciation, can be explained by natural processes of sound change that occur in languages over time.)

Phonologists tell us that languages can undergo systematic changes in their sound systems. Linguists are thus able to establish phonological rules that explain the minor differences in pronunciation between words. Sound changes are regulated by a systematic set of rules. For example, in the cognate words ‘bhrātar,’ ‘phrātēr’ and ‘frāter,’ meaning ‘brother’ in Sanskrit, Greek and Latin, respectively, we see that /bh/ in Sanskrit appears as /ph/ in Greek. This correlation should and will extend to all such cases where Sanskrit has a /bh/ and Greek a /ph/; and all exceptions should also be explained by phonological rules.
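The idea of a recurring sound correspondence can be illustrated with a small sketch. It takes three Sanskrit and Greek word pairs from the word list in Section 2 and counts how often each pair of sounds lines up. The segmentation of each word into sounds, the truncation to the matching stem, and the names `ALIGNED` and `correspondences` are assumptions made only for this illustration; real comparative work relies on far larger data sets and careful alignment.

```python
from collections import Counter

# Hand-segmented, hand-aligned stems (Sanskrit, Greek) from the Section 2 word list.
# The segmentation and alignment are illustrative assumptions, not a published analysis.
ALIGNED = {
    "brother": (["bh", "r", "ā", "t"], ["ph", "r", "ā", "t"]),  # bhrātā : phrātēr
    "fog/mist": (["n", "á", "bh"], ["n", "e", "ph"]),           # nábhas : nephélē
    "father": (["p", "i", "t"], ["p", "a", "t"]),               # pitā : patēr
}

def correspondences(aligned):
    """Count how often each (Sanskrit sound, Greek sound) pair occurs in the same position."""
    counts = Counter()
    for sanskrit, greek in aligned.values():
        for s, g in zip(sanskrit, greek):
            counts[(s, g)] += 1
    return counts

counts = correspondences(ALIGNED)

# Pairs that recur across more than one cognate set are candidates for a regular rule.
recurring = {pair: n for pair, n in counts.items() if n > 1}
print(recurring)
```

In this toy data the pairing of Sanskrit /bh/ with Greek /ph/ shows up in two different words, which is the kind of repeated pattern the comparative method looks for.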

Once all such sound correspondences are established, then based on phonological principles, words are reconstructed as they must have been in the ancestral language, going back to the establishment of phonological, morphological as well as syntactic rules. This is possible only when a set of sister languages or a parent language is available for comparison. In their absence, the linguist has to rely on internal reconstruction, which we discuss a little later in the module.

In identifying cognates, we must take care not to compare merely borrowed words. Comparing borrowed words and identifying them as cognates would give us false results about family membership. For example, we have words like ‘apple’, ‘computer’, ‘bus’ in Malayalam and English. This does not make English and Malayalam belong to the same family, as it can be clearly established that Malayalam has borrowed these words from English! We know this, firstly, because these two languages are in contact now and were so historically, and, secondly, because the words that appear to be shared are often words that are part of the cultural milieu today. It is important that only words inherited from an ancestral language are compared across languages, and not words borrowed from one language into another. To make this distinction, we commonly use the concept of a basic vocabulary. This is a concept that we discuss next.

6.1.1 The notion of basic vocabulary

One of the most popular methods of studying family relationship between languages is to compare and study basic vocabulary items. Basic vocabulary is generally taken to refer to common body parts, kinship terms, and common objects of the natural world or household; as well as the lower numbers commonly used in counting. The reason why the words for these objects or concepts are called “basic” is that every linguistic community, including speakers of the earliest languages in the earliest human societies, must have had such words: for example, words for family relationships like mother, father, brother and sister; body parts such as the hand or the leg; the numbers one, two, and three; words for wood and fire, and so on. (Even today, the so-called primitive societies would have words for such things.) These words relate to everyday aspects of human life in early societies. Their importance in the study of language families lies in the fact that they are unlikely to arise out of borrowing through language contact, in order to meet immediate or current needs, and therefore, can be used for comparison.
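As a rough sketch of how this filter works in practice, the following snippet keeps only those shared meanings that fall within a basic-vocabulary list before any comparison is attempted. The set `BASIC_CONCEPTS` simply collects the examples named above, and the function name `comparable_items` is hypothetical; actual basic-vocabulary lists used by linguists are longer and more carefully motivated.

```python
# Illustrative basic-vocabulary concepts, taken from the examples in the paragraph above.
# This short set is an assumption for demonstration, not a standard word list.
BASIC_CONCEPTS = {
    "mother", "father", "brother", "sister",  # kinship terms
    "hand", "leg",                            # body parts
    "one", "two", "three",                    # lower numbers
    "wood", "fire",                           # common natural objects
}

def comparable_items(shared_meanings):
    """Return the shared meanings that belong to the basic vocabulary, sorted."""
    return sorted(m for m in shared_meanings if m in BASIC_CONCEPTS)

# Words shared by Malayalam and English in the borrowing example.
malayalam_english = ["apple", "computer", "bus"]

# Two meanings shared by Sanskrit, Greek and Latin in the Section 2 word list.
sanskrit_greek_latin = ["father", "brother"]

print(comparable_items(malayalam_english))     # nothing survives the filter
print(comparable_items(sanskrit_greek_latin))  # both items survive
```

The borrowed cultural items drop out, while the kinship terms remain available as evidence for comparison.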

6.1.2 Subgrouping: shared innovation vs. shared retention

Language family trees can be complicated, just as human genealogical family trees can be. It could happen that the original source of a set of languages is so ancient that there are intermediary stages within the family. That is, on present analysis, one finds that a certain set of languages seem to be closer to each other than another set of languages, although we have reasons to think that all these languages belong to the same family. These smaller sets of languages which are closer to each other are postulated to be subgroups within the family. Parallel to human genealogy, it is like a grandmother with her own daughters and each of the daughters with their own set of daughters. But we also need to know how these intermediary stages are established.

How do we know that one set of languages is closer to each other than to another set of languages, within the same family? The notion of subgrouping is about grouping a set of languages into smaller subgroups within a larger family. For example, if you look at Figure 8 below, you will notice that the Indo-European family tree is organized in terms of many such subgroups, like Anatolian, Indo-Iranian, Celtic, Germanic, and others. We have the Indo-Aryan subgroup of India, on the Indo-Iranian sub-branch. All subgroups contain multiple languages, which are closer to each other than they are to languages that belong to another subgroup.

INSERT Figure 8 (subgrouping).png

Figure 8: Subgroups within Indo-European

Similarly, within the Austroasiatic family (Figure 6), we have the Munda and Mon-Khmer subgroups. Two important criteria are often discussed with respect to subgrouping: shared retention and shared innovation.

Shared retention refers to those linguistic features which are retained in the daughter languages, by virtue of being found in the proto mother language. The daughter languages are all found to have retained these features within themselves, without much change. Shared retention is not very useful for establishing subgroups, because one cannot be certain that the sharing is due to two languages being closely related to each other; each of them could have individually retained the shared feature from the shared ancestor. On the other hand, the other criterion for subgrouping, shared innovation, is a very important aspect of language subgroups.

Shared innovation refers to certain linguistic features which are found in only a small set of languages within the larger family. It is assumed that a set of languages have these linguistic features because one of the intermediary languages, that is, one of the daughters of the grandmother, must have

• changed,
• individually innovated, and then
• passed on the change to her daughters.

In such a scenario, when we look at the current languages, we find a small set of languages that have the new features, and others that do not. These are not features that were there in the protolanguage. They are features developed at an intermediary stage of development. The set of languages that have these features would then be grouped together to form a distinct subgroup.
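The logic of subgrouping by shared innovation can be sketched in a few lines. The languages A to E, the feature labels, and the function name `subgroups_by_innovation` below are entirely hypothetical; the point is only to show that features inherited from the protolanguage (retentions) are set aside, and that languages are grouped by the new features they share.

```python
from collections import defaultdict

# Hypothetical features reconstructed for the protolanguage.
PROTO_FEATURES = {"f1", "f2"}

# Hypothetical daughter languages and the features each one shows today.
LANGUAGES = {
    "A": {"f1", "f2", "x"},
    "B": {"f1", "x"},
    "C": {"f1", "f2"},
    "D": {"f2", "y"},
    "E": {"f1", "y"},
}

def subgroups_by_innovation(languages, proto):
    """Group languages by features absent from the protolanguage (innovations)."""
    groups = defaultdict(set)
    for name, features in languages.items():
        for innovation in features - proto:  # retentions are ignored
            groups[innovation].add(name)
    # Only an innovation shared by at least two languages suggests a subgroup.
    return {f: sorted(members) for f, members in groups.items() if len(members) > 1}

print(subgroups_by_innovation(LANGUAGES, PROTO_FEATURES))
```

Language C shares both retained features with A, yet it is not grouped with A, because retentions alone say nothing about an intermediary ancestor. A and B are grouped by the innovation "x", and D and E by the innovation "y".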

Let us look at an example from the Dravidian languages. Dravidian languages are generally organized in four subgroups:

• South Dravidian I
• South Dravidian II
• Central Dravidian, and
• North Dravidian

Scholars [xiv] note that except the South Dravidian I subgroup, all the other subgroups have lost the Proto-Dravidian particles [–e:], [-o:] because they have independently innovated to form using other particles. Similarly, the use of [ta:n] ‘self’ as an emphatic particle is only observed in the South Dravidian I subgroup. This is an innovation in this subgroup, as the Proto-Dravidian emphatic particle [-e] is used in all subgroups including the South Dravidian I subgroup.

6.2 The Internal Reconstruction method

Internal reconstruction is a method of reconstructing an ancestral language based on evidence found in a single language. This is in clear contrast to the comparative method of reconstruction that involves at least two languages. Internal reconstruction is based on the principle that every change in a language leaves behind some trace of the old form, generally in the form of irregularity in the application of rules, and/or in the form of allomorphs or stylistic variations. This method can work only as long as the assumed changes are regular. If certain changes were abrupt with no traces left behind, internal reconstruction would not be possible. This method is useful in the study of language isolates and in the reconstruction of their histories.

For internal reconstruction to work, the recovery of the linguistic environment of change is a crucial factor. Universal phonological principles like intervocalic voicing (voiceless sounds becoming voiced between two vowels), nasalization (an oral sound becoming nasalized before a nasal sound), etc., are believed to have been as operational in the ancient languages as they are today, and are therefore useful tools in internal reconstruction.

One might, for example, examine alternating forms like:

/wife-wives/, /knife-knives/, /life-lives/

and contrast them with:

/strife-strifes/, /safe-safes/

One would then internally reconstruct the phonological form of the root for the words with the /f-v/ alternation in terms of phonological principles, like intervocalic voicing, and explain the lack of such alternation in other forms on the basis of other historical reasons, which are not necessarily phonological, like borrowing from another source, etc. This kind of internal reconstruction is possible only if a set of words can be found which show systematic and conditioned alternations.
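The first step of this procedure, sorting the forms of a single language into those that show the alternation and those that do not, can be sketched as follows. The check works on English spelling rather than on phonemic transcription, which is a simplifying assumption, and the function name `alternates_f_v` is hypothetical.

```python
# Singular/plural pairs from the examples above.
PAIRS = [
    ("wife", "wives"), ("knife", "knives"), ("life", "lives"),
    ("strife", "strifes"), ("safe", "safes"),
]

def alternates_f_v(singular, plural):
    """True if the last 'f' of the singular appears as 'v' in the same position of the plural."""
    i = singular.rfind("f")
    return (
        i != -1
        and len(plural) > i
        and plural[:i] == singular[:i]  # the material before the 'f' is unchanged
        and plural[i] == "v"
    )

alternating = [s for s, p in PAIRS if alternates_f_v(s, p)]
non_alternating = [s for s, p in PAIRS if not alternates_f_v(s, p)]

print(alternating)
print(non_alternating)
```

The alternating set is the evidence from which an older root form is reconstructed; the non-alternating set is what then needs a separate historical explanation, such as borrowing.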

7. The Stammbaum model or the Family Tree model

This is a standard means of representing genetic relationships, both in the linguistic as well as in the biological sciences. In linguistics, this tree model has, as its lowest node, individual languages. These may be either the modern languages in the family, or the last of the descendant languages, which are no longer extant. All the family trees we have presented to you in this module represent the Stammbaum model.

The intermediary nodes represent various subgroups within the family. These subgroups are important stages in the history of the language, when a daughter language underwent significant changes which it passed on to its daughters but did not share with its own sisters. They represent changes not found in the parent language, thus showing how some languages within the family are closer to each other than the rest. Thus, Hindi, Bangla and Marathi are closer to each other than any of them is to English, German, Greek or Russian, all of which belong to the same family, but to different subgroups of the Indo-European family of languages. Figure 8 above was one such representation of subgroups within the Indo-European family.
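A family tree of this kind can be represented as a simple mapping from each node to its parent node, and the closeness of two languages can then be read off as the lowest node they share. The sketch below uses only a handful of languages and subgroups named in this module, and the tree is deliberately simplified (further intermediate nodes are omitted); the names `PARENT`, `lineage` and `closest_common_node` are hypothetical.

```python
# Child -> parent links for a simplified fragment of the Indo-European tree.
# Intermediate nodes not discussed in this module are left out (an assumption).
PARENT = {
    "Hindi": "Indo-Aryan",
    "Bangla": "Indo-Aryan",
    "Marathi": "Indo-Aryan",
    "Indo-Aryan": "Indo-Iranian",
    "Indo-Iranian": "Indo-European",
    "English": "Germanic",
    "German": "Germanic",
    "Germanic": "Indo-European",
}

def lineage(node):
    """Return the path from a node up to the root of the tree."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path

def closest_common_node(a, b):
    """Return the lowest node in the tree that dominates both a and b."""
    ancestors_of_b = set(lineage(b))
    for node in lineage(a):
        if node in ancestors_of_b:
            return node
    return None

print(closest_common_node("Hindi", "Marathi"))   # shared subgroup
print(closest_common_node("Hindi", "English"))   # shared only at the family level
```

Hindi and Marathi meet at the Indo-Aryan node, while Hindi and English meet only at the root, which is the tree-based way of saying that the first pair is more closely related.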

8. Conclusion

Studying family relationships among languages gives us deep insights into not only the history of languages but also many a time into the cultural and material history of populations. Establishing genetic links between languages provides anthropologists, evolutionary biologists and population geneticists with very useful insights into the of modern day genetically mixed, culturally homogenized populations. It has also been a great source of information on population migrations and settlements.

Notes:
[i] Data taken from Hock, 1986: 564, 593; Lehmann 1973: 81
[ii] From Jones’ ‘Third Anniversary Discourse on the Hindus’
[iii] Hock, 1986: 556
[iv] "IndoEuropeanTree" by HP1740-B (uploader/minor editor) - Near exact replica of en:Image:IndoEuropeanTree.svg. Some edits by HP1740 changing the history of Low-Franconian dialects. Licensed under Public domain via Wikimedia Commons - http://commons.wikimedia.org/wiki/File:IndoEuropeanTree.png#mediaviewer/File:IndoEuropeanTree.png
[v] The estimates of the size of the family in terms of the number of languages are taken from the data available online at The Ethnologue (2014)
[vi] The estimates of the size of the family in terms of the number of languages are taken from the data available online at The Ethnologue (2014)
[vii] Image taken from <http://upload.wikimedia.org/wikipedia/commons/e/ed/Primary_Human_Language_Families_Map.png>
[viii] Malmkjaer, K. 1991, quoting Lieberman, P. 1984.
[ix] “General note”, available online at <http://www.censusindia.gov.in/Census_Data_2001/Census_Data_Online/Language/gen_note.html>
[x] Adapted from the Indo-European tree ("IndoEuropeanTree" by HP1740-B (uploader/minor editor) - Near exact replica of en:Image:IndoEuropeanTree.svg. Some edits by HP1740 changing the history of Low-Franconian dialects. Licensed under Public domain via Wikimedia Commons - http://commons.wikimedia.org/wiki/File:IndoEuropeanTree.png#mediaviewer/File:IndoEuropeanTree.png)
[xi] "ClassDravidian" by Dravidianhero - Own work. Licensed under Creative Commons Zero, Public Domain Dedication via Wikimedia Commons - http://commons.wikimedia.org/wiki/File:ClassDravidian.png#mediaviewer/File:ClassDravidian.png
[xii] Adapted from the Sino-Tibetan tree ("SinoTibetanTree". Licensed under Creative Commons Attribution-Share Alike 3.0 via Wikimedia Commons - http://commons.wikimedia.org/wiki/File:SinoTibetanTree.svg#mediaviewer/File:SinoTibetanTree.svg)
[xiii] Taken from Endangered languages of the Andaman Islands, by Anvita Abbi, 2006: pg 7
[xiv] From The Dravidian languages, by Bh. Krishnamurti, pg. 415-418