Entropic Evidence for Linguistic Structure in the Indus Script BREVIA
Total Page:16
File Type:pdf, Size:1020Kb
BREVIA contemporaneous with the Indus script) and Old Tamil (an alpha-syllabic script) and falls be- tween those for English words and for English Entropic Evidence for Linguistic characters. These observations are consistent with previous suggestions [e.g., (9)], made on the basis of the total number of Indus signs, that Structure in the Indus Script the Indus script may be logo-syllabic. The similar- Rajesh P. N. Rao,1* Nisha Yadav,2,3 Mayank N. Vahia,2,3 Hrishikesh Joglekar,4 ityinconditionalentropytoOldTamil,aDravidian R. Adhikari,5 Iravatham Mahadevan6 language, is especially interesting in light of the fact that many of the prominent decipherment he Indus civilization flourished ~2600 conditional entropy of Indus inscriptions from a efforts to date (9–11) have converged upon a to 1900 before the common era in what well-known concordance of Indus texts (8). proto-Dravidian hypothesis for the Indus script. Tis now eastern Pakistan and northwestern We found that the conditional entropy of Given the prior evidence for syntactic structure India (1). No historical information exists about Indus inscriptions closely matches those of lin- in the Indus script (9, 12), our results increase the the civilization, but archaeologists have uncov- guistic systems and remains far from nonlinguistic probability that the script represents language, com- ered samples of their writing on stamp seals, systems throughout the entire range of token set plementing other arguments that have been made sealings, amulets, and small tablets. The script sizes (Fig. 1A) (7). The conditional entropy of explicitly (13, 14)orimplicitly(15, 16) in favor of on these objects remains undeciphered, despite a Indus inscriptions is substantially below those of the linguistic hypothesis. number of attempts and claimed decipherments the two biological nonlinguistic systems (DNA 2 3 References and Notes ( ). A recent article ( ) questioned the assump- and protein) and above that of the computer pro- 1. A. Lawler, Science 320, 1276 (2008). tion that the script encoded language, suggesting gramming language (Fig. 1B). Moreover, this 2. G. L. Possehl, Indus Age: The Writing System (Univ. of instead that it might have been a nonlinguistic conditional entropy appears to be most similar to Pennsylvania Press, Philadelphia, 1996). symbol system akin to the Vinča inscriptions of those of Sumerian (a logo-syllabic script roughly 3. S. Farmer, R. Sproat, M. Witzel, Electron. J. Vedic Stud. southeastern Europe and Near Eastern em- 11, 19 (2004). 4. S. M. M. Winn, in The Life of Symbols, M. L. Foster and blem systems. We compared the statistical A 6 L. J. Botscharow, Eds. (Westview, Boulder, CO, 1990) on May 31, 2009 structure of sequences of signs in the Indus pp. 263–283. 5. J. A. Black, A. Green, Gods, Demons and Symbols of Ancient script with those from a representative group 5 of linguistic and nonlinguistic systems. Mesopotamia (British Museum Press, London, 1992). 6. C. E. Shannon, Bell Syst. Tech. J. 27, 379 (1948). Two major types of nonlinguistic systems 4 7. Materials and methods are available on Science Online. are those that do not exhibit much sequential 8. I. Mahadevan, The Indus Script: Texts, Concordance and structure (type 1 systems) and those that fol- 3 Tables (Archaeological Survey of India, New Delhi, 1977). low rigid sequential order (type 2 systems). 9. A. Parpola, Deciphering the Indus Script (Cambridge Univ. Press, Cambridge, 1994). For example, the sequential order of signs 2 10. I. Mahadevan, J. Tamil Stud. 2, 157 (1970). in Vinča inscriptions appears to have been Conditional Entropy 11. Y. V. Knorozov, M. F. Albedil, B. Y. Volchok, Proto-Indica: www.sciencemag.org unimportant (4). On the other hand, the se- 1 1979, Report on the Investigations of the Proto-Indian quences of deity signs in Near Eastern inscrip- Texts (Nauka Publishing House, Moscow, 1981). 12. K. Koskenniemi, Stud. Orient. 50, 125 (1981). tions found on boundary stones (kudurrus) 0 0 100 200 300 400 13. A. Parpola, Trans. Int. Conf. Eastern Stud. 50, 28 (2005). typically follow a rigid order that is thought 14. A. Parpola, in Airavati: Felicitation Volume in Honor of Number of Tokens to reflect the hierarchical ordering of the Iravatham Mahadevan (Varalaaru.com, Chennai, India, (ordered by frequency) – deities (5). 2008), pp. 111 131. B 1 15. N. Yadav, M. N. Vahia, I. Mahadevan, H. Joglekar, Int. J. Linguistic systems tend to fall somewhere Dravidian Linguist. 37, 39 (2008). 16. N. Yadav, M. N. Vahia, I. Mahadevan, H. Joglekar, Int. J. between these two extremes: The tokens of 0.8 Downloaded from a language (such as characters or words) do Dravidian Linguist. 37, 53 (2008). 0.6 17. This work was supported by the Packard Foundation, the not follow each other randomly nor are they Sir Jamsetji Tata Trust, and the University of Washington. juxtaposed in a rigid order. There is typically 0.4 We thank T. Sejnowski, M. Shapiro, and B. Wells for their some amount of flexibility in the ordering of comments and suggestions. 0.2 tokens to compose words or sentences. One Rel. Cond. Entropy Supporting Online Material way of quantifying this flexibility is to use 0 www.sciencemag.org/cgi/content/full/1170391/DC1 conditional entropy (6), which measures the Nonling DNA ProtSansk Eng Sumer Old Indus Eng Prog Nonling Materials and Methods type 1 (words) Tamil (chars) lang type 2 Fig. S1 amount of randomness in the choice of a Fig. 1. Conditional entropy of Indus inscriptions com- Table S1 token given a preceding token (7). References We computed the conditional entropies of pared to linguistic and nonlinguistic systems. (A)The conditional entropy (in units of nats) is plotted as a 30 December 2008; accepted 14 April 2009 five types of known natural linguistic systems function of the number of tokens (signs, characters, or Published online 23 April 2009; (Sumerian logo-syllabic system, Old Tamil 10.1126/science.1170391 words) ordered according to their frequency in the texts Include this information when citing this paper. alpha-syllabic system, Rig Vedic Sanskrit used in this analysis (7). (B) Relative conditional entropy alpha-syllabic system, English words, and (conditional entropy relative to a uniformly random se- 1Department of Computer Science and Engineering, University of English characters), four types of nonlinguis- quence with the same number of tokens) for linguistic Washington, Seattle, WA 98195, USA. 2Department of Astronomy tic systems (two artificial control data sets and nonlinguistic systems. Prot indicates protein se- and Astrophysics, Tata Institute of Fundamental Research, representing type 1 and type 2 nonlinguistic Mumbai 400005, India. 3Centre for Excellence in Basic Sciences, quences;Sansk,Sanskrit;Eng,English;Sumer,Sumerian; 4 systems as described above, human DNA Mumbai 400098, India. 14, Dhus Wadi, Laxminiketan, and Prog lang, programming language. Besides the sys- Thakurdwar, Mumbai 400002, India. 5The Institute of Mathema- sequences, and bacterial protein sequences), tems in (A), this plot includes two biological nonlinguistic tical Sciences, Chennai 600113, India. 6Indus Research Centre, and an artificially created linguistic system (the systems (a human DNA sequence and bacterial protein Roja Muthiah Research Library, Chennai 600113, India. computer programming language Fortran). We sequences), as well as Rig Vedic Sanskrit and a computer *To whom correspondence should be addressed. E-mail: compared these conditional entropies with the program in the programming language Fortran (7). [email protected] www.sciencemag.org SCIENCE VOL 324 29 MAY 2009 1165.