Decipherment

Total Page:16

File Type:pdf, Size:1020Kb

Decipherment Decipherment Kevin Knight USC/ISI 4676 Admiralty Way Marina del Rey CA 90292 [email protected] Abstract oped by Alan Turing and Warren Weaver. We de- scribe recently published work on building auto- The first natural language processing sys- matic translation systems from non-parallel data. tems had a straightforward goal: deci- We also demonstrate how some of the same algo- pher coded messages sent by the en- rithmic tools can be applied to natural language emy. This tutorial explores connections tasks like part-of-speech tagging and word align- between early decipherment research and ment. today’s NLP work. We cover classic mili- Turning back to historical ciphers, we explore a tary and diplomatic ciphers, automatic de- number of unsolved ciphers, giving results of ini- cipherment algorithms, unsolved ciphers, tial computer experiments on several of them. Fi- language translation as decipherment, and nally, we look briefly at writing as a way to enci- analyzing ancient writing as decipher- pher phoneme sequences, covering ancient scripts ment. and modern applications. 1 Tutorial Overview 2 Outline The first natural language processing systems had 1. Classical military/diplomatic ciphers (15 a straightforward goal: decipher coded messages minutes) sent by the enemy. Sixty years later, we have many 60 cipher types (ACA) more applications, including web search, ques- • Ciphers vs. codes tion answering, summarization, speech recogni- • tion, and language translation. This tutorial ex- Enigma cipher: the mother of natural • plores connections between early decipherment language processing research and today’s NLP work. We find that – computer analysis of text many ideas from the earlier era have become core – language recognition to the field, while others still remain to be picked – Good-Turing smoothing up and developed. We first cover classic military and diplomatic 2. Foreign language as a code (10 minutes) cipher types, including complex substitution ci- Alan Turing’s ”Thinking Machines” phers implemented in the first electro-mechanical • Warren Weaver’s Memorandum encryption machines. We look at mathematical • tools (language recognition, frequency counting, 3. Automatic decipherment (55 minutes) smoothing) developed to decrypt such ciphers on Cipher type detection proto-computers. We show algorithms and exten- • Substitution ciphers (simple, homo- sive empirical results for solving different types of • ciphers, and we show the role of algorithms in re- phonic, polyalphabetic, etc) cent decipherments of historical documents. – plaintext language recognition We then look at how foreign language can be how much plaintext knowledge is ∗ viewed as a code for English, a concept devel- needed 3 Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, pages 3–4, Sofia, Bulgaria, August 4-9 2013. c 2013 Association for Computational Linguistics index of coincidence, unicity dis- 7. Undeciphered writing systems (15 minutes) ∗ tance, and other measures Indus Valley Script (3300BC) – navigating a difficult search space • Linear A (1900BC) frequencies of letters and words • ∗ Phaistos disc (1700BC?) pattern words and cribs • ∗ Rongorongo (1800s?) EM, ILP, Bayesian models, sam- • ∗ pling 8. Conclusion and further questions (15 min- – recent decipherments utes) Jefferson cipher, Copiale cipher, ∗ civil war ciphers, naval Enigma 3 About the Presenter Application to part-of-speech tagging, • Kevin Knight is a Senior Research Scientist and word alignment Fellow at the Information Sciences Institute of the Application to machine translation with- University of Southern California (USC), and a • out parallel text Research Professor in USC’s Computer Science Parallel development of cryptography Department. He received a PhD in computer sci- • and translation ence from Carnegie Mellon University and a bach- Recently released NSA internal elor’s degree from Harvard University. Profes- • newsletter (1974-1997) sor Knight’s research interests include natural lan- guage processing, machine translation, automata 4. *** Break *** (30 minutes) theory, and decipherment. In 2001, he co-founded Language Weaver, Inc., and in 2011, he served 5. Unsolved ciphers (40 minutes) as President of the Association for Computational Zodiac 340 (1969), including computa- Linguistics. Dr. Knight has taught computer sci- • tional work ence courses at USC for more than fifteen years and co-authored the widely adopted textbook Ar- Voynich Manuscript (early 1400s), in- • tificial Intelligence. cluding computational work Beale (1885) • Dorabella (1897) • Taman Shud (1948) • Kryptos (1990), including computa- • tional work McCormick (1999) • Shoeboxes in attics: DuPonceau jour- • nal, Finnerana, SYP, Mopse, diptych 6. Writing as a code (20 minutes) Does writing encode ideas, or does it en- • code phonemes? Ancient script decipherment • – Egyptian hieroglyphs – Linear B – Mayan glyphs – Ugaritic, including computational work – Chinese N¨ushu, including computa- tional work Automatic phonetic decipherment • Application to transliteration • 4.
Recommended publications
  • The Origin of the Alphabet: an Examination of the Goldwasser Hypothesis
    Colless, Brian E. The origin of the alphabet: an examination of the Goldwasser hypothesis Antiguo Oriente: Cuadernos del Centro de Estudios de Historia del Antiguo Oriente Vol. 12, 2014 Este documento está disponible en la Biblioteca Digital de la Universidad Católica Argentina, repositorio institucional desarrollado por la Biblioteca Central “San Benito Abad”. Su objetivo es difundir y preservar la producción intelectual de la Institución. La Biblioteca posee la autorización del autor para su divulgación en línea. Cómo citar el documento: Colless, Brian E. “The origin of the alphabet : an examination of the Goldwasser hypothesis” [en línea], Antiguo Oriente : Cuadernos del Centro de Estudios de Historia del Antiguo Oriente 12 (2014). Disponible en: http://bibliotecadigital.uca.edu.ar/repositorio/revistas/origin-alphabet-goldwasser-hypothesis.pdf [Fecha de consulta:..........] . 03 Colless - Alphabet_Antiguo Oriente 09/06/2015 10:22 a.m. Página 71 THE ORIGIN OF THE ALPHABET: AN EXAMINATION OF THE GOLDWASSER HYPOTHESIS BRIAN E. COLLESS [email protected] Massey University Palmerston North, New Zealand Summary: The Origin of the Alphabet Since 2006 the discussion of the origin of the Semitic alphabet has been given an impetus through a hypothesis propagated by Orly Goldwasser: the alphabet was allegedly invented in the 19th century BCE by illiterate Semitic workers in the Egyptian turquoise mines of Sinai; they saw the picturesque Egyptian inscriptions on the site and borrowed a number of the hieroglyphs to write their own language, using a supposedly new method which is now known by the technical term acrophony. The main weakness of the theory is that it ignores the West Semitic acrophonic syllabary, which already existed, and contained most of the letters of the alphabet.
    [Show full text]
  • The Cretan Script Family Includes the Carian Alphabet
    University of Nebraska - Lincoln DigitalCommons@University of Nebraska - Lincoln CSE Journal Articles Computer Science and Engineering, Department of 2017 The rC etan Script Family Includes the Carian Alphabet Peter Z. Revesz University of Nebraska-Lincoln, [email protected] Follow this and additional works at: https://digitalcommons.unl.edu/csearticles Revesz, Peter Z., "The rC etan Script Family Includes the Carian Alphabet" (2017). CSE Journal Articles. 196. https://digitalcommons.unl.edu/csearticles/196 This Article is brought to you for free and open access by the Computer Science and Engineering, Department of at DigitalCommons@University of Nebraska - Lincoln. It has been accepted for inclusion in CSE Journal Articles by an authorized administrator of DigitalCommons@University of Nebraska - Lincoln. MATEC Web of Conferences 125, 05019 (2017) DOI: 10.1051/ matecconf/201712505019 CSCC 2017 The Cretan Script Family Includes the Carian Alphabet Peter Z. Revesz1,a 1 Department of Computer Science, University of Nebraska-Lincoln, Lincoln, NE, 68588, USA Abstract. The Cretan Script Family is a set of related writing systems that have a putative origin in Crete. Recently, Revesz [11] identified the Cretan Hieroglyphs, Linear A, Linear B, the Cypriot syllabary, and the Greek, Old Hungarian, Phoenician, South Arabic and Tifinagh alphabets as members of this script family and using bioinformatics algorithms gave a hypothetical evolutionary tree for their development and presented a map for their likely spread in the Mediterranean and Black Sea areas. The evolutionary tree and the map indicated some unknown writing system in western Anatolia to be the common origin of the Cypriot syllabary and the Old Hungarian alphabet.
    [Show full text]
  • Royal Inscriptions from Persepolis in Electronic Form Matthew W
    oi.uchicago.edu THE ORIENTAL INSTITUTE NEWS & NOTES SPRING 1998 ©THE ORIENTAL INSTITUTE OF THE UNIVERSITY OF CHICAGO ACHAEMENID ROYAL INSCRIPTIONS FROM PERSEPOLIS IN ELECTRONIC FORM MATTHEW W. STOLPER. PROFESSOR OF ASSYRIOLOGY GENE GRAGG. PROFESSOR OF NEAR EASTERN LANGUAGES AND DIRECTOR OF THE ORIENTAL INSTITUTE From 550 BC on, Cyrus the Great and his successors, the Persian structed narrative of Darius's triumph over his competitors for kings of the Achaemenid dynasty, conquered and held an em­ control of the empire, and it was not only addressed to posterity, pire on a scale that was without precedent in earlier Near East­ but also translated and disseminated to the conquered lands, and ern history, and without parallel until the formation of the an Aramaic version was copied out in Egypt about a century af­ Roman Empire. At its greatest extent, its corners were in Libya ter the text was composed. But the other inscriptions, for the and Ethiopia, Thrace and Macedonia, Afghanistan and Central most part, present the empire not as an accomplishment, the re­ Asia, and the Punjab. It incorporated ancient literate societies in sult of royal efforts, but as a divinely sanctioned order. Elam, Mesopotamia, Egypt, and elsewhere. It engaged the The characteristic that best represented this order was its di­ versity, which was in turn an expression of the empire's size. In I ~ 13DDO!lDC,lDcnIS I nybljc nm,'mms I I w< t?sjtc infocmali oo & SIlUj Sljcs I clIoyriyhl$ & pcn)Jjs.~ioos I ~ I a world where few people had seen maps, where area and dis­ I webs;l!! nllviegriQnal aid I tance could not be expressed in geometric figures, the most im­ pressive way of putting them in words was to name the many ACHAEMENID ROYAL INSCRIPTIONS people who served the king, and so some of the inscriptions give lists of subject lands, twenty, or twenty-three, or twenty-eight items long.
    [Show full text]
  • General Historical and Analytical / Writing Systems: Recent Script
    9 Writing systems Edited by Elena Bashir 9,1. Introduction By Elena Bashir The relations between spoken language and the visual symbols (graphemes) used to represent it are complex. Orthographies can be thought of as situated on a con- tinuum from “deep” — systems in which there is not a one-to-one correspondence between the sounds of the language and its graphemes — to “shallow” — systems in which the relationship between sounds and graphemes is regular and trans- parent (see Roberts & Joyce 2012 for a recent discussion). In orthographies for Indo-Aryan and Iranian languages based on the Arabic script and writing system, the retention of historical spellings for words of Arabic or Persian origin increases the orthographic depth of these systems. Decisions on how to write a language always carry historical, cultural, and political meaning. Debates about orthography usually focus on such issues rather than on linguistic analysis; this can be seen in Pakistan, for example, in discussions regarding orthography for Kalasha, Wakhi, or Balti, and in Afghanistan regarding Wakhi or Pashai. Questions of orthography are intertwined with language ideology, language planning activities, and goals like literacy or standardization. Woolard 1998, Brandt 2014, and Sebba 2007 are valuable treatments of such issues. In Section 9.2, Stefan Baums discusses the historical development and general characteristics of the (non Perso-Arabic) writing systems used for South Asian languages, and his Section 9.3 deals with recent research on alphasyllabic writing systems, script-related literacy and language-learning studies, representation of South Asian languages in Unicode, and recent debates about the Indus Valley inscriptions.
    [Show full text]
  • DIES DIEM DOCET: the DECIPHERMENT of UGARITIC Peggy L. Day in the Field of Ugaritic Studies, Three Scholars Are Generally Credit
    DIES DIEM DOCET: THE DECIPHERMENT OF UGARITIC Peggy L. Day In the field of Ugaritic studies, three scholars are generally credited with playing major roles in the decipherment of Ugaritic: Hans Bauer, Paul Dhorme and Charles Virolleaud. To date, only one scholar, Alan D. Corre\ has produced a detailed analysis of the precise roles that each of these scholars played1. While Corre's reconstruction of the decipherment process is, in general, commendably sound, it nevertheless does not take all of the relevant data into account, leaves certain facts unexplained (or underexplained) and is methodologically weak in its assigning dates to two important elements of the primary record that involve instances where lecture presentations preceded the published versions of the papers2. Thus there is room for improvement on Corre's work. The present article will proceed by laying out in chronological order and considerable detail the pertinent facts, attending more fully to data such as paper presentation dates, article completion dates, and intellectual context. It will also glean relevant data from articles, records and private communications heretofore overlooked as sources for reconstructing the decipherment process. It will then bring all of this information to bear on assessing the above scholars' respective roles, especially the problematic role of Charles Virolleaud. In March of 1928 a local farmworker plowing near Minet el-Beida dislodged a stone slab that covered a passageway leading to a vaulted tomb. When the Assyriologist Charles Virolleaud, then Director of the French Service des Antiquites* in Beirut, was informed A.D. Corre, "Anatomy of a Decipherment", Wisconsin Academy of Sciences, Arts and Letters 55, 1966, 11-20.
    [Show full text]
  • 21052020 Origin and Development of Brahmi Script.Pdf
    When talking about writing, we see absence of writing in early India or we can say traditional India. There were many reservations about writing in early India. On the whole, traditional India was much less oriented toward the written word than many other ancient and traditional cultures such as those of classical China and Japan or of the Islamic world. Brahma and also his wife (or daughter) Sarasvati, the goddess of learning, being regularly depicted in sculpture with a book in hand. But in contrast written knowledge was referred to as money in someone else’s hand in early times. But we do get many inscriptions from Indian subcontinent and they pose a serious proof that writing do existed. Panini used the word LIPI to denote the script. Jatakas and Vinaya-Pitaka of Buddhist text refers to numerous explicit references to writing. Megasthenes suggested that Indians knew writing but his contemporary nearchos said that Indians do not know writing. Some scholars have proposed a connection with the proto-historic Harappan script. Mahasthan and Sohgaura inscriptions have been proposed as precursor. Recent claim of ‘pre-Asokan Brahmi’ on the basis of evidence from Anuradhapura. B.B. Lal proposed that the ‘script’ on the pottery from Vikramkhol (Odisha) is a ‘missing link’ between historical Brahmi and the proto-historic script of Harappa. But this theory is generally not accepted. Richard Salomon argued that these are ‘pseudo-inscriptions’ or ‘Graffiti’. Ahmed Hasan Dani later proved that inscriptions of Mahasthan and Sohagaura were either contemporary to or later than Asokan inscriptions. F.R. Allchin and Robin Coningham excavated the famous site of Anuradhapura in Sri Lanka and suggested an early date on the basis of stratigraphic evidence.
    [Show full text]
  • The Origin and Transmission of the Alphabet
    Andrews University Digital Commons @ Andrews University Master's Theses Graduate Research 1994 The Origin and Transmission of the Alphabet Joaquim Azevedo Andrews University Follow this and additional works at: https://digitalcommons.andrews.edu/theses Recommended Citation Azevedo, Joaquim, "The Origin and Transmission of the Alphabet" (1994). Master's Theses. 28. https://digitalcommons.andrews.edu/theses/28 This Thesis is brought to you for free and open access by the Graduate Research at Digital Commons @ Andrews University. It has been accepted for inclusion in Master's Theses by an authorized administrator of Digital Commons @ Andrews University. For more information, please contact [email protected]. Thank you for your interest in the Andrews University Digital Library of Dissertations and Theses. Please honor the copyright of this document by not duplicating or distributing additional copies in any form without the author’s express written permission. Thanks for your cooperation. INFORMATION TO USERS This manuscript has been reproduced from the microfilm master. UMI films the text directly firom the original or copy submitted. Thus, some thesis and dissertation copies are in typewriter face, while others may be firom any type of computer printer. The quality of this reproduction is dependent upon the quality of the copy submitted. Broken or indistinct print, colored or poor quality illustrations and photogrsq>hs, print bleedthrough, substandard margins, and improper alignment can adversely affect reproduction. In the unlikely event that the author did not send UMI a complete manuscript and there are missing pages, these will be noted. Also, if unauthorized copyright material had to be removed, a note will indicate the deletion.
    [Show full text]
  • Bark-Cloth Piece: a Newly Recognized Rongorongo Fragment
    The Rangitoki Bark-Cloth Piece: A Newly Recognized Rongorongo Fragment ... THE RAŊITOKI (RANGITOKI) BARK-CLOTH PIECE: A NEWLY RECOGNIZED RONGORONGO FRAGMENT FROM EASTER ISLAND Robert M. SCHOCH (Boston, Massachusetts, USA) [email protected] Tomi S. MELKA (Las Palmas de G.C., Spain) [email protected] The Raŋitoki fragment, here described in the literature for the first time, consists of a piece of bark-cloth collected on Easter Island in March 1869 that has painted on its surface a short rongorongo (RR) sequence of glyphs. Analysis of the historiography and inscription of the Raŋitoki fragment suggests that it is, most likely, a genuine relic of the rongorongo tradition on Easter Island and thus represents an authentic addition to the known corpus of RR inscriptions. Keywords: bark-cloth, corpus, genuine artifact, Rapa Nui, Rangitoki, Raŋitoki fragment, relic, rongorongo script Introduction In a publication of 1971, the German ethnologist and epigrapher Thomas S. Barthel commented, among many other things, on the technical features related to the production of rongorongo (RR) artifacts.1 Besides their possible thematic content; the dimensional size of objects; directionality (left-to-right) and writing method (essentially, reverse boustrophedon), and assumed stages during the writing / carving process, Barthel expresses his concern about the material media used to record inscriptions. While wood is by and large the basic medium, “It appears from the account of a calabash, which has unfortunately 1 BARTHEL, T. S. Pre-contact Writing in Oceania, pp. 1167–1169. 113 Asian and African Studies, Volume 28, Number 2, 2019 disappeared, that other materials were also inscribed. On the other hand we have no proof that rongo-rongo texts were possibly painted in color on bark- cloth”.2 The said “calabash” is first reported in William J.
    [Show full text]
  • World Heritage List on the Basis Middle East
    3. THE PROPERTY Bisotun (Iran) Description The nominated site is located on the main trade route No 1222 leading from Kurdistan and the Mesopotamian region to the Iranian Central Plateau. It is some 30km north-east of the city of Kermanshah. The core zone (ca 1200 x 500m) of the site covers the heart of the archaeological site, 1. BASIC DATA containing remains dating from pre-historic times through the history of ancient Persia, associated with the sacred State Party: Islamic Republic of Iran mountain of Bisotun and the renown relief and inscription Name of property: Bisotun by the Achaemenid king of Persia, Darius I The Great. The site has a ‘specific buffer zone’ extending ca 500m from Location: Province of Kermanshah the core zone on the plain side. On the mountain side, the Date received by core zone and the buffer zone coincide with the top of the the World Heritage Centre: 28 January 2005 mountain. The whole area, including the visible part of the mountain and a large area of the plain are covered by a Included in the Tentative List: 22 May 1997 landscape buffer zone with planning control. International Assistance from the World Heritage Fund for The prehistoric remains within the nominated site include preparing the nomination No Paleolithic cave finds, the earliest evidence of human Category of property: presence at the spring-fed pool of Bisotun (Sarâb), on the plain under the rock. These finds provide testimony to a In terms of the categories of cultural property set out in highly developed industry datable to the Middle Article 1 of the 1972 World Heritage Convention, this is Palaeolithic era, indicating that Bisotun was inhabited an archaeological site.
    [Show full text]
  • The Alphabet Tree Dren’S Puzzles
    TUGboat, Volume 26 (2005), No. 3 199 A major intellectual step was the invention of Philology the rebus device. This is where the sounds cor- responding to pictograms are combined to form a word. We have all come across these, often as chil- The alphabet tree dren’s puzzles. For example, a picture of a bee Peter Wilson plus a picture of a tray can represent the word ‘be- tray’, or more obscurely a picture of a bee plus a 1 Introduction picure of a female deer can stand for the word ‘be- We got from via Akn and ’ k n to hind’. Even a single pictogram can suffice; for in- XPD stance a pictogram of the sun can be used for both ` k p A K N, and and @ ¼ à in only about 2000 the words ‘sun’ and ‘son’. Consequently symbols years. This is a short story of how that happened can be created that just stand for sounds and then and how we know what the strange symbols mean, they can be combined to form words. This can a k n a k n e.g., and . The fonts in all the markedly reduce the number of symbols required for examples were designed for use with TEX, especially a full writing system. Symbols representing sounds by humanities scholars, and are freely available. are called phonograms. Phonetic writing requires 1.1 Writing systems few glyphs. In these writing systems, the glyphs represent sounds. All writing systems are a com- The earliest known writing is from Sumeria dating bination of phonetic and logographic elements but from about 3400 bc.
    [Show full text]
  • The Rosetta Stone and Its Decipherment*
    UNCLASSIFIED The Rosetta Stone and Its Decipherment* BY LAMBROS D. CALLIMAHOS Unclassified An account of the discovery of the Rosetta Stone and of historical attempts to decipher Egyptian hieroglyphs, culminating in their successful decryption by Jean Frani;ois Champollion. I feel partic;ularly qualified to give this lecture because, as you might know-in any case, Security knows-I was born in Egypt, and further­ more the Ptolemies were Greeks. The Ptolemy with whom we are principally concerned is Ptolemy V: he was five years old when he ascended to the throne, and I was four years old when I first set foot on the American shore, so you can see the similarities. Moreover, one of the other Ptolemies, Ptolemy XI, was known by the nickname Ptolemy Auletes-"Ptolemy the flute player." And so on. In addition to all of the foregoing, I had the thrill of actually touching the Rosetta Stone three months ago when I was in the British Museum-as a visitor, not as an exhibit-and this tactile ecstasy imbued me with renewed venera­ tion and inspiration about the whole blessed subject. Now to get on with it. The decipherment of the Rosetta Stone was a brilliant piece of cryptanalysis, and one of the greatest linguistic achievements of the 19th century. But let us start at the beginning. Napoleon Bonaparte's expedition into Egypt, although ill-fated militarily, nevertheless resulted in enormous cultural advances, for in his entourage were 175 of what he termed "learned civilians" -known, however, to his soldiers as "donkeys." This brain trust brought along a large library of practically every book on Egypt available in France, and many crates of scientific apparatus and measuring instruments.
    [Show full text]
  • A Computational Study of the Evolution of Cretan and Related Scripts Peter Revesz University of Nebraska-Lincoln, [email protected]
    University of Nebraska - Lincoln DigitalCommons@University of Nebraska - Lincoln CSE Conference and Workshop Papers Computer Science and Engineering, Department of 10-1-2015 A Computational Study of the Evolution of Cretan and Related Scripts Peter Revesz University of Nebraska-Lincoln, [email protected] Follow this and additional works at: http://digitalcommons.unl.edu/cseconfwork Part of the Ancient History, Greek and Roman through Late Antiquity Commons, Comparative and Historical Linguistics Commons, Computational Linguistics Commons, Computer Engineering Commons, Electrical and Computer Engineering Commons, Language Interpretation and Translation Commons, and the Other Computer Sciences Commons Revesz, Peter, "A Computational Study of the Evolution of Cretan and Related Scripts" (2015). CSE Conference and Workshop Papers. 313. http://digitalcommons.unl.edu/cseconfwork/313 This Article is brought to you for free and open access by the Computer Science and Engineering, Department of at DigitalCommons@University of Nebraska - Lincoln. It has been accepted for inclusion in CSE Conference and Workshop Papers by an authorized administrator of DigitalCommons@University of Nebraska - Lincoln. Mathematical Models and Computational Methods A Computational Study of the Evolution of Cretan and Related Scripts Peter Z. Revesz major influence for many other European alphabets. The Abstract— Crete was the birthplace of several ancient writings, classical Greek alphabet derives from the Phoenician alphabet including the Cretan Hieroglyphs, the Linear A and the Linear B except for the letters Φ, Χ, Ψ and Ω [24]. scripts. Out of these three only Linear B is deciphered. The sound The Old Hungarian alphabet is the alphabet used by values of the Cretan Hieroglyph and the Linear A symbols are Hungarians before the adoption of the Latin alphabet.
    [Show full text]