Decipherment

Kevin Knight
USC/ISI
4676 Admiralty Way
Marina del Rey CA 90292
[email protected]

Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, pages 3–4, Sofia, Bulgaria, August 4-9 2013. © 2013 Association for Computational Linguistics

Abstract

The first natural language processing systems had a straightforward goal: decipher coded messages sent by the enemy. This tutorial explores connections between early decipherment research and today's NLP work. We cover classic military and diplomatic ciphers, automatic decipherment algorithms, unsolved ciphers, language translation as decipherment, and analyzing ancient writing as decipherment.

1 Tutorial Overview

The first natural language processing systems had a straightforward goal: decipher coded messages sent by the enemy. Sixty years later, we have many more applications, including web search, question answering, summarization, speech recognition, and language translation. This tutorial explores connections between early decipherment research and today's NLP work. We find that many ideas from the earlier era have become core to the field, while others still remain to be picked up and developed.

We first cover classic military and diplomatic cipher types, including complex substitution ciphers implemented in the first electro-mechanical encryption machines. We look at mathematical tools (language recognition, frequency counting, smoothing) developed to decrypt such ciphers on proto-computers. We show algorithms and extensive empirical results for solving different types of ciphers, and we show the role of algorithms in recent decipherments of historical documents.

We then look at how foreign language can be viewed as a code for English, a concept developed by Alan Turing and Warren Weaver. We describe recently published work on building automatic translation systems from non-parallel data. We also demonstrate how some of the same algorithmic tools can be applied to natural language tasks like part-of-speech tagging and word alignment.

Turning back to historical ciphers, we explore a number of unsolved ciphers, giving results of initial computer experiments on several of them. Finally, we look briefly at writing as a way to encipher phoneme sequences, covering ancient scripts and modern applications.

2 Outline

1. Classical military/diplomatic ciphers (15 minutes)
   • 60 cipher types (ACA)
   • Ciphers vs. codes
   • Enigma cipher: the mother of natural language processing
     – computer analysis of text
     – language recognition
     – Good-Turing smoothing

2. Foreign language as a code (10 minutes)
   • Alan Turing's "Thinking Machines"
   • Warren Weaver's Memorandum

3. Automatic decipherment (55 minutes)
   • Cipher type detection
   • Substitution ciphers (simple, homophonic, polyalphabetic, etc.)
     – plaintext language recognition
       ∗ how much plaintext knowledge is needed
       ∗ index of coincidence, unicity distance, and other measures
     – navigating a difficult search space
       ∗ frequencies of letters and words
       ∗ pattern words and cribs
       ∗ EM, ILP, Bayesian models, sampling
     – recent decipherments
       ∗ Jefferson cipher, Copiale cipher, civil war ciphers, naval Enigma
   • Application to part-of-speech tagging, word alignment
   • Application to machine translation without parallel text
   • Parallel development of cryptography and translation
   • Recently released NSA internal newsletter (1974-1997)

4. *** Break *** (30 minutes)

5. Unsolved ciphers (40 minutes)
   • Zodiac 340 (1969), including computational work
   • Voynich Manuscript (early 1400s), including computational work
   • Beale (1885)
   • Dorabella (1897)
   • Taman Shud (1948)
   • Kryptos (1990), including computational work
   • McCormick (1999)
   • Shoeboxes in attics: DuPonceau journal, Finnerana, SYP, Mopse, diptych

6. Writing as a code (20 minutes)
   • Does writing encode ideas, or does it encode phonemes?
   • Ancient script decipherment
     – Egyptian hieroglyphs
     – Linear B
     – Mayan glyphs
     – Ugaritic, including computational work
     – Chinese Nüshu, including computational work
   • Automatic phonetic decipherment
   • Application to transliteration

7. Undeciphered writing systems (15 minutes)
   • Indus Valley Script (3300BC)
   • Linear A (1900BC)
   • Phaistos disc (1700BC?)
   • Rongorongo (1800s?)

8. Conclusion and further questions (15 minutes)

3 About the Presenter

Kevin Knight is a Senior Research Scientist and Fellow at the Information Sciences Institute of the University of Southern California (USC), and a Research Professor in USC's Computer Science Department. He received a PhD in computer science from Carnegie Mellon University and a bachelor's degree from Harvard University. Professor Knight's research interests include natural language processing, machine translation, automata theory, and decipherment. In 2001, he co-founded Language Weaver, Inc., and in 2011, he served as President of the Association for Computational Linguistics. Dr. Knight has taught computer science courses at USC for more than fifteen years and co-authored the widely adopted textbook Artificial Intelligence.
