The Computer Generation of Cryptic Crossword Clues

Total Page:16

File Type:pdf, Size:1020Kb

The Computer Generation of Cryptic Crossword Clues Riddle posed by computer (6) : The Computer Generation of Cryptic Crossword Clues David Hardcastle Birkbeck, London University 2007 Thesis submitted in fulfilment of requirements for degree of PhD The copyright of this thesis rests with the author. Due acknowledgement must always be made of any quotation or information derived from it. 1 Abstract This thesis describes the development of a system (ENIGMA) that generates a wide variety of cryptic crossword clues for any given word. A valid cryptic clue must appear to make sense as a fragment of natural language, but it must also be possible for the reader to interpret it using the syntax and semantics of crossword convention and solve the puzzle that it sets. It should also have a fluent surface reading that conflicts with the crossword clue interpretation of the same text. Consider, for example, the following clue for the word THESIS: Article with spies’ view (6) This clue sets a puzzle in which the (the definite article) must be combined with SIS (Special Intelligence Service) to create the word thesis (a view), but the reader is distracted by the apparent meaning and structure of the natural language fragment when attempting to solve it. I introduce the term Natural Language Creation (NLC) to describe the process through which ENIGMA generates a text with two layers of meaning. It starts with a representation of the puzzle meaning of the clue, and generates both a layered text that communicates this puzzle and a fluent surface meaning for that same text with a shallow, lexically-bound semantic representation. This parallel generation process reflects my intuition of the creative process through which cryptic clues are written, and domain expert commentary on clue-setting. The system first determines all the ways in which a clue might be formed for the input word (which is known as the “light”). For example, a light might be an anagram of some other word, or it might be possible to form the light by embedding one word inside another. There are typically a great many of these clue-forming possibilities, each of which forms a different puzzle reading, and the system constructs a data message for each one and annotates it with lexical, syntactic and semantic information. As the puzzle reading is lexicalised a hybrid language understanding process locates syntactic and semantic fits between the elements of the clue text and constructs the surface reading of the clue as it emerges. 2 The thesis describes the construction of the datasets required by the system including a lexicon, thesaurus, lemmatiser, anagrammer, rubric builder and, in particular, a system that scores pairs of words for thematic association based on distributional similarity in the BNC, and a system that determines whether a typed dependency is semantically valid using data extracted from the BNC with a parser and generalized using WordNet. It then describes the hybridised, NLC generation process, which is implemented using an object-oriented grammar and semantic constraint system and briefly touches on the software engineering considerations behind the ENIGMA application. The literature review and evaluation of each separate research strand is described within the relevant chapter. I also present a quantitative and qualitative end-to-end evaluation in the final chapter in which 60 subjects participated in a Turing-style test to evaluate the generated clues and eight domain experts provided detailed commentary on the quality of ENIGMA’s output. 3 Contents Abstract..........................................................................................................................2 Contents .........................................................................................................................4 Declaration.....................................................................................................................6 Acknowledgements........................................................................................................7 Introduction....................................................................................................................8 Cryptic Clues as Multi-Layered Text ........................................................................9 Computing Cryptic Clues ........................................................................................10 A Sample Generated Clue........................................................................................11 Natural Language Creation......................................................................................13 Structure of the Thesis .............................................................................................15 Sample Output .........................................................................................................21 Chapter 1 Requirements Analysis ............................................................................23 1.1 An Analysis of the Domain..........................................................................23 1.2 Scoping ENIGMA...........................................................................................39 1.3 Determining Resources with Sample Clues.................................................42 1.4 Quality Control ............................................................................................47 1.5 Language Specification................................................................................52 Chapter 2 Building Clue Plans.................................................................................58 2.1 A Paragrammer ............................................................................................59 2.2 Orthographic Distance Measures.................................................................67 2.3 A Lexicon.....................................................................................................70 2.4 A Homophone Dictionary............................................................................71 2.5 A Lemmatiser/Inflector................................................................................73 2.6 Stop List.......................................................................................................78 2.7 A Thesaurus.................................................................................................78 2.8 Cryptic keywords.........................................................................................89 2.9 Word phrase information .............................................................................89 2.10 Syntactic and Semantic Role Addenda........................................................90 2.11 WordNet.......................................................................................................90 2.12 Sample Input/Output....................................................................................91 Chapter 3 Finding Thematic Context .......................................................................93 3.1 Introduction..................................................................................................93 3.2 Measuring Word Association......................................................................95 3.3 Five Approaches to the Problem..................................................................97 3.4 Algorithmic Details......................................................................................99 3.5 Evaluating and Tuning the Algorithms......................................................106 3.6 Extracting and Organising the Data...........................................................124 3.7 Further Research........................................................................................128 3.8 Conclusion .................................................................................................132 3.9 Sample Output...........................................................................................133 Chapter 4 Finding Syntactic Collocations..............................................................134 4.1 Lexical Choice in ENIGMA .........................................................................134 4.2 What is wrong with distributional data? ....................................................135 4.3 Extracting Syntactic Collocates .................................................................137 4.4 Generalizing the dataset.............................................................................143 4.5 Evaluation ..................................................................................................156 4.6 Baseline Comparison.................................................................................164 4 4.7 Discussion..................................................................................................166 4.8 Sample Output...........................................................................................171 Chapter 5 Creating Clues........................................................................................174 5.1 Clue Plans as Input.....................................................................................176 5.2 Lexicalization Options...............................................................................179 5.3 Analysing the Data.....................................................................................180 5.4 Bottom-up Processing with Chunks..........................................................183 5.5 Chunks as Text Fragments.........................................................................186
Recommended publications
  • Mitchell, Keith TITLE Edinburgh Working Papers in Applied Linguistics, 1996
    DOCUMENT RESUME ED 395 507 FL 023 856 AUTHOR Parkinson, Brian, Ed.; Mitchell, Keith TITLE Edinburgh Working Papers in Applied Linguistics, 1996. INSTITUTION Edinburgh Univ. (Scotland). Dept. of Linguistics. REPORT NO ISSN-0959-2253 PUB DATE 96 NOTE 141p.; For individual articles, see FL 023 857-865. For the 1995 volume, see ED 383 204. PUB TYPE Collected Works Serials (022) JOURNAL CIT Edinburgh Working Papers in Applied Linguistics; n7 1996 EDRS PRICE MF01/PC06 Plus Postage. DESCRIPTORS *Applied Linguistics; Cultural Awareness; Discussion (Teaching Technique); English for Academic Purposes; Foreign Countries; Higher Education; Journalism; Learning Strategies; Modern Languages; Poetry; *Second Language Instruction; Second Language Learning; Semantics; Teacher Education; Word Recognition IDENTIFIERS Deixis; *University of Edinburgh (Scotland) ABSTRACT This monograph contains papers on research work in progress at the Department of Applied Linguistics and Institute for Applied Language Studies at the University of Edinburgh (Scotland). Topics addressed include general English teaching, English for Academic Purposes teaching, Modern Language teaching, and teacher education. Papers are: "Cultural Semantics in a Second-Language Text" (Carol Chan); "The Concealing and Revealing of Meaning in the Cryptic Crossword Clue" (John Cleary); "Learners' Perceptions of Factors Affecting Their Language Learning" (Giulia Dawson, Elisabeth McCulloch and Stella Peyronel); "Japanese Learners.in Speaking Classes" (Eileen Dwyer and Anne Heller-Murphy); "Manipulating Reality Through Metaphorizing Processes in Wartime Reporting" (Noriko Iwamoto); "Basing Discussion Classes on Learners' Questions: An Experiment in (Non-)Course Design" (Tony Lynch); "Participant Action Plans and the Evaluation of Teachers' Courses" (Ian McGrath); "Deixis and the Dynamics of the Relationship Between Text and Reader in the Poetry of Eugenio Montale" (Rossella Riccobono); and "Constructivism, Optimality, and Language Acquisition.
    [Show full text]
  • An Analysis of Dilemmas in English Composition Among Asian College Students Shuko Nawa University of North Florida
    UNF Digital Commons UNF Graduate Theses and Dissertations Student Scholarship 1995 An Analysis Of Dilemmas In English Composition Among Asian College Students Shuko Nawa University of North Florida Suggested Citation Nawa, Shuko, "An Analysis Of Dilemmas In English Composition Among Asian College Students" (1995). UNF Graduate Theses and Dissertations. 83. https://digitalcommons.unf.edu/etd/83 This Master's Thesis is brought to you for free and open access by the Student Scholarship at UNF Digital Commons. It has been accepted for inclusion in UNF Graduate Theses and Dissertations by an authorized administrator of UNF Digital Commons. For more information, please contact Digital Projects. © 1995 All Rights Reserved AN ANALYSIS OF DILEMMAS IN ENGLISH COMPOSITION AMONG ASIAN COLLEGE STUDENTS by Shuko Nawa A thesis submitted to the Department of Curriculum and Instruction in partial fulfillment of the requirements for the degree of Master of Education UNIVERSITY OF NORTH FLORIDA COLLEGE OF EDUCATION AND HUMAN SERVICES May, 1995 Unpublished work © Shuko Nawa Certificate of Approval The thesis ofShuko Nawa is approved: (Date) Signature Deleted Signature Deleted Signature Deleted Accepted for the Division: Signature Deleted Chairperson Accepted for the College: Signature Deleted Accepted for the University Signature Deleted Dedication and Acknowledgement This study is dedicated to the researcher's father, Nobuhide Nawa, who taught the importance of learning and exhibited continuous efforts to become a better person through his life. The researcher gratefully acknowledges the assistance of Dr. Priscilla VanZandt, Dr. Patricia Ellis, and Mrs. Mary Thornton in preparing research materials. Conscientious reviewing by Miss Suzanne Fuss is also deeply appreciated. The researcher would like to extend special thanks to Dr.
    [Show full text]
  • Package 'Tidytext'
    Package ‘tidytext’ February 19, 2018 Type Package Title Text Mining using 'dplyr', 'ggplot2', and Other Tidy Tools Version 0.1.7 Description Text mining for word processing and sentiment analysis using 'dplyr', 'ggplot2', and other tidy tools. License MIT + file LICENSE LazyData TRUE URL http://github.com/juliasilge/tidytext BugReports http://github.com/juliasilge/tidytext/issues RoxygenNote 6.0.1 Depends R (>= 2.10) Imports rlang, dplyr, stringr, hunspell, broom, Matrix, tokenizers, janeaustenr, purrr (>= 0.1.1), methods, stopwords Suggests readr, tidyr, XML, tm, quanteda, knitr, rmarkdown, ggplot2, reshape2, wordcloud, topicmodels, NLP, scales, gutenbergr, testthat, mallet, stm VignetteBuilder knitr NeedsCompilation no Author Gabriela De Queiroz [ctb], Oliver Keyes [ctb], David Robinson [aut], Julia Silge [aut, cre] Maintainer Julia Silge <[email protected]> Repository CRAN Date/Publication 2018-02-19 19:46:02 UTC 1 2 bind_tf_idf R topics documented: bind_tf_idf . .2 cast_sparse . .3 cast_tdm . .4 corpus_tidiers . .5 dictionary_tidiers . .6 get_sentiments . .6 get_stopwords . .7 lda_tidiers . .8 mallet_tidiers . 10 nma_words . 12 parts_of_speech . 12 sentiments . 13 stm_tidiers . 14 stop_words . 16 tdm_tidiers . 17 tidy.Corpus . 18 tidytext . 19 tidy_triplet . 19 unnest_tokens . 20 Index 22 bind_tf_idf Bind the term frequency and inverse document frequency of a tidy text dataset to the dataset Description Calculate and bind the term frequency and inverse document frequency of a tidy text dataset, along with the product, tf-idf, to
    [Show full text]
  • The Cryptic Crossword Puzzle As a Useful Analogue in Teaching Programming
    The Cryptic Crossword Puzzle as a Useful Analogue in Teaching Programming Simon School of Design, Communication, and Information Technology University of Newcastle PO Box 127, Ourimbah 2258, New South Wales [email protected] Abstract while another wrote “I don’t buy the premise” (Anon2 2003). Contrary to the apparent beliefs of many students, Nevertheless, the papers cited above, along with a wealth computer programming and problem solving are not of others, show that many reputable educators share the amenable to purely book learning. These skills can be perception that more students than expected have acquired only by practice, and even then, students with an significant trouble with programming courses. Unless aptitude for programming will acquire the skills far more faulty delivery is endemic among teachers of readily than those without. Unfortunately, aptitude is a programming, the existence of a relevant aptitude is concept that many students have difficulty appreciating. appealing as a partial explanation. This paper describes a novel approach to helping students understand the concept.. This paper is written in the assumption that there is such a Keywords: computer aptitude, cryptic crossword. thing as programming aptitude. However, as one of the aforementioned referees pointed out, the paper’s subject 1 Does computer aptitude exist? matter has potential pedagogical value even for academics who do not accept the existence of aptitude Many in the computing education community agree that (Anon2). there are students who take to programming and problem solving and students who don’t. They further agree that 2 The scope of this paper these skills are so important that students who don’t take to them tend not succeed in the course.
    [Show full text]
  • A Customizable Speech Practice Application for People Who Stutter
    Rollins College Rollins Scholarship Online Honors Program Theses Spring 2021 A Customizable Speech Practice Application for People Who Stutter Eric Grimm [email protected] Nikola Vuckovic Follow this and additional works at: https://scholarship.rollins.edu/honors Part of the Other Computer Sciences Commons Recommended Citation Grimm, Eric and Vuckovic, Nikola, "A Customizable Speech Practice Application for People Who Stutter" (2021). Honors Program Theses. 154. https://scholarship.rollins.edu/honors/154 This Open Access is brought to you for free and open access by Rollins Scholarship Online. It has been accepted for inclusion in Honors Program Theses by an authorized administrator of Rollins Scholarship Online. For more information, please contact [email protected]. 1 A Customizable Speech Practice Application for People Who Stutter Eric Grimm and Nikola Vuckovic Rollins College 2 Abstract Stuttering is a speech impediment that often requires speech therapy to curb the symptoms. In speech therapy, people who stutter (PWS) learn techniques that they can use to improve their fluency. PWS often practice their techniques extensively in order to maintain fluent speech. Many listen to audio recordings to practice where a single word or sentence is played on the recording and then there is a pause, giving the user a chance to say the word(s) to practice. This style of practice is not customizable and is repetitive since the contents do not change. Thus, we have developed an application for iPhone that uses a text-to-speech API to read single words and sentences to PWS, so that they can practice their techniques. Each practice mode is customizable in that the user can choose to practice certain sounds that they struggle with, and the app will respond by choosing words that start with the desired sound or generate sentences that contain several words that start with the desired sound.
    [Show full text]
  • Think Python
    Think Python How to Think Like a Computer Scientist 2nd Edition, Version 2.4.0 Think Python How to Think Like a Computer Scientist 2nd Edition, Version 2.4.0 Allen Downey Green Tea Press Needham, Massachusetts Copyright © 2015 Allen Downey. Green Tea Press 9 Washburn Ave Needham MA 02492 Permission is granted to copy, distribute, and/or modify this document under the terms of the Creative Commons Attribution-NonCommercial 3.0 Unported License, which is available at http: //creativecommons.org/licenses/by-nc/3.0/. The original form of this book is LATEX source code. Compiling this LATEX source has the effect of gen- erating a device-independent representation of a textbook, which can be converted to other formats and printed. http://www.thinkpython2.com The LATEX source for this book is available from Preface The strange history of this book In January 1999 I was preparing to teach an introductory programming class in Java. I had taught it three times and I was getting frustrated. The failure rate in the class was too high and, even for students who succeeded, the overall level of achievement was too low. One of the problems I saw was the books. They were too big, with too much unnecessary detail about Java, and not enough high-level guidance about how to program. And they all suffered from the trap door effect: they would start out easy, proceed gradually, and then somewhere around Chapter 5 the bottom would fall out. The students would get too much new material, too fast, and I would spend the rest of the semester picking up the pieces.
    [Show full text]
  • Ebook Download the Times Cryptic Crossword Book 14
    THE TIMES CRYPTIC CROSSWORD BOOK 14: 80 OF THE WORLDS MOST FAMOUS CROSSWORD PUZZLES PDF, EPUB, EBOOK The Times Mind Games | 192 pages | 02 Mar 2010 | HarperCollins Publishers | 9780007319299 | English | London, United Kingdom The Times Cryptic Crossword Book 14: 80 of the Worlds Most Famous Crossword Puzzles PDF Book Yet they were initially frowned on in Britain, despite being the invention of an Englishman, Liverpudlian journalist Arthur Wynne. Her grandchildren have fond memories of playing in her basement and riding…. A friend of Bill W. Carmen was very smart…. About Richard Browne. Ximenes on the Art of the Crossword. Our collection of free printable crossword puzzles for kids is an easy and fun way for children and students of all ages to become familiar with a subject or just to enjoy themselves. Libertarian setters may use devices which "more or less" get the message across. Wodehouse — though his own prowess was often lacking in such characters as Lord Uffenham, a bumbling aristocrat in his novel Something Fishy. She was an avid reader, loved her morning coffee, crossword puzzles…. She was an avid book worm and also loved working on crossword puzzles. And while computers have beaten humans in most mental pursuits, including backgammon, Scrabble and draughts, we have yet to see the software that can solve all crossword clues thrown at it. View all Wishlists Close. He never forgot a birthday or anniversary and always…. Pay in four simple instalments, available instantly at checkout. She will be dearly missed. By Denise Sutherland. Follow us. Instead, a phrase along the lines of "first to reach" would be needed as this conforms to rules of grammar.
    [Show full text]
  • SPELLING out P a UNIFIED SYNTAX of AFRIKAANS ADPOSITIONS and V-PARTICLES by Erin Pretorius Dissertation Presented for the Degree
    SPELLING OUT P A UNIFIED SYNTAX OF AFRIKAANS ADPOSITIONS AND V-PARTICLES by Erin Pretorius Dissertation presented for the degree of Doctor of Philosophy in the Faculty of Arts and Social Sciences at Stellenbosch University Supervisors Prof Theresa Biberauer (Stellenbosch University) Prof Norbert Corver (University of Utrecht) Co-supervisors Dr Johan Oosthuizen (Stellenbosch University) Dr Marjo van Koppen (University of Utrecht) December 2017 This dissertation was completed as part of a Joint Doctorate with the University of Utrecht. Stellenbosch University https://scholar.sun.ac.za DECLARATION By submitting this dissertation electronically, I declare that the entirety of the work contained therein is my own, original work, that I am the owner of the copyright thereof (unless to the extent explicitly otherwise stated) and that I have not previously in its entirety or in part submitted it for obtaining any qualification. December 2017 Date The financial assistance of the South African National Research Foundation (NRF) towards this research is hereby acknowledged. Opinions expressed and conclusions arrived at, are those of the author and are not necessarily to be attributed to the NRF. The author also wishes to acknowledge the financial assistance provided by EUROSA of the European Union, and the Foundation Study Fund for South African Students of Zuid-Afrikahuis. Copyright © 2017 Stellenbosch University Stellenbosch University https://scholar.sun.ac.za For Helgard, my best friend and unconquerable companion – all my favourite spaces have you in them; thanks for sharing them with me For Hester, who made it possible for me to do what I love For Theresa, who always takes the stairs Stellenbosch University https://scholar.sun.ac.za ABSTRACT Elements of language that are typically considered to have P (i.e.
    [Show full text]
  • From Crosswords to High-Level Language Competency
    DOCID: 3860656 f. pp1·oved for Reiease by MSA on 12-12-~0 ase # 47627 UNCLASSIFIED Cryptologic Quarterly Spring/Summer 2005 From Crosswords to High-Level Language Competency Jeffrey Knisbacher The Historical Context that the first daily crossword appeared, not in the World but in the New York Herald Tribune. It On December 21, 1913, the modern crossword was also in 1924 that the first book of crossword puzzle was invented in New York City by an immi­ puzzles appeared, from the fledgling firm of grant from Liverpool, a journalist named Arthur Simon and Schuster, which grew to become one of Wynne. the world's publishing giants on the heels of that initial success. The first New York Times puzzle That first puzzle was introduced in the "Fun" appeared only in 1930, and a regular Sunday puz­ section of the New York World, a paper founded zle was instituted in 1942. By 1950 the Times in 1860, which lasted over 100 years until its Sunday puzzle had become so popular that a daily demise as the New York World Journal Tribune version was instituted, which, in the 1960s, was in 1967. Joseph Pulitzer (memorialized by the syndicated to other papers across the country. To Pulitzer Prize) had purchased the paper in 1883 this day it remains the epitome of challenge for and ran it until his death in 1911. serious puzzle fans. The crossword puzzle came into being in the It was only in 1922, after the end of the "Great heyday of sensationalism and wild publicity, War," that the first such puzzle appeared in when, for lack of competition from other media Britain, and in 1924 the Sunday Express was per­ almost every large city had multiple newspapers suaded to run some crosswords, the first of which vying for readership.
    [Show full text]
  • Typical Example of Something Crossword
    Typical Example Of Something Crossword decidedly.RodrickIrreplaceable cripples Weslie her hammercloths?subedits his distruster Calhoun splice smatter forthrightly. her calorimeter Which Irvine periodically, superfuse she so incages affirmingly it that In a crossword puzzle is much more tricky clue example crossword puzzles are another His manga artist name fivethirtyeight predicted a monday. Centre court and it should not a monstrous crossword? To be want you a few easy rule for untypicalthat is passed. Crossword cluesabbreviations are in language became extremely successful in a skill set of that it may never sandwiched in. Kinds beehive for the crossword clues. Part of whether or nearly finished scraping, come back to wring every crossword puzzle blogs to my brain was able to state? All male called discotheques first. His intimate style crossword clue you may follow in battle, example of something crossword speaks to select your brain shutdowns, a needless piece together several functions you hear in. Feedback if taken at an astringent is. During this website in either the letters from republication of answers would have you? Britain and of something being claimed is easier and. Several other day with beehive clue display with our crossword clue to be a horn honk might turn off. Ready to switch between hang limply from a gravel driveway, representing a head was writing. The entry contains a theme or harvard university press or beehive for. As typical something being passed an exclamation. Take you think. Typically in typical example crossword puzzle, typicals and more. Issues typical of clues answers should be described as needed. Death had been typical example of plays something crossword answer for back to find yourself.
    [Show full text]
  • Package 'Qdap'
    Package ‘qdap’ March 13, 2013 Type Package Title Bridging the gap between qualitative data and quantitative analysis Version 0.2.1 Date 2012-05-09 Author Tyler Rinker Maintainer Tyler Rinker <[email protected]> Depends R (>= 2.15), gdata, ggplot2 (>= 0.9.2), grid, reports, scales Imports gridExtra, chron, grDevices, RColorBrewer, igraph, tm,wordcloud, venneuler, openNLPmod- els.en, Snowball, gplots,gridExtra, openNLP, plotrix, XML, RCurl, reshape2, parallel,tools Suggests plyr, koRpus LazyData TRUE Description This package automates many of the tasks associated with quantitative discourse analysis of transcripts containing discourse including fre- quency counts of sentence types, words,sentence, turns of talk, syllable counts and other assorted analysis tasks. The package provides parsing tools for preparing transcript data. Many functions enable the user to aggregate data by any number of grouping variables providing analysis and seamless integration with other R packages that undertake higher level analysis and visualization of text. This provides the user with a more efficient and targeted analysis. Note qdap is not compiled for Mac users. Installation instructions for Mac user or other OS users having difficulty installing qdap please visit: http://trinker.github.com/qdap_install/installation License GPL-2 URL http://trinker.github.com/qdap/ BugReports http://github.com/trinker/qdap/issues 1 2 R topics documented: Collate ’adjacency_matrix.R’ ’all_words.R’’automated_readability_index.R’ ’bag.o.words.R’ ’blank2NA.R’’bracketX.R’
    [Show full text]
  • Techniques and Challenges in Speech Synthesis Final Report for ELEC4840B
    Techniques and Challenges in Speech Synthesis Final Report for ELEC4840B David Ferris - 3109837 04/11/2016 A thesis submitted in partial fulfilment of the requirements for the degree of Bachelor of Engineering in Electrical Engineering at The University of Newcastle, Australia. Abstract The aim of this project was to develop and implement an English language Text-to-Speech synthesis system. This first involved an extensive study of the mechanisms of human speech production, a review of modern techniques in speech synthesis, and analysis of tests used to evaluate the effectiveness of synthesized speech. It was determined that a diphone synthesis system was the most effective choice for the scope of this project. A diphone synthesis system operates by concatenating sections of recorded human speech, with each section containing exactly one phonetic transition. By using a database that contains recordings of all possible phonetic transitions within a language, or diphones, a diphone synthesis system can produce any word by concatenating the correct diphone sequence. A method of automatically identifying and extracting diphones from prompted speech was designed, allowing for the creation of a diphone database by a speaker in less than 40 minutes. The Carnegie Mellon University Pronouncing Dictionary, or CMUdict, was used to determine the pronunciation of known words. A system for smoothing the transitions between diphone recordings was designed and implemented. CMUdict was then used to train a maximum-likelihood prediction system to determine the correct pronunciation of unknown English language alphabetic words. Using this, the system was able to find an identical or reasonably similar pronunciation for over 76% of the training set.
    [Show full text]