Syllable Identification


The University of Birmingham
School of Computer Science
MSc in Advanced Computer Science
Summer Project

Syllable Identification

Norshuhani Zamin
Supervisor: Dr. W H Edmondson
September 2004

Abstract

Syllabification is a linguistic problem, and developing computer software to predict syllable boundaries is a challenging task. In practice, it is easier to determine syllable boundaries manually, especially in a syllabic spelling system, because we know the linguistic elements of the language. Identifying syllable boundaries for English is a daunting process because English uses an alphabetic spelling system. To write such software, it is traditionally assumed that various sources of linguistic knowledge should be incorporated in order to convert words into their syllable structure with reasonable accuracy. This linguistic knowledge is important for defining the graphotactic and phonetic rules.

The purpose of this project has been to investigate the problem of English syllabification and to present two different approaches to the automatic detection of syllable boundaries. The first approach syllabifies a text from its graphemes or symbols, while the second approach syllabifies a text from its sounds. It was found that much existing research on syllabification adopts the second approach. Although different researchers propose different knowledge structures, most of them use the typical architecture for grapheme-to-phoneme conversion, whereas going from text to grapheme or symbol is a new technique.

In this project, I demonstrate the use of hand-written rules for English syllabification and knowledge structures trained on both approaches, and compare the performance and accuracy of these approaches. The evaluation shows that going from text to symbol is easier and performs better at finding syllable boundaries than going from text to sound. Recommendations for future projects of this nature are made.
Keywords

Syllable; syllabification; maximum onset principle; phonotactic; graphotactic; diagraph rule; silent rules; orthography; syllabic consonant; consonant clusters; segmentation; constraints.

Acknowledgement

After the long period spent completing this master's degree thesis, I would like to express my sincere gratitude to the following people, who contributed in some way to this thesis.

Dr. William Edmondson, my supervisor, for the incredible amount of patience he has had with me since he first knew me. He was the one who inspired me to work on natural language processing, which I had never considered before. It was an absolute pleasure to have him as my supervisor. Thank you for the many discussions, the motivation and the wise words. I owe him a great deal of gratitude and I am very glad to have come to know him.

Dr. Peter Coxhead, my second supervisor, who took over the supervision when Dr. William Edmondson was away on sabbatical. I learned many things from him and he was always very kind to me. As the Academic Manager in the school he is always busy, but he was always available when I needed his advice. I am grateful for his invaluable support and excellent guidance.

Dr. Ela Claridge, my Academic Advisor, for providing academic and support services. Thank you for your advice while monitoring my academic progress.

Last but not least, I am very grateful to my husband, Azmi, for his love and patience during my period of study. One of the best experiences that we lived through in this period was the birth of our son Anwar Aliff, who added a new and joyful dimension to our life.

Contents

Abstract and keywords
Acknowledgement
Figures
Tables
1 Introduction
  1.1 Background
  1.2 Objective
  1.3 Organization of the studies
  1.4 Scope and limitation
  1.5 Research methodology
2 Literature Review
3 An Overview of English Spelling
  3.1 Phonology and orthography
  3.2 Consonants and vowels
  3.3 Syllable structure
  3.4 Problem to overcome
4 Methods
  4.1 Approaches
    4.1.1 Text → Symbol → Syllable
    4.1.2 Text → Sound → Syllable
  4.2 Data collection and analysis
  4.3 Rules construction
  4.4 Syllabification with Maximum Onset Principle
5 Implementations
  5.1 Memory
  5.2 Data structures
  5.3 Modularity
  5.4 Input / Output
  5.5 Features
  5.6 Tools
6 Performance and Justifications
7 Conclusions and Future Work
References
Bibliographies
Appendices
  A Summer project declaration
  B IPA symbols with corresponding ASCII
  C English loanwords
  D IPA full chart
  E System Requirements and User Guide

Figures

3.1 Poem illustrating difficulties of English spelling and sounds
3.2 Great Vowel Shift process
3.3 The consonants of English
3.4 The vowels of English
3.5 Diagram of vocal organs and articulatory regions
3.6 Conventional Syllable Structure
3.7 Example of William and Zhang's approach on syllable structure
4.1 Syllabification flow chart
4.2 Maximum Onset Principle process
5.1 User interface
5.2 Sample output for word 'signification'
5.3 Sample output for word 'surreptitious'
5.4 Sample output for word 'bedridden'
5.5 Sample output for word 'representation'
5.6 Sample output for word 'antidisestablishmentarianism'
5.7 Sample output for text 'access accurate occident accompany'
6.1 Text → Symbol → Sound with syllable boundaries

Tables

3.1 Name of vocal organs and articulatory regions
3.2 English open syllables
3.3 English closed syllables
4.1 Example of Text → Symbol → Syllable approach
4.2 Example of Text → Sound → Syllable approach
4.3 Graphotactic rules
4.4 List of permissible onset sequences for symbol
4.5 Pattern of consonant clusters for symbol
4.6 Diagraph rules
4.7 Vowel rules
4.8 Consonant rules
4.9 Silent rules