Syllable Identification
The University of Birmingham
School of Computer Science
MSc in Advanced Computer Science
Summer Project

Syllable Identification

Norshuhani Zamin
Supervisor: Dr. W H Edmondson
September 2004

Abstract

Syllabification is a linguistic problem, and developing computer software to predict syllable boundaries is a challenging task. In practice, it is easier to determine syllable boundaries manually, especially in a syllabic spelling system, because we know the linguistic elements of the language. Identifying syllable boundaries for English is a daunting process because English uses an alphabetic spelling system. In writing such software, it is traditionally assumed that various sources of linguistic knowledge should be incorporated in order to convert words into their syllable structure with reasonable accuracy. This linguistic knowledge is needed to define the graphotactic and phonetic rules.

The purpose of this project has been to investigate the problem of English syllabification and to present two different approaches to the automatic detection of syllable boundaries. The first approach syllabifies a text from its graphemes or symbols, while the second syllabifies a text from its sounds. It was found that much existing research on syllabification adopts the second approach. Although different researchers propose different knowledge structures, most of them use the typical architecture for grapheme-to-phoneme conversion, whereas going from text to grapheme or symbol is a new technique. In this project, I demonstrate the use of hand-written rules for English syllabification and knowledge structures trained on both approaches, and compare the performance and accuracy of these approaches. The evaluation shows that going from text to symbol is easier and performs better at finding syllable boundaries than going from text to sound. Recommendations for future projects of this nature are made.

Keywords

Syllable; syllabification; maximum onset principle; phonotactic; graphotactic; digraph rule; silent rules; orthography; syllabic consonant; consonant clusters; segmentation; constraints.

Acknowledgement

Now that the long period of work on this master's degree thesis is complete, I would like to express my sincere gratitude to the following people, who each contributed in some way to this thesis.

Dr. William Edmondson, my supervisor, for the incredible amount of patience he has had with me since we first met. He was the one who inspired me to work on natural language processing, which I had never considered before. It was an absolute pleasure to have him as my supervisor. Thank you for the many discussions, the motivation and the wise words. I owe him a great deal of gratitude and I am very glad to have come to know him.

Dr. Peter Coxhead, my second supervisor, who took over the supervision when Dr. William Edmondson was away on sabbatical. I learned many things from him and he was always very kind to me. As the Academic Manager in the school, he was always busy but was always available when I needed his advice. I am grateful for his invaluable support and excellent guidance.

Dr. Ela Claridge, my Academic Advisor, for providing academic and support services. Thank you for your advice while monitoring my academic progress.

Last but not least, I am very grateful to my husband, Azmi, for his love and patience during my period of study. One of the best experiences that we lived through in this period was the birth of our son, Anwar Aliff, who brought an additional and joyful dimension to our life.

Contents

Abstract and keywords
Acknowledgement
Figures
Tables
1 Introduction
    1.1 Background
    1.2 Objective
    1.3 Organization of the studies
    1.4 Scope and limitation
    1.5 Research methodology
2 Literature Review
3 An Overview of English Spelling
    3.1 Phonology and orthography
    3.2 Consonants and vowels
    3.3 Syllable structure
    3.4 Problem to overcome
4 Methods
    4.1 Approaches
        4.1.1 Text → Symbol → Syllable
        4.1.2 Text → Sound → Syllable
    4.2 Data collection and analysis
    4.3 Rules construction
    4.4 Syllabification with Maximum Onset Principle
5 Implementations
    5.1 Memory
    5.2 Data structures
    5.3 Modularity
    5.4 Input / Output
    5.5 Features
    5.6 Tools
6 Performance and Justifications
7 Conclusions and Future Work
References
Bibliographies
Appendices
    A Summer project declaration
    B IPA symbols with corresponding ASCII
    C English loanwords
    D IPA full chart
    E System Requirements and User Guide

Figures

3.1 Poem illustrating difficulties of English spelling and sounds
3.2 Great Vowel Shift process
3.3 The consonants of English
3.4 The vowels of English
3.5 Diagram of vocal organs and articulatory regions
3.6 Conventional Syllable Structure
3.7 Example of William and Zhang's approach on syllable structure
4.1 Syllabification flow chart
4.2 Maximum Onset Principle process
5.1 User interface
5.2 Sample output for word 'signification'
5.3 Sample output for word 'surreptitious'
5.4 Sample output for word 'bedridden'
5.5 Sample output for word 'representation'
5.6 Sample output for word 'antidisestablishmentarianism'
5.7 Sample output for text 'access accurate occident accompany'
6.1 Text → Symbol → Sound with syllable boundaries

Tables

3.1 Name of vocal organs and articulatory regions
3.2 English open syllables
3.3 English closed syllables
4.1 Example of Text → Symbol → Syllable approach
4.2 Example of Text → Sound → Syllable approach
4.3 Graphotactic rules
4.4 List of permissible onset sequences for symbol
4.5 Pattern of consonant clusters for symbol
4.6 Digraph rules
4.7 Vowel rules
4.8 Consonant rules
4.9 Silent rules