Automatic Syllabification in European Languages: a Comparison of Data-Driven Methods

AUTOMATIC SYLLABIFICATION IN EUROPEAN LANGUAGES: A COMPARISON OF DATA-DRIVEN METHODS

by Connie R. Adsett

Submitted in partial fulfillment of the requirements for the degree of Master of Computer Science at Dalhousie University, Halifax, Nova Scotia, June 2008

© Copyright by Connie R. Adsett, 2008

DALHOUSIE UNIVERSITY
FACULTY OF COMPUTER SCIENCE

The undersigned hereby certify that they have read and recommend to the Faculty of Graduate Studies for acceptance a thesis entitled "AUTOMATIC SYLLABIFICATION IN EUROPEAN LANGUAGES: A COMPARISON OF DATA-DRIVEN METHODS" by Connie R. Adsett in partial fulfillment of the requirements for the degree of Master of Computer Science.

Dated: June 24, 2008
Supervisors: Dr. Yannick Marchand, Dr. Vlado Keselj
Reader: Dr. Qigang Gao

DALHOUSIE UNIVERSITY

DATE: June 24, 2008
AUTHOR: Connie R. Adsett
TITLE: AUTOMATIC SYLLABIFICATION IN EUROPEAN LANGUAGES: A COMPARISON OF DATA-DRIVEN METHODS
DEPARTMENT OR SCHOOL: Faculty of Computer Science
DEGREE: MCSc
CONVOCATION: October
YEAR: 2008

Permission is herewith granted to Dalhousie University to circulate and to have copied for non-commercial purposes, at its discretion, the above title upon the request of individuals or institutions.

Signature of Author

The author reserves other publication rights, and neither the thesis nor extensive extracts from it may be printed or otherwise reproduced without the author's written permission.

The author attests that permission has been obtained for the use of any copyrighted material appearing in the thesis (other than brief excerpts requiring only proper acknowledgement in scholarly writing) and that all such use is clearly acknowledged.

"Syllables govern the world."
- John Selden (1584–1654)

Table of Contents

List of Tables
List of Figures
Abstract
List of Abbreviations and Symbols Used
Acknowledgements
Chapter 1 Introduction
  1.1 Goals and Objectives
  1.2 Main Contributions
  1.3 Thesis Outline
Chapter 2 Motivation and Related Work
  2.1 Why Study Automatic Syllabification?
    2.1.1 Natural Language Processing
    2.1.2 Modeling Human Language Processing
    2.1.3 Comparison of Syllabic Complexity
  2.2 Data-driven or Rule-based Automatic Syllabification?
    2.2.1 English
    2.2.2 Italian
  2.3 Data-driven Syllabification Approaches
  2.4 Problem Definition
Chapter 3 Languages and Lexicons Used
  3.1 Basque
  3.2 Dutch
  3.3 English
  3.4 French
  3.5 Frisian
  3.6 German
  3.7 Italian
  3.8 Norwegian
  3.9 Spanish
  3.10 General Lexicon Information
  3.11 Common Lexicon Creation
Chapter 4 Algorithms
  4.1 Syllabification as a Classification Problem
  4.2 Instance-Based Learning
    4.2.1 IB1
    4.2.2 Look Up Procedure
    4.2.3 Example
  4.3 Liang's Hyphenation Algorithm
  4.4 Syllabification by Analogy
  4.5 Assessing Algorithm Performance
Chapter 5 Results and Discussion
  5.1 Syllabification Benchmark
    5.1.1 IB1 Results
    5.1.2 Liang Results
    5.1.3 Look-up Procedure Results
    5.1.4 Syllabification by Analogy Results
    5.1.5 Comparison of All Algorithms
  5.2 Spelling versus Pronunciation
  5.3 Cross-language Study
Chapter 6 Conclusion
References

List of Tables

Table 3.1 Character and entry information for each lexicon.
Table 3.2 The alphabets used for each lexicon.
Table 3.3 The maximum and average word and syllable lengths for each lexicon.
Table 4.1 The IB1 instance base entries for the word 'able'.
Table 4.2 The weight vectors used in the Look Up Procedure as used by Weijters (1991); Marchand, Adsett, and Damper (in press); Adsett and Marchand (2007). The juncture to be classified lies between the letter at position −1 and the one at +1.
Table 4.3 The instance base resulting from the storage of the words 'syllable', 'able', and 'available' with feature weights for both the Look Up Procedure and IB1-IG.
Table 4.4 The results of the calculations necessary to determine the weight of feature −3.
Table 4.5 The distances between each stored instance and those from 'table' using both the Look Up Procedure and IB1-IG weights.
Table 4.6 The best matches and corresponding classifications for each instance from 'table' according to the Look Up Procedure and IB1-IG weights.
Table 4.7 Potential patterns as processed at level 1 in Liang's algorithm using the words 'syllable', 'table', 'tabulate', and 'able'.
Table 4.8 Potential patterns as processed at level 2 in Liang's algorithm using the words 'syllable', 'table', 'tabulate', and 'able'.
Table 4.9 The parameters used to run Liang's algorithm. The abbreviations g, b, and t are used to represent good weight, bad weight, and threshold, respectively.
Table 4.10 The matches found in the lexicon for the substrings of length six from the word 'table'.
Table 4.11 Sample values for the paths from Figure 4.2 using each of the five scoring strategies for SbA.
Table 4.12 The rankings and points for each candidate according to the scoring strategy results reported in Table 4.11.
Table 4.13 Example of juncture accuracy evaluation for the word 'table'.
Table 5.1 The average IB1-IG word accuracy results for each left and right context size.
Table 5.2 The word accuracy results obtained using Liang's algorithm for each lexicon and parameter set.
Table 5.3 The juncture accuracy results for each lexicon using version 1 of the parameter sets for Liang's algorithm.
Table 5.4 All combinations of the five scoring strategies used to test the Syllabification by Analogy algorithm.
Table 5.5 Summary of the algorithm results (number of lexicons in which each algorithm had the best score, minimum word accuracy score, standard deviation of word accuracies over all lexicons) for both Full and Common lexicons.
Table 5.6 Comparison of performance between spelling and pronunciation domain mean results and the results of χ² tests for significance. The '*' and '**' indicate that the results are statistically significant with p < 0.05 and p < 0.01, respectively.
Table 5.7 Spelling and pronunciation character set sizes for each language, ranked from the greatest difference between the two to the least.
Table 5.8 The rank order of the syllabic complexity of languages according to the mean word accuracy in the spelling and pronunciation domains.
Table 5.9 Languages ordered from highest to lowest frequency of CV syllables and from lowest to highest frequency of closed syllables (according to Frota and Vigário (2001)).

List of Figures

Figure 2.1 Frequency of CV syllables in Dutch (Frota & Vigário, 2001), English (Dauer, 1983), European Portuguese (EP) (Frota & Vigário, 2001), French (Laks, 1995), Italian (Bortolini, 1976), and Spanish (Dauer, 1983).
Figure 3.1 The relationships between the nine languages studied with respect to the Indo-European language family.
Figure 3.2 The word length distribution for the spelling domain Common lexicons.
Figure 3.3 The word length distribution for the pronunciation domain Common lexicons.
Figure 4.1 A subset of the syllabification lattice for 'table' generated using the words 'syllable', 'able', 'available', and 'tabular'.
Figure 4.2 The possible shortest paths for the substring 'a?b?l' from 'table', were the arc from 'a' to 'l' not to exist.
Figure 5.1 The word accuracies for each lexicon with left and right contexts of six. The first letter of each lexicon label denotes the language and the second represents the domain (French and Frisian are distinguished by using 'Fc' and 'Fs', respectively).
Figure 5.2 The normalized IB1 weights generated for the German spelling domain lexicon using the Information Gain (IG), Gain Ratio (GR), and χ² equations.
Figure 5.3 Average pattern lengths across lexicons for each parameter setting of Liang's algorithm.
Figure 5.4 The average number of patterns generated for each parameter setting of Liang's algorithm (with standard error).
Figure 5.5 The percentage of the patterns generated at each level for each parameter setting of Liang's algorithm.
Figure 5.6 The average word accuracy for each weight version of the Look Up Procedure.
Figure 5.7 The word accuracy of version 10 of the Look Up Procedure weights for each lexicon. The first letter of each lexicon label denotes the language and the second represents the domain (French and Frisian are distinguished by using 'Fc' and 'Fs', respectively).
Figure 5.8 The version 10 weights for the Look Up Procedure.
Figure 5.9 The mean word accuracies (with standard error) across all lexicons for each strategy combination of Syllabification by Analogy.
Figure 5.10 The number of lexicons where each scoring strategy combination achieves word and juncture accuracy in the top three for the Syllabification by Analogy algorithm.
Figure 5.11 Comparison of algorithm word accuracy results over all the Full lexicons. The first letter of each lexicon label represents the language and the second denotes the domain.
Figure 5.12 The difference between the pronunciation and spelling domain mean word accuracies on those languages with lexicons in both domains.
Figure 5.13 Each language's ranking in the spelling and pronunciation domains and the regression line given by the Spearman correlation statistical test.