An Indonesian Resource Grammar (INDRA) : and Its Application to a Treebank (JATI)
This document is downloaded from DR-NTU (https://dr.ntu.edu.sg), Nanyang Technological University, Singapore.

Moeljadi, D. (2018). An Indonesian resource grammar (INDRA) : and its application to a treebank (JATI). Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/82540 https://doi.org/10.32657/10220/46580

David Moeljadi
School of Humanities
A thesis submitted to the Nanyang Technological University in partial fulfilment of the requirement for the degree of Doctor of Philosophy
2018

Acknowledgement

Many people have guided, helped, and supported me during my four-year PhD candidature. It has been a period of intense learning for me in the field of computational linguistics, especially in grammar engineering. Without their guidance, support, and help, I might not have survived. Thus, I would like to reflect on them and express my gratitude.

I would like to thank my supervisor, Dr. Francis Bond, who introduced me to the field of computational linguistics. Thank you very much for your guidance and help, as well as for all of the opportunities I was given to conduct my research, present it, and publish several papers. Thank you for your inspiration and advice to build INDRA and JATI. I also thank my co-supervisor, Dr. I Wayan Arka, for supporting me in writing my dissertation and replying to my e-mails. I would like to thank Dr. František Kratochvíl of my thesis advisory committee. Thank you for sharing your deep knowledge of Malay linguistics and your comments on my papers and dissertation drafts.
Thank you for giving me an opportunity to join your expedition in Alor and for starting a Malay meeting group at NTU.

The NTU Computational Linguistics Lab is a stimulating and friendly place to do research and work. I am grateful to Song Sanghoun for his technical help in the implementation, especially of the control and raising constructions in the early stage; to Michael Goodman for his help with GitHub and tools such as gTest; to Lê Tuấn Anh, who always has something to discuss and suggestions on how to tackle problems; to Luís Morgado da Costa, who organized a grammar engineering session and who always makes me laugh with his jokes; to Hannah Choi Yun Jung for her support and help (especially for Wordnet Bahasa); to Fan Zhenzhen, who is always willing to help; to Giulia Bonansinga, who lent a helping hand when I was in trouble; and to Wang Huizhen for sharing information about HPSG and MRS. I would also like to thank the Malay meeting members at NTU: Nomoto Hiroki, Atiqah, and Jojo. I thank the HSS (now SoH) staff: Nurazean Binte Ismail, Amutha Shanmugam, and Christina Seet Mei Lin for their kind help with NTU administration and paperwork.

My PhD life was also supported by the NTU library staff. I would like to express my sincere gratitude to the HSS library support staff: Rashidah Ismail, Raihana Abdul Wahid, and Tan Chuan Ko for allowing me to borrow the fourth edition of the KBBI paper dictionary for months, and to Wong Oi May, who helped me order the dictionary and other books such as reference grammars for Indonesian and Malay.

Beyond NTU, the wider circle of the DELPH-IN consortium was very generous. I owe special gratitude to Dan Flickinger, who always has time to discuss problems, especially during the Global Wordnet Conference at NTU in January 2018, when I was busy writing this dissertation. I would like to thank Emily Bender for her precious comments, especially on clitics and passive voice, and for organizing the VLAD sessions.
I am grateful to all of the following people for many different kinds of assistance in the field of (computational) linguistics. Thank you very much to the people at Badan Bahasa, especially to Ibu Dora Amalia, who gave me the precious experience of creating the KBBI database and let me take part in a bigger KBBI project; I also thank her for the permission to use a part of the fifth edition of KBBI data for my treebank project. Thank you to Pak Adi Budiwiyanto, Ibu Menuk Suharjo, Pak Azhari Dasman Darnis, Mbak Vita Luthfia, Mbak Meryna Afrila, Mbak Winda Luthfita, Mbak Ambiya Ikrami Adji, and other staff who welcomed me warmly to the big KBBI project. Thank you very much for inviting me to various talks at different seminars and meetings in Jakarta. I thank Ivan Lanin for improving the KBBI database I created and making it more efficient. I also thank Lim Lian Tze, who inspired me to write a paper about the KBBI Database, for many things: for her help in improving Wordnet Bahasa, replying to my e-mails and questions about LaTeX and Overleaf, etc.

My PhD journey was also supported by people in Singapore outside NTU and outside the field of (computational) linguistics. Thank you to Pastor Stephen Tong (唐崇荣), whose spirit and teaching encouraged me to learn and live the Word.
Thank you to the Reformed Evangelical Church Singapore (GRII Singapura) members: Ian Kamajaya, who helped me learn programming during my first years at NTU and lent a helping hand to make the online KBBI page; Randy Sugianto (Yuku), who helped me with KBBI data, regular expressions, and other small things related to programming, as well as with the KBBI Android application; Juan Intan Kanggrawan, who organized small group meetings and invited me to join them, through which I learned how science and faith are related; Franky Chioh, who gave me the book In The Beginning was The Word written by Vern Sheridan Poythress; and my OSG small group members: Ko Carlos Wiyono Kurniawan, who guides us patiently and lovingly, Steven Rusli, Ivan Edison, Harris, Frano Sumarauw, and Willy Halim. Thank you all for the prayers! I also thank my housemates in Singapore: I Made Riko, I Made Darmayuda (Kadek), Ratih Oktarini, and the little Bella, for supporting and helping me during my PhD life.

I would like to express my special gratitude to my family: my mother Rut Widjiutami, who loves me and prays for me every day, my uncle Subagio Budiharjo (CiKiang), who supported my big family (I still cannot believe that he passed away last December), and my cousins.

I am extremely grateful to the government of Singapore, through the Ministry of Education (MOE), and NTU, for full financial support in the form of an NTU scholarship which allowed me to work full-time on this dissertation for four years. My PhD project was supported in part by the Singapore Ministry of Education (MOE) Tier 2 grant That’s what you meant: a Rich Representation for Manipulation of Meaning (MOE ARC41/13) and the MOE Tier 2 grant Grammar Matrix Reloaded: Syntax and Semantics of Affectedness (MOE ARC21/13).
I am grateful to the HSS (now SoH), NTU for a travel grant which enabled me to attend several conferences: the 19th International Symposium on Malay/Indonesian Linguistics (ISMIL 19) in Jambi, Grammar Engineering Across Frameworks 2015 (GEAF 2015) in Beijing, the Eighth Global WordNet Conference (GWC 2016) in Bucharest, the 12th DELPH-IN Summit at Stanford University, and the 20th International Symposium on Malay/Indonesian Linguistics (ISMIL 20) in Melbourne. I am also grateful to Fuji Xerox for the internship opportunity on Indonesian sentiment analysis in Yokohama in February-March 2016 and for supporting me to present my research results.

Portions of this work have previously been presented at the following conferences: Grammar Engineering Across Frameworks 2015 (GEAF 2015) in Beijing, China on 30 July 2015; Joint 2016 Conference on Head-driven Phrase Structure Grammar and Lexical Functional Grammar (HeadLex16) in Warsaw, Poland on 26 July 2016; Chulalongkorn International Student Symposium on Southeast Asian Linguistics (Chula-ISSSEAL) in Bangkok, Thailand on 9 June 2017; The 11th International Conference of the Asian Association for Lexicography (ASIALEX 2017) in Guangzhou, China on 10 June 2017; The 4th Atma Jaya Conference on Corpus Studies (ConCorps 4) in Jakarta, Indonesia on 21 July 2017; The 25th International Conference on Head-Driven Phrase Structure Grammar (HPSG 2018) in Tokyo, Japan on 1 July 2018; and the International Symposium on Malay/Indonesian Linguistics (ISMIL), as well as the DELPH-IN Summit in 2015, 2016, 2017, and 2018. I thank all the anonymous reviewers for their comments and the participants for the discussion.

This dissertation is typeset with LaTeX, using Overleaf (www.overleaf.com). The grammar INDRA is on GitHub (https://github.com/davidmoeljadi/INDRA). I am grateful to all the people who developed GitHub and Overleaf. My laptop crashed twice when I was writing this dissertation.
I did not worry very much because my dissertation, as well as INDRA, can be accessed online. Again, I thank Lim Lian Tze and my supervisor, Francis, for replying to my e-mails and giving me useful suggestions for my dissertation, Overleaf, and LaTeX.

δωρεὰν ελάβετε, δωρεὰν δότε
“you received without paying; give without pay”
Matthew 10:8b (ESV)

Singapore, August 2018

Contents

Acknowledgement
List of Abbreviations and Conventions
List of Figures
List of Tables
Abstract
I Background
1 Introduction
  1.1 Statement of research
  1.2 Major contribution of the dissertation
  1.3 Organization of the dissertation
2 Literature Review
  2.1 Indonesian language
    2.1.1 Historical and sociolinguistic background
    2.1.2 Overview of Indonesian grammar
  2.2 Theoretical framework
    2.2.1 Head-driven Phrase Structure Grammar