Resources for Philippine Languages: Collection, Annotation, and Modeling
PACLIC 30 Proceedings

Nathaniel Oco^a, Leif Romeritch Syliongka^a, Tod Allman^b, Rachel Edita Roxas^a
^a National University, 551 M.F. Jhocson St., Sampaloc, Manila, PH 1008
^b Graduate Institute of Applied Linguistics, 7500 W. Camp Wisdom Rd., Dallas, TX 75236
{nathanoco,lairusi,todallman,rachel_roxas2001}@yahoo.com

Abstract

In this paper, we present our collective effort to gather, annotate, and model various language resources for use in different research projects. This includes those that are available online such as tweets, Wikipedia articles, game chat, online radio, and religious text. The different applications, issues and directions are also discussed in the paper. Future works include developing a language web service. A subset of the resources will be made temporarily available online at: http://bit.ly/1MpcFoT.

1 Introduction

The Philippines is a country in Southeast Asia composed of 7,107 islands and 187 listed [...]

The paper’s structure is as follows: section 2 discusses initiatives in the country and the various language resources we collected; section 3 discusses annotation and documentation efforts; section 4 discusses language modeling; and we conclude our work in section 5.

2 Collection

Research works in language studies in the Philippines, particularly in language documentation and in corpus building, often involve one or a combination of the following: “(1) residing in the place where the language is spoken, (2) working with a native speaker, or (3) using printed or published material” (Dita and Roxas, 2011). Among these, working with resources available is the most feasible option given ordinary circumstances. Following this consideration, the Philippines as a developing country is making its way towards a digital age, which highlights, as Jenkins (1998) would put it, a “technological culture of computers”.