I/ETS: Indonesian-English Machine Translation System using Collaborative P2P Corpus

Hammam Riza, Budiono, Adiansya Prasetya and Henky Mulyadi
Science and Technology Network Information Center (IPTEKnet)
Agency for the Assessment and Application of Technology (BPPT), Indonesia
Abstract. This paper presents preliminary results from the development of a bidirectional Indonesian-English machine translation system built with open source software and a Creative Commons corpus. We describe our method, beginning with the corpus collection process and continuing with corpus processing and the translation software. The corpus is developed through a collaborative P2P development framework, a collective intelligence approach to building an Indonesian-English parallel text. We also describe the components of the translation system, which use a hybrid symbolic-statistical technique.

1. Introduction

In the era of globalization, communication across languages has become increasingly important. Natural language processing and speech processing, both part of ICT (Information and Communication Technology), are expected to help ease communication among people who speak different languages. However, there has been little prior research on the Indonesian language in particular. Because no large Indonesian-English corpus exists, and such a corpus is crucial, the first phase of this project is to build a large bilingual Indonesian-English corpus. We use a collective intelligence approach to build this corpus, which is in turn used to build the modules of a hybrid symbolic-statistical machine translation (MT) system.

2. System Components

Building a statistical machine translation system requires two main components, both of which are crucial. The symbolic modules form an additional supporting component.
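This section does not name the two statistical components. The following is a minimal sketch that assumes the standard noisy-channel formulation of statistical MT. Under that formulation, a language model P(e) and a translation model P(i | e) are combined to choose the English output e for an Indonesian input i. The word pairs, probabilities, and function names are hypothetical. This word-by-word toy illustrates the scoring idea only and is not the authors' system.

```python
import math

# Hypothetical toy translation model P(indonesian | english); values are illustrative only.
TRANSLATION_MODEL = {
    ("rumah", "house"): 0.7,
    ("rumah", "home"): 0.3,
}

# Hypothetical toy unigram language model P(english); values are illustrative only.
LANGUAGE_MODEL = {
    "house": 0.02,
    "home": 0.01,
}


def noisy_channel_score(source: str, candidate: str) -> float:
    """Return log P(candidate) + log P(source | candidate), or -inf if either is unseen."""
    tm = TRANSLATION_MODEL.get((source, candidate), 0.0)
    lm = LANGUAGE_MODEL.get(candidate, 0.0)
    if tm == 0.0 or lm == 0.0:
        return float("-inf")
    return math.log(lm) + math.log(tm)


def translate_word(source: str) -> str:
    """Pick the English candidate with the highest noisy-channel score."""
    candidates = [e for (f, e) in TRANSLATION_MODEL if f == source]
    if not candidates:
        return source  # pass unknown words through unchanged
    return max(candidates, key=lambda e: noisy_channel_score(source, e))


print(translate_word("rumah"))  # → house
```

In this toy, "house" wins because 0.02 × 0.7 is greater than 0.01 × 0.3. A real system would score whole sentences with an n-gram language model and a phrase-based translation model. In a hybrid design, symbolic modules would also pre- or post-process the input and output.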