Pola Grammar Technique to Identify Subject and Predicate in Malaysian Language

Mohd Juzaiddin Ab Aziz
Fakulti Teknologi & Sains Maklumat, Universiti Kebangsaan Malaysia, 43600 Bangi, Selangor, Malaysia
[email protected]

Fatimah Dato’ Ahmad
Fakulti Sains Komputer & Teknologi Maklumat, Universiti Putra Malaysia, 43400 Serdang, Selangor, Malaysia
[email protected]

Abdul Azim Abdul Ghani
Fakulti Sains Komputer & Teknologi Maklumat, Universiti Putra Malaysia, 43400 Serdang, Selangor, Malaysia
[email protected]

Ramlan Mahmod
Fakulti Sains Komputer & Teknologi Maklumat, Universiti Putra Malaysia, 43400 Serdang, Selangor, Malaysia
[email protected]

Abstract

The Malaysian language is built on a formation of subject, predicate and object. The subject is the element that takes the action on the object, and the predicate is the verb in the sentence. Without a good corpus that can provide the part of speech, parsing is a complex process. As an alternative to parsing, this paper discusses a way to identify the subject and the predicate, known as the pola-grammar technique. The pola, or patterns, to be identified in the sentence are the Adjunct, Subject, Conjunction, Predicate and Object.

1 Introduction

The Malaysian language has a context free grammar in which there is a subject and a predicate (Nik Safiah et al., 1993). According to the research done by Azhar (1988), there are three types of context-independent grammar for the Malaysian language. The first is sentence grammar (Nik Safiah, 1975; Yeoh, 1979), the second is the partial discourse grammar (Asmah, 1980) and the third is the ‘pola’ sentence (Asmah, 1968).

The pola grammar developed by Asmah (1968) was accepted as a standard format for the grammar of the Malaysian language before it was replaced by the transformational-generative type of grammar (Nik Safiah, 1975). The pola suggested by Asmah (1968) are:

i. Pelaku + Perbuatan (Actor + Verb)
ii. Pelaku + Perbuatan + Pelengkap (Actor + Verb + Complement)
iii. Perbuatan + Pelengkap (Verb + Complement)
iv. Diterangkan + Menerangkan (Signified + Signifier)
v. Digolong + Penggolong (Classified + Classifier)
vi. Pelengkap + Perbuatan + Pelaku (Complement + Verb + Actor)
vii. Pelengkap + Perbuatan (Complement + Verb)

From the list above, the pola that are used to start a sentence are: pelaku (actor), perbuatan (verb), diterangkan (signified), digolongkan (classified) and pelengkap (complement). Pelaku (actor) is a noun and perbuatan is a verb. The diterangkan (signified), digolongkan (classified) and pelengkap (complement) are adjuncts. An adjunct is an element that is less tightly related to the subject and predicate. Adjuncts do not represent the subject, verb or object of the sentence.

Abdullah (1980) modified the earlier version of pola grammar (Asmah, 1968) with a new set of pola. Examples of Abdullah’s pola are noun+noun and noun+verb.
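As an illustration only, the seven pola suggested by Asmah (1968) can be written down as plain data, and the elements that may start a sentence can then be read off as the first element of each pola. The sketch below is not part of the original work; the names `ASMAH_POLA` and `sentence_initial_elements` are hypothetical and chosen only for this example, and the element names are taken from the pola list (so `digolong` appears rather than `digolongkan`).

```python
# Asmah's (1968) seven pola, transcribed as tuples of element names.
# The variable and function names here are illustrative assumptions.
ASMAH_POLA = [
    ("pelaku", "perbuatan"),               # i.   Actor + Verb
    ("pelaku", "perbuatan", "pelengkap"),  # ii.  Actor + Verb + Complement
    ("perbuatan", "pelengkap"),            # iii. Verb + Complement
    ("diterangkan", "menerangkan"),        # iv.  Signified + Signifier
    ("digolong", "penggolong"),            # v.   Classified + Classifier
    ("pelengkap", "perbuatan", "pelaku"),  # vi.  Complement + Verb + Actor
    ("pelengkap", "perbuatan"),            # vii. Complement + Verb
]


def sentence_initial_elements(pola_list):
    """Return the distinct first elements of the pola, in order of first appearance."""
    seen = []
    for pola in pola_list:
        first = pola[0]
        if first not in seen:
            seen.append(first)
    return seen


if __name__ == "__main__":
    print(sentence_initial_elements(ASMAH_POLA))
    # ['pelaku', 'perbuatan', 'diterangkan', 'digolong', 'pelengkap']
```

The five names returned correspond to the five sentence-initial elements named in the text: the actor, the verb, and the three adjunct-type elements.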

2 Pola Grammar

The term pola refers to ‘pattern’, as in “sentence pattern”. Asmah et al. (1995) use the regular expression representation Nn, N1, A, N1, V.* to represent the pola. Nik Safiah et al. (1993) use the formats Noun Phrase (NP) + Noun Phrase (NP), Noun Phrase (NP) + Verb Phrase (VP), Noun Phrase (NP) + Adjective Phrase (AP) and Noun Phrase (NP) + Preposition Phrase (PP) to show the basic format of the language, which consists of pola. These formats are used as the basis for identifying the subject and predicate. More pola are added to the format: adjunct, subject, postSubject, conjunction and predicate.

The subject is a noun, a pronoun, or a verb or an adjective that functions as a noun. The postSubject describes the subject. In the Malaysian language, the postSubject normally starts with the word ‘yang’ or ‘dengan’. The conjunction represents the words that join words, phrases or sentences together. The predicate is a theme that says something about the subject.

2.1 The Rules

Starting from a basic pola, the components are inserted into it to produce new rules. Examples of the rules are:

[Adjunct + (NP)1 + conjunction] + [(NP)2]
(NP)1 → Noun + postSubject
(NP)2 → Predicate
Predicate → object
-- rule (1)

[Adjunct + (NP) + conjunction] + [(VP)]
(VP) → Predicate
Predicate → verb + object + (adverb)1
(adverb)1 → conjunction + object + conjunction + (adverb)2
-- rule (2)

Tables 1a, 1b and 1c show examples of Malaysian sentences in the pola format. The sentences used in the tables are sentences (1), (2) and (3) below:

Example of sentences,

Pengkompil menukar Bahasa Paras Tinggi kepada Bahasa Paras Rendah. -- (1)

Tujuan pengkompil adalah untuk menukar Bahasa Paras Tinggi kepada Bahasa Paras Rendah. -- (2)

Walaupun pengkompil menukar Bahasa Paras Tinggi kepada Bahasa Paras Rendah, tetapi, tugas utamanya adalah untuk menyemak sintaks bahasa. -- (3)

Table 1a : the pola for sentence (1)
Adjunct : Null
Subject : Pengkompil
PostSubject : Null
Conjunction : Null
Predicate : Menukar Bahasa Paras Tinggi kepada Bahasa Paras Rendah

Table 1b : the pola for sentence (2)
Adjunct : Tujuan
Subject : Pengkompil
PostSubject : Null
Conjunction : Adalah untuk
Predicate : Menukar Bahasa Paras Tinggi kepada Bahasa Paras Rendah

Table 1c : the pola for sentence (3)
Adjunct : Walaupun
Subject : Pengkompil
PostSubject : Null
Conjunction : Null
Predicate : Menukar Bahasa Paras Tinggi kepada Bahasa Paras Rendah, tetapi, tugas utamanya adalah untuk menyemak sintaks bahasa.

The pola show that “Pengkompil” is the subject of sentences (1), (2) and (3), even though there is no adjunct or conjunction in sentence (1). Sentence (3) does not have a conjunction, but its predicate is longer than the predicates in sentences (1) and (2).

To test the design, take sentence (1) as the input:

“Pengkompil menukar bahasa paras tinggi kepada bahasa mesin”.

Step 1

Choose a basic format: rule (2).

Step 2

Identify the pola of the adjunct, subject, postSubject, conjunction and predicate.
Subject (Pengkompil) Predicate [menukar bahasa paras tinggi kepada bahasa mesin].

Step 3

Identify the pola of the verb, object, conjunction and adverb.
Predicate: Verb (menukar) Object (bahasa paras tinggi) Conjunction (kepada) Adverb (bahasa mesin)
Adverb: Object (bahasa mesin)

3 Related Work

Rosmah (1997) developed an algorithm to derive the Malaysian language using Context Free Grammar (CFG) rules. The CFG was initially developed by Nik Safiah (1975), followed by Yeoh (1979).

The derivation by Rosmah (1997) identified the subject and the predicate in simple Malaysian sentences. To do that, there was a module to identify the lexical values, such as noun, verb and adverb.

The major problem in the process was resolving ambiguity. Many Malaysian words can be either a verb or a noun. For example, the word ‘mereka’ can be either a pronoun (‘they’) or a verb (‘design’), and the word ‘pohon’ is either a verb (‘request’) or a noun (‘tree’). As a result, the parsing was very costly and it easily produced a wrong parse tree.

The second problem was parsing complex CFG rules for sentences in which there is no sign to stop. For instance, consider the Context Free Grammar and sentence (4) below:

Sentence → Subject + Predicate
Subject → Noun Phrase
Predicate → Verb Phrase
Noun Phrase → Noun
Verb Phrase → Verb

Nod B yang mempunyai nilai terendah berada berhampiran dengan Nod A merupakan Nod yang paling sesuai untuk dilalui. -- (4)

The terminal for the Verb Phrase is a verb. In sentence (4), the verb is the word “mempunyai”, which is in the third position of the sentence. If the parser selects “mempunyai” as the verb, the parse becomes ungrammatical because the word is actually inside the noun phrase. This happens because some phrases that contain the word “yang” do not have a sign to stop (Azhar, 1988). The pola grammar technique solves this problem by introducing a pola called postSubject.

4 The Model

The sequence of the pola in the sentence is shown below:

Adjunct + Subject + postSubject + conjunction + conjunction + predicate (Output)

A finite automaton is used to recognize the pola. It is a mathematical model of a system represented as:

(Q, ∑, S, R), where
Q is a finite set of states
∑ is a finite set of inputs
S is the initial state
R is the transition relation, which maps the input and states

The states are the adjunct, subject, postSubject (postSub), conjunction (conj) and predicate. They are the pola used in this study. The input is the list of words in the sentence.

The transition relation captures the pola based on the rules of the pola grammar technique. The algorithm of the transition relation is as follows:

Case 1 – adjunct
If input is
1. subordinating, insert adjunct
2. classifier, insert adjunct
3. numeric, insert adjunct
4. ini, itu or “,” insert adjunct
   a. if “,” insert adjunct

      start at subject
5. , insert adjunct
   a. if start = adjunct, insert subject
      start subject,
      prStart adjunct
   b. else insert predicate
      start predicate,
      prStart adjunct
6. yang, itu, ini insert adjunct
   a. if j = 1 while not “adalah” or not “ialah”
   b. if “.” insert null {stop process}
      i. else insert adjunct
         start conj
         prStart adjunct
   c. else while jumpa = false and j <= no of token
   d. if “.” insert null {stop process}
      i. elseif “,”
         insert adjunct,
         jumpa true,
         start subject,
         prStart adjunct
      ii. elseif “itu”, “ini”
         insert adjunct,
         jumpa true,
         start subject,
         prStart adjunct
         1. if “,” insert adjunct
         2. else j = j – 1
      iii. else {not “,” or itu, ini}
         insert adjunct
         start adjunct
   e. wend
7. if “,” insert adjunct
   start subject
8. else insert subject
   start subject,
   prStart subject

Case 2 – subject
If input is
1. “yang” or “adalah” insert postSub
   start postSub
   prStart subject
2. “itu” or “ini” or “tersebut” insert subject
   start subject
   prStart subject
3. Ncon insert subject
   start conj
   prStart subj
4. conjunction insert conj
   start conj
   prStart conj
5. subordinating insert predicate
   start predicate
   prStart subject
6. verbs or nafi
   if prStart = adjunct or prToken = “dan” or “atau” insert subject
      start subject
      prStart subject
   else insert predicate
      start predicate
      prStart subject
7. “,” insert subject
   i. if lookahead (j, 7, “,”) TRUE
      start subject
      prStart subject
   ii. else if lookahead (j, 7, “and”) TRUE
      start subject
      prStart subject
   iii. else
      start subject
      prStart subject
8. else insert subject
   start subject
   prStart subject

Case 3 – postSubject
If input is
While token <> “,” and jumpa = False and token < ListCount
1. “.” Start 6, prStart postSub
2. “ini” or “itu” insert postSub
   start conj
   prStart postSub
   jumpa TRUE
3. conjunction insert conj
   start conj
   prStart postSub
   jumpa TRUE
4. else insert postSub
   start postSub
   prStart postSub
   jumpa TRUE
Wend

5. if “,” insert postSub
   start conj
   prStart postSub
6. if Ncon insert postSub
   start postSub
   prStart postSub
7. verbs insert predicate
   start predicate
   prStart postSub

Case 4 – conjunction
If input is
1. conjunction insert conj
   start conj
   prStart conj
2. subordinating insert conj
   start conj
   prStart conj
3. else insert predicate
   start predicate
   prStart predicate

Case 5 – predicate
If input is
While token <> “.” and token < ListCount
   insert predicate
Wend

Case 6 – stop
End

5 Testing

The algorithm was tested with the abstracts of thirteen (13) Masters theses in Computer Science and Information Technology from the Faculty of Technology and Information Science. The total number of sentences used in the testing was one hundred and twelve (112).

The test shows that 6 sentences did not produce precise results. The sentences are as follows:

1.
Adjunct : Adalah didapati bahawa
Subject : penyelesaian masalah jadual waktu
PostSubject :
Conjunction : dengan
Predicate : komputer memerlukan satu pemindahan paradigma

The words ‘dengan’ and ‘komputer’ should be part of the subject.

2.
Adjunct :
Subject : Sasaran
PostSubject : yang tidak tentu tidak
Conjunction : akan
Predicate : mewujudkan penyelesaian yang lengkap

The word ‘tidak’ marks the negative of ‘akan’, so the two words should be together in the conjunction.

3.
Adjunct : Pada
Subject : peringkat awal dan
PostSubject :
Conjunction : pada
Predicate : peringkat akhir,penjanaan jadual waktu dengan komputer masih memerlukan penglibatan penskedul jadual

The words ‘pada’, ‘peringkat’ and ‘akhir’ should be the second subject.

4.
Adjunct :
Subject : Aliran
PostSubject :
Conjunction :
Predicate : kerja boleh ditakrifkan sebagai satu kaedah untuk mengautomasikan dan mengawal pergerakan proses yang melibatkan sekurang-kurangnya dua entiti bergerak dari sati entiti secara turutan atau serentak berpandukan pada syarat-syarat yang telah ditetapkan bagi mencapai matlamat yang sama

The words ‘aliran’ and ‘kerja’ form the subject together. In this sentence, ‘kerja’ was interpreted as a verb by the program.

5.
Adjunct : Tetapi,
Subject :
PostSubject :
Conjunction :
Predicate : didapati pelaksanaan pembelajaran dengan paten-paten yang agak besar mewujudkan kesilapan yang agak besar yang menghadkan proses penjanaan sistem pengetahuan domain tersaur

This sentence contains the word ‘didapati’, which was interpreted as a verb. It is actually an adjunct, and the subject is “pelaksanaan pembelajaran”.

6.
Adjunct :
Subject : Sistem Pengurusan Maklumat Makmal Kimia
PostSubject :
Conjunction :
Predicate : Berasaskan Multimedia: Satu Kajian Kes dibangunkan untuk tujuan pengurusan stok bahan kimia peralatan dan radas yang digunakan di makmal kimia sekolah menengah

The subject of the sentence should be “Sistem Pengurusan Maklumat Makmal Kimia Berasaskan Multimedia”.

6 Analysis

The results show that the pola sentence can be used to identify the subject and predicate in a Malaysian sentence. The problems that occurred in the 6 sentences were caused by:

a. The existence of the conjunction ‘dengan’ in the subject. The words that follow this word can be either a postSubject or a subject.
b. The nouns are varied and do not have a common pattern.
c. The word ‘tidak’, which marks a negative sentence, is not placed in the right position.
d. Verbs that act as nouns.

Problem (b) can be fixed by supplying the noun information to the application. Problems (c) and (d) can be fixed by improving the algorithm. Problem (a) needs further study and enhancement because the word ‘dengan’ can be either a conjunction or a word that describes its subject.

7 Conclusion

Pola grammar was accepted as a formal grammar for the Malaysian language. However, the Chomskyan revolution led linguists to produce a Context Free format for the language. For computational purposes, a good corpus is needed to provide the information required to parse the language, for instance the correct lexical values. A corpus such as WordNet (Fellbaum, 1998) would reduce problems such as ambiguity and backtracking. Since there is no such corpus for the Malaysian language, the pola grammar technique is introduced to identify the subject and predicate for the language. The results discussed in this paper indicate that pola grammar can extract the subject, verb and object.

References

Abdullah Hassan. 1980. Linguistik Am Untuk Guru Bahasa Malaysia. Penerbit Fajar Bakti, Kuala Lumpur.

Asmah Haji Omar and Rama Subbiah. 1995. An Introduction To Malay Grammar. Dewan Bahasa dan Pustaka, Kuala Lumpur.

Asmah Haji Omar. 1980. Nahu Melayu Mutakhir. Dewan Bahasa dan Pustaka, Kuala Lumpur.

Asmah Haji Omar. 1968. Morfologi-sintaksis Bahasa Melayu (Malaya) dan Bahasa Indonesia: Satu Perbandingan Pola. Dewan Bahasa dan Pustaka, Kuala Lumpur.

Azhar M. Simin. 1988. Discourse-Syntax of “YANG” in Malay (Bahasa Malaysia). Dewan Bahasa dan Pustaka, Kuala Lumpur.

Fellbaum, C. 1998. WordNet: An Electronic Lexical Database. The MIT Press, Cambridge, Massachusetts.

Nik Safiah Karim. 1975. The Major Syntactic Structures of Bahasa Malaysia and their Implications of Standardization of the Language. Ph.D. dissertation. Ohio University, USA.

Nik Safiah Karim, Farid M. Onn, Hashim Hj. Musa, Abdul Hamid Mahmood. 1993. Tatabahasa Dewan, Edisi Baharu. Dewan Bahasa dan Pustaka, Kuala Lumpur.

Rosmah Latif. 1997. Sintaksis Ayat Bahasa Malaysia. Tesis Sarjana, Universiti Kebangsaan Malaysia, Bangi.

Yeoh, Chiang Kee. 1979. Interaction of Rules in Bahasa Malaysia. Ph.D. dissertation, University of Illinois at Urbana-Champaign, USA.
