2016 International Conference on Informatics and Computing (ICIC)

The 2016 International Conference on Informatics and Computing (ICIC) took place October 28-29, 2016 in Mataram, Indonesia.

IEEE catalog number: CFP16G52-ART
ISBN: 978-1-5090-1648-8

Copyright and Reprint Permission: Abstracting is permitted with credit to the source. Libraries are permitted to photocopy beyond the limit of U.S. copyright law for private use of patrons those articles in this volume that carry a code at the bottom of the first page, provided the per-copy fee indicated in the code is paid through Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923. For other copying, reprint or republication permission, write to IEEE Copyrights Manager, IEEE Operations Center, 445 Hoes Lane, Piscataway, NJ 08854. All rights reserved. Copyright © 2016 by IEEE.

Committees

General Chair

Zainal Hasibuan (Universitas Indonesia, Indonesia)

Program Chair

Achmad Mutiara (Universitas Gunadarma, Indonesia)

Program Co-Chairs

Riyanarto Sarno (Institut Teknologi Sepuluh Nopember, Indonesia)
Ryoichi Sasaki (Tokyo Denki University, Japan)

Publication Co-Chairs

Media A Ayu (Sampoerna University, Indonesia)
Tole Sutikno (Universitas Ahmad Dahlan, Indonesia)

Publicity Co-Chairs

Anthony Anggrawan (STMIK Bumigora Mataram, Indonesia)
Achmad Hidayanto (Indonesia)
Harry Budi Santoso (Universitas Indonesia, Indonesia)

Technical Program Committee Chair

Teddy Mantoro (Sampoerna University, Indonesia)

Technical Program Committee

Normaziah Abdul Aziz (International Islamic University Malaysia, Malaysia)
Evizal Abdul Kadir (Universitas Islam Riau, Indonesia)
Adamu Abubakar (International Islamic University Malaysia, Malaysia)
Noor Azurati Ahmad (Universiti Teknologi Malaysia, Malaysia)
Gerard G Borg (Australian National University, Australia)
Eko K Budiardjo (Faculty of Computer Science, Universitas Indonesia, Indonesia)
Herbert Dawid (Bielefeld University, Germany)
Mohammad Essaaidi (Abdelmalek Essaadi University, Morocco)
Frederic Ezerman (Nanyang Technological University, Singapore)
Sabir Jacquir (Universite de Bourgogne, Laboratoire Le2i UMR CNRS, France)
Wisnu Jatmiko (University of Indonesia, Indonesia)
Ismail Khalil (Institute of Telecooperation Johannes Kepler University Linz, Austria)
Tubagus Maulana Kusuma (Gunadarma University, Indonesia)
Naufal M. Saad (Universiti Teknologi Petronas, Malaysia)
Murni Mahmud (International Islamic University Malaysia, Malaysia)
Rila Mandala (Bandung Institute of Technology, Indonesia)
Alamin Mansouri (Universite de Bourgogne, France)
Fabrice Meriaudeau (University of Bourgogne, France)
Sardjoeni Moedjiono (Indonesia)
Salwani Mohd. Daud (Universiti Teknologi Malaysia, Malaysia)
Christophoros Nikou (University of Ioannina, Greece)
Lukito Edi Nugroho (Universitas Gadjah Mada, Indonesia)
Michel Paindavoine (Universite de Bourgogne, France)
Anton Satria Prabuwono (King Abdulaziz University, Saudi Arabia)
Eri Prasetyo Wibowo (Universitas Gunadarma, Indonesia)
Yudi Prayudi (Universitas Islam Indonesia, Indonesia)
Prihandoko Prihandoko (University of Gunadarma, Indonesia)
Fredy Purnomo (Bina Nusantara University, Indonesia)
Ayu Purwarianti (Bandung Institute of Technology, Indonesia)
Paulus Santosa (Universitas Gadjah Mada, Indonesia)
Bharanidharan Shanmugam (Darwin University, Australia)
Waralak Vongdoiwang Siricharoen (University of the Thai Chamber of Commerce, Thailand)
Heru Suhartanto (Universitas Indonesia, Indonesia)
Iping Supriana (Bandung Institute of Technology, Indonesia)
Kridanto Surendro (Institut Teknologi Bandung, Indonesia)
Wendi Usino (Universitas Budi Luhur, Indonesia)
Muhammad Zarlis (Universitas Sumatera Utara, Indonesia)
Youssef Zaz (University of Abdelmalek Essaadi, Morocco)
Ahmed M. Zeki (University of Bahrain, Bahrain)
Akram M. Zeki (International Islamic University Malaysia, Malaysia)

Program


Technical Session 1-1

Implementation of Pixel Based Adaptive Segmenter Method for Tracking and Counting Vehicles in Visual Surveillance
Muhammad Brilliant Subaweh (Gunadarma University, Indonesia), Eri Prasetyo Wibowo (Universitas Gunadarma, Indonesia)

Feature Selection Algorithm using Information Gain Based Clustering for Supporting The Treatment Process of Breast Cancer
Tresna Maulana Fahrudin (Politeknik Elektronika Negeri Surabaya & EEPIS, Indonesia), Iwan Syarif (Politeknik Elektronika Negeri Surabaya (PENS), Indonesia), Ali Ridho Barakbah (Politeknik Elektronika Negeri Surabaya, Indonesia)

A Soft Set Approach for Fast Clustering Attribute Selection
Dedy Hartama (Universitas Sumatera Utara & USU, Indonesia), Iwan Riyadi Yanto (Universitas Ahmad Dahlan, Indonesia), Muhammad Zarlis (Universitas Sumatera Utara, Indonesia)

Relationship Between Compressed Image Quality And Quality of Experience for Wireless File Sharing
Regina Lionnie (Universitas Mercu Buana, Indonesia), Agus Pristiawan (Mercubuana, Indonesia), Bagus Prasetyo (Indonesia), Rizal Broer Bahaweres (Universitas Mercu Buana, Indonesia), Mudrik Alaydrus (Universitas Mercu Buana, Indonesia)

Technical Session 1-2

An Analysis of Haar Wavelet Transformation for Androgenic Hair Pattern Recognition
Regina Lionnie (Universitas Mercu Buana, Indonesia), Mudrik Alaydrus (Universitas Mercu Buana, Jakarta, Indonesia)

Cluster Visualization of Student's Entrance Score Using Smoothed Data Histograms
Kadek Cahya Dewi (State Polytechnic, Indonesia), Putu Indah Ciptayani (Bali State Polytechnic, Indonesia)

The Effective Noise Removal Techniques and Illumination Effect in Face Recognition Using Gabor and Non-Negative Matrix Factorization
Derwin Suhartono (Bina Nusantara University, Indonesia)

WMEVF: An Outlier Detection Methods for Categorical Data
Nur Rokhman (Indonesia), Subanar Subanar (Universitas Gadjah Mada, Indonesia), Edi Winarko (Universitas Gadjah Mada, Indonesia)

Technical Session 1-3

A Proposed Combination of Photogrammetry, Augmented Reality and Virtual Reality Headset for heritage visualisation
Edson Putra (University of Klabat, Indonesia), Andria Wahyudi (University of Klabat, Indonesia), Charlie Dumingan (University of Indonesia, Indonesia)

Adaptive Edge Detection and Histogram Color Segmentation for Centralized Vision of Soccer Robot
Arnold Aribowo (Universitas Pelita Harapan, Indonesia), Giorgy Gunawan (Universitas Pelita Harapan, Indonesia), Hendra Tjahyadi (Universitas Pelita Harapan, Indonesia)

Bull Sperm Motility Measurement using Improved Matching-Based Algorithm and Ellipse Detection
Priyanto Hidayatullah (Bandung State Polytechnic, Indonesia), Muhammad Nuriyadi (Politeknik Negeri Bandung, Indonesia), Iwan Awaludin (Institut Teknologi Bandung & Politeknik Negeri Bandung, Indonesia), Eros Sukmawati (Artificial Insemination Center Lembang, Indonesia), Dwi Utami (Artificial Insemination Center Lembang, Indonesia), Akbar Akbar (Bandung State Polytechnic, Indonesia)

3D Reconstruction of Dynamic Vehicles using Sparse 3D-Laser-Scanner and 2D Image Fusion
Dennis Christie (Gunadarma University, Indonesia), Cansen Jiang (Universite de Bourgogne, France), Danda Paudel (University of Burgundy, France), Cédric Demonceaux (Université de Bourgogne & Le2i UMR CNRS 6306, France)

Technical Session 1-4

Architecture Vision for Indonesian Integrated Agriculture Information Systems Using TOGAF Framework
Rosa Delima (Duta Wacana Christian University, Indonesia), Halim Santoso (Duta Wacana Christian University, Indonesia), Joko Purwadi (Duta Wacana Christian University, Indonesia)

Barriers Factors in Contributing Knowledge at Virtual Communities (A Literature Review)
Setiawan Assegaff (STIKOM Dinamika Bangsa & ISRG STIKOM DB, Indonesia), Kurniabudi Kurniabudi (STIKOM Dinamika Bangsa, Indonesia)

Knowledge Management Readiness of Research Agencies: A Case of BATAN Indonesia
Abrar Hedar (University Indonesia, Indonesia), Dana Sensuse (Laboratory of E-Government, Indonesia), Puspa Sandhyaduhita (Universitas Indonesia, Indonesia)

Factors Affecting Knowledge Sharing and Knowledge Utilization Behavior in an Indonesian Airline Company
Muhammad Rifki Shihab (Universitas Indonesia, Indonesia), Wahyu B Anggoro (Universitas Indonesia, Indonesia), Achmad Hidayanto (University of Indonesia, Indonesia)

Technical Session 1-5

E-learning for facilitating learning
Ariana Azimah (Indonesia), Heni Jusuf (Universitas Nasional, Indonesia), Rangga Firdaus (Universitas Lampung, Indonesia)

Automatic Detection of Learning Style in Adaptive Online Module System
Arief Hidayat (STIMIK Pro Visi, Indonesia), Victor Utomo (STIMIK Pro Visi, Indonesia)

A Conceptual Green-ICT Implementation Model Based-on ZEN and G-Readiness Framework
Marcel Yap (Krida Wacana Christian University, Indonesia)

Sundanese Ancient Manuscript Retrieval System Comparison of Two Probability Approaches
Mira Suryani (Universitas Padjadjaran, Indonesia), Ayi Muhammad Iqbal Nasuha (Universitas Komputer Indonesia, Indonesia), Setiawan Hadi, M. Sc. CS. (Universitas Padjadjaran, Indonesia)

Technical Session 1-6

Performance Evaluation of Routing Protocols RIPng, OSPFv3, and EIGRP in an IPv6 Network
Siti Ummi Masruroh (UIN Syarif Hidayatullah Jakarta & National ICT Center, Indonesia), Fadly Robby (UIN Syarif Hidayatullah, Indonesia), Nashrul Hakiem (UIN Syarif Hidayatullah, Indonesia)

Usage Area and Speed Performance Analysis of Booth Multiplier on Its FPGA Implementation
Antonius Irianto Sukowati (Sekolah Tinggi Teknik Multimedia Cendekia Abditama, Indonesia), Hendri Putra (Gunadarma University, Indonesia), Eri Prasetyo Wibowo (Universitas Gunadarma, Indonesia)

Automatic Controller Component Development Using FPGA Device
Sunny Arief Sudiro (STMIK Jakarta STI&K, Indonesia), Bheta Agus Wardijono (STMIK Jakarta STI&K, Indonesia), Darwis Abdul Rohman (Gunadarma University, Indonesia)

Technical Session 2-1

Signal and Image Processing Techniques for VLSI Failure Analysis
Anthony Boscaro (Le2i UMR CNRS 6306, France), Sabir Jacquir (Université de Bourgogne, Laboratoire Le2i UMR CNRS, France), Kevin Sanchez (CNES, France), Philippe Perdu (CNES, France), Stéphane Binczak (Université de Bourgogne, France)

Breast Cancer Identification on Digital Mammogram Using Evolving Connectionist Systems
Erna Budhiarti Nababan (Universitas Sumatera Utara, Indonesia), Muhammad Iqbal (University of Sumatera Utara, Indonesia), Romi Fadillah Rahmat (University of Sumatera Utara, Indonesia)

Remote QR code recognition based on HOG and SVM Classifiers
Hicham Tribak (Faculty of Science, Tetouan, Morocco), Salah Moughyt (Faculty of Science, Tetouan, Morocco), Youssef Zaz (University of Abdelmalek Essaadi & Faculty of Science, Morocco), Gerald Schaefer (Loughborough University, United Kingdom)

Real Time Face Recognition Using DCT Coefficients Based Face Descriptor
I Gede Pasek Suta Wijaya (Mataram University, Indonesia), Ario Yudo Husodo (Mataram University, Indonesia), I Wayan Agus Arimbawa (Mataram University, Indonesia)

The Implementation of Genetic Algorithm and Routing Lee for PCB Design Optimization
Tessy Badriyah (Electronics Engineering Polytechnic Institute of Surabaya Indonesia (EEPIS), Indonesia), Fitri Setyorini (Politeknik Elektronika Negeri Surabaya, Indonesia), Niyoko Yuliawan (Electronic Engineering Polytechnic Institute of Surabaya, Indonesia)

Technical Session 2-2

The Web Security And Vulnerability Analysis Model On Indonesia Higher Education
Ign Mantra (Mercubuana University, Indonesia)

Stegano-Image as a Digital Signature to Improve Security Authentication System in Mobile Computing
Teddy Mantoro (Sampoerna University, Indonesia), Didit Permadi (Universitas Budi Luhur, Indonesia), Adamu Abubakar (International Islamic University Malaysia & Integ lab, Malaysia)

Inculcating Secure Coding for Beginners
Normaziah Abdul Aziz (International Islamic University Malaysia, Malaysia), Nur Asnida Hassan (International Islamic University Malaysia, Malaysia), Siti Nurul Zulaiha Shamsuddin (International Islamic University Malaysia, Malaysia)

Analysing the security of NFC based payment systems
Nour Tabet (Hamad Bin Khalifa University, Qatar), Media A Ayu (Sampoerna University, Indonesia)

Token-Based Authentication Using JSON Web Token on SIKASIR RESTful Web Service
Eliyani Eliyani (University of Mercu Buana, Indonesia)

Technical Session 2-3

Recurrent Neural Network With Extended Kalman Filter For Prediction Of The Number Of Tourist Arrival in Lombok
Ahmad Ashril Rizal (STMIK Bumigora Mataram, Indonesia), Sri Hartati (Gadjah Mada University, Indonesia)

Driver Behavior State Recognition based on Silence Removal Speech
Norhaslinda Kamaruddin (MARA University of Technology, Malaysia), Abdul Wahab Abdul Rahman (IIUM, Malaysia)

Stomach Disorder Detection Through the Iris Image Using Backpropagation Neural Network
Aisyah Kumala Dewi (Indonesia)

Implementation of Diabetic Retinopathy Screening Using Realtime Data
Desti Fitriati (Indonesia)

An overview of Malaria Identification Techniques for Microscope Blood Images
Ary Ningsih (Universitas Gadjah Mada, Indonesia), Sri Hartati (University of Gadjah Mada, Indonesia), Rika Rosnelly (Universitas Gadjah Mada, Indonesia)

Technical Session 2-4

Customer Loyalty Prediction in Multimedia Service Provider Company With K-Means Segmentation and C4.5 Algorithm
Sardjoeni Moedjiono (Budi Luhur University, Indonesia), Yosianus Isak (Budi Luhur University, Indonesia), Aries Kusdaryono (Budi Luhur University, Indonesia)

Acceptance and Use of Technology Evaluation on Insurance Data Processing System (Care Tech): A Case Study on Insurance Company in Indonesia
Yusuf Durachman (UIN Syarif Hidayatullah Jakarta, Indonesia), Dr Mohd Adam Suhaimi (International Islamic University, Malaysia)

Evaluation Model of Information Technology Innovation Effectiveness Case of Higher Education Institutions in Indonesia
Muhammad Qomarul Huda (Syarif Hidayatullah State Islamic University (UIN) Jakarta, Indonesia), Husnayati Hussin (International Islamic University Malaysia, Malaysia)

Election Fraud and Privacy Related Issues: Addressing Electoral Integrity
Muharman Lubis (International Islamic University Malaysia, Indonesia), Mira Kartiwi (International Islamic University Malaysia, Malaysia), Sonny Zulhuda (International Islamic University Malaysia, Malaysia)

Current State of Personal Data Protection in Electronic Voting: Demand on Legislature's Bill
Muharman Lubis (International Islamic University Malaysia, Indonesia), Mira Kartiwi (International Islamic University Malaysia, Malaysia), Sonny Zulhuda (International Islamic University Malaysia, Malaysia)

Technical Session 2-5

Ontology Alignment using Combined Similarity Method and Matching Method
Didih Rizki Chandranegara (Institut Teknologi Sepuluh Nopember, Indonesia), Riyanarto Sarno (Institut Teknologi Sepuluh Nopember, Indonesia)

Traceability Between Business Process and Software Component using Probabilistic Latent Semantic Analysis
Fony Revindasari (Institut Teknologi Sepuluh Nopember, Indonesia), Riyanarto Sarno (Institut Teknologi Sepuluh Nopember, Indonesia), Adhatus Solichah Ahmadiyah (Institut Teknologi Sepuluh Nopember, Indonesia)

Discovering Traceability between Business Process and Software Component using Latent Dirichlet Allocation
Andreyan Rizky Baskara (Institut Teknologi Sepuluh Nopember, Indonesia), Riyanarto Sarno (Institut Teknologi Sepuluh Nopember, Indonesia), Adhatus Solichah Ahmadiyah (Institut Teknologi Sepuluh Nopember, Indonesia)

Using Cloud Computing for building DAS Tondano Mitigation Disaster Information System Prototype
Stanley Karouw (Indonesia), Hans Wowor (Sam Ratulangi University, Indonesia)

FX Forecasting using B-WEMA: Variant of Brown's Double Exponential Smoothing
Seng Hansun (Universitas Multimedia Nusantara, Indonesia)

Technical Session 2-6

Determination of Female Fetus Using To-Zero Threshold And Template Matching
David Hareva (Universitas Pelita Harapan, Indonesia), Irene Lazarusli (Universitas Pelita Harapan, Indonesia), Suryasari Suryasari (Universitas Pelita Harapan, Indonesia)

An Implementation of Direction Cosine Matrix in Rocket Payload Dynamics Attitude Monitoring
Purnawarman Musa (Gunadarma University, Indonesia), Dennis Christie (Gunadarma University, Indonesia), Eri Prasetyo Wibowo (Universitas Gunadarma, Indonesia)

Implementation of Server Consolidation Method on a Data Center by using Virtualization Technique: A Case Study
Husni Teja Sukmana (Syarif Hidayatullah State Islamic University Jakarta, Indonesia), Yuditha Ichsani (Syarif Hidayatullah State Islamic University Jakarta, Indonesia), Syopiansyah Jaya Putra (Syarif Hidayatullah State Islamic University Jakarta, Indonesia)

The design and preliminary implementation of low-cost brain-computer interface for enable moving of rolling robot
Setiawan Hadi, M. Sc. CS. (Universitas Padjadjaran, Indonesia), Asep Sholahuddin (Universitas Padjadjaran, Indonesia), Lany Rahmawati (Multipolar Lippo Group, Indonesia)

Residual Energy Effects on Wireless Sensor Networks (REE-WSN)
Abdullah Alkalbani (University of Buraimi & Ibri College of Applied Sciences, Oman), Teddy Mantoro (Sampoerna University, Indonesia)

Technical Session 3-1

Real-time Activity Recognition in Mobile Phones Based on Its Accelerometer Data
Media A Ayu (Sampoerna University, Indonesia), Siti Aisyah Ismail (International Islamic University Malaysia, Malaysia), Teddy Mantoro (Sampoerna University, Indonesia)

Kidney Failure Diagnosis Based On Case-Based Reasoning (CBR) Method And Statistical Analysis
Anthony Anggrawan (STMIK Bumigora Mataram, Indonesia), Khasnur Hidjah (STMIK Bumigora Mataram, Indonesia)

Enhanced Latent Semantic Analysis by Considering Mistyped Words in Automated Essay Scoring
Derwin Suhartono (Bina Nusantara University, Indonesia)

Multi Agent Hyperheuristics Based Framework For Production Scheduling Problem
Cecilia Esti Nugraheni (Parahyangan Catholic University, Indonesia), Luciana Abednego (UNPAR, Bandung, Indonesia)

Improving the Performance of Translation Process in Statistical Machine Translator Using Sequence IRSTLM Translation Parameters and Pruning
Teddy Mantoro (Sampoerna University, Indonesia), Jelita Asian (Indonesia), Media A Ayu (Sampoerna University, Indonesia)

Technical Session 3-2

Saboteur Game Modelling Using Means-Ends Analysis
Dion Krisnadi (University of Pelita Harapan, Indonesia), Samuel Lukas (Pelita Harapan University, Indonesia)

Analysis of Indonesian Sentiment Text based on Affective Space Model (ASM) using Electroencephalogram (EEG) signals
Abdul Wahab Abdul Rahman (IIUM, Malaysia)

Okamoto-Uchiyama Homomorphic Encryption Algorithm Implementation in E-Voting System
Rifki Suwandi (Telkom University, Indonesia), Surya Michrandi Nasution (Telkom University, Indonesia), Fairuz Azmi (Telkom University, Indonesia)

Study of Sea Wave Height Based on Radar Image Texture Feature of Radar Images: An Introduction
Sabar Rudiarto (University of Mercu Buana, Indonesia), Harwi Karya (Universitas Mercu Buana, Indonesia)

Advanced E-Voting System Using Paillier Homomorphic Encryption Algorithm
Shifa Anggriane (Telkom University, Indonesia), Surya Michrandi Nasution (Telkom University, Indonesia), Fairuz Azmi (Telkom University, Indonesia)

Technical Session 3-3

Detecting Most Central Actors of an Unknown Network Using Friendship Paradox
Sayed Mahmudul Alam (North South University, Bangladesh), Nahid Islam (North South University, Bangladesh), Shazzad Hosain (North South University, Bangladesh)

E-business Adoption and Application Portfolio Management in Remanufacturing Small and Medium Enterprises
Yun Fatimah (University of Magelang, Indonesia), Panca O. Hadi Putra (University of Indonesia, Indonesia), Zainal Hasibuan (University of Indonesia, Indonesia)

ARmatika: 3D Game for Arithmetic Learning with Augmented Reality Technology
Julio Young (Universitas Multimedia Nusantara, Indonesia), Marcel Bonar Kristanda (Universitas Multimedia Nusantara, Indonesia), Seng Hansun (Universitas Multimedia Nusantara, Indonesia)

Integration of Updated DeLone & McLean Success Model, Kano Model and QFD to Analyze Quality of an Information System
Erma Nindiaswari (Universitas Indonesia, Indonesia), Fatimah Azzahro (Faculty of Computer Science Universitas Indonesia, Indonesia), Achmad Hidayanto (University of Indonesia, Indonesia), Solikin Gitik (STMIK Bina Insani, Indonesia), Pornthep Anussornnitisarn (Kasetsart University, Thailand)

Applying Open Content Concept by Synchronizing Lecture Video and Slide
A. A. Gede Yudhi Paramartha (Ganesha University of Education, Indonesia), I Gede Partha Sindu (Ganesha University of Education, Indonesia), Kadek Yota Ernanda Aryanto (Ganesha University of Education, Indonesia)

Technical Session 3-4

Performance Analysis of AODV and DSDV Using SUMO, MOVE and NS2
Teddy Mantoro (Sampoerna University, Indonesia), Muhammad Reza (University of Budi Luhur, Indonesia)

VisUN-3D: User Navigation with Visualized 3D Maps for Mobile Users
Teddy Mantoro (Sampoerna University, Indonesia), Media A Ayu (Sampoerna University, Indonesia), Umran Abdulla (UNSW@ADFA, Australia), Midhat Muhic (Bosna Bank International, Bosnia and Herzegovina), Moaz AbdulBagi (University of Sharjah, United Arab Emirates (UAE))

Real Time Monitoring System for Water Pollution in Lake Toba
Romi Fadillah Rahmat (University of Sumatera Utara, Indonesia), Athmanathan Athmanatan (University of Sumatera Utara, Indonesia), Mohammad Syahputra (University of Sumatera Utara, Indonesia), Maya Lidya (University of Sumatera Utara, Indonesia)

Similarity Analysis of Motion based on Motion Capture Technology
Ega Hegarini (Gunadarma University, Indonesia), Achmad Mutiara (Universitas Gunadarma, Indonesia), Adang Suhendra (Gunadarma University, Indonesia), Iqbal Mohammad (Gunadarma University & Université de Bourgogne, Indonesia), Bheta Agus Wardijono (STMIK Jakarta STI&K, Indonesia)

Real-time Monitoring System of Electrical Quantities on ICT Centre Building University of Lampung Based on Embedded Single Board Computer BCM2835
Gigih Forda Nama (University of Lampung, Indonesia), Dikpride Despa (University of Lampung, Indonesia), Mardiana Rendra (University of Lampung, Indonesia)

Technical Session 3-5

Performance Evaluation of Simulated Smoothed Particle Hydrodynamics Method in Pulsating Atherosclerotic Blood Vessel
Kenny Wiratama (Universitas Pelita Harapan, Indonesia), Pujianto Yugopuspito (Universitas Pelita Harapan, Indonesia), Helena Margaretha (Pelita Harapan University, Indonesia)

Business Process Automation of Document Filing Based Alfresco for Referral BPJS Patient (Case Study: BPJS Center of Dharmais Cancer Hospital)
Nurhayati Buslim (UIN Syarif Hidayatullah Jakarta, Indonesia), Diana Fitrisari (UIN Syarif Hidayatullah Jakarta, Indonesia)

Automatic Bilingual Ontology Construction using Text Corpus and Ontology Design Patterns (ODPs) in Tuberculosis's Disease
Denis Cahyani (University of Indonesia, Indonesia), Bambang Harjito (Indonesia)

Analysis of Online Training System from Economic Perspective Using Ranti's Generic IS/IT Business Value (Case Study: Bank Rakyat Indonesia)
Teddie Darmizal (State Islamic University of Sultan Syarif Qasim Riau, Indonesia)

Using GPS and Google Maps for Mapping Digital Land Certificates
Eko Sediyono (Satyawacana Christian University, Indonesia), Vikky Windarni (Satya Wacana Christian University, Indonesia), Adi Setiawan (Satya Wacana Christian University, Indonesia)

Technical Session 3-6

A File Undelete With Aho-Corasick Algorithm In File Recovery
Opim Salim Sitompul (University of Sumatera Utara, Indonesia), Andrew Handoko (University of Sumatera Utara, Indonesia), Romi Fadillah Rahmat (University of Sumatera Utara, Indonesia)

Algorithm for Updating n-Grams Word Dictionary for Web Classification
Taufik F. Abidin (Indonesia), Ridha Ferdhiana (Syiah Kuala University, Indonesia)

A Framework of Training Anfis Using Chicken Swarm Optimization for Solving Classification Problems
Roslina Roslina (University of Sumatera Utara, Indonesia), Muhammad Zarlis (Universitas Sumatera Utara, Indonesia), Iwan Riyadi Yanto (Universitas Ahmad Dahlan, Indonesia), Dedy Hartama (Universitas Sumatera Utara & USU, Indonesia)

Wall Shear Stress Calculation Based on MRI Image in Patients With Abdominal Aortic Aneurysm (AAA)
Desti Riminarsih (Gunadarma University, Indonesia), Cut Karyati (Gunadarma University, Indonesia), Achmad Mutiara (Universitas Gunadarma, Indonesia), Bambang Wahyudi (Gunadarma University, Indonesia), Ernas Tuti (Gunadarma University, Indonesia)

Selection of Cloud Deployment Model for Ministry of Foreign Affairs Using Benefit, Cost, Opportunity, and Risk (BCOR) Analysis and Analytic Hierarchy Process (AHP)
Wulan Indriani (Faculty of Computer Science Universitas Indonesia, Indonesia), Nur Ayuningbudi (Faculty of Computer Science Universitas Indonesia, Indonesia), Fatimah Azzahro (Faculty of Computer Science Universitas Indonesia, Indonesia), Achmad Hidayanto (University of Indonesia, Indonesia), Solikin Gitik (STMIK Bina Insani, Indonesia)

Algorithm for Updating n-Grams Word Dictionary for Web Classification

Taufik Fuadi Abidin¹, Ridha Ferdhiana²
¹Department of Informatics, ²Department of Statistics
Faculty of Mathematics and Natural Sciences, Syiah Kuala University
Darussalam, Banda Aceh, Indonesia
[email protected], [email protected]

Abstract—In this paper, we examine an algorithm to update n- When we developed an SVM classifier to classify web grams word dictionary (thesaurus) and evaluate its effectiveness pages about tropical disease incidence [6], n-grams model was in binary classification problem. The thesaurus is used as a used to build both positive and negative class dictionaries for reference to generate the numerical feature attributes of web binary classification problem. The dictionaries consist of uni- pages. Generally, the n-grams word dictionary is built once using gram, bi-grams, tri-grams words in the first column, their a set of training data and its content is never updated. Hence, the frequency in the second column, and a normalized frequency content is static and its coverage is limited to the n-grams word in the third column. The dictionaries are used as a reference in found in the initial training set. Actually, the content of a generating numerical features of a web. The description of thesaurus must be dynamic, especially because the n-grams word each numerical feature attributes is elaborated later in Section dictionary is used repeatedly as a reference in generating the II. A few examples of n-grams words, taken from positive numerical feature attributes of web pages. We argue that a dynamic thesaurus is better than a static one in a long-term. class dictionary, is listed in Table 1. Thus, n-grams word dictionary should be updated frequently Several prominent research works have shown that word using new data without degrading the classification accuracy. We dictionary has been used in the proposed methods. Bollegala validate our proposed algorithm using several test sets, each of [7] used dictionary in a cross-domain sentiment classification which contains one hundred web pages, except for the last one. problem. 
We used n-grams dictionary in our web classification The experimental results show that our proposed algorithm problem as a reference in generating feature attributes of a works well. On average, the accuracy of feature dataset web [6]. However, we notice that the content of the thesaurus generated using the existing (old) dictionary is 57.75%, while the accuracy of feature dataset generated using updated (new) is highly static and never changed since it was first created, dictionary is 76.75%. The proposed algorithm increases and therefore, its coverage is limited only to those n-grams classification accuracy about 32.90%. words found in the initial training set. We argue that the thesaurus should grow and expand its coverage dynamically, Keywords—algorithm; n-grams dictionary; web classification; especially because they will be used repeatedly as a reference. classification accuracy; f-measure. However, we must ensure that the classification accuracy will not degrade when the content of the thesaurus is updated. I. INTRODUCTION This paper introduces an algorithm to update n-grams word Web is the largest inter-linked hypertext documents that dictionary of positive class {+1} in a tropical disease binary rapidly growing in the last decade due to an increasing usage classification problem and evaluates its effectiveness. The of Internet around the world. About 46 billion web pages have algorithm works as follows: An n-grams word list is extracted been indexed by Google in July 2015 [1] and at least 4 million from a set of new tropical disease web pages with positive informative websites with high value resources have been class label. Let’s name this list as NEW_LIST. The existing listed and grouped in the Open Directory in the same period positive class dictionary, CURR_DICT, is replicated and the [2]. 
The tremendous size of the web has made effort to content of the replica is updated using all n-grams words in intelligently categorize web pages using machine learning NEW_LIST. Let’s name the replica as REPLICA_DICT. If an techniques become an important task. n-grams word in NEW_LIST is found in REPLICA_DICT, then only its frequency is updated, otherwise the n-grams word Web pages are documents written in hypertext markup will be added into REPLICA_DICT with its frequency is set to language with tags flanking words or sentences. According to 1. All n-grams words in NEW_LIST are evaluated. Last, the Kok [3], words in a sentence are not combined in a random frequency of n-grams word in REPLICA_DICT is normalized order, instead they are organized by certain rules that make a by dividing the frequency with its maximum. Thus, the largest word and its neighbors have a certain meaning. N-grams is a normalized frequency is 1, while the smallest one can be model that is commonly used to understand how words form a closed to 0. sentence [4][5]. It is a contiguous sequence of n words from a given sentence. Parameter n is an integer value greater than After the n-grams word dictionary of positive class is zero such that a uni-gram is when n=1, bi-grams is when n=2, updated, we evaluate the impact. For each testing set, two and tri-grams is when n=3. Uni-gram, bi-grams, and tri-grams different datasets are constructed. One dataset will have are the most widely used variants of n-grams model [4]. numerical features constructed using the existing (old)

978-1-5090-1648-8/16/$31.00 ©2016 IEEE 2016 International Conference on Informatics and Computing (ICIC)

dictionary, and the other dataset will have numerical features constructed using REPLICA_DICT and the existing negative class dictionary. We classify the two feature datasets using the SVM classifier trained in [6] and evaluate the accuracy. If the testing set with numerical features constructed using REPLICA_DICT performs at least as well as the other testing set, then REPLICA_DICT is kept and becomes the newest positive class dictionary. In other words, the positive class dictionary is replaced by REPLICA_DICT. Otherwise, no changes are made to the positive class dictionary and REPLICA_DICT is discarded. In summary, our contributions are twofold:

1. We introduce an algorithm to update the n-grams word dictionary of positive class {+1} in a tropical disease binary classification problem.
2. We evaluate the effectiveness of the algorithm in updating the n-grams word dictionary by measuring its classification accuracy.

TABLE 1
EXAMPLE OF POSITIVE THESAURUS

N-grams word                  Frequency   Weight
penyakit kaki gajah           764         1.000
demam berdarah dengue         639         0.836
penyakit demam berdarah       297         0.389
virus flu burung              262         0.343
pemberantasan sarang nyamuk   241         0.315
nyamuk aedes aegypti          240         0.314
kasus flu burung              207         0.271
penderita demam berdarah      170         0.223
pasien demam berdarah         163         0.213
kasus demam berdarah          159         0.208
penderita kaki gajah          158         0.207

The rest of this paper is organized as follows. We discuss the problem setting in Section II. Then, we describe the process of constructing a dynamic n-grams word dictionary in Section III. We discuss the dataset and experimental results in Section IV, and last, we conclude the work in Section V.

II. PROBLEM SETTING

We define a corpus C as a set of tropical disease web pages, T_pos as a positive thesaurus, and T_neg as a negative thesaurus. Given a web page W, the objective is to classify W with a class label c as either a tropical disease incidence {+1} or not {-1}. Thus, c belongs to the set {+1, -1} and the task is a binary classification problem. For an n-grams word t, we define its term frequency in C as tf(t,C) and the maximum term frequency as max(tf(t,C)). We denote a weight w_t as the normalized frequency value of n-grams word t such that w_t = tf(t,C)/max(tf(t,C)).

A feature f is a set of numerical features constructed from the title section and the top, middle, and bottom parts of W content. Different weights are given to the top, middle, and bottom parts of W content, i.e. 0.5, 0.3, and 0.2 respectively. With a slight change of the formula notation introduced in [6], we define a numerical feature as the total count of n-grams words t in a certain section that are found in a thesaurus T_pos or T_neg divided by the total possible n-grams words that can be formed in that section. Hence, a total of 24 features are calculated for each W, i.e. three different n-grams multiplied by two kinds of thesaurus and multiplied by four sections of W. We describe each numerical feature as follows:

1) A uni-gram feature of positive class in the title section, denoted as f_{uni-gram, positive, title}, is the total count of uni-gram words in the title that are found in positive thesaurus T_pos divided by the total number of uni-gram words in that section.

2) A bi-grams feature of positive class in the title section, denoted as f_{bi-grams, positive, title}, is the total count of bi-grams words in the title that are found in positive thesaurus T_pos divided by the total number of bi-grams words in that section.

3) A tri-grams feature of positive class in the title section, denoted as f_{tri-grams, positive, title}, is the total count of tri-grams words in the title that are found in positive thesaurus T_pos divided by the total number of tri-grams words in that section.

4) A uni-gram feature of negative class in the title section, denoted as f_{uni-gram, negative, title}, is the total count of uni-gram words in the title that are found in negative thesaurus T_neg divided by the total number of uni-gram words in that section.

5) A bi-grams feature of negative class in the title section, denoted as f_{bi-grams, negative, title}, is the total count of bi-grams words in the title that are found in negative thesaurus T_neg divided by the total number of bi-grams words in that section.

6) A tri-grams feature of negative class in the title section, denoted as f_{tri-grams, negative, title}, is the total count of tri-grams words in the title that are found in negative thesaurus T_neg divided by the total number of tri-grams words in that section.

7) A uni-gram feature of positive class in the top part of a content, denoted as f_{uni-gram, positive, top}, is the total count of uni-gram words in the top part of a content that are found in positive thesaurus T_pos divided by the total number of uni-gram words in that part.

8) A bi-grams feature of positive class in the top part of a content, denoted as f_{bi-grams, positive, top}, is the total count of bi-grams words in the top part of a content that are found in positive thesaurus T_pos divided by the total number of bi-grams words in that part.

9) A tri-grams feature of positive class in the top part of a content, denoted as f_{tri-grams, positive, top}, is the total count of tri-grams words in the top part of a content that are found in positive thesaurus T_pos divided by the total number of tri-grams words in that part.

10) A uni-gram feature of negative class in the top part of a content, denoted as f_{uni-gram, negative, top}, is the total count of uni-gram words in the top part of a content that are found in negative thesaurus T_neg divided by the total number of uni-gram words in that part.

11) A bi-grams feature of negative class in the top part of a content, denoted as f_{bi-grams, negative, top}, is the total count of bi-grams words in the top part of a content that are found in negative thesaurus T_neg divided by the total number of bi-grams words in that part.

12) A tri-grams feature of negative class in the top part of a content, denoted as f_{tri-grams, negative, top}, is the total count of tri-grams words in the top part of a content that are found in negative thesaurus T_neg divided by the total number of tri-grams words in that part.

The same definitions apply to the other 12 features that form the numerical instances of the pages, i.e. f_{uni-gram, positive, middle}, f_{bi-grams, positive, middle}, f_{tri-grams, positive, middle}, f_{uni-gram, negative, middle}, f_{bi-grams, negative, middle}, f_{tri-grams, negative, middle}, f_{uni-gram, positive, bottom}, f_{bi-grams, positive, bottom}, f_{tri-grams, positive, bottom}, f_{uni-gram, negative, bottom}, f_{bi-grams, negative, bottom}, and f_{tri-grams, negative, bottom}. These features are then organized in SVM-light format [8]. Although our problem setting focuses only on a set of tropical disease web pages and an n-grams word dictionary for a binary classification problem, the same approach and setting can be adapted to address a multi-category classification problem in other domains.

III. CONSTRUCTING A DYNAMIC THESAURUS

A fundamental problem when using a static thesaurus to generate numerical features is that not all n-grams words can be collected in a thesaurus, or they can be found in a thesaurus but with a very low frequency. This occurs because the training set used to build a thesaurus is small and its coverage is not big enough to handle all n-grams words with adequate frequency. Therefore, to overcome this issue, an algorithm to automatically update the n-grams word dictionary for the positive class is proposed in this paper. Two important questions arise. First, how to automatically construct a dynamic thesaurus that can grow its content and update its frequency without degrading the classification accuracy. Second, how to decide when the newly updated thesaurus should replace the old one. We discuss these two questions in this section and show the experimental results in Section IV.

There are several steps to dynamically update the positive class thesaurus, as depicted in Figure 1. First, a set of unclassified web pages is collected and cleaned in the Repotropical database [9]. Then, the numerical features of the web pages are generated using the existing positive and negative dictionaries. After that, the pages are classified using the SVM classifier built in [6]. A list of uni-gram, bi-grams, and tri-grams is extracted from the set of new tropical disease web pages categorized as positive, but before the extraction is done, all stopwords in the cleaned web pages are removed. This is to ensure that no important words are eliminated just because their position in a sentence is near a stopword. Let us call this list NEW_LIST. The existing positive class dictionary, named CURR_DICT, is replicated and its content is updated using all n-grams words in NEW_LIST. Let us call this dictionary REPLICA_DICT.

If an n-grams word in NEW_LIST is found in REPLICA_DICT, then its frequency is updated; otherwise the n-grams word is added to REPLICA_DICT with its frequency set to 1. The process continues until all n-grams words in NEW_LIST are examined and evaluated. Finally, the frequency of each n-grams word in REPLICA_DICT is normalized. To measure how important an n-grams word is in the positive and negative dictionaries, the elimination ratio suggested in [10] is used. If the ratio is greater than a given threshold, i.e. 0.5 in this experiment, then the n-grams word with the smaller weight is eliminated from the dictionary. However, if the ratio is smaller than the given threshold, then the n-grams word is removed from both dictionaries.

After the update of REPLICA_DICT is completed, two feature datasets are created (see steps 6 and 7 in Figure 1). The first feature dataset is created using the existing n-grams word dictionaries, both positive and negative, while the second feature dataset is created using the negative class thesaurus and REPLICA_DICT (the newly updated positive class dictionary). Both feature datasets are then classified using the SVM classifier built in [6] and their classification accuracies are compared. If the accuracy on the second feature dataset is better than or equal to the accuracy on the first dataset, then REPLICA_DICT is kept and used as the new updated positive class dictionary. Otherwise, no changes are made to the existing positive dictionary and REPLICA_DICT is discarded.

Figure 1. An algorithm to automatically update n-grams word thesaurus.
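The update and replacement steps of Section III can be illustrated with a minimal Python sketch. It rests on several assumptions that the text does not confirm: a dictionary is stored as a mapping from n-grams word to raw frequency, "updated" means the frequency is raised by one per occurrence in NEW_LIST, and the function names (`update_replica`, `normalize`, `choose_dictionary`) are hypothetical. The elimination ratio of [10] is left out because its formula is not given here.

```python
from collections import Counter


def update_replica(curr_dict, new_list):
    """Copy the current positive dictionary and fold in NEW_LIST.

    curr_dict maps an n-grams word to its raw frequency (assumed layout).
    new_list is an iterable of n-grams words, one entry per occurrence.
    """
    replica = Counter(curr_dict)  # replicate CURR_DICT; the original is untouched
    for ngram in new_list:
        # A known word has its frequency raised by one (assumption);
        # an unseen word enters the dictionary with frequency 1.
        replica[ngram] += 1
    return dict(replica)


def normalize(freqs):
    """Weight w_t = tf(t, C) / max(tf(t, C)), as defined in Section II."""
    if not freqs:
        return {}
    top = max(freqs.values())
    return {ngram: tf / top for ngram, tf in freqs.items()}


def choose_dictionary(curr_dict, replica_dict, acc_curr, acc_replica):
    """Keep REPLICA_DICT only if it does at least as well as CURR_DICT."""
    return replica_dict if acc_replica >= acc_curr else curr_dict


# Two entries from Table 1 plus a small, made-up NEW_LIST.
curr = {"penyakit kaki gajah": 764, "demam berdarah dengue": 639}
new = ["demam berdarah dengue", "virus flu burung"]
replica = update_replica(curr, new)
weights = normalize(replica)
```

In this sketch the accuracies passed to `choose_dictionary` would come from classifying the two feature datasets with the SVM classifier of [6]; ties favor REPLICA_DICT, matching the "better than or equal to" rule in the text.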

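Both feature datasets compared in the algorithm rely on the 24 ratios defined in Section II. A minimal sketch of that computation follows, assuming each section of a page is already tokenized and each thesaurus is available as a set of n-grams words. The names `ngrams` and `page_features` and the data layout are hypothetical, and the section weights 0.5, 0.3, and 0.2 are not applied because the text does not state how they enter the computation.

```python
from itertools import product

SECTIONS = ("title", "top", "middle", "bottom")
CLASSES = ("positive", "negative")
ORDERS = (1, 2, 3)  # uni-gram, bi-grams, tri-grams


def ngrams(tokens, n):
    """All contiguous n-token sequences, joined by single spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def page_features(sections, thesauri):
    """Return the 24 ratios keyed by (n, class, section).

    sections: section name -> list of tokens (assumed already tokenized).
    thesauri: class name -> set of n-grams words (hypothetical layout).
    """
    features = {}
    for n, cls, sec in product(ORDERS, CLASSES, SECTIONS):
        grams = ngrams(sections.get(sec, []), n)
        hits = sum(1 for g in grams if g in thesauri[cls])
        # Thesaurus hits divided by all n-grams in the section;
        # 0.0 when the section is too short to form any n-gram.
        features[(n, cls, sec)] = hits / len(grams) if grams else 0.0
    return features


page = {
    "title": "kasus demam berdarah dengue".split(),
    "top": [],
    "middle": [],
    "bottom": [],
}
thesauri = {
    "positive": {"demam berdarah dengue", "demam berdarah"},
    "negative": set(),
}
feats = page_features(page, thesauri)
```

For the example title, one of its two tri-grams and one of its three bi-grams appear in the positive set, so the corresponding features are 1/2 and 1/3. Writing the resulting vector in SVM-light format [8] is a separate step that is not shown.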

IV. EXPERIMENTAL RESULTS

A. Dataset

A collection of 939 categorized web pages, collected in the Repotropical database [9], was used as a benchmarking testing set. The testing set was broken up into ten chunks: the first nine contain 100 web pages each and the 10th chunk contains the remaining 39 web pages. The distribution of the benchmarking testing set for each class label is listed in Table 2.

TABLE 2
TESTING DATASETS BY CLASS LABELS

Testing Set   Total Web Pages
              {+1}    {-1}
I             60      40
II            70      30
III           80      20
IV            80      20
V             75      25
VI            75      25
VII           75      25
VIII          70      30
IX            70      30
X             20      19
Total         675     264

Figure 2. Precision of each testing dataset using new and old dictionaries.
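Section IV.B compares the two feature datasets by Precision, Recall, and F-Measure, and Section V quotes an improvement of about 32.90%. The sketch below assumes the standard balanced F-measure (harmonic mean of precision and recall), since the paper does not spell out the formula; the function names are hypothetical. It is applied to the reported average Precision and Recall, which need not reproduce the reported F-Measure averages exactly, because averaging per-set F-Measures differs from taking the F-Measure of averaged inputs.

```python
def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def relative_gain(old, new):
    """Relative improvement of new over old, in percent."""
    return (new - old) / old * 100


# Reported averages (in percent) for the old and new dictionaries.
f_old = f_measure(94.68, 41.83)
f_new = f_measure(89.40, 67.49)

# Relative change between the reported F-Measure averages.
gain = relative_gain(57.75, 76.75)
```

Under these assumptions `f_old` and `f_new` come out near 58.0 and 76.9, close to the reported 57.75% and 76.75%, and `gain` is about 32.90, which suggests the quoted improvement is a relative change rather than a difference in percentage points.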

B. Effectiveness Evaluation of Updating Thesaurus

We used 5,700 manually categorized web pages, consisting of 2,970 web pages classified as {+1} and 2,730 web pages classified as {-1}, to evaluate the effectiveness of our proposed algorithm to automatically update the n-grams word dictionary or thesaurus. As mentioned in Section III, web pages are tokenized to create new n-grams words (NEW_LIST) as depicted in Figure 1. Using this NEW_LIST, the content of the existing positive thesaurus is updated.

We evaluated the effectiveness of the algorithm using the testing set listed in Table 2. A side-by-side comparison of the Precision of each testing set is depicted in Figure 2 and a side-by-side comparison of the Recall of each testing set is shown in Figure 3. On average, the Precision of the first feature dataset, i.e. the features of each testing set generated using the existing dictionary, is 94.68%, while the Precision of the second feature dataset, i.e. the features of each testing set generated using the negative class thesaurus and REPLICA_DICT (the newly updated and normalized n-grams word thesaurus of the positive class), is 89.40%. Moreover, on average, the Recall of the first feature dataset is 41.83%, whereas the Recall of the second feature dataset is 67.49%. Based on those two metrics, we calculated the F-Measure of each feature dataset and found that on average the classification accuracy of the first feature dataset is 57.75%, while the accuracy of the second feature dataset is 76.75%. Figure 4 shows the side-by-side comparison of the F-Measure of each testing set.

Figure 3. Recall of each testing dataset using new and old dictionaries.

Figure 4. F-measure of each testing dataset using new and old dictionaries.

V. CONCLUSION

We have shown that the algorithm for automatically updating the n-grams word dictionary works well and increases the overall classification accuracy by about 32.90% in relative terms. On average, the accuracy of the feature dataset generated using the existing (old) dictionary is 57.75%, while the accuracy of the feature dataset generated using the updated (new) dictionary is 76.75%. Therefore, the algorithm is effective and is recommended for updating a thesaurus used for web

classification. In future work, we plan to analyze the effectiveness of the proposed algorithm when it is applied to multi-class classification problems. We expect that the algorithm will also work well.

VI. ACKNOWLEDGMENT

This work is supported by the Ministry of Research, Technology, and Higher Education, Republic of Indonesia through National Strategic Research Grant, contract number 132/UN11.2/LT/SP3/2015.

REFERENCES

[1] Worldwidewebsize.com, accessed on July 10, 2015.
[2] AOL Inc., the DMOZ Open Directory Project (ODP), www.dmoz.org, July, 2015.
[3] D. de Kok and H. Brouwer, Natural Language Processing for the Working Programmer, www.nlpwp.org/book, cited on May 2, 2015.
[4] G. A. Fink, Markov Models for Pattern Recognition: From Theory to Applications, pp. 107-127, Springer London, 2014.
[5] Z. Shui-geng, G. Ji-hong, H. Yan-xiang, “Hierarchical Classification of Chinese Documents Based on N-grams”, Wuhan University Journal of Natural Sciences, vol. 6, no. 1-2, pp. 416-422, 2001.
[6] T. F. Abidin, R. Ferdhiana, H. Kamil, “Learning to Classify Tropical Disease Web Pages from Large Indonesian Web Documents”, Proc. of the 4th International Conference on Computer and Electrical Engineering, Singapore, October 2011.
[7] Bollegala, D. Weir, and J. Carroll, “Cross-Domain Sentiment Classification Using a Sentiment Sensitive Thesaurus”, IEEE Transactions on Knowledge and Data Engineering, vol. 25(8), August 2013.
[8] Joachims, “Making Large-Scale SVM Learning Practical”, Advances in Kernel Methods - Support Vector Learning, B. Scholkopf, C. Burges and A. Smola (ed.), MIT Press, 1999.
[9] T. F. Abidin, M. Subianto, T. A. Gani, R. Ferdhiana, “Periodic Update and Automatic Extraction of Web Data for Creating a Google Earth Based Tool”, Proc. of the International Conference on Adv. Computer Science and Information Systems, Depok, October 10-11, 2015.
[10] T. F. Abidin, R. Ferdhiana, H. Kamil, “Automatic Extraction of Place Entities and Sentences Containing the Date and Number of Victims of Tropical Disease Incidence from the Web”, Journal of Emerging Technologies in Web Intelligence, vol. 5(3), pp. 302-309, August 2013.