Lecture Notes in Artificial Intelligence 5476 Edited by R. Goebel, J. Siekmann, and W. Wahlster

Subseries of Lecture Notes in Computer Science

Thanaruk Theeramunkong Boonserm Kijsirikul Nick Cercone Tu-Bao Ho (Eds.)

Advances in Knowledge Discovery and Data Mining

13th Pacific-Asia Conference, PAKDD 2009, Bangkok, Thailand, April 27-30, 2009, Proceedings

Series Editors Randy Goebel, University of Alberta, Edmonton, Canada Jörg Siekmann, University of Saarland, Saarbrücken, Germany Wolfgang Wahlster, DFKI and University of Saarland, Saarbrücken, Germany

Volume Editors

Thanaruk Theeramunkong Sirindhorn International Institute of Technology, Thammasat University, 131 Moo 5 Tiwanont Road, Bangkadi, Muang, Pathumthani 12000, Thailand E-mail: [email protected]

Boonserm Kijsirikul Chulalongkorn University, Faculty of Engineering, Department of Computer Engineering, Bangkok 10330, Thailand E-mail: [email protected]

Nick Cercone York University, Faculty of Science & Engineering 355 Lumbers Building, 4700 Keele Street, Toronto ON M3J 1P3, Canada E-mail: [email protected]

Tu-Bao Ho Japan Advanced Institute of Science and Technology, School of Knowledge Science, 1-1 Asahidai, Nomi, Ishikawa 923-1292, Japan E-mail: [email protected]

Library of Congress Control Number: Applied for

CR Subject Classification (1998): I.2, H.2.8, H.3, H.5.1, G.3, J.1, K.4

LNCS Sublibrary: SL 7 – Artificial Intelligence

ISSN 0302-9743 ISBN-10 3-642-01306-6 Springer Berlin Heidelberg New York ISBN-13 978-3-642-01306-5 Springer Berlin Heidelberg New York

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law.

springer.com
© Springer-Verlag Berlin Heidelberg 2009
Printed in Germany
Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India
Printed on acid-free paper SPIN: 12663194 06/3180 543210

Preface

The Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD) has been held every year since 1997. PAKDD 2009, the 13th in the series, was held in Bangkok, Thailand during April 27-30, 2009. PAKDD is a major international conference in the areas of data mining (DM) and knowledge discovery in databases (KDD). It provides an international forum for researchers and industry practitioners to share their new ideas, original research results and practical development experiences from all KDD-related areas including data mining, data warehousing, machine learning, databases, statistics, knowledge acquisition and automatic scientific discovery, data visualization, causal induction and knowledge-based systems.

For PAKDD 2009, we received 338 research papers from various countries and regions in Asia, Australia, North America, South America, Europe, and Africa. Every submission was rigorously reviewed by at least three reviewers under a double-blind protocol. The initial results were discussed among the reviewers and finally judged by the Program Committee Chairs. When there was a conflict, an additional review was provided by the Program Committee Chairs. The Program Committee members were deeply involved in the highly selective process. As a result, only 39 papers (approximately 11.5% of the 338 submitted papers) were accepted as regular papers, and 73 papers (21.6%) were accepted as short papers.

The PAKDD 2009 conference program also included five workshops: the Pacific Asia Workshop on Intelligence and Security Informatics (PAISI 2009), a workshop on Advances and Issues in Biomedical Data Mining (AIBDM 2009), a workshop on Data Mining with Imbalanced Classes and Error Cost (ICEC 2009), a workshop on Open Source in Data Mining (OSDM 2009), and a workshop on Quality Issues, Measures of Interestingness and Evaluation of Data Mining Models (QIMIE 2009).

PAKDD 2009 would not have been successful without the support of committee members, reviewers, workshop organizers, tutorial speakers, invited speakers, competition organizers, organizing staff, and supporting organizations. We are indebted to the members of the Steering Committee for their invaluable suggestions and support throughout the organization process. We highly appreciate the Program Committee members and external reviewers for their technical effort in providing straightforward scientific comments and impartial judgments in the review process of PAKDD 2009. We thank our Tutorial Co-chairs Vincent S. Tseng and Shusaku Tsumoto for kindly coordinating the fruitful tutorials. We wish to thank our General Workshop Co-chairs Manabu Okumura and Bernhard Pfahringer for selecting and coordinating the great workshops. Many thanks are given to the distinguished keynote speakers, invited speakers and tutorial presenters for their attractive and motivational talks and lectures. We thank the General Co-chairs Masaru Kitsuregawa and Vilas Wuwongse for their useful guidance and sharp advice on various aspects of the conference arrangements. We are also grateful to the Local Arrangements Chair Chotirat Ratanamahatana and our Local Arrangements Committee at both Thammasat University and Chulalongkorn University for their unlimited help toward the success of the conference. Last but not least, we would like to give special thanks to Cholwich Nattee, who arranged the publication of PAKDD 2009 in the Lecture Notes in Computer Science series, and to Wirat Chinnan and Swit Phuvipadawat for their support of the PAKDD 2009 conference website. While the arrangement of PAKDD 2009 involved so many people, we would like to extend an additional thank-you to those who contributed to PAKDD 2009 but whose names may not be listed.

We greatly appreciate the support from various institutions. The conference was organized by the Sirindhorn International Institute of Technology (SIIT), Thammasat University (TU) and co-organized by the Department of Computer Engineering, Faculty of Engineering, Chulalongkorn University (CU), and the Asian Institute of Technology (AIT). It was sponsored by the National Electronics and Computer Technology Center (NECTEC, Thailand), the Thailand Convention and Exhibition Bureau (TCEB), and the Air Force Office of Scientific Research/Asian Office of Aerospace Research and Development (AFOSR/AOARD).

Finally, we wish to thank all authors and all conference participants for their contribution and support. We hope all participants took this opportunity to share and exchange ideas with each other and enjoyed PAKDD 2009 in the wonderful city of Bangkok.

February 2009
Thanaruk Theeramunkong
Boonserm Kijsirikul
Nick Cercone
Ho Tu Bao

Organization

PAKDD 2009 is organized by Sirindhorn International Institute of Technology of Thammasat University, Department of Computer Engineering of Chulalongkorn University, and the School of Engineering and Technology of Asian Institute of Technology.

PAKDD 2009 Conference Committee

Honorary Chairs
David Cheung, University of ,
Hiroshi Motoda, Osaka University, Japan

Local Honorary Chairs
Surapon Nitikraipot, Rector of Thammasat University, Thailand
Pirom Kamolratanakul, Rector of Chulalongkorn University, Thailand
Said Irandoust, President of AIT, Thailand

General Chairs (Conference Chairs)
Masaru Kitsuregawa, Tokyo University, Japan
Vilas Wuwongse, Asian Institute of Technology, Thailand

Program Committee Chairs
Thanaruk Theeramunkong, SIIT, Thammasat University, Thailand
Boonserm Kijsirikul, Chulalongkorn University, Thailand
Nick Cercone, York University, Canada
Ho Tu Bao, JAIST, Japan

Workshop Chairs
Manabu Okumura, Tokyo Institute of Technology, Japan
Bernhard Pfahringer, University of Waikato, New Zealand

Tutorial Chairs
Vincent S. Tseng, National Cheng Kung University,
Shusaku Tsumoto, Shimane University, Japan

Journal Publication
Yasushi Sakurai, NTT, Japan
Nick Cercone, York University, Canada

Local Arrangements Committee

Chair
Chotirat Ratanamahatana, Chulalongkorn University, Thailand

Members
Dararat Srisai, Chulalongkorn University, Thailand
Ithipan Methasate, SIIT, Thammasat University, Thailand
Juniar Ganis, SIIT, Thammasat University, Thailand
Kovit Punyasoponlert, Chulalongkorn University, Thailand
Nattapong Tongtep, SIIT, Thammasat University, Thailand
Nichnan Kittiphattanabawon, SIIT, Thammasat University, Thailand
Pakinee Aimmanee, SIIT, Thammasat University, Thailand
Pasakorn Tangchanachaianan, Chulalongkorn University, Thailand
Peerasak Intarapaiboon, SIIT, Thammasat University, Thailand
Piya Limcharoen, SIIT, Thammasat University, Thailand
Ratthachat Chatpatanasiri, Chulalongkorn University, Thailand
Sudchaya Saengthong, SIIT, Thammasat University, Thailand
Surapa Thiemjarus, SIIT, Thammasat University, Thailand
Swit Phuvipadawat, SIIT, Thammasat University, Thailand
Tanasanee Phienthrakul, Chulalongkorn University, Thailand
Thanasan Tanhermhong, SIIT, Thammasat University, Thailand
Thatsanee Charoenporn, NECTEC, Thailand
Thawatchai Suwannapong, SIIT, Thammasat University, Thailand
Vit Niennattrakul, Chulalongkorn University, Thailand
Warakorn Gulyanon, SIIT, Thammasat University, Thailand
Wirat Chinnan, SIIT, Thammasat University, Thailand

Publication Chairs
Cholwich Nattee, SIIT, Thammasat University, Thailand
Jakkrit TeCho, SIIT, Thammasat University, Thailand

Publicity Chairs
Chutima Pisarn, Prince of Songkla University, Thailand
Kritsada Sriphaew, Tokyo Institute of Technology, Japan
Thatsanee Charoenporn, NECTEC, Thailand

PAKDD 2009 Program Committee

Chairs and Co-chairs
Thanaruk Theeramunkong, SIIT, Thammasat University, Thailand
Boonserm Kijsirikul, Chulalongkorn University, Thailand
Nick Cercone, York University, Canada
Ho Tu Bao, JAIST, Japan

Members

Ah-Hwee Tan
Aidong Zhang
Aijun An
Aixin Sun
Akihiro Inokuchi
Akira Shimazu
Aleksandar Lazarevic
Alfredo Cuzzocrea
Alipio M. Jorge
Alok Choudhary
Amanda Clare
Ambuj K. Singh
Annalisa Appice
Anne M. Denton
Anthony Bagnall
Aris Anagnostopoulos
Ashkan Sami
Atsuyoshi Nakamura
Aurawan Imsombut
Baoning Wu
Beatriz de la Iglesia
Ben Kao
Benjamin C.M. Fung
Bernhard Pfahringer
Bettina Berendt
Bradley Malin
Carlos Alberto Alejandro Castillo Ocaranza
Chai Wutiwiwatchai
Chandan Reddy
Chang-Tien Lu
Chaveevan Pechsiri
Chengkai Li
Chengqi Zhang
Chih-Jen Lin
Choochart Haruechaiyasak
Chotirat Ann Ratanamahatana
Christian Dawson
Christophe Giraud-Carrier
Chun-hung Li
Chung-Hong Lee
Chunsheng Yang
Chutima Pisarn
Claudio Lucchese
Clement Yu
Dacheng Tao
Daisuke Ikeda
Daniel C. Neagu
Dao-Qing Dai
Daoqiang Zhang
David Taniar
Daxin Jiang
Dejing Dou
Dell Zhang
Demetris Zeinalipour
Desheng Dash Wu
Di Wu
Diane Cook
Diansheng Guo
Dimitrios Katsaros
Dimitris Margaritis
Dit-Yan Yeung
Doina Caragea
Domenico Talia
Dou Shen
Dragan Gamberger
Du Zhang
Eamonn Keogh
Ee-Peng Lim
Eibe Frank
Evaggelia Pitoura
Evimaria Terzi
Fabian Moerchen
Fabio Roli
Fabrizio Silvestri
Feifei Li
Fernando Berzal
Francesco Bonchi
Francesco Masulli
Gabriel Fung
Gang Li
Gao Cong
Gemma Garriga
George Karypis
Georges Grinstein
Giovanni Semeraro
Giuseppe Manco
Graham Williams

Grigorios Tsoumakas
Guido Cervone
Guozhu Dong
Hai Wang
Haimonti Dutta
Hideo Bannai
Hiroki Arimura
Hiroyuki Kawano
Hiroyuki Kitagawa
Hisashi Kashima
Hisham Al-Mubaid
Hong Gao
Howard Ho
Hsin-Chang Yang
Hsin-Vonn Seow
Hua Lu
Hui Wang
Hui Xiong
Hui Yang
Huidong Jin
Huiyu Zhou
Hung Son Nguyen
Ira Assent
Ivor W. Tsang
Jaakko Hollmen
Jake Chen
Jan Ramon
Jan Rauch
Jason T.L. Wang
Jean-Gabriel Gustave Ganascia
Jean-Marc Petit
Jeremy Besson
Jialie Shen
Jian Yin
Jianyong Wang
Jieping Ye
Jimmy Huang
Jin Tian
Jing Peng
Jinyan Li
Jiong Yang
João P. Gama
Joern Schneidewind
Johannes Fürnkranz
John Keane
Josep Domingo-Ferrer
Juggapong Natwichai
Junbin Gao
Jure Leskovec
K. Selcuk Candan
Kaidi Zhao
Kaiqi Huang
Kanishka Bhaduri
Kay Chen Tan
Keith C.C. Chan
Kevin Curran
Kitsana Waiyamai
Konstantinos Kalpakis
Kun Liu
Latifur Rahman Khan
Limsoon Wong
Lipo Wang
Lisa Hellerstein
Longbing Cao
Luis Torgo
Marco Maggini
Marut Buranarach
Masashi Shimbo
Masoud Jamei
Maybin Muyeba
Mehmet Koyuturk
Michael Schmidt
Michelangelo Ceci
Min Yao
Ming Hua
Mingli Song
Mithun Prasad
Mitsunori Ogihara
Mohamed F. Mokbel
Mohamed Medhat Gaber
Myra Spiliopoulou
N.Ch. Sriman Narayana Iyengar
Ngoc Thanh Nguyen
Nikunj Chandrakant Oza
Ning Zhong
Ninghui Li
Nucharee Premchaiswadi
Orlando De Jesus
Osman Abul
P.K. Mahanti

Panagiotis Karras
Pang-Ning Tan
Patricia Riddle
Paulo Cortez
Petra Kralj Novak
Petros Drineas
Philippe Lenca
Qingxiang Wu
Radha Krishna Murthy Karuturi
Raj Krishna Bhatnagar
Rajendra Akerkar
Rajesh Reghunadhan
Reda Alhajj
Richi Nayak
Ronald Rousseau
Rosa Meo
Rui Camacho
Ruoming Jin
Seiji Yamada
Salvatore Orlando
San-Yih Hwang
Sanjay Ranka
Sanparith Marukatat
Satoshi Oyama
Shen-Shyang Ho
Sheng Zhong
Shenghuo Zhu
Shichao Zhang
Shu-Ching Chen
Shun Ishizaki
Silvia Chiusano
Spiros Papadimitriou
Srikanta Tirthapura
Srinivasan Jagannathan
Stefan Rueping
Suman Nath
Sung Ho Ha
Surapa Thiemjarus
Szymon Jaroszewicz
Tadashi Nomoto
Takeaki Uno
Takehisa Yairi
Takenobu Tokunaga
Tamas Sarlos
Tamer Kahveci
Taneli Mielikäinen
Tansel Ozyer
Tanya Y. Berger-Wolf
Tao Li
Tao Mei
Tetsuya Yoshida
Thanaruk Theeramunkong
Themis Palpanas
Thepchai Supnithi
Tianhao Zhang
Tie-Yan Liu
Tim Oates
Tina Eliassi-Rad
Tom Croonenborghs
Tomoyuki Uchida
Torsten Suel
Toshihiro Kamishima
Toshiro Minami
Traian Marius Truta
Tru Cao
Tsuyoshi Murata
Ulf Brefeld
Vagelis Hristidis
Vasilis George Aggelis
Vasilis Megalooikonomou
Vassilis Athitsos
Vincent C.S. Lee
Vincent S. Tseng
Vincenzo Piuri
Virach Sortlertlamvanich
Wagner Meira Jr.
Wai Lam
Wei Fan
Wen-Chih Peng
Wenliang Du
Wilfred Ng
William K. Cheung
Wlodek Zadrozny
Wolfgang Lehner
Woong-Kee Loh
Wynne Hsu
Xiangjun Dong
Xiao-Lin Li
Xiaofeng Meng
Xiaohui Liu

Xiaolei Li
Xiaoli Li Li
Xiaowei Shao
Xindong Wu
Xingquan Zhu
Xintao Wu
Xue Li
Xuelong Li
Yan Zhou
Yang Xiang
Yang Zhang
Yang-Sae Moon
Yanwei Pang
Yasuhiko Morimoto
Yi Feng
Yi-Dong Shen
Yi-Ping Phoebe Chen
Yifeng Zeng
Yihua Wu
Ying Tan
Yiyu Yao
Yong Guan
Yuan Yuan
Yun Fu
Yutaka Matsuo
Zhanhuai Li
Zhaohui Tang
Zhaoyang Dong
Zheng Chen
Zhi-Hua Zhou
Zhongfei (Mark) Zhang
Zhuoming Xu

PAKDD 2009 External Reviewers

Daan He
Ioannis Katakis
Jiye Li
Ratthachat Chatpatanasiri
Xiangdong An

Organized by

Sirindhorn International Institute of Technology Thammasat University

Chulalongkorn University

Asian Institute of Technology

Sponsoring Institutions

National Electronics and Computer Technology Center (NECTEC), Thailand

Thailand Convention and Exhibition Bureau (TCEB), Thailand

The Air Force Office of Scientific Research, Asian Office of Aerospace Research and Development (AFOSR/AOARD), USA

Table of Contents

Keynote Speeches

KDD for BSN – Towards the Future of Pervasive Sensing ...... 1 Guang-Zhong Yang

Finding Hidden Structures in Relational Databases ...... 2 Jeffrey Xu Yu

The Future of Search: An Online Content Perspective ...... 3 Andrew Tomkins

Regular Papers

DTU: A Decision Tree for Uncertain Data ...... 4 Biao Qin, Yuni Xia, and Fang Li

Efficient Privacy-Preserving Link Discovery ...... 16 Xiaoyun He, Jaideep Vaidya, Basit Shafiq, Nabil Adam, Evimaria Terzi, and Tyrone Grandison

On Link Privacy in Randomizing Social Networks ...... 28 Xiaowei Ying and Xintao Wu

Sentence-Level Novelty Detection in English and Malay ...... 40 Agus T. Kwee, Flora S. Tsai, and Wenyin Tang

Text Categorization Using Fuzzy Proximal SVM and Distributional Clustering of Words ...... 52 Mani Arun Kumar and Madan Gopal

Cool Blog Classification from Positive and Unlabeled Examples ...... 62 Kritsada Sriphaew, Hiroya Takamura, and Manabu Okumura

Thai Word Segmentation with Hidden Markov Model and Decision Tree ...... 74 Poramin Bheganan, Richi Nayak, and Yue Xu

An Efficient Method for Generating, Storing and Matching Features for Text Mining ...... 86 Shing-Kit Chan and Wai Lam

Robust Graph Hyperparameter Learning for Graph Based Semi-supervised Classification ...... 98 Krikamol Muandet, Sanparith Marukatat, and Cholwich Nattee

Regularized Local Reconstruction for Clustering ...... 110 Jun Sun, Zhiyong Shen, Bai Su, and Yidong Shen

Clustering with Lower Bound on Similarity ...... 122 Mohammad Al Hasan, Saeed Salem, Benjarath Pupacdi, and Mohammed J. Zaki

Approximate Spectral Clustering ...... 134 Liang Wang, Christopher Leckie, Kotagiri Ramamohanarao, and James Bezdek

An Integration of Fuzzy Association Rules and WordNet for Document Clustering ...... 147 Chun-Ling Chen, Frank S.C. Tseng, and Tyne Liang

Nonlinear Data Analysis Using a New Hybrid Data Clustering Algorithm...... 160 Ureerat Wattanachon, Jakkarin Suksawatchon, and Chidchanok Lursinsap

A Polynomial-Delay Polynomial-Space Algorithm for Extracting Frequent Diamond Episodes from Event Sequences ...... 172 Takashi Katoh, Hiroki Arimura, and Kouichi Hirata

A Statistical Approach for Binary Vectors Modeling and Clustering .... 184 Nizar Bouguila and Khalid Daoudi

Multi-resolution Boosting for Classification and Regression Problems ... 196 Chandan K. Reddy and Jin-Hyeong Park

Interval Data Classification under Partial Information: A Chance-Constraint Approach ...... 208 Sahely Bhadra, J. Saketha Nath, Aharon Ben-Tal, and Chiranjib Bhattacharyya

Negative Encoding Length as a Subjective Interestingness Measure for Groups of Rules ...... 220 Einoshin Suzuki

The Studies of Mining Frequent Patterns Based on Frequent Pattern Tree ...... 232 Show-Jane Yen, Yue-Shi Lee, Chiu-Kuang Wang, Jung-Wei Wu, and Liang-Yu Ouyang

Discovering Periodic-Frequent Patterns in Transactional Databases ..... 242 Syed Khairuzzaman Tanbeer, Chowdhury Farhan Ahmed, Byeong-Soo Jeong, and Young-Koo Lee

Quantifying Asymmetric Semantic Relations from Query Logs by Resource Allocation ...... 254 Zhiyuan Liu, Yabin Zheng, and Maosong Sun

Acquiring Semantic Relations Using the Web for Constructing Lightweight Ontologies ...... 266 Wilson Wong, Wei Liu, and Mohammed Bennamoun

Detecting Abnormal Events via Hierarchical Dirichlet Processes ...... 278 Xian-Xing Zhang, Hua Liu, Yang Gao, and Derek Hao Hu

Active Learning for Causal Bayesian Network Structure with Non-symmetrical Entropy ...... 290 Guoliang Li and Tze-Yun Leong

A Comparative Study of Bandwidth Choice in Kernel Density Estimation for Naive Bayesian Classification ...... 302 Bin Liu, Ying Yang, Geoffrey I. Webb, and Janice Boughton

Analysis of Variational Bayesian Matrix Factorization ...... 314 Shinichi Nakajima and Masashi Sugiyama

Variational Bayesian Approach for Long-Term Relevance Feedback ..... 327 Sabri Boutemedjet and Djemel Ziou

Detecting Link Hijacking by Web Spammers ...... 339 Young-joo Chung, Masashi Toyoda, and Masaru Kitsuregawa

A Data Driven Ensemble Classifier for Credit Scoring Analysis ...... 351 Nan-Chen Hsieh, Lun-Ping Hung, and Chia-Ling Ho

A Multi-partition Multi-chunk Ensemble Technique to Classify Concept-Drifting Data Streams ...... 363 Mohammad M. Masud, Jing Gao, Latifur Khan, Jiawei Han, and Bhavani Thuraisingham

Parameter Estimation in Semi-Random Decision Tree Ensembling on Streaming Data ...... 376 Peipei Li, Qianhui Liang, Xindong Wu, and Xuegang Hu

Exploiting the Block Structure of Link Graph for Efficient Similarity Computation ...... 389 Pei Li, Yuanzhe Cai, Hongyan Liu, Jun He, and Xiaoyong Du

Online Feature Selection Algorithm with Bayesian ℓ1 Regularization .... 401 Yunpeng Cai, Yijun Sun, Jian Li, and Steve Goodison

Feature Selection for Local Learning Based Clustering ...... 414 Hong Zeng and Yiu-ming Cheung

RV-SVM: An Efficient Method for Learning Ranking SVM ...... 426 Hwanjo Yu, Youngdae Kim, and Seungwon Hwang

A Kernel Framework for Protein Residue Annotation ...... 439 Huzefa Rangwala, Christopher Kauffman, and George Karypis

Dynamic Exponential Family Matrix Factorization ...... 452 Kohei Hayashi, Jun-ichiro Hirayama, and Shin Ishii

A Nonparametric Bayesian Learning Model: Application to Text and Image Categorization ...... 463 Nizar Bouguila and Djemel Ziou

Short Papers

Safe-Level-SMOTE: Safe-Level-Synthetic Minority Over-Sampling TEchnique for Handling the Class Imbalanced Problem ...... 475 Chumphol Bunkhumpornpat, Krung Sinapiromsaran, and Chidchanok Lursinsap

Using Highly Expressive Contrast Patterns for Classification - Is It Worthwhile? ...... 483 Elsa Loekito and James Bailey

Arif Index for Predicting the Classification Accuracy of Features and Its Application in Heart Beat Classification Problem ...... 491 Muhammad Arif, Fayyaz A. Afsar, Muhammad Usman Akram, and Adnan Fida

UCI++: Improved Support for Algorithm Selection Using Datasetoids ...... 499 Carlos Soares

Accurate Synthetic Generation of Realistic Personal Information ...... 507 Peter Christen and Agus Pudjijono

An Efficient Approximate Protocol for Privacy-Preserving Association Rule Mining ...... 515 Murat Kantarcioglu, Robert Nix, and Jaideep Vaidya

Information Extraction from Thai Text with Unknown Phrase Boundaries ...... 525 Peerasak Intarapaiboon, Ekawit Nantajeewarawat, and Thanaruk Theeramunkong

A Corpus-Based Approach for Automatic Thai Unknown Word Recognition Using Ensemble Learning Techniques ...... 533 Jakkrit TeCho, Cholwich Nattee, and Thanaruk Theeramunkong

A Hybrid Approach to Improve Bilingual Multiword Expression Extraction ...... 541 Jianyong Duan, Mei Zhang, Lijing Tong, and Feng Guo

Addressing the Variability of Natural Expression in Sentence Similarity with Semantic Structure of the Sentences ...... 548 Palakorn Achananuparp, Xiaohua Hu, and Christopher C. Yang

Scalable Web Mining with Newistic ...... 556 Ovidiu Dan and Horatiu Mocian

Building a Text Classifier by a Keyword and Unlabeled Documents..... 564 Qiang Qiu, Yang Zhang, and Junping Zhu

A Discriminative Approach to Topic-Based Citation Recommendation ...... 572 Jie Tang and Jing Zhang

Romanization of Thai Proper Names Based on Popularity of Usages .... 580 Akegapon Tangverapong, Atiwong Suchato, and Proadpran Punyabukkana

Budget Semi-supervised Learning ...... 588 Zhi-Hua Zhou, Michael Ng, Qiao-Qiao She, and Yuan Jiang

When does Co-training Work in Real Data? ...... 596 Charles X. Ling, Jun Du, and Zhi-Hua Zhou

Classification of Audio Signals Using a Bhattacharyya Kernel-Based Centroid Neural Network ...... 604 Dong-Chul Park, Yunsik Lee, and Dong-Min Woo

Sparse Kernel Learning and the Relevance Units Machine ...... 612 Junbin Gao and Jun Zhang

Pairwise Constrained Clustering for Sparse and High Dimensional Feature Spaces ...... 620 Su Yan, Hai Wang, Dongwon Lee, and C. Lee Giles

Clustering Documents Using a Wikipedia-Based Concept Representation ...... 628 Anna Huang, David Milne, Eibe Frank, and Ian H. Witten

An Instantiation of Hierarchical Distance-Based Conceptual Clustering for Propositional Learning ...... 637 Ana Funes, Cesar Ferri, Jose Hernández-Orallo, and Maria José Ramírez-Quintana

Computing Substitution Matrices for Genomic Comparative Analysis ... 647 Minh Duc Cao, Trevor I. Dix, and Lloyd Allison

Mining Both Positive and Negative Impact-Oriented Sequential Rules from Transactional Data ...... 656 Yanchang Zhao, Huaifeng Zhang, Longbing Cao, Chengqi Zhang, and Hans Bohlscheid

Aggregated Subset Mining ...... 664 Albrecht Zimmermann and Björn Bringmann

Hot Item Detection in Uncertain Data ...... 673 Thomas Bernecker, Hans-Peter Kriegel, Matthias Renz, and Andreas Zuefle

Spanning Tree Based Attribute Clustering ...... 681 Yifeng Zeng, Jorge Cordero Hernandez, and Shuyuan Lin

The Effect of Varying Parameters and Focusing on Bus Travel Time Prediction ...... 689 João M. Moreira, Carlos Soares, Alípio M. Jorge, and Jorge Freire de Sousa

Transfer Learning Action Models by Measuring the Similarity of Different Domains ...... 697 Hankui Zhuo, Qiang Yang, and Lei Li

On Optimal Rule Mining: A Framework and a Necessary and Sufficient Condition of Antimonotonicity ...... 705 Yannick Le Bras, Philippe Lenca, and Stéphane Lallich

Discovering Action Rules That Are Highly Achievable from Massive Data ...... 713 Einoshin Suzuki

Extracting Fuzzy Rules for Detecting Ventricular Arrhythmias Based on NEWFM ...... 723 Dong-Kun Shin, Sang-Hong Lee, and Joon S. Lim

Trace Mining from Distributed Assembly Databases for Causal Analysis ...... 731 Shohei Hido, Hirofumi Matsuzawa, Fumihiko Kitayama, and Masayuki Numao

Let’s Tango – Finding the Right Couple for Feature-Opinion Association in Sentiment Analysis ...... 741 Kam Tong Chan and Irwin King

An Efficient Candidate Pruning Technique for High Utility Pattern Mining ...... 749 Chowdhury Farhan Ahmed, Syed Khairuzzaman Tanbeer, Byeong-Soo Jeong, and Young-Koo Lee

Grouped ECOC Conditional Random Fields for Prediction of Web User Behavior ...... 757 Yong Zhen Guo, Kotagiri Ramamohanarao, and Laurence A.F. Park

CLHQS: Hierarchical Query Suggestion by Mining Clickthrough Log ... 764 Depin Chen, Ning Liu, Zhijun Yin, Yang Tong, Jun Yan, and Zheng Chen

X-Tracking the Changes of Web Navigation Patterns ...... 772 Long Wang and Christoph Meinel

Tree-Based Method for Classifying Websites Using Extended Hidden Markov Models ...... 780 Majid Yazdani, Milad Eftekhar, and Hassan Abolhassani

Emotion Recognition of Pop Music Based on Maximum Entropy with Priors ...... 788 Hui He, Bo Chen, and Jun Guo

Simultaneously Finding Fundamental Articles and New Topics Using a Community Tracking Method ...... 796 Tieyun Qian, Jaideep Srivastava, Zhiyong Peng, and Phillip C.Y. Sheu

Towards a Novel Association Measure via Web Search Results Mining ...... 804 Xiaojun Wan and Jianguo Xiao

A New Local Distance-Based Outlier Detection Approach for Scattered Real-World Data ...... 813 Ke Zhang, Marcus Hutter, and Huidong Jin

Mining Outliers with Faster Cutoff Update and Space Utilization ...... 823 Chi-Cheong Szeto and Edward Hung

Outlier Detection in Axis-Parallel Subspaces of High Dimensional Data ...... 831 Hans-Peter Kriegel, Peer Kröger, Erich Schubert, and Arthur Zimek

K-Dominant Skyline Computation by Using Sort-Filtering Method ..... 839 Md. Siddique and Yasuhiko Morimoto

Effective Boosting of Naïve Bayesian Classifiers by Local Accuracy Estimation ...... 849 Zhipeng Xie

COMUS: Ontological and Rule-Based Reasoning for Music Recommendation System ...... 859 Seungmin Rho, Seheon Song, Eenjun Hwang, and Minkoo Kim

Spatial Weighting for Bag-of-Visual-Words and Its Application in Content-Based Image Retrieval ...... 867 Xin Chen, Xiaohua Hu, and Xiajiong Shen

Item Preference Parameters from Grouped Ranking Observations ...... 875 Hideitsu Hino, Yu Fujimoto, and Noboru Murata

Cross-Channel Query Recommendation on Commercial Mobile Search Engine: Why, How and Empirical Evaluation ...... 883 Shunkai Fu, Bingfeng Pi, Ying Zhou, Michel C. Desmarais, Weilei Wang, Song Han, and Xunrong Rao

Data Mining for Intrusion Detection: From Outliers to True Intrusions ...... 891 Goverdhan Singh, Florent Masseglia, Céline Fiot, Alice Marascu, and Pascal Poncelet

A Multi-resolution Approach for Atypical Behaviour Mining ...... 899 Alice Marascu and Florent Masseglia

Change Analysis in Spatial Data by Combining Contouring Algorithms with Supervised Density Functions ...... 907 Chun Sheng Chen, Vadeerat Rinsurongkawong, Christoph F. Eick, and Michael D. Twa

Centroid Neural Network with Spatial Constraints ...... 915 Dong-Chul Park

Diversity in Combinations of Heterogeneous Classifiers ...... 923 Kuo-Wei Hsu and Jaideep Srivastava

Growth Analysis of Neighbor Network for Evaluation of Damage Progress ...... 933 Ken-ichi Fukui, Kazuhisa Sato, Junichiro Mizusaki, Kazumi Saito, Masahiro Kimura, and Masayuki Numao

A Parallel Algorithm for Finding Related Pages in the Web by Using Segmented Link Structures ...... 941 Xiaoyan Shen, Junliang Chen, Xiangwu Meng, Yujie Zhang, and Chuanchang Liu

Boosting Biomedical Information Retrieval Performance through Citation Graph: An Empirical Study ...... 949 Xiaoshi Yin, Xiangji Huang, Qinmin Hu, and Zhoujun Li

Similarity-Based Feature Selection for Learning from Examples with Continuous Values ...... 957 Yun Li, Su-Jun Hu, Wen-Jie Yang, Guo-Zi Sun, Fang-Wu Yao, and Geng Yang

Application-Independent Feature Construction from Noisy Samples .... 965 Dominique Gay, Nazha Selmaoui, and Jean-François Boulicaut

Estimating Optimal Feature Subsets Using Mutual Information Feature Selector and Rough Sets ...... 973 Sombut Foitong, Pornthep Rojanavasu, Boonwat Attachoo, and Ouen Pinngern

Speeding Up Similarity Search on a Large Time Series Dataset under Time Warping Distance ...... 981 Pongsakorn Ruengronghirunya, Vit Niennattrakul, and Chotirat Ann Ratanamahatana

A Novel Fractal Representation for Dimensionality Reduction of Large Time Series Data ...... 989 Poat Sajjipanon and Chotirat Ann Ratanamahatana

Clustering Data Streams in Optimization and Domains ..... 997 Ling-Yin Wei and Wen-Chih Peng

CBDT: A Concept Based Approach to Data Stream Mining ...... 1006 Stefan Hoeglinger, Russel Pears, and Yun Sing Koh

Meaningful Subsequence Matching under Time Warping Distance for Data Stream ...... 1013 Vit Niennattrakul and Chotirat Ann Ratanamahatana

An Aggregate Ensemble for Mining Concept Drifting Data Streams with Noise ...... 1021 Peng Zhang, Xingquan Zhu, Yong Shi, and Xindong Wu

On Pairwise Kernels: An Efficient Alternative and Generalization Analysis ...... 1030 Hisashi Kashima, Satoshi Oyama, Yoshihiro Yamanishi, and Koji Tsuda

A Family-Based Evolutional Approach for Kernel Tree Selection in SVMs ...... 1038 Ithipan Methasate and Thanaruk Theeramunkong

An Online Incremental Learning Vector Quantization ...... 1046 Ye Xu, Shen Furao, Osamu Hasegawa, and Jinxi Zhao

On Mining Rating Dependencies in Online Collaborative Rating Networks ...... 1054 Hady W. Lauw, Ee-Peng Lim, and Ke Wang

Learning to Extract Relations for Relational Classification ...... 1062 Steffen Rendle, Christine Preisach, and Lars Schmidt-Thieme

Author Index ...... 1073