Conference Program


December 11 to 14, 2011

TABLE OF CONTENTS

MESSAGE FROM CONFERENCE GENERAL CHAIRS
MESSAGE FROM THE PROGRAM CO-CHAIRS
SPONSORS
GENERAL INFORMATION
INFORMATION FOR SESSION CHAIRS AND PRESENTERS
CONFERENCE PROGRAM AT-A-GLANCE
WORKSHOPS AT-A-GLANCE
TECHNICAL PROGRAM AT-A-GLANCE
CONFERENCE PROGRAM
WORKSHOP PROGRAM
TECHNICAL CONFERENCE PROGRAM
DEMOS AND EXHIBITIONS PROGRAM
TUTORIALS PROGRAM
ICDM 2011 KEYNOTE SPEECHES
SOCIAL PROGRAM
ORGANISING COMMITTEE
CONFERENCE CHAIRS
PROGRAM COMMITTEE
PROGRAM COMMITTEE MEMBERS
EXTERNAL REVIEWERS
VOLUNTEERS
VOLUNTEERS (STUDENT TRAVEL AWARD RECIPIENTS)
USEFUL LINKS
ABOUT VANCOUVER
CONFERENCE VENUE LOCATIONS
CONFERENCE ROOM FLOOR PLANS

MESSAGE FROM CONFERENCE GENERAL CHAIRS

On behalf of the organizing committee for the IEEE International Conference on Data Mining 2011, we would like to welcome you to the conference, its numerous workshops, and the other activities in the program. We extend our warmest welcome to all attendees from around the globe to the vibrant, world-class city of Vancouver, Canada.

The lively city of Vancouver, surrounded by ocean, mountains, and a majestic river valley, is reputed to be among the best cities in the world, with endless indoor and outdoor activities, outstanding international restaurants, and first-rate entertainment. The city of more than 2 million people (metro), barely 125 years old, houses a large number of IT companies, among them many world leaders in business intelligence, analytics, video games, and software development in general. We wish you a memorable time in Vancouver, experiencing this multicultural city that celebrates the heritage of its inhabitants and showcases the important cultures of the region’s First Nations.

This is the eleventh time ICDM has been organized, each time in a different part of the world. It has grown to a respectable size and is considered today the premier international research conference on data mining.
After San Jose, USA (2001), Maebashi City, Japan (2002), Melbourne, USA (2003), Brighton, UK (2004), Houston, USA (2005), Hong Kong, China (2006), Omaha, USA (2007), Pisa, Italy (2008), Miami, USA (2009), and Sydney, Australia (2010), this is the fifth time ICDM comes back to North America.

Organizing an international conference of the magnitude of ICDM is not an easy task. It requires the coordination of a multitude of individuals and a tremendous effort by an army of volunteers. Without these volunteers, many of them working on the organization for a full year, such a large conference could not take place. The individuals involved range from the students helping with conference registration, packing bags, and setting up audio-visual equipment, to selection committee members and scientific reviewers, as well as the organizers of the program, logistics, publicity, sponsorship, and finance. We would like to officially and wholeheartedly thank all volunteers for their hard work; the success of this conference is attributed to them.

In particular, we would like to extend our gratitude for an excellent technical program to: Diane Cook and Jian Pei (Program Co-Chairs), Myra Spiliopoulou and Haixun Wang (Workshops Co-Chairs), Evimaria Terzi and Jure Leskovec (Tutorials Co-Chairs), Ming Hua and Alex Thomo (Exhibits and Demos Co-Chairs), Ashok Srivastava and Larry Holder (Contest Co-Chairs), Rosa Meo and Alfredo Cuzzocrea (PhD Forum Co-Chairs), and George Karypis (Panel Chair).

Of course, the program by itself is not enough to make a successful and memorable conference, and we owe the accomplishment of this fine organization to Xindong Wu (ICDM Steering Committee Chair), Charles X. Ling (Finance Chair), Carson Leung (Local Arrangements Chair), Olfa Nasraoui, Latifur Khan and Jie Tang (Publicity Co-Chairs), Wei Ding and Gabor Melli (Sponsorship Co-Chairs), Justin Fagnan (Webmaster), and Juzhen Dong (Cyberchair). We would like to highlight their tremendous contributions.
At its 10th anniversary last year, ICDM started the ICDM Highest Impact Paper Award to recognize the paper from the ICDM proceedings of 10 years prior that has had the most impact (methodology, applications, products) over the intervening decade. This year, the award goes to Xifeng Yan and Jiawei Han for "gSpan: Graph-Based Substructure Pattern Mining".

ICDM has always been an innovator among top-tier data mining conferences in improving the quality of its program. In 2007, ICDM was the first to introduce the double-blind review process, in which the identity of authors is concealed from reviewers. This was demonstrated to improve the chances for newcomers to publish peer-reviewed papers, and it reduced the bias towards known names. This year, to avoid bias during the discussions about papers among committee members, the identity of reviewers was concealed as well.

We also introduced for the first time the PhD Forum, a meeting in the format of a workshop that allows PhD students to present and discuss their research strategies and the new trends in data mining research.

Our gratitude also goes to our corporate sponsors (listed in the program) for their important support. Last but not least, we would like to thank the many authors who submitted research papers to the conference and all the attendees whose contributions resulted in this enriching conference. We wish you a productive conference with new discoveries, new relationships and networking, and a most enjoyable time in Vancouver.

Wei Wang and Osmar R. Zaïane
ICDM 2011 Conference General Chairs

MESSAGE FROM THE PROGRAM CO-CHAIRS

Welcome to the Eleventh IEEE International Conference on Data Mining! The ICDM conference is held in varying locations throughout the world, and the 2011 ICDM conference will be held for the first time in Canada, in the metropolitan city of Vancouver, British Columbia.
ICDM is established as the world’s premier research conference in data mining. The conference provides an opportunity to present original research results, to relay practical development experiences, and to spark ideas for new research directions.

The ICDM conference is truly an international forum. During its eleven-year history, the conference has been held in eight countries around the world. This year’s conference continues this global trend: our organizing and program committee members represent 36 countries and authors submitted papers from 47 different countries.

This year’s conference was extremely competitive. A total of 786 papers were submitted for review. Each paper was reviewed by at least three program committee members and the selection was made on the basis of discussion among the reviewers, a vice chair, and the program co-chairs. This year, 101 regular papers and 47 short papers were accepted for presentation, representing an acceptance rate of 18.83%.

In keeping with the goal of advancing the state-of-the-art in data mining, paper topics span numerous active and emerging topic areas including feature analysis, classification, privacy, anomaly detection, semi-supervised learning, clustering, recommendations, time series mining, sparse representations, data summarization, and mining data found in graphs, video, images, and text.

Reviewing and selecting papers from such a large set of research groups required the coordinated effort of many individuals. We want to thank the 36 Vice Chairs and 273 members of the Program Committee who provided insightful feedback to the authors and helped with this selection process.

Of the papers that were submitted, 139 had student first authors. These authors represent the future of our field and we want to thank the National Science Foundation for sponsoring awards that funded student travel to attend the conference. The awards committee selected two papers that