Knowledge Discovery with Bayesian Networks Li

KNOWLEDGE DISCOVERY WITH BAYESIAN NETWORKS

BY
LI GUOLIANG

A THESIS SUBMITTED FOR THE DEGREE OF DOCTOR OF PHILOSOPHY
AT DEPARTMENT OF COMPUTER SCIENCE
SCHOOL OF COMPUTING
NATIONAL UNIVERSITY OF SINGAPORE
COMPUTING 1, LAW LINK, SINGAPORE 117590
JANUARY, 2009

© COPYRIGHT 2009 BY LI GUOLIANG

Acknowledgement

I owe a great debt to many people who assisted me in my graduate education. I would like to take this opportunity to cordially thank:

Associate Professor Tze-Yun Leong, my thesis supervisor, in the School of Computing, National University of Singapore, for her guidance, patience, encouragement, and support throughout my years of graduate training. Especially when I wavered amongst different topics, her encouragement and support were very important to me. I would not have made it through the training without her patience and belief in me.

Associate Professor Louxin Zhang in the Department of Mathematics, National University of Singapore, for his detailed and constructive discussions of Bioinformatics problems. His expertise in phylogenetics has enlightened me on the application of Bayesian analysis to ancestral state reconstruction accuracy.

Members and alumni of the Medical Computing Lab and the Biomedical Decision Engineering (Bide) group: Associate Professor Kim-Leng Poh, Dr Han Bin, Rohit Joshi, Chen Qiong Yu, Yin Hong Li, Zhu Ai Ling, Zeng Yi Feng, Wong Swee Seong, Lin Li, Ong Chen Hui, Dinh Thien Anh, Vu Xuan Linh, Dinh Truong Huy Nguyen, Sreeram Ramachandran, for their caring advice, insightful comments and suggestions.

Mr. Guo Wen Yuan for his broad discussion of philosophical issues and his recommendation of the book “Philosophical theories of probability” by Donald Gillies. This book was very helpful in enlightening me on the different philosophical perspectives of probability.

Dr Chew-Kiat Heng for his kindness in sharing the heart disease data with me.
Dr Qiu Wen Jie for sharing his biological domain knowledge of Actin cytoskeleton genes of yeast with me.

Dr. Qiu Long for taking his precious time to proofread my thesis.

Singapore-MIT Alliance (SMA) classmates: Zhao Qin, Yu Bei, Qiu Long, Qiu Qiang, Edward Sim, Ou Han Yan and Yu Xiao Xue. The discussions with them were broad and insightful for my research.

Finally, I owe a great debt to my family: my parents, my sisters, my daughter Wei Hang, and especially to my wife Wang Hui Qin for their love and support.

Table of Contents

Acknowledgement
Table of Contents
Summary
List of Tables
List of Figures
Glossary of Terms
Chapter 1 Introduction
  1.1 Background and Motivation
    1.1.1 Causal Knowledge
    1.1.2 Causal Knowledge Discovery with Bayesian Networks
    1.1.3 Why Bayesian Networks?
    1.1.4 Data
    1.1.5 Hypotheses
    1.1.6 Domain Knowledge
  1.2 The Application Domain
  1.3 Contributions
  1.4 Structure of the Thesis
  1.5 Declaration of Work
Chapter 2 Background and Related Work
  2.1 Knowledge Discovery with Correlation Information
    2.1.1 Classification
    2.1.2 Regression
    2.1.3 Clustering
    2.1.4 Association Rule Mining
    2.1.5 Time-series Analysis
    2.1.6 Disadvantages of Correlation-based Knowledge Discovery
  2.2 Causal Knowledge Discovery with Randomized Experiments
  2.3 Bayesian Network Learning
    2.3.1 Basics of Bayesian Networks
    2.3.2 Bayesian Network Construction from Domain Knowledge
    2.3.3 Reasons to Learn Bayesian Networks from Data
    2.3.4 Categories of Bayesian Network Learning Problems
    2.3.5 Parameter Learning in Bayesian Networks
    2.3.6 Structure Learning in Bayesian Networks
    2.3.7 Causal Knowledge Discovery with Bayesian Networks
    2.3.8 Active Learning of Bayesian Networks with Interventional Data
    2.3.9 Applications of Causal Knowledge Discovery with Bayesian Networks
Chapter 3 Hypothesis Generation in Knowledge Discovery with Bayesian Networks
  3.1 Hypothesis Generation with Bayesian Network Structure Learning
    3.1.1 Probabilities of Individual Bayesian Network Structures
    3.1.2 Probabilities of Individual Edges in Bayesian Networks
    3.1.3 An Application of Hypothesis Generation to a Heart Disease Problem
  3.2 Hypothesis Generation with Variable Grouping
    3.2.1 Observations from Microarray Data
    3.2.2 Related Work
    3.2.3 Learning Algorithm with Variable Grouping
    3.2.4 Important Issues in the Proposed Algorithm
    3.2.5 Experiments with Variable Grouping
    3.2.6 Discussion
  3.3 Summary of Hypothesis Generation
Chapter 4 Hypothesis Refinement for Knowledge Discovery with Bayesian Networks
  4.1 Background and Motivation
    4.1.1 Related Work
  4.2 Representation of Topological Domain Knowledge in Bayesian Networks
    4.2.1 Compilation of Domain Knowledge from the Rule Format to the Matrix Format
    4.2.2 Checking the Consistency of Topological Constraints
    4.2.3 Induction with Topological Constraints
  4.3 Bayesian Network Structure Learning with Domain Knowledge
  4.4 An Iterative Process to Identify Topological Constraints with Bayesian Network Structure Learning
  4.5 Empirical Evaluation of Topological Constraints on Bayesian Network Structure Learning
