ACM-BCB 2016

The 7th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics

October 2-5, 2016

Organizing Committee

General Chairs:
Ümit V. Çatalyürek, Georgia Institute of Technology
Genevieve Melton-Meaux, University of Minnesota

Program Chairs:
John Kececioglu, University of Arizona
Adam Wilcox, University of Washington

Steering Committee:
Aidong Zhang, State University of New York at Buffalo, Co-Chair
May D. Wang, Georgia Institute of Technology and Emory University, Co-Chair
Srinivas Aluru, Georgia Institute of Technology
Tamer Kahveci, University of Florida
Christopher C. Yang, Drexel University

Workshop Chair:
Ananth Kalyanaraman, Washington State University

Tutorial Chair:
Mehmet Koyuturk, Case Western Reserve University

Demo and Exhibit Chair:
Robert (Bob) Cottingham, Oak Ridge National Laboratory

Poster Chairs:
Lin Yang, University of Florida
Dongxiao Zhu, Wayne State University

Registration Chair:
Preetam Ghosh, Virginia Commonwealth University

Publicity Chairs:
Daniel Capurro, Pontificia Univ. Católica de Chile
A. Ercument Cicek, Bilkent University
Pierangelo Veltri, U. Magna Graecia of Catanzaro

Student Travel Award Chairs:
May D. Wang, Georgia Institute of Technology and Emory University
Jaroslaw Zola, University at Buffalo, The State University of New York

Student Activity Chairs:
Marzieh Ayati, Case Western Reserve University
Dan DeBlasio, Carnegie Mellon University

Proceedings Chairs:
Xinghua Mindy Shi, U of North Carolina at Charlotte
Yang Shen, Texas A&M University

Web Admins:
Anas Abu-Doleh, The Ohio State University
Hyun Anderson, The Ohio State University
Jonathan Kho, Georgia Institute of Technology

ACM-BCB 2016 Program

REGISTRATION: Sunday 7:30 – 16:00 / Monday-Tuesday 8:00 – 16:00 / Wednesday 8:00 – 11:00

Sunday, October 2, 2016

8 am  Continental Breakfast. Location: Fourth Floor Breakstation

Rooms: Seattle 1, Seattle 2, Seattle 3, Belltown, Pioneer, First Hill, Emerald II

8:25 am Tutorial 1 (T1) 10 am BigLS MAHA ParBio Tutorial 2 pSALSA TDA-Bio CNB-MAC (8:25 am – (8:25 am – (10 am – (T2) (8:25 am – (8:50 am – (8:50 am – 12 pm) 11:40 am) 12 pm) 12 pm) 12:05 pm) 12 pm 12 pm) (1:30 pm – (1:30 pm – (1:30 pm – (1:30 pm – 1 pm (1:20 pm – 5 pm) 5 pm) Tutorial 3 5:30 pm) 5:15 pm) BrainKDD 6 pm) (1 pm – (T3) 4 pm 5 pm) Tutorial 4 (T4)

6 pm  Student Networking and Social Event at the Seattle Great Wheel. Meet at the pre-event space on the 4th floor

WORKSHOPS *

CNB-MAC: 3rd International Workshop on Computational Network Biology: Modeling, Analysis, and Control
Organizers: Byung-Jun Yoon, Xiaoning Qian and Tamer Kahveci

BigLS: 4th ACM International Workshop on Big Data in Life Sciences
Organizers: Jaroslaw Zola and Ananth Kalyanaraman

MAHA: 1st International Workshop on Methods and Applications in Healthcare Analytics
Organizers: Fei Wang, Jyotishman Pathak and Nigam Shah

pSALSA: 3rd Workshop on Parallel Software Libraries for Sequence Analysis
Organizers: Srinivas Aluru

TDA-Bio: 1st International Workshop on Topological Data Analysis in Biomedicine
Organizers: Bala Krishnamoorthy and Bei Wang Phillips

ParBio: 5th International Workshop on Parallel and Cloud-based Bioinformatics and Biomedicine
Organizers: Mario Cannataro and John A. Springer

BrainKDD: The 3rd International Workshop on Data Mining and Visualization for Brain Science
Organizers: Shuiwang Ji, Lei Shi, Hanghang Tong, Shuai Huang and Paul Thompson

* See page 12 for detailed workshop programs.

TUTORIALS *

Sunday, October 2
8:30-9:30  T1: Combinatorial methods for nucleic acid sequence analysis
Presenters: Sreeram Kannan and Mark Chaisson, University of Washington
10:00-12:00  T2: Network Science meets Tissue-specific Biology
Presenters: Shahin Mohammadi and Ananth Grama, Purdue University
1:30-3:30pm  T3: Big Data for Discovery Science
Presenters: Ben Heavner (Institute for Systems Biology), Ravi Madduri (Argonne National Lab), Jack Van Horn (University of Southern California), and Naveen Ashish (Fred Hutchinson Cancer Research Center)
4:00-6:00pm  T4: Deep Learning for Bioinformatics and Health Informatics
Presenter: Sungroh Yoon, Seoul National University

Monday, October 3 (Seattle II)
11:00-12:00pm  T5: Data-Driven Analysis of Untargeted Metabolomics Datasets
Presenter: Soha Hassoun, Tufts University
1:30-3:30pm  T6: Evolutionary Algorithms for Protein Structure Modeling
Presenters: Emmanual Sapin, Amarda Shehu, and Kenneth De Jong, George Mason University

Tuesday, October 4 (Seattle II)
10:00-12:00pm  T7: The ISB Cancer Genomics Cloud
Presenter: Sheila Reynolds, Institute for Systems Biology
1:30-3:30pm  T8: Living the DREAM: Crowdsourcing biomedical research through challenges and ensembles
Presenters: Gaurav Pandey, Lara Mangravite, Solveig Sieberts, Robert Vogel, and Gustavo Stolovitzky, Icahn School of Medicine at Mount Sinai, SAGE Bionetworks

* See page 19 for more information on individual tutorials.

Student Networking and Social Event

All students and postdocs are invited to the student networking event, which will be held Sunday at 6pm. This year the event will include an excursion to the Seattle Great Wheel (the largest observation wheel on the west coast). The student activity focuses on developing programs for student growth through educational and networking opportunities. This is the second year of a recognized student activity; last year's event strengthened student relationships during the conference. The event will begin at 6:00PM on Sunday, October 2, 2016 with scientific speed networking in the pre-event space on the 4th floor, followed by the short walk to Elliott Bay. (The networking event is free, but please bring cash for discounted admission to the Great Wheel.)

Monday, October 3, 2016

8:00 – 10:00  Continental Breakfast. Location: Fourth Floor Breakstation

8:15 – 8:30  Opening Remarks (Location: Seattle I & II)
General Chairs: Ümit V. Çatalyürek, Georgia Institute of Technology & Genevieve Melton-Meaux, University of Minnesota
Program Chairs: John Kececioglu, University of Arizona & Adam Wilcox, University of Washington

8:30 – 9:30  Keynote Talk 1 (Location: Seattle I & II)
Don’t forget the notes: Why NLP is key to health care transformation
Wendy W. Chapman, University of Utah
Session Chair: Genevieve Melton-Meaux, University of Minnesota

9:30 – 10:00  Morning Break

10:00 – 12:00  Sessions 1A, 1B, 1C

Session 1A: Systems Biology. Location: Seattle I
Session Chair: Anna Ritz, Reed College
10:00  Tin Nguyen, Diana Diaz, Sorin Draghici. “TOMAS: A novel TOpology-aware Meta-Analysis approach applied to System biology”
10:30  Huey Eng Chua, Sourav S. Bhowmick, Jie Zheng, Lisa Tucker-Kellogg. “TAPESTRY: Network-centric Target Prioritization in Disease-related Signaling Networks”
11:00  Aisharjya Sarkar, Yuanfang Ren, Rasha Elhesha, Tamer Kahveci. “Counting independent motifs in probabilistic networks”
11:30  Paola Pesantez-Cabrera, Ananth Kalyanaraman. “Detecting Communities in Biological Bipartite Networks”

Session 1B: Demo Presentations & Tutorials. Location: Seattle II
Session Chair: Robert W. Cottingham, Oak Ridge National Laboratory
10:00  Demo Presentations
10:00  “Software tools for sequence comparison, sequence mapping, and patient-specific healthcare outcome prediction”. Presenter: Ankit Agrawal, Northwestern University
10:20  “The CMH Variant Warehouse – A Catalog of Genetic Variation in Patients of a Children’s Hospital”. Presenter: Byunggil Yoo, Children’s Mercy Hospital
10:40  “KBase: Developing collaborative analyses of biological function using Narratives and App Catalog”. Presenter: Robert W. Cottingham, Oak Ridge National Laboratory
11:00  Tutorial T5: Data-Driven Analysis of Untargeted Metabolomics Datasets. Presenter: Soha Hassoun, Tufts University

Session 1C: Automated Diagnosis and Prediction. Location: Seattle III
Session Chair: Jaroslaw Zola, University at Buffalo
10:00  Shou-Hsuan Stephen Huang, Ming-Chih Shih, Youli Zu. “A Multi-Objective Flow Cytometry Profiling for B-Cell Lymphoma Diagnosis”
10:30  Ying Sha, Janani Venugopalan, May D. Wang. “A Novel Temporal Similarity Measure for Patients Based on Irregularly Measured Data in Electronic Health Records”
11:00  Aydin Saribudak, Adarsha A. Subick, Joshua A. Rutta, M. Ümit Uyar, “The Alzheimer's Disease Neuroimaging Initiative. Gene Expression Based Computation Methods for Alzheimer's Disease Progression using Hippocampal Volume Loss and MMSE Scores”
11:30  Qiuling Suo, Hongfei Xue, Jing Gao, Aidong Zhang. “Risk factor analysis based on deep learning models”

12:30 – 13:30  Lunch (On your own)


13:30 – 15:30  Sessions 2A, 2B, 2C

Session 2A: Biological Modeling. Location: Seattle I
Session Chair: Tamer Kahveci, University of Florida
13:30  Shital Kumar Mishra, Sourav S. Bhowmick, Huey Eng Chua, Jie Zheng. “Predictive Modeling of Drug Effects on Signaling Pathways in Diverse Cancer Cell Lines”
14:00  Muhibur Rasheed, Nathan Clement, Abhishek Bhowmick, Chandrajit Bajaj. “Statistical Framework for Uncertainty Quantification in Computational Molecular Modeling”
14:30  Jeet Banerjee, Tanvi Ranjan, Ritwik Kumar Layek. “Stability Analysis of Population Dynamics Model in Microbial Biofilms with Non-participating Strains”
15:00  Shuo Wang, Mansooreh Ahmadian, Minghan Chen, John Tyson, Yang Cao. “A Hybrid Stochastic Model of the Budding Yeast Cell Cycle Control Mechanism”

Session 2B: Tutorials. Location: Seattle II
T6: Evolutionary Algorithms for Protein Structure Modeling
Presenters: Emmanual Sapin, Amarda Shehu, and Kenneth De Jong, George Mason University

Session 2C: Applications to Healthcare Processes. Location: Seattle III
Session Chair: Beth Britt, University of Washington
13:30  Hanyu Jiang, Morisa Manzella, Luka Djapic, Narayan Ganesan. “Computational Framework for in-Silico Study of Virtual Cell Biology via Process Simulation and Multiscale Modeling”
14:00  Qian Cheng, Jingbo Shang, Joshua Juen, Jiawei Han, Bruce Schatz. “Mining Discriminative Patterns to Predict Health Status for Cardiopulmonary Patients”
14:30  Paul D. Martin, Michael Rushanan, Thomas Tantillo, Christoph Lehmann, Aviel D. Rubin. “Applications of Secure Location Sensing in Healthcare”
15:00  Sai Nivedita Chandrasekaran, Alexios Koutsoukas, Jun Huan. “Investigating Multiview and Multitask Learning Frameworks for Predicting Drug-Disease Associations”

15:30 – 16:00  Afternoon Break – Refreshments Provided

16:00 – 18:00  ACM SIGBio General Meeting. Location: Seattle I & II

18:00 – 20:00  Poster Reception – Light hors d'oeuvres & Cash bar (see page 21 for list of posters)

DEMOS (Belltown)

“Software tools for sequence comparison, sequence mapping, and patient-specific healthcare outcome prediction”. Presenter: Ankit Agrawal, Northwestern University
“The CMH Variant Warehouse – A Catalog of Genetic Variation in Patients of a Children’s Hospital”. Presenter: Byunggil Yoo, Children’s Mercy Hospital
“KBase: Developing collaborative analyses of biological function using Narratives and App Catalog”. Presenter: Robert W. Cottingham, Oak Ridge National Laboratory


Tuesday, October 4, 2016

8:00 – 10:00  Continental Breakfast. Location: Fourth Floor Breakstation

8:30 – 9:30  Keynote Talk 2 (Location: Seattle I & II)
An evolutionary biologist's skeptical search for computational biology
Joseph Felsenstein, University of Washington
Session Chair: Srinivas Aluru, Georgia Institute of Technology

9:30 – 10:00  Morning Break

10:00 – 12:00  Sessions 3A, 3B, 3C

Session 3A: Inferring Phylogenies and Haplotypes. Location: Seattle I
Session Chair: Xinghua Mindy Shi, University of North Carolina at Charlotte
10:00  Jucheol Moon, Oliver Eulenstein. “Robinson-Foulds Median Trees: A Clique-based Heuristic”
10:30  Alexey Markin, Oliver Eulenstein. “Manhattan Path-Difference Median Trees”
11:00  Misagh Kordi, Mukul S. Bansal. “Exact Algorithms for Duplication-Transfer-Loss Reconciliation with Non-Binary Gene Trees”
11:30  Olivia Choudhury, Ankush Chakrabarty, Scott Emrich. “HAPI-Gen: Highly Accurate Phasing and Imputation of Genotype Data”

Session 3B: Tutorials. Location: Seattle II
T7: The ISB Cancer Genomics Cloud
Presenter: Sheila Reynolds, Institute for Systems Biology

Session 3C: Text Mining and Classification. Location: Seattle III
Session Chair: Ananth Kalyanaraman, Washington State University
10:00  Majid Rastegar-Mojarad, Ravikumar Komandur Elayavilli, Liwei Wang, Rashmi Prasad, Hongfang Liu. “Prioritizing Adverse Drug Reaction and Drug Repositioning Candidates generated by Literature-Based Discovery”
10:30  Kishlay Jha, Wei Jin. “Mining Novel Knowledge from Biomedical Literature using Statistical Measures and Domain Knowledge”
11:00  Ramakanth Kavuluru, Maria Ramos-Morales, Tara Holaday, Amanda G. Williams, Laura Haye, Julie Cerel. “Classification of Helpful Comments on Online Suicide Watch Forums”
11:30  Haotian Xu, Ming Dong, Dongxiao Zhu, Alexander Kotov, April Idalski Carcone, Sylvie Naar-King. “Text Classification with Topic-based Word Embedding and Convolutional Neural Networks”

12:00 – 13:30  Women in Bioinformatics Panel. Chair: May D. Wang, Georgia Institute of Technology & Emory University
Lunch (On your own)


13:30 – 15:30  Sessions 4A, 4B, 4C

Session 4A: Sequence Analysis and Genome Assembly. Location: Seattle I
Session Chair: Oliver Eulenstein, Iowa State University
13:30  Rahul Nihalani, Srinivas Aluru. “Effective Utilization of Paired Reads to Improve Length and Accuracy of Contigs in Genome Assembly”
14:00  Priyanka Ghosh, Ananth Kalyanaraman. “A Fast Sketch-based Assembler for Genomes”
14:30  Subrata Saha, Sanguthevar Rajasekaran. “POMP: a powerful splice mapper for RNA-seq reads”
15:00  Tony Pan, Patrick Flick, Chirag Jain, Yongchao Liu, Srinivas Aluru. “Kmerind: A Flexible Parallel Library for K-mer Indexing of Biological Sequences on Distributed Memory Systems”

Session 4B: Tutorials. Location: Seattle II
T8: Living the DREAM: Crowdsourcing biomedical research through challenges and ensembles
Presenters: Gaurav Pandey, Lara Mangravite, Solveig Sieberts, Robert Vogel, and Gustavo Stolovitzky, Icahn School of Medicine at Mount Sinai, SAGE Bionetworks

Session 4C: Knowledge Representation Applications. Location: Seattle III
Session Chair: Naveena Yanamala, Centers for Disease Control and Prevention
13:30  Naveen Ashish, Arihant Patawari, Simrat Singh Chhabra, Arthur W. Toga. “Name Similarity for Composite Element Name Matching”
14:00  Edward W Huang, Sheng Wang, Runshun Zhang, Baoyan Liu, Xuezhong Zhou, ChengXiang Zhai. “PaReCat: Patient Record Subcategorization for Precision Traditional Chinese Medicine”
14:30  Michael R. Wyatt II, Travis Johnston, Mia Papas, Michela Taufer. “Development of a Scalable Method for Creating Food Groups Using the NHANES Dataset and MapReduce”
15:00  Shahin Mohammadi, Ananth Grama. “De novo identification of cell type hierarchy with application to compound marker detection”

15:30 – 16:00  Afternoon Break – Refreshments Provided

16:00 – 17:30  NSF Sponsored Student Research Forum. Location: Seattle I & II

17:30 – 19:00  Break (for banquet setup). Cash Bar at 18:30

19:00 – 21:30  Banquet. Location: Seattle I, II & III


Wednesday, October 5, 2016

8:00 – 10:00  Continental Breakfast. Location: Fourth Floor Breakstation

8:30 – 9:30  Keynote Talk 3 (Location: Seattle I & II)
Data, Predictions, and Decisions
Eric Horvitz, Microsoft Research
Session Chair: Ümit V. Çatalyürek, Georgia Institute of Technology

9:30 – 10:00  Morning Break

10:00 – 12:00  Sessions 5A, 5B

Session 5A: Protein Structure and Dynamics. Location: Seattle I
Session Chair: Sreeram Kannan, Univ. of Washington
10:00  Dong Si. “Automatic Detection of Beta-barrel from Medium Resolution Cryo-EM Density Maps”
10:30  Tatiana Maximova, Daniel Carr, Erion Plaku, Amarda Shehu. “Sample-based Models of Protein Structural Transitions”
11:00  Dario Ghersi, Roberto Sanchez. “Recovering Bound Forms of Protein Structures Using the Elastic Network Model and Molecular Interaction Fields”
11:30  Ramu Anandakrishnan, Mayank Daga, Alexey Onufriev, Wu-Chun Feng. “Multiscale Approximation with Graphical Processing Units for Multiplicative Speedup in Molecular Dynamics”

Session 5B: Applications to Microbes and Imaging Genetics. Location: Seattle II
Session Chair: Mark Clement, Brigham Young Univ.
10:00  Jeffrey D. McGovern, Eric Johnson, Alex Dekhtyar, Michael Black, Christopher Kitts, Jennifer Vanderkelen. “Library-Based Microbial Source Tracking via Strain Identification”
10:30  Serghei Mangul, David Koslicki. “Reference-free comparison of microbial communities via de Bruijn graphs”
11:00  Md Ashad Alam, Osamu Komori, Vince Calhoun, Yu-Ping Wang. “Robust Kernel Canonical Correlation Analysis to Detect Gene-Gene Interaction for Imaging Genetics Data”
11:30  Md Ashad Alam, Vince Calhoun, Yu-Ping Wang. “Influence Function of Multiple Kernel Canonical Analysis to Identify Outliers in Imaging Genetics Data”

12:00 – 13:30  Noon Break – Refreshments Provided

13:30 – 15:30  Sessions 6A, 6B

Session 6A: Protein and RNA Analysis. Location: Seattle I
Session Chair: John Kececioglu, University of Arizona
13:30  Deeptak Verma, Gevorg Grigoryan, Chris Bailey-Kellogg. “OCoM-SOCoM: Combinatorial Mutagenesis Library Design Optimally Combining Sequence and Structure Information”
14:00  Byunghan Lee, Junghwan Baek, Seunghyun Park, Sungroh Yoon. “deepTarget: End-to-end Learning Framework for microRNA Target Prediction using Deep Recurrent Neural Networks”
14:30  Naozumi Hiranuma, Scott Lundberg, Su-In Lee. “CloudControl: Leveraging many public ChIP-seq control experiments to better remove background noise”
15:00  Wenruo Bai, Jeffrey Bilmes, William S. Noble. “Bipartite matching generalizations for peptide identification in tandem mass spectrometry”

Session 6B: Advancing Algorithms and Methods. Location: Seattle II
Session Chair: Adam Wilcox, University of Washington
13:30  Soumi Ray, Adam Wright. “Detecting Anomalies in Alert Firing within Clinical Decision Support Systems using Anomaly/Outlier Detection Techniques”
14:00  Chih-Wen Cheng, Ying Sha, May D. Wang. “InterVisAR: An Interactive Visualization for Association Rule Search”
14:30  , Niina Haiminen. “Scalable Algorithms at Genomic Resolution to fit LD Distributions”

Keynotes

Monday, October 3 | Wendy W. Chapman, University of Utah
Title: Don’t forget the notes: Why NLP is key to health care transformation
Abstract: The majority of clinical information useful for patient care and research is locked in clinical notes and only accessible with great pain and effort. Natural Language Processing has the potential to unlock the information in the notes to support phenotyping for precision medicine, quality improvement, and health services research. This talk will illustrate the potential of NLP through existing applications, will describe the challenges of making NLP a real and scalable solution, and will provide concrete suggestions for how the audience can help NLP reach its potential in health care and discovery.
Biography: Dr. Chapman earned her Bachelor’s degree in Linguistics and her PhD in Medical Informatics from the University of Utah in 2000. From 2000-2010 she was a National Library of Medicine postdoctoral fellow and then a faculty member at the University of Pittsburgh. She joined the Division of Biomedical Informatics at the University of California, San Diego in 2010. In 2013, Dr. Chapman became the chair of the University of Utah, Department of Biomedical Informatics where she continues her research on natural language processing in the context of informatics solutions to problems that vex health care.

Tuesday October 4 | Joseph Felsenstein, University of Washington
Title: An evolutionary biologist's skeptical search for computational biology
Abstract: This talk will explain how, starting with an interest in biology, and also in computers, I gradually learned how to use computers to illuminate problems in evolutionary biology. Along the way I learned about theoretical population genetics, learned why it is not always best to write your theorems down, and how fascination with a problem may indicate that something more important is at stake.
I moved from theoretical population genetics to algorithms for inferring evolutionary trees (phylogenies). The statistical viewpoint that was standard in theoretical population genetics turned out to be highly controversial among taxonomists studying evolution, and was also considered unnecessary by computer scientists. Both of these groups of people were wrong. I will argue that computer scientists and biologists should indeed communicate, but that this is best done via a statistician. I will argue that a parametric model based on evolutionary theory is crucial, but that one should beware of believing in it too much. Computation is essential in biology, but I wonder whether there really is a field called Computational Biology. Or ought to be. In the era of Complex Systems and Big Data, a Simple Systems perspective based on Small Data has distinct advantages. As we reach limits in what genome data can tell us, a concern for efficient use of those data will become important, and an understanding of the effects of statistical noise will prove important, and it should encourage a little more humility.
Biography: Joe Felsenstein grew up in Philadelphia, and attended the University of Wisconsin, where he got involved with theoretical population genetics in the lab of James F. Crow. He went on to do his Ph.D. with Richard Lewontin at the University of Chicago, and a postdoctoral fellowship with Alan Robertson at the Institute of Animal Genetics at the University of Edinburgh. He has since then been a faculty member of the Department of Genetics at the University of Washington, Seattle, and its successor the Department of Genome Sciences, and he is also jointly appointed in the Department of Biology. Although his training was thus in theoretical population genetics, since his graduate work he has also been fascinated by the reconstruction of evolutionary trees (phylogenies).
This led him to promote and develop likelihood methods for inference of phylogenies, to apply the bootstrap method to investigating which parts of them are well-supported, and to release the first general program package for inferring phylogenies, PHYLIP, in 1980. He wishes that computational biology textbooks would pay more attention to phylogenies, which are the basic structures for making sense of multispecies data. His work in this area has also led him into the extreme and byzantine conflicts in systematics -- some of his closest friendships in computational phylogenetics were cemented by shared victimization. Joe has received a number of very nice honors, which are listed at his online CV, but which false modesty dictates that he not mention here.

Wednesday October 5 | Eric Horvitz, Microsoft Research
Title: Data, Predictions, and Decisions
Abstract: I will describe several projects that highlight directions with the use of to enhance patient care and to build insights about health and wellbeing. I will first present research on leveraging large amounts of data drawn from electronic health records to predict outcomes and to guide decisions. I will focus on opportunities with reducing readmissions and identifying patients at risk for hospital-associated infection, emphasizing the promise of coupling predictive models with decision analysis. I will reflect on challenging directions with these efforts, including causal inference and transfer learning. Then, I will move to studies of health and well-being from non-traditional sources of data, including the use of anonymized logs of online activities. I will present results on pharmacovigilance, detecting the onset of illness, and building deeper understandings of episodic information needs of patients over phases of illness. I’ll wrap up by discussing several aspirational directions with data, predictions, and decisions.
Biography: Eric Horvitz is technical fellow at Microsoft, where he serves as director of the Microsoft Research lab at Redmond. His interests span theoretical and practical challenges with computing systems that learn from data and that can perceive, reason, and decide. His efforts and collaborations have led to fielded systems in the areas of transportation, healthcare, ecommerce, and operating systems. Eric received MD and PhD degrees at . He has been elected fellow of the National Academy of Engineering (NAE), AAAI, ACM, AAAS, and the American Academy of Arts and Sciences. He received the Feigenbaum Prize and the ACM-AAAI Allen Newell Award for his research contributions.
He currently serves on the Board of Regents of the National Library of Medicine, the Computer Science and Telecommunications Board (CSTB), and the advisory board for the Center for Causal Discovery at the University of Pittsburgh. More information can be found at http://research.microsoft.com/~horvitz.

Workshops

3rd International Workshop on Computational Network Biology: Modeling, Analysis, and Control (CNB-MAC)
8:45am-6pm, October 2, 2016
Organizers:
Byung-Jun Yoon, Texas A&M University
Xiaoning Qian, Texas A&M University
Tamer Kahveci, University of Florida
https://cnbmac.org/

Next-generation high-throughput profiling technologies have enabled more systematic and comprehensive studies of living systems. Network models play crucial roles in understanding the complex interactions that govern biological systems, and their interactions with the external environment. The inference and analysis of such complex networks and network-based analysis of large-scale measurement data have already shown strong potential for unveiling the key mechanisms of complex diseases as well as for designing improved therapeutic strategies. At the same time, the inference and analysis of complex biological networks pose exciting new challenges for computer science, signal processing, control, and statistics. The CNB-MAC workshop aims to provide an international scientific forum for presenting recent advances in computational network biology that involve modeling, analysis, and control of biological systems under different conditions, and system-oriented analysis of large-scale OMICS data.

08:50-9:00  Opening Remarks

09:00-10:00  Keynote Talk by Dr. Su-In Lee (University of Washington), Talk Title: Mining Big Data for Molecular Marker Identification

10:00-10:20  Coffee Break

10:20-12:00  Session 1
“Sparse Feature Selection for Classification and Prediction of Metastasis in Endometrial Cancer”, Mehmet Eren Ahsen, Todd Boren, Nitin Singh, Burook Misganaw, David Mutch, Kathleen Moore, Floor Backes, Carolyn McCourt, Jayanthi Lea, David Miller, Michael White and Mathukumalli Vidyasagar
“Data Requirements for Model-Based Cancer Prognosis Prediction”, Lori Dalton and Mohammadmahdi Rezaei Yousefi
“Comparison of tissue/disease specific integrated networks using directed graphlet signatures”, Arzu Burcak Sonmez and Tolga Can
“Optimal ROC-based Classification and Performance Analysis under Bayesian Uncertainty Models”, Lori Dalton
“SNP by SNP by Environment Interaction Network of Alcoholism”, Amin Zollanvari and Gil Alterovitz

12:00-13:20  Lunch Break

13:20-15:00  Session 2
“Towards targeted combinatorial therapy design for the treatment of castration-resistant prostate cancer”, Osama Arshad and Aniruddha Datta
“Combination therapy design for maximizing sensitivity and minimizing toxicity”, Kevin Matlock, Noah Berlow, Charles Keller and Ranadip Pal
“DIGNiFI: Discovering causative genes for orphan diseases using protein-protein interaction networks”, Xiaoxia Liu, Zhihao Yang, Hongfei Lin, Michael Simmons and Zhiyong Lu
“SEQUOIA: Significance enhanced network querying through context-sensitive random walk and minimization of network conductance”, Hyundoo Jeong and Byung-Jun Yoon
“Finding Low-Conductance sets with Dense interactions (FLCD) for better protein complex prediction”, Yijie Wang and Xiaoning Qian

15:00-15:20  Coffee Break

15:20-16:40  Session 3
“Inferring Microbial Interaction Networks from Metagenomic Data Using SgLV-EKF”, Mustafa Alshawaqfeh, Ahmad Bani Younes and Erchin Serpedin
“Stochastic Modeling and Simulation of Reaction-Diffusion System with Hill Function Dynamics”, Minghan Chen, Fei Li, Shuo Wang and Yang Cao
“Interpretive Time-Frequency Analysis of Genomic Sequences”, Hamed Hassani Saadi, Reza Sameni and Amin Zollanvari
“Comprehensive Evaluation of RNA-seq Quantification Methods for Linearity”, Haijing Jin, Ying-Wooi Wan and Zhandong Liu

16:40-17:05  Five-Minute Lightning Talks for Posters

17:05-17:50  Poster Session

17:50-18:00  Closing Remarks

4th ACM International Workshop on Big Data in Life Sciences (BigLS)
8:25am-5:30pm, October 2, 2016
Organizers:
Jaroslaw Zola, SUNY Buffalo
Ananth Kalyanaraman, Washington State University
http://www.bigls.org

The ever-growing volume and diversity of biological and biomedical data collections continues to pose new challenges and increasing demands on computing and data management. The inherent complexity of this Big Data forces us to rethink how we collect, store, combine and analyze it. BigLS is a workshop series dedicated to the broad theme of Big Data in life sciences. The goal of the workshop is to bring together leading researchers and practitioners working on a diverse range of Big Data problems relating to biology and medicine, and engage them in a discussion about current Big Data problems, the state of computational tools and analytics, the challenges and the future trends within life sciences.

8:25am-8:30am  Opening Remarks

8:30am-10:00  Regular Papers
“Exploration of regression models for cancer noncoding mutation recurrence”, Tanjin Xu, Stephen A. Ramsey.
“Optimization of I/O Intensive Genome Assemblies on the Cori Supercomputer with Burst Buffer”, Joshua Pritchett, Bill Andreopoulos.
“Explorations in Very Early Prognosis of the Human Immune Response to Influenza”, Manu Chaturvedi, Tomtit Ghosh, Michael Kirby, Xiaoyu Liu, Xiaofeng Ma, Shannon Stiverson.

10:00am-10:30am  Coffee Break (with student posters on display)

10:30am-12:00pm  Keynote Talk by Dr. Nathan Price (Institute of Systems Biology, Arivale, Inc.)
Title: Actionable big data for proactive healthcare

12pm-1:30pm  Lunch Break

1:30pm-3:10pm  Invited Talks – Session 1
Invited talk by Dr. William Stafford Noble (University of Washington), Talk title: “Joint Imputation of Epigenomics Data by Three Dimensional Tensor Factorization”
Invited talk by Dr. Adam Margolin (Oregon Health & Science University), Talk title: “Inferring genomic predictors of cancer phenotypes: machine learning, crowd-sourcing, and big data”
Q&A session

3:10pm-3:30pm Coffee Break

3:30pm-4:10pm Invited Talks – Session 2 Invited talk by Dr. David Heckerman (Microsoft Research), Talk title: “Embracing big data in genomics”

4:15pm-5:30pm Poster session and interaction

1st International Workshop on Methods and Applications in Healthcare Analytics (MAHA)
8:30am-5:30pm, October 2, 2016
Organizers:
Fei Wang, University of Connecticut
Jyotishman Pathak, Cornell University
Nigam Shah, Stanford University
https://sites.google.com/site/feiwang03/acm-bcb-workshop-on-healthcare-analytics

Healthcare is undergoing a massive transition, due to changes in payment incentives, growth of clinical data warehouses, advances in genome sequencing technology and digital imaging, as well as the increased role of the patient in managing their own health information and rapid accumulation of biomedical knowledge. As a result, data analytics techniques for knowledge discovery and deriving data-driven insights from various data sources are increasingly important in modern healthcare. Although effective analytical approaches have been applied in many healthcare problems, several challenges remain, including data heterogeneity, sparsity, irregular sampling and the difficulty of drawing inferences from such data. This workshop focuses on novel methodologies and their applications in addressing these emerging healthcare analytics problems from both academia and industry.

8:30am-8:35am  Opening Remarks

8:35am-9:10am  Session 1
“Using a Semi-Automated Modeling Environment to Construct a Bayesian, Sepsis Diagnostic System,” Peter Haug and Jeffrey Ferraro
Invited talk by Dr. Wanpracha Art Chaovalitwongse, Talk title: “Optimization in Medical Analytics: From Data to Knowledge to Decisions”

10:10am-10:30am  Coffee Break

10:30am-11:05am  Session 2
“Automatic classification of Co-occurring patient events,” Alexander Titus, Rebecca Faill and Amar Das
“On Interestingness Measures for Mining Statistically Significant and Novel Clinical Associations from EMRs,” Orhan Abar, Richard J. Charnigo, Abner Rayapati and Ramakanth Kavuluru

11:40am-1:30pm  Lunch Break

1:30pm-3:05pm  Session 3
Invited Talk by Dr. Daniela Witten, Talk title: “Learning from time”
“Automated Verification of Phenotypes using PubMed,” Ryan Bridges, Jette Henderson, Joyce Ho, Byron Wallace and Joydeep Ghosh

3:05pm-3:30pm Coffee Break

3:30pm-5:20pm Session 4 “Predicting Future Frequent Users of Emergency Departments in California State,” Mayana Pereira, Vikhyati Singh, Chun Pan Hon, T. Greg McKelvey, Shanu Sushmita and Martine De Cock “Predicting human-immunodeficiency virus rebound after therapy initiation/switch using genetic, laboratory, and clinical data,” Mattia Prosperi, Alejandro Pironti, Francesca Incardona, Giuseppe Tradigo and Maurizio Zazzi “Feature Selection Model for Diagnosis, Electronic Medical Records and Geographical Data Correlation,” Giovanni Canino, Qiulings Suo, Pietro H. Guzzi, Giuseppe Tradigo, Aidong Zhang and Pierangelo Veltri

5:20pm-5:30pm Closing Remarks

3rd Workshop on Parallel Software Libraries for Sequence Analysis (pSALSA)
8:25am-5:30pm, October 2, 2016

Organizers:
Srinivas Aluru, Georgia Tech.
http://psalsa.gatech.edu/

High-throughput DNA sequencing instruments are capable of generating terabytes of sequencing data in a single experiment at a cost that is affordable on a routine basis. Analyzing such data is fundamental to many applications including genome resequencing, de novo genome sequencing, transcriptome sampling, metagenomics, and population diversity studies. The rate and volume of data generation are exposing the limitations of serial bioinformatics software. Effective exploitation of high performance computing technologies including multicores, accelerators, cluster and cloud computing platforms can bridge this critical gap.

The goal of this workshop is to bring together a community of bioinformatics researchers interested in development of parallel algorithms and high performance computing software for high-throughput DNA sequence analysis and its myriad applications. In particular, this workshop focuses on community-driven development of parallel software libraries to enable the bioinformatics community to more easily exploit high performance computing technologies. Development of such libraries is feasible because bioinformatics applications often rely on a common core of index and data structures, e.g., look-up tables, suffix trees/arrays, and de Bruijn graphs. Such libraries have proved enormously useful in other application domains (e.g. BLAS libraries for scientific computing), and similar efforts are currently underway in other application domains (e.g. parallel graph libraries).

This workshop is supported in part by an NSF/NIH Big Data award to develop parallel software libraries for high throughput sequencing.

8:25am-8:30am Opening Remarks

8:30am-9:15am NCBI Pathogen detection pipeline for food safety: SNPs and MLST schemes
Richa Agarwala, NCBI, NIH

9:15am-10:00am Sketching Biological Sequences for Storage and Computation
Jaroslaw Zola, SUNY Buffalo

10:00am-10:30am Coffee Break

10:30am-11:15am High-Throughput Sequencing Analysis on the AWS cloud
Mia Champion, Amazon, Inc.

11:15am-12:00pm Analyzing Genomic Data at Scale with ADAM
Frank Austin Nothaft, UC Berkeley

12pm-1:30pm Lunch Break

1:30pm-2:30pm Keynote Talk: A tour of contemporary genome assembly algorithms and software
Aydın Buluç, Lawrence Berkeley National Lab

2:30pm-3:15pm SKESA: Fast and accurate haploid genome assembler with application in Pathogen detection
Alexandre Souvorov, NCBI, NIH

3:15pm-3:35pm Coffee Break

3:45pm-4:30pm FastEtch: Fast and Efficient Genome Assembly Using Sketching
Priyanka Ghosh, Washington State University

4:30pm-5:15pm ParBLiSS: A parallel bioinformatics library for short sequences
Srinivas Aluru, Georgia Tech

5:15pm-5:30pm Discussion and Wrap Up
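
As a small illustration of the "common core of index and data structures" named in the pSALSA description, the following is a minimal serial sketch of a k-mer look-up table. The function name `build_kmer_index` and the toy reads are assumptions made for illustration only; this is not code from any library presented at the workshop, and a parallel library would partition and distribute such a table rather than build it in one process.

```python
from collections import defaultdict


def build_kmer_index(sequences, k):
    """Map each k-mer to the (sequence id, offset) pairs where it occurs.

    Hypothetical helper for illustration; not part of any workshop library.
    """
    index = defaultdict(list)
    for seq_id, seq in enumerate(sequences):
        # Slide a window of width k across the sequence.
        for offset in range(len(seq) - k + 1):
            index[seq[offset:offset + k]].append((seq_id, offset))
    return index


# Toy reads (made up for this example).
reads = ["ACGTAC", "GTACGT"]
index = build_kmer_index(reads, k=3)
print(index["GTA"])  # every (read id, offset) at which the 3-mer GTA occurs
```

Many sequence-analysis tasks (seeding alignments, counting k-mers, building de Bruijn graphs) start from a table of this kind, which is one reason a shared library implementation is plausible.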

1st International Workshop on Topological Data Analysis in Biomedicine (TDA-Bio)
8:50am-5pm, October 2, 2016

Organizers:
Bala Krishnamoorthy, Washington State University
Bei Wang Phillips, University of Utah
http://www.sci.utah.edu/~beiwang/acmbcbworkshop2016/

Data sets of different forms in biomedical sciences have seen a huge increase in size and complexity in the past two decades. We have made substantial progress in various aspects of genomics, e.g., mapping of whole genomes of humans as well as other small and large species. Similarly, a lot has been explored in the scope of the sequence-to-structure-to-function paradigm for proteins. At the same time, current data challenges in biomedicine are much more diverse, as well as varied in scope. The sheer scale and diversity of data sources and types encountered in today's biomedical data sets often render the routine computational techniques ineffective. Recently, a suite of new techniques termed topological data analysis (TDA) has shown a lot of promise in discovering structure in large, high-dimensional, and diverse data sets that other traditional techniques could not find. The range of applications includes gene expression analysis, voting, and basketball players' performances, to name a few. This workshop will present a concise yet self-contained overview of the key aspects of TDA, with an eye toward motivating the application of these techniques to problems in bioinformatics and computational biology (BCB). While topological techniques have been applied previously in certain subfields of BCB (e.g., to model protein and DNA/RNA 3D structure), they have proved to be much more versatile and powerful than these applications might suggest. We aim to showcase the versatility and strength of this suite of techniques in this workshop.

This workshop will expose the audience to the key fundamental as well as computational aspects of topology. The speakers will introduce (within their talks) basic TDA concepts and techniques, such as simplicial complexes, homology, persistent homology, Reeb graphs and mapper. They will also present how these concepts and techniques have been, or potentially could be, employed to tackle interesting problems in several areas of BCB.

8:50am-9:00am Opening Remarks

9:00am-10:00am Keynote Talk by Dr. Yusu Wang (The Ohio State University)
Title: Two Examples of Application of Topological Methods in Neuron Data Analysis

10:00am-10:35am Invited talk by Dr. Chao Chen (City University of New York)
Title: Extracting and Using Topological Structures in the Analysis of Biomedical Images

10:35am-10:50am Coffee Break

10:50am-11:30am Invited talk by Dr. Elizabeth Much (University of Albany)
Title: Utilizing Topological Data Analysis to Detect Periodicity

11:30am-12:05pm Invited talk by Dr. Brittany Fasy (Montana State University)
Title: Using Topological Data Analysis to Study Glandular Architecture

12:05pm-1:30pm Lunch Break

1:30pm-2:30pm Keynote Talk by Dr. Gunnar Carlsson (Stanford University, Ayasdi)
Title: The Shape of Biomedical Data

2:30pm-3:20pm Demo by Dr. Svetlana Lockwood (Washington State University)
Title: Open Source Software for TDA

3:20pm-3:25pm Coffee Break

3:25pm-4:00pm Invited talk by Dr. Bei Wang Phillips (University of Utah)
Title: Topological Data Analysis for Brain Networks

4:00pm-4:35pm Invited talk by Dr. Michael Robinson
Title: Finding Cross-Species Orthologs with Local Topology

4:40pm-5:10pm Panel Discussion

5:10pm-5:15pm Closing Remarks

5th International Workshop on Parallel and Cloud-based Bioinformatics and Biomedicine (ParBio)
10am-12pm, October 2, 2016

Organizers:
Mario Cannataro, University "Magna Græcia" of Catanzaro
John Springer, Purdue University
http://staff.icar.cnr.it/cannataro/parbio2016/

Due to the availability of high-throughput platforms (e.g. next generation sequencing, microarray and mass spectrometry) and clinical diagnostic tools (e.g. medical imaging), a recent trend in Bioinformatics and Biomedicine is the increasing production of experimental and clinical data. Considering the complex analysis pipeline of biomedical research, the bottleneck is increasingly moving toward the storage, integration, and analysis of experimental data, as well as their correlation and integration with publicly available data banks. The goal of the ParBio workshop is to bring together scientists in the fields of high performance and cloud computing, computational biology and medicine, to discuss, among other topics, the organization of large scale biological and biomedical databases, the parallel/service-based implementation of bioinformatics and biomedical applications, and problems and opportunities of moving biomedical and health applications to the cloud.

9:55am-10:00am Opening Remarks

10:00am-12:00pm Paper Session
“High-performance data structures for de novo assembly of genomes: cache oblivious generic programming,” Franco Milicchio, Giuseppe Tradigo, Pierangelo Veltri, Mattia Prosperi
“G-quadruplex Structure Prediction and Integration in the GenData2020 Data Model,” Giuseppe Tradigo, Francesca Cristiano, Stefano Alcaro, Sergio Greco, Gianluca Pollastri, Pierangelo Veltri, Mattia Prosperi
“A Multi-threaded Algorithm for Mining Maximal Cohesive Dense Modules from Interaction Networks with Gene Profiles,” Saeed Salem, Aditya Goparaju
“A Survey of Semantic Integration Approaches in Bioinformatics,” Chaimaa Messaoudi, Rachida Fissoune, Hassan Badir

The 3rd International Workshop on Data Mining and Visualization for Brain Science (BrainKDD)
1pm-5pm, October 2, 2016

Organizers:
Shuiwang Ji, Washington State University
Lei Shi, Chinese Academy of Sciences
Hanghang Tong, Arizona State University
Shuai Huang, University of Washington
Paul Thompson, University of Southern California
https://sites.google.com/site/brainkdd2016/

Understanding brain function is one of the greatest challenges facing science. Today, brain science is experiencing rapid changes and is expected to achieve major advances in the near future. In April 2013, U.S. President Barack Obama formally announced the Brain Research through Advancing Innovative Neurotechnologies Initiative, the BRAIN Initiative. In Europe, the European Commission has recently launched the European Human Brain Project (HBP). In the private sector, the Allen Institute for Brain Science is embarking on a new 10-year plan to generate comprehensive, large-scale data in the mammalian cerebral cortex under the MindScope project. These ongoing and emerging projects are expected to generate a deluge of data that capture the brain activities at different levels of organization. There is thus a compelling need to develop the next generation of data mining, visualization and knowledge discovery tools that allow one to make sense of this raw data and to understand how neurological activity encodes information. This workshop will focus on exploring the frontier between computer science and brain science and inspiring fundamentally new ways of mining, visualization and knowledge discovery from a variety of brain data.

1:00pm-1:10pm Opening Remarks

1:10pm-2:10pm Keynote Talk by Dr. Hanchuan Peng (Allen Institute of Brain Science), Talk title: “Massive Brain Scale Informatics”

2:10pm-2:50pm Paper Session 1
“ENIGMA-Viewer: Interactive Visualization Strategies for Conveying Effect Sizes in Meta-Analysis,” Guohao Zhang, Peter Kochunov, Elliot Hong, Neda Jahanshad, Paul Thompson and Jian Chen
“Hierarchical Spatio-temporal Visual Analysis of Cluster Evolution in Electrocorticography Data,” Sugeerth Murugesan, Kristofer Bouchard, Edward Chang, Max Dougherty, Bernd Hamann and Gunther H. Weber

2:50pm-3:20pm Coffee Break

3:20pm-4:20pm Keynote Talk by Dr. Bingni Wen Brunton (University of Washington), Talk title: Data-intensive approaches to understanding neural computations underlying naturalistic behaviors

4:20pm-5:00pm Paper Session 2
“Sub-network based Kernels for Brain Network Classification,” Biao Jie, Minxia Liu, Xi Jiang and Daoqiang Zhang
“Using Network Alignment for Analysis of Connectomes: Experiences from a Clinical Dataset,” Pietro Hiram Guzzi, Marianna Milano, Olga Tymofiyeva, Duan Xu, Christopher Hess and Mario Cannataro


Tutorials

T1: Combinatorial methods for nucleic acid sequence analysis
Sreeram Kannan and Mark Chaisson, University of Washington

Abstract: By deciphering the sequences of genomes, we are able to determine the ‘blueprint’ of how our cells function. Unfortunately, while our genomes are polymers of billions of nucleotides, methods for reading sequences are limited to hundreds to thousands of nucleotides. To determine the sequence of a genome, many small fragments of DNA are read, and the genome is inferred through ‘de novo’ fragment assembly, where these short fragments are stitched together to reconstruct the entire genome. In this tutorial, we will discuss information-theoretic barriers and algorithmic methods for reconstructing DNA, and the allied combinatorial problems involved in solving genome structure. In particular, we will discuss the following aspects in detail:
1. The architecture of human genomes and how this creates challenges for fragment assembly.
2. The characteristics of high-throughput sequencing data.
3. Information-theoretic barriers for fragment assembly.
4. Combinatorial methods for de novo fragment assembly, including novel challenges for assembling reads from third-generation long-read sequencers.
5. Challenges in RNA sequence assembly.
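
To make the idea of stitching short fragments together concrete, here is a minimal sketch of a greedy suffix-prefix overlap merger. It is only a toy under strong assumptions (error-free reads, no repeats, a single strand, no fragment contained inside another) and is not the method taught in the tutorial; practical assemblers rely on overlap graphs or de Bruijn graphs. The names `overlap` and `greedy_assemble`, the `min_len` threshold, and the toy fragments are all hypothetical.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that is a prefix of b.

    Returns 0 when no overlap of at least min_len exists.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(fragments, min_len=3):
    """Repeatedly merge the pair of fragments with the largest overlap."""
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while len(frags) > 1:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # nothing left to merge; the result is several contigs
        merged = frags[best_i] + frags[best_j][best_n:]
        frags = [f for idx, f in enumerate(frags)
                 if idx not in (best_i, best_j)] + [merged]
    return frags


# Toy fragments sampled from the made-up sequence ATGGCGTGCAATG.
contigs = greedy_assemble(["ATGGCGT", "GCGTGCA", "TGCAATG"])
print(contigs)
```

Greedy merging can go wrong exactly where the abstract points: repeated regions create ambiguous overlaps, which is one source of the information-theoretic barriers listed above.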

T2: Network Science meets Tissue-specific Biology
Shahin Mohammadi and Ananth Grama, Purdue University

Abstract: Networks are used across disciplines to model systems-level characteristics. In biology, these networks can represent interactions among a diverse set of biomolecules, ranging from genes and proteins to non-coding RNAs and metabolites. Concurrent with advances in high-throughput technologies, a large body of research has been devoted to methods and models aimed at extracting information from the ever-increasing interaction datasets. However, unlike their counterparts in sequence analysis, a majority of fundamental problems in network analysis are “hard” to solve. In this tutorial, we review and experiment with the latest network analysis tools, including alignment, community detection, and information flow analysis. We will illustrate how to utilize publicly available tissue and cell type-specific profiles to construct “tissue-specific interactomes” and how to use these specialized networks to gain novel biological insights.

T3: Big Data for Discovery Science
Ben Heavner, Institute for Systems Biology
Ravi Madduri, Argonne National Lab
Jack Van Horn, University of Southern California
Naveen Ashish, Fred Hutchinson Cancer Research Center

Abstract: This 2-hour tutorial will present the “Big Data” biomedical discovery technologies, end-to-end solutions, and applications developed at the Big Data for Discovery Science (BDDS) Center of Excellence for Big Data Computing in Biomedical Research. The BDDS center itself is uniquely focused on handling big data in biomedical research. The center introduces solutions to key biomedical informatics challenges such as big data organization, storage, processing, distribution, and sharing across collaborative networks. All BDDS developments aim to let basic science, biological, and engineering researchers use vast data collections and distant computers and storage systems to explore, interact with, and understand what the data mean and to derive knowledge from them. This tutorial will describe and demonstrate the technologies that we are developing for addressing the complexity, scalability of analysis, and ease of interaction with big data and associated analytic methods. Participants will learn how BDDS researchers apply these tools to process genomic, imaging, and other data from tens of thousands of patients, and will gain the knowledge required to take these tools back to their institutions and apply them to their own big data problems. In this tutorial, attendees will be able to discover datasets of interest from public data repositories such as ENCODE and SRA, generate easily exchangeable BDBags of raw datasets, generate unique permanent identifiers with metadata, and transfer the datasets to the cloud-based BDDS Globus Galaxy service by leveraging high-performance data transfer services. Using BDDS Galaxy, attendees can interactively analyze data or run existing large-scale optimized workflows for gene expression and transcriptomic regulatory networks.

Learning Objective 1: Attendees will understand what specific big data analysis technologies can be applied, in an integrated way, to address their particular clinical, imaging and genetics data analysis needs that could not be met with the prior state of the art.

Learning Objective 2: Attendees will learn how to use and further explore robust data analysis tools in the areas of clinical data analysis, protein function analysis, and genetic analysis.

Tutorial Content: In BDDS, we are developing technologies that enable rapid discovery in the field of biomedicine. Specifically, we are developing tools and services that enable discovery, exchange, identification, large-scale analysis and publication of big biomedical data. This tutorial will be hands-on, and attendees are expected to bring a laptop.

T4: Deep Learning for Bioinformatics and Health Informatics
Sungroh Yoon, Seoul National University

Abstract: In this era of big data, transformation of biomedical big data into valuable knowledge has been one of the most important problems in bioinformatics. Meanwhile, deep learning has advanced rapidly since the early 2000s, and now demonstrates state-of-the-art performance in various fields. Accordingly, the application of deep learning in bioinformatics to gain insight from data is emphasized both in academia and industry. This tutorial will review deep learning in bioinformatics and present examples of current research. To provide a useful and comprehensive perspective, the presenter will categorize related research both by bioinformatics domain (i.e., omics, biomedical imaging, biomedical signal processing) and deep learning architecture (i.e., deep neural networks, convolutional neural networks, recurrent neural networks, emergent architectures) and present brief descriptions of each study. Additionally, there will be discussion of theoretical and practical issues of deep learning in bioinformatics and suggestions for future research directions. This tutorial will provide valuable insight and serve as a starting point for researchers to apply deep learning approaches in their bioinformatics studies.

T5: Data-Driven Analysis of Untargeted Metabolomics Datasets
Soha Hassoun, Tufts University

Abstract: Metabolomics is an expanding field of ‘omics’ research concerned with the characterization of small molecule metabolites in biological systems. Owing to recent technological advances in mass spectrometry, it is now possible to simultaneously detect in an untargeted fashion a very large number of metabolites covering a substantial fraction of metabolites in a biological sample. This presents an exciting opportunity to develop potentially transformative data-driven approaches to study and manipulate cells and organisms. A major challenge in realizing metabolomics’ rich potential is in analyzing collected data. In this tutorial, we review recent computational techniques for automated assignment of chemical identities to spectral data collected through metabolomics. The tutorial will begin with an overview of tandem mass spectrometry platforms and available databases that catalogue spectral data. The tutorial will then cover recent metabolite identification techniques including those based on biochemical transformation analysis, metabolite fragmentation, and statistical methods including overrepresentation, pathway enrichment analysis, and inference. The tutorial concludes by outlining challenges and research opportunities in metabolomics. This tutorial will be beneficial for researchers in systems biology, and those interested in integrating metabolomics with other ‘omics’ data and in tackling challenges enabled by novel mass spectrometry collection platforms.

T6: Evolutionary Algorithms for Protein Structure Modeling
Emmanuel Sapin, Amarda Shehu, and Kenneth De Jong, George Mason University

Abstract: In the last two decades, great progress has been made in molecular modeling through computational treatments of biological molecules grounded in evolutionary search techniques. Evolutionary algorithms (EAs) are gaining popularity beyond exploring the relationship between sequence and function in biomolecules. In particular, recent work is showing the promise of EAs in exploring structure spaces of proteins, for example in de novo structure prediction and other structure modeling problems. The objective of this tutorial is to introduce the Bioinformatics, Computational Biology, and Health Informatics community to the rapid developments in EA-based frameworks for protein structure modeling through a concise but comprehensive review of developments in this direction over the last decade. The review will be accompanied by specific detailed highlights and interactive software demonstrations of representative methods. The tutorial will introduce BCB researchers to solving open problems in computational structural biology using powerful evolutionary search techniques.

T7: The ISB Cancer Genomics Cloud
Sheila M. Reynolds, Institute for Systems Biology

Abstract: The ISB Cancer Genomics Cloud (ISB-CGC) is one of three pilot projects funded by the National Cancer Institute with the goal of democratizing access to The Cancer Genome Atlas (TCGA) data by substantially lowering the barriers to accessing and computing over this rich dataset. The ISB-CGC is a cloud-based platform that serves as a large-scale data repository for TCGA data, while also providing the computational infrastructure and interactive exploratory tools necessary to carry out cancer genomics research at unprecedented scales. The ISB-CGC facilitates collaborative research by allowing scientists to share data, analyses, and insights in a cloud environment. Tools, data, and resources that make up the ISB-CGC platform include an interactive web application, data leveraging various Google Cloud technologies such as Cloud Storage, BigQuery and Google Genomics, and open-source code examples. The ISB-CGC team includes scientists and engineers from the Institute for Systems Biology (ISB), Google, and CSRA.

T8: Living the DREAM: Crowdsourcing biomedical research through challenges and ensembles
Gaurav Pandey and Robert Vogel, Icahn School of Medicine at Mount Sinai
Lara Mangravite and Solveig Sieberts, Sage Bionetworks
Gustavo Stolovitzky, IBM Research

Abstract: The explosion in the scale, variety and complexity of biomedical datasets has necessitated an almost parallel growth of advanced computational methods that can produce actionable knowledge from these datasets. This growth has led to a new approach for addressing complex biomedical problems, namely the organization of unbiased crowdsourcing-based science competitions/challenges. DREAM Challenges, the most prominent and comprehensive effort in this direction, engage diverse communities of experts to leverage the “wisdom of crowds” to solve specific biomedical problems within fixed time periods. DREAM organizers have launched over 35 successful challenges, which have attracted over 8,000 participants and resulted in over 100 publications using DREAM data. The first part of our tutorial will describe the motivation, design and scientific impact of DREAM challenges. The participation of a large diverse community of experts in DREAM challenges offers a promising opportunity to develop/learn challenge “ensembles” that automatically and effectively assimilate the rich knowledge embedded in the diverse submissions made to the challenges. This diversity among the submissions calls for the development of novel heterogeneous ensemble learning methods, which will be the focus of the second part of the tutorial.
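
The simplest form of a challenge "ensemble" is an unweighted average of the scores that different teams assign to the same samples. The sketch below shows only that baseline; it is not the heterogeneous ensemble learning method covered in the tutorial, and the function name `average_ensemble` and the scores are invented for illustration.

```python
from statistics import mean


def average_ensemble(submissions):
    """Average per-sample scores across submissions.

    Each submission is a list of floats, one score per sample, with all
    submissions scoring the same samples in the same order.
    """
    if not submissions:
        raise ValueError("need at least one submission")
    n_samples = len(submissions[0])
    if any(len(s) != n_samples for s in submissions):
        raise ValueError("all submissions must score the same samples")
    # zip(*submissions) groups the scores that different teams gave one sample.
    return [mean(scores) for scores in zip(*submissions)]


# Made-up scores from three hypothetical teams for three samples.
team_scores = [
    [0.9, 0.2, 0.6],
    [0.7, 0.4, 0.8],
    [0.8, 0.0, 0.4],
]
ensemble = average_ensemble(team_scores)
print(ensemble)
```

Averaging treats every submission as equally reliable; heterogeneous ensemble methods aim to do better by learning how much weight each submission deserves.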

Posters

1. Noa Rappaport, Michal Twik, Ron Nudel, Inbar Plaschkes, Tsippi Iny Stein, Danit Oz-Levi, Simon Fishilevich, Marilyn Safran, Doron Lancet. Integrated Identification of Disease-Gene Links and their Utility in Next-Generation Sequencing Interpretation
2. Omid Ghiasvand, Mary Shimoyama. Introducing a Text Annotation Tool (OntoMate), Assisting Curation at Rat Genome Database
3. Yoo-Ah Kim, Sanna Madan, Teresa Przytycka. WeSME: uncovering mutual exclusivity of cancer mutations
4. Ilya Zhbannikov, Konstantin Arbeev, Anatoliy Yashin. Multidimensional Stochastic Process Model and its Applications to Analysis of Longitudinal Data with Genetic Information
5. Thomas Hahn, Hidayat Rahman, Richard Segall. Advanced Feature-Driven Disease Named Entity Recognition Using Conditional Random Fields
6. Eunji Kim, Ivan Ivanov, Jianping Hua, Robert S. Chapkin, Edward R. Dougherty. Model-based study of the Effectiveness of Reporting Lists of Small Feature Sets using RNA-Seq Data
7. Hasini Yatawatte, Christian Poellabauer, Susan Latham. Automated Capture of Naturalistic Child Vocalizations for Health Research
8. Manal Alshehri, Iman Rezaeian, Abed Alkhateeb, Luis Rueda. A Machine Learning Model for Discovery of Protein Isoforms as Biomarkers
9. Somyung Oh, Jeonghyeon Ha, Kyungwon Lee, Sejong Oh. Integrated Visualization Tool for Differentially Expressed Genes and Gene Ontology Analysis
10. Janique Peyper, Naomi Walker, Robert Wilkinson, Graeme Meintjes, Jonathan Blackburn. The TB-IRIS neutrophil proteome: bioinformatic challenges
11. Richard Tillquist, Manuel Lladser. Metric-space Positioning Systems (MPS) for Machine Learning
12. Byunggil Yoo, Neil Miller, Greyson Twist, Shane Corder. The CMH Warehouse - A Catalog of Genetic Variation in Patients of a Children's Hospital
13. Salvador Eugenio Caoili. Kinetic and Affinity Constraints on Reactions Between Antihapten Antibodies and Nonpeptidic B-Cell Epitopes: Implications for Predicting Antibody-Mediated Modulation of Pharmacokinetics and Pharmacodynamics
14. Taein Kwon, Eunjeong Park, Hyukjae Chang. Smart Refrigerator for Healthcare Using Food Image Classification
15. Methun Kamruzzaman, Ananth Kalyanaraman, Bala Krishnamoorthy. Characterizing the Role of Environment on Phenotypic Traits using Topological Data Analysis
16. Surabhi Agrawal, Chun Pan Hon, Swati Garg, Aadarsh Sampath, Shanu Sushmita, Martine De Cock. Sequence Based Prediction of Hospital Readmissions
17. Chun Pan Hon, Mayana Pereira, Shanu Sushmita, Ankur Teredesai, Martine De Cock. Risk Stratification for Hospital Readmission of Heart Failure Patients: A Machine Learning Approach
18. Nick Thieme, Kristin Bennett. Time to Reactivation of Latent Tuberculosis Infection Varies by Lineage
19. Muhammad Arifur Rahman, Neil Lawrence. A Gaussian Process Model for Inferring the Dynamic Transcription Factor Activity
20. Faizy Ahsan, Doina Precup, Mathieu Blanchette. Prediction of Cell Type Specific Transcription Factor Binding Site Occupancy
21. Amelia Bateman, Todd J. Treangen, Mihai Pop. Limitations of Current Approaches for Reference-Free, Graph-Based Variant Detection
22. Peter Z. Revesz. A Last Genetic Contact Tree Generation Algorithm for a Set of Human Populations
23. Barney Potter, James Fix, Anna Ritz. Modeling Cell Signaling Networks with Prize-Collecting Subhypernetworks
24. Karl Menzel, Suzy C. P. Renn, Anna Ritz. Copy Number Variation and Adaptive Evolutionary Radiations across the African Cichlid phylogeny
25. Ting Wang, Richard H. Duerr, Wei Chen. An integrative analysis of ATAC-seq and RNA-seq data in activated, CD4+CD45RO+CD196+ human T cells treated with IL-1B and IL-23 with or without PGE2
26. Claudio Daza, Josefa Santa Maria, Ignacio Gomez, Mario Barbe, Javier Trincado, Daniel Capurro. Phenotyping Intensive Care Unit Patients Using Temporal Abstractions and Temporal Pattern Matching
27. Dan Deblasio, John Kececioglu. Adaptive Local Realignment via Parameter Advising
28. Iman Mohammadi, Seyedsasan Hashemikhabir, Tammy Toscos, Huanmei Wu. Health Care Needs of Underserved Populations in the City of Indianapolis
29. Nicole Ezell, Anna Ritz. Reconstructing Neuronal Signaling Pathways With the Potential for Disruption in Schizophrenia
30. Mohammad Shahrokh Esfahani, Aaron Newman, Henning Stehr, Florian Scherer, Jacob Chabon, David Kurtz, Robert Tibshirani, Maximilian Diehn, Ash Alizadeh. Noninvasive Cancer Classification Using Diverse Genomic Features in Circulating Tumor DNA
31. Naveena Yanamala, Lindsey Bishop, Vamsi Kodali, Patti Zeidler-Erdely, Aaron Erdely. Machine learning techniques predict and characterize toxicity between different multi-walled carbon nanotubes
32. Huanan Zhang, David Roe, Rui Kuang. Detecting Population-differentiation CNVs in Human Population Tree by Sparse Group Selection
33. Julien Herrmann, Zachary Witter, Nakul Patel, Jonathan Kho, Daniel Janies, Ümit V. Çatalyürek. Visual analytics on the spread of pathogens
34. Marzieh Ayati, Danica Wiredja, Daniela Schlatzer, Goutham Narla, Mark R Chance, Mehmet Koyuturk. MoBaS on Phosphorylation Data
35. Negin Bagherzadi, Alp Ozgun Borcek, Gul Tokdemir, Nergiz Cagiltay, Hakan Maras. Analysis of neurooncological data to predict success of operation through classification

Program Committee

Nancy Amato, Texas A&M University
Rolf Backofen, University of Freiburg
Chris Bailey-Kellogg, Dartmouth College
Asa Ben-Hur, Colorado State University
Catherine Blake, Univ. of Illinois, Urbana-Champaign
Christina Boucher, Colorado State University
Beth Britt, University of Washington
Daniel Brown, University of Waterloo
Yang Cao, Virginia Tech
John Chelico,
Brian Y. Chen, Lehigh University
Jake Chen, Indiana Univ.-Purdue Univ. Indianapolis
Yi Chen, New Jersey Institute of Technology
Jianlin Jack Cheng, University of Missouri
Chih-Lin Chi, University of Minnesota
A. Ercument Cicek, Bilkent University
Mark Clement, Brigham Young University
Trevor Cohen, University of Texas, Houston
Carlo Combi, University of Verona
Hector Corrado Bravo, Univ. of Maryland, College Park
Lenore Cowen, Tufts University
Bhaskar Dasgupda, University of Illinois at Chicago
Peter Elkin, University at Buffalo
Emre Ertin, The Ohio State University
Oliver Eulenstein, Iowa State University
Jeff Ferraro, The University of Utah
, University of California, San Diego
Andrew Gentles, Stanford University
Ananth Grama, Purdue University
Eric Hall, Cincinnati Children's Hospital
Nurit Haspel, University of Massachusetts, Boston
Lenwood Heath, Virginia Tech
Vasant Honavar, Pennsylvania State University
Fereydoun Hormozdiari, University of California, Davis
Filip Jagodzinski, Western Washington University
Xiaoqian Jiang, University of California, San Diego
Tamer Kahveci, University of Florida
Ananth Kalyanaraman, Washington State University
Sreeram Kannan, University of Washington
John Kececioglu, Co-Chair, University of Arizona
Zia Khan, University of Maryland, College Park
Mehmet Koyuturk, Case Western Reserve University
Albert Lai, The Ohio State University
Su-In Lee, University of Washington
Hans-Peter Lenhof, Saarland University
Jing Li, Case Western Reserve University
Hongfang Liu, Mayo Clinic
Stefano Lonardi, University of California, Riverside
Zhiyong Lu, National Institutes of Health
Hui Lu, University of Illinois at Chicago
Shaun Mahony, Penn State University
Brad Malin, Vanderbilt University
Ramgopal Mettu, Tulane University
Tijana Milenkovic, University of Notre Dame
T. M. Murali, Virginia Tech
Chad Myers, University of Minnesota
Luay Nakhleh, Rice University
Scott Narus, The University of Utah
William Stafford Noble, University of Washington
Laxmi Parida, IBM T J Watson Research Center
Mihai Pop, University of Maryland
Giuseppe Pozzi, Politecnico di Milano
Teresa Przytycka, National Institutes of Health
Predrag Radivojac, Indiana University
Susan Rea, Intermountain Healthcare
Anna Ritz, Reed College
Larry Ruzzo, University of Washington
Farrant Sakaguchi, The University of Utah
Harm Scherpbier, Jefferson College
Russell Schwartz, Carnegie Mellon University
Soumitra Sengupta, Columbia University
Amarda Shehu, George Mason University
Xinghua Shi, University of North Carolina at Charlotte
, Princeton University
Krister Swenson, CNRS, Université de Montpellier
Jijun Tang, University of South Carolina
Haixu Tang, Indiana University
Nurcan Tuncbag, Massachusetts Institute of Technology
Jason Wang, New Jersey Institute of Technology
Nicole Weiskopf, Oregon Health & Science University
Chunhua Weng, Columbia University
Travis Wheeler, University of Montana
Adam Wilcox, Co-Chair, University of Washington
Adam Wright, Brigham and Women's Hospital
Jinbo Xu, Toyota Technological Institute at Chicago
Naveena Yanamala, Centers for Disease Control and Prevention
Rui Zhang, University of Minnesota
Aidong Zhang, Univ. at Buffalo, State Univ. of New York
Mi Zhang, Michigan State University
Liqing Zhang, Virginia Tech
Jie Zhang, The Ohio State University
Li Zhou, Partners Healthcare
Binhai Zhu, Montana State University
Jaroslaw Zola, Univ. at Buffalo, State Univ. of New York
