A Methodology for Motif Discovery Employing Iterated Cluster Re-Assignment

Total Page:16

File Type:pdf, Size:1020Kb

A Methodology for Motif Discovery Employing Iterated Cluster Re-Assignment May 24, 2006 19:13 WSPC/Trim Size: 11in x 8.5in for Proceedings motif3 2571 A METHODOLOGY FOR MOTIF DISCOVERY EMPLOYING ITERATED CLUSTER RE-ASSIGNMENT Osman Abul¤ y and Finn Drabl¿s Department of Cancer Research and Molecular Medicine, Norwegian University of Science and Technology, Trondheim, Norway ¤Email: [email protected] ¯[email protected] Geir Kjetil Sandve Department of Computer and Information Science, Norwegian University of Science and Technology, Trondheim, Norway Email: [email protected] Motif discovery is a crucial part of regulatory network identi¯cation, and therefore widely studied in the literature. Motif discovery programs search for statistically signi¯cant, well-conserved and over-represented patterns in given promoter sequences. When gene expression data is available, there are mainly three paradigms for motif discovery; cluster-¯rst, regression, and joint probabilistic. The success of motif discovery depends highly on the homogeneity of input sequences, regardless of paradigm employed. In this work, we propose a methodology for getting homoge- neous subsets from input sequences for increased motif discovery performance. It is a unification of cluster-first and regression paradigms based on iterative cluster re-assignment. The experimental results show the e®ectiveness of the methodology. 1. INTRODUCTION from each other. ChIP-chip experiments can analyze the genome-wide binding of a speci¯c TF. For in- Transcription Factors (TF) are proteins that selec- stance, Lee et al. 19 have conducted experiments for tively bind to short pieces (5-25nt long) of DNA, so over 100 TFs for experimental identi¯cation of reg- called Transcription Factor Binding Sites (TFBS). ulatory networks of Saccharomyces cerevisiae. Un- Although TFs bind in a selective way they allow fortunately their resolution (¼ 1K-nt) is not enough some degeneracy in their binding sites, forming Tran- to exactly identify binding locations. Other prob- scription Factor Binding Motifs (TFBMs) or just lems include condition speci¯c binding, measurement motifs. This property creates the TFBS represen- noise, and di±culty in ¯nding an optimal consen- tation problem, i.e. the choice of language in which sus motif. TFBMs are functional elements of genes motifs are expressed. The most common represen- and preserved throughout the evolution. This prop- tations are motif consensus over IUPAC codes, mis- erty, together with available completed genetic maps match strings and position speci¯c weight matrices of many species, has made possible computational (PSWMs), as well as their variants and specializa- identi¯cation based solely on sequence data. That is, tions. since these regions have accumulated very few muta- Finding TFBMs is an important step in elucida- tions compared to non-functional parts, it is possible tion of genetic regulatory networksa. There are ba- to ¯nd them computationally by just exploiting the sically two methods for ¯nding TFBMs, experimen- statistical over-representation. Computational ap- tal and computational, although they usually bene¯t ¤Corresponding author. yThis work was carried out during the tenure of an ERCIM fellowship. aRegulatory network identi¯cation methods are also studied without explicitly focusing on use and discovery of TFBMs33; 27; 36; 28; 24, in this paper we do not cover these approaches. May 24, 2006 19:13 WSPC/Trim Size: 11in x 8.5in for Proceedings motif3 2258 proaches built around this fact include MEME 2; 1, An alternative to the cluster-¯rst approach is to BioProspector 20, AlignACE 12, Consensus 10, and start from a large set of putative motifs and ¯lter MDScan 21, among many others. them by regressing on expression data. The idea TFs bind to respective TFBSs in promoter re- behind this approach is to remove non-relevant mo- gions of their target genes. Each gene can have a tifs and thereby reduce the number of false positives. number of TFBSs for several di®erent TFs in its pro- Examples of this approach include Reduce 7, Motif moter sequence. In Eukaryotes, TFBSs are organized Regressor 8; 21, a boosting approach also employing in modules; sets of TFBSs for a number of TFs. Each ChIP-chip data (Hong et al. 15) and a logic regres- TF can function as inducer or repressor and this pro- sion approach by Kele»s et al. 17. cess is combinatorial, i.e. depends on the qualita- Although a number of algorithms and programs tively and quantitatively binding of other TFs. This have been developed for motif discovery, little has combinatorial behavior can cause non-additive ex- been done on designing a methodology for optimal pression behavior for their common targets. In gen- usage. In particular, little attention is paid to the eral, intra-module couplings are much stronger than s e le c tio n o f ho mo g e ne o us s ubs e ts fr o m he te r o g e ne o us inter-module couplings. Expression behavior also de- gene sets of interest. In practice, what an exper- pends on the genome-wide global conditions. imenter does is 1) cluster the gene sets of interest To understand the governing rules for gene ex- (using a clustering program like k-means, hierarchi- pression, we need to know 1) all TFs, 2) abundance cal clustering, Self-organizing maps, etc), then 2) in- and activity of them under varying conditions, 3) put them to one or a few motif ¯nding programs, and their binding sites, and 4) their combinatorial joint ¯nally 3) decide on the true motifs among all the can- regulation of target expression 35; 9. From this, it didates, either by further analysis (like regression) or is clear that to induce regulatory networks com- manually. Though clustering before motif discovery putationally we need both sequence and functional improves homogeneity compared to random subsets, data. Typically, the sequence data employed is the it might fail in ¯nding true clusters. Motivated by inter-genic promoter regions upstream of transcrip- this, we here study the generation of homogeneous tion start sites while the functional data is obtained clusters using both sequence and expression data, from microarray experiments under various condi- and we address the issue of methodology for motif tions. Other useful sources of data for motif (and discovery. module) discovery include ChIP-chip experiments We de¯ne an iterative procedure (a methodol- (e.g. 3), TFBM databases (e.g. 26), and phyloge- ogy) for the motif discovery process. Briefly, we start netic relations (e.g. 14). with an initial clustering of gene sets from gene ex- The success of motif discovery programs depends pression data and ¯nd motifs in these clusters. We on the quality of input data. That is, they typically then (optionally) re¯ne these motifs by ¯ltering out give high false-positives/negatives if input genes are irrelevant ones. In this step, simple ¯ltering or ¯lter- heterogenous with respect to regulation. To make ing employing regression analysis is applied. After the input genes homogeneous, genes are clustered be- that, we screen all the genes by motif pro¯les of each fore they are presented to motif discovery programs; cluster and re¯ne clusters by re-assignment based on hence this is called the cluster-¯rst approach. This screening score. Following this, we restart motif dis- is because gene expression depends on combinatorial covery on the new gene clusters and iterate this pro- binding of TFs on TFBMs. The co-expressed genes cedure until convergence. Finally, we output the set are assumed to be co-regulated, therefore genes are of motifs found in the last iteration. clustered based on their expression pro¯le similarity over a course of microarray experiments. Each clus- 2. POWERING MOTIF DISCOVERY ter (in which sequences are highly probable to con- USING GENE EXPRESSION DATA tain homogeneous TFBMs) is given as input to motif ¯nding programs (MEME, BioProspector, MDScan The three main paradigms for incorporating etc.). gene expression data into motif discovery are cluster- May 24, 2006 19:13 WSPC/Trim Size: 11in x 8.5in for Proceedings motif3 2593 ¯rst, regression and joint probabilistic. proach of Kele»s et al. 17 uses two-step logistic re- Brazma et al. 6 presented one of the earliest gression on a single gene expression experiment. In methods within the cluster-¯rst paradigm. They the ¯rst step, the set of all over-represented oligos look for over-represented oligos with limited degen- (allowing limited degeneracy) in the input sequences eracy, both genome-wide and for clusters generated are identi¯ed as candidate motifs. In the second step, from gene expression clustering based on the time se- for each sequence a binary score vector (serving as a ries data. The approach taken by Beer et al. 5 also covariate vector) is constructed in which each entry use a cluster-¯rst approach. The genes are clustered corresponds to existence of a motif type (or a logical using expression data with k-means clustering and function of a subset of all motif types, a so called AlignACE 12 is used for motif discovery. A very sim- logic tree) and this vector is regressed on expression ilar approach using a custom clustering algorithm is data. The Rim-Finder system of Zilberstein et al. presented in 23. 38 is another method using the regression approach. A variant of the cluster-¯rst approach is TFCC Identi¯cation of synergistic e®ects of pairs of motifs (Transcription Factor Centric Clustering) of Zhu et using co-expression has also been studied 26. al. 37. The idea is to ¯nd a set of genes showing sim- Methods for binary regression (classi¯cation) ilar expression pro¯les to the expression pro¯le of a have also been developed. A large-margin classi¯- particular TF over a set of expression experiments, cation approach, called Medusa, using boosting to- and then look for motifs in that cluster using Alig- gether with alternating decision trees is given in 22.
Recommended publications
  • UNIVERSITY of CALIFORNIA RIVERSIDE Unsupervised And
    UNIVERSITY OF CALIFORNIA RIVERSIDE Unsupervised and Zero-Shot Learning for Open-Domain Natural Language Processing A Dissertation submitted in partial satisfaction of the requirements for the degree of Doctor of Philosophy in Computer Science by Muhammad Abu Bakar Siddique June 2021 Dissertation Committee: Dr. Evangelos Christidis, Chairperson Dr. Amr Magdy Ahmed Dr. Samet Oymak Dr. Evangelos Papalexakis Copyright by Muhammad Abu Bakar Siddique 2021 The Dissertation of Muhammad Abu Bakar Siddique is approved: Committee Chairperson University of California, Riverside To my family for their unconditional love and support. i ABSTRACT OF THE DISSERTATION Unsupervised and Zero-Shot Learning for Open-Domain Natural Language Processing by Muhammad Abu Bakar Siddique Doctor of Philosophy, Graduate Program in Computer Science University of California, Riverside, June 2021 Dr. Evangelos Christidis, Chairperson Natural Language Processing (NLP) has yielded results that were unimaginable only a few years ago on a wide range of real-world tasks, thanks to deep neural networks and the availability of large-scale labeled training datasets. However, existing supervised methods assume an unscalable requirement that labeled training data is available for all classes: the acquisition of such data is prohibitively laborious and expensive. Therefore, zero-shot (or unsupervised) models that can seamlessly adapt to new unseen classes are indispensable for NLP methods to work in real-world applications effectively; such models mitigate (or eliminate) the need for collecting and annotating data for each domain. This dissertation ad- dresses three critical NLP problems in contexts where training data is scarce (or unavailable): intent detection, slot filling, and paraphrasing. Having reliable solutions for the mentioned problems in the open-domain setting pushes the frontiers of NLP a step towards practical conversational AI systems.
    [Show full text]
  • University of California Santa Cruz Sample
    UNIVERSITY OF CALIFORNIA SANTA CRUZ SAMPLE-SPECIFIC CANCER PATHWAY ANALYSIS USING PARADIGM A dissertation submitted in partial satisfaction of the requirements for the degree of DOCTOR OF PHILOSOPHY in BIOMOLECULAR ENGINEERING AND BIOINFORMATICS by Stephen C. Benz June 2012 The Dissertation of Stephen C. Benz is approved: Professor David Haussler, Chair Professor Joshua Stuart Professor Nader Pourmand Dean Tyrus Miller Vice Provost and Dean of Graduate Studies Copyright c by Stephen C. Benz 2012 Table of Contents List of Figures v List of Tables xi Abstract xii Dedication xiv Acknowledgments xv 1 Introduction 1 1.1 Identifying Genomic Alterations . 2 1.2 Pathway Analysis . 5 2 Methods to Integrate Cancer Genomics Data 10 2.1 UCSC Cancer Genomics Browser . 11 2.2 BioIntegrator . 16 3 Pathway Analysis Using PARADIGM 20 3.1 Method . 21 3.2 Comparisons . 26 3.2.1 Distinguishing True Networks From Decoys . 27 3.2.2 Tumor versus Normal - Pathways associated with Ovarian Cancer 29 3.2.3 Differentially Regulated Pathways in ER+ve vs ER-ve breast can- cers . 36 3.2.4 Therapy response prediction using pathways (Platinum Free In- terval in Ovarian Cancer) . 38 3.3 Unsupervised Stratification of Cancer Patients by Pathway Activities . 42 4 SuperPathway - A Global Pathway Model for Cancer 51 4.1 SuperPathway in Ovarian Cancer . 55 4.2 SuperPathway in Breast Cancer . 61 iii 4.2.1 Chin-Naderi Cohort . 61 4.2.2 TCGA Breast Cancer . 63 4.3 Cross-Cancer SuperPathway . 67 5 Pathway Analysis of Drug Effects 74 5.1 SuperPathway on Breast Cell Lines .
    [Show full text]
  • Introduction
    Introduction IJCAI-01 Conference Committee IJCAI-01 Program Committee: Contents: CONFERENCE CHAIR: Elisabeth André, DFKI GmbH (Germany) Introduction 2 Hector J. Levesque, University of Toronto (Canada) Minoru Asada, Osaka University (Japan) Sponsors & Committees 2-3 Franz Baader, RWTH Aachen (Germany) PROGRAM CHAIR: IJCAI-01 Awards 4 Craig Boutilier, University of Toronto (Canada) Bernhard Nebel,Albert-Ludwigs-Universität, Freiburg Didier Dubois, IRIT-CNRS (France) Conference at a Glance 5 (Germany) Maria Fox, University of Durham (United Kingdom) Workshop Program 6-7 LOCAL ARRANGEMENTS CHAIR: Hector Geffner, Universidad Simón Bolívar Doctoral Consortium 8 James Hoard, The Boeing Company, Seattle (USA) (Venezuela) Tutorial Program 8 SECRETARY-TREASURER: Georg Gottlob,Vienna University of Technology (Austria) Conference Program Highlights 9 Ronald J. Brachman,AT&T Labs – Research (USA) Invited Speakers 10 Haym Hirsh, Rutgers University (USA) IAAI-01 Conference 11 Eduard Hovy, Information Sciences Institute (USA) Advisory Committee: Joxan Jaffar, National University of Singapore Technical Program 12-19 Bruce Buchanan, University of Pittsburgh (USA) (Singapore) Exhibit Program 20-23 Silvia Coradeschi, Örebro University (Sweden) Daphne Koller, Stanford University (USA) RoboCup 2001 24 Olivier Faugeras, INRIA (France) Fangzhen Lin, Hong Kong University of Science and Registration Information 25 Cheng Hu, Chinese Academy of Sciences (China) Technolog y (Hong Kong) General Information 25-27 Nicholas Jennings, University of London (England) Heikki Mannila, Nokia Research Center (Finland) Conference Maps 28-30 Henry Kautz, University of Washington (USA) Robert Milne, Intelligent Applications (United Kingdom) IJCAI-03 Conference 31 Robert Mercer, University of Western Ontario (Canada) Daniele Nardi, Università di Roma “La Sapienza” Special Meetings 31 Silvia Miksch,Vienna University of Technology (Italy) (Austria) Dana Nau, University of Maryland (USA) Devika Subramanian, Rice University (USA) Patrick Prosser, University of Glasgow (UK) Welcome to IJCAI-01 L.
    [Show full text]
  • UCLA UCLA Electronic Theses and Dissertations
    UCLA UCLA Electronic Theses and Dissertations Title Bipartite Network Community Detection: Development and Survey of Algorithmic and Stochastic Block Model Based Methods Permalink https://escholarship.org/uc/item/0tr9j01r Author Sun, Yidan Publication Date 2021 Peer reviewed|Thesis/dissertation eScholarship.org Powered by the California Digital Library University of California UNIVERSITY OF CALIFORNIA Los Angeles Bipartite Network Community Detection: Development and Survey of Algorithmic and Stochastic Block Model Based Methods A dissertation submitted in partial satisfaction of the requirements for the degree Doctor of Philosophy in Statistics by Yidan Sun 2021 © Copyright by Yidan Sun 2021 ABSTRACT OF THE DISSERTATION Bipartite Network Community Detection: Development and Survey of Algorithmic and Stochastic Block Model Based Methods by Yidan Sun Doctor of Philosophy in Statistics University of California, Los Angeles, 2021 Professor Jingyi Li, Chair In a bipartite network, nodes are divided into two types, and edges are only allowed to connect nodes of different types. Bipartite network clustering problems aim to identify node groups with more edges between themselves and fewer edges to the rest of the network. The approaches for community detection in the bipartite network can roughly be classified into algorithmic and model-based methods. The algorithmic methods solve the problem either by greedy searches in a heuristic way or optimizing based on some criteria over all possible partitions. The model-based methods fit a generative model to the observed data and study the model in a statistically principled way. In this dissertation, we mainly focus on bipartite clustering under two scenarios: incorporation of node covariates and detection of mixed membership communities.
    [Show full text]
  • Director's Update
    Director’s Update Francis S. Collins, M.D., Ph.D. Director, National Institutes of Health Council of Councils Meeting September 6, 2019 Changes in Leadership . Retirements – Paul A. Sieving, M.D., Ph.D., Director of the National Eye Institute Paul Sieving (7/29/19) Linda Birnbaum – Linda S. Birnbaum, Ph.D., D.A.B.T., A.T.S., Director of the National Institute of Environmental Health Sciences (10/3/19) . New Hires – Noni Byrnes, Ph.D., Director, Center for Scientific Review (2/27/19) Noni Byrnes – Debara L. Tucci, M.D., M.S., M.B.A., Director, National Institute on Deafness and Other Communication Disorders (9/3/19) Debara Tucci . New Positions – Tara A. Schwetz, Ph.D., Associate Deputy Director, NIH (1/7/19) Tara Schwetz 2019 Inaugural Inductees Topics for Today . NIH HEAL (Helping to End Addiction Long-termSM) Initiative – HEALing Communities Study . Artificial Intelligence: ACD WG Update . Human Genome Editing – Exciting Promise for Cures, Need for Moratorium on Germline . Addressing Foreign Influence on Research … and Harassment in the Research Workplace NIH HEAL InitiativeSM . Trans-NIH research initiative to: – Improve prevention and treatment strategies for opioid misuse and addiction – Enhance pain management . Goals are scientific solutions to the opioid crisis . Coordinating with the HHS Secretary, Surgeon General, federal partners, local government officials and communities www.nih.gov/heal-initiative NIH HEAL Initiative: At a Glance . $500M/year – Will spend $930M in FY2019 . 12 NIH Institute and Centers leading 26 HEAL research projects – Over 20 collaborating Institutes, Centers, and Offices – From prevention research, basic and translational research, clinical trials, to implementation science – Multiple projects integrating research into new settings .
    [Show full text]
  • Top 100 AI Leaders in Drug Discovery and Advanced Healthcare Introduction
    Top 100 AI Leaders in Drug Discovery and Advanced Healthcare www.dka.global Introduction Over the last several years, the pharmaceutical and healthcare organizations have developed a strong interest toward applying artificial intelligence (AI) in various areas, ranging from medical image analysis and elaboration of electronic health records (EHRs) to more basic research like building disease ontologies, preclinical drug discovery, and clinical trials. The demand for the ML/AI technologies, as well as for ML/AI talent, is growing in pharmaceutical and healthcare industries and driving the formation of a new interdisciplinary industry (‘data-driven healthcare’). Consequently, there is a growing number of AI-driven startups and emerging companies offering technology solutions for drug discovery and healthcare. Another important source of advanced expertise in AI for drug discovery and healthcare comes from top technology corporations (Google, Microsoft, Tencent, etc), which are increasingly focusing on applying their technological resources for tackling health-related challenges, or providing technology platforms on rent bases for conducting research analytics by life science professionals. Some of the leading pharmaceutical giants, like GSK, AstraZeneca, Pfizer and Novartis, are already making steps towards aligning their internal research workflows and development strategies to start embracing AI-driven digital transformation at scale. However, the pharmaceutical industry at large is still lagging behind in adopting AI, compared to more traditional consumer industries -- finance, retail etc. The above three main forces are driving the growth in the AI implementation in pharmaceutical and advanced healthcare research, but the overall success depends strongly on the availability of highly skilled interdisciplinary leaders, able to innovate, organize and guide in this direction.
    [Show full text]
  • Probabilistic Models for Species Tree Inference and Orthology Analysis
    Probabilistic Models for Species Tree Inference and Orthology Analysis IKRAM ULLAH Doctoral Thesis Stockholm, Sweden 2015 TRITA-CSC-A-2015:12 ISSN-1653-5723 KTH School of Computer Science and Communication ISRN-KTH/CSC/A–15/12-SE SE-100 44 Stockholm ISBN 978-91-7595-619-0 SWEDEN Akademisk avhandling som med tillstånd av Kungl Tekniska högskolan framläg- ges till offentlig granskning för avläggande av teknologie doktorsexamen i datalogi fredagen den 12 juni 2015, klockan 13.00 i conference room Air, Scilifelab, Solna. © Ikram Ullah, June 2015 Tryck: Universitetsservice US AB iii To my family iv Abstract A phylogenetic tree is used to model gene evolution and species evolution using molecular sequence data. For artifactual and biological reasons, a gene tree may differ from a species tree, a phenomenon known as gene tree-species tree incongruence. Assuming the presence of one or more evolutionary events, e.g, gene duplication, gene loss, and lateral gene transfer (LGT), the incon- gruence may be explained using a reconciliation of a gene tree inside a species tree. Such information has biological utilities, e.g., inference of orthologous relationship between genes. In this thesis, we present probabilistic models and methods for orthology analysis and species tree inference, while accounting for evolutionary factors such as gene duplication, gene loss, and sequence evolution. Furthermore, we use a probabilistic LGT-aware model for inferring gene trees having temporal information for duplication and LGT events. In the first project, we present a Bayesian method, called DLRSOrthology, for estimating orthology probabilities using the DLRS model: a probabilistic model integrating gene evolution, a relaxed molecular clock for substitution rates, and sequence evolution.
    [Show full text]
  • David A. Knowles
    David A. Knowles Stanford University School of Medicine 300 Pasteur Drive Stanford, CA 94305-5105 email: [email protected] url: http://cs.stanford.edu/∼davidknowles/ Nationality: British Education 2008-2012 PhD Engineering (Machine Learning) University of Cambridge Thesis: Bayesian non-parametric models and inference for sparse and hierarchical latent structure Advisor: Prof. Zoubin Ghahramani 2007-2008 MSc Bioinformatics and Systems Biology - Distinction Imperial College London Thesis: Statistical tools for ulta-deep pyrosequencing of fast evolving viruses. Thesis advisor: Prof. Susan Holmes, Statistics Department, Stanford University. 2003-2007 MEng Engineering - Distinction Thesis: A non-parametric extension to Independent Components Analysis. Thesis advisor: Prof. Zoubin Ghahramani. BA Natural Sciences (Physics) - First Class University of Cambridge Academic Positions 2014-ongoing Postdoctoral researcher (Genetics, Pathology) Stanford University Co-advisors: Prof. Jonathan Pritchard, Prof. Sylvia Plevritis 2012-2014 Postdoctoral researcher (Computer Science) Stanford University Advisor: Prof. Daphne Koller 2008-2012 PhD Candidate, Roger Needham Scholar, Wolfson College, University of Cambridge Machine Learning Group, Cambridge University Engineering Department 2006 Summer Undergraduate Research Fellow California Institute of Technology Honours & Awards 2017 Stanford Cancer Systems Biology Symposium — Poster Award 2014 The International Society for Bayesian Statistics Travel Award for best invited Bayesian paper 2014 The International Society for Bayesian Statistics Dennis V. Lindley Prize for innovative research in Bayesian Statistics 2007 Charles Lamb University prize for first place in Information Engineering 1 Sir Joseph Larmor Silver Plate for undergraduates adjudged to be the most worthy for intellectual qualifications or moral conduct and practical activities Three other college prizes (Cargill, Cunningham and College) 2005 Wright Prize for ranking 5/600 in Natural Sciences Earle Year Prize for top 4 students across all subjects at St.
    [Show full text]
  • Learning from Graph Neighborhoods Using Lstms
    Learning From Graph Neighborhoods Using LSTMs Rakshit Agrawal∗ Luca de Alfaro Vassilis Polychronopoulos [email protected], [email protected], [email protected] Computer Science Department, University of California, Santa Cruz Technical Report UCSC-SOE-16-17 School of Engineering, UC Santa Cruz November 18, 2016 Abstract and of the grades they assigned. If we wish to be more so- phisticated, and try to determine which of these people are Many prediction problems can be phrased as inferences good graders, we could look also at the work performed over local neighborhoods of graphs. The graph repre- by these people, expanding our analysis outwards to the sents the interaction between entities, and the neighbor- 2- or 3-neighborhood of each item. hood of each entity contains information that allows the For another example, consider the problem of predict- inferences or predictions. We present an approach for ap- ing which bitcoin addresses will spend their deposited plying machine learning directly to such graph neighbor- funds in the near future. Bitcoins are held in “addresses”; hoods, yielding predicitons for graph nodes on the basis these addresses can participate in transactions where they of the structure of their local neighborhood and the fea- send or receive bitcoins. To predict which addresses are tures of the nodes in it. Our approach allows predictions likely to spend their bitcoin in the near future, it is natural to be learned directly from examples, bypassing the step to build a graph of addresses and transactions, and con- of creating and tuning an inference model or summarizing sider neighborhoods of each address.
    [Show full text]
  • Michael Kearns
    Michael Kearns Home address: Oce address: 1216 Blo om eld Street AT&T Lab oratories { Research Hob oken, New Jersey 07030 180 Park Avenue, Ro om A235 Phone: 201963-5291 Florham Park, New Jersey 07932 Phone: 973360-8322 Fax: 973360-8970 Email: [email protected] Home Page: http://www.research.att.com/~mkearns Date of Birth Octob er 24, 1962, in California. Citizenship U.S.A. Professional In my current p osition as the head of the Arti cial Intelligence Research department Ob jectives at AT&T Labs, my goals are to build and maintain a world-class research group in AI, Machine Learning and related disciplines, to develop and oversee pro jects and systems applying AI to problems of b oth immediate and long-term interest to AT&T, and to explore applications to new areas of technology and the computing industry. My de- partment is part of Information Systems and Services ResearchatAT&T Labs, which has the developmentofnovel internet and web applications and technologies as a central part of its charter. My own research examines algorithms and problems in arti cial intelligence and machine learning and related areas such as data mining, statistics, and neural networks sp eech recognition, and sp oken dialogue systems. I also have strong interests in cryptography and security, and theoretical computer science in general. Education Harvard University, Cambridge, Massachusetts Ph.D. in computer science, May 1989. Dissertation : The Computational Complexity of Machine Learning . Winner of 1989 ACM Distinguished Do ctoral Dissertation Award, published by MIT Press. S.M. in computer science, May 1986.
    [Show full text]
  • Population-Based Meta-Heuristic for Active Modules Identification Leandro Corrêa, Denis Pallez, Laurent Tichit, Olivier Soriani, Claude Pasquier
    Population-based meta-heuristic for active modules identification Leandro Corrêa, Denis Pallez, Laurent Tichit, Olivier Soriani, Claude Pasquier To cite this version: Leandro Corrêa, Denis Pallez, Laurent Tichit, Olivier Soriani, Claude Pasquier. Population- based meta-heuristic for active modules identification. 10th International Conference on Com- putational Systems-Biology and Bioinformatics (CSBio 2019), Dec 2019, Nice, France. pp.1-8, 10.1145/3365953.3365957. hal-02366236 HAL Id: hal-02366236 https://hal.archives-ouvertes.fr/hal-02366236 Submitted on 18 Nov 2020 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. Population-based meta-heuristic for active modules identification Leandro Correa Denis Pallez Laurent Tichit [email protected] [email protected] [email protected] Université Côte d’Azur, CNRS, I3S, Université Côte d’Azur, CNRS, I3S, Aix Marseille Univ, CNRS, Centrale France France Marseille, I2M, France Olivier Soriani Claude Pasquier [email protected] [email protected] Université Côte d’Azur, CNRS, Université Côte d’Azur, CNRS, I3S, INSERM, iBV, France France ABSTRACT and Bioinformatics (CSBio 2019), December 4–7, 2019, Nice, France. ACM, The identification of condition specific gene sets from transcrip- New York, NY, USA, 8 pages.
    [Show full text]
  • BE562 Computational Biology: Genomes, Networks, Evolution Fall 2014 Course Information
    BE562 Computational Biology: Genomes, Networks, Evolution Fall 2014 Course Information Lectures Tu/Th 2-3:30, Room GCB 204 Recitations Fri 10-11:30, Room LSE B03 In this course we cover the algorithmic and machine learning foundations of computational biology, combining theory with practice. We study principles of algorithm design and core methods in computational biology, we provide an introduction of important problems in computational biology, and we provide hands on experience analyzing large-scale biological data sets. Topics include: Genomes: sequence analysis, gene prediction, genome alignment and assembly, database search. Networks: gene expression analysis; regulatory motifs prediction; regulatory network mapping, prediction and analysis; computational metabolic network modeling; and systems biology Evolution: phylogenetics, molecular evolutionary theory and modeling. These topics are grounded in fundamental algorithmic techniques including: dynamic programming, hashing, Gibbs sampling, Expectation Maximization, hidden Markov models, stochastic context-free grammars, graph analysis, flux balance analysis, and Bayesian networks. Prerequisites: fundamentals of programming and algorithm design (EK 127 or equivalent), basic molecular biology (BE 209 or equivalent), statistics and probability (BE 200 or equivalent). Course Staff Lecturer: James Galagan, LSEB 1002, [email protected], 617-875-9874 TA: Bing Xia, [email protected], 510-397-9421 Website The course website is being hosted via BU Blackboard 8 at http://learn.bu.edu/. You should be able to access the site if you are registered for the course. If you have any problem with this, please email us. Grading Your grade in this course will be based on the following: Problem sets (35%) Midterm Exam (20%) Final Project (40%) Scribing (5%) Problem Sets There will be three problem sets during the first half of the semester.
    [Show full text]