Virtual Chip-Seq: Predicting Transcription Factor Binding

Total Page:16

File Type:pdf, Size:1020Kb

Virtual Chip-Seq: Predicting Transcription Factor Binding bioRxiv preprint doi: https://doi.org/10.1101/168419; this version posted May 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. 1 Virtual ChIP-seq: predicting transcription factor binding 2 by learning from the transcriptome 1,2,3 1,2,4,5 3 Mehran Karimzadeh and Michael M. Hoffman 1 4 Department of Medical Biophysics, University of Toronto, Toronto, ON, Canada 2 5 Princess Margaret Cancer Centre, Toronto, ON, Canada 3 6 Vector Institute, Toronto, ON, Canada 4 7 Department of Computer Science, University of Toronto, Toronto, ON, Canada 5 8 Lead contact: michael.hoff[email protected] 9 May 10, 2018 10 Abstract 11 Motivation: 12 Identifying transcription factor binding sites is the first step in pinpointing non-coding mutations 13 that disrupt the regulatory function of transcription factors and promote disease. ChIP-seq is 14 the most common method for identifying binding sites, but performing it on patient samples is 15 hampered by the amount of available biological material and the cost of the experiment. Existing 16 methods for computational prediction of regulatory elements primarily predict binding in genomic 17 regions with sequence similarity to known transcription factor sequence preferences. This has limited 18 efficacy since most binding sites do not resemble known transcription factor sequence motifs, and 19 many transcription factors are not even sequence-specific. 20 Results: 21 We developed Virtual ChIP-seq, which predicts binding of individual transcription factors in new 22 cell types using an artificial neural network that integrates ChIP-seq results from other cell types 23 and chromatin accessibility data in the new cell type. Virtual ChIP-seq also uses learned associations 24 between gene expression and transcription factor binding at specific genomic regions. This approach 25 outperforms methods that use transcription factor sequence preferences in the form of position 26 weight matrices, predicting binding for 31 transcription factors (Matthews correlation coefficient > 27 0:3). 28 Availability: 29 The datasets we used for training and validation are available at https://virchip.hoffmanlab.org. We 30 have deposited in Zenodo the current version of our software (http://doi.org/10.5281/zenodo. 31 1066928), datasets (http://doi.org/10.5281/zenodo.823297), predictions for 31 transcription 32 factors on Roadmap Epigenomics cell types (http://doi.org/10.5281/zenodo.1243913), and 33 predictions in Cistrome as well as ENCODE-DREAM in vivo TF Binding Site Prediction Chal- 34 lenge (http://doi.org/10.5281/zenodo.1209308). 1 bioRxiv preprint doi: https://doi.org/10.1101/168419; this version posted May 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. 35 1 Introduction 36 Transcription factor (TF) binding regulates gene expression. Each TF can harmonize expression of 37 many genes by binding to genomic regions that regulate transcription. Cellular machinery utilizes 38 these master regulators to regulate key cellular processes and adapt to environmental stimuli. 39 Alteration in sequence or quantity of a given TF can impact expression of many genes. In fact, 40 these alterations can be the primary cause of hereditary disorders, complex disease, autoimmune 1 41 defects, and cancer . 42 TFs bind to accessible chromatin based on weak non-covalent interactions between amino acid 2 3 43 residues and nucleic acids. DNA's primary structure (sequence) , secondary structure (shape) , and 4 44 tertiary structure (conformation) all play roles in TF binding. Many TFs form a complex with 45 others as well as chromatin-binding proteins and therefore bind to DNA indirectly. Some TFs also 46 have different isoforms and undergo various post-translational modifications. In vitro assays, such 5 47 as high throughput systematic evolution of ligands by exponential enrichment (HT-SELEX) and 6 48 protein binding microarrays , have provided a compelling understanding of context-independent 7 49 TF sequence and shape preference . Yet, for the aforementioned reasons, performance of models 8,9 50 trained on these in vitro data are poor when applied on in vivo experiments . To address this 51 challenge, we must explore how to better model DNA shape, TF-TF interactions, and context- 52 dependent TF binding. 10 53 Chromatin immunoprecipitation and sequencing (ChIP-seq) and similar methods, such as 11 12 54 ChIP-exo and ChIP-nexus , can map the presence of a given TF in the genome of a biological 55 sample. To map TFs, these assays require a minimum of 1,000,000 to 100,000,000 cells, depending 56 on properties of the TF itself and available antibodies. Such large numbers of cells are not often 57 available from clinical samples. Therefore, it is impossible to systematically assess TF binding in 58 most disease systems. Assessing chromatin accessibility through transposase-accessible chromatin 13 59 using sequencing (ATAC-seq) , however, requires only hundreds or thousands of cells. One can 60 obtain this many cells from many more clinical samples. While chromatin accessibility does not de- 61 termine TF binding, several methods use this information together with knowledge of TF sequence 14,15,16 62 preference, genomic conservation, and other genomic features to predict TF binding . 63 Predicting TF binding with motif discovery tools within chromatin accessible regions has helped 17 64 us understand the role of several TFs in various disease. For example, He et al. used motif discov- 65 ery tools to identify the role of OCT1 and NKX3-1 after prolonged androgen stimulation in prostate 18 66 cancer. Similarly, Bailey et al. discovered that a known breast cancer risk single nucleotide poly- 67 morphism (SNP) upstream of ESR1 disrupts GATA3 binding and enhances expression of ESR1. 68 We propose that using more accurate tools to predict TF binding will allow understanding the role 69 of TF binding in more contexts. 70 Previous studies have used various approaches to predict TF binding. Several methods use unsu- 14 15 71 pervised approaches such as hierarchical mixture models or hidden Markov models to identify 72 transcription factor footprint using chromatin accessibility data. These approaches use sequence 73 motif scores to attribute footprints to different transcription factors. Convolutional neural network 74 models can boost precision by learning sequence preferences from in vivo, rather than in vitro 20,21 75 data . Variation in sequence specificity and cooperative binding of some transcription factors 76 prevents these methods from accurately predicting binding of all transcription factors. A more re- 77 cent approach uses matrix completion to impute TF binding using a 3-mode tensor representing 22 78 genomic positions, cell types, and TF binding . This method doesn't rely on sequence specificity, 79 but can only predict TF binding in well-studied cell types with many ChIP-seq datasets. This 80 means one cannot use it to predict binding in a cell type where ChIP-seq is not possible, such as 81 limited clinical samples. 82 Identifying the best approach for predicting TF binding remains a challenge, because most 2 a100 75 50 ● 25 0 RB1 TBP PML MAZ SIX5 TAF1 ATF1 TAF7 ATF7 BMI1 MXI1 ZZZ3 E2F1 MTA3 MTA1 EZH2 ZHX1 PHF8 HSF1 ZHX2 BRF1 BRF2 RFX5 PBX3 UBTF RNF2 CBX2 CBX3 CBX8 CBX5 BDP1 CHD1 CHD7 CHD4 CHD2 IKZF1 SMC3 MBD4 SIRT6 ZMIZ1 SIN3B SIN3A CREM − NONO ZFP36 EP300 KAT2B KAT2A SUZ12 H3F3A SAP30 TFDP1 FOXP2 NELFE ZBED1 CTBP2 GTF2B RAD21 NR3C1 NR1H2 RBBP5 MYBL2 HDAC2 HDAC6 HDAC1 HCFC1 CCNT2 BRCA1 CREB1 CEBPZ KDM4A KDM5A KDM5B CEBPD KDM1A SMAD2 RCOR1 NCOR1 TRIM22 TRIM28 HMGN3 WHSC1 ZNF143 ARID3A NANOG NFATC1 GTF2F1 GTF3C2 WRNIP1 SREBF2 SETDB1 TARDBP POLR3A POLR2A POLR3G CREBBP TBL1XR1 HA ZC3H11A CREB3L1 SUPT20H ZKSCAN1 NEUROD1 100 ● 75 ● ● ● ● ● Peaks with sequence motif (%) Peaks ● 50 ● ● ● bioRxiv preprint 25 ● ● ● ● contrast, most MAFK peaksing both sequence contain motif its of sequenceto motif the TFs and same with show TF, motif centralenrichment such occupancy enrichment. as of ATF3, 50% centralwhere or enrichment more. of 50% the or motifTFs more is where of also less peaks low. thanIndividual have In 50% of the points: peaks sequence outliers have motifdian. beyond the Box (high sequence a range: motif motif interquartile whisker. (low range occupancy,distribution motif (IQR). among blue). Whisker: occupancy, datasets most green), from extreme and different valueChIP-seq TFs cell within peaks types quartile for and a TF replicates.Figure with Horizontal any line 1: JASPAR of sequence boxplot: motif from me- the TF's family. Boxplots show the 0 SP2 SP1 SP4 YY1 JUN SRF FOS MNT IRF4 IRF3 MYB MAX IRF2 SPI1 MYC TAL1 ATF2 ATF3 NFIC E2F4 ELF1 E2F6 KLF5 BATF PAX5 PAX8 ELK1 ELK4 TCF3 TCF7 EBF1 ZEB1 ETV6 ETS1 BCL3 USF1 NFE2 USF2 RFX1 NFYA PBX2 SOX6 NRF1 RELA JUND CTCF NFYB REST CUX1 MAFF EGR1 RXRA MAFK STAT1 STAT3 GFI1B MEIS2 GATA2 GATA3 GATA1 TCF12 NR2F2 FOSL2 FOSL1 FOXA1 FOXA2 CTCFL TEAD4 THAP1 HNF4A NR2C2 BACH1 GABPA MEF2A RUNX3 RUNX1 FOXM1 HNF4G MEF2C CEBPB ESRRA SMAD1 STAT5A ZNF384 ZNF217 ZNF274 ZNF263 TCF7L2 ZNF207 NFE2L2 ZBTB33 BCL11A ZBTB7A BCLAF1 POU5F1 SREBF1 POU2F2 BHLHE40 SMARCA4 SMARCB1 SMARCC1 doi: Most ChIP-seq peaks lack the TF's sequence motif. (a) DNA-binding proteins with ChIP-seq data in ENCODE dataset certified bypeerreview)istheauthor/funder.Allrightsreserved.Noreuseallowedwithoutpermission. 19 Motif occupancy No sequence motif 0% < Motif occupancy < 50% Motif occupancy ≥ 50% https://doi.org/10.1101/168419 of a TF's motif is lower for TFs with motif occupancy of less than 50% compared JASPAR MA0496.1 MAFK b d 0.04 Peaks with motif = 51569 2.0 No sequence motif 1.5 1.0 Bits 0.5 0.0 0% < Motif occupancy < 50% 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 0.03 JASPAR MA0605.1 Atf3 Peaks with motif = 880 2.0 1.5 Motif occupancy ≥ 50% 1.0 Bits 0.5 0.0 0 20 40 60 80 1 2 3 4 5 6 7 8 0.02 Number of TFs with ChIP−seq data in ENCODE ; c this versionpostedMay11,2018.
Recommended publications
  • CV Page 1 THOMAS JEFFERSON SHARPTON Rank
    THOMAS JEFFERSON SHARPTON Rank: Associate Professor Department: Department of Microbiology, Oregon State University Department of Statistics, Oregon State University Date Hired: October 15, 2013 Address: 226 Nash Hall, Corvallis, OR 97331 Phone: (541) 737-8623 Email: [email protected] Web: lab.sharpton.org A. EDUCATION AND EMPLOYMENT INFORMATION Education 2009 Ph.D. Microbiology, Designated Emphasis in Computational Biology, University of California, Berkeley, CA, USA Dissertation: “Investigations of Natural Genomic Variation in the Fungi” Advisors: John W. Taylor & Sandrine Dudoit 2003 B.S. (Cum Laude) Biochemistry and Biophysics, Oregon State University, Corvallis, OR, USA Employment history 2019- Associate Professor Department of Microbiology, Oregon State University Department of Statistics, Oregon State University 2016- Founding Director Oregon State University Microbiome Initiative 2013-2019 Assistant Professor Department of Microbiology, Oregon State University Department of Statistics, Oregon State University 2013- Center Investigator Center for Genome Research and Biocomputing, Oregon State University 2009-2013 Bioinformatics Fellow Gladstone Institute of Cardiovascular Disease, Gladstone Institutes Advisors: Katherine Pollard & Jonathan Eisen (UC Davis) 2003-2009 Graduate Student Researcher Department of Microbiology, University of California, Berkeley Advisors: John Taylor & Sandrine Dudoit 2005 Bioinformatics Research Intern The Broad Institute Advisor: James Galagan Additional Affiliations 2017-2018 Scientific Advisory Board Member Thomas J. Sharpton – CV Page 1 Resilient Biotics, Inc. B. AWARDS 2019 Phi Kappa Phi Emerging Scholar Award 2018 OSU College of Science Research and Innovation Seed Program Award 2017 OSU College of Science Early Career Impact Award 2014 Finalist: Carter Award in Outstanding & Inspirational Teaching in Science 2013 Gladstone Scientific Leadership Award 2007 Dean’s Council Student Research Representative 2007 Effectiveness in Teaching Award, UC Berkeley 2006-2008 Chang-Lin Tien Graduate Research Fellowship C.
    [Show full text]
  • Nandita R. Garud
    NANDITA R. GARUD Department of Ecology and Evolutionary Biology University of California, Los Angeles 621 Charles E. Young Drive South Los Angeles, CA 90095-1606 [email protected] Curriculum Vitae January 2020 EDUCATION 2015 Ph.D. in Genetics Stanford University, Stanford, CA Focus on adaptation in natural populations 2012 MS in Statistics Stanford University, Stanford, CA 2008 BS in Biometry & Statistics and Biological Sciences, (Magna Cum Laude) Cornell University, Ithaca, NY EMPLOYMENT 2019- current Assistant Professor University of California, Los Angeles, CA Department of Ecology and Evolutionary Biology Department of Human Genetics Member, Interdepartmental Program in Bioinformatics Member, Genetics & Genomics Home Area 2015-2019 Bioinformatics Fellow Postdoctoral Position The Gladstone Institutes at the University of California, San Francisco, CA Advisor: Katherine Pollard 2015 summer Data Scientist Intern Intuit Inc., Mountain View, CA PREVIOUS RESEARCH 2010-2015 Ph.D. Candidate Department of Genetics, Stanford University, Stanford, CA Advisor: Dmitri Petrov 2008-2009 Fulbright Scholar The Bioinformatics Centre, University of Copenhagen, Denmark Advisor: Jakob Pedersen 2006-2008 Undergraduate Honors Researcher Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY Advisor: Andrew Clark 2006 Cold Spring Harbor Summer Undergraduate Research Intern Cold Spring Harbor, NY Advisor: Doreen Ware AWARDS AND HONORS 2014-2015 Stanford Center for Evolution and Human Genomics Fellow 2013 Walter Fitch Finalist, Society for Molecular Biology and Evolution, Chicago. 2010-2013 National Science Foundation Graduate Research Fellow 2008-2009 Fulbright Scholar, Denmark 2009-2010 Stanford Genome Training Program Grant 2007-2008 Cornell Hughes Scholar 2007-2008 CALS Charitable Trust Grant and Morley Grant for Undergraduate Research 2004-2008 Dean’s List, Cornell University 2004-2008 Byrd Undergraduate Scholarship 2004-2008 New York Lottery Undergraduate Scholarship PUBLICATIONS 1.
    [Show full text]
  • MSTATNEWS Athe Membership Magazine of the American Statistical Association What Makes Y O U R V O T E Matter?
    April 2008 • Issue #370 MSTATNEWS AThe Membership Magazine of the American Statistical Association What Makes Y o u r V o t e Matter? Publications Agreement No. 41544521 Board of Directors Meeting Highlights Making Statistics Delicious, Not Just Palatable April 2008 • Issue #370 VISION STATEMENT To be a world leader in promoting statistical practice, applications, and research; publishing statistical journals; improving statistical education; FEATURES and advancing the statistics profession. Executive Director 2 Longtime Members Ron Wasserstein: [email protected] 10 President’s Invited Column Associate Executive Director and Director of Operations Stephen Porzio: [email protected] 11 Highlights of the March 2008 Board of Directors Meeting Director of Programs 12 ASA Board Calls for Audits To Increase Confidence in Martha Aliaga: [email protected] Electoral Outcomes Director of Science Policy Steve Pierson: [email protected] 12 ASA Position on Electoral Integrity Managing Editor Megan Murphy: [email protected] 13 Wise Elders Program Production Coordinators/Graphic Artists 14 What Makes Your Vote Matter? Colby Johnson: [email protected] Lidia Vigyázó: [email protected] 16 Canadian Model for Accrediting Professional Statisticians Publications Coordinator Valerie Snider: [email protected] 17 Proposals Sought for NSF-CBMS Conferences Advertising Manager 18 Making Statistics Delicious, Not Just Palatable Claudine Donovan: [email protected] Contributing Staff Members 20 Big Science and Little Statistics Keith Crank • Ron Wasserstein 21 Free Textbook Offered to Stats Teachers Amstat News welcomes news items and let- 22 Success of Statistical Service Leads to Expanded Network ters from readers on matters of interest to the asso- ciation and the profession. Address correspondence 22 NHIS Paradata Released to Public to Managing Editor, Amstat News, American Statistical Association, 732 North Washington Street, Alexandria VA 22314-1943 USA, or email amstat@ 23 Staff Spotlight amstat.org.
    [Show full text]
  • Cvauc: Cross- Validated Area Under the ROC Curve Confidence Intervals
    CURRICULUM VITAE and BIBLIOGRAPHY CONTACT INFORMATION Name: Mark Johannes van der Laan. Nationality: Dutch. Marital status: Married to Martine with children Laura, Lars, and Robin. University address: University of California Division of Biostatistics School of Public Health 108 Haviland Hall Berkeley, CA 94720-7360 email: [email protected]. Telephone number: office: 510-643-9866 fax: 510-643-5163 Web address: www.stat.berkeley.edu/ laan Working Papers, Division of Biostatistics: www.bepress.com/ucbbiostat EDUCATION 1990-1993: Department of Mathematics, Utrecht University. Ph.D student of Prof. Dr. R.D. Gill. Position included 25% teaching, 75% research and education. Specialization in Estimation in Semiparametric and Censored Data Models. 1991-1992: University of California, Berkeley. Statistics Program at M.S.R.I.: \Semiparametric Models and Survival Analysis". Research with guidance by second Promotor Prof. Dr. P.J. Bickel. Subject: “Efficient Estimation in the Bivariate Censoring Model". December 13, 1993: Official Public Defense of Ph.D Thesis. 1985-1990: Masters degree in Mathematics at the University of Utrecht, The Netherlands. Statistics Major. 1988-1989: One year study, Masters degree courses at the Department of Statistics, North Carolina State University, Raleigh, North Carolina, U.S.A. G.P.A 4.0, Dean's List. 1989-1990: Masters thesis under guidance of Prof. Dr. R.D. Gill. Subject: The Dabrowska Estimator and the Functional Delta method. Grade (from 1-10, 10=top): 9.5. Official Completion: May 1, 1990. 1 ACADEMIC POSITIONS 2006-present: Jiann-Ping Hsu/Karl E. Peace Endowed Chair in Biostatistics. 2013-2018: Investigator and core leader of the methods workgroup of the Sustainable East African Research in Community Health (SEARCH).
    [Show full text]
  • Downloaded from NCBI Genbank Or Sequence
    Lind and Pollard Microbiome (2021) 9:58 https://doi.org/10.1186/s40168-021-01015-y METHODOLOGY Open Access Accurate and sensitive detection of microbial eukaryotes from whole metagenome shotgun sequencing Abigail L. Lind1 and Katherine S. Pollard1,2,3,4,5* Abstract Background: Microbial eukaryotes are found alongside bacteria and archaea in natural microbial systems, including host-associated microbiomes. While microbial eukaryotes are critical to these communities, they are challenging to study with shotgun sequencing techniques and are therefore often excluded. Results: Here, we present EukDetect, a bioinformatics method to identify eukaryotes in shotgun metagenomic sequencing data. Our approach uses a database of 521,824 universal marker genes from 241 conserved gene families, which we curated from 3713 fungal, protist, non-vertebrate metazoan, and non-streptophyte archaeplastida genomes and transcriptomes. EukDetect has a broad taxonomic coverage of microbial eukaryotes, performs well on low-abundance and closely related species, and is resilient against bacterial contamination in eukaryotic genomes. Using EukDetect, we describe the spatial distribution of eukaryotes along the human gastrointestinal tract, showing that fungi and protists are present in the lumen and mucosa throughout the large intestine. We discover that there is a succession of eukaryotes that colonize the human gut during the first years of life, mirroring patterns of developmental succession observed in gut bacteria. By comparing DNA and RNA sequencing of paired samples from human stool, we find that many eukaryotes continue active transcription after passage through the gut, though some do not, suggesting they are dormant or nonviable. We analyze metagenomic data from the Baltic Sea and find that eukaryotes differ across locations and salinity gradients.
    [Show full text]
  • Patrick J. H. Bradley
    (415) 734-2745 Patrick J. H. Bradley Gladstone Institutes, GIDB [email protected] 1650 Owens Street Pronouns: he/him/his Bioinformatics Fellow San Francisco, CA 94158 Current Position ······················································································· 2013— J. David Gladstone Institutes at UCSF Bioinformatics Fellow, Prof. Katherine S. Pollard Lab Academic History ····················································································· 2012—13 Lewis-Sigler Institute for Integrative Genomics, Princeton University Postdoctoral Fellow, Prof. Olga G. Troyanskaya Lab 2005—12 Dept. of Molecular Biology, Princeton University Ph.D. in Molecular Biology, Specialization in Quantitative and Computational Biology Thesis: Inferring Metabolic Regulation from High-Throughput Data Advisors: Prof. Joshua D. Rabinowitz, Prof. Olga G. Troyanskaya Committee: Prof. Ned S. Wingreen, Prof. David Botstein 2005 Dept. of Biology, Harvard College A.B. in Biology Peer-Reviewed Publications ·········································································· 1. Patrick H. Bradley, Katherine S. Pollard. “phylogenize: correcting for phylogeny reveals genes associated with microbial distributions.” Bioinformatics, 2019; btz722.∗;z 2. Patrick H. Bradley, Patrick A. Gibney, David Botstein, Olga G. Troyanskaya, Joshua D. Rabinowitz. “Minor isozymes tailor yeast metabolism to carbon availability.” mSystems, 2019; 4:e00170-18.∗;y 3. Patrick H. Bradley, Stephen Nayfach, Katherine S. Pollard. “Phylogeny-corrected identification
    [Show full text]
  • CV 2015 Sharpton, Thomas.Pdf
    THOMAS JEFFERSON SHARPTON Department of Microbiology [email protected] Oregon State University lab.sharpton.org 226 Nash Hall, Corvallis, OR 97331 @tjsharpton tel: (541) 737-8623 fax: (541) 737-0496 github.com/sharpton EDUCATION 2009 University of California, Berkeley, CA Ph.D., Microbiology, Designated Emphasis in Computational Biology 2003 Oregon State University, Corvallis, OR B.S. (Cum Laude), Biochemistry and Biophysics. APPOINTMENTS 2013- Assistant Professor Departments of Microbiology and Statistics, Oregon State University 2009-2013 Bioinformatics Fellow Gladstone Institute of Cardiovascular Disease, Gladstone Institutes Mentors: Katherine Pollard & Jonathan Eisen (UC Davis) 2003-2009 Graduate Student Researcher Department of Microbiology, University of California, Berkeley Mentors: John Taylor & Sandrine Dudoit 2005 Bioinformatics Research Intern The Broad Institute Mentor: James Galagan 2002-2003 Research Assistant Department of Zoology, Oregon State University 2001 Research Assistant International Arctic Research Center, University of Alaska, Fairbanks, AK HONORS & AWARDS 2013 Gladstone Scientific Leadership Award 2007 Dean’s Council Student Research Representative 2007 Effectiveness in Teaching Award, UC Berkeley 2006-2008 Chang-Lin Tien Graduate Research Fellowship 2002 Student Leadership Fellowship 2002 HHMI Undergraduate Research Fellowship 2002-2003 Dr. Donald MacDonald Excellence in Science Scholarship 2002 OSU College of Science's Excellence in Science Scholarship 1999-2003 Provost Scholarship 1999 American Legion Student Ethics Award PUBLICATIONS Refereed Journal Articles 17. O’Dwyer JP, Kembel SW, Sharpton TJ (In Review) Backbones of Evolutionary History Test Biodiversity Theory for Microbes 16. Quandt AC, Kohler A, Hesse C, Sharpton TJ, Martin F, Spatafora JW. (In Print) Metagenomic sequence of Elaphomyces granulatus from sporocarp tissue reveals Ascomycota ectomycorrhizal fingerprints of genome expansion and a Proteobacteria rich microbiome.
    [Show full text]
  • ASA Commends NSF for Initiative To
    Embargoed until 9 a.m. (EDT) August 13, 2015 STATISTICAL ADVANCES HELP UNLOCK MYSTERIES OF THE HUMAN MICROBIOME SEATTLE, WA, AUGUST 13, 2015 – Advances in the field of statistics are helping to unlock the mysteries of the human microbiome—the vast collection of microorganisms living in and on the bodies of humans, said Katherine Pollard, a statistician and biome expert, during a session today at the 2015 Joint Statistical Meetings (JSM 2015) in Seattle. Pollard, senior investigator at the Gladstone Institutes and professor of epidemiology and biostatistics at the University of California, San Francisco, delivered a presentation titled “Estimating Taxonomic and Functional Diversity in Shotgun Metagenomes” during an invited session focused on statistics, the microbiome and human health. “While we are only just beginning to understand the complex roles microbes play in human biology, it is clear specific changes in microbial flora are associated with—and sometimes cause or cure—disease in the host,” said Pollard while explaining her research focus. “Some of the best-supported links are with autoimmune diseases, which are on the rise in the United States, perhaps due to antibiotic use and lack of exposure to a diverse collection of microbes during childhood. This ‘hygiene hypothesis’ suggests that health risks not attributable to human genetics and behavior may stem from differences in microbiome composition between individuals.” What’s more, a given microbial species can share less than 50% of the same genes when found in two people. These differences track with functional capabilities of the microbial communities, including genes related to sugar metabolism, biosynthesis and two-component systems.
    [Show full text]
  • Clinical Efficacy and Increased Donor Strain Engraftment After Antibiotic
    medRxiv preprint doi: https://doi.org/10.1101/2021.08.07.21261556; this version posted August 10, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license . Clinical efficacy and increased donor strain engraftment after antibiotic pretreatment in a randomized trial of ulcerative colitis patients receiving fecal microbiota transplant Byron J. Smith A,B (ORCID: 0000-0002-0182-404X) Yvette Piceno C (ORCID: 0000-0002-7915-4699) Martin Zydek D Bing Zhang D (ORCID: 0000-0002-3377-0963) Lara Aboud Syriani E Jonathan P. Terdiman D Zain Kassam F Averil Ma G (ORCID: 0000-0003-4817-1258) Susan V. Lynch D,H (ORCID: 0000-0001-5695-7336) Katherine S. Pollard A,B,I,* (ORCID: 0000-0002-9870-6196) Najwa El-Nachef D,* A The Gladstone Institute of Datascience and Biotechnology, San Francisco, CA B Department of Epidemiology and Biostatistics, University of California, San Francisco, CA C Symbiome, Inc., San Francisco, CA D Division of Gastroenterology, University of California, San Francisco, CA E College of Osteopathic Medicine of the Pacific, Western University of Health Sciences, Pomona, CA F Finch Therapeutics, Somerville, MA G Department of Medicine, University of California, San Francisco, CA H Benioff Center for Microbiome Medicine, University of California, San Francisco, CA I Chan-Zuckerberg Biohub, San Francisco, CA * Corresponding authors: [email protected] [email protected] NOTE: This preprint reports new research that has not been certified by peer review and should not be used to guide clinical practice.
    [Show full text]
  • Exploring the Behaviour of the Hidden Markov Model on Cpg Island Prediction
    Exploring the behaviour of the Hidden Markov Model on CpG island prediction A Thesis Submitted to the College of Graduate Studies and Research in Partial Fulfillment of the Requirements for the degree of Master of Science in the Department of Computer Science University of Saskatchewan Saskatoon By Arnie Berg c Arnie Berg, May/2013. All rights reserved. Abstract DNA can be represented abstrzctly as a language with only four nucleotides represented by the letters A, C, G, and T, yet the arrangement of those four letters plays a major role in determining the development of an organism. Understanding the significance of certain arrangements of nucleotides can unlock the secrets of how the genome achieves its essential functionality. Regions of DNA particularly enriched with cytosine (C nucleotides) and guanine (G nucleotides), especially the CpG di-nucleotide, are frequently associated with biological function related to gene expression, and concentrations of CpGs referred to as \CpG islands" are known to collocate with regions upstream from gene coding sequences within the promoter region. The pattern of occurrence of these nucleotides, relative to adenine (A nucleotides) and thymine (T nucleotides), lends itself to analysis by machine-learning techniques such as Hidden Markov Models (HMMs) to predict the areas of greater enrichment. HMMs have been applied to CpG island prediction before, but often without an awareness of how the outcomes are affected by the manner in which the HMM is applied. Two main findings of this study are: 1. The outcome of a HMM is highly sensitive to the setting of the initial probability estimates. 2.
    [Show full text]
  • Chromatin Features Constrain Structural Variation Across Evolutionary Timescales
    Chromatin features constrain structural variation across evolutionary timescales Geoff Fudenberga,1 and Katherine S. Pollarda,b,c,d,e,f,1 aGladstone Institute of Data Science and Biotechnology, San Francisco, CA 94158; bDepartment of Epidemiology & Biostatistics, University of California, San Francisco, CA 94158; cInstitute for Human Genetics, University of California, San Francisco, CA 94158; dQuantitative Biology Institute, University of California, San Francisco, CA 94158; eInstitute for Computational Health Sciences, University of California, San Francisco, CA 94158; and fChan-Zuckerberg Biohub, San Francisco, CA 94158 Edited by Jasper Rine, University of California, Berkeley, CA, and approved December 10, 2018 (received for review May 23, 2018) The potential impact of structural variants includes not only the with autism and developmental delay, where deletions occur re- duplication or deletion of coding sequences, but also the pertur- markably uniformly across the genome, and in cancer, where dele- bation of noncoding DNA regulatory elements and structural tions in fact show a slight enrichment for disrupting otherwise chromatin features, including topological domains (TADs). Struc- important features. Together our analyses uncover a genome- tural variants disrupting TAD boundaries have been implicated wide pattern of negative selection against deletions that could po- both in cancer and developmental disease; this likely occurs via tentially alter chromatin structure and lead to enhancer hijacking. “enhancer hijacking,” whereby
    [Show full text]
  • Exploring the Behaviour of the Hidden Markov Model on Cpg Island Prediction
    CORE Metadata, citation and similar papers at core.ac.uk Provided by University of Saskatchewan's Research Archive Exploring the behaviour of the Hidden Markov Model on CpG island prediction A Thesis Submitted to the College of Graduate Studies and Research in Partial Fulfillment of the Requirements for the degree of Master of Science in the Department of Computer Science University of Saskatchewan Saskatoon By Arnie Berg c Arnie Berg, May/2013. All rights reserved. Abstract DNA can be represented abstrzctly as a language with only four nucleotides represented by the letters A, C, G, and T, yet the arrangement of those four letters plays a major role in determining the development of an organism. Understanding the significance of certain arrangements of nucleotides can unlock the secrets of how the genome achieves its essential functionality. Regions of DNA particularly enriched with cytosine (C nucleotides) and guanine (G nucleotides), especially the CpG di-nucleotide, are frequently associated with biological function related to gene expression, and concentrations of CpGs referred to as \CpG islands" are known to collocate with regions upstream from gene coding sequences within the promoter region. The pattern of occurrence of these nucleotides, relative to adenine (A nucleotides) and thymine (T nucleotides), lends itself to analysis by machine-learning techniques such as Hidden Markov Models (HMMs) to predict the areas of greater enrichment. HMMs have been applied to CpG island prediction before, but often without an awareness of how the outcomes are affected by the manner in which the HMM is applied. Two main findings of this study are: 1.
    [Show full text]