Genome Informatics Wellcome Trust Genome Campus, Hinxton, Cambridge, UK

Total Page:16

File Type:pdf, Size:1020Kb

Genome Informatics Wellcome Trust Genome Campus, Hinxton, Cambridge, UK Comparative and Functional Genomics Comp Funct Genom 2001; 2: 376–383. DOI: 10.1002/cfg.114 Meeting Highlights Joint Cold Spring Harbor Laboratory and Wellcome Trust conference – genome informatics Wellcome Trust Genome Campus, Hinxton, Cambridge, UK. August 8–12, 2001 Jo Wixon1*, Jennifer Ashurst2 and Jo Dicks3 1 HGMP-RC, Hinxton, Cambridge, UK 2 Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK 3 John Innes Centre, Norwich Research Park, Colney, Norwich, UK *Correspondence to: HGMP-RC, Hinxton, Cambridge, CB10 1SB, UK. E-mail: [email protected] Published online: 26 October 2001 Annotation pipelines described CAEPA, an online, bulk EST sequence processing and annotation pipeline. The group’s Ewan Birney (EBI, UK) opened the meeting with main aim is gene discovery using EST data, so far a presentation on the Ensembl gene building sys- they have annotated and submitted y500 000 tem (http://www.ensembl.org). The first part of the ESTs. They have produced an automated initial process is Exonerate (Guy Slater), this is the ‘basic annotation pipeline, and provide support for serial guts’ of BLAST, allows multiplexing and makes subtraction, which includes clustering to detect a lazy evaluation of the path between HSPs, to novelty, and large scale BLAST for synteny assess- rapidly build gapped alignments. This was designed ment. Their EST processing pipeline consists of an for ESTs, but is now being applied to mouse whole EST preparation stage with a vector and contami- genome shotgun data. The next stage is Pmatch nant screen and a repeat masker and low complex- (Richard Durbin), a hyper-fast protein-based exact ity screen, a local annotation tool looks for 3k and 5k matcher (similar to Jim Kent’s BLAT). This finds exact 14mer substrings by building a table of non- overlapping 5mer matches, and using pairs of con- secutive 5mer matches as seeds. A targeted gene build uses Pmatch to match all known human proteins to the entire genome, and is then refined using Genome Wise. Genome Wise (Ewan Birney) uses information such as EST alignments to the genome to build gene models (Figure 1 ). It can reconcile overlapping alignments and uses ‘tunnel- ling’ in the absence of splices in a match, this extends the ORF, where possible (until it finds a stop codon). Tom Casavant (University of Iowa, USA) Figure 1. An illustration of the abilities of Genome Wise Copyright # 2001 John Wiley & Sons, Ltd. Meeting Highlights 377 EST features, this is followed by clustering and then work is centred on Neurospora crassa, but they radiation hybrid mapping. To make this accessible hope to scale up to a larger eukaryote genome next. to groups running small projects, they have pro- Inna Dubchak (LBNL, USA) described a pipe- duced a web ‘front end’ with a data drop point, and line for real-time comparative analysis of the human e-mail return of results from custom BLAST and mouse genomes (http://pipeline.lbl.gov), which is searches of internal datasets. Users can choose called Godzilla, because it is going to be huge. They which parts of the pipeline they which to take download mouse data daily from GenBank and advantage of and can make other selections such as pass this through Masker Aid and a first round which libraries of repeats to screen against. homology search using BLAT, to determine which Ron Chen (DoubleTwist, USA) described their sequences are then compared to the April freeze of strategy for prioritising predicted human genes for the Golden Path human annotation using BLAT. experimental validation. Their high-throughput The next stage is called AVID, this is a global approach starts with the ab initio gene prediction alignment engine which can handle a combination programs GenScan, FGENESH and GrailEXP. To of finished and draft sequence. The final alignments this data they add protein homology data and are viewed using VISTA. The pipeline has been results from comparisons to RefSeq, UniGene and used to discover conserved sequence fragments, their own gene index. These three sources of which they have seen in coding and non-coding evidence are then considered by Gene Squasher, regions. It has also found overlapping contigs in which makes locus predictions. It constructs exons GenBank and detected mis-assemblies. Their data is by assembling overlapping candidates from the soon to be included in the UCSC genome viewer. three methods and assembles them into genes, if Michelle Clamp (EBI, UK) spoke about the there is a common gene model or other supporting Ensembl analysis pipeline. They currently have evidence. All the data goes into their Agave XML 4.3 Gb of data (480 000 sequence reads) to analyse. format and they have a genomic viewer, which 16 types of analysis are used, which requires a lot of handles the data, showing the three types of exon organisation. They have automated job submission, data and the final predictions. Their current locus tracking programs, a set-up to retry failed jobs prediction counts are ab initio – 39 000, protein and access to large, file-based databases. They have homology – 31 000, and gene index match – 68 000. a cluster with 2 Terabytes of storage and a farm of Checking for overlaps brings the total down to 320 machines with 60 Gb local drive. New entries y98 000, half of these have only gene index match go immediately into tracking, which determines data, only 10% are detected by all three approaches. which jobs need doing and what order to do them, 30.2% (y29 500) are high confidence, with two lines then forms a job and sends it to the farm. Com- of evidence. munication between the farm and the cluster is very William Fitzhugh (Whitehead Institute, USA) complex, so they keep as much as possible on the presented Calhoun, a comprehensive system for local drives (hence their large size) such as BLAST genome sequence annotation. A crucial part of this databases, binaries and runtime files. The data is approach is that it has a platform to tie the existing fetched or written directly to the databases. They stand-alone genome annotation tools together use BioPerl (which Michelle likes very much) and and keep track of results. It has a tool for viewing MySQL. This is an open source project, the results and can manage and run a large number of software and data are freely available. The entire jobs at any given time. Calhoun has an Oracle system is portable and there are currently more relational database, a Java sequence browser, web- than 10 remote installations of the website. based tools accessible by the public and an ana- GANESH; a sequence analysis and display pack- lysis pipeline. The database schema is huge, with a age, was presented by Holger Hummerich (Imperial vast array of entities, from sequence, to taxa, to College, UK). This is a user-friendly tool aimed at mapping data. They use SQL to ask biologically researchers interested in small regions, such as those relevant questions. The system uses an analysis hunting for disease genes (http://zebrafish.doc.ic. queue table to organise analysis jobs, which can ac.uk/Ganesh/ganesh.html). It is dynamic and updates keep track of all the jobs done on a particular daily, allows selective use (such as choosing to view sequence. Web ‘front end’ tools include a gene only new data), and can display data at varying search and the ability to visualise the data and a levels of resolution, from clone contigs up to BLAST combined physical and genetic map. Their current alignments. The system downloads the relevant data Copyright # 2001 John Wiley & Sons, Ltd. Comp Funct Genom 2001; 2: 376–383. 378 Meeting Highlights from GenBank and runs several analysis programs, reduce the high risk of mis-assembly that exists with including Repeat Masker, GenScan and Pfam. such incomplete and repeat rich data. They now BLAST searches are performed across a wide range plan to improve this tool and to take on a pilot of databases. This open source tool uses a MySQL project using Caenorhabditis briggsae data. relational database and has a Java front-end display. David Jaffe (Whitehead Institute, USA) descri- The display shows the sequence across the centre of bed the Arachne whole genome shotgun assembler, the window, with data for one strand above and the which we have already reported on in the highlights other strand below. Features such as ESTs, SNPs, of the CSHL Genome Sequencing and Biology repeats, exons and promoters are marked on the meeting in issue 2(4) of Comparative and Functional display in different colours and users can add their Genomics. own annotation. Colin Semple (University of Edinburgh, UK) pre- TIGR gene indices (http://www.tigr.org/tdb/tgi. sented a computational comparison of human geno- shtml) have now been assembled for 13 animal mic sequence assemblies for a region of chromosome species, 11 plant species, 12 protists and 5 fungi. 4. His group has a contig of 58 BAC/PAC clones Dan Lee (TIGR, USA) described how to build an covering a 5.8 Mb region linked to manic depres- index they take ESTs from GenBank and their own sion. These have been used to order 107 STS databases and remove vector, poly A tails, mito- markers across the region. They used this high- chondrial and ribosomal sequences etc. They then resolution physical mapping data to assess the harvest expressed transcripts from GenBank cds coverage of the region by the UCSC, NCBI and entries and reduce the redundancy amongst these Celera genome assemblies. The principal observa- before comparing to the ESTs to build clusters. tion was that there was significant variation bet- Each cluster is assembled into a tentative consensus ween the three assemblies.
Recommended publications
  • Human OMG Blocking Peptide (CDBP2120) This Product Is for Research Use Only and Is Not Intended for Diagnostic Use
    Human OMG blocking peptide (CDBP2120) This product is for research use only and is not intended for diagnostic use. PRODUCT INFORMATION Product Overview Blocking/Immunizing peptide for anti-OMG antibody Antigen Description OMG (oligodendrocyte myelin glycoprotein) is a protein-coding gene. Diseases associated with OMG include patulous eustachian tube, and internuclear ophthalmoplegia, and among its related super-pathways are p75NTR regulates axonogenesis and Development MAG-dependent inhibition of neurite outgrowth. GO annotations related to this gene include identical protein binding. Nature Synthetic Expression System N/A Species Human Species Reactivity Human Conjugate Unconjugated Applications Apuri, BL, ELISA Procedure None Format Lyophilized powder Size 100 μg Preservative None Storage Shipped at ambient temperature, store at -20°C. ANTIGEN GENE INFORMATION Gene Name OMG oligodendrocyte myelin glycoprotein [ Homo sapiens ] Official Symbol OMG Synonyms OMG; oligodendrocyte myelin glycoprotein; oligodendrocyte-myelin glycoprotein; OMGP; Entrez Gene ID 4974 mRNA Refseq NM_002544 45-1 Ramsey Road, Shirley, NY 11967, USA Email: [email protected] Tel: 1-631-624-4882 Fax: 1-631-938-8221 1 © Creative Diagnostics All Rights Reserved Protein Refseq NP_002535 UniProt ID P23515 Chromosome Location 17q11-q12 Pathway Axonal growth inhibition (RHOA activation), organism-specific biosystem; Signal Transduction, organism-specific biosystem; Signalling by NGF, organism-specific biosystem; p75 NTR receptor-mediated signalling, organism-specific biosystem; p75(NTR)-mediated signaling, organism-specific biosystem; p75NTR regulates axonogenesis, organism-specific biosystem; Function identical protein binding; 45-1 Ramsey Road, Shirley, NY 11967, USA Email: [email protected] Tel: 1-631-624-4882 Fax: 1-631-938-8221 2 © Creative Diagnostics All Rights Reserved.
    [Show full text]
  • Cancer Informatics: New Tools for a Data-Driven Age in Cancer Research Warren Kibbe1, Juli Klemm1, and John Quackenbush2
    Cancer Focus on Computer Resources Research Cancer Informatics: New Tools for a Data-Driven Age in Cancer Research Warren Kibbe1, Juli Klemm1, and John Quackenbush2 Cancer is a remarkably adaptable and formidable foe. Cancer Precision Medicine Initiative highlighted the importance of data- exploits many biological mechanisms to confuse and subvert driven cancer research, translational research, and its application normal physiologic and cellular processes, to adapt to thera- to decision making in cancer treatment (https://www.cancer.gov/ pies, and to evade the immune system. Decades of research and research/key-initiatives/precision-medicine). And the National significant national and international investments in cancer Strategic Computing Initiative highlighted the importance of research have dramatically increased our knowledge of the computing as a national competitive asset and included a focus disease, leading to improvements in cancer diagnosis, treat- on applying computing in biomedical research. Articles in the ment, and management, resulting in improved outcomes for mainstream media, such as that by Siddhartha Mukherjee in many patients. the New Yorker in April of 2017 (http://www.newyorker.com/ In melanoma, the V600E mutation in the BRAF gene is now magazine/2017/04/03/ai-versus-md), have emphasized the targetable by a specific therapy. BRAF is a serine/threonine protein growing importance of computing, machine learning, and data kinase activating the MAP kinase (MAPK)/ERK signaling pathway, in biomedicine. and both BRAF and MEK inhibitors, such as vemurafenib and The NCI (Rockville, MD) recognized the need to invest in dabrafenib, have shown dramatic responses in patients carrying informatics. In 2011, it established a funding opportunity, the mutation.
    [Show full text]
  • Finding New Order in Biological Functions from the Network Structure of Gene Annotations
    Finding New Order in Biological Functions from the Network Structure of Gene Annotations Kimberly Glass 1;2;3∗, and Michelle Girvan 3;4;5 1Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, MA, USA 2Department of Biostatistics, Harvard School of Public Health, Boston, MA, USA 3Physics Department, University of Maryland, College Park, MD, USA 4Institute for Physical Science and Technology, University of Maryland, College Park, MD, USA 5Santa Fe Institute, Santa Fe, NM, USA The Gene Ontology (GO) provides biologists with a controlled terminology that describes how genes are associated with functions and how functional terms are related to each other. These term-term relationships encode how scientists conceive the organization of biological functions, and they take the form of a directed acyclic graph (DAG). Here, we propose that the network structure of gene-term annotations made using GO can be employed to establish an alternate natural way to group the functional terms which is different from the hierarchical structure established in the GO DAG. Instead of relying on an externally defined organization for biological functions, our method connects biological functions together if they are performed by the same genes, as indicated in a compendium of gene annotation data from numerous different experiments. We show that grouping terms by this alternate scheme is distinct from term relationships defined in the ontological structure and provides a new framework with which to describe and predict the functions of experimentally identified sets of genes. Tools and supplemental material related to this work can be found online at http://www.networks.umd.edu.
    [Show full text]
  • CALIFORNIA STATE UNIVERSITY, NORTHRIDGE Bioinformatic
    CALIFORNIA STATE UNIVERSITY, NORTHRIDGE Bioinformatic Comparison of the EVI2A Promoter and Coding Regions A thesis submitted in partial fulfillment of the requirements for the degree of Master of Science in Biology By Max Weinstein May 2020 The thesis of Max Weinstein is approved: Professor Rheem D. Medh Date Professor Virginia Oberholzer Vandergon Date Professor Cindy Malone, Chair Date California State University Northridge ii Table of Contents Signature Page ii List of Figures v Abstract vi Introduction 1 Ecotropic Viral Integration Site 2A, a Gene within a Gene 7 Materials and Methods 11 PCR and Cloning of Recombinant Plasmid 11 Transformation and Cell Culture 13 Generation of Deletion Constructs 15 Transfection and Luciferase Assay 16 Identification of Transcription Factor Binding Sites 17 Determination of Region for Analysis 17 Multiple Sequence Alignment (MSA) 17 Model Testing 18 Tree Construction 18 Promoter and CDS Conserved Motif Search 19 Results and Discussion 20 Choice of Species for Analysis 20 Mapping of Potential Transcription Factor Binding Sites 20 Confirmation of Plasmid Generation through Gel Electrophoresis 21 Analysis of Deletion Constructs by Transient Transfection 22 EVI2A Coding DNA Sequence Phylogenetics 23 EVI2A Promoter Phylogenetics 23 EVI2A Conserved Leucine Zipper 25 iii EVI2A Conserved Casein kinase II phosphorylation site 26 EVI2A Conserved Sox-5 Binding Site 26 EVI2A Conserved HLF Binding Site 27 EVI2A Conserved cREL Binding Site 28 EVI2A Conserved CREB Binding Site 28 Summary 29 Appendix: Figures 30 Literature Cited 44 iv List of Figures Figure 1. EVI2A is nested within the gene NF1 30 Figure 2 Putative Transcriptions Start Site 31 Figure 3. Gel Electrophoresis of pGC Blue cloned, Restriction Digested Plasmid 32 Figure 4.
    [Show full text]
  • Discoveryspace: an Interactive Data Analysis Application
    Open Access Software2007RobertsonetVolume al. 8, Issue 1, Article R6 DiscoverySpace: an interactive data analysis application comment Neil Robertson, Mehrdad Oveisi-Fordorei, Scott D Zuyderduyn, Richard J Varhol, Christopher Fjell, Marco Marra, Steven Jones and Asim Siddiqui Address: Canada's Michael Smith Genome Sciences Centre, British Columbia Cancer Research Centre (BCCRC), British Columbia Cancer Agency (BCCA), Vancouver, BC, Canada. reviews Correspondence: Neil Robertson. Email: [email protected]. Mehrdad Oveisi-Fordorei. Email: [email protected]. Asim Siddiqui. Email: [email protected] Published: 08 January 2007 Received: 24 March 2006 Revised: 4 July 2006 Genome Biology 2007, 8:R6 (doi:10.1186/gb-2007-8-1-r6) Accepted: 8 January 2007 The electronic version of this article is the complete one and can be found online at http://genomebiology.com/2007/8/1/R6 reports © 2007 Robertson et al.; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Interactive<p>DiscoverySpace, data analysis a graphical application for bioinformatics data analysis, in particular analysis of SAGE data, is described</p> deposited research Abstract DiscoverySpace is a graphical application for bioinformatics data analysis. Users can seamlessly traverse references between biological databases and draw together annotations in an intuitive tabular interface. Datasets can be compared using a suite of novel tools to aid in the identification of significant patterns. DiscoverySpace is of broad utility and its particular strength is in the analysis of serial analysis of gene expression (SAGE) data.
    [Show full text]
  • In This Table Protein Name, Uniprot Code, Gene Name P-Value
    Supplementary Table S1: In this table protein name, uniprot code, gene name p-value and Fold change (FC) for each comparison are shown, for 299 of the 301 significantly regulated proteins found in both comparisons (p-value<0.01, fold change (FC) >+/-0.37) ALS versus control and FTLD-U versus control. Two uncharacterized proteins have been excluded from this list Protein name Uniprot Gene name p value FC FTLD-U p value FC ALS FTLD-U ALS Cytochrome b-c1 complex P14927 UQCRB 1.534E-03 -1.591E+00 6.005E-04 -1.639E+00 subunit 7 NADH dehydrogenase O95182 NDUFA7 4.127E-04 -9.471E-01 3.467E-05 -1.643E+00 [ubiquinone] 1 alpha subcomplex subunit 7 NADH dehydrogenase O43678 NDUFA2 3.230E-04 -9.145E-01 2.113E-04 -1.450E+00 [ubiquinone] 1 alpha subcomplex subunit 2 NADH dehydrogenase O43920 NDUFS5 1.769E-04 -8.829E-01 3.235E-05 -1.007E+00 [ubiquinone] iron-sulfur protein 5 ARF GTPase-activating A0A0C4DGN6 GIT1 1.306E-03 -8.810E-01 1.115E-03 -7.228E-01 protein GIT1 Methylglutaconyl-CoA Q13825 AUH 6.097E-04 -7.666E-01 5.619E-06 -1.178E+00 hydratase, mitochondrial ADP/ATP translocase 1 P12235 SLC25A4 6.068E-03 -6.095E-01 3.595E-04 -1.011E+00 MIC J3QTA6 CHCHD6 1.090E-04 -5.913E-01 2.124E-03 -5.948E-01 MIC J3QTA6 CHCHD6 1.090E-04 -5.913E-01 2.124E-03 -5.948E-01 Protein kinase C and casein Q9BY11 PACSIN1 3.837E-03 -5.863E-01 3.680E-06 -1.824E+00 kinase substrate in neurons protein 1 Tubulin polymerization- O94811 TPPP 6.466E-03 -5.755E-01 6.943E-06 -1.169E+00 promoting protein MIC C9JRZ6 CHCHD3 2.912E-02 -6.187E-01 2.195E-03 -9.781E-01 Mitochondrial 2-
    [Show full text]
  • Extracting Biological Meaning from High-Dimensional Datasets John
    Extracting Biological Meaning from High-Dimensional Datasets John Quackenbush Dana-Farber Cancer Institute and the Harvard School of Public Health, U.S.A. The revolution of genomics has come not from the ”completed” genome sequences of hu- man, mouse, rat, and other species. Nor has it come from the preliminary catalogues of genes that have been produced in these species. Rather, the genomic revolution has been in the creation of technologies - transcriptomics, proteomics, metabolomics - that allow us to rapidly assemble data on large numbers of samples that provide information on the state of tens of thousands of biological entities. Although the gene-by-gene hypothesis testing approach remains the standard for dissecting biological function, ’omic technolo- gies have become a standard laboratory tool for generating new, testable hypotheses. The challenge is now no longer generating the data, but rather in analyzing and inter- preting it. Although new statistical and data mining techniques are being developed, they continue to wrestle with the problem of having far fewer samples than necessary to constrain the analysis. One way to deal with this problem is to use the existing body of biological data, including genotype, phenotype, the genome, its annotation and the vast body of biological literature. Through examples, we will demonstrate show how diverse datasets can be used in conjunction with computational tools to constrain ’omics datasets and extract meaningful results that reveal new features of the underlying biology. [ John Quackenbush, 44 Binney Street, SM822 Boston, MA 02115-6084, U.S.A.; [email protected]].
    [Show full text]
  • Molecular Processes During Fat Cell Development Revealed by Gene
    Open Access Research2005HackletVolume al. 6, Issue 13, Article R108 Molecular processes during fat cell development revealed by gene comment expression profiling and functional annotation Hubert Hackl¤*, Thomas Rainer Burkard¤*†, Alexander Sturn*, Renee Rubio‡, Alexander Schleiffer†, Sun Tian†, John Quackenbush‡, Frank Eisenhaber† and Zlatko Trajanoski* * Addresses: Institute for Genomics and Bioinformatics and Christian Doppler Laboratory for Genomics and Bioinformatics, Graz University of reviews Technology, Petersgasse 14, 8010 Graz, Austria. †Research Institute of Molecular Pathology, Dr Bohr-Gasse 7, 1030 Vienna, Austria. ‡Dana- Farber Cancer Institute, Department of Biostatistics and Computational Biology, 44 Binney Street, Boston, MA 02115. ¤ These authors contributed equally to this work. Correspondence: Zlatko Trajanoski. E-mail: [email protected] Published: 19 December 2005 Received: 21 July 2005 reports Revised: 23 August 2005 Genome Biology 2005, 6:R108 (doi:10.1186/gb-2005-6-13-r108) Accepted: 8 November 2005 The electronic version of this article is the complete one and can be found online at http://genomebiology.com/2005/6/13/R108 © 2005 Hackl et al.; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which deposited research permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Gene-expression<p>In-depthadipocytecell development.</p> cells bioinformatics were during combined fat-cell analyses with development de of novo expressed functional sequence annotation tags fo andund mapping to be differentially onto known expres pathwayssed during to generate differentiation a molecular of 3 atlasT3-L1 of pre- fat- Abstract Background: Large-scale transcription profiling of cell models and model organisms can identify novel molecular components involved in fat cell development.
    [Show full text]
  • Day 1 Session 4
    USING NETWORKS TO RE-EXAMINE THE GENOME-PHENOME CONNECTION John Quackenbush Dana-Farber Cancer Institute Harvard School of Public Health The WØRD When you feel it in your gut, you know it must be right. 697 SNPs explain 20% of height ~2,000 SNPs explain 21% of height ~3,700 SNPs explain 24% of height ~9,500 SNPs explain 29% of height 97 SNPs explain 2.7% of BMI All common SNPs may explain 20% of BMI Do we give up on GWAS, fine map everything, or think differently? eQTL Analysis Use genome-wide data on genetic variants (SNPs = Single Nucleotide Polymorphisms) and gene expression data together Treat gene expression as a quantitative trait Ask, “Which SNPs are correlated with the degree of gene expression?” Most people concentrate on cis-acting SNPs What about trans-acting SNPs? John Platig eQTL Networks: A simple idea • eQTLs should group into communities with core SNPs regulating particular cellular functions • Perform a “standard eQTL” analysis using Matrix_EQTL: Y = β0 + β1 ADD + ε where Y is the quantitative trait and ADD is the allele dosage of a genotype. John Platig Which SNPs affect function? Many strong eQTLs are found near the target gene. But what about multiple SNPs that are correlated with multiple genes? SNPs Can a network of SNP- gene associations inform the functional roles of these SNPs? Genes John Platig Results: COPD John Platig What about GWAS SNPs? John Platig What about GWAS SNPs? The “hubs” are a GWAS desert! John Platig What are the critical areas? Abraham Wald: Put the armor where the bullets aren’t! http://cameronmoll.com/Good_vs_Great.pdf Network Structure Matters? • Are “disease” SNPs skewed towards the top of my SNP list as ranked by the overall out degree? • No! • The collection of highest-degree SNPs is devoid of disease-related SNPs • Highly deleterious SNPs that affect many processes are probably removed by strong negative selection.
    [Show full text]
  • The Human Proteome
    Vidal et al. Clinical Proteomics 2012, 9:6 http://www.clinicalproteomicsjournal.com/content/9/1/6 CLINICAL PROTEOMICS MEETING REPORT Open Access The human proteome – a scientific opportunity for transforming diagnostics, therapeutics, and healthcare Marc Vidal1, Daniel W Chan2, Mark Gerstein3, Matthias Mann4, Gilbert S Omenn5*, Danilo Tagle6, Salvatore Sechi7* and Workshop Participants Abstract A National Institutes of Health (NIH) workshop was convened in Bethesda, MD on September 26–27, 2011, with representative scientific leaders in the field of proteomics and its applications to clinical settings. The main purpose of this workshop was to articulate ways in which the biomedical research community can capitalize on recent technology advances and synergize with ongoing efforts to advance the field of human proteomics. This executive summary and the following full report describe the main discussions and outcomes of the workshop. Executive summary human proteome, including the antibody-based Human A National Institutes of Health (NIH) workshop was Protein Atlas, the NIH Common Fund Protein Capture convened in Bethesda, MD on September 26–27, 2011, Reagents, the mass spectrometry-based Peptide Atlas with representative scientific leaders in the field of pro- and Selected Reaction Monitoring (SRM) Atlas, and the teomics and its applications to clinical settings. The main Human Proteome Project organized by the Human purpose of this workshop was to articulate ways in which Proteome Organization. Several leading laboratories the biomedical research community can capitalize on re- have demonstrated that about 10,000 protein products, cent technology advances and synergize with ongoing of the about 20,000 protein-coding human genes, can efforts to advance the field of human proteomics.
    [Show full text]
  • Roles of the Mathematical Sciences in Bioinformatics Education
    Roles of the Mathematical Sciences in Bioinformatics Education Curtis Huttenhower 09-09-14 Harvard School of Public Health Department of Biostatistics U. Oregon Who am I, and why am I here? • Assoc. Prof. of Computational Biology in Biostatistics dept. at Harvard Sch. of Public Health – Ph.D. in Computer Science – B.S. in CS, Math, and Chemistry • My teaching focuses on graduate level: – Survey courses in bioinformatics for biologists – Applied training in systems for research computing – Reproducibility and best practices for “computational experiments” 2 HSPH BIO508: Genomic Data Manipulation http://huttenhower.sph.harvard.edu/bio508 3 Teaching quantitative methods to biologists: challenges • Biologists aren’t interested in stats as taught to biostatisticians – And they’re right! – Biostatisticians don’t usually like biology, either – Two ramifications: excitement gap and pedagogy • Differences in research culture matter – Hands-on vs. coursework – Lab groups – Rotations – Length of Ph.D. –… • There are few/no textbooks that are comprehensive, recent, and audience-appropriate – Choose any two of three 4 Teaching quantitative methods to biologists: (some of the) solutions • Teach intuitions – Biologists will never need to prove anything about a t-distribution – They will benefit minimally from looking up when to use a t-test – They will benefit a lot from seeing when a t-test lies (or doesn’t) • Teach applications – Problems/projects must be interactive – IPython Notebook / RStudio / etc. are invaluable • Teach writing • Demonstrate
    [Show full text]
  • Expert Panel Report of the Cancer Systems Biology Consortium Program Evaluation
    MEMORANDUM November 11, 2020 To: Hannah Dueck National Cancer Institute (NCI) From: Carly S. Cox Science and Technology Policy Institute (STPI) Through: Bill Brykczynski Acting Director, STPI CC: Amana Abdurrezak and Brian Zuckerman STPI Subject: Expert Panel Report of the Cancer Systems Biology Consortium Program Evaluation The National Cancer Institute (NCI) Division of Cancer Biology (DCB) asked the Science and Technology Policy Institute (STPI) to facilitate an expert panel to evaluate the Cancer Systems Biolgoy Consortium (CSBC) ahead of its 2020 renewal request. To fulfill this request, STPI selected panelists, issued invitations, convened the panel for meetings to discuss program evaluation materials, facilitated a meeting between CSBC investigators and panelists, and supported the panel in its deliberations. Based on the NCI provided program information, conversations with CSBC resources, and information gathered at the CSBC Annual Meeting, the panel formulated answers to the five program evaluation questions charged to them by NCI. STPI summarized the panel’s answers to these questions in a draft document and completed a round of revisions with the panel and NCI. A finalized summary of the panel’s response to the program evaluation questions is provided in the attached document. ReleasedAttachment: “Expert Panel Report of the CancerMarch Systems Biology Consortium 2021 Program Evaluation” 1 Expert Panel Report of the Cancer Systems Biology Consortium Program Evaluation Edison Liu, Chair The Jackson Laboratories Kenneth Buetow Arizona State University Teresa Przytycka NIH National Center for Biotechnology Information John Quackenbush Harvard T.H. Chan School of Public Health Cynthia Reinhart-King Vanderbilt University October 2020 This document is under review and is subject to modification or withdrawal.
    [Show full text]