Course Logistics Module Overview Todays Menu

Barry Grant, Ph.D. INTRODUCTION TO [email protected] BIOINFORMATICS > Ryan Mills, Ph.D. Please take the initial BIOINF525 questionnaire: [email protected] < http://tinyurl.com/bioinf525-questions Barry Grant University of Michigan www.thegrantlab.org Hongyang Li (GSI) [email protected] BIOINF 525 http://bioboot.github.io/bioinf525_w16/ 12-Jan-2016 COURSE LOGISTICS MODULE OVERVIEW Lectures: Tuesdays 2:30-4:00 PM Objective: Provide an introduction to the practice of Rm. 2062 Palmer Commons bioinformatics as well as a practical guide to using common bioinformatics databases and algorithms Labs: Session I: Thursdays 2:30 - 4:00 PM Session II: Fridays 10:30 - 12:00 PM Rm. 2036 Palmer Commons 1.1. ‣ Introduction to Bioinformatics 1.2. ‣ Sequence Alignment and Database Searching Website: http://tinyurl.com/bioinf525-w16 1.3 ‣ Structural Bioinformatics Lecture, lab and background reading material plus homework and course announcements 1.4 ‣ Genome Informatics: High Throughput Sequencing Applications and Analytical Methods TODAYS MENU HOMEWORK Overview of bioinformatics • The what, why and how of bioinformatics? Complete the initial course questionnaire: • Major bioinformatics research areas. http://tinyurl.com/bioinf525-questions • Skepticism and common problems with bioinformatics. Bioinformatics databases and associated tools Check out the “Background Reading” material on Ctools: • Primary, secondary and composite databases. http://tinyurl.com/bioinf525-w16 • Nucleotide sequence databases (GenBank & RefSeq). Complete the lecture 1.1 homework questions: • Protein sequence database (UniProt). http://tinyurl.com/bioinf525-quiz1 • Composite databases (PFAM & OMIM). Database usage vignette • Searching with ENTREZ and BLAST. • Reference slides and handout on major databases. Q. What is Bioinformatics? Q. What is Bioinformatics? “Bioinformatics is the application of computers to the collection, archiving, organization, and analysis of biological data.” [After Orengo, 2003] … Bioinformatics is a hybrid of biology and computer science … Bioinformatics is computer aided biology! Computer based management and analysis of biological and biomedical data with useful applications in many disciplines, particularly genomics, proteomics, metabolomics, etc... MORE DEFINITIONS MORE DEFINITIONS ‣ “Bioinformatics is conceptualizing biology in terms of ‣ “Bioinformatics is conceptualizing biology in terms of macromolecules and then applying "informatics" techniques macromolecules and then applying "informatics" techniques (derived from disciplines such as applied maths, computer (derived from disciplines such as applied maths, computer science, and statistics) to understand and organize the science, and statistics) to understand and organize the information associated with these molecules, on a large-scale. information associated with these molecules, on a large-scale. Luscombe NM, et al. Methods Inf Med. 2001;40:346. Luscombe NM, et al. Methods Inf Med. 2001;40:346. ‣ “Bioinformatics is research, development, or application of ‣ “Bioinformatics is research, development, or application of computational approaches for expanding the use of computational approaches for expanding the use of biological, medical, behavioral or health data, including those biological, medical, behavioral or health data, including those Key Point: Bioinformatics is Computer Aided Biology to acquire, store, organize and analyze such data.” to acquire, store, organize and analyze such data.” National Institutes of Health (NIH) ( http://tinyurl.com/l3gxr6b ) National Institutes of Health (NIH) ( http://tinyurl.com/l3gxr6b ) Major types of Bioinformatics Data Major types of Bioinformatics Data Literature and ontologies Literature and ontologies Gene expression Gene expression Genomes Genomes Protein sequence Protein sequence DNA & RNA sequence DNA & RNA sequence Protein structure Protein structure DNA & RNA structure DNA & RNA structure Integrate sequence, 3D structure, expression Chemical entities Goal: Chemical entities Protein families, Protein families, . motifs and domains patterns,motifs interaction and domains and function of biomolecules to gain a deeper understanding of biological mechanisms, process and systems Protein interactions Protein interactions Pathways Pathways Systems Systems Major types of Bioinformatics Data BIOINFORMATICS RESEARCH AREAS Literature and ontologies Gene expression Include but are not limited to: Genomes • Organization, classification, dissemination and analysis of Protein sequence biological and biomedical data (particularly ‘-omics' data). DNA & RNA sequence • Biological sequence analysis and phylogenetics. Protein structure • Genome organization and evolution. DNA & RNA structure • Regulation of gene expression and epigenetics. Chemical entities BioinformaticsProtein families, aims to bridge the gap between • Biological pathways and networks in healthy & disease states. Goal: motifs and domainsdata and knowledge. • Protein structure prediction from sequence. • Modeling and prediction of the biophysical properties of Protein interactions biomolecules for binding prediction and drug design. • Design of biomolecular structure and function. Pathways With applications to Biology, Medicine, Agriculture and Industry Systems Where did bioinformatics come from? Why do we need Bioinformatics? Bioinformatics arose as molecular biology began to be transformed Bioinformatics is necessitated by the rapidly expanding by the emergence of molecular sequence and structural data quantities and complexity of biomolecular data Recap: The key dogmas of molecular biology • Bioinformatics provides • DNA sequence determines protein sequence. methods for the efficient: ‣ storage • Protein sequence determines protein structure. ‣ annotation • Protein structure determines protein function. ‣ search and retrieval • Regulatory mechanisms (e.g. gene expression) determine the ‣ data integration amount of a particular function in space and time. ‣ data mining and analysis Bioinformatics is now essential for the archiving, organization Bioinformatics is essential for the archiving, organization and and analysis of data related to these processes. analysis of data from sequencing, structural genomics, microarrays, proteomics and new high throughput assays. Why do we need Bioinformatics? How do we do Bioinformatics? Bioinformatics is necessitated by the rapidly expanding • A “bioinformatics approach” involves the quantities and complexity of biomolecular data application of computer algorithms, computer Growth in solved 3D structures • Bioinformatics provides models and computer databases with the broad methods for the efficient: goal of understanding the action of both individual ‣ storage genes, transcripts, proteins and large collections ‣ annotation of these entities. ‣ search and retrieval ‣ data integration ‣ data mining and analysis DNA RNA Protein Genome Transcriptome Proteome Bioinformatics is essential for the archiving, organization and analysis of data from sequencing, structural genomics, microarrays, proteomics and new high throughput assays. How do we actually do Bioinformatics? SIDE-NOTE: SUPERCOMPUTERS AND GPUS Pre-packaged tools and databases ‣ Many online ‣ New tools and time consuming methods frequently require downloading ‣ Most are free to use Tool development ‣ Mostly on a UNIX environment ‣ Knowledge of programing languages frequently required (Python, Perl, R, C Java, Fortran) ‣ May require specialized or high performance computing resources… SIDE-NOTE: SUPERCOMPUTERS AND GPUS Put Levit’s Slide here on Computer Power Increases! To–Johnny Do! Appleseed Skepticism & Bioinformatics Common problems with Bioinformatics We have to approach computational results the Confusing multitude of tools available same way we do wet-lab results: ‣ Each with many options and settable parameters • Do they make sense? Most tools and databases are written by and for nerds • Is it what we expected? ‣ Same is true of documentation - if any exists! • Do we have adequate controls, and how did they come out? Most are developed independently • Modeling is modeling, but biology is different... Notable exceptions are found at the: • EBI (European Bioinformatics Institute) and What does this model actually contribute? • NCBI (National Center for Biotechnology Information) • Avoid the miss-use of ‘black boxes’ Key Online Bioinformatics Resources: NCBI & EBI The NCBI and EBI are invaluable, publicly available resources for biomedical research Even Blast has many settable parameters > Related tools with different terminology Please take the initial BIOINF525 questionnaire: < tinyurl.com/bioinf525-questions http://www.ncbi.nlm.nih.gov https://www.ebi.ac.uk 30 Key Online Bioinformatics National Center for Biotechnology Resources: NCBI & EBI Information (NCBI) The NCBI and EBI are invaluable, publicly • Created in 1988 as a part of the National Library of available resources for biomedical research Medicine (NLM) at the National Institutes of Health • NCBI’s mission includes: ‣ Establish public databases ‣ Develop software tools ‣ Education on and dissemination of biomedical Bethesda,MD information • We will cover a number of core NCBI databases and http://www.ncbi.nlm.nih.gov https://www.ebi.ac.uk software tools in the lecture http://www.ncbi.nlm.nih.gov http://www.ncbi.nlm.nih.gov http://www.ncbi.nlm.nih.gov Key Online Bioinformatics Resources: NCBI & EBI The NCBI and EBI are invaluable, publicly available resources for biomedical research Notable NCBI databases include: GenBank,

Course Logistics Module Overview Todays Menu

Tyrosine Kinase Oncogenes in Normal Hematopoiesis and Hematological Disease

Anti-GRAP Antibody (ARG63124)

Biological Databases What Is a Database ?

Chromosomal Instability and Copy Number Alterations in Barrett's Esophagus and Esophageal Adenocarcinoma Thomas G

NLRP3 and ASC Differentially Affect the Lung Transcriptome During Pneumococcal Pneumonia

GRAP (NM 006613) Human Tagged ORF Clone Product Data

GRAP Rabbit Pab

Regulation of Ras Exchange Factors and Cellular Localization of Ras Activation by Lipid Messengers Int Cells

And Y-Linked Ampliconic Genes in Human Populations

Gads Is a Novel SH2 and SH3 Domain-Containing Adaptor Protein That Binds to Tyrosine-Phosphorylated Shc

The Clinical Aspect of Adaptor Molecules in T Cell Signaling: Lessons Learnt from Inborn Errors of Immunity

The Breakpoint Region of the Most Common Isochromosome, I(17Q), In