[BIOINFORMATIC DATA ANALYSIS] Reader for the Bioinformatics Part of the Systems Biology Course
Total Page:16
File Type:pdf, Size:1020Kb
2020 Theoretical Biology and Bioinformatics Utrecht University Bas E. Dutilh & Can Keșmir [BIOINFORMATIC DATA ANALYSIS] Reader for the Bioinformatics part of the Systems Biology course. A PDF of this reader can be downloaded for free and in full color at: http://tbb.bio.uu.nl/BDA You can also find the errata through the link above. Course planning 3feb 13:15-15:00 MyTimeTable DWO modules 1 & 2 (http://bit.ly/BDA-DWO) Self-study: R modules 1, 2 & 3 (bacK of reader) 15:15-17:00 Ruppert Blauw Welcome & Lecture: Chapter 1 4feb 9:00-10:45 Educ Megaron Lecture: Chapter 2 11:00-12:45 Self-study: R modules 4, 5 & 6 (back of reader) 13:15-15:00 Ruppert Wit Lecture: Chapter 3 15:15-17:00 Self-study: Read, videos & exercises of Chapters 1 & 2 6feb 9:00-10:45 Self-study: R modules 7 & 8 (back of reader) 11:00-12:45 MyTimeTable Exercises Chapter 3 13:00-17:00 Self-study: Read & watch videos of Chapters 3 & 4 10feb 13:15-15:00 MyTimeTable DWO modules 3 & 4 (http://bit.ly/BDA-DWO) Self-study: R module 9 (back of reader) 15:15-17:00 Ruppert Blauw Q&A: R with Can Keșmir 11feb 9:00-10:45 Educ Megaron Lecture: Chapter 4 11:00-12:45 MyTimeTable Exercises Chapter 4 13:00-17:00 Self-study: R module 10; Read & watch videos of Chapter 5 13feb 9:00-10:45 Educ Megaron Lecture: Chapter 5 11:00-12:45 MyTimeTable Exercises Chapter 5 13:00-17:00 Self-study: Read & watch videos of Chapters 6 & 7 17feb 13:15-15:00 MyTimeTable DWO modules 5 & 6 (http://bit.ly/BDA-DWO) 15:15-17:00 Ruppert Blauw Lecture: Chapter 6 18feb 9:00-10:45 Educ Megaron Lecture: Chapter 7 11:00-12:45 MyTimeTable Exercises Chapters 6 & 7 13:00-17:00 Self-study: Study for exam, email questions for Q&A 20feb 9:00-10:45 Educ Megaron Lecture: Chapter 7 continued 11:00-12:45 MyTimeTable Exercises Chapters 6 & 7 13:00-17:00 Self-study: Read & watch videos Chapters of 8 & 9 24feb 13:15-15:00 MyTimeTable Exercises until Chapter 7 15:15-17:00 Ruppert Blauw Lecture: Chapter 8 25feb 9:00-10:45 Educ Megaron Lecture: Chapter 9 11:00-12:45 MyTimeTable Exercises Chapters 8 & 9 13:00-17:00 Self-study: Study for exam, email questions for Q&A 27feb 9:00-10:45 Educ Megaron Lecture: Chapter 9 continued 11:00-12:45 MyTimeTable Exercises Chapters 8 & 9 13:00-17:00 Self-study: Study for exam, email questions for Q&A 2mrt 13:15-15:00 MyTimeTable Q&A: With teaching assistants (exercises) 15:15-17:00 Ruppert Blauw Q&A: With Bas Dutilh (email questions by 1 March) 3mrt 13:30-16:00 Educ Beta EXAM 2 Contents 1. Computers in biology: systems biology and bioinformatics ............................................................. 6 1.1. The Bioinformatic Data Analysis course ...................................................................................... 6 1.2. Biology as a data science .............................................................................................................. 7 1.3. Reproducibility ............................................................................................................................. 7 1.4. DNA sequencing ........................................................................................................................... 7 1.5. Types of computational biologists ................................................................................................ 9 1.5.1. NoriKo Cassman, PhD student, Microbial Ecology, Netherlands Institute of Ecology .......... 9 1.5.2. Jayne Hehir-Kwa, assistant professor, Translational Genomics, Radboudumc .................... 10 1.5.3. Mattias de Hollander, bioinformatician, Netherlands Institute of Ecology ........................... 10 1.5.4. Robin van der Lee, PhD student, Comparative and Integrative Genomics, Radboudumc .... 11 1.5.5. Walter Pirovano, head, Bioinformatics Department, BaseClear ........................................... 11 1.5.6. Linsey RaaijmaKers, PhD student, Mass Spectrometry and Proteomics, UU ....................... 12 1.5.7. Daan Speth, PhD student, Microbiology, Radboud University ............................................. 12 1.5.8. Wouter Touw, PhD student, Centre for Molecular and Biomolecular Informatics .............. 13 1.5.9. Marcel van VerK, Senior scientist, Bioinformatics / Plant-Microbe Interactions, UU .......... 13 1.5.10. RensKe Vroomans, PhD student, Computational Developmental Biology, UU ................... 14 1.5.11. Juline Walter, PhD student, Marine Microbiology, Federal University of Rio de Janeiro ... 14 1.6. Reading and videos ..................................................................................................................... 14 1.7. Questions and exercises .............................................................................................................. 15 2. Talking to computers ......................................................................................................................... 17 2.1. Algorithms and variables ............................................................................................................ 17 2.2. Computer languages ................................................................................................................... 18 2.2.1. Simple questions with logical answers: TRUE or FALSE? .................................................. 19 2.2.2. Functions ............................................................................................................................... 20 2.3. Data types ................................................................................................................................... 22 2.3.1. Numbers ................................................................................................................................ 22 2.3.2. Text ........................................................................................................................................ 22 2.3.3. Variables that can contain multiple values ............................................................................ 23 2.4. Examples ..................................................................................................................................... 23 2.4.1. Minimum ............................................................................................................................... 23 2.4.2. Euclid’s algorithm ................................................................................................................. 24 2.5. R-specific functions .................................................................................................................... 25 2.5.1. Help ....................................................................................................................................... 25 2.5.2. Reading a table ...................................................................................................................... 25 2.5.3. Plotting data ........................................................................................................................... 26 2.6. File formats ................................................................................................................................. 27 2.7. Computer power: CPU and RAM ............................................................................................... 29 2.8. Learning more about scripting .................................................................................................... 30 2.9. Databases: data and metadata ..................................................................................................... 30 2.10. Reading and videos ..................................................................................................................... 34 2.11. Questions and exercises .............................................................................................................. 34 3. Dimensions of data ............................................................................................................................. 36 3.1. Second generation DNA sequencing as a profiling technology ................................................. 36 3.2. Visualization of data ................................................................................................................... 37 3.3. Similarity between relative abundance profiles .......................................................................... 39 3.4. Distance measures ...................................................................................................................... 40 3.5. Correlation .................................................................................................................................. 41 3.6. Clustering algorithms .................................................................................................................. 42 3.7. NewicK tree format ..................................................................................................................... 43 3.8. Reading and videos ..................................................................................................................... 45 3 3.9. Exercises ..................................................................................................................................... 45 4. Phylogenetic trees and genetic relatedness ...................................................................................... 50 4.1. Phylogenetic trees ......................................................................................................................