2016 Winter School in Mathematical & Computational Biology

4-8 July 2016

Auditorium, Queensland Bioscience Precinct, The University of Queensland, Brisbane, Australia

Program

Hosted by:

IMB

2016 Winter School in Mathematical and Computational Biology, 4-8 July 2016
http://bioinformatics.org.au/ws16

Queensland Bioscience Precinct (Building #80), The University of Queensland, Brisbane, Australia

MONDAY 4 JULY 2016

08:00 Registration desk open

NEXT GENERATION SEQUENCING & BIOINFORMATICS

09:00 – 09:05 Welcome and introduction
Dr Nicholas Hamilton, Research Computing Centre and Institute for Molecular Bioscience, The University of Queensland

09:05 – 09:45 Next-generation sequencing overview (Game of Thrones Edition)
Dr Ken McGrath, Australian Genome Research Facility Ltd, Brisbane

09:45 – 10:30 NGS mapping, errors and quality control
Dr Felicity Newell, Queensland University of Technology, Brisbane

10:30 – 11:00 Morning Tea

11:00 – 11:45 Mutation detection in whole-genome sequencing
Dr Ann-Marie Patch, QIMR Berghofer Medical Research Institute, Brisbane

11:45 – 12:30 De novo genome assembly
A/Professor Torsten Seemann, Victorian Life Sciences Computation Initiative, The University of Melbourne

12:30 – 13:30 Lunch

13:30 – 14:30 Long-read sequencing: an overview of technologies and applications
Dr Mathieu Bourgey, Montréal Node, McGill University and Genome Québec Innovation Centre, Canada

14:30 – 15:15 Genomics resources - feeding your inner bioinformatician
A/Professor Mik Black, University of Otago, Dunedin, New Zealand

15:15 – 15:45 Afternoon Tea

15:45 – 16:30 Defensive NGS informatics - what can go wrong and how do you know when to throw in the towel?
Mr John Pearson, QIMR Berghofer Medical Research Institute, Brisbane

16:30 – 17:15 The current and upcoming challenges and opportunities in bioinformatics
Dr Annette McGrath, DATA61 | CSIRO, Canberra

17:15 – 17:30 Resource talk: what the Australian Bioinformatics and Computational Biology Society can do for you
Professor David Lovell, Queensland University of Technology, Brisbane

17:45 Social BBQ
Venue: Auditorium foyer


TUESDAY 5 JULY 2016
NEXT GENERATION SEQUENCING & BIOINFORMATICS

09:00 – 09:45 Analysing RNA-seq data: differential expression and beyond
Dr Alicia Oshlack, Murdoch Childrens Research Institute, Melbourne

09:45 – 10:30 MicroRNAs - sequencing, analysis ... and then what?
Dr Pamela Mukhopadhyay, QIMR Berghofer Medical Research Institute, Brisbane

10:30 – 11:00 Morning Tea

TUESDAY 5 JULY 2016
BIOINFORMATICS METHODS, MODELS AND APPLICATIONS TO DISEASE

11:00 – 11:45 Evolution teaches protein prediction
Professor Burkhard Rost, Technische Universität München (TUM), Munich, Germany

11:45 – 12:45 Lunch

12:45 – 13:30 Personalised health: harnessing the power of diversity
Professor Burkhard Rost, Technische Universität München (TUM), Munich, Germany

13:30 – 14:15 The predictive power of machine learning techniques in data-driven biomedical knowledge discovery
Dr Jiangning Song, Monash University

14:15 – 15:00 Personalised medicine: discriminating disease-causing from neutral genetic variations
Professor Yaoqi Zhou, Griffith University, Brisbane

15:00 – 15:30 Afternoon Tea

15:30 – 16:15 Processing, integrating and analysing chromatin immunoprecipitation followed by sequencing (ChIP-seq) data
Ms Alexandra Essebier, The University of Queensland

16:15 – 17:00 VariantSpark: applying Spark-based machine learning methods to genomic information
Dr Denis Bauer, CSIRO Health Program, Sydney

17:00 – 17:15 Resource talk: how QCIF enables research
Ms Belinda Weaver, e-Research Analyst Team Manager, Queensland Cyber Infrastructure Foundation, Brisbane


WEDNESDAY 6 JULY 2016
BIOINFORMATICS METHODS, MODELS AND APPLICATIONS TO DISEASE

09:00 – 09:45 The role of common genetic variations in complex diseases and pharmacogenomics studies
Dr Siew-Kee Amanda Low, The University of Sydney

09:45 – 10:30 I've got my list of differentially expressed genes, now what?
Dr Mirana Ramialison, Australian Regenerative Medicine Institute, Monash University, Melbourne

10:30 – 11:00 Morning Tea

11:00 – 12:00 Bioinformatics software testing and quality assurance
Dr Joshua W.K. Ho, Victor Chang Cardiac Research Institute, Sydney

12:00 – 12:30 Panel discussion
Chair: Professor Mark Ragan, Institute for Molecular Bioscience, The University of Queensland

*** FREE WEDNESDAY AFTERNOON ***

SPECIAL ACTIVITIES IN THE AFTERNOON

12:30 – 13:00 IMB tour (1) – Limited to 50 attendees only
Meeting point: Auditorium foyer

13:00 – 13:30 IMB tour (2) – Held if more requests are received; also limited to 50 attendees only
Meeting point: Auditorium foyer

14:00 – 17:00 Special Wednesday Afternoon Workshop
An introduction to Galaxy with the NeCTAR Genomics Virtual Laboratory
Dr Igor Makunin, Research Computing Centre, The University of Queensland

Venue: Multi Media Room (Room 3.141, access through the auditorium foyer) (This workshop is limited to 36 attendees only and is intended for bench scientists, and no previous informatics experience is needed.)

What is required before attending the workshop? Remember to download the Galaxy Workshop Information Sheet from the 2016 Winter School web site.

http://bioinformatics.org.au/ws16/program/


THURSDAY 7 JULY 2016
ADVANCED BIO-DATA VISUALISATION

09:00 – 10:30 Data visualisation in bioinformatics: exploring the ‘dark’ proteome
Dr Seán O’Donoghue, CSIRO, Sydney

10:30 – 11:00 Morning Tea

11:00 – 12:00 Experimentation at the interface of art and science: narrative, cognitive embodiment and alternative visual language
Dr Kate Patterson, Garvan Institute of Medical Research, Sydney

12:00 – 13:00 Lunch

13:00 – 14:00 Network and data visualisation and analysis in Cytoscape
Dr Melissa Davis, Walter and Eliza Hall Institute for Medical Research, Melbourne

14:00 – 15:00 Big data visual analytics
Professor Seok-Hee Hong, University of Sydney

15:00 – 15:30 Afternoon Tea

15:30 – 17:00 Creating data visualisations that won’t be forgotten using the R programming language
Dr Chris Brown, Australian Rivers Institute, Griffith University, Brisbane

17:00 – 17:15 Resource talk: what the COMBINE network does for ECRs in bioinformatics and computational biology
Ms Leah Roberts, Vice President, COMBINE (Computational Biology and Bioinformatics Student Group), School of Chemistry and Molecular Biosciences, The University of Queensland


FRIDAY 8 JULY 2016
ECOGENOMICS

09:00 – 09:05 Welcome & introduction
Professor Gene Tyson, Australian Centre for Ecogenomics, The University of Queensland

09:05 – 09:45 The extraordinary evolution of the great ape microbiome
Professor Howard Ochman, University of Texas, Austin, USA

09:45 – 10:30 Tools and methods for microbial ecological genomics
Dr David Wood, Australian Centre for Ecogenomics, The University of Queensland

10:30 – 11:00 Morning Tea

11:00 – 11:45 Illuminating microbial dark matter via single-cell genomics
Dr Christian Rinke, Australian Centre for Ecogenomics, The University of Queensland

12:00 – 13:00 IMB Friday Noon Seminar in conjunction with Winter School
Towards 4 dimensional (eco) systems biology in the sea
Professor Edward F. DeLong, University of Hawaii, Honolulu, USA

13:00 – 13:45 Lunch

13:45 – 14:30 Genomes from metagenomes: recovery and analysis of population genomes
Dr Kate Ormerod, Australian Centre for Ecogenomics, The University of Queensland

14:30 – 15:15 Community diversity in metagenomes: one, many and thousands
Dr Ben Woodcroft, Australian Centre for Ecogenomics, The University of Queensland

15:15 – 16:00 Comparing the variome and pan-genome of bacterial isolates
A/Professor Torsten Seemann, Victorian Life Sciences Computation Initiative, The University of Melbourne

16:00 Winter School wrap-up and refreshment with IMB/ECRs

~*~*~*~*~


BIOGRAPHY AND ABSTRACT

Dr Ken McGrath
Brisbane Node Manager
Australian Genome Research Facility Ltd (AGRF)
Brisbane

Biography: Ken McGrath is the node manager of the Brisbane Lab of the Australian Genome Research Facility, based at the UQ St Lucia campus. Ken has worked with plant and microbial genetics and transcriptomics, and completed his PhD in biochemistry and molecular pathology at UQ in 2005. Following this, his postdoctoral research involved examining the of mixed microbial communities in industrial and agricultural settings. In 2009, Ken joined the AGRF, and currently manages a range of lab processes and sequencing projects including next-generation sequencing platforms. Ken is also a founding member of the “eXtreme Microbiome Project”, an international collaboration studying microbial communities of the extreme environments around our planet.

Date: Monday 4 July 2016

Presentation title: Next-generation sequencing overview (Game of Thrones Edition)

Abstract: The “Next-Generation Sequencing” landscape is one of constant change, with new and emerging technologies always competing with established platforms, much like the different characters and families from the “Game of Thrones” universe. Using this analogy, Ken talks about the sequencing technologies that have had their day on the throne, and looks in detail at the current rulers – and who is best positioned to usurp them. In doing so, Ken will explain how the sequencing technologies work and give examples of projects that can be run on them, as well as hint at what’s “next” in Next-Gen.


Dr Felicity Newell
Research Fellow in Computational Biology
Queensland University of Technology
Brisbane

Biography: Felicity Newell originally trained in the fields of molecular and cellular biology, and received her PhD from The University of Queensland in 2007. Following this, she completed a Master of Information Technology at the Queensland University of Technology. She has worked as a bioinformatics programmer, developing biological web applications at QFAB and software for the analysis of cancer sequencing data at the Queensland Centre for Medical Genomics at UQ. Since then, she has conducted postdoctoral research at The University of Queensland Diamantina Institute, and this year she joined QUT as a Research Fellow in Computational Biology. Her current interests involve using next-generation sequencing data to investigate the genetics of autoimmune diseases and cancer.

Date: Monday 4 July 2016

Presentation title: NGS mapping, errors and quality control

Abstract: The first step that is often required to analyse next-generation sequencing data is to align the reads that are generated to a reference genome. Current sequencing platforms can generate high volumes of raw read data. Such reads are usually short in length and may contain sequencing errors. Therefore, tools that perform mapping need to be able to efficiently identify the location of a read within the reference genome while accounting for real sequence variations as well as technical artefacts. In this presentation I will describe some of the approaches to sequence alignment, highlighting some of the popular tools that are in use. A good understanding of the common errors and biases that can occur with mapping is necessary in order to obtain high-quality data from downstream analyses such as variant detection. I will also discuss some of these errors and outline some quality control steps that can be performed.


Dr Ann-Marie Patch
Senior Research Officer
Medical Genomics Group
QIMR Berghofer Medical Research Institute
Brisbane

Biography: Ann-Marie is currently a Senior Research Officer within the Medical Genomics group led by Dr Nicola Waddell at the QIMR Berghofer Medical Research Institute. Her current research focuses on cancer genomics, working with large collaborative groups to identify the molecular basis of melanoma and mesothelioma. She is particularly interested in detecting structural variants in cancer, understanding their consequences, and linking them to the mechanisms of DNA repair. After gaining a PhD from the University of Exeter, UK, in 2006, which combined bioinformatics and laboratory approaches to study fission and budding yeast genetics, she joined the intertwined research and diagnostic teams of Prof. Andrew Hattersley and Prof. Sian Ellard at the Peninsula College of Medicine & Dentistry and the Royal Devon and Exeter Molecular Genetics Laboratory, using next-generation sequencing to identify monogenic causes of neonatal diabetes and causal mutations for a broad spectrum of genetic disorders. Cancer genomics has been her focus for the last five years: she led the analysis of the ovarian cancer data as part of the Australian ICGC team led by Prof. Sean Grimmond, work that she continues in her current role.

Date: Monday 4 July 2016

Presentation title: Mutation detection in whole-genome sequencing

Abstract: The development of robust mutation detection methods has been key to the landmark studies carried out as part of the Australian International Cancer Genome Consortium projects on the molecular basis of pancreatic, ovarian and now melanoma tumours. Initially at the Queensland Centre for Medical Genomics at IMB, and now at the QIMR Berghofer Medical Research Institute, an expert team of researchers and informatics specialists has set up a high-performing framework to enable the analysis of whole human genomes for the presence of DNA, RNA and epigenetic variants that are associated with the hallmarks of cancer. This talk will describe and discuss the principles and challenges of identifying the full range of mutation types, from single nucleotide variants and indels up to large structural variants (SVs), using whole-genome sequencing. I will present the bases of mutation detection for ICGC projects with examples of how mechanisms driving tumorigenesis may be identified.


A/Professor Torsten Seemann
Lead Bioinformatician
Victorian Life Sciences Computation Initiative and Microbiological Diagnostics Unit Public Health Laboratory
The University of Melbourne

Biography: A/Prof. Torsten Seemann is lead bioinformatician at the Victorian Life Sciences Computation Initiative and the Microbiological Diagnostics Unit Public Health Laboratory, both at the University of Melbourne. His work uses bioinformatics and genomics to better understand the spread and evolution of bacterial pathogens and antimicrobial resistance. He is best known for his software tools, which are used internationally, and he is a strong supporter of open science.

Date: Monday 4 July 2016

Presentation title: De novo genome assembly

Abstract: How do we generate the genome sequence of our favourite organism? In this talk I will introduce the problem of de novo genome assembly; describe the strategies and caveats of the way the problem is tackled; and outline ways to assess the results. The related problems of and metagenome assembly, and how the latest technologies are transforming de novo assembly, will also be touched upon.


Dr Mathieu Bourgey
Bioinformatics Manager
Canadian Centre for Computational Genomics (C3G)
Montréal Node, McGill University and Genome Québec Innovation Centre
Montréal, Canada

Biography: Mathieu Bourgey is the manager of the Research and Development team at the Montréal node of the Canadian Centre for Computational Genomics. He completed his Master's degree with honours in 2003 at the Université Pierre et Marie Curie - Paris VI (France), working on developing evolutionary models of large genomic repeats. He transitioned to Université Paris-Sud XI (France) for his PhD work on modeling the risk of developing coeliac disease based on genetic and familial information. Following this, his postdoctoral research at Université de Montréal focused on modeling gene-gene interactions and foeto-maternal interactions in the susceptibility to childhood leukemia. After completing his postdoctoral studies in 2010, he participated in the development of the bioinformatics side of a large next-generation sequencing project of Acute Lymphoblastic Leukemia samples. In 2011, Mathieu joined the bioinformatics platform of the McGill University and Genome Québec Innovation Centre (MUGQIC) as senior analyst, where he was involved in the analysis of a wide range of genomics projects, from bacteria to human, using the various types of sequencing technology available (Illumina, 454, Life Technologies and PacBio). In 2014, he became team leader of the data production and service at the MUGQIC bioinformatics platform, and in 2015 he started managing the bioinformatics research and development group. He manages software and analysis pipeline development on a wide range of next-generation sequencing platforms and takes part in national and international projects studying cancer genomics, genome assembly and transcriptomics. He is also involved in the organisation of international genomics workshops.

Date: Monday 4 July 2016

Presentation title: Long-read sequencing: an overview of technologies and applications

Abstract: Next-generation sequencing technologies offer vast improvements over traditional Sanger sequencing. However, these major sequencing technologies suffer from one main limitation: the short length of their reads. Short reads are poorly suited to studying complex genomic regions or to non-reference-based analysis. Long reads offer an alternative approach to overcome many of these limitations. With longer reads we can sequence through extended repetitive regions, detect base modifications, identify gene isoforms and assemble finished genomes. Pacific BioSciences, Oxford Nanopore and Illumina are the three major competitors that have developed different long-read sequencing technologies. Each of their technologies has specific limitations that need to be taken into consideration when designing a long-read sequencing project.


Associate Professor Mik Black
Department of Biochemistry
University of Otago
Dunedin, New Zealand

Biography: Mik received a BSc (Hons) in statistics from the University of Canterbury, and an MSc (mathematical statistics) and PhD (statistics) from Purdue University. After completing his PhD in 2002, Mik returned to New Zealand to work as a lecturer in the Department of Statistics at the University of Auckland. An ongoing involvement in a number of Dunedin-based collaborative genomics projects resulted in a move to the University of Otago in 2006. Mik's research focuses on the development and application of statistical methods for the analysis of data from genomics experiments, with a particular emphasis on human disease. Mik is also heavily involved in two major initiatives designed to put in place sustainable national research infrastructure for NZ: NZGL (New Zealand Genomics Ltd) for genomics (where he was the interim Bioinformatics Team Leader during 2012-2013), and NeSI (New Zealand eScience Infrastructure) for computing/eResearch.

Date: Monday 4 July 2016

Presentation title: Genomics resources - feeding your inner bioinformatician

Abstract: In the current research environment, the ability to manage, analyse and interpret data produced by high-throughput sequencing platforms has become an essential skill for both wet- and dry-lab researchers. While a number of options exist for outsourcing these tasks, the reality is that researchers still need (and desire) a level of analytic skill that allows them to perform basic exploratory analysis of their data, without having to rely on external assistance.

In this talk, I will discuss some of the initiatives that have been undertaken in New Zealand and Australia to provide both genomics and bioinformatics support for researchers, as well as highlighting some of the tools and skills that help to ensure the robustness and reproducibility of the analyses being carried out.


Mr John Pearson
Team Leader, Genome Informatics
QIMR Berghofer Medical Research Institute
Brisbane

Biography: John Pearson has spent 25 years as a bioinformatician creating software for medical researchers. He has worked at NIH, UQ and QIMR Berghofer, and was a founding Faculty member at the Translational Genomics Research Institute (TGen) in Phoenix, Arizona. John has held software development grants from Microsoft, the American Cancer Society, and the National Institutes of Health and has participated in the 1000 Genomes Project and the International Cancer Genome Consortium.

Date: Monday 4 July 2016

Presentation title: Defensive NGS informatics - what can go wrong and how do you know when to throw in the towel?

Abstract: Next-generation sequencing has radically changed medical research by allowing deep interrogation of the DNA and RNA of pathogenic organisms, families with inherited disorders and the de novo mutations responsible for tumourigenesis. As with any new technology, a "gold rush" mentality can arise where being first to the answer can push rigour and methodological soundness into the background. In this seminar, I'll talk from QCMG experience about some of the ways sequencing can go wrong, how the problems became apparent, what we did about them, and tools we developed to try to catch the same problems in future.


Dr Annette McGrath
Principal Research Scientist and Team Leader in Life Science Informatics
DATA61 | CSIRO
Canberra

Biography: Annette McGrath graduated from the National University of Ireland with a PhD in molecular biology and from The University of Queensland with a graduate diploma in statistics. Following postdoctoral work in bioinformatics on multiple sequence alignment, she worked for three years as a staff scientist and team leader in a biotech company in Auckland, New Zealand. She then spent eight years as Head of Bioinformatics at the Australian Genome Research Facility, followed by Head of Bioinformatics at Queensland Facility for Advanced Bioinformatics in 2010. In 2011 she was recruited to establish and lead the CSIRO Bioinformatics Core, dedicated to enhancing capability in bioinformatics across CSIRO. She is a Principal Research Scientist and Team Leader in life science informatics in CSIRO Data61 with interests in the application of ‘omics technologies and big data and with a passion for bioinformatics education and training.

Date: Monday 4 July 2016

Presentation title: The current and upcoming challenges and opportunities in bioinformatics

Abstract: Molecular biology has become a data science, driven by advances in measurement and data acquisition technologies that allow very substantial amounts of data to be readily produced and aided by spectacular drops in the price of this data. The impact of this shift to a data-driven science can be seen across a broad range of applications, from human health and advanced manufacturing to agriculture and ecosystems. As molecular techniques improve, many practical and methodological challenges are presented by increases in the volume, complexity and dimensionality of the data. Bioinformatics is facing challenges in managing, storing, processing, analysing and integrating different types of molecular biological information.

Nonetheless, there is a wealth of research opportunities emerging for bioinformaticians as effective analysis and interpretation of molecular bioscience data offers new ways to uncover hidden patterns in data and to build better predictive models.

This talk will present an overview of where genomics and bioinformatics are currently having an impact and will also take a look at some of the challenges and opportunities likely in coming years in the field of bioinformatics.


Dr Alicia Oshlack
Head of Bioinformatics
Murdoch Childrens Research Institute
Melbourne

Biography: Dr Alicia Oshlack is an NHMRC Career Development Fellow and the Head of Bioinformatics at the Murdoch Childrens Research Institute, where she leads a team of ten students, postdocs and research assistants. Alicia completed a PhD in astrophysics and has been working in the field of bioinformatics for more than 12 years. She is best known for her body of work developing methods for the analysis of transcriptome data. She also works in epigenomics and clinical genomics. Alicia has also built an extensive collaborative network with many national and international research groups uncovering molecular mechanisms of development and disease using a variety of genomic approaches. Alicia is on the editorial board of Genome Biology, and was awarded the Australian Academy of Science Gani Medal for human genetics in 2011 and the Lorne Genome Conference Millennium Science Award in 2015.

Date: Tuesday 5 July 2016

Presentation title: Analysing RNA-seq data: differential expression and beyond

Abstract: In this talk I will give a basic overview of the statistical and computational analysis for performing differential expression using RNA-seq data. I will discuss the steps in a standard analysis for both well-studied organisms, like humans, and more exotic organisms without a sequenced genome.


Dr Pamela Mukhopadhyay
Bioinformatician
QIMR Berghofer Medical Research Institute
Brisbane

Biography: Dr Pamela Mukhopadhyay received her PhD in Bioinformatics in 2010 from University of Jadavpur, India. Following this, she was employed as a bioinformatician at Queensland Diamantina Institute, working on various disease and cancer genomic and sequencing projects with Dr Paul Leo, and gained good knowledge of large-scale GWAS studies. In 2013 she conducted her postdoctoral research at QIMR Berghofer Medical Research Institute on UV-induced melanoma mouse models and investigated the role of various genetic and epigenetic factors for X-chromosome inactivation in the Smchd1 knock-out mouse.

Pamela’s area of expertise involves analysing next-generation sequencing data and developing algorithms and methods for genomic research. She has extensive experience working with the R programming language. She also provides training on various techniques involving the analysis of next-generation sequencing data. Her current bioinformatician role involves providing bioinformatics support to QIMR Berghofer medical researchers and understanding potential genomic signatures of UV-induced and spontaneous melanomas.

Date: Tuesday 5 July 2016

Presentation title: MicroRNAs - sequencing, analysis ... and then what?

Abstract: MicroRNAs (miRNAs) are an important class of non-coding regulatory RNAs, which interfere with the translation of protein-coding mRNA transcripts. By incorporation into the RNA-induced silencing complex (RISC), miRNAs can inhibit translation, promote sequestration of mRNAs to P-bodies, and/or destabilise and degrade target mRNAs. The small size of mature miRNAs (typically only 20 to 24 nucleotides) makes them ideal for characterisation using short-tag RNA-sequencing (RNA-seq) technologies, as you can capture the entire molecule in a single read. Unlike hybridisation approaches such as microarray profiling or Northern blotting, massive-scale sequencing provides a way to discriminate discrete but closely related RNA molecules, and profile miRNAs without a priori knowledge of expression.

MicroRNAs perform their biological roles by binding to mRNAs through Watson-Crick base-pairing. The attractive simplicity of using nucleotide complementarity to identify mRNA targets has given rise to many bioinformatics tools. These are based (to differing extents) on complementarity to the seed, evolutionary conservation, and free energy of binding.

So with great technology and plenty of well researched and well respected bioinformatics tools, miRNAs should be easy, right? This talk will systematically crush this rosy view of miRNAs as a field of study, and lay before you the desolate wasteland to navigate on your path to publication. Those towards the end of their PhD study on miRNAs may wish to avoid this talk.


Professor Burkhard Rost
Chair of Bioinformatics
Department of Computer Sciences (Informatik)
Technische Universität München (TUM)
Munich, Germany
*Image courtesy of photographers Eckert and Heddergott

Biography: After studying physics, history and philosophy at the Universities of Giessen and Heidelberg, Burkhard Rost received his doctorate at the European Molecular Biology Laboratory (EMBL) in 1994. Following research stays at EMBL and the European Bioinformatics Institute in Cambridge (UK), as well as a brief period in industry at LION Bioscience in Heidelberg, he assumed a professorship at Columbia University (New York) in 1998. In 2009, he accepted an appointment to the Chair of Bioinformatics at TUM. He is a member of the New York Academy of Sciences and has been President of the International Society for Computational Biology since 2007. He has authored 200 scientific publications, with a Hirsch index of 81.

Professor Rost conducts research on bioinformatics and computer-aided biology, with a focus on predicting the functions and structures of proteins and genes. His particular interest is predicting protein interactions and the effects of changing individual amino acids, with the goal of fostering a better understanding of how proteins, genes and cells work. He also focuses on enabling earlier diagnosis and more effective treatment of illnesses. The specific niche of his research group links artificial intelligence and machine learning to evolution.

Date: Tuesday 5 July 2016

Presentation title 1: Evolution teaches protein prediction

Abstract: The objective of our group is to predict aspects of protein function and structure from sequence. The wealth of evolutionary information available through comparing the whole biodiversity of species makes such an ambitious goal achievable. Our particular niche is the combination of evolutionary information with machine learning. We develop the in silico prediction of protein interactions, including networks, sub-cellular localisation, and functional classifications such as EC and GO numbers from sequence. In this talk I will focus on predictions of protein localisation and protein-protein interactions and lessons learned from those predictions. I will also present the concept of the Dark Proteome and how protein disorder appears to play a unique role in evolution.

Presentation title 2: Personalised health: harnessing the power of diversity

Abstract: Our group introduced the combination of evolutionary information with machine learning. The wealth of evolutionary information available through the comparison of the whole biodiversity of species helps us to develop methods that predict aspects of protein structure and function from sequence. In this talk, I will focus on methods that predict the effect of single amino acid variants (SAVs/nsSNPs) upon molecular function and the organism. One application of such methods is to predict the effects of all possible SAVs. The resulting landscape of complete in silico mutagenesis (or deep mutational scanning) describes how a protein is susceptible to change and is becoming an important feature to foster the understanding of the molecular details of protein function. Predicting the effects of sequence variants has also become important for medical research and is an important challenge toward advancing personalised health. I will present some surprising results from such prediction methods for the analysis of large populations.


Dr Jiangning Song
Senior Research Fellow
School of Biomedical Sciences
Monash University
Melbourne

Biography: Jiangning Song is a Senior Research Fellow and group leader in the Department of Biochemistry and Molecular Biology, School of Biomedical Sciences, Monash University, Melbourne, Australia. His research focuses on bioinformatics, systems biology, machine learning, systems pharmacology and enzyme engineering. He has co-authored >80 publications, with >1300 citations. His research has won a number of academic awards, including the JSPS Postdoctoral Fellowship, CAS Hundred Talents Fellowship and Australian NHMRC Peter Doherty Biomedical Fellowship and Future Research Leadership at Monash University.

Date: Tuesday 5 July 2016

Presentation title: The predictive power of machine learning techniques in data-driven biomedical knowledge discovery

Abstract: Structural bioinformatics is the branch of bioinformatics which is concerned with the analysis and prediction of the three-dimensional structure of biological macromolecules on a genomic scale by developing computational methods. Recently, machine learning techniques based on statistical learning have provided efficient solutions to challenging problems that were previously considered difficult to address. In this talk, drawing on my research experience, I highlight some important recent developments in the prediction and analysis of functional residues or sites that are based on such methods. In particular, I focus on two tasks in structural bioinformatics, i.e. predicting protease-specific substrate cleavage sites and predicting enzyme catalytic sites from sequence and/or structural information. I dig deeper into the predictions, showing how machine learning methods can extract the predictive power and to what extent heterogeneous features derived at different levels (i.e. sequence, structure, and network) of the data samples can contribute to the model’s performance. Some of the existing difficult issues and problems will also be discussed in the talk.

Professor Yaoqi Zhou Professor of Computational Biology & Research Leader Institute for Glycomics Griffith University Brisbane

Biography: Prof. Yaoqi Zhou received his BS in Chemical Physics from University of Science and Technology of China and his PhD in Chemical Physics from State University of New York at Stony Brook, USA in 1990. He conducted his postdoctoral studies at North Carolina State University and Harvard University. He was an Assistant Professor and later Associate Professor at the Department of Physiology and Biophysics, State University of New York at Buffalo, USA from 2000 to 2006, and a full Professor in the Schools of Medicine and Informatics, Indiana University, Indianapolis, USA from 2006 to 2013.

He joined the Institute for Glycomics at Griffith University as a Professor of Computational Biology in 2013. His group has developed more than ten popular bioinformatics tools in protein and RNA structure and function prediction. His current projects include development of computational algorithms and bioinformatics techniques that predict structural and functional properties of proteins and RNAs, discriminate disease-causing from neutral genetic variations, and design small molecules and peptides for antibiotic, antiviral and anti-cancer therapeutics.

Date: Tuesday 5 July 2016

Presentation title: Personalised medicine: discriminating disease-causing from neutral genetic variations

Abstract: Personalised medicine predicts the likelihood of various diseases based on an individual’s genomic, proteomic and metabolomic data and eliminates disease risks by designing individualised prevention programs and precision medicine. This talk will focus on an important subject in personalised medicine: how to discriminate disease-causing from neutral genetic variations. We will discuss issues involved in setting up the dataset, designing discriminative features, and minimising the risk of over-training.

Ms Alexandra Essebier PhD Candidate School of Chemistry and Molecular Biosciences The University of Queensland

Biography: Alexandra Essebier completed her undergraduate degrees in Science (Biochemistry and Molecular Biology) and Information Technology at The University of Queensland in 2013, then continued her studies and completed her Master of Bioinformatics in 2015. Alex has undertaken a number of research projects over the last three years as part of A/Prof. Mikael Bodén’s group at UQ. Her main focus is on the use of probabilistic models, specifically Bayesian networks, to analyse high-throughput genomic datasets. She is currently enrolled as a PhD student investigating the application of probabilistic models to the integration of datasets relevant to transcriptional regulation such as chromatin immunoprecipitation followed by sequencing (ChIP-seq) and RNA sequencing.

Date: Tuesday 5 July 2016

Presentation title: Processing, integrating and analysing chromatin immunoprecipitation followed by sequencing (ChIP-seq) data

Abstract: High-throughput sequencing (HTS) technology has contributed to a number of discoveries in the human genome. One technique which relies on HTS is chromatin immunoprecipitation followed by sequencing (ChIP-seq), a technique that allows us to identify where proteins are binding in vivo. A handful of consortiums have generated thousands of data sets which use ChIP-seq to describe transcription factor (TF) binding and histone modifications (HMs). Individual labs are also generating numerous data sets exploring TFs in specific cell types and cell states using HMs as support. In the last decade, ChIP-seq has become so popular that although we have a standard way of performing the experiment in the wet lab, we have seen the development of a staggering number of processing pipelines and protocols designed to interpret the resulting sequenced reads. This raises the question of which approach is the ‘best’.

Once you have your ChIP-seq peaks, you will quickly discover that despite the challenges, generating them was the easy part and now you’re faced with interpreting the data at hand. To fully appreciate the information stored in your peaks, you have to understand the strengths and weaknesses of ChIP-seq and the power of integrating your ChIP-seq result with other available data.

This talk aims to guide you through the ChIP-seq processing steps, exploring the available tools and discussing the important considerations that must be made throughout. It will also cover approaches for analysing ChIP-seq peaks and highlight the importance of data integration in the interpretation of processed ChIP-seq datasets.

Dr Denis Bauer Team Leader, Transformational Bioinformatics CSIRO Health Program Sydney

Biography: Dr Denis Bauer is the team leader of the transformational bioinformatics team in CSIRO’s e-health program. Her expertise is in high-throughput genomic data analysis, computational genome engineering, as well as Spark/Hadoop and high-performance compute systems. She holds a PhD in Bioinformatics and has done her postdoctoral training in machine learning and human genetics respectively. Her collaborators include Prof. Simon Foote on mammalian susceptibility to infectious diseases, Prof. Ian Blair on molecular mechanisms of motor neuron disease, and Prof. Rodney Scott on obesity-driven cancer. She has 23 peer-reviewed publications (9 first author, 4 senior author), with three in journals of IF>8 (e.g. Nature Genetics), and an H-index of 9. To date she has attracted more than AU$25M in funding (NHMRC).

Date: Tuesday 5 July 2016

Presentation title: VariantSpark: applying Spark-based machine learning methods to genomic information

Abstract: Genomic information is increasingly being used for medical research, giving rise to the need for efficient analysis methodology able to cope with thousands of individuals and millions of variants. Catering for this need, we developed VariantSpark, a Hadoop/Spark framework that utilises the machine learning library, MLlib, thereby providing the means of parallelisation for population-scale bioinformatics tasks. VariantSpark offers an interface to the standard variant format (VCF), seamless genome-wide sampling of variants and provides a pipeline for visualising results.

To demonstrate the capabilities of VariantSpark, we cluster more than 3,000 individuals with 80 million variants each to determine the population structure in the dataset. VariantSpark is 80% faster than the Spark-based genome clustering approach developed by the Global Alliance for Genomics and Health, ADAM, the comparable implementation using Hadoop/Mahout, as well as Admixture, a commonly used tool for determining individual ancestries. It is over 90% faster than traditional implementations using R and Python. These benefits of speed, resource consumption and scalability enable VariantSpark to open up the usage of advanced, efficient machine learning algorithms to genomic data.

Here I will give a short introduction to Hadoop and Spark and describe other approaches such as the ADAM framework before talking about our solution, VariantSpark.

Dr Siew-Kee Amanda Low Lecturer Faculty of Pharmacy The University of Sydney

Biography: Dr Siew-Kee Amanda Low is a lecturer in the Faculty of Pharmacy, University of Sydney. She completed her PhD at the University of Tokyo in 2011. Her main research interest is to study the contributions of genetic variations in complex diseases and drug response (pharmacogenomics studies). Her expertise is in comprehensive genomic analyses utilising genome-wide association studies (GWAS) and next generation sequencing (NGS) with big data obtained from the Biobank Japan. Thus far, she has identified a handful of common genetic variations associated with complex diseases that include intracranial aneurysm, breast cancer, gastric cancer, endometriosis, pancreatic cancer, uterine fibroids and primary open-angle glaucoma. She also carried out a large-scale pharmacogenomics study that consisted of approximately 13,000 cancer patients from the Biobank Japan, which aimed to identify common variations that are associated with conventional chemotherapy-induced toxicity. She also collaborates with various universities and institutions including Yale University, University of Chicago, Harvard Medical School, NIH, University of Cambridge and QIMR Berghofer Medical Research Institute, and participates in the Asia Breast Cancer Consortium. She has published in Nature Genetics, PNAS, and Human Molecular Genetics.

Date: Wednesday 6 July 2016

Presentation title: The role of common genetic variations in complex diseases and pharmacogenomics studies

Abstract: In this seminar, I will demonstrate, using intracranial aneurysm and breast cancer as examples, that a genome-wide association study (GWAS) is not just a useful approach for identifying common genetic variations associated with disease susceptibility; it can also suggest the involvement of additional genes, which could improve the understanding of disease pathogenesis. I will also discuss the applicability of genetic variants identified from GWAS and the possibility of developing prediction algorithms to evaluate disease susceptibility in the general population. The second part of the talk will cover the applications of GWAS, a hypothesis-free approach which facilitates the identification of novel genetic loci associated with drug response in pharmacogenomics studies. Further challenges in utilising GWAS in pharmacogenomics studies will be addressed.

Dr Mirana Ramialison Group Leader – NHMRC/NHF Career Development Fellow Australian Regenerative Medicine Institute Monash University Melbourne

Biography: Dr Ramialison received her Engineering degree from the Aix-Marseille Université (Luminy), France in 2002, after which she worked as a database programmer at the ERATO differentiation project in Kyoto. After receiving her PhD summa cum laude in 2007 in Developmental Genomics at the European Molecular Biology Laboratory in Heidelberg, Germany, she joined the Victor Chang Cardiac Research Institute in Sydney in 2010 as an EMBO and HFSP postdoctoral fellow. She is now a Group Leader at the Australian Regenerative Medicine Institute, where she leads her Systems Developmental Biology laboratory researching heart development, evolution and disease. Dr Ramialison is an NHMRC/Heart Foundation Career Development Fellow.

Date: Wednesday 6 July 2016

Presentation title: I've got my list of differentially expressed genes, now what?

Abstract: The development of robust automated pipelines has allowed the standardisation of the processing of high-throughput sequencing datasets. For instance, RNA-sequencing data can now be easily and systematically processed from raw reads directly to differentially expressed genes between various conditions. However, the next steps are less obvious. My talk will focus on downstream analysis where no “standard” pipelines are established. Based on collaborative experiences, I will attempt to provide a non-exhaustive overview of different avenues to explore downstream of an RNA-seq experiment, from the traditional pathway analysis towards systems-level approaches.

Dr Joshua W.K. Ho Head, Bioinformatics and Systems Medicine Laboratory Victor Chang Cardiac Research Institute Sydney

Biography: Joshua Ho completed a BSc (Hon 1, Medal) in Biochemistry and Computer Science in 2006 and a PhD in Bioinformatics in 2010, both from the University of Sydney. He then completed an interdisciplinary postdoctoral fellowship at the Harvard Medical School, and was promoted to an Instructor in Medicine in 2012. In 2013, he returned to Australia to set up the Bioinformatics and Systems Medicine Laboratory at the Victor Chang Cardiac Research Institute. Joshua is also an NHMRC/National Heart Foundation Career Development Fellow, and a conjoint senior lecturer at UNSW. In 2015, he was awarded the NSW Ministerial Award for Rising Stars in Cardiovascular Research, and the Australian Epigenetics Alliance’s Illumina Early Career Research Award. His research focuses on developing fast and reliable bioinformatics methods to identify the genetic cause of inherited heart diseases, using a range of approaches such as whole genome sequencing, machine learning, systems biology, cloud computing, and software testing and quality assurance.

Date: Wednesday 6 July 2016

Presentation title: Bioinformatics software testing and quality assurance

Abstract: Bioinformatics is the application of computational, mathematical and statistical techniques to solve problems in biology and medicine. Arguably the main research focus has so far been on the computational and statistical basis of the algorithms. Surprisingly, much less effort has been placed on the quality of the design and implementation of these algorithms, even though clearly correct design and implementation of the underlying algorithm is at least as important as the algorithm itself. Incorrectly computed results may lead to wrong biological conclusions, and subsequently misguide downstream experiments. This problem is especially critical if these bioinformatics tools are to be used in a translational clinical setting. In this lecture, we will discuss key concepts and methods in the field of software testing. We believe introducing and adapting state-of-the-art software testing and quality assurance techniques in bioinformatics is a critical step in improving the quality, reproducibility, and accountability of bioinformatics tools.

Dr Igor Makunin NeCTAR Genomics Virtual Laboratory (GVL) Project Research Computing Centre The University of Queensland

Biography: Igor has extensive experience in analysis of nextGen sequencing data, comparative genomics, genetics and molecular biology. He provides support for biologists working with nextGen sequencing data on the Galaxy platform.

Igor has worked as a scientist at the Queensland Institute of Medical Research, The University of Queensland, Institute of Cytology and Genetics (Novosibirsk, Russia), the University of Geneva and the University of Cambridge.

Date: Wednesday 6 July 2016

Special Workshop: An introduction to Galaxy with the NeCTAR Genomics Virtual Laboratory

Abstract: The Galaxy platform is one of the world’s most popular and fastest growing bioinformatics web-based interfaces. With Galaxy, biologists can access a huge range of bioinformatics tools, using user-friendly and intuitive graphical interfaces. Galaxy also captures and records analysis pipelines to provide full reproducibility, and simplifies sharing of data and analyses between colleagues.

The NeCTAR*-supported Genomics Virtual Laboratory project has adopted Galaxy as one of its major platforms to bring the power of the national research cloud to bench biologists. Through the GVL and NeCTAR, Australian researchers and their collaborators have free access to high performance bioinformatics computing resources.

This workshop will focus on a hands-on introduction to using Galaxy on the research cloud. Participants will learn where and how they can access a Galaxy instance, how to upload and access data, how to run basic analysis pipelines, and how to use both integrated and plug-in functions to visualise genomic data.

We will introduce histories and workflows, and explore how they can be used to run reproducible analysis pipelines and to share analyses with colleagues. We will also discuss how to extend the standard Galaxy build to add new tools and custom reference genomes.

The workshop is intended for bench scientists, and no previous bioinformatics experience is needed.

*National eResearch Collaboration Tools and Resources

Dr Seán O’Donoghue Office of the Chief Executive (OCE) Science Leader CSIRO, Sydney

Biography: Seán O’Donoghue (http://odonoghuelab.org/) is an Office of the Chief Executive Science Leader in Australia's Commonwealth Scientific and Industrial Research Organisation (CSIRO), Sydney. He is also Group Leader and Senior Faculty Member at the Garvan Institute of Medical Research in Sydney. He received his BSc (Hons) in 1987 and PhD in Biophysics in 1992 from the University of Sydney, Australia. Much of his career was spent in Heidelberg, Germany, where he worked in the Structural and Computational Biology programme at the European Molecular Biology Laboratory (EMBL), and also at Lion Bioscience AG (then the world's largest bioinformatics company), where he was Director of Scientific Visualisation. His work has received many awards, including the Elsevier Grand Challenge (first prize), the Eureka Prize for Excellence in Interdisciplinary Scientific Research (finalist, 2015), the NSW Emerging Creative Talent Award (finalist, 2015), and the NSW iAward for Research and Development (first prize, 2015). His contributions have been recognised with a C.J. Martin Fellowship from the National Health & Medical Research Council of Australia, an Achievement Award from Lion Bioscience AG, and by election as a Fellow of the Royal Society of Chemistry.

Date: Thursday 7 July 2016

Presentation title: Data visualisation in bioinformatics: exploring the ‘dark’ proteome

Abstract: The rapidly increasing volume and complexity of biological data call for new approaches to help life scientists gain insight from these data, rather than being overwhelmed. To address this, the application of modern data visualisation principles and methods will be critical, in combination with improved data management, machine learning, and statistics. I will illustrate the power of this 'BioVis' approach by presenting several bioinformatics resources that empower biologists by making complex data easier to access and use. These include:

Aquaria (http://aquaria.ws), Compartments (http://compartments.jensenlab.org/), Tissues (http://tissues.jensenlab.org/), Minardo (http://minardo.org/snapshot), and Rondo (http://rondo.ws).

I will showcase how these resources are being used to explore the known and unknown ('dark') proteome, generating new insights into human biology and health. I will also discuss VIZBI, an international initiative aimed at raising the global standard of bioinformatics software (http://vizbi.org/). Finally, I’ll discuss the use of visualisation to create molecular- and cellular-scale animations aimed at educating and inspiring the public about cutting-edge biomedical research (http://vizbi.org/plus).

Dr Kate Patterson Visual Science Communicator and Biomedical Animator Garvan Institute of Medical Research Sydney

Biography: Kate Patterson uses visual language to transform complex scientific concepts for a general audience. Kate is a trans-disciplinary researcher working at the interface of art and science, bringing together the historically segregated fields of technology, art and science in order to contribute new work and knowledge to the field of visual science communication. Science can be complex, dynamic and invisible to the naked eye. Kate makes this accessible to a broad audience through the combined use of hand-drawn illustration, computer-generated imagery and 3D animation. Kate transforms raw scientific data using the tools of visual arts and cinematography into a form that can be used for education, communication and awareness purposes.

Kate first practiced as a veterinarian, but with a particular interest in cancer biology undertook a PhD at The Garvan Institute of Medical Research, which was awarded in 2009. During this time she developed her interest in science communication, using visualisation. Following her PhD, Kate worked as a scientific writer and illustrator, and in 2012 she was awarded an Inspiring Australia government grant to produce compelling 3D animations on cancer and epigenetics. Kate is a Lab Research Fellow in the 3D visualisation and aesthetics laboratory at UNSW Art and Design and a visual science communicator in the Epigenetics Research Group at the Garvan Institute of Medical Research. She writes the “Drawing from science” column for The Conversation and works freelance as a writer, scientific illustrator and animator trading as MediPics and Prose.

Date: Thursday 7 July 2016

Presentation title: Experimentation at the interface of art and science: narrative, cognitive embodiment and alternative visual language

Abstract: Biodata visualisation is a term used to describe a diverse field that encompasses data visualisation, science visualisation for education, visual science storytelling and expressive artwork inspired by science. Scientific visualisation has a long history; however, the combined evolution of technology and scientific knowledge presents modern visualisation and communication challenges.

In biomedical research, visuals are a powerful tool for effective communication and, when combined with narrative, can help to contextualise biomedical research, to distill complex information, to spark emotion and to cause behaviour change. Biomedical animation in particular plays a key role, transforming and contextualising raw scientific data for varied audiences by blending storytelling with art and design. Biomedical animation is characterised by high scientific integrity, preservation of data and blending of multiresolution data from different sources. There is a balance between accuracy and artistry, which is achieved with careful transformation of objective scientific data and orchestrated design choices.

This talk will focus on case studies where storytelling and art combine to communicate concepts in genomics and epigenetics. Communication of genomics and epigenetics poses specific challenges, because these fields of research are complex, and the biological systems are dynamic and stochastic in nature. There are multiple levels of control and the molecular machines associated with these biological systems are invisible – it all happens inside our cells at a scale smaller than the wavelength of light.

Dr Melissa Davis Laboratory Head Division of Bioinformatics Walter and Eliza Hall Institute of Medical Research Melbourne

Biography: Dr Melissa Davis is a computational biologist and Laboratory Head in the Bioinformatics Division of the Walter and Eliza Hall Institute of Medical Research. Her background is in genetics and computational cell biology with expertise in the analysis of genome-scale molecular networks and knowledge-based modelling.

Melissa received her PhD at UQ and continued as a postdoc at the Institute for Molecular Science. In 2014, she was awarded a National Breast Cancer Foundation Career Development Fellowship, and took up a position as Senior Research Fellow in Computational Systems Biology at the University of Melbourne in the Systems Biology Laboratory, before moving to the Walter and Eliza Hall Institute of Medical Research as a Laboratory Head in 2016. Melissa specialises in the integration of genomic, transcriptomic, and proteomic data with knowledge-based network models to understand the regulatory logic of mammalian systems.

Date: Thursday 7 July 2016

Presentation title: Network and data visualisation and analysis in Cytoscape

Abstract: The representation of systems of molecular interactions as networks enables the application of graph-based visualisation and analysis methods. In this presentation, I will briefly review different methods for the visualisation and analysis of molecular networks, and then present in depth an introduction to the Cytoscape application. Cytoscape is freely available to the academic community, and is a powerful application for visualisation and analysis of biological networks.

Professor Seok-Hee Hong ARC Future Fellow School of Information Technologies University of Sydney

Biography: Prof. Hong is a Future Fellow at the School of Information Technologies, University of Sydney. She was a Humboldt Fellow, and a project leader of the VALACON (Visualisation and Analysis of Large and Complex Networks) project at NICTA (National ICT Australia). Her research interests include graph drawing, algorithms, information visualisation and visual analytics. In 2006, she won the CORE (Computing Research and Education Association of Australasia) Chris Wallace Award for Outstanding Research Contribution in the field of computer science. Prof. Hong has more than 140 publications and has given 50 invited seminars worldwide. In particular, she has developed the open source visual analytics software GEOMI with her research team. She serves as a Steering Committee member of IEEE PacificVis (International Symposium on Pacific Visualisation) and ISAAC (International Symposium on Algorithms and Computations), and as an editor of JGAA (Journal of Graph Algorithms and Applications) and IEEE Computer Graphics and Applications. In particular, she has formed the Information Visualisation research community in the Asia-Pacific Region, by founding the IEEE PacificVis Symposium.

Date: Thursday 7 July 2016

Presentation title: Big data visual analytics

Abstract: Recent technological advances have led to the production of big data and complex networks in many domains. Examples include biological networks such as phylogenetic networks, gene regulatory networks, metabolic pathways and protein-protein interaction networks. Other examples are social networks such as Facebook, Twitter, LinkedIn, telephone call, citation and collaboration networks. Visualisation is an effective analysis tool for complex networks. Good visualisation reveals the hidden structure of the networks and amplifies human understanding, thus leading to new insights, new findings, new hypotheses and predictions. However, constructing good visualisation of big data is extremely challenging due to scalability and complexity.

This talk will present a framework for visual analytics of big data. Visual Analytics is the science of analytical reasoning facilitated by interactive visual interfaces. Our framework is based on the tight integration of analysis, visualisation and interaction methods. I will present a number of case studies using various networks derived from big data, in particular biological networks and social networks.

Dr Chris Brown Research Fellow Australian Rivers Institute Griffith University Brisbane

Biography: Chris is a research fellow at Griffith University’s Australian Rivers Institute. He works on the conservation of marine ecosystems and sustainable management of fisheries. His work brings ecological complexity to the planning tools used to inform decision making (http://www.seascapemodels.org/Research/). His research applies both mathematical models and statistical analysis, using primarily the R programming language (https://cran.r-project.org/). He enjoys teaching R and his teaching resources are free and open-access (http://www.seascapemodels.org/Rstats/).

Date: Thursday 7 July 2016

Presentation title: Creating data visualisations that won’t be forgotten using the R programming language

Abstract: Visualisations are a crucial aid for interpreting your data and also for communicating your research, yet too little attention is paid to teaching the skills required to create great visuals. A great visual will enhance the impact of your research, by helping other scientists, policy makers and the public understand, remember and ultimately re-communicate your research. Creating great visuals requires specialised tools that bridge data analysis and creative design. The R programming language is fast becoming the most powerful and flexible package for data analysis and visualisation. R’s flexibility also means it can be used to create almost any design you can dream up. This talk will cover some of the basics of visualisations, starting with the psychology of communication, then covering how visuals can help market your research, and finally how R can be used to realise your creativity.

Professor Howard Ochman Department of Integrative Biology University of Texas Austin, USA

Biography: Howard Ochman was trained as a population geneticist at the University of Rochester, where he received his PhD in 1984. Technical advances in molecular biology prompted his switch to studying the organisation and evolution of bacterial genomes, and for the past three decades, he has been applying molecular and computational approaches to investigate the evolution, diversity and interactions among microbes. After a postdoctoral stint in the Department of Biochemistry at UC Berkeley, he worked as a research scientist on the Human Genome Project; and in 1987, moved to Washington University to study the evolution of bacterial pathogenesis. Prior to joining the faculty at the University of Texas at Austin, he held faculty appointments at the University of Rochester (1991-1998), the University of Arizona (1998-2010) and Yale University (2010-2013).

Date: Friday 8 July 2016

Presentation title: The extraordinary evolution of the great ape microbiome

Abstract: Despite the large body of work concerning the human microbiome and its role in human health, there is little information about how the microbiome evolves or the factors causing differentiation among species. Analysis of the gut microbiomes of great ape species, including humans, revealed that the phylogeny based on microbiome compositions was completely congruent with the known relationships of the hosts. Our investigations of the microbiomes of great apes have informed several features of the human microbiome, including its stratification into community types and the effects of certain infective states on microbiome contents. By comparing the gut microbiomes of great ape species in a phylogenetic context, we reconstructed how the human microbiome evolved during great ape diversification. We have found that human gut microbiomes have been diverging at a greatly accelerated rate since our split from other great apes due to the loss of microbial diversity at every taxonomic level.

Dr David Wood Postdoctoral Researcher Australian Centre for Ecogenomics The University of Queensland

Biography: David completed his science undergraduate studies at the Australian National University in 2003. He then worked as a bioinformatician at the Australian Genome Research Facility and then at Queensland Facility for Advanced Bioinformatics in Brisbane. In 2010 he undertook a PhD in mammalian transcriptomics and RNA-seq analysis at the Queensland Centre for Medical Genomics under the supervision of Professor Sean Grimmond and Dr Nicole Cloonan. David is now a postdoctoral researcher supervised by Professor Phil Hugenholtz at the Australian Centre for Ecogenomics, studying host-associated microbial ecology focused on clinical projects. Always intrigued by the natural world, he feels privileged to have fulfilled his childhood dream of becoming an ‘ologist’. With broad training across multiple disciplines he is engaged by genomics of all sorts, by the computational analysis of rich data sets produced by high-throughput sequencing technologies, and has a habit of finding almost anything fascinating.

Date: Friday 8 July 2016

Presentation title: Tools and methods for microbial ecological genomics

Abstract: The advent of culture-independent techniques for studying microbial ecology (microbial ecological genomics) has had tremendous industrial, medical and environmental impact, and greatly expanded our knowledge of the tree of life. Advances in sequencing and bioinformatics continue to underpin this field. Principal experimental goals in microbial ecology include determining community membership and composition, changes in composition, functional analysis and genome discovery. In this talk I will discuss current methods and associated statistical techniques for both gene- and genome-centric approaches to address these goals.


Dr Christian Rinke Research Officer Australian Centre for Ecogenomics The University of Queensland

Biography: Christian Rinke is a Research Officer at the Australian Centre for Ecogenomics (ACE), The University of Queensland. He received his PhD in Zoology in 2007 from the Marine Biology Department at the University of Vienna, Austria and has since shifted his focus to the microbial world.

His research interests include genomics and the phylogeny and ecology of symbiotic and free-living microbes. He focuses in particular on the uncultured majority of microbes (99%) which elude current culturing efforts. This so-called “Microbial Dark Matter” can only be explored with culture-independent methods. Chris pioneered methods in high-throughput single-cell genomics, the separation and sequencing of single bacterial and archaeal cells, and also employs metagenomics (the direct sequencing of environmental samples) to illuminate microbial dark matter.

Date: Friday 8 July 2016

Presentation title: Illuminating microbial dark matter via single-cell genomics

Abstract: Our view of microbial genomic diversity is severely skewed, with the majority of all sequenced bacterial and archaeal genomes belonging to only four bacterial phyla. This bias results in part from our inability to cultivate most microbes, a necessary step for traditional whole genome sequencing. Through cultivation-independent approaches such as single-cell genomics, one can now explore the genetic diversity and metabolic potential of uncultivated environmental microorganisms. We successfully amplified several hundred single cells from free-living and symbiotic populations without cultured representatives, known as microbial dark matter. The single-cell genomes allowed us to explore their intra- and inter-phylum-level relationships, to decipher encoded pathways, and to discover novel metabolic features. The single-cell reference genomes also facilitate the interpretation of metagenomic data sets and substantially improve phylogenetic anchoring of up to 20% of metagenomic reads in some habitats. While there is still much ground to cover, single-cell genomics has proven to be a valuable tool to improve our understanding of microbial evolution on Earth.


Professor Edward F. DeLong Professor of Oceanography Co-Director, C-MORE and Co-Director, SCOPE University of Hawaii Honolulu, USA

Biography: Edward DeLong received his Bachelor of Science degree in Bacteriology at the University of California Davis in 1982, and his PhD in Marine Biology in 1986 at Scripps Institution of Oceanography at the University of California San Diego. He was a Professor at the University of California Santa Barbara in the Department of Ecology for seven years, before moving to the Monterey Bay Aquarium Research Institute where he was a Senior Scientist and Chair of the Science Department, also for seven years. Until July 2014, he served as a Professor at the Massachusetts Institute of Technology in the Departments of Civil and Environmental, and Biological Engineering, where he held the Morton and Claire Goulder Family Professorship in Environmental Systems. He is now a Professor of Oceanography in the School of Ocean and Earth Science and Technology at the University of Hawaii, Manoa. He currently serves as co-Director for both the Center for Microbial Oceanography: Research and Education (C-MORE), and the Simons Collaboration on Ocean Processes and Ecology (SCOPE). DeLong is a Fellow in the American Academy of Microbiology, the American Academy of Arts and Sciences, the U.S. National Academy of Sciences, and the American Association for the Advancement of Science.

DeLong’s scientific interests focus primarily on central questions in marine microbial genomics, biogeochemistry, ecology, and evolution. A large part of DeLong’s efforts have been devoted to the study of microbes and microbial processes in the ocean, combining laboratory and field-based approaches. Development and application of genomic, biochemical and metabolic approaches to study and exploit microbial communities and processes is another area of interest. Currently, DeLong is coupling the use of autonomous robotic sensors and samplers with genomic technologies, to derive high-resolution spatial and temporal maps of microbial community gene expression datasets in situ.

Date: Friday 8 July 2016

Presentation title: Towards 4-dimensional (eco)systems biology in the sea

Abstract: Microbial communities regulate the cycling of energy and matter in the marine environment, yet the variability of their activities in space and time, and how they dynamically respond to both natural and anthropogenic environmental changes, is not well understood. Genome-enabled methodologies are now providing deeper perspective on the nature and the identity of microbial taxa, genes, and metabolic diversity in the marine environment. Yet one of the larger challenges remaining is defining the variability of these microbial taxa, genes and processes on different spatial and temporal scales in the environment. Questions that need to be better addressed include: How do activities of different microbial species vary over the course of minutes, hours, days and weeks? Over what spatial scales are temporal dynamics coherently predictable? How does variation in any specific population correlate with corresponding environmental variation, and the variability of other taxa? Novel in situ robotic sampling strategies that capture transcriptomic temporal profiles of wild planktonic microbial populations have the potential to provide a four-dimensional motion picture of microbial gene expression dynamics that can begin to address such questions. New results using such approaches show that individual coexisting eukaryote, bacterial and archaeal populations display remarkably similar, time-variable patterns of synchronous gene expression over extended periods of time. Furthermore, these patterns appear to be robust, and conserved in genetically related populations that span the Pacific Ocean. These results suggest that specific environmental cues may elicit cross-species coordination of gene expression among diverse microbial groups that potentially enable multispecies coupling of metabolic activity. These data are leading to specific, testable hypotheses about how microbial interspecies matter and energy exchange may influence the cycling of matter and energy in the ocean.


Dr Kate Ormerod Postdoctoral Research Fellow Australian Centre for Ecogenomics The University of Queensland

Biography: Kate Ormerod is a postdoctoral researcher at the Australian Centre for Ecogenomics (ACE), The University of Queensland. She completed her PhD in 2015 at The University of Queensland working on microevolution of the opportunistic fungal pathogen Cryptococcus neoformans. She is now investigating uncultured members of the gut microbiota using metagenomic sequencing.

Date: Friday 8 July 2016

Presentation title: Genomes from metagenomes: recovery and analysis of population genomes

Abstract: The host microbiome has been firmly established as critical to host physiology. However, many members of this community have yet to be cultured and characterised. As such, it remains difficult to ascribe their contributions to gut and systemic function, and thereby, host health and well-being. In this talk I will discuss the use of population genomes recovered via metagenomic sequencing to fill in these knowledge gaps, using as an illustration an abundant but uncultured family present within the mouse gut microbiota.


Dr Ben Woodcroft Postdoctoral Researcher Australian Centre for Ecogenomics The University of Queensland

Biography: Ben has the happy knack of enjoying most things that are plonked in front of him, but has tended to gravitate towards development and application of bioinformatic techniques to studying biological systems in rewarding ways. Topics of interest include soil meta-omics, eukaryotic cell biology, endosymbiosis, methanogen biology and open-source software.

Ben received his Bachelor of Engineering in 2006 at The University of Queensland (UQ). Starting from a computational background at UQ, Ben’s interest in biological systems was sparked by an undergraduate project in protein structure with Dr Nicholas Hamilton in 2005, then an honours project in 2006 with Prof. Bernie Degnan’s marine biology laboratory studying the genome structure of the most basal animals, sponges. In 2008 he moved south to the University of Melbourne, in doing so moving further away evolutionarily from animals, studying malaria parasites. Specifically, under the guidance of Dr Stuart Ralph (The University of Melbourne) and Prof. (Walter & Eliza Hall Institute of Medical Research) his PhD concentrated on the development and application of bioinformatic tools to understand the parasite’s complex cell biology. He completed his PhD in 2013. He recently continued his evolutionary trajectory by taking up this position at the Australian Centre for Ecogenomics (ACE) with Prof. Gene Tyson, using metagenomic approaches to try to understand the carbon cycle in thawing permafrost, concentrating particularly on the role of methanogens in climate change. He is now pursuing scalable hypothetical gene annotation approaches through coupling of genome-centric metagenomics with ultra-high resolution mass spectrometry-based large molecule metabolomics.

Date: Friday 8 July 2016

Presentation title: Community diversity in metagenomes: one, many and thousands

Abstract: Shotgun Illumina sequencing is an increasingly common method of studying microbial communities as it allows estimation of functional potential and recovery of population genomes. Studying communities with metagenomics is not subject to the primer biases inherent in traditional amplicon-based approaches.

However, unlike traditional methods which group sequences into operational taxonomic units (OTUs), bioinformatic synthesis of metagenome data does not resolve sequence types and instead simply estimates abundances of taxonomic groups. This presentation will focus on the applications of SingleM, a tool for finding OTUs from metagenomes and determining microbial community structure.

Resolving community profiles into sequence-based rather than taxonomy-based groupings enables community profiles to be resolved more finely. Ecological alpha and beta diversity metrics can be calculated even for complex communities containing novel lineages, and community profiles derived from metagenome sequences can be directly linked to recovered population genomes. Comparisons made between unrelated metagenomic studies can also empower genome recovery efforts. An associated NeCTAR-hosted website allows searching of >4 million distinct sequence types from thousands of public metagenomes.


A/Professor Torsten Seemann Lead Bioinformatician Victorian Life Sciences Computation Initiative and Microbiological Diagnostics Unit Public Health Laboratory The University of Melbourne

Biography: A/Prof. Torsten Seemann is lead bioinformatician at the Victorian Life Sciences Computation Initiative and the Microbiological Diagnostics Unit Public Health Laboratory, both at the University of Melbourne. His work uses bioinformatics and genomics to better understand the spread and evolution of bacterial pathogens and antimicrobial resistance. He is best known for his software tools, which are used internationally, and he is a strong supporter of open science.

Date: Friday 8 July 2016

Presentation title: Comparing the variome and pan-genome of bacterial isolates

Abstract: This talk will discuss methods for comparing large numbers of bacterial isolates at the small scale (such as SNVs) and the large scale (pan-genome content). The methods, important caveats and visualisation of bacterial population SNV and pan-genome determination will be covered. These techniques are now becoming applicable to metagenomic data sets because increasingly long sequence read lengths make it possible to extract individuals from a population.

~*~*~*~*~


Sponsored by:

AUSTRALIAN BIOINFORMATICS AND COMPUTATIONAL BIOLOGY SOCIETY

RDS Research Data Services

Thanks to both the Institute for Molecular Bioscience and Faculty of Science at The University of Queensland for providing support to their students to attend the 2016 Winter School.