CLC Genomics Workbench Features & Benefits

CLC bio ©Copyright 2011, clcbio.com

Solving the data analysis challenges of High-Throughput Sequencing

High-Throughput Sequencing machines have made sequencing accessible to a very large group of researchers. However, data analysis represents a serious bottleneck in the NGS pipelines of most R&D departments, which in turn dramatically reduces the return on investment of current NGS assets.

CLC Genomics Workbench solves this problem and enables everyone to rapidly analyze and visualize the huge amounts of data generated by NGS machines. The user-friendly and intuitive interface takes High-Throughput Sequencing analysis away from hardcore bioinformatics programmers writing command-line scripts and hands it to scientists searching for biological results. Furthermore, the versatile nature of CLC Genomics Workbench allows it to blend seamlessly into existing sequencing analysis workflows, easing implementation and maximizing return on investment.

Multi technology – multi platform

CLC Genomics Workbench includes High Performance Computing accelerated assembly of High-Throughput Sequencing data as well as a large number of downstream analysis tools.

CLC Genomics Workbench is the first comprehensive analysis package that can analyze and visualize data from all major NGS platforms, such as SOLiD, 454, Sanger, Illumina and Ion Torrent. Collaboration with instrument manufacturers is a natural part of CLC bio's development process.

Like all other Workbenches from CLC bio, CLC Genomics Workbench runs on Mac OS X, Windows, and Linux platforms. You decide which computer to run your software on, not us.

Director of the Einstein Genomics Center for Epigenomics at the Albert Einstein College of Medicine, Dr. John Greally:
"CLC bio's tools are going to put sophisticated analytical ability into the hands of molecular biologists at Einstein, and will greatly enhance their ability to explore the massively-parallel sequencing data that we are generating. We see this as a way of lowering barriers for scientists who have not previously performed these high-throughput epigenomic assays, allowing them to explore their data and explore hypotheses."

Some features of CLC Genomics Workbench

High-Throughput Sequencing
• Read mapping of Sanger, 454, Illumina Genome Analyzer and SOLiD sequencing data
• De novo assembly of genomes of any size (only limited by available RAM)
• Color space mapping
• Advanced visualization, scrolling, and zooming tools
• SNP detection using advanced quality filtering
• Support for multiplexing with DNA barcoding

Transcriptomics
• RNA-seq, including support for paired data and transcript-level expression
• Small RNA analysis
• Expression profiling by tags
• EST library construction
• Advanced visualization, scrolling, and zooming tools
• Gene expression analysis

Epigenomics
• ChIP-seq analysis
• Peak finding and peak refinement
• Case/control analysis

Classical sequence analysis tools
• Primer design
• Molecular cloning
• BLAST
• Alignments
• Phylogenetic trees
• Advanced RNA structure prediction and editing
• Integrated 3D molecule analysis
• Secondary protein structure predictions
• And much more...

For Windows, Mac OS X, and Linux

Genomics Features

Support for analysis of hybrid data
Read mapping as well as de novo assembly support the analysis of different kinds of data at the same time. An example would be the de novo assembly of Sanger data, 454 single-read data, and Illumina paired-end data in the same analysis.
This functionality dramatically reduces manual work for scientists, letting them focus on deriving biological results from the data instead of doing tedious data-crunching.

CLC bio's world-renowned scientists have designed completely new and innovative algorithms to power the features of CLC Genomics Workbench. These algorithms incorporate SIMD (single instruction, multiple data) processor acceleration to significantly speed up both read mapping and de novo assembly.

Fig. 1: A region of low coverage has been found in the assembly view, and the corresponding region of the contig sequence is automatically highlighted.

Read mapping
The read mapping functionality of CLC Genomics Workbench supports short and long reads, paired reads, and gapped and ungapped alignments, for Sanger, 454, Illumina Genome Analyzer and SOLiD sequencing data.

CLC Genomics Workbench maps reads to genomes of any size, as long as the computer has the necessary RAM. A read mapping at 10-fold coverage of the human genome can be carried out on a standard computer with 16 GB of RAM.

Mapping of SOLiD data is carried out in native color space, using a high-performance computing based algorithm. Up to 80% more hits have been found when assembling 35mer SOLiD data in color space, compared to assembling the same data in base space.

De novo assembly
The de novo assembler of CLC Genomics Workbench supports short and long reads, paired reads, and Sanger, 454, Illumina Genome Analyzer and SOLiD sequencing data.

The de novo assembler can perform scaffolding, joining contigs based on paired-read information. A combination of paired data protocols can be used, mixing paired-end and mate-pair data with various insert sizes in the same assembly. Depending on the coverage and quality of the data, CLC Genomics Workbench can de novo assemble genomes of any size.

Multiplexing
When doing batch sequencing of different samples, you can use multiplexing techniques to sequence different samples in the same run. The data analysis challenge is then to separate the sequencing reads so that the reads from each sample are analyzed together. CLC Genomics Workbench supports a large number of multiplexing protocols, covering both multiplexing based on read names and multiplexing based on tags or barcodes.

SNP detection
CLC Genomics Workbench offers automated SNP detection based on the Neighborhood Quality Standard (NQS) algorithm of [Altshuler et al., 2000] (also see [Brockman et al., 2008] for more information).

If the reference sequence is annotated with ORF or CDS annotations, the SNP detection also reports whether each SNP is synonymous or non-synonymous. If the SNP variant changes the amino acid in the protein translation, the new amino acid is reported.

The graphical user interface allows the user to easily identify SNPs and get a graphical overview of SNPs in smaller or larger genomic regions.

Identifying genomic rearrangements
Through the advanced graphical user interface, CLC Genomics Workbench supports the identification of a variety of genomic rearrangements, such as insertions, deletions, duplications and inversions.

Transcriptomics Features
CLC Genomics Workbench has tools to support a full workflow in the analysis of expression data. These include visual quality control tools such as principal component plots and box plots, transformation and normalization tools, tools for statistical testing and false discovery rate control, clustering algorithms, heat-map visualization, and tests on gene annotations such as hypergeometric tests and Gene Set Enrichment Analysis.

Data supported for expression analysis include RNA-seq, small RNA, tag-based expression profiling and single-color microarray gene expression data.
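False discovery rate control, listed among the transcriptomics tools above, is commonly done with the Benjamini-Hochberg procedure. The brochure does not say which FDR method CLC Genomics Workbench uses, so the following is only a minimal sketch of that standard procedure, not the product's implementation; the function name `benjamini_hochberg` and the p-values are hypothetical.

```python
from typing import List


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps every adjusted value at 1
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical p-values from a differential expression test on five genes.
    pvals = [0.01, 0.04, 0.03, 0.20, 0.005]
    for p, q in zip(pvals, benjamini_hochberg(pvals)):
        print(f"p={p:.3f}  adjusted={q:.3f}")
```

Each adjusted value is the smallest p * m / rank over that p-value and all larger ones; genes whose adjusted value falls below a chosen threshold (for example 0.05) would be reported as significant at that false discovery rate.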
The interactivity of the multiple available views allows easy navigation and overview of data and analysis results. The complete integration of the expression analysis in the workbench lets the user carry out downstream analysis of genes of interest with the comprehensive set of sequence analysis tools provided, immediately and without the hassle of switching between software packages.

Fig. 2: Heat-map visualization tool letting you depict the table of expression values.

Digital Gene Expression
CLC Genomics Workbench includes mRNA-seq analysis based on the approach from Mortazavi A, et al., "Mapping and quantifying mammalian transcriptomes by RNA-Seq", Nat Methods. 2008 Jul;5(7):621-8.

[…] other resources. The annotations can be grouped on the precursor or mature miRNA level. The final results can be visualized and analyzed using the expression analysis tools.

Expression profiling by tags
CLC Genomics Workbench includes a powerful tag profiling functionality, an extension of SAGE (Serial Analysis of Gene Expression) using NGS technology. The full workflow is supported: extracting tags from sequence reads, counting tags, creating a virtual tag list, and annotating tag counts with gene names.

EST library construction
An EST library can be constructed using the de novo assembly algorithm, e.g. to be used as reference sequences for mRNA-seq or tag-based transcriptomics.

Epigenomics analyses
CLC Genomics Workbench includes a fully integrated ChIP-seq analysis solution that takes researchers from raw data, through reference alignment, to advanced visual and tabular output of ChIP-seq results. The analysis can be based on the information contained in a single sample subjected to immunoprecipitation (ChIP-sample) or on a comparison of a ChIP-sample to a control sample.

Benchmarks (E. coli)
• 454, 2 minutes: read mapping and visualization of 439,000 reads to E. coli (5 megabases) on a 1,500 USD laptop computer (2 GB RAM, dual core, 2.13 GHz, 32-bit).
• Illumina Genome Analyzer, 3 minutes: read mapping and visualization of 2 x 2.7 = 5.4 million paired-end reads (1 lane) to E. coli (5 megabases) on a desktop computer (32 GB RAM, 8 cores, 2.5 GHz, 64-bit).
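The native color-space mapping of SOLiD data mentioned under Read mapping refers to SOLiD two-base encoding: a read is stored as its first base followed by one color (0 to 3) per pair of adjacent bases, rather than as bases. The sketch below illustrates that data format only, assuming the usual A=0, C=1, G=2, T=3 codes under which the color is the XOR of the two base codes; it is not CLC's mapping algorithm, and the function names are hypothetical.

```python
BASE_TO_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_TO_BASE = "ACGT"


def encode_color_space(seq: str) -> str:
    """Encode bases as a leading base plus one color (0-3) per adjacent base pair.

    With A=0, C=1, G=2, T=3 the two-base color is the XOR of the two base codes.
    """
    seq = seq.upper()
    if not seq:
        return ""
    colors = (str(BASE_TO_CODE[a] ^ BASE_TO_CODE[b]) for a, b in zip(seq, seq[1:]))
    return seq[0] + "".join(colors)


def decode_color_space(cs: str) -> str:
    """Decode a color-space string back to bases, starting from its leading base."""
    if not cs:
        return ""
    bases = [cs[0].upper()]
    for color in cs[1:]:
        # XOR is its own inverse: next base code = previous base code XOR color.
        bases.append(CODE_TO_BASE[BASE_TO_CODE[bases[-1]] ^ int(color)])
    return "".join(bases)


if __name__ == "__main__":
    ref = encode_color_space("ATGGCA")  # "A31031"
    snp = encode_color_space("ATCGCA")  # G->C substitution at the third base
    changed = [i for i, (x, y) in enumerate(zip(ref, snp)) if x != y]
    print(ref, snp, changed)  # one SNP changes two adjacent colors
```

Because decoding proceeds base by base, a single wrong color corrupts every base after it once a read is converted to base space, while in color space it stays a single mismatch; a genuine SNP, by contrast, appears as two adjacent color changes.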
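Multiplexing based on tags or barcodes, as described under Multiplexing, comes down to assigning each read to a sample by a short barcode sequence. The sketch below assumes fixed-length barcodes at the very start (5' end) of each read, tolerates a configurable number of mismatches, and uses made-up barcodes and sample names; real protocols differ in barcode position and length, and this is not CLC's implementation. `demultiplex` and `hamming` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, List


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def demultiplex(
    reads: List[str], barcodes: Dict[str, str], max_mismatches: int = 1
) -> Dict[str, List[str]]:
    """Assign reads to samples by a 5' barcode and trim the barcode off.

    Assumes all barcodes have the same length. Reads that match no barcode,
    or more than one, are kept untrimmed under "unassigned".
    """
    if not barcodes:
        raise ValueError("at least one barcode is required")
    length = len(next(iter(barcodes)))
    by_sample: Dict[str, List[str]] = defaultdict(list)
    for read in reads:
        prefix = read[:length]
        hits = [
            sample
            for barcode, sample in barcodes.items()
            if len(prefix) == length and hamming(prefix, barcode) <= max_mismatches
        ]
        if len(hits) == 1:
            by_sample[hits[0]].append(read[length:])
        else:
            by_sample["unassigned"].append(read)
    return dict(by_sample)


if __name__ == "__main__":
    # Hypothetical 4-nt barcodes and toy reads.
    barcodes = {"ACGT": "sample1", "TGCA": "sample2"}
    reads = ["ACGTTTTT", "TGCAGGGG", "ACGATTTT", "GGGGAAAA"]
    for sample, seqs in demultiplex(reads, barcodes).items():
        print(sample, seqs)
```

Allowing one mismatch rescues reads with a sequencing error in the barcode, but is only safe when the barcodes in a run differ from each other at several positions; ambiguous reads are deliberately left unassigned rather than guessed.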
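The synonymous/non-synonymous report described under SNP detection amounts to translating the reference codon and the variant codon and comparing the two amino acids. The sketch below assumes the standard genetic code, a coding sequence that starts in frame on the forward strand, and a 0-based SNP position within that sequence; it does not cover the NQS quality filtering or strand handling, is not CLC's implementation, and `classify_snp` is a hypothetical name.

```python
from itertools import product
from typing import Tuple

BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_snp(cds: str, pos: int, alt: str) -> Tuple[str, str, str]:
    """Classify a single-base substitution in an in-frame coding sequence.

    Returns (reference amino acid, variant amino acid, effect), where effect is
    "synonymous" or "non-synonymous" and "*" denotes a stop codon.
    """
    cds, alt = cds.upper(), alt.upper()
    if not 0 <= pos < len(cds) - len(cds) % 3:
        raise ValueError("position is outside the complete codons of the CDS")
    start = pos - pos % 3  # first base of the affected codon
    offset = pos - start
    ref_codon = cds[start:start + 3]
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    effect = "synonymous" if ref_aa == alt_aa else "non-synonymous"
    return ref_aa, alt_aa, effect


if __name__ == "__main__":
    cds = "ATGGCTTGGTAA"  # toy CDS: Met-Ala-Trp-stop
    print(classify_snp(cds, 5, "C"))  # GCT -> GCC, still alanine
    print(classify_snp(cds, 6, "C"))  # TGG -> CGG, tryptophan to arginine
```

Only the codon containing the SNP needs to be translated, which is why an ORF or CDS annotation on the reference is enough to report both the effect and the new amino acid.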
