KNIME Bioinformatics Extensions

Karol Kozak

ETH Zurich

January 2011, Zurich, KNIME SIG

Start HCS

HCS Experiments and Informatics

HCS & Open Source

HCDB – OpenBIS & Library Database

[Diagram: HCS software landscape. Labels: Bioinformatics, Off-Target, LIMS, KNIME nodes, Cell Classifiers, Matlab, Spotfire]

Research interest

[Diagram labels: Software engineering; Database modeling; Classification; Pattern recognition; Role of RNA technology understanding]

Role of Bioinformatics in RNAi technology to detect functionality of genes in mammalian cells

Off-target prediction
- New prediction algorithms including kernel methods
- Post-analysis of existing hits
- 2D/3D structure relationships
- Traditional bioinformatics: homology, BLAST, alignment score

Annotation database
- Database development
- Gene functionality

RNAi Libraries

- Genome-wide
- Functional groups (kinases, ...)
- n oligonucleotides
- Pooled

RNAi Libraries
- Qiagen
- Thermo Fisher Scientific (Dharmacon)
- Applied Biosystems (Ambion)
- Sigma-Aldrich (esiRNA)

RNAi Library evolution

[Timeline diagram: library versions over time, each with its own annotation database]

V1: Purchase. Sigma esiRNA, Applied Biosystems, Qiagen, Thermo Fisher, each with an annotation database

V2: Dharmacon, with an annotation database

V3 (today): Dharmacon. Fewer genes, more known oligos, new design, transcript analysis, off-target information, with an annotation database

Annotation of human genome & Reliability of siRNA libraries

Meier Roger

Qiagen genome-wide siRNA library (HsDgV3 (Human Druggable Genome siRNA Set V3); HsNmV1 (Human Refseq Xm siRNA Set V1); HsXmV1 (Human Predicted genome Set V1))

2006: 22’832 genes / 90’728 siRNAs
2010: 16’199 genes

[Pie charts, based on GENEID: % of genes targeted by 1 to 9 siRNAs. Legend: target genes; target genes with off-target(s); siRNAs w/o off-target(s); siRNAs with off-target(s); wrongly predicted genes; siRNAs against wrongly predicted genes]
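A breakdown of the kind shown in the charts (percentage of genes targeted by 1, 2, ... siRNAs) could be computed as in the minimal sketch below. The function name, the dictionary layout and the toy identifiers are assumptions made for illustration and are not taken from the library files.

```python
from collections import Counter


def coverage_distribution(sirna_to_gene):
    """Return {n_sirnas: percent_of_genes} for a siRNA -> gene ID mapping.

    siRNAs whose gene ID is None (no current annotation) are ignored.
    The input layout is a hypothetical stand-in for a library table.
    """
    # Number of siRNAs per annotated gene.
    per_gene = Counter(g for g in sirna_to_gene.values() if g is not None)
    n_genes = len(per_gene)
    # How many genes are hit by exactly k siRNAs.
    histogram = Counter(per_gene.values())
    return {k: 100.0 * v / n_genes for k, v in sorted(histogram.items())}


# Toy data: gene "A" has two siRNAs, gene "B" has one, one siRNA is unannotated.
toy_library = {"s1": "A", "s2": "A", "s3": "B", "s4": None}
distribution = coverage_distribution(toy_library)
```

Running the same function on the 2006 and 2010 annotation would show how the distribution shifts as gene models are revised.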

Off-targets annotated by companies

How did the companies select these off-target siRNAs?

- Many ribosomal siRNAs were eliminated
- Many siRNAs against "membrane" proteins were eliminated (2061 in old list, 945 in clean list)
- Not many siRNAs against "kinase" proteins were eliminated (2624 in old list, 1605 in clean list)
- In addition, the 2800 discarded proteins are slightly enriched in virus-related pathways

Library reduction

• siRNA without geneID?
• The web site shows the gene that the siRNA matched at the time it was selected. The database table from which we get the current annotation shows what the siRNA matches in the current RefSeq.
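The difference between the design-time annotation and the current RefSeq annotation can be made explicit by comparing the two mappings. The sketch below assumes both are available as plain siRNA-to-gene dictionaries; all names and the toy records are hypothetical.

```python
def annotation_changes(design_time, current):
    """Classify each siRNA by comparing design-time and current gene IDs.

    Both arguments map siRNA ID -> gene ID; a missing or None entry in
    `current` means the siRNA no longer matches any annotated gene.
    """
    changes = {"unchanged": [], "retargeted": [], "lost": []}
    for sirna, old_gene in design_time.items():
        new_gene = current.get(sirna)
        if new_gene is None:
            changes["lost"].append(sirna)        # siRNA without geneID
        elif new_gene == old_gene:
            changes["unchanged"].append(sirna)
        else:
            changes["retargeted"].append(sirna)  # now matches another gene
    return changes


# Toy records, not real library entries.
at_design = {"si_1": "GENE_A", "si_2": "GENE_B", "si_3": "GENE_C"}
today = {"si_1": "GENE_A", "si_2": "GENE_X"}
report = annotation_changes(at_design, today)
```

The "lost" group corresponds to the siRNAs without a geneID raised in the first bullet.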

Library handling

From pooled to oligo-based

Library analysis

Bioinformatics - RNAi

- Off-target effect

UUGCCGUACAGGAUGGACGtg UUAACUGAUGUUCCAAUCCtg
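One common source of off-target effects is seed-mediated binding: positions 2-8 of the guide strand pair with a complementary site in an unintended transcript. The sketch below searches a transcript for such seed sites. It assumes the sequence above is a guide (antisense) strand written 5' to 3'; the function names and the toy transcript are made up for illustration.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def to_rna(seq):
    """Upper-case a sequence and read DNA letters (e.g. a 'tg' overhang) as RNA."""
    return seq.upper().replace("T", "U")


def seed_site(guide, start=1, end=8):
    """Return the mRNA site (5'->3') complementary to guide positions 2-8."""
    seed = to_rna(guide)[start:end]
    return seed.translate(COMPLEMENT)[::-1]  # reverse complement


def find_seed_matches(guide, transcript):
    """Return 0-based start positions of seed sites in the transcript."""
    site = seed_site(guide)
    target = to_rna(transcript)
    hits = []
    pos = target.find(site)
    while pos != -1:
        hits.append(pos)
        pos = target.find(site, pos + 1)
    return hits


guide = "UUGCCGUACAGGAUGGACGtg"           # first sequence on the slide
toy_transcript = "cccTACGGCAtttTACGGCA"   # hypothetical, built to contain two sites
matches = find_seed_matches(guide, toy_transcript)
```

A seed match alone is only a candidate; it says nothing yet about whether the transcript is actually down-regulated.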

Bioinformatics

Sandra Kaestner

Workflow

Project: Ribosome biogenesis

[Diagram: ribosome biogenesis across nucleolus, nucleoplasm and cytoplasm. Pol I and Pol III: transcription of ribosomal RNAs from rDNA (35S rRNA, 5S rRNA). Pol II: transcription of ribosomal mRNAs, followed by translation of small and large ribosomal proteins. With trans-acting factors: 90S particle, then pre-40S and pre-60S particles, maturation and export of the pre-40S and pre-60S particles, final maturation, mature 40S and 60S subunits, 80S ribosome.]

[Diagram: 18S rRNA maturation. Labels: 35S rRNA (18S, 5.8S, 25S), 23S rRNA, 20S rRNA, 18S rRNA]

The Rps2-YFP read out

Thomas Wild

[Diagram: Rps2-YFP read out across nucleolus, nucleoplasm and cytoplasm. Labels: ribosomal proteins, mature 40S subunit, mature 60S subunit, 80S ribosome]

Biogenesis

siRNA 84062 - DTNBP1: AACCTTCAAAGCTGAACTAGA (DTNBP1)

HCDC: RPS3, transcript NM_001005

Results Biogenesis

[Diagram: RNAi library (e.g. 4 oligos) + results = hit list ("Wonder"). The hit list contains real hits and off-targets; potential off-targets are checked against the hit list.]

Analysis of off-targets

Known Off-targets

Build model

[Figure: uncoating and acidification panels for AllStars, MVP_si03 and MVP_si06; MVP and Tubulin bands for AllStars, si03 and si06]

Virus screen: 1 out of 3

siRNA TGGGCCTGAGATGCAGGTAAA MVP NM_003010

HCDC Qiagen off-target SM MAP2K4 MAP2K4 6416|NM_003010|2890|AS

mRNA 2D variants

mRNA

siRNA

- Relation of 2D structure to off-target effects
- We can model 2D structures quite robustly (Metaserver, Python)
- We can predict potential off-target effects
- We must find the relation between the two
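To give a feel for what 2D structure modelling in Python involves, the sketch below uses Nussinov base-pair maximization. This is a textbook simplification chosen for brevity, not the Metaserver approach named on the slide, and it ignores thermodynamics. The pairing rules (Watson-Crick plus G-U wobble), the minimum loop size and the toy sequences are assumptions.

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def max_base_pairs(seq, min_loop=3):
    """Maximum number of nested base pairs in an RNA sequence (Nussinov DP).

    min_loop is the minimum number of unpaired bases enclosed by a pair.
    """
    seq = seq.upper().replace("T", "U")
    n = len(seq)
    if n == 0:
        return 0
    # dp[i][j] = best pair count for the subsequence seq[i..j]
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case: base j stays unpaired
            for k in range(i, j - min_loop):  # case: base j pairs with k
                if (seq[k], seq[j]) in PAIRS:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]


hairpin_pairs = max_base_pairs("GGGAAACCC")  # toy hairpin: 3 G-C pairs, AAA loop
```

Comparing such scores around a predicted off-target site with those around the intended site is one way the relation to off-target effects could be explored.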

mRNA

We want to identify structural motifs in a set of mRNA sequences

mRNA 3D Structure

Nature Movie

- Dicer, tRNA
- Off-target relations: RNA-RNA and RNP-RNA
- Model how RISC relates to microRNA/siRNA, and how it finds its own target and binds to it

RNA 3D

Workflow

- ModeRNA (Python)
- BioPython for parsing structural data from the PDB format
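For readers unfamiliar with the PDB format, the sketch below shows what parsing its fixed-column ATOM records involves. It is a dependency-free illustration only; in the workflow this job is done by BioPython (its `Bio.PDB.PDBParser` class). The `Atom` type, the helper names and the toy coordinates are assumptions.

```python
from typing import List, NamedTuple


class Atom(NamedTuple):
    serial: int
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float


def parse_atom_line(line: str) -> Atom:
    """Read one ATOM/HETATM record using the PDB fixed-column layout."""
    return Atom(
        serial=int(line[6:11]),        # columns 7-11
        name=line[12:16].strip(),      # columns 13-16
        res_name=line[17:20].strip(),  # columns 18-20
        chain=line[21],                # column 22
        res_seq=int(line[22:26]),      # columns 23-26
        x=float(line[30:38]),          # columns 31-38
        y=float(line[38:46]),          # columns 39-46
        z=float(line[46:54]),          # columns 47-54
    )


def parse_pdb_atoms(text: str) -> List[Atom]:
    """Collect all ATOM and HETATM records from PDB-formatted text."""
    return [parse_atom_line(l) for l in text.splitlines()
            if l.startswith(("ATOM  ", "HETATM"))]


def format_atom_line(a: Atom) -> str:
    """Write an Atom into the same columns (used here only to build toy input)."""
    return "ATOM  {:>5} {:<4} {:>3} {}{:>4}    {:8.3f}{:8.3f}{:8.3f}".format(
        a.serial, a.name, a.res_name, a.chain, a.res_seq, a.x, a.y, a.z)


# Toy RNA atoms with invented coordinates.
toy_atoms = [Atom(1, "P", "G", "A", 1, 50.193, 51.190, 50.534),
             Atom(2, "C4'", "G", "A", 1, 48.120, 52.004, 49.871)]
TOY_PDB = "\n".join(format_atom_line(a) for a in toy_atoms) + "\nEND"
atoms = parse_pdb_atoms(TOY_PDB)
```

The fixed columns are why naive whitespace splitting fails on real files: adjacent fields may touch when coordinates are large or negative.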

HCS databases and architecture (LIMS)

- Library and annotation: done, but needs maintenance
- Screening experiment: sample, results, management, view
- Public database: phenotypic data

OpenBIS - Open Source database

Screen DB: Open Source database HCDB - OpenBIS

Open Source database OpenBIS (ETH)
- Web client
- Command line client
- Java technology, Google GWT

Adam Srebniak

Image Processing and KNIME

Open Source KNIME: web and desktop

Adam Srebniak

[Diagram: library files from Dharmacon, Qiagen and Ambion (oligos with target gene), each with its own annotation database, are combined with external web gene databases into a cross-reference database for gene/oligo annotation plus an off-target prediction workflow, connected to OpenBIS]

Image Processing

One of the first Open Source tools: CellProfiler

High Content Image Processing (HCIP)

Teaching module

Slawek Mazur

HCDC-HITS

Bio-Formats software, developed by OME (Jason Swedlow), UW-Madison LOCI and Glencoe Software.

Gabor Bakos

Visualization Improvement

Lukasz Zwolinski, ETH – PWR student

Acknowledgement

Bioquant Heidelberg and ETH: Holger Erfle, Karol Kozak, Berend Rind, Juergen Reymann, Gabor Csucs, Adam Srebniak, Slawek Mazur, Sandra Kaestner

Welcome to join

TU Konstanz, Trinity College IR, MPI-IB Berlin DE, TU Breslau: Dorit Merhof, Anthony Davies, Peter Braun, Karol Kozak, Andre Maeurer, Lukasz Miroslaw