KNIME Bioinformatics Extensions
Karol Kozak
ETH Zurich
January 2011, Zurich, KNIME SIG
Start HCS
HCS Experiments and Informatics
HCS & Open Source
HCDB – OpenBIS & Library Database
[Diagram: tools by area. Labels: Bioinformatics, Off-Target, LIMS, Cell Classifiers; KNIME, Matlab, WEKA Nodes, Java, Spotfire]
Research interest
[Diagram: Software engineering; Database modeling; Classification + Pattern recognition; RNA technology understanding]
Role of Bioinformatics in RNAi technology to detect functionality of genes in mammalian cells
- New prediction algorithms including kernel methods
- Post analysis of existing hits
- Traditional bioinformatics: homology, BLAST, alignment score
- Database development
[Diagram labels: Annotation database; Off-target prediction; 2D/3D structure relationships; Gene functionality]
RNAi Libraries
- Genome wide
- Functional groups (Kinases..)
- n-Oligonucleotides
- Pooled
RNAi Libraries
- Qiagen
- Thermo Fisher Scientific (Dharmacon)
- Applied Biosystems (Ambion)
- Sigma-Aldrich (esiRNA)
RNAi Library evolution
[Timeline diagram: V1: purchase of libraries from Sigma (esiRNA), Applied Biosystems, Qiagen and Thermo Fisher, each with its own annotation database. V2: Dharmacon, with annotation database. V3 (today): Dharmacon, fewer genes, more known oligos, new design, annotation database, transcripts analysis, off-target information]
Annotation of human genome & Reliability of siRNA libraries
Meier Roger
Qiagen genome wide siRNA library (HsDgV3 (Human Druggable Genome siRNA Set V3); HsNmV1 (Human Refseq Xm siRNA Set V1); HsXmV1 (Human Predicted genome Set V1))
2006: 22’832 genes / 90’728 siRNAs
2010: 16’199 genes
[Charts, based on GENEID: % of genes targeted by 1 to 9 siRNAs. Legend: target genes with off-target(s); siRNAs with off-target(s); wrongly predicted genes; siRNAs against wrongly predicted genes; target genes; siRNAs w/o off-target(s)]
Annotated off-target by companies
How did companies select these off-target siRNAs?
- Many ribosomal siRNAs were eliminated
- Many siRNAs against "membrane" proteins were eliminated (2061 in old list, 945 in clean list)
- Not many siRNAs against "kinase" proteins were eliminated (2624 in old list, 1605 in clean list)
- The 2800 discarded proteins are slightly enriched in virus-related pathways
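The counts above can be turned into retention rates to make the two categories comparable. This is a minimal sketch that uses only the numbers quoted on this slide; the function name `retained_fraction` is illustrative, and it assumes the second kinase count (1605) refers to the clean list.

```python
def retained_fraction(old_count, clean_count):
    """Fraction of siRNAs in a category that survived the vendor clean-up."""
    if old_count <= 0:
        raise ValueError("old_count must be positive")
    return clean_count / old_count


# Counts quoted on the slide: (old list, clean list).
# Assumption: 1605 is the kinase count in the clean list.
categories = {
    "membrane": (2061, 945),
    "kinase": (2624, 1605),
}

for name, (old, clean) in categories.items():
    kept = retained_fraction(old, clean)
    print(f"{name}: kept {kept:.1%}, removed {1 - kept:.1%}")
```

Under that assumption, roughly 46 % of the membrane siRNAs and roughly 61 % of the kinase siRNAs were kept, which matches the statement that kinases were filtered less aggressively.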
Library reduction
• siRNA without geneID?
• The web site shows the gene that the siRNA matched at the time it was selected. The database table from which we get the current annotation shows what the siRNA matches in the current RefSeq.
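The difference between design-time annotation and current annotation can be checked mechanically. The sketch below is one possible way to do it, assuming both annotations are available as plain mappings from siRNA ID to gene ID; the function name, the report keys and all IDs in the example are hypothetical.

```python
def compare_annotations(design_time, current):
    """Classify each siRNA by how its gene annotation changed.

    design_time: dict siRNA id -> gene ID when the siRNA was selected
    current:     dict siRNA id -> gene ID in the current RefSeq-based table
                 (a missing key or None means no gene ID today)
    """
    report = {"unchanged": [], "changed": [], "no_gene_id": []}
    for sirna_id, old_gene in sorted(design_time.items()):
        new_gene = current.get(sirna_id)
        if new_gene is None:
            report["no_gene_id"].append(sirna_id)
        elif new_gene == old_gene:
            report["unchanged"].append(sirna_id)
        else:
            report["changed"].append((sirna_id, old_gene, new_gene))
    return report


# Hypothetical IDs, for illustration only.
design_time = {"si_001": "GENE_A", "si_002": "GENE_B", "si_003": "GENE_C"}
current = {"si_001": "GENE_A", "si_002": "GENE_X"}

print(compare_annotations(design_time, current))
```

siRNAs that end up under `no_gene_id` are the ones the first bullet asks about; those under `changed` are candidates for re-annotation or removal from the library.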
Library handling
From pooled to oligo-based
Library analysis
Bioinformatics - RNAi
- Off-target effect
UUGCCGUACAGGAUGGACGtg UUAACUGAUGUUCCAAUCCtg
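One common way to screen for off-target candidates is a seed match: take nucleotides 2 to 8 of the guide strand and look for the complementary site in a transcript. The sketch below illustrates that idea only. It assumes the two sequences above are guide strands written 5' to 3' with a lowercase overhang, which the slide does not state; the helper names and the toy transcript are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def to_rna(seq):
    """Uppercase the sequence and write it in the RNA alphabet."""
    return seq.upper().replace("T", "U")


def reverse_complement(rna):
    """Reverse complement of an RNA string (A, C, G, U only)."""
    return rna.translate(COMPLEMENT)[::-1]


def seed_region(guide, start=2, end=8):
    """Nucleotides start..end (1-based, inclusive) of the guide strand."""
    return to_rna(guide)[start - 1:end]


def find_seed_sites(guide, transcript):
    """0-based positions in the transcript that are complementary to the seed."""
    site = reverse_complement(seed_region(guide))
    target = to_rna(transcript)
    positions = []
    pos = target.find(site)
    while pos != -1:
        positions.append(pos)
        pos = target.find(site, pos + 1)  # step by one to allow overlapping sites
    return positions


# Toy transcript built for illustration; it is not a real mRNA.
toy_transcript = "GGG" + "UACGGCA" + "AA" + "UACGGCA" + "UU"
print(seed_region("UUGCCGUACAGGAUGGACGtg"))
print(find_seed_sites("UUGCCGUACAGGAUGGACGtg", toy_transcript))
```

A seed match alone is a weak signal, so in practice such hits would be treated as candidates to check against the screen results rather than as confirmed off-targets.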
Sandra Kaestner
Bioinformatics
Workflow
Project: Ribosome biogenesis
[Diagram: ribosome biogenesis across nucleolus, nucleoplasm and cytoplasm. Labels: rDNA; Pol I, Pol II, Pol III; transcription of ribosomal RNAs (35S rRNA, 5S rRNA); transcription of ribosomal mRNAs; translation of ribosomal proteins (small and large ribosomal proteins); trans-acting factors; 90S particle; pre-40S particle; pre-60S particle; maturation and export of pre-40S particle; maturation and export of pre-60S particle; final maturation; mature 40S subunit; mature 60S subunit; 80S ribosome]
[Diagram: 18S rRNA maturation. Labels: 35S rRNA, 23S rRNA, 20S rRNA, 18S rRNA; 18S, 5.8S, 25S]
Thomas Wild
The Rps2-YFP read out
[Diagram: Rps2-YFP read out. Labels: ribosomal proteins; mature 40S subunit; mature 60S subunit; 80S ribosome; Nucleolus, Nucleoplasm, Cytoplasm]
Biogenesis
siRNA 84062- DTNBP1 AACCTTCAAAGCTGAACTAGA DTNBP1
HCDC RPS3 Transcript NM_001005
Results Biogenesis
[Diagram: RNAi Library + Results = Hit List ("Wonder"), e.g. 4 oligos. Labels: Real in Hit List; Off-target in Hit List; Potential off-targets]
Analysis Off-targets
Known Off-targets
Build model
[Figure: Uncoating and Acidification panels for AllStars, MVP_si03 and MVP_si06; MVP and Tubulin panels for AllStars, si03 and si06]
Virus screen 1 out of 3
siRNA TGGGCCTGAGATGCAGGTAAA MVP NM_003010
HCDC Qiagen off-target SM MAP2K4 MAP2K4 6416|NM_003010|2890|AS
mRNA 2D variants
mRNA
siRNA
- 2D structure in relation to off-target effects
- We can model 2D structures quite robustly (Metaserver, Python)
- We can predict potential off-target effects
- We must find the relation between the two
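One simple way to relate a predicted 2D structure to a candidate target site is to ask how much of the site is unpaired. The sketch below assumes the structure is available in dot-bracket notation without pseudoknots, which is a common output format of secondary-structure predictors but is not stated on the slide; the function names and the example structure are hypothetical.

```python
def parse_dot_bracket(structure):
    """Return a list where entry i is the partner index of base i, or None."""
    partners = [None] * len(structure)
    stack = []
    for i, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unbalanced ')' at position {i}")
            j = stack.pop()
            partners[i] = j
            partners[j] = i
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {i}")
    if stack:
        raise ValueError(f"unbalanced '(' at position {stack[-1]}")
    return partners


def unpaired_fraction(structure, start, length):
    """Fraction of unpaired bases in structure[start:start + length]."""
    if length <= 0:
        raise ValueError("length must be positive")
    window = parse_dot_bracket(structure)[start:start + length]
    if len(window) != length:
        raise ValueError("window runs past the end of the structure")
    return sum(partner is None for partner in window) / length


# Toy hairpin: a 4 bp stem, a 4 nt loop and a 2 nt tail.
toy_structure = "((((....)))).."
print(unpaired_fraction(toy_structure, 4, 4))  # window inside the loop
print(unpaired_fraction(toy_structure, 0, 4))  # window inside the stem
```

Whether site accessibility of this kind actually correlates with observed off-target effects is the open question raised in the last bullet; the sketch only provides a feature that could be tested against screen data.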
mRNA
We want to identify structural motifs in a set of mRNA sequences
mRNA 3D Structure
Nature Movie
- Dicer, tRNA
- Off-target relations: RNA-RNA and RNP-RNA
- Model how RISC is related to microRNA/siRNA and how it finds its own target and binds to it
RNA 3D
Workflow
ModeRNA, Python, BioPython for parsing structural data from the PDB format
Database architecture: HCS and databases (LIMS)
Library and annotation
But need maintenance Screening experiment Sample, Results, Management, View
Done
Public database Phenotypic data
OpenBIS - Open Source database
Screen DB Open Source database HCDB - OpenBIS
Open Source database OpenBIS (ETH)
- Web client
- Command line client
- Java technology, Google GWT
Adam Srebniak
Image Processing and KNIME
Adam Srebniak
Open Source KNIME
WEB
DESKTOP
[Diagram: annotation databases from Dharmacon, Qiagen and Ambion; library files with Dharmacon, Qiagen and Ambion oligos, each with target gene; external gene databases (www)]
Cross-reference database for gene/oligo annotation + workflow for off-target prediction
OpenBIS
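The cross-reference idea can be illustrated with a single table that holds oligos from all vendors next to their annotated target gene. This is a sketch only, using Python's built-in `sqlite3` with an in-memory database; the table layout, function names, oligo IDs, sequences and gene names are all hypothetical and do not describe the actual schema behind the slide.

```python
import sqlite3


def build_cross_reference(rows):
    """Load (vendor, oligo_id, sequence, gene) rows into an in-memory table."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE oligo ("
        " vendor TEXT NOT NULL,"
        " oligo_id TEXT NOT NULL,"
        " sequence TEXT NOT NULL,"
        " gene TEXT,"  # NULL when the oligo has no gene annotation
        " PRIMARY KEY (vendor, oligo_id))"
    )
    con.executemany("INSERT INTO oligo VALUES (?, ?, ?, ?)", rows)
    return con


def oligos_for_gene(con, gene):
    """All (vendor, oligo_id) pairs annotated against one gene."""
    cur = con.execute(
        "SELECT vendor, oligo_id FROM oligo"
        " WHERE gene = ? ORDER BY vendor, oligo_id",
        (gene,),
    )
    return cur.fetchall()


def vendors_per_gene(con):
    """Number of distinct vendors that cover each annotated gene."""
    cur = con.execute(
        "SELECT gene, COUNT(DISTINCT vendor) FROM oligo"
        " WHERE gene IS NOT NULL GROUP BY gene ORDER BY gene"
    )
    return dict(cur.fetchall())


# Hypothetical rows, for illustration only.
rows = [
    ("Qiagen", "Q1", "AAAAAAAAAAAAAAAAAAAAA", "GENE_A"),
    ("Dharmacon", "D1", "CCCCCCCCCCCCCCCCCCC", "GENE_A"),
    ("Ambion", "A1", "GGGGGGGGGGGGGGGGGGG", "GENE_B"),
    ("Qiagen", "Q2", "TTTTTTTTTTTTTTTTTTTTT", None),
]
con = build_cross_reference(rows)
print(oligos_for_gene(con, "GENE_A"))
print(vendors_per_gene(con))
```

Keeping the vendor in the key lets the same gene be looked up across libraries, and a NULL gene marks oligos that currently have no annotation.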
Image Processing
One of the first Open Source tools: CellProfiler
High Content Image Processing (HCIP)
Teaching module
Slawek Mazur
HCDC-HITS
Bio-Formats developed by OME software (Jason Swedlow), UW-Madison LOCI and Glencoe Software.
Gabor Bakos
Visualization Improvement
Lukasz Zwolinski, ETH – PWR Student
Acknowledgement
Bioquant Heidelberg and ETH: Holger Erfle, Karol Kozak, Berend Rind, Juergen Reymann, Gabor Csucs, Adam Srebniak, Slawek Mazur, Sandra Kaestner
Welcome to join
- TU Konstanz: Dorit Merhof
- Trinity College IR: Anthony Davies
- MPI-IB, Berlin DE: Peter Braun, Andre Maeurer
- TU Breslau: Karol Kozak, Lukasz Miroslaw