Identification of Bacillus subtilis RNA genes using Tiling Arrays
Cyprien Guérin

To cite this version: Cyprien Guérin, BaSysBio. Identification of Bacillus subtilis RNA genes using Tiling Arrays. Bioinformatique des ARN, Feb 2012, Toulouse, France. hal-02804688

HAL Id: hal-02804688
https://hal.inrae.fr/hal-02804688
Submitted on 5 Jun 2020

Cyprien Guérin, BaSysBio Consortium

Summary

  • High-resolution transcriptome
  • Analysis of Tiling Array signals
  • Examples of new features discovered with tiling arrays (TA)
  • Promoter and terminator predictions
  • Perspectives

High-resolution transcriptome: Systematic exploration of the B. subtilis transcriptional landscape

New gene and feature discovery in Bacillus subtilis, exploring most of the bacterium's lifestyles:

  • 1 wild-type strain, perhaps better called a prototype strain.
  • 1 array design (BaSysBio tiling array, NimbleGen technology): strand-specific expression signal with a 22-bp step.
  • 269 hybridizations sampling a maximum variety of lifestyles: 104 different biological conditions, most with 2-3 biological replicates (experiments).
  • Growth on various media (rich/poor, solid/liquid, aerobic/anaerobic), sporulation, germination, competence, a variety of stresses (including ethanol, salt, temperature, oxidative), etc.

High-resolution transcriptome: Tiling array

About 380,000 probes tile the 4.2 Mbp Bacillus subtilis genome, with a 22-bp step.
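As a rough consistency check, the probe count can be recovered from the genome size and the tiling step. This is a back-of-the-envelope sketch which assumes one probe start every 22 bp on each of the two strands, with no excluded regions; the function name is illustrative and not part of the BaSysBio array design.

```python
def expected_probe_count(genome_bp: int, step_bp: int, strands: int = 2) -> int:
    """Approximate number of probes needed to tile a genome.

    Assumption (not stated in the slides): one probe starts every `step_bp`
    base pairs on each strand, with no gaps or excluded regions.
    """
    return strands * (genome_bp // step_bp)


if __name__ == "__main__":
    # 4.2 Mbp genome, 22-bp step, two strands.
    print(expected_probe_count(genome_bp=4_200_000, step_bp=22))
```

Under these assumptions the estimate falls within about 1% of the 380,000 probes quoted for the array.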
Probes are long (45-65 nt), with lengths adjusted to achieve relatively homogeneous affinity.

Analysis of Tiling Array signals: Principles

Automatic detection of transcription units with an HMM model [1], taking into account:

  • normalization (with genomic DNA hybridizations):
    1. probes are not isothermal,
    2. response is not linear,
    3. outliers are discarded;
  • continuous variation of the signal.

[1] Nicolas P., et al. (2009). Bioinformatics.

Analysis of Tiling Array signals: Normalisation using chromosomal DNA

[Figure: three signal tracks: log(genomic DNA), from x4 pooled data; log(mRNA); log(mRNA) − log(genomic DNA).]

Probe affinity is variable, despite the adjustment of probe lengths.

Analysis of Tiling Array signals: Shift and drift

[Figure: signal level (axis from 6 to 16) against position on chromosome (1,100,000 to 1,110,000 bp); labels: CDSs, moves.]

Examples of new features discovered with TA: RNA genes

[Figure: genome browser view, positions 1,228,001 to 1,238,000; genes mecA, yjbE, yjbF, yjbG, yjbH, yjbI, yjbJ, yjbK, yjbL, yjbM, yjbN, yjbO.]

Examples of new features discovered with TA: Coding sequences

[Figure: genome browser view, positions 1,070,001 to 1,080,000; genes yhaA, yhaG, yhaH, yhaI, yhaJ, yhaK, ecsA, ecsB, ecsC, prsA, hpr, serC, hit; tracks: sequence features annotation, transcriptome forward strand (log2 ratio), transcriptome backward strand (log2 ratio).]

Examples of new features discovered with TA: Antisense related to stress

[Figure: genome browser view, positions 3,567,001 to 3,577,000; genes yvcN, crh, yvcL, yvcK, yvcJ, yvcI, trxB, yvcE, yvcD.]

Examples of new features discovered with TA: A few numbers

In B. subtilis annotation v3:

  • 4,256 CDSs,
  • 5 RNA genes,
  • 30 rRNAs,
  • 86 tRNAs,
  • 57 (-1) 5' cis-acting regions.

New features discovered with TA:

  • 44 new CDSs,
  • 136 new RNA genes,
  • 423 antisense signals (including 4 CDSs and 87 RNA genes),
  • 92 5' cis-acting regions (confirmed for 56),
  • 676 long 5'UTR regions and 125 long 3'UTR regions.
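The normalisation using chromosomal DNA described earlier amounts, in its simplest form, to subtracting a per-probe log(genomic DNA) signal from the log(mRNA) signal. The sketch below shows only that subtraction. It assumes base-2 logarithms and a genomic DNA reference taken as the per-probe mean over the pooled hybridizations; it does not reproduce the non-linear response or the outlier handling of the HMM approach cited above, and all names and intensities are made up for illustration.

```python
import math
from statistics import mean


def normalise_probes(mrna, gdna_replicates):
    """Return log2(mRNA) - mean log2(gDNA) for each probe.

    mrna: raw mRNA intensities, one per probe.
    gdna_replicates: list of genomic DNA hybridizations, each a list of raw
        intensities in the same probe order. Pooling by the mean of the
        log values is an assumption of this sketch.
    """
    n_probes = len(mrna)
    if any(len(rep) != n_probes for rep in gdna_replicates):
        raise ValueError("all hybridizations must cover the same probes")
    normalised = []
    for i in range(n_probes):
        log_gdna = mean(math.log2(rep[i]) for rep in gdna_replicates)
        normalised.append(math.log2(mrna[i]) - log_gdna)
    return normalised


if __name__ == "__main__":
    # Invented intensities: three probes, four genomic DNA hybridizations.
    mrna = [1024.0, 256.0, 4096.0]
    gdna = [[512.0, 512.0, 2048.0] for _ in range(4)]
    print(normalise_probes(mrna, gdna))
```

A probe with high intrinsic affinity gives a high signal in both the mRNA and genomic DNA hybridizations, so the difference of logs removes most of that probe-specific effect.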
Examples of new features discovered with TA: Combining gene expression with ChIP/chip (CcpN)

[Figure: genome browser view, positions 2,962,001 to 2,972,000; genes ytbD, ytbE, dnaI, dnaB, ytcG, speD, gapB, ytcD, ytaG, ytaF, mutM; tracks: sequence features annotation; transcriptome, glucose to malate (forward and backward strands); transcriptome, malate to glucose (forward and backward strands); CcpN DNA binding (ChIP/chip).]

Examples of new features discovered with TA: Combining RNA gene expression with ChIP/chip (CcpN)

[Figure: genome browser view, positions 1,528,001 to 1,538,000; features SR1, pdhA, pdhB, pdhC, pdhD, yktA, yktB, yktC, ykzC, ykzI, slp, speA.]

Promoter and terminator predictions: From upshifts to promoters

TSS positions estimated using TA, compared to RNA-Seq data [1].

[Figure: histogram of the distance between upshifts and TSSs (x-axis from −100 to 100; y-axis: frequency, 0 to 150).]

[1] Irnov I., et al. (2010). Nucleic Acids Res.

Summarizing correlations between promoter activities.

[Figure: cluster dendrogram (height from 0.0 to 0.5).]

A 'promoter tree' is built by hierarchical clustering using average linkage on the dissimilarity matrix $d_{i,j} = (1 - r_{i,j})/2 \in [0, 1]$, where $r_{i,j}$ is the correlation between the activities of promoters $i$ and $j$.

[Figure: promoter sequence model; labels: TSS, −35 box, spacer, −10 box, background, PWM1, PWM2, l1, l2, S, D.]

Promoter prediction: an unsupervised algorithm for modeling bipartite degenerate motifs [1], clustering the sequences from the 3,242 transcription upshifts.

[1] Nicolas P., et al. (2012). Science.
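The 'promoter tree' construction described above can be sketched in a few lines: compute the correlation between each pair of promoter activity profiles, convert it to the dissimilarity (1 − r)/2, and merge clusters by average linkage. The code below is a minimal illustration under stated assumptions, not the implementation used for the slides: it uses Pearson correlation (the slides say only "correlation"), assumes non-constant profiles, and the promoter names and activity values are invented.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def dissimilarity(x, y):
    """d = (1 - r) / 2: 0 for r = 1, 0.5 for r = 0, 1 for r = -1."""
    return (1.0 - pearson(x, y)) / 2.0


def average_linkage(profiles):
    """Agglomerative clustering with average linkage.

    profiles: dict mapping promoter name -> list of activities.
    Returns the merges as (cluster_a, cluster_b, height) tuples, where
    clusters are tuples of promoter names.
    """
    names = list(profiles)
    d = {frozenset((a, b)): dissimilarity(profiles[a], profiles[b])
         for i, a in enumerate(names) for b in names[i + 1:]}
    clusters = [(name,) for name in names]
    merges = []

    def linkage(c1, c2):
        # Mean of all pairwise dissimilarities between the two clusters.
        total = sum(d[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        pairs = [(linkage(c1, c2), c1, c2)
                 for i, c1 in enumerate(clusters) for c2 in clusters[i + 1:]]
        height, c1, c2 = min(pairs, key=lambda p: p[0])
        merges.append((c1, c2, height))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges


if __name__ == "__main__":
    # Invented activities of three promoters across four conditions.
    example = {"P1": [1, 2, 3, 4], "P2": [2, 4, 6, 8], "P3": [4, 3, 2, 1]}
    for left, right, height in average_linkage(example):
        print(left, right, round(height, 3))
```

In this toy input, P1 and P2 are perfectly correlated and merge first at a height near 0, while the anti-correlated P3 joins last at a height near 1. The merge heights are what the dendrogram's vertical axis reports.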
Promoter and terminator predictions: From upshifts to promoters

[Figure: behavior of the MCMC algorithm, with K = 20 motifs.]

Comparison with known sigma factor binding sites in DBTBS (a database of transcriptional regulation in Bacillus subtilis).

[Table: DBTBS sigma factor binding sites (rows: SigA, SigB, SigD, SigE, SigF, SigG, SigH, SigI, SigK, SigL, SigM, SigW, SigX, SigY) against predicted motifs (columns, in order: M19, M14, M4, M3, M7, M5, M16, M8, M11, M13, M17, M9, M1, M15, M10, -, M2, M18, M20, M6, M12, -). Unlabelled first row: 401 369 349 213 218 170 170 134 127 113 80 43 63 72 48 44 16 11 12 4 5. SigA row: 59 90 49 1 33 1 22 0 1 0 19 0 1 0 1 1 0 0 0 7 0.]

[Figure: sequence logos representing the motifs.]

Predicted promoters:

  • 758 promoters in DBTBS,
  • 2,935 promoters predicted using the algorithm above,
  • 580 promoters in common,
  • 2,355 new promoters discovered.

46% of genes have multiple promoters.

Promoter and terminator predictions: Terminators and downshifts

Terminator predictions:

  • 3,510 putative sites from a genome scan with the Petrin software [1],
  • identification of 2,126 high-confidence downshift sites,
  • 1,501 putative terminators confirmed by downshifts (about 70% of downshifts).

Three types of termination: sharp, partial, and missed.

[1] d'Aubenton-Carafa Y., et al. (1990). J. Mol. Biol.
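The confirmation of putative terminators by downshifts can be pictured as a positional match between two lists of genome coordinates. The sketch below is one hypothetical way to do this: the slides do not state the matching rule actually used, so the tolerance window is an arbitrary assumption, strand is ignored for brevity, and the coordinates are invented. The final lines only recompute the proportions quoted above.

```python
from bisect import bisect_left


def confirmed_downshifts(downshifts, terminators, window=50):
    """Return the downshift positions that have a predicted terminator
    within `window` bp on either side.

    The window size is an illustrative assumption, and a real analysis
    would match sites strand by strand.
    """
    terminators = sorted(terminators)
    confirmed = []
    for pos in downshifts:
        # Index of the first terminator at or after (pos - window).
        i = bisect_left(terminators, pos - window)
        if i < len(terminators) and terminators[i] <= pos + window:
            confirmed.append(pos)
    return confirmed


if __name__ == "__main__":
    # Invented coordinates, for illustration only.
    downshifts = [1000, 5000, 9000]
    terminators = [980, 5200, 9049]
    print(confirmed_downshifts(downshifts, terminators))

    # Arithmetic behind the counts quoted in the slides.
    print(round(100 * 1501 / 2126, 1))  # share of downshifts with a terminator
    print(2935 - 580)                   # predicted promoters absent from DBTBS
```

Sorting the terminator positions once and using a binary search keeps the matching fast even with thousands of sites on each side.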
Promoter and terminator predictions: A few examples

[Figure, panels a and b: genome browser views of positions 1,070,001 to 1,078,000 and 1,230,001 to 1,234,000; genes yhaH, yhaI, yhaJ, yhaL, yhzE, yhzF, coiA, pepF, serC, ecsA, prsA, hinT, scoC, trpP, yizD; labelled features U792.E, U793.A1, U794.K, U795.M15, U796.A5, U797.H, U798.A5, U799.A4, U801.A4, U802.A3, U803.E, U804.A3, U805.A3, U806.A1, U807.G, U930.B, U931.H, U932.G, U933.A1, U934.E, U935.A1; D535, D536, D537, D539, D540, D541, D542, D543, D544, D545, D627, D628, D629; S347, S348, S349, S351, S352, S353, S354, S355, S356, S357, S414, S415.]

Promoter and terminator predictions: A few more examples

[Figure, panels a, b and c: genome browser views of positions 2,839,001 to 2,843,000, 1,297,001 to 1,301,000 and 694,001 to 698,000; genes yrzT, yrzF, yrzH, yrbD, yrbE, ndh, yebC, yebD, yjlA, yjlB, uxaC, pbuG, rex; label: Purine; labelled features U493.A5, U494.W, U1005.B, U1006.A5, U1008.H, U2147.A1, U2148.E, U2149.A7, U2150.A4, U2151.E, U2152.M21; D313, D314, D316, D317, D684, D1421, D1422, D1423; S228, S230, S449, S450, S451, S1051, S1052, S1053.]

Perspectives

  • Huge set of expression data (104 conditions) on the gene repertoire of B. subtilis: functional annotation (CDSs, RNA genes, etc.).
  • Antisense and transcription accuracy in bacteria: biological function, bias with alternative promoters, a majority of signals with missed termination, and promoters for antisense less conserved than promoters for CDSs.

Thank you
