Supplemental Materials Table of Content Supplemental Methods
Total Page:16
File Type:pdf, Size:1020Kb
1 Supplemental Materials 2 Table of content 3 Supplemental Methods ..................................................................................................................................................... 2 4 Protein production ............................................................................................................................................. 2 5 Protein-RNA complex crystallization ......................................................................................................... 2 6 Structure determination and refinement ................................................................................................. 3 7 Selection library generation ........................................................................................................................... 3 8 Motif generation .................................................................................................................................................. 4 9 GO analysis and in vivo enrichment of the motifs ................................................................................. 7 10 Calculation of mutual information .............................................................................................................. 9 11 Supplemental Figures .................................................................................................................................................... 11 12 Supplemental Table ........................................................................................................................................................ 32 13 Supplemental Data .......................................................................................................................................................... 32 14 REFERENCES ..................................................................................................................................................................... 33 15 16 17 1 18 Supplemental Methods 19 20 Protein production 21 22 All constructs described in Supplemental Table S1 were expressed in Rosetta P3 DE LysS 23 E.coli using the autoinduction protocol described in (Jolma et al. 2015). Proteins were then purified 24 using HIS-tag based IMAC purification. Protein production was assessed in parallel by 96-well SDS- 25 PAGE (ePage, Invitrogen; see Supplemental Fig. S16). The success rate of protein production was 26 dependent on the size of the proteins, with most small RBDs expressing well in E.coli. Significantly 27 lower yield of protein was observed for full-length proteins larger than 50 kDa. All proteins were 28 subjected to HTR-SELEX, regardless of protein level expressed. For interim storage, glycerol was 29 added to a final concentration of 10%. Samples were split to single-use aliquots with approximately 30 200 ng RBP in a 5µl volume and frozen at -80°C. Expression and purification of the RNA-binding 31 domain fragment of human ZC3H12B including PIN (residues 179-354) and Zn-finger (residues 355- 32 397) domains was performed as described in (Savitsky et al. 2010). 33 34 Protein-RNA complex crystallization 35 36 The fragment of RNA composed from 21 ribonucleotides used in crystallization trials was 37 obtained from IDT. The RNA (sequence: 5’-A*AUGCGACAGUCGGUAGCAUC-3’) was protected from 38 non-specific RNases by phosphorothioation of the 5’ end (bond containing sulphur indicated by *). The 39 purified and concentrated protein was first mixed with a solution of RNA at a molar ratio 1:1.2 and after 40 one hour on ice it was subjected to the crystallization trials with several crystallization screens from 41 different vendors. The first crystals of 0.04 mm size were obtained in Nuc-Pro HTS screen from Jena 42 Bioscience and they diffracted to 6Å only. The conditions for bigger crystals were optimized in house. 43 Crystals of 0.1 mm size were grown in sitting drops from solution containing 50 mM Sodium cacodylate 44 buffer (pH 6.5), 80 mM MgCl2 and 5% MPD (2-Methyl-2,4-pentanediol). The lifetime of the crystals was 2 45 short (not more than 48 hours after the crystallization was set), suggesting that the RNA species was slowly 46 hydrolyzed, possibly by the ZC3H12B RNase itself. The single data set was collected at the beamline P13 47 at PETRA- III (EMBL, Hamburg, Germany) at 100 K. Data were processed and analyzed with the 48 autoPROC toolbox (Vonrhein et al. 2011) including the STARANISO routine due to the high anisotropy 49 (Tickle 2017). The datasets were indexed and integrated by XDS and scaled together with XSCALE. 50 Statistics of data collection are presented in Supplemental Table S5. 51 52 Structure determination and refinement 53 54 The structure was solved by molecular replacement using the program Phaser (McCoy et al. 55 2007) as implemented in CCP4 (Winn et al. 2011) with the structure of ZC3H12A PIN-domain 56 (pdb:3V33) as a search model. The density of the part of RNA was clear near the active site and the 57 molecule was built manually using COOT (Emsley et al. 2010). The rigid body refinement with 58 REFMAC5 was followed by restrain refinement with REFMAC5, as implemented in CCP4 (Winn et 59 al. 2011) and Phenix.refine (Afonine et al. 2012). The manual rebuilding of the model was performed 60 using COOT. The refinement statistics are presented in Supplemental Table S5. The first five amino 61 acids from N-termini and the Zn-finger part from C-termini were found disordered and were not built 62 in the maps. The four last nucleotides of the 3’- end were not visible in the maps as well, thus, were not 63 built to the map also. The protein model was validated using COOT and MOLPROBITY (Chen et al. 64 2010). The structural figures were prepared using PyMOL(TM) Molecular Graphics System (Version 65 1.8.6.0, Schrödinger, LLC). 66 67 Selection library generation 68 69 To produce a library of RNA sequences for selection (selection ligands), we first constructed 70 dsDNA templates by combining three oligonucleotides together in a three cycle PCR reaction (Phusion, 71 NEB). For information about the barcoded ligand design, see Supplemental Table S1. The ligand 3 72 design was similar to that used in our previous work analyzing TF binding specificities in dsDNA 73 (Jolma et al. 2013) except for the addition of a T7 RNA polymerase promoter in the constant flanking 74 regions of the ligand. RNA was expressed from the DNA-templates using T7 in vitro transcription 75 (Ampliscribe T7 High Yield Transcription Kit, Epicentre or Megascript-kit Ambion) according to 76 manufacturer’s instructions, after which the DNA-template was digested using RNAse-free DNAse I 77 (Epicentre) or the TURBO-DNAse supplied with the Megascript-kit. All RNA-production steps 78 included RiboGuard RNAse-inhibitor (Epicentre). 79 Two different approaches were used to facilitate the folding of RNA molecules. In the protocol 80 used in experiments where the batch identifier starts with letters “EM”, RNA-ligands were heated to 81 +70°C followed by gradual, slow cooling to allow the RNA to fold into minimal energy structures, 82 whereas in batches “AAG” and “AAH” RNA transcription was not followed by such folding protocol. 83 The rationale was that spontaneous co-transcriptional RNA-folding may better reflect folded RNA 84 structures in the in vivo context. In almost all of the cases where the same RBPs were tested with both 85 of the protocols the results were highly similar. 86 87 Motif generation 88 89 The motifs were generated based on Autoseed; Autoseed identifies gapped and ungapped kmers 90 that represent local maximal counts relative to similar sequences within their Huddinge neighborhood 91 (Nitta et al. 2015). It then generates a draft motif using each such kmer as a seed. This procedure makes 92 each generated PWM motif distinctly different from any other motif derived from the same data, by 93 ensuring that the count for each seed is higher than that of any subsequence that is shifted by one base, 94 or within a Hamming distance of one from the seed (see Supplemental Fig. S19B, and Supplementary 95 Figure 1 of Nitta et al., 2015). It is important to note that the resulting motifs are not generated from a 96 single set of aligned sequences, and that therefore the count for the base representing the consensus 97 sequence is constant, whereas the total counts in each column vary (Jolma et al. 2013). 4 98 This initial set of motifs is then refined manually to identify the final seeds (Supplemental 99 Table S2). The manual curation process was necessary to remove artefacts due to selection bottlenecks 100 (low complexity libraries), partial motifs that included constant linker sequences (displayed a strong 101 positional bias on the ligand), and motifs that were recovered from a large number of experiments; the 102 motifs recovered from many experiments were removed because they represent common “aptamer” 103 motifs that are enriched by the HTR-SELEX process itself, for example due to residual presence of 104 E.coli derived RNA-binding proteins, or binding of folded “aptamer” RNAs to the TRX fusion partner, 105 selection beads or plasticware (Supplemental Fig. S19C). To assess initial data, we compared the 106 deduced motifs to known motifs, to replicate experiments (same experiment run again or separate 107 experiment using full-length and RBD clones) and experiments performed with paralogous proteins. 108 We also note that each cycle of HTR-SELEX independently enriches motifs over the input cycle, 109 providing further evidence of reproducibility.