Mouse Lsm10 Knockout Project (CRISPR/Cas9)

https://www.alphaknockout.com

Objective:
To create an Lsm10 knockout mouse model (C57BL/6J) by CRISPR/Cas-mediated genome engineering.

Strategy summary:
The Lsm10 gene (NCBI Reference Sequence: NM_138721; Ensembl: ENSMUSG00000050188) is located on mouse chromosome 4.
Two exons are identified, with both the ATG start codon and the TGA stop codon in exon 2 (Transcript: ENSMUST00000055575).
Exon 2 will be selected as the target site.
Cas9 and gRNA will be co-injected into fertilized eggs for KO mouse production.
The pups will be genotyped by PCR followed by sequencing analysis.

Note:
Exon 2 starts from about 0.27% of the coding region.
Exon 2 covers 100.0% of the coding region.
The size of the effective KO region: ~366 bp.
The KO region does not contain any other known gene.

Overview of the Targeting Strategy
[Figure: wildtype allele drawn 5' to 3' with exons 1 and 2 and two gRNA regions marked. Legend: exon of mouse Lsm10; knockout region.]

Overview of the Dot Plot (up)
[Figure: dot plot, window size 15 bp; forward and reverse-complement matches.]
Note: The 2000 bp section upstream of the start codon is aligned with itself to determine if there are tandem repeats. Tandem repeats are found in the dot plot matrix. The gRNA site is selected outside of these tandem repeats.

Overview of the Dot Plot (down)
[Figure: dot plot, window size 15 bp; forward and reverse-complement matches.]
Note: The 2000 bp section downstream of the stop codon is aligned with itself to determine if there are tandem repeats. Tandem repeats are found in the dot plot matrix. The gRNA site is selected outside of these tandem repeats.

Overview of the GC Content Distribution (up)
[Figure: GC content along the sequence, window size 300 bp.]
Summary: Full Length (2000 bp) | A (22.85%, 457) | C (23.3%, 466) | T (28.5%, 570) | G (25.35%, 507)
Note: The 2000 bp section upstream of the start codon is analyzed to determine the GC content.
Significant high GC-content regions are found. The gRNA site is selected outside of these high GC-content regions.

Overview of the GC Content Distribution (down)
[Figure: GC content along the sequence, window size 300 bp.]
Summary: Full Length (2000 bp) | A (22.4%, 448) | C (24.95%, 499) | T (29.15%, 583) | G (23.5%, 470)
Note: The 2000 bp section downstream of the stop codon is analyzed to determine the GC content. No significant high GC-content region is found, so this region is suitable for PCR screening or sequencing analysis.

BLAT Search Results (up)

QUERY    SCORE START   END QSIZE IDENTITY CHROM STRAND     START       END SPAN
------------------------------------------------------------------------------
YourSeq   2000     1  2000  2000   100.0% chr4    +    126095854 126097853 2000
YourSeq    230     9   375  2000    94.9% chr16   +     32649779  32650307  529
YourSeq    228    64   376  2000    95.6% chr4    +    146532686 146533004  319
YourSeq    223    85   561  2000    94.5% chr19   -      7000711   7001217  507
YourSeq    213    85   376  2000    96.6% chr18   -     34662413  34662788  376
YourSeq    203   110   380  2000    95.8% chr13   -     12374137  12374403  267
YourSeq    201   107   375  2000    97.2% chr3    -     88279068  88279340  273
YourSeq    200   185   618  2000    89.3% chr9    +     27696443  27696672  230
YourSeq    200   107   377  2000    94.3% chr11   +     80149139  80149399  261
YourSeq    200   107   375  2000    98.1% chr11   +     74690778  74691441  664
YourSeq    196   107   376  2000    97.2% chr6    -     47968788  47969198  411
YourSeq    196   180   392  2000    94.3% chr4    -    135256247 135256454  208
YourSeq    196    85   366  2000    95.8% chr11   -    104323303 104323809  507
YourSeq    195   183   773  2000    88.8% chr11   +     58255647  58255855  209
YourSeq    194   183   395  2000    97.2% chr5    -    108486114 108486661  548
YourSeq    193    69   374  2000    95.3% chr3    +     95149981  95150287  307
YourSeq    193   172   383  2000    96.7% chr11   +     87452510  87452722  213
YourSeq    192   117   379  2000    93.3% chr9    -     53607526  53608056  531
YourSeq    192   179   392  2000    93.7% chr9    -     12884730  12884936  207
YourSeq    191   183   379  2000    98.5% chr17   -      6333681   6333877  197

Note: The 2000 bp section upstream of the start codon is BLAT searched against the genome. No significant similarity is found.

BLAT Search Results (down)

QUERY    SCORE START   END QSIZE IDENTITY CHROM STRAND     START       END SPAN
------------------------------------------------------------------------------
YourSeq   2000     1  2000  2000   100.0% chr4    +    126098220 126100219 2000
YourSeq     71  1359  1556  2000    79.3% chr3    -    113776282 113776408  127
YourSeq     67  1353  1470  2000    90.4% chrX    +    141621247 141621789  543
YourSeq     66  1247  1390  2000    86.9% chr17   -     23389485  23389634  150
YourSeq     63  1247  1469  2000    95.6% chr2    +    100845360 100845858  499
YourSeq     62  1278  1463  2000    93.0% chr9    +     20379101  20379317  217
YourSeq     59   758   821  2000    96.9% chr5    -     86417432  86417496   65
YourSeq     59  1416  1516  2000    95.4% chr6    +      6913526   6913775  250
YourSeq     59  1226  1470  2000    75.4% chr13   +     62373341  62373494  154
YourSeq     58  1353  1469  2000    95.3% chr14   -    110243389 110243677  289
YourSeq     56  1410  1469  2000    94.9% chr8    -    122597831 122597889   59
YourSeq     56  1402  1468  2000    98.4% chr14   +     38165435  38165510   76
YourSeq     56  1402  1469  2000    90.2% chr1    +     10728934  10728998   65
YourSeq     55  1405  1469  2000    93.7% chr6    -    106134454 106134519   66
YourSeq     55  1278  1575  2000    66.3% chr7    +    103823703 103823825  123
YourSeq     54  1413  1469  2000    98.3% chr15   +     70581741  70581799   59
YourSeq     53  1414  1468  2000   100.0% chr11   -     42024697  42024755   59
YourSeq     53  1410  1469  2000    96.5% chr5    +     23313204  23313279   76
YourSeq     52  1411  1469  2000    96.5% chr3    -    127264105 127264165   61
YourSeq     52  1408  1468  2000    96.5% chr10   +     52634416  52634483   68

Note: The 2000 bp section downstream of the stop codon is BLAT searched against the genome. No significant similarity is found.

Gene and protein information:
Lsm10 U7 snRNP-specific Sm-like protein LSM10 [ Mus musculus (house mouse) ]
Gene ID: 116748, updated on 10-Oct-2019

Gene summary
Official Symbol: Lsm10 (provided by MGI)
Official Full Name: U7 snRNP-specific Sm-like protein LSM10 (provided by MGI)
Primary source: MGI:MGI:2151045
See related: Ensembl:ENSMUSG00000050188
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Expression: Ubiquitous expression in subcutaneous fat pad adult (RPKM 18.4), mammary gland adult (RPKM 16.0) and 28 other tissues
Orthologs: human, all

Genomic context
Location: 4; 4 D2.2
Exon count: 3

Annotation release  Status             Assembly                      Chr  Location
108                 current            GRCm38.p6 (GCF_000001635.26)  4    NC_000070.6 (126096562..126098584)
Build 37.2          previous assembly  MGSCv37 (GCF_000001635.18)    4    NC_000070.5 (125773897..125775828)

[Figure: genomic context on chromosome 4, NC_000070.6.]

Transcript information:
This gene has 3 transcripts.

Gene: Lsm10 ENSMUSG00000050188
Description: U7 snRNP-specific Sm-like protein LSM10 [Source:MGI Symbol;Acc:MGI:2151045]
Location: Chromosome 4: 126,096,623-126,098,584 forward strand. GRCm38:CM000997.2
About this gene: This gene has 3 transcripts (splice variants), 190 orthologues and is a member of 1 Ensembl protein family.
Transcripts

Name       Transcript ID         bp   Protein  Translation ID        Biotype         CCDS       UniProt        Flags
Lsm10-201  ENSMUST00000055575.7  943  122aa    ENSMUSP00000061913.7  Protein coding  CCDS18642  Q3UPL7 Q8QZX5  TSL:1, GENCODE basic, APPRIS P1
Lsm10-203  ENSMUST00000179323.1  808  122aa    ENSMUSP00000136585.1  Protein coding  CCDS18642  Q3UPL7 Q8QZX5  TSL:2, GENCODE basic, APPRIS P1
Lsm10-202  ENSMUST00000151831.1  352  93aa     ENSMUSP00000119610.1  Protein coding  -          A8Y5G9         CDS 3' incomplete, TSL:2

[Figure: 21.96 kb region of chromosome 4 (126.090 Mb to 126.105 Mb), forward and reverse strands. Genes (comprehensive set): Lsm10-201, Lsm10-202 and Lsm10-203 (protein coding); Oscp1-201 and Oscp1-203 (protein coding), Oscp1-202 (retained intron); Stk40-201, Stk40-202 and Stk40-206 (protein coding), Stk40-203 (lncRNA). Contigs: AL627101.25, AL731780.6. Regulatory Build track. Regulation legend: CTCF; open chromatin; promoter; promoter flank. Gene legend: protein coding (merged Ensembl/Havana; Ensembl protein coding); non-protein coding (processed transcript; RNA gene).]

Transcript: ENSMUST00000055575
[Figure: Lsm10-201 (protein coding), 1.96 kb, forward strand, with protein ENSMUSP00000061... annotated along a 0 to 122 residue scale bar. Tracks: low complexity (Seg); Superfamily: LSM domain superfamily; SMART: LSM domain, eukaryotic/archaea-type; Pfam: LSM domain, eukaryotic/archaea-type; PANTHER: PTHR21196; Gene3D: 2.30.30.100; CDD: cd01733; sequence variants (dbSNP and all other sources). Variant legend: synonymous variant.]

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
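
The figures quoted in this report can be cross-checked against each other. The Python sketch below is illustrative only and is not part of the vendor's pipeline. It assumes that the BLAT coordinates are 1-based and inclusive, that the upstream 2000 bp window ends immediately before the ATG, and that the downstream window begins immediately after the stop codon. The names UP_END, DOWN_START, span_between and gc_percent are hypothetical. Under those assumptions the gap between the two flanking windows is 366 bp, which agrees with the stated effective KO region size, and the base counts in each GC summary add up to the full 2000 bp.

```python
# Values copied from the BLAT self-hits and GC summaries in this report.
# How the coordinates are interpreted (1-based, inclusive, windows directly
# flanking the coding region) is an assumption, not something the report states.
UP_END = 126_097_853      # last base of the upstream 2000 bp window (chr4)
DOWN_START = 126_098_220  # first base of the downstream 2000 bp window (chr4)

BASE_COUNTS = {
    "upstream": {"A": 457, "C": 466, "T": 570, "G": 507},
    "downstream": {"A": 448, "C": 499, "T": 583, "G": 470},
}


def span_between(up_end: int, down_start: int) -> int:
    """Number of bases lying strictly between two inclusive coordinates."""
    return down_start - up_end - 1


def gc_percent(counts: dict) -> float:
    """Overall GC content, in percent, from per-base counts."""
    total = sum(counts.values())
    return 100.0 * (counts["G"] + counts["C"]) / total


if __name__ == "__main__":
    print("bases between flanking windows:", span_between(UP_END, DOWN_START))
    for name, counts in BASE_COUNTS.items():
        print(name, "total bases:", sum(counts.values()),
              "GC%:", round(gc_percent(counts), 2))
```

The overall GC values this produces are averages over each 2000 bp window, so they say nothing about local peaks such as the high GC-content regions noted for the upstream section.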

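
The windowed analyses in this report (300 bp windows for GC content, 15 bp windows for the dot plots) can be sketched in a few lines of Python. The report does not say which software or matching rule was used, so this is a simplified illustration: gc_windows and self_dot_plot are hypothetical helper names, the dot plot uses exact matching of forward-strand windows only (the report's plots also show reverse-complement matches), and no threshold for a "significant" GC region is assumed.

```python
def gc_windows(seq: str, window: int = 300, step: int = 1) -> list:
    """GC fraction of each `window`-long slice of seq, advancing by `step`."""
    seq = seq.upper()
    fractions = []
    for i in range(0, len(seq) - window + 1, step):
        chunk = seq[i:i + window]
        fractions.append((chunk.count("G") + chunk.count("C")) / window)
    return fractions


def self_dot_plot(seq: str, window: int = 15) -> list:
    """Off-diagonal dots (i, j), i < j, where the windows at i and j are identical.

    Runs of dots along a diagonal at offset j - i suggest a repeat with that period.
    """
    seq = seq.upper()
    positions = {}
    for i in range(len(seq) - window + 1):
        positions.setdefault(seq[i:i + window], []).append(i)
    dots = []
    for starts in positions.values():
        for a in range(len(starts)):
            for b in range(a + 1, len(starts)):
                dots.append((starts[a], starts[b]))
    return sorted(dots)


if __name__ == "__main__":
    # Toy inputs only; the real 2000 bp flanking sequences are not in this report.
    print(gc_windows("GGCCAATT", window=4, step=2))
    print(self_dot_plot("ACGTACGT", window=4))
```

On a real flanking sequence, the same functions would be called with the default window sizes, and candidate gRNA sites would then be compared against the positions of any repeats or GC-rich windows they report.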