https://www.alphaknockout.com

Mouse Lsm10 Knockout Project (CRISPR/Cas9)

Objective: To create an Lsm10 knockout mouse model (C57BL/6J) by CRISPR/Cas-mediated genome engineering.

Strategy summary: The Lsm10 gene (NCBI Reference Sequence: NM_138721; Ensembl: ENSMUSG00000050188) is located on mouse chromosome 4. Two exons are identified, with both the ATG start codon and the TGA stop codon in exon 2 (Transcript: ENSMUST00000055575). Exon 2 will be selected as the target site. Cas9 and gRNA will be co-injected into fertilized eggs for KO mouse production. The pups will be genotyped by PCR followed by sequencing analysis.

Note: Exon 2 starts from about 0.27% of the coding region. Exon 2 covers 100.0% of the coding region. The size of the effective KO region: ~366 bp. The KO region does not contain any other known gene.
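The figures in the note appear mutually consistent if the coding region is counted without the stop codon. The short check below is a sketch under that assumption (it is not stated in the report); it takes the 122 aa protein length from the Lsm10-201 transcript listed later in this document.

```python
# Assumption (not stated in the report): the coding region length excludes
# the stop codon, i.e. 3 nt for each of the 122 amino acids of Lsm10-201.
PROTEIN_LENGTH_AA = 122
coding_length_bp = PROTEIN_LENGTH_AA * 3

# Exon 2 contains the ATG start codon, so it begins at coding position 1.
exon2_start_pct = round(1 / coding_length_bp * 100, 2)

print(coding_length_bp)   # 366, matching the ~366 bp effective KO region
print(exon2_start_pct)    # 0.27, matching "about 0.27% of the coding region"
```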


Overview of the Targeting Strategy

[Figure: wildtype allele, 5' to 3', showing exons 1 and 2 with two gRNA regions marked. Legend: exon of mouse Lsm10; knockout region.]


Overview of the Dot Plot (up)

Window size: 15 bp

[Figure: dot plot of the upstream sequence against itself, showing forward and reverse-complement matches.]

Note: The 2000 bp section upstream of start codon is aligned with itself to determine if there are tandem repeats. Tandem repeats are found in the dot plot matrix. The gRNA site is selected outside of these tandem repeats.
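A self-alignment dot plot of this kind can be sketched as follows. This is a minimal illustration only: it assumes exact matching of every window (the report does not say whether its tool tolerates mismatches), and the function names are hypothetical. Off-diagonal forward dots lying on lines parallel to the main diagonal indicate tandem or direct repeats.

```python
def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq))


def dot_plot(seq_x, seq_y, window=15):
    """Return (i, j) pairs where seq_x[i:i+window] equals seq_y[j:j+window]."""
    # Index every window of seq_y so each window of seq_x is a single lookup.
    index = {}
    for j in range(len(seq_y) - window + 1):
        index.setdefault(seq_y[j:j + window], []).append(j)
    dots = []
    for i in range(len(seq_x) - window + 1):
        for j in index.get(seq_x[i:i + window], []):
            dots.append((i, j))
    return dots


def self_dot_plot(seq, window=15):
    """Forward (off-diagonal) and reverse-complement dots for one sequence."""
    forward = [(i, j) for i, j in dot_plot(seq, seq, window) if i != j]
    reverse = dot_plot(seq, reverse_complement(seq), window)
    return forward, reverse


# Toy example: a 7 bp unit repeated three times, scanned with a 7 bp window.
forward, reverse = self_dot_plot("GATTACA" * 3, window=7)
print((0, 7) in forward)  # True: the unit at position 0 recurs at position 7
```

For the report's setting, the call would be `self_dot_plot(upstream_2000bp, window=15)`, where `upstream_2000bp` is a placeholder for the actual 2000 bp sequence.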

Overview of the Dot Plot (down)

Window size: 15 bp

[Figure: dot plot of the downstream sequence against itself, showing forward and reverse-complement matches.]

Note: The 2000 bp section downstream of stop codon is aligned with itself to determine if there are tandem repeats. Tandem repeats are found in the dot plot matrix. The gRNA site is selected outside of these tandem repeats.


Overview of the GC Content Distribution (up)

Window size: 300 bp

[Figure: GC content along the upstream sequence.]

Summary: Full Length(2000bp) | A(22.85% 457) | C(23.3% 466) | T(28.5% 570) | G(25.35% 507)

Note: The 2000 bp section upstream of start codon is analyzed to determine the GC content. Significant high GC-content regions are found. The gRNA site is selected outside of these high GC-content regions.
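A sliding-window GC scan like the one described can be sketched as below. The 300 bp window comes from the report; the step size and the 0.65 cutoff for calling a window "high GC" are assumptions made for illustration, since the report does not state its threshold.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a DNA string."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_windows(seq, window=300, step=1):
    """Return (start, gc_fraction) for each full window along the sequence."""
    return [
        (start, gc_fraction(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


def high_gc_regions(seq, window=300, step=1, threshold=0.65):
    """Start positions of windows at or above the (assumed) GC threshold."""
    return [start for start, gc in gc_windows(seq, window, step) if gc >= threshold]


# Toy example: 300 bp of AT repeats followed by 300 bp of GC repeats.
toy = "AT" * 150 + "GC" * 150
print(high_gc_regions(toy, window=300, step=150))  # [300]
```

A gRNA site would then be chosen so that it does not fall inside any flagged window.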

Overview of the GC Content Distribution (down)

Window size: 300 bp

[Figure: GC content along the downstream sequence.]

Summary: Full Length(2000bp) | A(22.4% 448) | C(24.95% 499) | T(29.15% 583) | G(23.5% 470)

Note: The 2000 bp section downstream of stop codon is analyzed to determine the GC content. No significant high GC-content region is found. This region is therefore suitable for PCR screening or sequencing analysis.
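The overall GC percentage of each flank follows directly from the base counts in the two summaries. The check below uses only those counts; the variable and function names are illustrative.

```python
# Base counts copied from the two "Summary" lines (2000 bp each).
upstream = {"A": 457, "C": 466, "T": 570, "G": 507}
downstream = {"A": 448, "C": 499, "T": 583, "G": 470}


def gc_percent(counts):
    """Overall GC percentage from a dict of base counts."""
    total = sum(counts.values())
    return round((counts["G"] + counts["C"]) / total * 100, 2)


print(sum(upstream.values()), gc_percent(upstream))      # 2000 48.65
print(sum(downstream.values()), gc_percent(downstream))  # 2000 48.45
```

The two flanks have nearly the same overall GC content, so the difference between the two notes presumably reflects local peaks within individual 300 bp windows rather than the full-length average.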


BLAT Search Results (up)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
---------------------------------------------------------------------------------------
YourSeq  2000   1      2000  2000   100.0%    chr4   +       126095854  126097853  2000
YourSeq  230    9      375   2000   94.9%     chr16  +       32649779   32650307   529
YourSeq  228    64     376   2000   95.6%     chr4   +       146532686  146533004  319
YourSeq  223    85     561   2000   94.5%     chr19  -       7000711    7001217    507
YourSeq  213    85     376   2000   96.6%     chr18  -       34662413   34662788   376
YourSeq  203    110    380   2000   95.8%     chr13  -       12374137   12374403   267
YourSeq  201    107    375   2000   97.2%     chr3   -       88279068   88279340   273
YourSeq  200    185    618   2000   89.3%     chr9   +       27696443   27696672   230
YourSeq  200    107    377   2000   94.3%     chr11  +       80149139   80149399   261
YourSeq  200    107    375   2000   98.1%     chr11  +       74690778   74691441   664
YourSeq  196    107    376   2000   97.2%     chr6   -       47968788   47969198   411
YourSeq  196    180    392   2000   94.3%     chr4   -       135256247  135256454  208
YourSeq  196    85     366   2000   95.8%     chr11  -       104323303  104323809  507
YourSeq  195    183    773   2000   88.8%     chr11  +       58255647   58255855   209
YourSeq  194    183    395   2000   97.2%     chr5   -       108486114  108486661  548
YourSeq  193    69     374   2000   95.3%     chr3   +       95149981   95150287   307
YourSeq  193    172    383   2000   96.7%     chr11  +       87452510   87452722   213
YourSeq  192    117    379   2000   93.3%     chr9   -       53607526   53608056   531
YourSeq  192    179    392   2000   93.7%     chr9   -       12884730   12884936   207
YourSeq  191    183    379   2000   98.5%     chr17  -       6333681    6333877    197

Note: The 2000 bp section upstream of start codon is BLAT searched against the genome. No significant similarity is found.

BLAT Search Results (down)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
---------------------------------------------------------------------------------------
YourSeq  2000   1      2000  2000   100.0%    chr4   +       126098220  126100219  2000
YourSeq  71     1359   1556  2000   79.3%     chr3   -       113776282  113776408  127
YourSeq  67     1353   1470  2000   90.4%     chrX   +       141621247  141621789  543
YourSeq  66     1247   1390  2000   86.9%     chr17  -       23389485   23389634   150
YourSeq  63     1247   1469  2000   95.6%     chr2   +       100845360  100845858  499
YourSeq  62     1278   1463  2000   93.0%     chr9   +       20379101   20379317   217
YourSeq  59     758    821   2000   96.9%     chr5   -       86417432   86417496   65
YourSeq  59     1416   1516  2000   95.4%     chr6   +       6913526    6913775    250
YourSeq  59     1226   1470  2000   75.4%     chr13  +       62373341   62373494   154
YourSeq  58     1353   1469  2000   95.3%     chr14  -       110243389  110243677  289
YourSeq  56     1410   1469  2000   94.9%     chr8   -       122597831  122597889  59
YourSeq  56     1402   1468  2000   98.4%     chr14  +       38165435   38165510   76
YourSeq  56     1402   1469  2000   90.2%     chr1   +       10728934   10728998   65
YourSeq  55     1405   1469  2000   93.7%     chr6   -       106134454  106134519  66
YourSeq  55     1278   1575  2000   66.3%     chr7   +       103823703  103823825  123
YourSeq  54     1413   1469  2000   98.3%     chr15  +       70581741   70581799   59
YourSeq  53     1414   1468  2000   100.0%    chr11  -       42024697   42024755   59
YourSeq  53     1410   1469  2000   96.5%     chr5   +       23313204   23313279   76
YourSeq  52     1411   1469  2000   96.5%     chr3   -       127264105  127264165   61
YourSeq  52     1408   1468  2000   96.5%     chr10  +       52634416   52634483   68

Note: The 2000 bp section downstream of stop codon is BLAT searched against the genome. No significant similarity is found.
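To compare off-target hits against the self-hit, rows of a BLAT result table can be parsed into records. The sketch below is illustrative: it uses the first three rows of the downstream table, and the `BlatHit` type and `parse_hit` helper are hypothetical names, not part of any BLAT tooling.

```python
from typing import NamedTuple


class BlatHit(NamedTuple):
    score: int
    q_start: int
    q_end: int
    q_size: int
    identity: float  # percent
    chrom: str
    strand: str
    t_start: int
    t_end: int
    span: int


def parse_hit(line):
    """Parse one whitespace-separated result row; field 0 is the query name."""
    f = line.split()
    return BlatHit(int(f[1]), int(f[2]), int(f[3]), int(f[4]),
                   float(f[5].rstrip("%")), f[6], f[7],
                   int(f[8]), int(f[9]), int(f[10]))


# First three rows of the downstream table, copied from the report.
ROWS = """
YourSeq 2000 1 2000 2000 100.0% chr4 + 126098220 126100219 2000
YourSeq 71 1359 1556 2000 79.3% chr3 - 113776282 113776408 127
YourSeq 67 1353 1470 2000 90.4% chrX + 141621247 141621789 543
"""

hits = [parse_hit(line) for line in ROWS.strip().splitlines()]
self_hit, off_target = hits[0], hits[1:]

# Best-scoring hit other than the query's own locus, and how much of the
# 2000 bp query it covers.
best = max(off_target, key=lambda h: h.score)
coverage = (best.q_end - best.q_start + 1) / best.q_size
print(best.chrom, best.score, round(coverage, 3))  # chr3 71 0.099
```

In this subset, the best off-target hit scores 71 against 2000 for the self-hit and covers about 10% of the query.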


Gene and information:

Lsm10, U7 snRNP-specific Sm-like protein LSM10 [Mus musculus (house mouse)]

Gene ID: 116748, updated on 10-Oct-2019

Gene summary

Official Symbol: Lsm10 (provided by MGI)

Official Full Name: U7 snRNP-specific Sm-like protein LSM10 (provided by MGI)

Primary source: MGI:MGI:2151045

See related: Ensembl:ENSMUSG00000050188

Gene type: protein coding

RefSeq status: VALIDATED

Organism: Mus musculus

Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus

Expression: Ubiquitous expression in subcutaneous fat pad adult (RPKM 18.4), mammary gland adult (RPKM 16.0) and 28 other tissues

Orthologs: human, all

Genomic context

Location: 4; 4 D2.2

Exon count: 3

Annotation release  Status             Assembly                      Chr  Location
108                 current            GRCm38.p6 (GCF_000001635.26)  4    NC_000070.6 (126096562..126098584)
Build 37.2          previous assembly  MGSCv37 (GCF_000001635.18)    4    NC_000070.5 (125773897..125775828)

[Figure: genomic context, Chromosome 4 - NC_000070.6]


Transcript information: This gene has 3 transcripts

Gene: Lsm10 ENSMUSG00000050188

Description: U7 snRNP-specific Sm-like protein LSM10 [Source:MGI Symbol;Acc:MGI:2151045]

Location: Chromosome 4: 126,096,623-126,098,584 forward strand. GRCm38:CM000997.2

About this gene: This gene has 3 transcripts (splice variants), 190 orthologues and is a member of 1 Ensembl protein family.

Transcripts

Name       Transcript ID         bp   Protein  Translation ID        Biotype         CCDS       UniProt        Flags
Lsm10-201  ENSMUST00000055575.7  943  122aa    ENSMUSP00000061913.7  Protein coding  CCDS18642  Q3UPL7 Q8QZX5  TSL:1 GENCODE basic APPRIS P1
Lsm10-203  ENSMUST00000179323.1  808  122aa    ENSMUSP00000136585.1  Protein coding  CCDS18642  Q3UPL7 Q8QZX5  TSL:2 GENCODE basic APPRIS P1
Lsm10-202  ENSMUST00000151831.1  352  93aa     ENSMUSP00000119610.1  Protein coding  -          A8Y5G9         CDS 3' incomplete TSL:2

[Figure: region view, 21.96 kb, 126.090-126.105 Mb, forward and reverse strands. Transcripts shown: Oscp1-201 (protein coding), Oscp1-203 (protein coding), Oscp1-202 (retained intron); Lsm10-201, Lsm10-203, Lsm10-202 (protein coding); Stk40-202, Stk40-201, Stk40-206 (protein coding), Stk40-203 (lncRNA). Contigs: AL627101.25, AL731780.6. Regulation legend: CTCF, open chromatin, promoter, promoter flank. Gene legend: protein coding (merged Ensembl/Havana, Ensembl protein coding); non-protein coding (processed transcript, RNA gene).]


Transcript: ENSMUST00000055575

[Figure: Lsm10-201 (protein coding), 1.96 kb, forward strand, with protein annotations for ENSMUSP00000061...: Low complexity (Seg); Superfamily: LSM domain superfamily; SMART: LSM domain, eukaryotic/archaea-type; Pfam: LSM domain, eukaryotic/archaea-type; PANTHER: PTHR21196; Gene3D: 2.30.30.100; CDD: cd01733. Sequence variants (dbSNP and all other sources); variant legend: synonymous variant. Scale bar: 0 to 122.]

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
