https://www.alphaknockout.com

Mouse Lsm1 Conditional Knockout Project (CRISPR/Cas9)

Objective: To create an Lsm1 conditional knockout mouse model (C57BL/6J) by CRISPR/Cas9-mediated genome engineering.

Strategy summary: The Lsm1 gene (NCBI Reference Sequence: NM_026032; Ensembl: ENSMUSG00000037296) is located on mouse chromosome 8. Four exons are identified, with the ATG start codon in exon 1 and the TGA stop codon in exon 4 (Transcript: ENSMUST00000038421). Exon 3 will be selected as the conditional knockout region (cKO region). Deletion of this region should result in loss of function of the mouse Lsm1 gene.

To engineer the targeting vector, the homologous arms and the cKO region will be generated by PCR using BAC clone RP23-238N9 as template. Cas9, gRNA and the targeting vector will be co-injected into fertilized eggs for cKO mouse production. The pups will be genotyped by PCR followed by sequencing analysis.

Note: Exon 3 starts from about 29.07% of the coding region. The knockout of exon 3 will result in a frameshift of the gene. The size of intron 2 for 5'-loxP site insertion is 1451 bp, and the size of intron 3 for 3'-loxP site insertion is 8135 bp. The size of the effective cKO region is ~616 bp. The cKO region does not contain any other known gene.
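As a rough sanity check on the note above, the following sketch converts the stated 29.07% offset into an approximate position within the coding sequence. It assumes the 133 aa protein length that Ensembl lists for ENSMUST00000038421 (a 399 nt coding sequence, stop codon excluded); how the percentage was originally calculated is not stated in this report, so the result is only approximate. The function names are illustrative, and because the length of exon 3 is not given here, the frameshift helper is demonstrated with placeholder lengths only.

```python
def cds_offset(fraction, protein_aa):
    """Approximate where an exon begins within the coding sequence.

    Returns (CDS length in nt, nt preceding the exon, first codon touched).
    Assumes the CDS length is protein_aa * 3 (stop codon excluded).
    """
    cds_nt = protein_aa * 3
    nt_before = round(fraction * cds_nt)
    first_codon = nt_before // 3 + 1
    return cds_nt, nt_before, first_codon


def causes_frameshift(exon_len_nt):
    """Deleting an exon shifts the reading frame unless its length is a multiple of 3."""
    return exon_len_nt % 3 != 0


cds_nt, nt_before, first_codon = cds_offset(0.2907, 133)
print(f"CDS length (assumed): {cds_nt} nt")
print(f"Approx. coding nt upstream of exon 3: {nt_before}")
print(f"Approx. first codon affected: {first_codon}")

# Placeholder exon lengths, NOT the real exon 3 length:
print(causes_frameshift(100), causes_frameshift(99))
```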


Overview of the Targeting Strategy

[Figure: schematic of the targeting strategy. Panels: wildtype allele (exons 1-4, with the 5' and 3' gRNA regions marked), targeting vector, targeted allele, and constitutive KO allele (after Cre recombination). Legend: exon of mouse Lsm1, homology arm, cKO region, loxP site.]
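The step from the targeted allele to the constitutive KO allele can be illustrated with a toy string model of Cre recombination. This is a minimal sketch, not the project's design: the flanking and exon sequences are made-up placeholders, `cre_excise` is a hypothetical helper, and both loxP sites are assumed to lie in the same orientation so that the sequence between them is excised and a single loxP site is left behind.

```python
# Canonical 34 bp loxP site (13 bp repeat, 8 bp spacer, 13 bp repeat).
LOXP = "ATAACTTCGTATAGCATACATTATACGAAGTTAT"


def cre_excise(seq, site=LOXP):
    """Remove the segment between the first two same-orientation loxP sites.

    One loxP copy is retained, as after Cre-mediated excision.
    The sequence is returned unchanged if fewer than two sites are found.
    """
    first = seq.find(site)
    if first == -1:
        return seq
    second = seq.find(site, first + len(site))
    if second == -1:
        return seq
    return seq[: first + len(site)] + seq[second + len(site):]


# Placeholder sequences for illustration only.
left_flank, floxed_exon, right_flank = "AAAACCCC", "GGGCCCGGG", "TTTTGGGG"
targeted = left_flank + LOXP + floxed_exon + LOXP + right_flank
knockout = cre_excise(targeted)

print(len(targeted), len(knockout))
print(floxed_exon in knockout)
```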


Overview of the Dot Plot

Window size: 10 bp

[Figure: dot plot of Sequence 12 against itself. Legend: Forward, Reverse Complement.]

Note: The sequence of homologous arms and cKO region is aligned with itself to determine if there are tandem repeats. No significant tandem repeat is found in the dot plot matrix. So this region is suitable for PCR screening or sequencing analysis.
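The self-alignment described in the note can be approximated with a simple exact-match word comparison. The sketch below is an assumption about how such a dot plot might be computed, not the tool used for this report: it records every pair of positions where a 10 bp window matches another window on the forward strand (excluding the trivial main diagonal) or on the reverse complement. The test sequence is a made-up example containing one direct repeat.

```python
def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def dot_plot_hits(seq, window=10):
    """Return (forward_hits, reverse_hits) as lists of (i, j) window starts.

    forward_hits: identical windows at different positions (off-diagonal dots).
    reverse_hits: windows whose reverse complement occurs at position j.
    """
    index = {}
    n_windows = len(seq) - window + 1
    for i in range(n_windows):
        index.setdefault(seq[i:i + window], []).append(i)

    forward, reverse = [], []
    for i in range(n_windows):
        word = seq[i:i + window]
        forward.extend((i, j) for j in index[word] if j != i)
        reverse.extend((i, j) for j in index.get(reverse_complement(word), []))
    return forward, reverse


# Toy sequence: the same 10-mer at positions 0 and 15.
toy = "ACGTTGCAAG" + "TTTTT" + "ACGTTGCAAG"
forward, reverse = dot_plot_hits(toy, window=10)
print(forward, reverse)
```

An empty pair of lists for a real sequence would correspond to a dot plot with nothing off the main diagonal at this window size.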

Overview of the GC Content Distribution

Window size: 300 bp

[Figure: GC content distribution along Sequence 12.]

Summary: Full Length(7116bp) | A(28.36% 2018) | C(22.82% 1624) | T(29.71% 2114) | G(19.11% 1360)

Note: The sequence of homologous arms and cKO region is analyzed to determine the GC content. No significant high GC-content region is found. So this region is suitable for PCR screening or sequencing analysis.
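The overall GC content follows directly from the base counts in the summary line. The sketch below recomputes it and also shows one plausible way to obtain a windowed GC profile; the `gc_windows` helper and its one-base step are assumptions, since the report states only the 300 bp window size.

```python
# Base counts copied from the summary line above.
counts = {"A": 2018, "C": 1624, "T": 2114, "G": 1360}

total = sum(counts.values())
gc_percent = 100 * (counts["G"] + counts["C"]) / total
print(f"Total length: {total} bp")
print(f"Overall GC content: {gc_percent:.2f}%")


def gc_windows(seq, window=300, step=1):
    """GC percentage for each window of the given size along seq."""
    return [
        100 * sum(base in "GC" for base in seq[i:i + window]) / window
        for i in range(0, len(seq) - window + 1, step)
    ]


# Toy sequence with a 4 bp window, for illustration only.
print(gc_windows("GGCCAATT", window=4))
```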


BLAT Search Results (up)

QUERY    SCORE  START   END  QSIZE  IDENTITY  CHROM  STRAND      START        END    SPAN
-----------------------------------------------------------------------------------------
YourSeq   3000      1  3000   3000    100.0%  chr8   +        25790428   25793427    3000
YourSeq    317    177  1299   3000     92.8%  chr15  -        84521180   84878057  356878
YourSeq    208    169  1549   3000     89.1%  chr1   +       126866446  127155755  289310
YourSeq    159   1115  1320   3000     93.1%  chr4   -       142165765  142165976     212
YourSeq    159   1134  1318   3000     95.0%  chr1   -       156552287  156552896     610
YourSeq    152   1135  1319   3000     95.8%  chr6   -       118463428  118463618     191
YourSeq    151   1137  1462   3000     92.2%  chr5   -       110604726  110605056     331
YourSeq    151   1132  1320   3000     94.7%  chr19  +        28660706   28660899     194
YourSeq    151   1100  1288   3000     96.4%  chr11  +        95885762   95886313     552
YourSeq    149   1132  1288   3000     96.2%  chr16  -        10996042   10996197     156
YourSeq    148   1135  1299   3000     93.1%  chr11  -       106758674  106758833     160
YourSeq    148   1133  1323   3000     90.4%  chr11  -        67023099   67023281     183
YourSeq    148   1118  1288   3000     92.2%  chr16  +        93917140   93917307     168
YourSeq    145   1118  1288   3000     93.0%  chr19  -        41827628   41828167     540
YourSeq    145   1134  1288   3000     95.5%  chr17  -        83579273   83579426     154
YourSeq    144   1135  1299   3000     91.9%  chr17  -        83714834   83714993     160
YourSeq    144   1135  1292   3000     94.3%  chr12  -         3860516    3860672     157
YourSeq    144   1134  1299   3000     92.5%  chr10  -        84958944   84959106     163
YourSeq    144   1135  1319   3000     89.2%  chr7   +       126738757  126738941     185
YourSeq    144   1133  1288   3000     94.9%  chr7   +        44402485   44402639     155

Note: The 3000 bp section upstream of Exon 3 is BLAT searched against the genome. No significant similarity is found.

BLAT Search Results (down)

QUERY    SCORE  START   END  QSIZE  IDENTITY  CHROM  STRAND      START        END    SPAN
-----------------------------------------------------------------------------------------
YourSeq   3000      1  3000   3000    100.0%  chr8   +        25794044   25797043    3000
YourSeq   1003    199  2414   3000     84.7%  chr12  +       110320931  110322917    1987
YourSeq    974    448  2562   3000     84.8%  chr11  -        99768147   99770191    2045
YourSeq    894    535  2405   3000     83.1%  chr10  +        88238934   88240537    1604
YourSeq    891    242  2547   3000     86.4%  chr17  +         8707756    8709662    1907
YourSeq    886    197  1943   3000     86.2%  chr10  -        90980677   90982352    1676
YourSeq    879    423  1730   3000     88.1%  chr2   -       152966445  152967819    1375
YourSeq    865    195  1769   3000     84.5%  chr1   -       190842063  190843621    1559
YourSeq    833    467  2363   3000     83.5%  chr12  +        54938096   54939722    1627
YourSeq    831    539  2410   3000     83.3%  chr12  +        28444683   28446321    1639
YourSeq    826    211  2093   3000     85.8%  chr7   -        57302154   57303964    1811
YourSeq    824    157  1916   3000     85.7%  chr10  +        24748151   24749787    1637
YourSeq    821    212  2412   3000     85.6%  chr18  -        35074497   35076458    1962
YourSeq    810    451  2093   3000     85.5%  chr9   -        37288240   37290046    1807
YourSeq    802    679  2566   3000     83.8%  chr15  -         7355230    7356952    1723
YourSeq    794    184  1926   3000     84.3%  chr9   +        54258486   54260128    1643
YourSeq    792    155  2086   3000     86.0%  chr9   +        15803117   15805169    2053
YourSeq    790    719  2402   3000     86.0%  chr2   +        33297038   33298520    1483
YourSeq    784    211  1926   3000     84.6%  chr9   -        81426998   81428644    1647
YourSeq    784    679  2370   3000     83.7%  chr10  +         5932951    5934448    1498

Note: The 3000 bp section downstream of Exon 3 is BLAT searched against the genome. No significant similarity is found.
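The two self-hits on chr8 (the first row of each BLAT table) bracket the region that sits between the upstream and downstream 3000 bp sections. As a small consistency check, the sketch below derives that interval and its length, treating the BLAT genome coordinates as 1-based and inclusive, which is consistent with the SPAN column (end - start + 1 = 3000). The `Hit` type and variable names are illustrative.

```python
from collections import namedtuple

Hit = namedtuple("Hit", ["chrom", "start", "end"])

# Self-hits copied from the first row of each BLAT table above.
upstream = Hit("chr8", 25790428, 25793427)
downstream = Hit("chr8", 25794044, 25797043)


def span(hit):
    """Length of a 1-based, inclusive interval."""
    return hit.end - hit.start + 1


# Region lying between the upstream and downstream query sections.
between = Hit("chr8", upstream.end + 1, downstream.start - 1)

print(span(upstream), span(downstream))
print(f"{between.chrom}:{between.start}-{between.end} ({span(between)} bp)")
```

The resulting length agrees with the ~616 bp effective cKO region stated in the strategy notes.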


Gene and information: Lsm1

LSM1 homolog, mRNA degradation associated [ Mus musculus (house mouse) ]
Gene ID: 67207, updated on 12-Aug-2019

Gene summary

Official Symbol: Lsm1 (provided by MGI)
Official Full Name: LSM1 homolog, mRNA degradation associated (provided by MGI)
Primary source: MGI:MGI:1914457
See related: Ensembl:ENSMUSG00000037296
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: CASM; 2810025O06Rik
Expression: Ubiquitous expression in CNS E18 (RPKM 6.5), CNS E11.5 (RPKM 6.1) and 28 other tissues
Orthologs: human, all

Genomic context

Location: 8; 8 A2
Exon count: 5

Annotation release  Status             Assembly                      Chr  Location
108                 current            GRCm38.p6 (GCF_000001635.26)  8    NC_000074.6 (25785318..25803975)
Build 37.2          previous assembly  MGSCv37 (GCF_000001635.18)    8    NC_000074.5 (26896063..26914447)

[Figure: genomic context of Lsm1 on Chromosome 8 - NC_000074.6.]
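The gene length implied by each set of coordinates can be checked with simple interval arithmetic. The sketch below uses the two NCBI locations listed above together with the Ensembl location given in the transcript information section of this report (25,785,288-25,803,975), assuming 1-based inclusive coordinates; the dictionary keys are just labels chosen for the example.

```python
# Coordinates copied from this report (chromosome 8, 1-based inclusive assumed).
locations = {
    "NCBI GRCm38.p6": (25785318, 25803975),
    "NCBI MGSCv37": (26896063, 26914447),
    "Ensembl GRCm38": (25785288, 25803975),
}


def interval_length(start, end):
    """Number of bases in a 1-based, inclusive interval."""
    return end - start + 1


lengths = {name: interval_length(*coords) for name, coords in locations.items()}
for name, length in lengths.items():
    print(f"{name}: {length} bp ({length / 1000:.2f} kb)")
```

The Ensembl interval works out to about 18.69 kb, matching the size shown for transcript ENSMUST00000038421 in this report.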


Transcript information: This gene has 4 transcripts

Gene: Lsm1 ENSMUSG00000037296

Description: LSM1 homolog, mRNA degradation associated [Source:MGI Symbol;Acc:MGI:1914457]
Gene Synonyms: 2810025O06Rik, U6 small nuclear RNA associated
Location: Chromosome 8: 25,785,288-25,803,975 forward strand. GRCm38:CM001001.2
About this gene: This gene has 4 transcripts (splice variants), 206 orthologues, 1 paralogue, is a member of 1 Ensembl protein family and is associated with 13 phenotypes.

Transcripts

Name      Transcript ID         bp    Protein     Translation ID        Biotype                  CCDS       UniProt         Flags
Lsm1-201  ENSMUST00000038421.7  2663  133aa       ENSMUSP00000041022.6  Protein coding           CCDS22202  Q544C9, Q8VC85  TSL:1, GENCODE basic, APPRIS P1
Lsm1-203  ENSMUST00000211168.1  343   21aa        ENSMUSP00000147832.1  Protein coding           -          A0A1B0GS83      TSL:3, GENCODE basic
Lsm1-204  ENSMUST00000211670.1  600   43aa        ENSMUSP00000147348.1  Nonsense mediated decay  -          A0A1B0GR26      TSL:3
Lsm1-202  ENSMUST00000210647.1  672   No protein  -                     Retained intron          -          -               TSL:1

[Figure: Ensembl genome browser view of a 38.69 kb region on chromosome 8 (25.78 Mb to 25.81 Mb). Forward strand: Lsm1-201 (protein coding), Lsm1-202 (retained intron), Lsm1-203 (protein coding), Lsm1-204 (nonsense mediated decay), Star-201 and Star-202 (protein coding). Contigs: AC156990.11, AC162367.5, AC122752.10. Reverse strand: Bag4-201 (protein coding), Bag4-202 and Bag4-203 (retained intron). Regulatory Build legend: CTCF, Enhancer, Open Chromatin, Promoter, Promoter Flank, Transcription Factor Binding Site. Gene legend: protein coding (merged Ensembl/Havana, Ensembl protein coding); non-protein coding (processed transcript).]


Transcript: ENSMUST00000038421

[Figure: protein domain view of Lsm1-201 (protein coding; 18.69 kb, forward strand), drawn on a 0-133 aa scale. Annotated features:
Coiled-coils (Ncoils)
Superfamily: LSM domain superfamily
SMART: LSM domain, eukaryotic/archaea-type
Pfam: LSM domain, eukaryotic/archaea-type
PANTHER: PTHR15588:SF8, PTHR15588
Gene3D: 2.30.30.100
CDD: Sm-like protein Lsm1
Variant track: sequence variants (dbSNP and all other sources); legend: synonymous variant.]

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
