https://www.alphaknockout.com

Mouse Smim20 Conditional Knockout Project (CRISPR/Cas9)

Objective: To create a Smim20 conditional knockout Mouse model (C57BL/6J) by CRISPR/Cas-mediated genome engineering.

Strategy summary: The Smim20 gene (NCBI Reference Sequence: NM_001145433; Ensembl: ENSMUSG00000061461) is located on Mouse chromosome 5. Three exons are identified, with the ATG start codon in exon 1 and the TGA stop codon in exon 3 (Transcript: ENSMUST00000147148). Exon 1 will be selected as the conditional knockout region (cKO region). Deletion of this region should result in the loss of function of the Mouse Smim20 gene. To engineer the targeting vector, the homologous arms and cKO region will be generated by PCR using BAC clone RP23-440H20 as template. Cas9, gRNA and the targeting vector will be co-injected into fertilized eggs for cKO Mouse production. The pups will be genotyped by PCR followed by sequencing analysis.

Note:
Exon 1 covers 55.56% of the coding region.
The start codon is in exon 1, and the stop codon is in exon 3.
The size of intron 1 for 3'-loxP site insertion: 9784 bp.
The size of the effective cKO region: ~375 bp.
The cKO region does not contain any other known gene.
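The conditional knockout logic described above (exon 1 flanked by two loxP sites, then excised by Cre) can be illustrated with a minimal string-level sketch. It assumes the canonical 34 bp loxP sequence and two sites in the same orientation; the arm and exon sequences are toy placeholders, not real Smim20 sequence, and the function name cre_recombine is hypothetical.

```python
# Canonical 34 bp loxP site: two 13 bp inverted repeats around an 8 bp spacer.
LOXP = "ATAACTTCGTATAGCATACATTATACGAAGTTAT"


def cre_recombine(allele: str, loxp: str = LOXP) -> str:
    """Model Cre excision between the first two same-orientation loxP sites.

    The floxed segment and one loxP copy are removed, leaving a single
    loxP site. Alleles with fewer than two sites are returned unchanged.
    """
    first = allele.find(loxp)
    if first == -1:
        return allele
    second = allele.find(loxp, first + len(loxp))
    if second == -1:
        return allele
    # Keep everything before the first site, then resume at the second site.
    return allele[:first] + allele[second:]


# Toy sequences (assumptions for illustration only).
five_prime_arm = "GGGCCC"
exon1 = "ATGAAACCC"
three_prime_arm = "TTTGGG"

targeted_allele = five_prime_arm + LOXP + exon1 + LOXP + three_prime_arm
ko_allele = cre_recombine(targeted_allele)

print(len(targeted_allele), len(ko_allele))
```

In this model the constitutive KO allele is simply the two arms joined by one remaining loxP site, which is why the wildtype, targeted and KO alleles can be told apart by PCR product size.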


Overview of the Targeting Strategy

[Figure: schematic of the wildtype allele (with gRNA regions and the ATG in exon 1), the targeting vector, the targeted allele, and the constitutive KO allele (after Cre recombination). Legend: homology arm; exon of mouse Smim20; cKO region; loxP site.]


Overview of the Dot Plot
Window size: 10 bp

[Figure: dot plot of the sequence aligned against itself (forward and reverse complement).]

Note: The sequence of the homologous arms and cKO region is aligned with itself to determine whether there are tandem repeats. No significant tandem repeat is found in the dot plot matrix, so this region is suitable for PCR screening or sequencing analysis.
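The self-alignment behind a dot plot can be sketched as follows. This is a minimal exact-match version, assuming a 10 bp window as stated above; the function names and the toy sequence are hypothetical, and real dot plot tools usually tolerate mismatches, which this sketch does not.

```python
from collections import defaultdict

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def self_dot_plot(seq: str, window: int = 10):
    """Return (forward, reverse) dot lists for a sequence against itself.

    forward: (i, j) with i < j where the windows at i and j are identical
             (off-diagonal dots, i.e. direct repeats).
    reverse: (i, j) where the window at i equals the reverse complement
             of the window at j (inverted repeats).
    """
    seq = seq.upper()
    n_windows = len(seq) - window + 1
    index = defaultdict(list)
    for i in range(n_windows):
        index[seq[i:i + window]].append(i)

    forward = [(i, j) for hits in index.values()
               for i in hits for j in hits if i < j]

    reverse = []
    for j in range(n_windows):
        rc = reverse_complement(seq[j:j + window])
        for i in index.get(rc, ()):
            reverse.append((i, j))
    return sorted(forward), sorted(reverse)


# Toy sequence (assumption): a 10 bp unit repeated after a 4 bp spacer.
toy = "ACGTTGCAAG" + "TTTT" + "ACGTTGCAAG"
forward_dots, reverse_dots = self_dot_plot(toy, window=10)
print(forward_dots)
```

A tandem repeat appears as a run of forward dots parallel to the main diagonal; an empty or sparse off-diagonal list corresponds to the "no significant tandem repeat" reading.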

Overview of the GC Content Distribution
Window size: 300 bp

[Figure: GC content distribution along the sequence.]

Summary: Full Length(6615bp) | A(24.34% 1610) | C(23.93% 1583) | T(27.12% 1794) | G(24.61% 1628)

Note: The sequence of the homologous arms and cKO region is analyzed to determine the GC content. Regions of significantly high GC content are found, so it may be difficult to construct this targeting vector.
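As a quick check of the summary line, the overall GC fraction follows directly from the base counts, and a windowed scan shows how local peaks could be located. This is a minimal sketch: the base counts are taken from the summary above, the 300 bp window matches the stated window size, and the function names and step parameter are assumptions.

```python
def gc_fraction(counts: dict) -> float:
    """Overall GC fraction from a dict of base counts."""
    total = sum(counts.values())
    return (counts["G"] + counts["C"]) / total


def sliding_gc(seq: str, window: int = 300, step: int = 1):
    """Return (start, gc_fraction) for each full window along seq."""
    seq = seq.upper()
    result = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc = (chunk.count("G") + chunk.count("C")) / window
        result.append((start, gc))
    return result


# Base counts from the summary line (full length 6615 bp).
counts = {"A": 1610, "C": 1583, "T": 1794, "G": 1628}
overall_gc = gc_fraction(counts)
print(f"total = {sum(counts.values())} bp, overall GC = {overall_gc:.2%}")
```

The overall figure (about 48.5%) is an average and does not by itself reveal local GC-rich stretches; those only show up in the windowed profile, which requires the actual sequence.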


BLAT Search Results (up)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
---------------------------------------------------------------------------------------
YourSeq  3000   1      3000  3000   100.0%    chr5   +       53264245   53267244   3000
YourSeq  380    798    1251  3000   94.2%     chr6   +       96820699   96823053   2355
YourSeq  370    799    1251  3000   92.4%     chr2   -       19167385   19167790   406
YourSeq  367    783    1246  3000   93.9%     chr13  +       83920990   83921449   460
YourSeq  365    804    1246  3000   93.0%     chr4   +       124495429  124495828  400
YourSeq  364    800    1354  3000   90.7%     chr8   -       46646281   46646712   432
YourSeq  363    804    1253  3000   92.4%     chr6   +       91083129   91083550   422
YourSeq  363    804    1246  3000   91.5%     chr5   +       18353677   18354114   438
YourSeq  360    804    1246  3000   92.5%     chr1   +       21899971   21900368   398
YourSeq  359    804    1246  3000   92.2%     chr9   -       62093529   62093926   398
YourSeq  359    803    1246  3000   92.3%     chr2   -       3497466    3497864    399
YourSeq  359    804    1251  3000   91.6%     chr17  -       16109992   16110393   402
YourSeq  359    804    1246  3000   92.2%     chr12  +       100966950  100967347  398
YourSeq  358    802    1251  3000   91.3%     chr9   -       71815065   71815463   399
YourSeq  358    804    1246  3000   92.2%     chr10  -       83265088   83265482   395
YourSeq  358    795    1246  3000   91.4%     chr19  +       37735112   37735528   417
YourSeq  358    804    1261  3000   90.9%     chr17  +       85206520   85206929   410
YourSeq  357    804    1246  3000   91.9%     chr6   +       56701519   56701915   397
YourSeq  356    804    1246  3000   91.7%     chr3   -       130841222  130841616  395
YourSeq  356    803    1246  3000   91.7%     chr10  -       130466080  130466474  395

Note: The 3000 bp section upstream of Exon 1 is BLAT searched against the genome. No significant similarity is found.

BLAT Search Results (down)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
---------------------------------------------------------------------------------------
YourSeq  3000   1      3000  3000   100.0%    chr5   +       53267610   53270609   3000
YourSeq  134    2681   2912  3000   81.3%     chr7   -       124836091  124836328  238
YourSeq  125    2696   2914  3000   79.2%     chr11  -       65665967   65666182   216
YourSeq  116    2704   2941  3000   79.5%     chr12  -       90874063   90874299   237
YourSeq  114    2696   3000  3000   77.4%     chr10  -       88837533   88837800   268
YourSeq  109    2714   3000  3000   79.8%     chr1   -       139184075  139184334  260
YourSeq  107    2682   2905  3000   82.1%     chr18  +       46912423   46912807   385
YourSeq  105    2729   3000  3000   84.5%     chr2   -       117876059  117876316  258
YourSeq  99     2682   2910  3000   85.2%     chr11  -       41323915   41324145   231
YourSeq  99     2715   2911  3000   87.0%     chr19  +       40861069   40861262   194
YourSeq  99     2714   3000  3000   73.7%     chr12  +       77311266   77311470   205
YourSeq  98     2696   2878  3000   80.0%     chr6   +       114649854  114650036  183
YourSeq  94     2779   2939  3000   83.3%     chr12  +       98896093   98896251   159
YourSeq  94     2684   2939  3000   89.9%     chr1   +       36734598   36734857   260
YourSeq  92     2672   3000  3000   93.6%     chr1   +       190520523  190521295  773
YourSeq  90     2716   2911  3000   84.9%     chr10  -       43247191   43247380   190
YourSeq  90     2713   2875  3000   77.8%     chr5   +       9075895    9076052    158
YourSeq  89     2724   2911  3000   80.8%     chr6   -       24670695   24670876   182
YourSeq  89     2724   2910  3000   85.4%     chr1   +       16796967   16797147   181
YourSeq  88     2720   2906  3000   77.1%     chr9   -       10449449   10449634   186

Note: The 3000 bp section downstream of Exon 1 is BLAT searched against the genome. No significant similarity is found.
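To review BLAT hits like those listed above programmatically, each row can be parsed into a record and the fraction of the query it covers computed. This is a minimal sketch assuming whitespace-separated rows with the eleven columns shown in the tables (an optional leading "browser details" link text is skipped); the class and function names are hypothetical, and the three example rows are copied from the tables above.

```python
from typing import NamedTuple


class BlatHit(NamedTuple):
    score: int
    q_start: int
    q_end: int
    q_size: int
    identity: float  # percent identity
    chrom: str
    strand: str
    t_start: int
    t_end: int
    span: int


def parse_blat_row(row: str) -> BlatHit:
    """Parse one whitespace-separated BLAT result row."""
    fields = row.split()
    if fields[:2] == ["browser", "details"]:
        fields = fields[2:]  # drop link text left over from the web page
    if len(fields) != 11:
        raise ValueError(f"expected 11 columns, got {len(fields)}: {row!r}")
    (_query, score, q_start, q_end, q_size, identity,
     chrom, strand, t_start, t_end, span) = fields
    return BlatHit(int(score), int(q_start), int(q_end), int(q_size),
                   float(identity.rstrip("%")), chrom, strand,
                   int(t_start), int(t_end), int(span))


def query_coverage(hit: BlatHit) -> float:
    """Fraction of the query sequence covered by the hit."""
    return (hit.q_end - hit.q_start + 1) / hit.q_size


rows = [
    "YourSeq 3000 1 3000 3000 100.0% chr5 + 53264245 53267244 3000",
    "YourSeq 380 798 1251 3000 94.2% chr6 + 96820699 96823053 2355",
    "YourSeq 134 2681 2912 3000 81.3% chr7 - 124836091 124836328 238",
]
hits = [parse_blat_row(r) for r in rows]
for hit in hits:
    print(hit.chrom, hit.score, f"{hit.identity:.1f}%", f"{query_coverage(hit):.1%}")
```

In both tables the first row is the query matching its own locus on chr5 at full length; the remaining rows cover only a small part of the 3000 bp query, which is consistent with the notes reporting no significant similarity elsewhere.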


Gene and information: Smim20 small integral membrane protein 20 [ Mus musculus (house mouse) ] Gene ID: 66278, updated on 12-Aug-2019

Gene summary

Official Symbol: Smim20 (provided by MGI)
Official Full Name: small integral membrane protein 20 (provided by MGI)
Primary source: MGI:MGI:1913528
See related: Ensembl:ENSMUSG00000061461
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: N28078; 1110067B18Rik; 1810013D10Rik
Expression: Ubiquitous expression in heart adult (RPKM 41.2), kidney adult (RPKM 31.5) and 28 other tissues
Orthologs: human

Genomic context

Location: 5; 5 C1

Exon count: 3

Annotation release  Status             Assembly                      Chr  Location
108                 current            GRCm38.p6 (GCF_000001635.26)  5    NC_000071.6 (53267106..53278540)
Build 37.2          previous assembly  MGSCv37 (GCF_000001635.18)    5    NC_000071.5 (53658345..53669779)

[Figure: genomic context of Smim20 on chromosome 5 (NC_000071.6).]


Transcript information: This gene has 6 transcripts

Gene: Smim20 ENSMUSG00000061461

Description: small integral membrane protein 20 [Source:MGI Symbol;Acc:MGI:1913528]
Gene Synonyms: 1110067B18Rik, 1810013D10Rik
Location: Chromosome 5: 53,267,083-53,278,540 forward strand. GRCm38:CM000998.2
About this gene: This gene has 6 transcripts (splice variants), 191 orthologues and is a member of 1 Ensembl protein family.

Transcripts

Name        Transcript ID         bp    Protein     Translation ID        Biotype                  CCDS       UniProt  Flags
Smim20-201  ENSMUST00000147148.4  922   69aa        ENSMUSP00000121637.2  Protein coding           CCDS51502  D3Z7Q2   TSL:1, GENCODE basic, APPRIS P1
Smim20-204  ENSMUST00000204218.2  1231  46aa        ENSMUSP00000145521.1  Nonsense mediated decay  -          D6RCX0   TSL:3
Smim20-205  ENSMUST00000204465.1  931   46aa        ENSMUSP00000144695.1  Nonsense mediated decay  -          D6RCX0   TSL:1
Smim20-206  ENSMUST00000204839.1  2076  No protein  -                     Retained intron          -          -        TSL:NA
Smim20-202  ENSMUST00000203623.1  845   No protein  -                     Retained intron          -          -        TSL:2
Smim20-203  ENSMUST00000204089.1  212   No protein  -                     Retained intron          -          -        TSL:5

[Figure: Ensembl genome browser view of chromosome 5, 53.26 Mb to 53.28 Mb (31.46 kb), forward and reverse strands. Transcript tracks (comprehensive set): Smim20-201 (protein coding); Smim20-204 and Smim20-205 (nonsense mediated decay); Smim20-202, Smim20-203 and Smim20-206 (retained intron); Gm45495-201 (lncRNA). Contigs: AC122522.4, AC134463.3. Regulatory Build legend: CTCF, enhancer, open chromatin, promoter, promoter flank, transcription factor binding site. Gene legend: protein coding (merged Ensembl/Havana); non-protein coding (RNA gene, processed transcript).]


Transcript: ENSMUST00000147148

[Figure: Ensembl view of Smim20-201 (protein coding; 11.40 kb, forward strand) with protein feature tracks for ENSMUSP00000121...: Transmembrane heli...; Pfam: Small integral membrane protein 20; PANTHER: Small integral membrane protein 20; sequence variants (dbSNP and all other sources), with missense variants marked. Scale bar: 0 to 69 (amino acid positions).]

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
