https://www.alphaknockout.com

Mouse Smchd1 Conditional Knockout Project (CRISPR/Cas9)

Objective: To create a Smchd1 conditional knockout mouse model (C57BL/6J) by CRISPR/Cas9-mediated genome engineering.

Strategy summary: The Smchd1 gene (NCBI Reference Sequence: NM_028887; Ensembl: ENSMUSG00000024054) is located on mouse chromosome 17. Forty-eight exons have been identified, with the ATG start codon in exon 1 and the TGA stop codon in exon 48 (transcript: ENSMUST00000127430). Exon 5 will be selected as the conditional knockout region (cKO region). Deletion of this region should result in loss of function of the mouse Smchd1 gene. To engineer the targeting vector, the homology arms and the cKO region will be generated by PCR using BAC clone RP24-238I21 as the template. Cas9, gRNA and the targeting vector will be co-injected into fertilized eggs to produce cKO mice. The pups will be genotyped by PCR followed by sequencing analysis.
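The report shows a 5' and a 3' gRNA region but does not list the guide sequences. As a generic illustration only, and assuming the widely used SpCas9 with its NGG PAM (the report says only "Cas9"), candidate 20-nt protospacers on both strands of an intronic sequence could be enumerated as follows. The function names are hypothetical, and a real design would also score off-target matches and cutting efficiency.

```python
import re
from typing import List, Tuple

# Complement table for the four DNA bases (hypothetical helper, not from the report).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_spcas9_sites(seq: str, spacer_len: int = 20) -> List[Tuple[str, int, str, str]]:
    """List candidate SpCas9 sites as (strand, fwd_start, spacer, pam).

    Assumes the SpCas9 NGG PAM. fwd_start is the 0-based forward-strand
    coordinate of the spacer's leftmost base, so hits on both strands
    share one coordinate system.
    """
    seq = seq.upper()
    n = len(seq)
    # Zero-width lookahead so overlapping candidate sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % spacer_len)
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for m in pattern.finditer(s):
            spacer, pam = m.group(1), m.group(2)
            start = m.start()
            if strand == "-":
                # Base i of the reverse complement is base n - 1 - i of the
                # forward strand, so the spacer spans [n - start - L, n - start).
                start = n - start - spacer_len
            hits.append((strand, start, spacer, pam))
    return hits


if __name__ == "__main__":
    # Toy input, not a Smchd1 sequence.
    for hit in find_spcas9_sites("A" * 20 + "TGG"):
        print(hit)
```

Reporting every site in forward-strand coordinates makes it easy to check which candidates fall inside intron 4 or intron 5 once those intervals are known.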

Note: Females homozygous for an ENU-induced allele die at midgestation, showing placental defects and hypomethylation of X-linked genes that are normally subject to X-inactivation, whereas homozygous males are viable. Females homozygous for a gene trap allele die before E13.5, whereas males remain healthy.

Exon 5 starts at about 8.44% of the coding region, and knocking out exon 5 will cause a frameshift in the gene. Intron 4, where the 5' loxP site will be inserted, is 682 bp; intron 5, where the 3' loxP site will be inserted, is 6758 bp. The effective cKO region is ~1042 bp and contains no other known genes.
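Whether removing exon 5 shifts the reading frame depends only on its coding length modulo 3. The report does not give that length, so the sketch below takes it as an input and uses made-up lengths to demonstrate. The 2007 aa protein length comes from the transcript table for ENSMUST00000127430; whether the 8.44% figure counts the stop codon is an assumption, but either way it corresponds to roughly 508 nt into the coding sequence.

```python
# Sketch of the frame arithmetic behind the "frameshift" statement.
# Assumption: the exon 5 coding length is an input; the report does not state it.

PROTEIN_AA = 2007  # Smchd1-201 (ENSMUST00000127430) protein length, from the transcript table


def cds_length_nt(protein_aa: int, include_stop: bool = True) -> int:
    """Coding-sequence length in nucleotides: 3 per codon, plus the stop codon if counted."""
    return protein_aa * 3 + (3 if include_stop else 0)


def approx_cds_offset(percent: float, cds_len_nt: int) -> int:
    """Convert a 'starts at X% of the coding region' figure to an approximate nucleotide offset."""
    return round(percent / 100.0 * cds_len_nt)


def causes_frameshift(deleted_coding_nt: int) -> bool:
    """Deleting a number of coding bases that is not a multiple of 3 shifts the downstream frame."""
    return deleted_coding_nt % 3 != 0


if __name__ == "__main__":
    cds = cds_length_nt(PROTEIN_AA)           # 6024 nt including the TGA stop
    print(cds, approx_cds_offset(8.44, cds))  # exon 5 begins roughly 508 nt into the CDS
    # Hypothetical exon lengths, for illustration only:
    for exon_len in (100, 99):
        print(exon_len, causes_frameshift(exon_len))
```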


Overview of the Targeting Strategy

[Figure: targeting strategy schematic showing the wildtype allele (exons 1, 4, 5 and 48, with the 5' and 3' gRNA regions), the targeting vector, the targeted allele, and the constitutive KO allele after Cre recombination. Legend: exon of mouse Smchd1, homology arm, cKO region, loxP site.]
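The final step of the schematic, Cre recombination, can be modelled on a plain string: Cre excises the DNA between two loxP sites in the same orientation and leaves a single loxP copy behind. The sketch below uses the canonical 34 bp loxP sequence and a made-up floxed allele; the actual vector sequence is not given in the report, and inverted or mutant lox sites are not modelled.

```python
# Simplified string model of Cre/loxP excision (illustrative; not the actual vector sequence).
LOXP = "ATAACTTCGTATAGCATACATTATACGAAGTTAT"  # canonical 34-bp loxP: 13-bp arm, 8-bp spacer, 13-bp arm


def cre_excise(seq: str, site: str = LOXP) -> str:
    """Remove everything between two directly repeated loxP sites, leaving one copy.

    Only the first two same-orientation copies are considered; inverted sites
    (which Cre flips rather than excises) are not modelled.
    """
    first = seq.find(site)
    if first == -1:
        return seq
    second = seq.find(site, first + len(site))
    if second == -1:
        return seq
    # Keep sequence up to the first site, then resume at the second site.
    return seq[:first] + seq[second:]


if __name__ == "__main__":
    # Hypothetical floxed allele: arm + loxP + stand-in "cKO region" + loxP + arm.
    floxed = "GATTACA" + LOXP + "CCCCCCCCCC" + LOXP + "TGTAATC"
    deleted = cre_excise(floxed)
    print(len(floxed), len(deleted), len(floxed) - len(deleted))
```

In this toy case the deletion shortens the allele by the 10 bp stand-in cKO region plus one 34 bp loxP copy, which is the size difference a PCR genotyping assay spanning both loxP sites would detect.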


Overview of the Dot Plot (window size: 10 bp)

[Dot plot: the sequence aligned against itself in forward and reverse-complement orientation.]

Note: The sequence of the homology arms and cKO region was aligned with itself to detect tandem repeats. Tandem repeats were found in the dot-plot matrix, so this targeting vector may be difficult to construct.
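A self dot plot compares every window of the sequence with every other window; tandem repeats show up as lines parallel to the main diagonal, offset by the repeat period. The sketch below assumes exact matching of 10 bp windows (the report gives only the window size, and its tool may tolerate mismatches) and reports forward and reverse-complement hits, plus a tally of diagonal offsets that exposes the period of a tandem repeat.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

_COMP = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence."""
    return seq.translate(_COMP)[::-1]


def kmer_index(seq: str, w: int) -> Dict[str, List[int]]:
    """Map every length-w window to the 0-based positions where it occurs."""
    idx: Dict[str, List[int]] = defaultdict(list)
    for j in range(len(seq) - w + 1):
        idx[seq[j:j + w]].append(j)
    return idx


def dot_plot(seq: str, w: int = 10) -> Tuple[List[Tuple[int, int]], List[Tuple[int, int]]]:
    """Exact-match self dot plot with window size w.

    forward: (i, j) with seq[i:i+w] == seq[j:j+w] and i != j (main diagonal omitted)
    reverse: (i, j) with revcomp(seq)[i:i+w] == seq[j:j+w]
    """
    seq = seq.upper()
    idx = kmer_index(seq, w)
    forward = [(i, j) for i in range(len(seq) - w + 1)
               for j in idx.get(seq[i:i + w], []) if i != j]
    rc = revcomp(seq)
    reverse = [(i, j) for i in range(len(rc) - w + 1)
               for j in idx.get(rc[i:i + w], [])]
    return forward, reverse


def repeat_periods(forward: List[Tuple[int, int]]) -> Counter:
    """Count diagonal offsets j - i; a tandem repeat of period p piles up at offset p."""
    return Counter(j - i for i, j in forward if j > i)


if __name__ == "__main__":
    fwd, _ = dot_plot("ACGTTGCAAT" * 3, w=10)  # toy sequence: three tandem copies of a 10-mer
    print(repeat_periods(fwd).most_common(2))
```

For the toy input the offset tally peaks at 10, the length of the repeated unit.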

Overview of the GC Content Distribution (window size: 300 bp)

[Plot: GC content distribution along the sequence.]

Summary: Full length (7131 bp) | A (26.77%, 1909) | C (20.33%, 1450) | T (31.95%, 2278) | G (20.95%, 1494)

Note: The sequence of the homology arms and cKO region was analyzed for GC content. No region of significantly high GC content was found, so this region is suitable for PCR screening or sequencing analysis.
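From the base counts in the summary, overall GC content is (1450 + 1494) / 7131, about 41.3%. The distribution plot applies the same count in 300 bp windows; in the sketch below, non-overlapping windows (step equal to window) are an assumption, since the report does not state the step size.

```python
from typing import List, Tuple


def gc_fraction(seq: str) -> float:
    """Fraction of G + C bases in seq (0.0 for an empty string)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def gc_windows(seq: str, window: int = 300, step: int = 300) -> List[Tuple[int, float]]:
    """(start, GC fraction) for each full window; step == window gives non-overlapping windows."""
    return [(start, gc_fraction(seq[start:start + window]))
            for start in range(0, len(seq) - window + 1, step)]


def gc_percent_from_counts(a: int, c: int, g: int, t: int) -> float:
    """Overall GC percentage from base counts."""
    return 100.0 * (c + g) / (a + c + g + t)


if __name__ == "__main__":
    # Base counts from the summary line above (7131 bp in total).
    print(round(gc_percent_from_counts(a=1909, c=1450, g=1494, t=2278), 2))
```

Setting `step` smaller than `window` gives the overlapping, smoother curve that many GC plots use.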


BLAT Search Results (upstream)

QUERY    SCORE  QSTART  QEND  QSIZE  IDENTITY  CHROM  STRAND  TSTART     TEND       SPAN
YourSeq  3000   1       3000  3000   100.0%    chr17  -       71455986   71458985   3000
YourSeq  630    1       808   3000   93.5%     chr10  +       128085588  128090273  4686
YourSeq  270    1196    2174  3000   95.4%     chr5   +       30979551   31270506   290956
YourSeq  197    279     790   3000   90.0%     chrX   +       58225235   58225928   694
YourSeq  166    1177    1369  3000   95.5%     chr8   -       105816427  105816618  192
YourSeq  166    1196    1386  3000   95.2%     chr3   +       127582044  127582296  253
YourSeq  164    1204    1383  3000   97.2%     chr17  -       88249451   88249701   251
YourSeq  164    1196    1371  3000   95.4%     chr12  +       56237339   56237512   174
YourSeq  162    1196    1374  3000   96.0%     chr6   -       32865813   32865997   185
YourSeq  162    1063    1380  3000   84.4%     chr14  -       26464721   26464937   217
YourSeq  160    1195    1371  3000   96.0%     chr2   -       150529654  150529832  179
YourSeq  160    1196    1370  3000   96.6%     chr10  -       115895233  115895408  176
YourSeq  160    1196    1390  3000   90.9%     chr15  +       91688454   91688629   176
YourSeq  159    1196    1370  3000   97.1%     chr2   +       80611606   80611780   175
YourSeq  159    1201    1371  3000   95.3%     chr15  +       100525734  100525903  170
YourSeq  159    1196    1373  3000   95.5%     chr10  +       24777267   24777444   178
YourSeq  158    1197    1377  3000   94.9%     chr18  -       58758208   58758388   181
YourSeq  158    1196    1371  3000   94.3%     chr4   +       33309559   33309733   175
YourSeq  158    1196    1375  3000   94.9%     chr2   +       84973345   84973551   207
YourSeq  157    1204    1377  3000   95.4%     chr4   +       108613779  108613955  177

Note: The 3000 bp region upstream of exon 5 was searched against the genome with BLAT. Apart from the full-length self-match on chr17, no significant similarity was found.

BLAT Search Results (downstream)

QUERY    SCORE  QSTART  QEND  QSIZE  IDENTITY  CHROM  STRAND  TSTART     TEND       SPAN
YourSeq  3000   1       3000  3000   100.0%    chr17  -       71452355   71455354   3000
YourSeq  196    1104    3000  3000   91.9%     chr1   +       37055570   37087001   31432
YourSeq  150    1051    1233  3000   93.1%     chr5   +       149391024  149391207  184
YourSeq  149    1050    1235  3000   88.4%     chr12  -       72699382   72699563   182
YourSeq  145    1066    1234  3000   92.9%     chr12  +       33446349   33446517   169
YourSeq  143    1052    1237  3000   89.6%     chr1   +       136448461  136448714  254
YourSeq  142    671     1210  3000   86.4%     chr19  -       50455860   50456375   516
YourSeq  139    1049    1218  3000   92.2%     chr4   -       126718776  126718946  171
YourSeq  139    1048    1226  3000   86.7%     chr10  -       59660939   59661110   172
YourSeq  137    2441    3000  3000   82.9%     chrX   +       169893308  169893476  169
YourSeq  132    1047    1211  3000   87.9%     chr7   -       29904865   29905023   159
YourSeq  132    2835    3000  3000   92.0%     chr4   +       34134969   34135131   163
YourSeq  131    1086    1233  3000   94.6%     chr10  -       78016026   78016174   149
YourSeq  129    1086    1233  3000   94.0%     chr10  -       116200634  116200784  151
YourSeq  129    2835    2999  3000   93.0%     chr13  +       37325067   37325229   163
YourSeq  128    1086    1233  3000   91.8%     chr3   -       96468572   96468717   146
YourSeq  127    2848    3000  3000   95.8%     chr5   +       20203386   20203546   161
YourSeq  127    1050    1235  3000   88.7%     chr4   +       139932418  139932598  181
YourSeq  127    2837    3000  3000   91.2%     chr11  +       21831894   21832055   162
YourSeq  126    2840    3000  3000   91.7%     chr2   -       115413285  115413449  165

Note: The 3000 bp region downstream of exon 5 was searched against the genome with BLAT. Apart from the full-length self-match on chr17, no significant similarity was found.
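The BLAT tables can be screened programmatically: skip the query's own locus, then flag any remaining hit that covers a large fraction of the 3000 bp query at high identity. The report does not state its significance criteria, so the 50% coverage and 90% identity cut-offs below are hypothetical, chosen only to illustrate the filter.

```python
from typing import List, NamedTuple


class BlatHit(NamedTuple):
    score: int
    q_start: int
    q_end: int
    q_size: int
    identity: float
    chrom: str
    strand: str
    t_start: int
    t_end: int
    span: int


def parse_row(line: str) -> BlatHit:
    """Parse one row laid out as: QUERY SCORE QSTART QEND QSIZE IDENTITY CHROM STRAND TSTART TEND SPAN."""
    f = line.split()
    return BlatHit(int(f[1]), int(f[2]), int(f[3]), int(f[4]), float(f[5].rstrip("%")),
                   f[6], f[7], int(f[8]), int(f[9]), int(f[10]))


def flag_off_targets(hits: List[BlatHit], self_chrom: str, self_start: int,
                     min_cov: float = 0.5, min_identity: float = 90.0) -> List[BlatHit]:
    """Return non-self hits covering >= min_cov of the query at >= min_identity.

    The cut-offs are illustrative assumptions; the report does not state its criteria.
    """
    flagged = []
    for h in hits:
        if h.chrom == self_chrom and h.t_start == self_start:
            continue  # the query's own locus
        coverage = (h.q_end - h.q_start + 1) / h.q_size
        if coverage >= min_cov and h.identity >= min_identity:
            flagged.append(h)
    return flagged


if __name__ == "__main__":
    # First three rows of the upstream table.
    rows = [
        "YourSeq 3000 1 3000 3000 100.0% chr17 - 71455986 71458985 3000",
        "YourSeq 630 1 808 3000 93.5% chr10 + 128085588 128090273 4686",
        "YourSeq 270 1196 2174 3000 95.4% chr5 + 30979551 31270506 290956",
    ]
    hits = [parse_row(r) for r in rows]
    print(flag_off_targets(hits, "chr17", 71455986))  # [] with the default cut-offs
```

With the default cut-offs none of these hits is flagged; loosening the coverage threshold to 25% would flag the chr10 and chr5 hits, which shows how much the conclusion depends on the chosen criteria.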


Gene information: Smchd1, SMC hinge domain containing 1 [Mus musculus (house mouse)]; Gene ID: 74355, updated on 8-Oct-2019

Gene summary

Official Symbol: Smchd1 (provided by MGI)
Official Full Name: SMC hinge domain containing 1 (provided by MGI)
Primary source: MGI:MGI:1921605
See related: Ensembl:ENSMUSG00000024054
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: MommeD1; AW554188; mKIAA0650; 4931400A14Rik
Expression: Ubiquitous expression in CNS E11.5 (RPKM 9.1), placenta adult (RPKM 7.4) and 24 other tissues
Orthologs: human, all

Genomic context

Location: 17; 17 E1.3

Exon count: 48

Annotation release  Status             Assembly                      Chr  Location
108                 current            GRCm38.p6 (GCF_000001635.26)  17   NC_000083.6 (71344489..71475366, complement)
Build 37.2          previous assembly  MGSCv37 (GCF_000001635.18)    17   NC_000083.5 (71693829..71824683, complement)
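The locations use 1-based inclusive coordinates, so gene length is end - start + 1: 130,878 bp on GRCm38.p6. The Ensembl coordinates for the same gene (71,344,489-71,475,343) give 130,855 bp by the same arithmetic, matching the 130.85 kb shown in the Ensembl transcript view. Below is a small parser sketch; the regular expression is an assumption that only covers the "accession (start..end[, complement])" form used in this table.

```python
import re
from typing import NamedTuple


class Locus(NamedTuple):
    accession: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # "-" when the location is marked "complement"


_LOCATION = re.compile(r"(\S+)\s*\((\d+)\.\.(\d+)(,\s*complement)?\)")


def parse_ncbi_location(text: str) -> Locus:
    """Parse strings such as 'NC_000083.6 (71344489..71475366, complement)'."""
    m = _LOCATION.fullmatch(text.strip())
    if m is None:
        raise ValueError(f"unrecognised location: {text!r}")
    accession, start, end, complement = m.groups()
    return Locus(accession, int(start), int(end), "-" if complement else "+")


def length_bp(locus: Locus) -> int:
    """Coordinates are 1-based and inclusive, so add one."""
    return locus.end - locus.start + 1


if __name__ == "__main__":
    for loc in ("NC_000083.6 (71344489..71475366, complement)",
                "NC_000083.5 (71693829..71824683, complement)"):
        parsed = parse_ncbi_location(loc)
        print(parsed.accession, parsed.strand, length_bp(parsed))
```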

[Figure: Chromosome 17 - NC_000083.6]


Transcript information: This gene has 12 transcripts

Gene: Smchd1 ENSMUSG00000024054

Description: SMC hinge domain containing 1 [Source:MGI Symbol;Acc:MGI:1921605]
Gene Synonyms: 4931400A14Rik, MommeD1
Location: Chromosome 17: 71,344,489-71,475,343 reverse strand. GRCm38:CM001010.2
About this gene: This gene has 12 transcripts (splice variants), 227 orthologues, is a member of 1 Ensembl protein family and is associated with 6 phenotypes.

Name        Transcript ID         bp    Protein     Translation ID        Biotype                  CCDS       UniProt     Flags
Smchd1-201  ENSMUST00000127430.1  7052  2007 aa     ENSMUSP00000121835.1  Protein coding           CCDS28958  Q6P5D8      TSL:5, GENCODE basic, APPRIS P1
Smchd1-211  ENSMUST00000233839.1  595   119 aa      ENSMUSP00000156815.1  Nonsense mediated decay  -          A0A3B2W494  CDS 5' incomplete
Smchd1-212  ENSMUST00000233872.1  381   87 aa       ENSMUSP00000156880.1  Nonsense mediated decay  -          A0A3B2WD81  CDS 5' incomplete
Smchd1-206  ENSMUST00000182107.1  4107  No protein  -                     Retained intron          -          -           TSL:NA
Smchd1-208  ENSMUST00000182747.1  2790  No protein  -                     Retained intron          -          -           TSL:NA
Smchd1-207  ENSMUST00000182205.1  2510  No protein  -                     Retained intron          -          -           TSL:NA
Smchd1-205  ENSMUST00000182049.1  2497  No protein  -                     Retained intron          -          -           TSL:NA
Smchd1-204  ENSMUST00000147111.1  871   No protein  -                     Retained intron          -          -           TSL:5
Smchd1-209  ENSMUST00000183046.1  820   No protein  -                     Retained intron          -          -           TSL:NA
Smchd1-202  ENSMUST00000136071.1  411   No protein  -                     Retained intron          -          -           TSL:3
Smchd1-203  ENSMUST00000138193.7  1485  No protein  -                     lncRNA                   -          -           TSL:1
Smchd1-210  ENSMUST00000232914.1  680   No protein  -                     lncRNA                   -          -           -
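Smchd1-201 is the only protein-coding transcript in the table and carries the GENCODE basic and APPRIS P1 flags, which is consistent with its use as the reference transcript in the strategy summary. The ranking below (prefer APPRIS P1, then the longest protein) is a common heuristic assumed here for illustration, not necessarily the rule used for this project; the data are a subset of the table rows.

```python
from typing import List, NamedTuple, Optional, Tuple


class Transcript(NamedTuple):
    name: str
    transcript_id: str
    length_bp: int
    protein_aa: Optional[int]  # None for transcripts with no protein
    biotype: str
    flags: Tuple[str, ...]


# A subset of the rows in the transcript table above.
TRANSCRIPTS = [
    Transcript("Smchd1-201", "ENSMUST00000127430.1", 7052, 2007, "Protein coding",
               ("TSL:5", "GENCODE basic", "APPRIS P1")),
    Transcript("Smchd1-211", "ENSMUST00000233839.1", 595, 119, "Nonsense mediated decay",
               ("CDS 5' incomplete",)),
    Transcript("Smchd1-212", "ENSMUST00000233872.1", 381, 87, "Nonsense mediated decay",
               ("CDS 5' incomplete",)),
    Transcript("Smchd1-203", "ENSMUST00000138193.7", 1485, None, "lncRNA", ("TSL:1",)),
]


def pick_reference(transcripts: List[Transcript]) -> Optional[Transcript]:
    """Prefer protein-coding transcripts flagged APPRIS P1, then the longest protein.

    This ranking is an illustrative heuristic, not necessarily the rule used in the report.
    """
    coding = [t for t in transcripts if t.biotype == "Protein coding" and t.protein_aa]
    if not coding:
        return None
    return max(coding, key=lambda t: ("APPRIS P1" in t.flags, t.protein_aa))


if __name__ == "__main__":
    ref = pick_reference(TRANSCRIPTS)
    print(ref.name, ref.transcript_id, ref.protein_aa)
```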


[Figure: Ensembl region view of chromosome 17, 71.34-71.48 Mb (150.85 kb). Tracks: genes (comprehensive set) on both strands, including 4930471L23Rik-201 (lncRNA), Gm49916-201 and Gm18738-201 (processed pseudogenes), Gm4566-201/-202 (lncRNA), Gm4707-201 (translated processed pseudogene), Gm4707-202 (protein coding), Gm37639-201 and Gm38220-201 (TEC), and the Smchd1 transcripts Smchd1-201 to Smchd1-212 on the reverse strand; contigs AC126942.4 and AC107664.8; Regulatory Build (CTCF, enhancer, open chromatin, promoter, promoter flank). Gene legend: protein coding (merged Ensembl/Havana, Ensembl protein coding); non-protein coding (RNA gene, pseudogene, processed transcript).]


Transcript: ENSMUST00000127430

[Figure: Smchd1-201 (protein coding, reverse strand, 130.85 kb) and its protein product (2007 aa) with feature tracks: MobiDB lite; low complexity (Seg); coiled-coils (Ncoils); Superfamily: Histidine kinase/HSP90-like ATPase superfamily, SMCs flexible hinge superfamily; SMART: SMCs flexible hinge; Pfam PF13589: SMCs flexible hinge; PANTHER: Structural maintenance of chromosomes flexible hinge domain-containing protein 1; Gene3D: Histidine kinase/HSP90-like ATPase superfamily 3.30.70.1620, 1.20.1060.20; CDD cd16937; sequence variants (dbSNP and all other sources): missense, splice region and synonymous variants. Scale: 0-2007 aa.]

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
