https://www.alphaknockout.com

Mouse Smyd4 Conditional Knockout Project (CRISPR/Cas9)

Objective: To create a Smyd4 conditional knockout Mouse model (C57BL/6J) by CRISPR/Cas-mediated genome engineering.

Strategy summary: The Smyd4 gene (NCBI Reference Sequence: NM_001102611; Ensembl: ENSMUSG00000018809) is located on mouse chromosome 11. Eleven exons have been identified, with the ATG start codon in exon 2 and the TAG stop codon in exon 11 (Transcript: ENSMUST00000044530). Exon 5 will be selected as the conditional knockout region (cKO region). Deletion of this region should result in loss of function of the mouse Smyd4 gene. To engineer the targeting vector, the homology arms and the cKO region will be generated by PCR using BAC clone RP23-113D4 as the template. Cas9, gRNA and the targeting vector will be co-injected into fertilized eggs for cKO mouse production. The pups will be genotyped by PCR followed by sequencing analysis. Note: Mice homozygous for a knock-out allele exhibit testicular degeneration and atrophy.

Exon 5 starts about 15.48% of the way into the coding region. Knockout of exon 5 will result in a frameshift of the gene. The size of intron 4 for 5'-loxP site insertion is 2571 bp, and the size of intron 5 for 3'-loxP site insertion is 8373 bp. The size of the effective cKO region is ~1656 bp. The cKO region does not contain any other known gene.
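The arithmetic behind the paragraph above can be sketched as follows. This is only an illustration: it assumes the coding region length is derived from the 799 aa protein of transcript Smyd4-201 plus one stop codon, and the helper names are hypothetical. The coding length of exon 5 is not stated in this report, so the frameshift check is shown on made-up lengths only.

```python
def cds_length_nt(protein_aa: int) -> int:
    """CDS length in nucleotides, counting the stop codon as one extra codon."""
    return (protein_aa + 1) * 3


def approx_cds_position(fraction: float, cds_nt: int) -> int:
    """Approximate 1-based CDS position at a given fraction of the coding region."""
    return round(fraction * cds_nt)


def causes_frameshift(deleted_coding_nt: int) -> bool:
    """A deletion shifts the reading frame when its coding length is not a multiple of 3."""
    return deleted_coding_nt % 3 != 0


cds_nt = cds_length_nt(799)                        # 799 aa taken from the transcript table
exon5_start = approx_cds_position(0.1548, cds_nt)  # about 15.48% into the coding region

print(cds_nt, exon5_start)
# Hypothetical exon lengths, for illustration only:
print(causes_frameshift(100), causes_frameshift(99))
```

Under these assumptions the coding region is 2400 nt and exon 5 begins near coding position 372, so a frameshift this early would be expected to disrupt most of the protein.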


Overview of the Targeting Strategy

[Figure: schematic of the wildtype allele (exons 1, 5 and 11 labeled, with two gRNA regions marked between the 5' and 3' ends), the targeting vector, the targeted allele, and the constitutive KO allele (after Cre recombination). Legend: exon of mouse Smyd4; homology arm; cKO region; loxP site.]


Overview of the Dot Plot

Window size: 10 bp

[Figure: dot plot of Sequence 12 against itself, showing forward and reverse-complement matches.]

Note: The sequence of homologous arms and cKO region is aligned with itself to determine if there are tandem repeats. No significant tandem repeat is found in the dot plot matrix. So this region is suitable for PCR screening or sequencing analysis.
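A self-comparison of this kind can be sketched in a few lines. The snippet below is not the tool used for the report; it is a minimal exact-match version, assuming a 10 bp word size as stated above and covering the forward strand only. The function name `self_dot_plot` is hypothetical.

```python
from collections import defaultdict


def self_dot_plot(seq: str, window: int = 10):
    """Return sorted (i, j) pairs, i < j, where the window-length words at i and j are identical."""
    positions = defaultdict(list)
    for i in range(len(seq) - window + 1):
        positions[seq[i:i + window]].append(i)

    hits = []
    for starts in positions.values():
        # Every pair of positions sharing the same word is one off-diagonal dot.
        for a in range(len(starts)):
            for b in range(a + 1, len(starts)):
                hits.append((starts[a], starts[b]))
    return sorted(hits)


# Toy sequence with one 10 bp tandem duplication, for illustration only.
toy = "ACGTTGCAAG" * 2 + "CCC"
print(self_dot_plot(toy, window=10))
```

Runs of dots parallel to the main diagonal would indicate tandem repeats; an empty or sparse result corresponds to the "no significant tandem repeat" conclusion in the note.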

Overview of the GC Content Distribution

Window size: 300 bp

[Figure: GC content distribution along Sequence 12.]

Summary: Full Length(8156bp) | A(28.27% 2306) | C(20.73% 1691) | T(29.01% 2366) | G(21.98% 1793)

Note: The sequence of homologous arms and cKO region is analyzed to determine the GC content. No significant high GC-content region is found. So this region is suitable for PCR screening or sequencing analysis.
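The base composition in the summary line can be rechecked from the reported counts, and a windowed GC profile can be computed the same way. This is a minimal sketch, assuming a plain sliding window of 300 bp as stated above; the function names are hypothetical and the short test sequence is made up.

```python
def base_percentages(counts: dict) -> dict:
    """Percentage of each base, rounded to two decimals."""
    total = sum(counts.values())
    return {base: round(100 * n / total, 2) for base, n in counts.items()}


def gc_windows(seq: str, window: int = 300, step: int = 1) -> list:
    """GC fraction of each window of the given size along the sequence."""
    seq = seq.upper()
    fractions = []
    for i in range(0, len(seq) - window + 1, step):
        chunk = seq[i:i + window]
        fractions.append((chunk.count("G") + chunk.count("C")) / window)
    return fractions


# Base counts copied from the summary line of this report.
counts = {"A": 2306, "C": 1691, "T": 2366, "G": 1793}
total_bp = sum(counts.values())
gc_percent = round(100 * (counts["G"] + counts["C"]) / total_bp, 2)

print(total_bp, base_percentages(counts), gc_percent)
```

The counts sum to the stated full length of 8156 bp and reproduce the listed percentages, giving an overall GC content of about 42.72%.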


BLAT Search Results (up)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
---------------------------------------------------------------------------------------
YourSeq  3000   1      3000  3000   100.0%    chr11  +       75386823   75389822   3000
YourSeq  224    131    2662  3000   92.8%     chr11  +       61559931   61841033   281103
YourSeq  221    117    2660  3000   84.4%     chr6   -       49038284   49277715   239432
YourSeq  172    130    315   3000   96.8%     chr10  +       127249870  127302073  52204
YourSeq  169    114    316   3000   91.7%     chr1   +       36236651   36236853   203
YourSeq  167    101    312   3000   89.7%     chr7   -       133145576  133145781  206
YourSeq  167    128    316   3000   94.2%     chr13  -       101703639  101703827  189
YourSeq  167    131    316   3000   95.2%     chr10  +       127458322  127458510  189
YourSeq  164    127    315   3000   94.6%     chr17  +       48562746   48562938   193
YourSeq  162    129    316   3000   93.5%     chr1   -       63180560   63180746   187
YourSeq  162    131    316   3000   95.6%     chr18  +       78111870   78112055   186
YourSeq  161    130    316   3000   93.1%     chr10  -       126356464  126356650  187
YourSeq  160    131    315   3000   92.7%     chr2   -       46515402   46515583   182
YourSeq  160    131    315   3000   93.6%     chr14  +       122237051  122237237  187
YourSeq  158    116    321   3000   88.9%     chr10  -       13143180   13143383   204
YourSeq  158    133    316   3000   93.0%     chr8   +       110018925  110019108  184
YourSeq  158    133    316   3000   93.0%     chr19  +       40409122   40409305   184
YourSeq  157    130    312   3000   93.9%     chr4   +       135264696  135264881  186
YourSeq  157    121    303   3000   91.7%     chr11  +       86446572   86446751   180
YourSeq  156    131    313   3000   94.8%     chr7   -       3263301    3263500    200

Note: The 3000 bp section upstream of Exon 5 is BLAT searched against the genome. No significant similarity is found.

BLAT Search Results (down)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
---------------------------------------------------------------------------------------
YourSeq  3000   1      3000  3000   100.0%    chr11  +       75391479   75394478   3000
YourSeq  189    2022   2583  3000   81.9%     chr14  +       18040962   18041499   538
YourSeq  188    2721   3000  3000   88.3%     chr8   -       83958619   83959040   422
YourSeq  183    2721   3000  3000   93.0%     chr1   +       13030565   13030903   339
YourSeq  177    2725   3000  3000   92.3%     chr15  -       102387667  102388186  520
YourSeq  168    2716   3000  3000   92.9%     chr3   +       88263711   88264012   302
YourSeq  165    2721   3000  3000   90.3%     chr17  +       80266356   80266797   442
YourSeq  155    2305   2582  3000   88.6%     chr17  +       5459668    5459945    278
YourSeq  154    2727   3000  3000   87.9%     chr11  +       102158561  102158822  262
YourSeq  145    2738   2999  3000   92.0%     chr11  +       105148456  105149046  591
YourSeq  131    2727   3000  3000   95.2%     chr17  -       28214754   28215243   490
YourSeq  128    2734   3000  3000   94.5%     chr5   +       103675219  103675654  436
YourSeq  124    2442   3000  3000   82.3%     chr6   +       116950456  116950611  156
YourSeq  123    2808   3000  3000   97.7%     chr7   +       16732965   16733167   203
YourSeq  122    2779   3000  3000   96.3%     chr11  -       80092931   80093157   227
YourSeq  118    2772   3000  3000   94.7%     chr7   +       110167510  110167758  249
YourSeq  118    2774   2966  3000   87.3%     chr3   +       90383397   90383559   163
YourSeq  117    2875   3000  3000   94.3%     chr3   -       107603240  107603361  122
YourSeq  117    2875   3000  3000   94.3%     chr3   -       88307478   88307599   122
YourSeq  117    2874   3000  3000   96.1%     chr18  -       35924938   35925064   127

Note: The 3000 bp section downstream of Exon 5 is BLAT searched against the genome. No significant similarity is found.
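The top hits of the two BLAT searches give the genomic positions of the 3000 bp flanks, so the distance between them can be checked against the stated cKO region size. The sketch below assumes the reported coordinates are 1-based and inclusive (consistent with the SPAN column) and that the interval between the two flanks corresponds to the effective cKO region; the second point is an inference, not something the report states. Function names are hypothetical.

```python
def interval_length(start: int, end: int) -> int:
    """Length of a 1-based, inclusive genomic interval."""
    return end - start + 1


def gap_between(upstream_end: int, downstream_start: int) -> int:
    """Number of bases strictly between two 1-based, inclusive intervals."""
    return downstream_start - upstream_end - 1


# chr11 coordinates of the 100% identity hits in the two BLAT tables.
upstream = (75386823, 75389822)
downstream = (75391479, 75394478)

up_len = interval_length(*upstream)
down_len = interval_length(*downstream)
ckO_gap = gap_between(upstream[1], downstream[0])

print(up_len, down_len, ckO_gap)
```

With these coordinates both flanks are 3000 bp and the interval between them is 1656 bp, which agrees with the ~1656 bp effective cKO region given in the strategy summary.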


Gene and information: Smyd4 SET and MYND domain containing 4 [ Mus musculus (house mouse) ] Gene ID: 319822, updated on 12-Aug-2019

Gene summary

Official Symbol: Smyd4 (provided by MGI)
Official Full Name: SET and MYND domain containing 4 (provided by MGI)
Primary source: MGI:MGI:2442796
See related: Ensembl:ENSMUSG00000018809
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: G430029E23Rik
Expression: Ubiquitous expression in CNS E18 (RPKM 2.8), kidney adult (RPKM 2.7) and 28 other tissues
Orthologs: human, all

Genomic context

Location: 11; 11 B5

Exon count: 11

Annotation release  Status             Assembly                       Chr  Location
108                 current            GRCm38.p6 (GCF_000001635.26)   11   NC_000077.6 (75348433..75405707)
Build 37.2          previous assembly  MGSCv37 (GCF_000001635.18)     11   NC_000077.5 (75161935..75219207)

Chromosome 11 - NC_000077.6


Transcript information: This gene has 4 transcripts

Gene: Smyd4 ENSMUSG00000018809

Description: SET and MYND domain containing 4 [Source:MGI Symbol;Acc:MGI:2442796]
Gene Synonyms: G430029E23Rik
Location: Chromosome 11: 75,348,433-75,405,705 forward strand. GRCm38:CM001004.2
About this gene: This gene has 4 transcripts (splice variants), 206 orthologues, 5 paralogues, is a member of 1 Ensembl protein family and is associated with 4 phenotypes.

Transcripts

Name       Transcript ID         bp    Protein     Translation ID        Biotype          CCDS       UniProt  Flags
Smyd4-201  ENSMUST00000044530.2  3517  799aa       ENSMUSP00000047505.2  Protein coding   CCDS48848  Q8BTK5   TSL:1 GENCODE basic APPRIS P1
Smyd4-204  ENSMUST00000157055.1  1172  No protein  -                     Retained intron  -          -        TSL:1
Smyd4-202  ENSMUST00000135774.1  527   No protein  -                     lncRNA           -          -        TSL:2
Smyd4-203  ENSMUST00000145888.1  516   No protein  -                     lncRNA           -          -        TSL:5

[Figure: Ensembl genome browser view of the Smyd4 locus (77.27 kb, 75.34 Mb to 75.40 Mb, forward and reverse strands). Forward strand: Smyd4-201 (protein coding), Smyd4-204 (retained intron), Smyd4-203 and Smyd4-202 (lncRNA). Contigs: AL603834.6, AL591496.13. Reverse-strand genes: Rpa1 (transcripts 201 to 205) and Serpinf1 (transcripts 201 to 205, 207, 208). Regulatory Build legend: CTCF, Enhancer, Open Chromatin, Promoter, Promoter Flank, Transcription Factor Binding Site. Gene legend: protein coding (Ensembl protein coding, merged Ensembl/Havana); non-protein coding (RNA gene, processed transcript).]


Transcript: ENSMUST00000044530

[Figure: Ensembl protein domain view of Smyd4-201 (protein coding; 57.27 kb, forward strand; translation ENSMUSP00000047...; scale 0 to 799 aa). Tracks: Coiled-coils (Ncoils); Superfamily SSF144232, SSF82199 (Tetratricopeptide-like helical domain superfamily); Pfam (SET domain; Zinc finger, MYND-type); PROSITE profiles (Zinc finger, MYND-type; SET domain); PROSITE patterns (Zinc finger, MYND-type); PANTHER PTHR46165, PTHR46165:SF2; Gene3D 3.30.60.180, 2.170.270.10 (Tetratricopeptide-like helical domain superfamily); sequence variants (dbSNP and all other sources). Variant legend: missense variant, splice region variant, synonymous variant.]

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
