https://www.alphaknockout.com

Mouse Spdef Knockout Project (CRISPR/Cas9)

Objective: To create a Spdef knockout mouse model (C57BL/6J) by CRISPR/Cas-mediated genome engineering.

Strategy summary: The Spdef gene (NCBI Reference Sequence: NM_013891; Ensembl: ENSMUSG00000024215) is located on mouse chromosome 17. Six exons are identified, with the ATG start codon in exon 2 and the TGA stop codon in exon 6 (Transcript: ENSMUST00000025054). Exons 2 to 6 will be selected as the target site. Cas9 and gRNA will be co-injected into fertilized eggs for KO mouse production. The pups will be genotyped by PCR followed by sequencing analysis.

Note: Mice homozygous for a null allele have reduced numbers of intestinal and respiratory mucosa goblet cells. Increased inflammation of the gastric antrum has also been seen.

Exon 2 starts at about 0.1% of the coding region. Exons 2 to 6 cover 100.0% of the coding region. The size of the effective KO region is ~5498 bp. The KO region does not contain any other known gene.


Overview of the Targeting Strategy

[Figure: wildtype allele drawn 5' to 3' with exons 1 to 6 and two gRNA regions marked. Legend: exon of mouse Spdef; knockout region.]


Overview of the Dot Plot (up) Window size: 15 bp

[Figure: self-alignment dot plot of the upstream sequence. Legend: forward matches; reverse-complement matches.]

Note: The 2000 bp section upstream of start codon is aligned with itself to determine if there are tandem repeats. Tandem repeats are found in the dot plot matrix. The gRNA site is selected outside of these tandem repeats.

Overview of the Dot Plot (down) Window size: 15 bp

[Figure: self-alignment dot plot of the downstream sequence. Legend: forward matches; reverse-complement matches.]

Note: The 2000 bp section downstream of stop codon is aligned with itself to determine if there are tandem repeats. No significant tandem repeat is found in the dot plot matrix. So this region is suitable for PCR screening or sequencing analysis.
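The self-alignment behind both dot plots can be illustrated with a minimal Python sketch. It is not the tool used for this report: it assumes exact matching of 15 bp words on the forward strand only (the plots also show reverse-complement matches), and the function names and the `max_offset` cutoff are hypothetical choices for illustration.

```python
from collections import defaultdict


def self_dot_plot(seq, window=15):
    """Return sorted (i, j) pairs, i < j, where the window-length words
    starting at positions i and j are identical (forward strand only)."""
    positions = defaultdict(list)
    for i in range(len(seq) - window + 1):
        positions[seq[i:i + window]].append(i)
    matches = []
    for starts in positions.values():
        for a in range(len(starts)):
            for b in range(a + 1, len(starts)):
                matches.append((starts[a], starts[b]))
    return sorted(matches)


def has_tandem_repeat(seq, window=15, max_offset=100):
    """Rough proxy for a tandem repeat: an off-diagonal match lying within
    max_offset bp of the main diagonal. max_offset is an assumed cutoff."""
    return any(j - i <= max_offset for i, j in self_dot_plot(seq, window))


# Toy sequences, not taken from the Spdef locus
repetitive = "ACGTTGCA" * 5
print(has_tandem_repeat(repetitive))
```

Each returned pair corresponds to one dot off the main diagonal; runs of dots parallel and close to the diagonal are what a tandem repeat looks like in the plot.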


Overview of the GC Content Distribution (up) Window size: 300 bp

[Figure: GC content distribution along the 2000 bp upstream sequence.]

Summary: Full Length(2000bp) | A(23.55% 471) | C(26.45% 529) | T(22.55% 451) | G(27.45% 549)

Note: The 2000 bp section upstream of start codon is analyzed to determine the GC content. No significant high GC-content region is found. So this region is suitable for PCR screening or sequencing analysis.

Overview of the GC Content Distribution (down) Window size: 300 bp

[Figure: GC content distribution along the 2000 bp downstream sequence.]

Summary: Full Length(2000bp) | A(22.45% 449) | C(27.15% 543) | T(23.2% 464) | G(27.2% 544)

Note: The 2000 bp section downstream of stop codon is analyzed to determine the GC content. No significant high GC-content region is found. So this region is suitable for PCR screening or sequencing analysis.
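As a worked check, the overall GC content of each flank can be recomputed from the base counts in the two summaries, and a windowed profile can be sketched with the same 300 bp window. This is a minimal illustration only; `gc_fraction` and `sliding_gc` are hypothetical helper names, and the report does not state which threshold it treats as a "high GC-content region".

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a sequence (0.0 for an empty string)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def sliding_gc(seq, window=300, step=1):
    """GC fraction of every window of the given size, moved by step bases."""
    return [gc_fraction(seq[i:i + window])
            for i in range(0, len(seq) - window + 1, step)]


# Overall GC content from the counts reported in the summaries above
upstream_gc = (529 + 549) / 2000     # C + G, upstream of the start codon
downstream_gc = (543 + 544) / 2000   # C + G, downstream of the stop codon
print(f"upstream {upstream_gc:.2%}, downstream {downstream_gc:.2%}")
```

Both flanks come out at roughly 54% GC overall. A windowed profile is still needed to rule out local GC-rich stretches, which is what the plots above are for.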


BLAT Search Results (up)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
---------------------------------------------------------------------------------------
YourSeq  2000   1      2000  2000   100.0%    chr17  -       27720371   27722370   2000
YourSeq  165    12     209   2000   92.4%     chr11  +       53532449   53532654   206
YourSeq  159    21     208   2000   90.8%     chr16  +       23165971   23166155   185
YourSeq  156    16     207   2000   89.2%     chr10  +       80920935   80921120   186
YourSeq  155    5      211   2000   86.0%     chr4   +       118454788  118454990  203
YourSeq  153    29     210   2000   90.6%     chr2   -       173045439  173045618  180
YourSeq  151    24     210   2000   89.9%     chr18  -       67300175   67300356   182
YourSeq  151    19     226   2000   86.7%     chr9   +       72452236   72452433   198
YourSeq  147    22     210   2000   86.5%     chr10  -       70577743   70577927   185
YourSeq  144    9      200   2000   86.1%     chr10  +       93572365   93572546   182
YourSeq  144    16     211   2000   84.8%     chr1   +       59862696   59862880   185
YourSeq  143    9      209   2000   86.9%     chr6   +       120789029  120789226  198
YourSeq  143    27     211   2000   88.4%     chr17  +       86458706   86458889   184
YourSeq  142    24     209   2000   86.3%     chr3   -       131258090  131258264  175
YourSeq  142    11     210   2000   89.1%     chr2   -       119917763  119917972  210
YourSeq  142    17     216   2000   87.8%     chr12  -       70914298   70914497   200
YourSeq  141    21     210   2000   85.6%     chr10  -       31226715   31226898   184
YourSeq  141    27     210   2000   88.0%     chr5   +       105863992  105864171  180
YourSeq  141    16     211   2000   87.0%     chr17  +       83871103   83871293   191
YourSeq  140    1      211   2000   86.6%     chr2   -       104873844  104874071  228

Note: The 2000 bp section upstream of start codon is BLAT searched against the genome. No significant similarity is found.

BLAT Search Results (down)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
---------------------------------------------------------------------------------------
YourSeq  2000   1      2000  2000   100.0%    chr17  -       27712871   27714870   2000
YourSeq  111    1448   1596  2000   90.7%     chr8   -       41438028   41438197   170
YourSeq  105    1458   1604  2000   89.8%     chr2   +       10311563   10312070   508
YourSeq  103    1464   1596  2000   90.0%     chr12  -       92421336   92421469   134
YourSeq  103    1456   1596  2000   87.7%     chr11  +       117854790  117854937  148
YourSeq  99     1457   1596  2000   87.8%     chr3   +       93136884   93137028   145
YourSeq  98     1465   1596  2000   88.6%     chr2   -       100638381  100638527  147
YourSeq  98     1465   1596  2000   88.6%     chr2   -       100471871  100472017  147
YourSeq  98     1457   1596  2000   81.7%     chr11  +       57584687   57584820   134
YourSeq  98     1455   1584  2000   88.4%     chr10  +       95431044   95431179   136
YourSeq  97     1469   1596  2000   88.9%     chr11  -       3940417    3940547    131
YourSeq  97     1457   1595  2000   83.6%     chr10  -       35287721   35287853   133
YourSeq  96     1471   1596  2000   88.8%     chr19  +       5025023    5025174    152
YourSeq  96     1464   1595  2000   83.9%     chr1   +       135223748  135223875  128
YourSeq  96     1464   1596  2000   84.4%     chr1   +       86001951   86002081   131
YourSeq  95     1453   1584  2000   86.7%     chr4   +       14866600   14866729   130
YourSeq  93     1456   1584  2000   89.9%     chr16  +       81129231   81129367   137
YourSeq  92     1456   1583  2000   88.3%     chr17  -       80495509   80495638   130
YourSeq  92     1481   1629  2000   82.6%     chr17  -       26966364   26966489   126
YourSeq  92     1490   1638  2000   78.3%     chr8   +       33952688   33952811   124

Note: The 2000 bp section downstream of stop codon is BLAT searched against the genome. No significant similarity is found.
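One way to read the BLAT tables programmatically is to separate the query's own locus from the secondary hits and compare the best secondary score with the query length. The sketch below uses three rows copied from the upstream table. The report does not state its criterion for "significant similarity", so the score-to-query-size ratio here is an illustrative assumption, and `BlatHit`, `parse_row`, `is_self_hit` and `best_secondary_fraction` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class BlatHit:
    score: int
    q_start: int
    q_end: int
    q_size: int
    identity: float  # percent identity
    chrom: str
    strand: str
    t_start: int
    t_end: int
    span: int


def parse_row(row):
    """Parse one whitespace-separated row of the BLAT summary table."""
    f = row.split()
    return BlatHit(int(f[1]), int(f[2]), int(f[3]), int(f[4]),
                   float(f[5].rstrip("%")), f[6], f[7],
                   int(f[8]), int(f[9]), int(f[10]))


def is_self_hit(hit):
    """The query's own locus: full-length score at 100% identity."""
    return hit.score == hit.q_size and hit.identity == 100.0


def best_secondary_fraction(hits):
    """Highest score / query size among hits other than the self hit."""
    others = [h.score / h.q_size for h in hits if not is_self_hit(h)]
    return max(others, default=0.0)


rows = [
    "YourSeq 2000 1 2000 2000 100.0% chr17 - 27720371 27722370 2000",
    "YourSeq 165 12 209 2000 92.4% chr11 + 53532449 53532654 206",
    "YourSeq 159 21 208 2000 90.8% chr16 + 23165971 23166155 185",
]
hits = [parse_row(r) for r in rows]
print(best_secondary_fraction(hits))  # 165 / 2000 for these rows
```

In both tables the secondary hits cover only about 200 bp of the 2000 bp query at below 93% identity, which is consistent with the notes that no significant similarity is found.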


Gene information: Spdef, SAM pointed domain containing ets transcription factor [Mus musculus (house mouse)]. Gene ID: 30051, updated on 24-Oct-2019

Gene summary

Official Symbol: Spdef (provided by MGI)
Official Full Name: SAM pointed domain containing ets transcription factor (provided by MGI)
Primary source: MGI:MGI:1353422
See related: Ensembl:ENSMUSG00000024215
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: Pse; PDEF
Expression: Biased expression in colon adult (RPKM 65.2), stomach adult (RPKM 24.2) and 4 other tissues
Orthologs: human

Genomic context

Location: 17; 17 A3.3
Exon count: 10

Annotation release  Status             Assembly                      Chr  Location
108                 current            GRCm38.p6 (GCF_000001635.26)  17   NC_000083.6 (27714446..27731003, complement)
Build 37.2          previous assembly  MGSCv37 (GCF_000001635.18)    17   NC_000083.5 (27851392..27865896, complement)

[Figure: genomic context of Spdef on Chromosome 17 - NC_000083.6]


Transcript information: This gene has 6 transcripts

Gene: Spdef ENSMUSG00000024215

Description: SAM pointed domain containing ets transcription factor [Source:MGI Symbol;Acc:MGI:1353422]
Gene Synonyms: PDEF, Pse
Location: Chromosome 17: 27,714,352-27,728,955 reverse strand. GRCm38:CM001010.2
About this gene: This gene has 6 transcripts (splice variants), 226 orthologues, 27 paralogues, is a member of 1 Ensembl protein family and is associated with 13 phenotypes.

Transcripts

Name       Transcript ID         bp    Protein     Translation ID        Biotype          CCDS       UniProt     Flags
Spdef-201  ENSMUST00000025054.9  1921  325aa       ENSMUSP00000025054.2  Protein coding   CCDS28568  Q9WTP3      TSL:1 GENCODE basic APPRIS P2
Spdef-205  ENSMUST00000167489.1  1708  325aa       ENSMUSP00000127056.1  Protein coding   CCDS28568  Q9WTP3      TSL:5 GENCODE basic APPRIS P2
Spdef-202  ENSMUST00000114870.8  1568  309aa       ENSMUSP00000110520.2  Protein coding   -          A0A3F2YNL9  TSL:5 GENCODE basic APPRIS ALT2
Spdef-204  ENSMUST00000138970.2  620   126aa       ENSMUSP00000117743.1  Protein coding   -          B2KF87      CDS 3' incomplete TSL:5
Spdef-206  ENSMUST00000233880.1  1200  No protein  -                     Retained intron  -          -           -
Spdef-203  ENSMUST00000127622.2  1045  No protein  -                     Retained intron  -          -           TSL:2


[Figure: Ensembl genome browser view of a 34.60 kb region (27.71 Mb to 27.73 Mb).
Forward strand: Pacsin1-201, -202, -203, -204, -206, -208, -211 (protein coding); Pacsin1-207 (lncRNA); Pacsin1-209, -210 (retained intron); Gm15458-201 (lncRNA).
Contig: AC131800.4.
Reverse strand: Spdef-201, -202, -204, -205 (protein coding); Spdef-203, -206 (retained intron).
Regulatory Build legend: CTCF; open chromatin; promoter; promoter flank.
Gene legend: protein coding (Ensembl protein coding; merged Ensembl/Havana); non-protein coding (RNA gene; processed transcript).]


Transcript: ENSMUST00000025054

[Figure: Ensembl protein summary for Spdef-201 (protein coding, reverse strand, 14.60 kb); protein ENSMUSP00000025...; scale 0 to 325 aa. Annotated features:
MobiDB lite
Superfamily: Sterile alpha motif/pointed domain superfamily; Winged helix DNA-binding domain superfamily
SMART: Pointed domain; Ets domain
Prints: Ets domain
Pfam: Pointed domain; Ets domain
PROSITE profiles: Pointed domain; Ets domain
PROSITE patterns: Ets domain; Ets domain
PANTHER: PTHR11849; PTHR11849:SF182
Gene3D: Sterile alpha motif/pointed domain superfamily; Winged helix-like DNA-binding domain superfamily
CDD: cd08532
Sequence variants (dbSNP and all other sources): missense variant; synonymous variant]

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
