https://www.alphaknockout.com

Mouse Wfdc1 Knockout Project (CRISPR/Cas9)

Objective: To create a Wfdc1 knockout mouse model (C57BL/6J) by CRISPR/Cas9-mediated genome engineering.

Strategy summary: The Wfdc1 gene (NCBI Reference Sequence: NM_023395; Ensembl: ENSMUSG00000023336) is located on mouse chromosome 8. Seven exons are identified, with the ATG start codon in exon 1 and the TAA stop codon in exon 6 (transcript: ENSMUST00000024107). Exons 2 to 5 are selected as the target site. Cas9 and gRNA will be co-injected into fertilized eggs to produce KO mice. The pups will be genotyped by PCR followed by sequencing analysis. Note: Mice homozygous for a null allele exhibit decreased susceptibility to influenza A virus infection and enhanced wound healing.
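The report co-injects Cas9 and gRNA but does not list the guide sequences or the Cas9 variant. As a hedged illustration of how candidate gRNA sites can be enumerated in a target region, the sketch below assumes the commonly used SpCas9, whose guides are 20 nt protospacers lying immediately 5' of an NGG PAM. The function `find_spcas9_sites` and the `Protospacer` type are hypothetical names, and the sketch does no off-target or efficiency scoring.

```python
# Minimal sketch: list candidate SpCas9 protospacers (20 nt followed by an NGG PAM)
# on both strands of a DNA sequence. Coordinates are 0-based on the forward strand.
# Assumption: SpCas9 / NGG PAM; the report does not name the Cas9 variant.
from typing import List, NamedTuple

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


class Protospacer(NamedTuple):
    start: int   # 0-based start of the spacer on the forward strand
    strand: str  # "+" or "-"
    spacer: str  # spacer written 5'->3' on its own strand
    pam: str     # 3-nt PAM written 5'->3' on the same strand


def find_spcas9_sites(seq: str, spacer_len: int = 20) -> List[Protospacer]:
    seq = seq.upper()
    n = len(seq)
    sites: List[Protospacer] = []
    # Forward strand: spacer seq[i:i+L], PAM seq[i+L:i+L+3] must be NGG.
    for i in range(n - spacer_len - 3 + 1):
        pam = seq[i + spacer_len:i + spacer_len + 3]
        if pam[1:] == "GG":
            sites.append(Protospacer(i, "+", seq[i:i + spacer_len], pam))
    # Reverse strand: scan the reverse complement, then map back to forward coordinates.
    rc = reverse_complement(seq)
    for j in range(n - spacer_len - 3 + 1):
        pam = rc[j + spacer_len:j + spacer_len + 3]
        if pam[1:] == "GG":
            # rc[j:j+L] corresponds to forward positions n-j-L .. n-j-1
            sites.append(Protospacer(n - j - spacer_len, "-", rc[j:j + spacer_len], pam))
    return sites
```

For example, `find_spcas9_sites("A" * 20 + "TGG")` returns a single forward-strand site starting at position 0 with PAM `TGG`.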

Exon 2 starts at about 19.12% of the coding region, and exons 2 to 5 cover 72.2% of the coding region. The effective KO region is ~4524 bp and contains no other known gene.
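To translate these percentages into approximate base counts, one can assume that "coding region" means the full CDS of Wfdc1-201 (211 aa, per the transcript table), i.e. 211 sense codons plus the TAA stop codon, 636 bp. The report does not state this definition explicitly, so the numbers below are a back-of-the-envelope sketch under that assumption.

```python
# Approximate conversion of the coding-region percentages to base pairs.
# Assumption (not stated in the report): "coding region" = full CDS of
# Wfdc1-201, i.e. 211 sense codons plus the stop codon.
PROTEIN_AA = 211                       # Wfdc1-201 protein length
CDS_BP = (PROTEIN_AA + 1) * 3          # 212 codons x 3 nt = 636 bp

cds_before_exon2 = 0.1912 * CDS_BP     # CDS bases upstream of exon 2 (in exon 1)
cds_in_exons_2_to_5 = 0.722 * CDS_BP   # CDS bases removed together with exons 2 to 5

print(CDS_BP, round(cds_before_exon2), round(cds_in_exons_2_to_5))  # 636 122 459
```

Under this assumption, about 122 bp of the CDS lie upstream of exon 2 and about 459 bp fall within exons 2 to 5.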


Overview of the Targeting Strategy

[Figure: targeting strategy on the wildtype Wfdc1 allele (5' to 3'), showing the exons of mouse Wfdc1, the two gRNA regions, and the knockout region between them.]


Overview of the Dot Plot (up), window size: 15 bp

[Figure: dot plot of the 2000 bp upstream sequence aligned with itself, showing forward and reverse-complement matches.]

Note: The 2000 bp section upstream of exon 2 is aligned with itself to determine whether it contains tandem repeats. Tandem repeats are found in the dot plot matrix, so the gRNA site is selected outside of these repeats.
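A dot plot of this kind compares every 15 bp word of the sequence with every other word. The sketch below is one minimal way to compute the dots; it assumes exact word matches (real dot-plot tools may tolerate mismatches), and the function names are hypothetical.

```python
# Self-alignment dot plot with a fixed word size w (15 bp in the report).
# A forward dot (i, j) means the w-mer at i equals the w-mer at j; a
# reverse-complement dot means the w-mer at i equals the reverse complement
# of the w-mer at j. Exact matching is an assumption of this sketch.
from collections import defaultdict
from typing import Dict, List, Set, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")
Dot = Tuple[int, int]


def word_positions(seq: str, w: int) -> Dict[str, List[int]]:
    """Map every w-mer in seq to the 0-based positions where it starts."""
    positions: Dict[str, List[int]] = defaultdict(list)
    for j in range(len(seq) - w + 1):
        positions[seq[j:j + w]].append(j)
    return positions


def dot_plot(seq: str, w: int = 15) -> Tuple[Set[Dot], Set[Dot]]:
    """Return (forward_dots, reverse_complement_dots) for seq against itself."""
    seq = seq.upper()
    positions = word_positions(seq, w)
    forward: Set[Dot] = set()
    revcomp: Set[Dot] = set()
    for i in range(len(seq) - w + 1):
        word = seq[i:i + w]
        for j in positions.get(word, []):
            forward.add((i, j))
        rc_word = word.translate(COMPLEMENT)[::-1]
        for j in positions.get(rc_word, []):
            revcomp.add((i, j))
    return forward, revcomp
```

The main diagonal (i == j) is the trivial self-match; a tandem repeat with period d adds diagonal runs at j = i + d and j = i - d.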

Overview of the Dot Plot (down), window size: 15 bp

[Figure: dot plot of the 2000 bp downstream sequence aligned with itself, showing forward and reverse-complement matches.]

Note: The 2000 bp section downstream of exon 5 is aligned with itself to determine whether it contains tandem repeats. Tandem repeats are found in the dot plot matrix, so the gRNA site is selected outside of these repeats.
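In a self dot plot, tandem repeats appear as short diagonal lines running parallel to the main diagonal. A simple way to flag them without drawing the plot is to compare the sequence with itself shifted by a small period d and look for long runs of agreement. The sketch below does this; `tandem_repeats` is a hypothetical name, and the `max_period` and `min_len` defaults are illustrative assumptions, not the report's settings.

```python
# Flag candidate tandem repeats: stretches where seq[k] == seq[k + d] for a
# small period d. In a self dot plot these are diagonal runs offset by d from
# the main diagonal. Default thresholds are illustrative assumptions.
from typing import List, Tuple


def tandem_repeats(seq: str, max_period: int = 50, min_len: int = 15) -> List[Tuple[int, int, int]]:
    """Return (start, end, period); seq[start:end] is the repeated region (0-based, end-exclusive)."""
    seq = seq.upper()
    n = len(seq)
    hits: List[Tuple[int, int, int]] = []
    for d in range(1, min(max_period, n - 1) + 1):
        run = 0  # length of the current run of positions with seq[k] == seq[k + d]
        for k in range(n - d):
            if seq[k] == seq[k + d]:
                run += 1
            else:
                if run >= min_len:
                    hits.append((k - run, k + d, d))
                run = 0
        if run >= min_len:  # a run that reaches the end of the sequence
            hits.append((n - d - run, n, d))
    return hits
```

Multiples of a true period (for example d = 4, 6, ... for a CA dinucleotide repeat) are reported as well, so in practice one would keep only the smallest period for each region.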


Overview of the GC Content Distribution (up), window size: 300 bp

[Figure: GC content distribution along the 2000 bp upstream sequence.]

Summary: Full Length(2000bp) | A(24.65% 493) | C(25.55% 511) | T(27.2% 544) | G(22.6% 452)

Note: The 2000 bp section upstream of exon 2 is analyzed to determine its GC content. No region of significantly high GC content is found, so this region is suitable for PCR screening or sequencing analysis.
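The GC content curve is a sliding-window calculation. The sketch below computes the GC percentage in 300 bp windows using a prefix sum and flags windows at or above a cut-off. The 70% cut-off and the function names are assumptions for illustration, since the report does not define "significant high GC content".

```python
# Sliding-window GC content (300 bp windows, as in the report) via prefix sums.
# The 70% cut-off in high_gc_windows is an assumed threshold.
from itertools import accumulate
from typing import List, Tuple


def gc_profile(seq: str, window: int = 300, step: int = 1) -> List[Tuple[int, float]]:
    """Return (0-based window start, GC %) for every window of the given size."""
    seq = seq.upper()
    if len(seq) < window:
        return []
    # prefix[k] = number of G/C bases in seq[:k]
    prefix = [0] + list(accumulate(1 if base in "GC" else 0 for base in seq))
    return [
        (start, 100.0 * (prefix[start + window] - prefix[start]) / window)
        for start in range(0, len(seq) - window + 1, step)
    ]


def high_gc_windows(seq: str, window: int = 300, cutoff: float = 70.0) -> List[Tuple[int, float]]:
    """Windows whose GC % is at or above the (assumed) cut-off."""
    return [(start, gc) for start, gc in gc_profile(seq, window) if gc >= cutoff]
```

The prefix sum makes each window an O(1) lookup, so a 2000 bp section is scanned in a single pass.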

Overview of the GC Content Distribution (down), window size: 300 bp

[Figure: GC content distribution along the 2000 bp downstream sequence.]

Summary: Full Length(2000bp) | A(24.7% 494) | C(41.8% 836) | T(26.05% 521) | G(7.45% 149)

Note: The 2000 bp section downstream of exon 5 is analyzed to determine its GC content. No region of significantly high GC content is found, so this region is suitable for PCR screening or sequencing analysis.
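The Summary lines are plain base counts over each 2000 bp section. In both sections the four counts add up to 2000, and overall GC content is 48.15% upstream (511 C + 452 G) and 49.25% downstream (836 C + 149 G). The hypothetical helper below reproduces the Summary format from a raw sequence.

```python
# Reproduce the report's base-composition "Summary" line from a raw sequence.
from collections import Counter
from typing import Tuple


def composition_summary(seq: str) -> Tuple[str, float]:
    """Return (summary line in the report's format, overall GC %)."""
    seq = seq.upper()
    total = len(seq)
    counts = Counter(seq)
    parts = [f"Full Length({total}bp)"]
    for base in "ACTG":  # same base order as the report
        n = counts[base]
        parts.append(f"{base}({100 * n / total:g}% {n})")
    gc_percent = 100 * (counts["G"] + counts["C"]) / total
    return " | ".join(parts), gc_percent
```

Feeding it a sequence with the downstream counts (494 A, 836 C, 521 T, 149 G) yields the downstream Summary line and a GC content of 49.25%.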


BLAT Search Results (up)

QUERY    SCORE  QSTART  QEND  QSIZE  IDENTITY  CHROM  STRAND  CHR_START    CHR_END    SPAN
YourSeq   2000       1  2000   2000    100.0%  chr8   +       119677269  119679268    2000
YourSeq    182     653   994   2000     90.4%  chr12  -        85647384   85989556  342173
YourSeq    175     646  1005   2000     87.4%  chr2   -       115789289  115789807     519
YourSeq    163     714  1099   2000     81.6%  chr15  +        39159665   39160031     367
YourSeq    153     342   802   2000     80.2%  chr10  -        85953782   85954145     364
YourSeq    148     150   773   2000     82.3%  chr15  +        77868328   77868846     519
YourSeq    144     599   806   2000     87.3%  chr7   +        67584320   67584512     193
YourSeq    142     598   802   2000     90.3%  chr11  +        58111608   58111969     362
YourSeq    137     625   802   2000     90.6%  chr4   -        82964194   82964373     180
YourSeq    133     599   802   2000     82.8%  chr4   +       125088436  125088615     180
YourSeq    131     624   802   2000     89.8%  chr16  +        31497367   31497549     183
YourSeq    128     819  1137   2000     89.1%  chr8   -        46944550   46944881     332
YourSeq    127     645   806   2000     86.8%  chr19  -        45948303   45948461     159
YourSeq    127     640   802   2000     89.6%  chr1   +        89027973   89028140     168
YourSeq    126     653   814   2000     88.9%  chr11  -         5812775    5812936     162
YourSeq    124     617   802   2000     87.4%  chr9   -        44919677   44919940     264
YourSeq    124     648   800   2000     90.9%  chr11  +        69528465   69528619     155
YourSeq    121     490  1091   2000     80.9%  chr2   -        92128572   92129146     575
YourSeq    118     647   802   2000     86.0%  chr19  +        29670760   29670911     152
YourSeq    117     858  1079   2000     89.3%  chr13  +        53299246   53299490     245

Note: The 2000 bp section upstream of exon 2 is BLAT-searched against the genome. Apart from the 100% self-match on chr8, no significant similarity is found.
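BLAT rows like these can be screened programmatically. The sketch below parses rows in the column order of the BLAT results, drops the self-hit (the query's own chr8 location), and reports how much of the 2000 bp query the best remaining hit covers. Coordinates are treated as 1-based and inclusive, consistent with the self-hit spanning 1 to 2000. The report gives no numeric significance threshold, so none is applied; all names are hypothetical.

```python
# Screen UCSC BLAT result rows: remove the self-hit and compute query coverage.
from typing import List, NamedTuple


class BlatHit(NamedTuple):
    score: int
    qstart: int      # query start, 1-based, inclusive
    qend: int
    qsize: int
    identity: float  # percent
    chrom: str
    strand: str
    tstart: int      # genomic start, 1-based, inclusive
    tend: int
    span: int


def parse_row(line: str) -> BlatHit:
    f = line.split()  # f[0] is the query name, e.g. "YourSeq"
    return BlatHit(int(f[1]), int(f[2]), int(f[3]), int(f[4]),
                   float(f[5].rstrip("%")), f[6], f[7],
                   int(f[8]), int(f[9]), int(f[10]))


def query_coverage(hit: BlatHit) -> float:
    """Fraction of the query spanned by the aligned block."""
    return (hit.qend - hit.qstart + 1) / hit.qsize


def off_target(hits: List[BlatHit], chrom: str, start: int, end: int) -> List[BlatHit]:
    """Drop the hit that maps back to the query's own location."""
    return [h for h in hits if not (h.chrom == chrom and h.tstart == start and h.tend == end)]


rows = [
    "YourSeq 2000 1 2000 2000 100.0% chr8 + 119677269 119679268 2000",
    "YourSeq 182 653 994 2000 90.4% chr12 - 85647384 85989556 342173",
    "YourSeq 175 646 1005 2000 87.4% chr2 - 115789289 115789807 519",
]
hits = off_target([parse_row(r) for r in rows], "chr8", 119677269, 119679268)
best = max(hits, key=lambda h: h.score)
print(best.chrom, best.score, round(query_coverage(best), 3))  # chr12 182 0.171
```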

BLAT Search Results (down)

QUERY    SCORE  QSTART  QEND  QSIZE  IDENTITY  CHROM  STRAND  CHR_START    CHR_END    SPAN
YourSeq   2000       1  2000   2000    100.0%  chr8   +       119683793  119685792    2000
YourSeq    680     594  1783   2000     92.9%  chr5   +       118336235  118338093    1859
YourSeq    601     266  1751   2000     92.7%  chr5   +       118336323  118338093    1771
YourSeq    557     613  1510   2000     92.8%  chr13  +       103756157  103757130     974
YourSeq    527     433  1366   2000     93.9%  chr4   -       142110991  142112032    1042
YourSeq    507     746  1626   2000     91.3%  chr13  +       103756170  103756991     822
YourSeq    499     448  1614   2000     91.4%  chr10  -        79641164   79642256    1093
YourSeq    493     668  1503   2000     91.7%  chr6   +       137308991  137309711     721
YourSeq    481     880  1715   2000     93.7%  chr4   -       142111064  142112032     969
YourSeq    474     675  1699   2000     92.2%  chr5   +       118336220  118337349    1130
YourSeq    462     610  1451   2000     91.4%  chr1   +       164539481  164540258     778
YourSeq    453     423  1321   2000     93.7%  chr13  +       103756181  103757129     949
YourSeq    449     418  1614   2000     94.7%  chr5   +       118336343  118337939    1597
YourSeq    444     366  1319   2000     89.8%  chr10  -        79641164   79642132     969
YourSeq    439     666  1562   2000     87.1%  chr16  +        93654601   93655173     573
YourSeq    430     390  1482   2000     94.1%  chr5   -       113656888  113658071    1184
YourSeq    415     687  1618   2000     91.8%  chr11  +        33727864   33728568     705
YourSeq    414     708  1626   2000     95.5%  chr4   -       142111006  142112032    1027
YourSeq    414     401  1172   2000     92.9%  chr13  +       103756195  103757128     934
YourSeq    406     382  1198   2000     93.1%  chr5   +       118336775  118338013    1239

Note: The 2000 bp section downstream of exon 5 is BLAT-searched against the genome. Apart from the 100% self-match on chr8, no significant similarity is found.
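The two 100% self-hits on chr8 also pin down the knockout interval. Assuming the two 2000 bp windows sit directly against the deleted region (the report describes them as the sections upstream of exon 2 and downstream of exon 5), and that the coordinates are GRCm38 as in the Genomic context table, the bases strictly between them reproduce the ~4524 bp effective KO size from the strategy summary.

```python
# Knockout interval implied by the two BLAT self-hits on chr8 (1-based, inclusive).
# Assumptions: GRCm38 coordinates; the flanking windows abut the deleted region.
UP_WINDOW = (119677269, 119679268)    # 2000 bp upstream of exon 2
DOWN_WINDOW = (119683793, 119685792)  # 2000 bp downstream of exon 5

ko_start = UP_WINDOW[1] + 1           # first base after the upstream window
ko_end = DOWN_WINDOW[0] - 1           # last base before the downstream window
ko_size = ko_end - ko_start + 1       # inclusive length

print(f"chr8:{ko_start}-{ko_end} ({ko_size} bp)")  # chr8:119679269-119683792 (4524 bp)
```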


Gene information: Wfdc1 WAP four-disulfide core domain 1 [Mus musculus (house mouse)], Gene ID: 67866, updated on 12-Aug-2019

Gene summary

Official Symbol: Wfdc1 (provided by MGI)
Official Full Name: WAP four-disulfide core domain 1 (provided by MGI)
Primary source: MGI:MGI:1915116
See related: Ensembl:ENSMUSG00000023336
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: ps20; 2310058A03Rik
Expression: Broad expression in bladder adult (RPKM 63.2), colon adult (RPKM 55.9) and 18 other tissues
Orthologs: human, all

Genomic context

Location: chromosome 8; 8 E1
Exon count: 7

Annotation release  Status             Assembly                      Chr  Location
108                 current            GRCm38.p6 (GCF_000001635.26)  8    NC_000074.6 (119666365..119688020)
Build 37.2          previous assembly  MGSCv37 (GCF_000001635.18)    8    NC_000074.5 (122190265..122211920)
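Both rows describe the same gene, so the span is identical (21,656 bp) and, at this locus, the two assemblies differ by a constant shift of 2,523,900 bp. The sketch below checks that arithmetic. It is only valid for this gene: converting arbitrary coordinates between MGSCv37 and GRCm38 should use a chain-file based tool such as UCSC liftOver or NCBI Remap, because the offset generally varies along the chromosome.

```python
# Compare the Wfdc1 gene coordinates on the two mouse assemblies (1-based, inclusive).
GRCM38 = (119666365, 119688020)   # NC_000074.6, annotation release 108
MGSCV37 = (122190265, 122211920)  # NC_000074.5, Build 37.2


def span(interval):
    """Inclusive length of a (start, end) interval."""
    start, end = interval
    return end - start + 1


shift_at_start = MGSCV37[0] - GRCM38[0]
shift_at_end = MGSCV37[1] - GRCM38[1]
print(span(GRCM38), span(MGSCV37), shift_at_start, shift_at_end)  # 21656 21656 2523900 2523900
```

The Ensembl record for the same gene ends at 119,688,222 instead, which gives 21,858 bp, i.e. the 21.86 kb shown in the transcript view.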

[Figure: Chromosome 8, NC_000074.6]


Transcript information: This gene has 2 transcripts

Gene: Wfdc1 ENSMUSG00000023336

Description: WAP four-disulfide core domain 1 [Source:MGI Symbol;Acc:MGI:1915116]
Gene Synonyms: 2310058A03Rik, ps20
Location: Chromosome 8: 119,666,365-119,688,222 forward strand. GRCm38:CM001001.2
About this gene: This gene has 2 transcripts (splice variants), 183 orthologues, is a member of 1 Ensembl protein family and is associated with 13 phenotypes.
Transcripts:

Name       Transcript ID           bp  Protein  Translation ID        Biotype         CCDS       UniProt         Flags
Wfdc1-201  ENSMUST00000024107.6  1371  211aa    ENSMUSP00000024107.5  Protein coding  CCDS22711  Q3UQ76, Q9ESH5  TSL:1, GENCODE basic, APPRIS P2
Wfdc1-202  ENSMUST00000212901.1   921  201aa    ENSMUSP00000148437.1  Protein coding  -          Q9CQZ8          TSL:1, GENCODE basic, APPRIS ALT2

[Figure: Ensembl view of the Wfdc1 locus (41.86 kb; 119.66 to 119.69 Mb). Forward strand: transcripts Wfdc1-201 and Wfdc1-202 (protein coding); contig AC104882.30. Reverse strand: lncRNA Gm32352-201. Regulatory Build track, legend: CTCF, enhancer, open chromatin, promoter, promoter flank, transcription factor binding site. Gene legend: protein coding (Ensembl protein coding; merged Ensembl/Havana); non-protein coding (RNA gene).]


Transcript: ENSMUST00000024107

[Figure: Ensembl transcript view of Wfdc1-201 (21.86 kb, forward strand, protein coding) with protein-level tracks on a 0 to 211 aa scale: MobiDB lite; low complexity (Seg); cleavage site (Sign...); Superfamily and Gene3D: Elafin-like superfamily; SMART, Pfam and PROSITE profiles: WAP-type 'four-disulfide core' domain; PANTHER: WAP four-disulfide core domain protein 1. Variant track: sequence variants (dbSNP and all other sources); legend: missense variant, synonymous variant.]

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
