https://www.alphaknockout.com

Mouse Reg4 Knockout Project (CRISPR/Cas9)

Objective: To create a Reg4 knockout mouse model (C57BL/6N) by CRISPR/Cas-mediated genome engineering.

Strategy summary: The Reg4 gene (NCBI Reference Sequence: NM_026328; Ensembl: ENSMUSG00000027876) is located on mouse chromosome 3. Six exons have been identified, with the ATG start codon in exon 2 and the TAG stop codon in exon 6 (Transcript: ENSMUST00000029469). Exon 3~4 will be selected as the target site. Cas9 and gRNA will be co-injected into fertilized eggs for KO mouse production. The pups will be genotyped by PCR followed by sequencing analysis.

Note: Exon 3 starts at about 14.44% of the coding region. Exon 3~4 covers 49.47% of the coding region. The size of the effective KO region is ~1502 bp. The KO region does not contain any other known gene.
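The percentages in the note can be converted back into approximate base counts. The sketch below is a rough check only. It assumes the coding region is counted as 157 codons x 3 = 471 bp (the 157 aa protein length listed for ENSMUST00000029469, stop codon excluded), which the report does not state. The names `CDS_LENGTH_BP` and `percent_to_bp` are illustrative.

```python
# ASSUMPTION: coding length = 157 codons x 3 = 471 bp (stop codon excluded).
# The report gives only percentages; the base counts below are back-calculated.
PROTEIN_LENGTH_AA = 157
CDS_LENGTH_BP = PROTEIN_LENGTH_AA * 3


def percent_to_bp(percent, cds_length=CDS_LENGTH_BP):
    """Convert a percentage of the coding region into a whole number of bases."""
    return round(percent / 100 * cds_length)


# Coding bases that lie before exon 3 (14.44% of the coding region).
coding_bp_before_exon3 = percent_to_bp(14.44)   # 68
# Coding bases carried by exons 3 and 4 together (49.47%).
coding_bp_in_exons_3_4 = percent_to_bp(49.47)   # 233
# A deleted coding length that is not a multiple of 3 shifts the reading frame.
shifts_reading_frame = coding_bp_in_exons_3_4 % 3 != 0

print(coding_bp_before_exon3, coding_bp_in_exons_3_4, shifts_reading_frame)
```

Under that assumption, exons 3 and 4 carry about 233 coding bases, which is not a multiple of 3, so their removal would be expected to shift the reading frame if exon 2 splices to exon 5. This is an inference from rounded percentages, not a statement made in the report.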


Overview of the Targeting Strategy

[Figure: wild-type allele drawn 5' to 3', with exons 1, 3, 4 and 6 labeled and two gRNA regions marked. Legend: exon of mouse Reg4; knockout region.]


Overview of the Dot Plot (up)

Window size: 15 bp

[Figure: self dot plot, axis label "Sequence 12". Legend: Forward; Reverse Complement.]

Note: The 2000 bp section upstream of Exon 3 is aligned with itself to determine if there are tandem repeats. Tandem repeats are found in the dot plot matrix. The gRNA site is selected outside of these tandem repeats.

Overview of the Dot Plot (down)

Window size: 15 bp

[Figure: self dot plot, axis label "Sequence 12". Legend: Forward; Reverse Complement.]

Note: The 1719 bp section downstream of Exon 4 is aligned with itself to determine if there are tandem repeats. No significant tandem repeat is found in the dot plot matrix, so this region is suitable for PCR screening or sequencing analysis.
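The self-alignment described in the two dot plot notes can be sketched as a word-match search: every pair of positions whose 15 bp windows are identical becomes one dot, and runs of dots parallel to the main diagonal indicate tandem repeats. The code below is a minimal illustration only. It covers forward-strand matches, the function name `self_dot_plot` and the toy sequence are hypothetical, and the report does not say which tool produced its plots.

```python
def self_dot_plot(seq, window=15):
    """Return sorted (i, j) pairs, i < j, whose window-long words are identical.

    Forward-strand matches only; the trivial main diagonal (i == j) is omitted.
    """
    seq = seq.upper()
    positions = {}
    # Index every window-long word by its start position.
    for i in range(len(seq) - window + 1):
        positions.setdefault(seq[i:i + window], []).append(i)
    hits = []
    # Each pair of start positions sharing a word is one off-diagonal dot.
    for starts in positions.values():
        for a in range(len(starts)):
            for b in range(a + 1, len(starts)):
                hits.append((starts[a], starts[b]))
    return sorted(hits)


# Toy input: an 8 bp unit repeated four times, followed by unrelated sequence.
toy = "ACGTTGCA" * 4 + "GATTACAGGCTTAACCGGTT"
dots = self_dot_plot(toy, window=15)
print(len(dots) > 0)
```

In the toy input the repeated unit produces dots at offsets that are multiples of the repeat period, which is the pattern a tandem repeat leaves in a dot plot.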


Overview of the GC Content Distribution (up)

Window size: 300 bp

[Figure: GC content distribution plot, axis label "Sequence 12".]

Summary: Full length 2000 bp | A 26.7% (534) | C 23.4% (468) | T 27.3% (546) | G 22.6% (452)

Note: The 2000 bp section upstream of Exon 3 is analyzed to determine the GC content. No significant high GC-content region is found, so this region is suitable for PCR screening or sequencing analysis.

Overview of the GC Content Distribution (down)

Window size: 300 bp

[Figure: GC content distribution plot, axis label "Sequence 12".]

Summary: Full length 1719 bp | A 28.8% (495) | C 20.36% (350) | T 31.01% (533) | G 19.84% (341)

Note: The 1719 bp section downstream of Exon 4 is analyzed to determine the GC content. No significant high GC-content region is found, so this region is suitable for PCR screening or sequencing analysis.
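The overall GC content of each flanking section follows directly from the base counts in the two summaries, and the 300 bp sliding window behind the plots can be sketched in the same way. This is an illustrative calculation: the function names `gc_percent` and `windowed_gc` are hypothetical, and the flanking sequences themselves are not included in the report, so the windowed function is shown on a toy string.

```python
def gc_percent(counts):
    """Overall GC percentage from a dict of base counts."""
    total = sum(counts.values())
    return 100 * (counts["G"] + counts["C"]) / total


def windowed_gc(seq, window=300):
    """GC percentage for every window-long stretch, sliding one base at a time."""
    seq = seq.upper()
    return [
        100 * sum(base in "GC" for base in seq[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]


# Base counts copied from the two summary lines above.
upstream = {"A": 534, "C": 468, "T": 546, "G": 452}     # 2000 bp upstream of Exon 3
downstream = {"A": 495, "C": 350, "T": 533, "G": 341}   # 1719 bp downstream of Exon 4

print(round(gc_percent(upstream), 1))     # 46.0
print(round(gc_percent(downstream), 1))   # 40.2
print(windowed_gc("GGCCAATT", window=4))  # toy sequence, not from the report
```

Both sections come out between 40% and 46% GC overall, which is consistent with the notes that no high GC-content region was found.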


BLAT Search Results (up)

QUERY    SCORE  QSTART  QEND  QSIZE  IDENTITY  CHROM  STRAND  TSTART     TEND       SPAN
----------------------------------------------------------------------------------------
YourSeq  2000   1       2000  2000   100.0%    chr3   +       98227785   98229784   2000
YourSeq  84     1566    1726  2000   83.8%     chr5   -       100755853  100756116  264
YourSeq  62     1605    1702  2000   91.0%     chr6   -       147120346  147120505  160
YourSeq  58     223     689   2000   69.2%     chrX   -       16628319   16628547   229
YourSeq  56     1642    1721  2000   87.2%     chr5   -       118475572  118475650  79
YourSeq  56     1240    1693  2000   93.8%     chr16  +       13427267   13427791   525
YourSeq  54     368     514   2000   75.0%     chr9   -       68909101   68909234   134
YourSeq  54     1642    1707  2000   91.0%     chr18  +       35742235   35742300   66
YourSeq  53     1656    1754  2000   89.6%     chr1   -       187852571  187852671  101
YourSeq  53     480     589   2000   73.4%     chr1   -       125518562  125518622  61
YourSeq  52     1643    1704  2000   92.0%     chr13  -       49314096   49314157   62
YourSeq  52     1501    1686  2000   93.3%     chr11  -       102651137  102651491  355
YourSeq  51     1642    1701  2000   93.4%     chr5   -       129498506  129498566  61
YourSeq  51     1640    1704  2000   89.3%     chr12  -       76017522   76017586   65
YourSeq  51     1640    1704  2000   89.3%     chr2   +       39040605   39040669   65
YourSeq  51     1640    1724  2000   80.0%     chr1   +       171569157  171569241  85
YourSeq  50     1616    1707  2000   94.7%     chr4   +       101454978  101455390  413
YourSeq  49     1642    1706  2000   87.7%     chr2   -       170609173  170609237  65
YourSeq  48     1550    1609  2000   98.2%     chr15  -       16023338   16023428   91
YourSeq  47     1640    1702  2000   92.8%     chr9   -       78470326   78470476   151

Note: The 2000 bp section upstream of Exon 3 is BLAT searched against the genome. Apart from the query's own locus on chr3, no significant similarity is found.

BLAT Search Results (down)

QUERY    SCORE  QSTART  QEND  QSIZE  IDENTITY  CHROM  STRAND  TSTART     TEND       SPAN
----------------------------------------------------------------------------------------
YourSeq  1719   1       1719  1719   100.0%    chr3   +       98231287   98233005   1719
YourSeq  72     479     599   1719   94.0%     chr11  +       120604043  120604529  487
YourSeq  70     755     884   1719   77.0%     chrX   +       101997659  101997788  130
YourSeq  70     607     763   1719   91.0%     chr1   +       192172502  192173193  692
YourSeq  62     752     888   1719   76.5%     chr5   +       79493397   79493531   135
YourSeq  57     803     910   1719   76.9%     chr1   +       34231998   34232103   106
YourSeq  56     753     861   1719   76.2%     chr3   -       116462316  116462425  110
YourSeq  55     752     903   1719   77.5%     chr10  -       58795183   58795333   151
YourSeq  55     458     921   1719   67.7%     chr7   +       25822730   25823005   276
YourSeq  52     750     853   1719   89.6%     chr4   -       57265716   57265819   104
YourSeq  52     752     885   1719   82.9%     chr9   +       108722693  108722824  132
YourSeq  52     585     677   1719   90.8%     chr9   +       85436375   85436486   112
YourSeq  51     810     890   1719   81.5%     chr5   +       33376564   33376644   81
YourSeq  51     587     689   1719   92.1%     chr14  +       61535462   61535580   119
YourSeq  50     564     890   1719   69.7%     chr7   -       16217485   16217720   236
YourSeq  49     759     835   1719   81.9%     chr8   -       112270551  112270627  77
YourSeq  49     810     886   1719   81.9%     chr2   +       167500382  167500458  77
YourSeq  48     752     835   1719   78.6%     chr3   -       121958361  121958444  84
YourSeq  48     810     885   1719   81.6%     chr1   -       121544385  121544460  76
YourSeq  47     751     833   1719   78.4%     chr19  -       46684239   46684321   83

Note: The 1719 bp section downstream of Exon 4 is BLAT searched against the genome. Apart from the query's own locus on chr3, no significant similarity is found.
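The top hit in each BLAT table is the query matching its own locus on chr3, so the two tables also give the genomic positions of the flanking sections. As a rough cross-check, the sketch below computes the interval lying between them. It assumes the coordinates are inclusive, as the reported SPAN values suggest, and that the interval between the two sections corresponds to the KO region; the report does not state the KO coordinates explicitly. The function name `interval_between` is hypothetical.

```python
# Self hits copied from the first row of each BLAT table: (chrom, start, end).
UPSTREAM_SELF_HIT = ("chr3", 98227785, 98229784)     # 2000 bp upstream section
DOWNSTREAM_SELF_HIT = ("chr3", 98231287, 98233005)   # 1719 bp downstream section


def interval_between(up, down):
    """Return (chrom, start, end, length) of the gap between two inclusive intervals."""
    if up[0] != down[0]:
        raise ValueError("intervals are on different chromosomes")
    start = up[2] + 1   # first base after the upstream section
    end = down[1] - 1   # last base before the downstream section
    return up[0], start, end, end - start + 1


print(interval_between(UPSTREAM_SELF_HIT, DOWNSTREAM_SELF_HIT))
# ('chr3', 98229785, 98231286, 1502)
```

The 1502 bp gap matches the effective KO region size quoted in the strategy summary, which supports reading the two sections as the immediate flanks of the deleted interval.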


Gene information:

Reg4: regenerating islet-derived family, member 4 [Mus musculus (house mouse)]
Gene ID: 67709, updated on 12-Aug-2019

Gene summary

Official Symbol: Reg4 (provided by MGI)
Official Full Name: regenerating islet-derived family, member 4 (provided by MGI)
Primary source: MGI:MGI:1914959
See related: Ensembl:ENSMUSG00000027876
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: GISP; RELP; 2010002L15Rik
Expression: Biased expression in colon adult (RPKM 46.3), large intestine adult (RPKM 32.9) and 2 other tissues

Genomic context

Location: 3; 3 F2.2
Exon count: 6

Annotation release  Status             Assembly                      Chr  Location
108                 current            GRCm38.p6 (GCF_000001635.26)  3    NC_000069.6 (98222138..98236748)
Build 37.2          previous assembly  MGSCv37 (GCF_000001635.18)    3    NC_000069.5 (98026079..98040671)

[Figure: genomic context on Chromosome 3 - NC_000069.6]


Transcript information: This gene has 1 transcript

Gene: Reg4 ENSMUSG00000027876

Description: regenerating islet-derived family, member 4 [Source:MGI Symbol;Acc:MGI:1914959]
Gene Synonyms: 2010002L15Rik, RELP
Location: Chromosome 3: 98,222,156-98,236,748 forward strand. GRCm38:CM000996.2
About this gene: This gene has 1 transcript (splice variant), 111 orthologues, 6 paralogues and is a member of 1 Ensembl protein family.

Transcripts

Name      Transcript ID         bp    Protein  Translation ID        Biotype         CCDS       UniProt  Flags
Reg4-201  ENSMUST00000029469.4  1021  157aa    ENSMUSP00000029469.4  Protein coding  CCDS17661  Q9D8G5   TSL:1, GENCODE basic, APPRIS P1

[Figure: genome browser view of chromosome 3, 98.22 Mb to 98.24 Mb (34.59 kb), forward and reverse strands. Tracks: genes (Comprehensive set...; Reg4-201, protein coding, merged Ensembl/Havana), contigs (AC121771.3), Regulatory Build. Regulation legend: CTCF, Enhancer, Open Chromatin, Promoter Flank.]


Transcript: ENSMUST00000029469

[Figure: protein domain and variation view for Reg4-201 (protein coding), 14.59 kb, forward strand, scale bar 0 to 157. Tracks as labeled in the source:]

ENSMUSP00000029...
Cleavage site (Sign...
Superfamily: C-type lectin fold
SMART: C-type lectin-like
Prints: PR01504
Pfam: C-type lectin-like
PROSITE profiles: C-type lectin-like
PANTHER: PTHR45710; PTHR45710:SF6
Gene3D: C-type lectin-like/link domain superfamily
CDD: cd03594
All sequence SNPs/i...: Sequence variants (dbSNP and all other sources)
Variant legend: missense variant; synonymous variant

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
