http://www.alphaknockout.com/

Mouse Cpa4 Knockout Project (CRISPR/Cas9)

Objective: To create a Cpa4 knockout mouse model (C57BL/6J) by CRISPR/Cas-mediated genome engineering.

Strategy summary: The Cpa4 gene (NCBI Reference Sequence: NM_027926; Ensembl: ENSMUSG00000039070) is located on mouse chromosome 6. Eleven exons have been identified, with the ATG start codon in exon 1 and the TAG stop codon in exon 11 (Transcript: ENSMUST00000049251). Exons 2~10 will be selected as the target site. Cas9 and gRNA will be co-injected into fertilized eggs for KO mouse production. The pups will be genotyped by PCR followed by sequencing analysis.

Note: Exon 2 starts at about 5.48% of the coding region. Exons 2~10 cover 79.92% of the coding region. The size of the effective KO region is ~12130 bp.


Overview of the Targeting Strategy

[Figure: schematic of the wildtype Cpa4 allele (5' to 3') showing exons 1 to 11, with two gRNA regions marked. Legend: exon of mouse Cpa4; knockout region.]


Overview of the Dot Plot (up)

Window size: 15 bp

[Figure: dot plot of the sequence against itself (both axes: Sequence 12). Legend: Forward; Reverse Complement.]

Note: The 2000 bp section upstream of Exon 2 is aligned with itself to determine if there are tandem repeats. No significant tandem repeat is found in the dot plot matrix. So this region is suitable for PCR screening or sequencing analysis.
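The self-alignment described in the note can be sketched as follows. This is a minimal illustration, not the tool used to produce the report: it assumes exact matching of 15 bp windows on the forward strand only, and the function names `self_dot_plot` and `longest_diagonal_run` are hypothetical. A long run of dots on a diagonal away from the main diagonal is the signature of a tandem repeat.

```python
def self_dot_plot(seq: str, window: int = 15) -> list[tuple[int, int]]:
    """Return sorted (i, j) pairs with i < j whose `window`-bp substrings are identical.

    Assumption: exact forward-strand matches only (no mismatches, no reverse complement).
    """
    seq = seq.upper()
    positions: dict[str, list[int]] = {}
    for i in range(len(seq) - window + 1):
        positions.setdefault(seq[i:i + window], []).append(i)

    dots = []
    for starts in positions.values():
        for a in range(len(starts)):
            for b in range(a + 1, len(starts)):
                dots.append((starts[a], starts[b]))
    return sorted(dots)


def longest_diagonal_run(dots: list[tuple[int, int]]) -> int:
    """Length of the longest run of consecutive dots sharing one offset (j - i)."""
    by_offset: dict[int, list[int]] = {}
    for i, j in dots:  # dots are sorted, so each list of starts is ascending
        by_offset.setdefault(j - i, []).append(i)

    longest = 0
    for starts in by_offset.values():
        run = 1
        longest = max(longest, run)
        for prev, cur in zip(starts, starts[1:]):
            run = run + 1 if cur == prev + 1 else 1
            longest = max(longest, run)
    return longest


# Toy input: a perfect 4 bp tandem repeat produces long off-diagonal runs.
repeat_run = longest_diagonal_run(self_dot_plot("ACGT" * 10, window=15))
```

On a real 2000 bp flanking sequence, a longest run close to zero would correspond to the "no significant tandem repeat" conclusion; the cut-off for "significant" is a judgment call that the report does not state.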

Overview of the Dot Plot (down)

Window size: 15 bp

[Figure: dot plot of the sequence against itself (both axes: Sequence 12). Legend: Forward; Reverse Complement.]

Note: The 2000 bp section downstream of Exon 10 is aligned with itself to determine if there are tandem repeats. Tandem repeats are found in the dot plot matrix. The gRNA site is selected outside of these tandem repeats.


Overview of the GC Content Distribution (up)

Window size: 300 bp

[Figure: GC content distribution along Sequence 12.]

Summary: Full Length (2000 bp) | A (24.35%, 487) | C (25.3%, 506) | G (22.4%, 448) | T (27.95%, 559)

Note: The 2000 bp section upstream of Exon 2 is analyzed to determine the GC content. No significant high GC-content region is found. So this region is suitable for PCR screening or sequencing analysis.

Overview of the GC Content Distribution (down)

Window size: 300 bp

[Figure: GC content distribution along Sequence 12.]

Summary: Full Length (2000 bp) | A (22.85%, 457) | C (21.85%, 437) | G (24.1%, 482) | T (31.2%, 624)

Note: The 2000 bp section downstream of Exon 10 is analyzed to determine the GC content. No significant high GC-content region is found. So this region is suitable for PCR screening or sequencing analysis.
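The GC figures behind these notes can be reproduced from the base counts in the two summaries, and the 300 bp windowed profile can be sketched in a few lines. The function names below are hypothetical, and the step size of the sliding window is an assumption, since the report states only the window size.

```python
def gc_percent_from_counts(a: int, c: int, g: int, t: int) -> float:
    """Overall GC content (%) from per-base counts."""
    total = a + c + g + t
    return 100.0 * (g + c) / total


def sliding_gc(seq: str, window: int = 300, step: int = 1) -> list[float]:
    """GC fraction of each `window`-bp slice; `step` is an assumed parameter."""
    seq = seq.upper()
    fractions = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        fractions.append((chunk.count("G") + chunk.count("C")) / window)
    return fractions


# Base counts copied from the two Summary lines of this report.
upstream_gc = gc_percent_from_counts(a=487, c=506, g=448, t=559)
downstream_gc = gc_percent_from_counts(a=457, c=437, g=482, t=624)
```

Both sets of counts sum to 2000 bases, and the overall GC content of each flank comes out below 50%, in line with the statement that no significant high GC-content region is found.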


BLAT Search Results (up)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
---------------------------------------------------------------------------------------
YourSeq  2000   1      2000  2000   100.0%    chr6   +       30571864   30573863   2000
YourSeq  40     50     106   2000   82.7%     chr14  +       77701971   77702025   55
YourSeq  38     247    308   2000   80.7%     chr13  -       13562330   13562391   62
YourSeq  38     241    327   2000   81.5%     chr1   -       192549092  192549177  86
YourSeq  37     51     114   2000   89.4%     chr14  -       69131761   69131824   64
YourSeq  35     51     114   2000   83.1%     chr2   +       18389632   18389696   65
YourSeq  35     73     116   2000   92.5%     chr11  +       91301672   91301716   45
YourSeq  34     50     91    2000   90.5%     chr4   -       4616331    4616372    42
YourSeq  34     50     99    2000   82.5%     chr2   +       64163740   64163786   47
YourSeq  33     48     86    2000   92.4%     chr11  +       9184829    9184867    39
YourSeq  32     50     89    2000   90.0%     chr3   +       21864481   21864520   40
YourSeq  31     68     98    2000   100.0%    chr6   -       96921428   96921458   31
YourSeq  30     50     85    2000   88.3%     chr6   +       35400421   35400455   35
YourSeq  29     65     117   2000   91.5%     chr1   -       51832801   51832854   54
YourSeq  29     65     99    2000   91.5%     chr9   +       117693675  117693709  35
YourSeq  28     44     75    2000   87.1%     chr17  -       74031729   74031759   31
YourSeq  28     86     116   2000   96.8%     chr7   +       88232769   88232801   33
YourSeq  28     337    367   2000   96.8%     chr18  +       31181076   31181107   32
YourSeq  26     65     96    2000   90.7%     chr12  -       11394711   11394742   32
YourSeq  26     81     117   2000   93.4%     chr11  +       22721833   22721871   39

Note: The 2000 bp section upstream of Exon 2 is BLAT searched against the genome. No significant similarity is found.
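A check of this kind can be sketched by parsing each result row and setting aside the full-length self match. The names `BlatHit`, `parse_blat_row`, and `off_target_hits` are hypothetical, and the score cut-off of 100 is an illustrative assumption only; the report does not state what threshold it treats as significant.

```python
from dataclasses import dataclass


@dataclass
class BlatHit:
    query: str
    score: int
    q_start: int
    q_end: int
    q_size: int
    identity: float  # percent
    chrom: str
    strand: str
    t_start: int
    t_end: int
    span: int


def parse_blat_row(row: str) -> BlatHit:
    """Parse one whitespace-separated result row, with or without the 'browser details' prefix."""
    tokens = row.split()
    if tokens[:2] == ["browser", "details"]:
        tokens = tokens[2:]
    query, score, qs, qe, qsize, ident, chrom, strand, ts, te, span = tokens
    return BlatHit(query, int(score), int(qs), int(qe), int(qsize),
                   float(ident.rstrip("%")), chrom, strand,
                   int(ts), int(te), int(span))


def is_self_hit(hit: BlatHit) -> bool:
    """The query aligned end to end at 100% identity, i.e. its own genomic locus."""
    return hit.q_start == 1 and hit.q_end == hit.q_size and hit.identity == 100.0


def off_target_hits(hits: list[BlatHit], min_score: int = 100) -> list[BlatHit]:
    """Non-self hits at or above `min_score` (an assumed, illustrative cut-off)."""
    return [h for h in hits if not is_self_hit(h) and h.score >= min_score]


# The first three rows of the upstream table above.
rows = [
    "YourSeq 2000 1 2000 2000 100.0% chr6 + 30571864 30573863 2000",
    "browser details YourSeq 40 50 106 2000 82.7% chr14 + 77701971 77702025 55",
    "YourSeq 38 247 308 2000 80.7% chr13 - 13562330 13562391 62",
]
hits = [parse_blat_row(r) for r in rows]
flagged = off_target_hits(hits)
```

With these rows and the assumed cut-off, `flagged` is empty: the only high-scoring alignment is the query's own locus on chr6.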

BLAT Search Results (down)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
---------------------------------------------------------------------------------------
YourSeq  2000   1      2000  2000   100.0%    chr6   +       30585994   30587993   2000
YourSeq  61     312    471   2000   77.2%     chr9   +       21307488   21307636   149
YourSeq  51     311    399   2000   80.8%     chr5   -       116648509  116648581  73
YourSeq  47     312    635   2000   94.4%     chr11  -       80920048   80920494   447
YourSeq  46     312    397   2000   81.2%     chr7   +       39838031   39838106   76
YourSeq  43     376    647   2000   63.3%     chr10  +       72165802   72165894   93
YourSeq  42     312    423   2000   70.6%     chr1   -       5075051    5075126    76
YourSeq  41     333    420   2000   97.8%     chr1   -       35635861   35636050   190
YourSeq  40     314    358   2000   95.5%     chr5   +       28808818   28808864   47
YourSeq  40     310    360   2000   93.7%     chr11  +       45589842   45590087   246
YourSeq  39     314    356   2000   97.7%     chr18  -       21519605   21519658   54
YourSeq  39     311    357   2000   88.4%     chr15  +       101822447  101822491  45
YourSeq  38     312    394   2000   93.2%     chr4   -       107043840  107044363  524
YourSeq  37     347    399   2000   95.3%     chr1   -       128571864  128571918  55
YourSeq  34     318    354   2000   100.0%    chrY   -       2740685    2740956    272
YourSeq  34     342    640   2000   92.5%     chr3   -       96280675   96281012   338
YourSeq  32     316    359   2000   97.1%     chr2   +       156026413  156026490  78
YourSeq  32     311    352   2000   92.2%     chr1   +       88472903   88472956   54
YourSeq  29     376    423   2000   72.8%     chr1   +       9730084    9730123    40
YourSeq  28     639    666   2000   100.0%    chrY   -       110893369  110893396  28

Note: The 2000 bp section downstream of Exon 10 is BLAT searched against the genome. No significant similarity is found.
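The coordinates of the two full-length chr6 hits also allow a cross-check of the stated KO region size. The sketch below assumes that the upstream 2000 bp section ends immediately before the KO region and the downstream section begins immediately after it; the report does not say this explicitly, and the helper name `bases_between` is hypothetical.

```python
def bases_between(left_end: int, right_start: int) -> int:
    """Number of bases strictly between two 1-based, inclusive coordinates."""
    return right_start - left_end - 1


# Coordinates of the 100% identity chr6 hits in the two BLAT tables.
upstream_section_end = 30573863      # last base of the upstream 2000 bp section
downstream_section_start = 30585994  # first base of the downstream 2000 bp section

ko_region_size = bases_between(upstream_section_end, downstream_section_start)
```

Under that assumption the gap between the two flanking sections is 12130 bp, which matches the "~12130 bp" effective KO region given in the strategy summary.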

Gene and protein information:

Cpa4 carboxypeptidase A4 [Mus musculus (house mouse)]

Gene ID: 71791, updated on 12-Aug-2019

Gene summary

Official Symbol: Cpa4 (provided by MGI)
Official Full Name: carboxypeptidase A4 (provided by MGI)
Primary source: MGI:MGI:1919041
See related: Ensembl:ENSMUSG00000039070
Gene type: protein coding
RefSeq status: REVIEWED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: AV009555; 1110019K20Rik
Summary: This gene encodes a member of the family of metalloproteases that could be involved in the hyperacetylation pathway. The encoded preproprotein undergoes proteolytic processing that removes the N-terminal activation peptide to generate a functional enzyme. This gene is located in a cluster of carboxypeptidase genes on chromosome 6. [provided by RefSeq, Jul 2016]
Expression: Biased expression in stomach adult (RPKM 3.3), limb E14.5 (RPKM 1.8) and 6 other tissues

Genomic context

Location: 6; 6 A3.3

Exon count: 12

Annotation release | Status | Assembly | Chr | Location
108 | current | GRCm38.p6 (GCF_000001635.26) | 6 | NC_000072.6 (30568369..30592418)
Build 37.2 | previous assembly | MGSCv37 (GCF_000001635.18) | 6 | NC_000072.5 (30518376..30541747)

[Figure: genomic context of Cpa4 on Chromosome 6 - NC_000072.6]


Transcript information: This gene has 1 transcript

Gene: Cpa4 ENSMUSG00000039070

Description: carboxypeptidase A4 [Source:MGI Symbol;Acc:MGI:1919041]
Gene Synonyms: 1110019K20Rik
Location: Chromosome 6: 30,568,369-30,592,418 forward strand. GRCm38:CM000999.2
About this gene: This gene has 1 transcript (splice variant), 126 orthologues, 7 paralogues and is a member of 1 Ensembl protein family.

Transcripts

Name | Transcript ID | bp | Protein | Translation ID | Biotype | CCDS | UniProt | Flags
Cpa4-201 | ENSMUST00000049251.5 | 2739 | 420aa | ENSMUSP00000048558.5 | Protein coding | CCDS19975 | Q6P8K8 | TSL:1, GENCODE basic, APPRIS P1

[Figure: Ensembl region view, 44.05 kb, chromosome 6 from 30.56 Mb to 30.60 Mb. Forward strand genes (comprehensive set): Cpa2-201 (protein coding), Cpa4-201 (protein coding), Cpa2-207 (retained intron). Contig: AC155656.10. Reverse strand genes (comprehensive set): Gm13781-201 (antisense). Regulatory Build track. Gene legend: protein coding (merged Ensembl/Havana); non-protein coding (processed transcript). Regulation legend: CTCF, Enhancer, Open Chromatin, Promoter, Promoter Flank.]


Transcript: ENSMUST00000049251

[Figure: protein domain view of Cpa4-201 (protein coding; 24.05 kb, forward strand), translation ENSMUSP00000048... Tracks: Cleavage site (Sign...); Superfamily (SSF54897, SSF53187); SMART (Peptidase M14, carboxypeptidase A); Prints (Peptidase M14, carboxypeptidase A); Pfam (Carboxypeptidase, activation peptide; Peptidase M14, carboxypeptidase A); PROSITE patterns (Peptidase M14, carboxypeptidase A); PANTHER (PTHR11705:SF71, PTHR11705); Gene3D (Metallocarboxypeptidase-like, propeptide; 3.40.630.10); CDD (Carboxypeptidase A, carboxypeptidase domain); sequence variants (dbSNP and all other sources). Variant legend: missense variant, synonymous variant. Scale bar: 0 to 420.]

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC, VectorBuilder.
