https://www.alphaknockout.com

Mouse Cpa4 Conditional Knockout Project (CRISPR/Cas9)

Objective: To create a Cpa4 conditional knockout Mouse model (C57BL/6J) by CRISPR/Cas-mediated genome engineering.

Strategy summary: The Cpa4 gene (NCBI Reference Sequence: NM_027926; Ensembl: ENSMUSG00000039070) is located on mouse chromosome 6. Eleven exons are identified, with the ATG start codon in exon 1 and the TAG stop codon in exon 11 (Transcript: ENSMUST00000049251). Exons 2~3 will be selected as the conditional knockout region (cKO region). Deletion of this region should result in loss of function of the mouse Cpa4 gene. To engineer the targeting vector, the homologous arms and the cKO region will be generated by PCR using BAC clone RP23-368N11 as template. Cas9, gRNA and the targeting vector will be co-injected into fertilized eggs for cKO mouse production. The pups will be genotyped by PCR followed by sequencing analysis.

Note: Exon 2 starts at about 5.48% of the coding region. Knockout of exons 2~3 will result in a frameshift of the gene. The size of intron 1, for 5'-loxP site insertion, is 5401 bp, and the size of intron 3, for 3'-loxP site insertion, is 1549 bp. The size of the effective cKO region is ~1136 bp. The cKO region does not contain any other known gene.
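The conditional design described above can be illustrated with a toy string model: the targeted allele carries two loxP sites in the same orientation around the cKO region, and Cre recombination removes the flanked segment while leaving a single loxP site behind. This is only a sketch. The loxP sequence is the canonical 34 bp site, but the flanking and cKO sequences below are hypothetical placeholders, not the real Cpa4 sequence, and the function name cre_excise is introduced here for illustration.

```python
# Canonical 34 bp loxP site: 13 bp inverted repeat, 8 bp spacer, 13 bp inverted repeat.
LOXP = "ATAACTTCGTATAGCATACATTATACGAAGTTAT"


def cre_excise(allele: str, loxp: str = LOXP) -> str:
    """Toy model of Cre recombination between two same-orientation loxP sites.

    Removes everything from the first loxP site up to (not including) the
    second one, so exactly one loxP site remains. Returns the allele
    unchanged if fewer than two sites are present.
    """
    first = allele.find(loxp)
    if first == -1:
        return allele
    second = allele.find(loxp, first + len(loxp))
    if second == -1:
        return allele
    return allele[:first] + allele[second:]


# Hypothetical placeholder sequences (NOT the real Cpa4 locus).
five_prime_flank = "AAAA"
cko_region = "CCCCGGGG"      # stands in for exons 2~3 plus flanking intron sequence
three_prime_flank = "TTTT"

targeted_allele = five_prime_flank + LOXP + cko_region + LOXP + three_prime_flank
ko_allele = cre_excise(targeted_allele)
print(len(targeted_allele), len(ko_allele))
```

In this model the wildtype allele (no loxP sites) passes through unchanged, which mirrors the fact that Cre only acts on the floxed allele.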

Overview of the Targeting Strategy

Figure: schematic of the wildtype allele (exons 1, 2, 3, 4 ... 11, with 5' and 3' gRNA regions), the targeting vector, the targeted allele, and the constitutive KO allele (after Cre recombination).

Legend: exon of mouse Cpa4; homology arm; cKO region; loxP site.

Overview of the Dot Plot

Window size: 10 bp

Figure: dot plot of the sequence against itself, showing forward and reverse-complement matches.

Note: The sequence of homologous arms and cKO region is aligned with itself to determine if there are tandem repeats. No significant tandem repeat is found in the dot plot matrix. So this region is suitable for PCR screening or sequencing analysis.
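The self-alignment idea behind the dot plot can be sketched as follows: every 10 bp word in the sequence is indexed, and any word that occurs at two different positions gives an off-diagonal dot. This is a minimal illustration, not the tool used for the report. It only considers exact forward-strand matches (the report also plots reverse-complement matches), and the function name and toy sequence are hypothetical.

```python
from collections import defaultdict


def self_dot_plot(seq: str, window: int = 10):
    """Return sorted (i, j) pairs with i < j where seq[i:i+window] == seq[j:j+window].

    Each pair is an off-diagonal dot in a self dot plot; the trivial
    main diagonal (i == j) is omitted.
    """
    positions = defaultdict(list)
    for i in range(len(seq) - window + 1):
        positions[seq[i:i + window]].append(i)

    hits = []
    for starts in positions.values():
        for a in range(len(starts)):
            for b in range(a + 1, len(starts)):
                hits.append((starts[a], starts[b]))
    return sorted(hits)


# Toy sequence (hypothetical): one 10 bp word repeated after a 3 bp spacer.
toy = "ACGTTGCAAT" + "GGG" + "ACGTTGCAAT"
print(self_dot_plot(toy, window=10))
```

A run of hits lying on a line parallel to the main diagonal would indicate a repeat; an empty result at the chosen window size corresponds to a clean dot plot.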

Overview of the GC Content Distribution

Window size: 300 bp

Figure: GC content distribution along the sequence.

Summary: Full length 7636 bp | A 24.76% (1891) | C 24.21% (1849) | T 29.28% (2236) | G 21.74% (1660)

Note: The sequence of homologous arms and cKO region is analyzed to determine the GC content. No significant high GC-content region is found. So this region is suitable for PCR screening or sequencing analysis.
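As a quick check, the overall GC content can be derived from the base counts in the summary, and a windowed profile can be computed in the same way. The sketch below uses the counts reported above. The helper gc_windows is hypothetical and assumes non-overlapping windows of uppercase sequence, since the report does not state the step size used for its plot.

```python
# Base counts taken from the summary line of the report.
counts = {"A": 1891, "C": 1849, "T": 2236, "G": 1660}

total = sum(counts.values())
gc_percent = 100 * (counts["G"] + counts["C"]) / total
print(f"length = {total} bp, GC = {gc_percent:.2f}%")


def gc_windows(seq: str, window: int = 300):
    """GC percentage for consecutive non-overlapping windows (assumed step = window)."""
    return [
        100 * sum(base in "GC" for base in seq[i:i + window]) / window
        for i in range(0, len(seq) - window + 1, window)
    ]
```

The counts sum to the stated full length of 7636 bp and give an overall GC content of about 46%, which is consistent with the note that no high-GC region was found.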

BLAT Search Results (up)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
----------------------------------------------------------------------------------------
YourSeq  3000   1      3000  3000   100.0%    chr6   +       30570614   30573613   3000
YourSeq  152    742    992   3000   91.8%     chr6   -       30571348   30571614   267
YourSeq  61     736    866   3000   82.3%     chr7   +       80665290   80665422   133
YourSeq  57     717    860   3000   89.2%     chr11  -       70691440   70691590   151
YourSeq  52     780    867   3000   89.3%     chr11  +       70485869   70485956   88
YourSeq  51     905    1001  3000   94.8%     chr14  +       54427884   54427980   97
YourSeq  50     906    997   3000   86.0%     chr11  +       5173356    5173446    91
YourSeq  48     791    866   3000   84.0%     chr13  +       64297547   64297613   67
YourSeq  44     902    997   3000   95.9%     chr5   +       24962534   24962738   205
YourSeq  42     1268   1348  3000   85.2%     chr10  +       64270424   64270501   78
YourSeq  41     902    948   3000   89.2%     chr9   -       20477706   20477751   46
YourSeq  40     780    825   3000   93.5%     chr15  +       84070939   84070984   46
YourSeq  40     744    825   3000   74.4%     chr10  +       93839875   93839956   82
YourSeq  38     1491   1577  3000   81.5%     chr1   -       192549092  192549177  86
YourSeq  38     905    989   3000   84.8%     chr17  +       36125016   36125098   83
YourSeq  37     738    825   3000   97.5%     chr18  -       31726751   31726844   94
YourSeq  37     804    860   3000   82.5%     chr11  +       87679545   87679601   57
YourSeq  36     736    823   3000   95.0%     chr4   -       109357046  109357343  298
YourSeq  36     902    948   3000   95.0%     chr11  -       90573066   90573112   47
YourSeq  36     738    825   3000   70.5%     chr10  -       90721263   90721350   88

Note: The 3000 bp section upstream of exon 2 was BLAT searched against the genome. Apart from the expected self-match on chr6 (score 3000), no significant similarity was found.

BLAT Search Results (down)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
----------------------------------------------------------------------------------------
YourSeq  3000   1      3000  3000   100.0%    chr6   +       30574750   30577749   3000
YourSeq  78     2753   2839  3000   97.6%     chr12  -       81953455   81953554   100
YourSeq  78     2755   2852  3000   84.9%     chr9   +       53779344   53779433   90
YourSeq  76     2774   2852  3000   94.9%     chr17  +       77218502   77218578   77
YourSeq  74     2768   2843  3000   98.7%     chr3   +       119278270  119278345  76
YourSeq  73     2768   2847  3000   96.3%     chr6   -       11597968   11598062   95
YourSeq  73     2770   2845  3000   98.7%     chr8   +       5831741    5831836    96
YourSeq  73     2774   2847  3000   100.0%    chr1   +       121872785  121873004  220
YourSeq  71     2763   2839  3000   97.4%     chr11  -       65400948   65401036   89
YourSeq  70     2770   2839  3000   100.0%    chrY   -       77159296   77159365   70
YourSeq  70     2770   2839  3000   100.0%    chr8   -       89465998   89466067   70
YourSeq  70     2770   2839  3000   100.0%    chr17  -       92331568   92331637   70
YourSeq  70     2769   2841  3000   98.7%     chr12  -       96649135   96649211   77
YourSeq  70     2768   2839  3000   98.7%     chr1   -       106124820  106124891  72
YourSeq  70     2770   2841  3000   100.0%    chr14  +       103831878  103831960  83
YourSeq  69     2769   2839  3000   98.6%     chrX   -       113898816  113898886  71
YourSeq  69     2770   2839  3000   100.0%    chr6   -       138957067  138957258  192
YourSeq  69     2769   2839  3000   98.6%     chr5   -       119181226  119181296  71
YourSeq  69     2770   2839  3000   100.0%    chr17  -       84869971   84870298   328
YourSeq  69     2771   2839  3000   100.0%    chr16  -       40359105   40359173   69

Note: The 3000 bp section downstream of exon 3 was BLAT searched against the genome. Apart from the expected self-match on chr6 (score 3000), no significant similarity was found.
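The chr6 self-matches in the two BLAT tables also allow a simple consistency check on the stated cKO region size. The sketch below takes the genomic coordinates of the top hit from each table and computes the span between the two queried flanks. It assumes that BLAT start and end coordinates are 1-based and inclusive, and the variable names are introduced here for illustration.

```python
# Top (self-match) hit coordinates on chr6, copied from the two BLAT tables.
upstream_start, upstream_end = 30_570_614, 30_573_613      # 3000 bp upstream of exon 2
downstream_start, downstream_end = 30_574_750, 30_577_749  # 3000 bp downstream of exon 3

# Lengths of the two queried flanks (inclusive coordinates assumed).
upstream_len = upstream_end - upstream_start + 1
downstream_len = downstream_end - downstream_start + 1

# Bases lying strictly between the two flanks.
region_between_flanks = downstream_start - upstream_end - 1

print(upstream_len, downstream_len, region_between_flanks)
```

Under these assumptions both flanks are 3000 bp and the interval between them is 1136 bp, which agrees with the effective cKO region size of ~1136 bp given in the strategy note.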

Gene and protein information: Cpa4 carboxypeptidase A4 [ Mus musculus (house mouse) ] Gene ID: 71791, updated on 12-Aug-2019

Gene summary

Official Symbol: Cpa4 (provided by MGI)
Official Full Name: carboxypeptidase A4 (provided by MGI)
Primary source: MGI:MGI:1919041
See related: Ensembl:ENSMUSG00000039070
Gene type: protein coding
RefSeq status: REVIEWED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: AV009555; 1110019K20Rik
Summary: This gene encodes a member of the family of metalloproteases that could be involved in the hyperacetylation pathway. The encoded preproprotein undergoes proteolytic processing that removes the N-terminal activation peptide to generate a functional enzyme. This gene is located in a cluster of carboxypeptidase genes on chromosome 6. [provided by RefSeq, Jul 2016]
Expression: Biased expression in stomach adult (RPKM 3.3), limb E14.5 (RPKM 1.8) and 6 other tissues

Genomic context

Location: 6; 6 A3.3

Exon count: 12

Annotation release  Status             Assembly                      Chr  Location
108                 current            GRCm38.p6 (GCF_000001635.26)  6    NC_000072.6 (30568369..30592418)
Build 37.2          previous assembly  MGSCv37 (GCF_000001635.18)    6    NC_000072.5 (30518376..30541747)

Chromosome 6 - NC_000072.6

Transcript information: This gene has 1 transcript

Gene: Cpa4 ENSMUSG00000039070

Description: carboxypeptidase A4 [Source:MGI Symbol;Acc:MGI:1919041]
Gene Synonyms: 1110019K20Rik
Location: Chromosome 6: 30,568,369-30,592,418 forward strand. GRCm38:CM000999.2
About this gene: This gene has 1 transcript (splice variant), 126 orthologues, 7 paralogues and is a member of 1 Ensembl protein family.

Transcripts

Name      Transcript ID         bp    Protein  Translation ID        Biotype         CCDS       UniProt  Flags
Cpa4-201  ENSMUST00000049251.5  2739  420aa    ENSMUSP00000048558.5  Protein coding  CCDS19975  Q6P8K8   TSL:1, GENCODE basic, APPRIS P1

Figure: Ensembl genome browser view of the Cpa4 locus (44.05 kb, 30.56 Mb to 30.60 Mb). Forward-strand genes (Comprehensive set): Cpa2-201 (protein coding), Cpa2-207 (retained intron), Cpa4-201 (protein coding). Contig: AC155656.10. Reverse-strand genes (Comprehensive set): Gm13781-201 (lncRNA). Additional track: Regulatory Build.

Regulation legend: CTCF; Enhancer; Open Chromatin; Promoter; Promoter Flank.

Gene legend: Protein coding (merged Ensembl/Havana); Non-protein coding (RNA gene; processed transcript).

Transcript: ENSMUST00000049251

Figure: protein domain and variant view of Cpa4-201 (protein coding; 24.05 kb, forward strand) for ENSMUSP00000048..., with a scale bar from 0 to 420 amino acids. Annotated tracks:

Cleavage site (Sign...
Superfamily: SSF54897; SSF53187
SMART: Peptidase M14, carboxypeptidase A
Prints: Peptidase M14, carboxypeptidase A
Pfam: Peptidase M14, carboxypeptidase A; Carboxypeptidase, activation peptide
PROSITE patterns: Peptidase M14, carboxypeptidase A; Peptidase M14, carboxypeptidase A
PANTHER: PTHR11705:SF71; PTHR11705
Gene3D: Metallocarboxypeptidase-like, propeptide; 3.40.630.10
CDD: Carboxypeptidase A, carboxypeptidase domain
All sequence SNPs/i...: Sequence variants (dbSNP and all other sources)

Variant legend: missense variant; synonymous variant.

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
