https://www.alphaknockout.com

Mouse P4ha1 Conditional Knockout Project (CRISPR/Cas9)

Objective: To create a P4ha1 conditional knockout mouse model (C57BL/6J) by CRISPR/Cas-mediated genome engineering.

Strategy summary: The P4ha1 gene (NCBI Reference Sequence: NM_011030; Ensembl: ENSMUSG00000019916) is located on mouse chromosome 10. Fifteen exons are identified, with the ATG start codon in exon 2 and the TGA stop codon in exon 15 (Transcript: ENSMUST00000009789). Exon 3 will be selected as the conditional knockout region (cKO region). Deletion of this region should result in loss of function of the mouse P4ha1 gene. To engineer the targeting vector, the homology arms and cKO region will be generated by PCR using BAC clone RP23-417A11 as template. Cas9, gRNA and the targeting vector will be co-injected into fertilized eggs for cKO mouse production. The pups will be genotyped by PCR followed by sequencing analysis. Note: Mice homozygous for a null mutation display embryonic lethality during organogenesis, capillary ruptures, and impaired basement membrane formation.

Exon 3 begins about 4.81% of the way into the coding region. Knocking out exon 3 will cause a frameshift of the gene. Intron 2, which receives the 5'-loxP site, is 2139 bp; intron 3, which receives the 3'-loxP site, is 3291 bp. The effective cKO region is ~597 bp and does not contain any other known gene.
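The percentage above can be related to an approximate nucleotide offset. The sketch below is illustrative only: it assumes the coding region is the 534-codon open reading frame listed for ENSMUST00000009789 plus its stop codon, and the function names are hypothetical. The coding length of exon 3 is not stated in this report, so the frameshift check takes it as an input rather than supplying a value.

```python
def cds_offset(fraction, protein_length_aa, include_stop=True):
    """Approximate nucleotide offset into the CDS for a fractional position.

    Assumes CDS length = 3 * codons (+ 3 for the stop codon if included).
    """
    cds_len = 3 * protein_length_aa + (3 if include_stop else 0)
    return round(fraction * cds_len)


def causes_frameshift(deleted_coding_bp):
    """A deletion shifts the reading frame when its coding length
    is not a multiple of three."""
    return deleted_coding_bp % 3 != 0


# Exon 3 begins at about 4.81% of a 534-aa coding region (assumed length).
approx_start = cds_offset(0.0481, 534)
print(f"Exon 3 begins roughly {approx_start} nt into the coding region")
```

Under these assumptions the estimate places the start of exon 3 within the first few dozen codons, which is consistent with an early frameshift disrupting most of the protein.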

Overview of the Targeting Strategy

[Figure: schematic of four alleles. Wildtype allele (exons 1, 2, 3 ... 15, with the 5' and 3' gRNA regions flanking exon 3); targeting vector; targeted allele; constitutive KO allele (after Cre recombination). Legend: exon of mouse P4ha1; homology arm; cKO region; loxP site.]

Overview of the Dot Plot

Window size: 10 bp

[Figure: dot plot of Sequence 12 against itself, showing forward and reverse-complement matches.]

Note: The sequence of the homology arms and cKO region was aligned against itself to check for tandem repeats. No significant tandem repeat was found in the dot plot matrix, so this region is suitable for PCR screening or sequencing analysis.
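A self dot plot of this kind can be sketched as an exact word-match search with a 10 bp window. This is a minimal illustration, not the tool used to produce the report; the function names are hypothetical and only exact matches are scored.

```python
def revcomp(seq):
    """Reverse complement of an uppercase DNA string (A/C/G/T only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[b] for b in reversed(seq))


def self_dot_plot(seq, window=10):
    """Return (forward, reverse) lists of (i, j) dots for a self comparison.

    forward: seq[i:i+window] == seq[j:j+window] with i < j
             (the trivial main diagonal i == j is omitted).
    reverse: seq[i:i+window] == revcomp(seq[j:j+window]).
    """
    n = len(seq)
    index = {}
    for i in range(n - window + 1):
        index.setdefault(seq[i:i + window], []).append(i)

    forward = [(i, j)
               for positions in index.values()
               for i in positions for j in positions if i < j]

    rc = revcomp(seq)
    reverse = []
    for k in range(n - window + 1):
        j = n - window - k  # forward-strand start of the word rc[k:k+window]
        for i in index.get(rc[k:k + window], []):
            reverse.append((i, j))
    return sorted(forward), sorted(reverse)


# Toy example: a 10 bp unit repeated after a 5 bp spacer gives one forward dot.
unit = "ACGGTCATGA"
fwd, rev = self_dot_plot(unit + "TTTTT" + unit, window=10)
print(fwd)
```

Tandem or inverted repeats would show up as runs of dots parallel or perpendicular to the main diagonal; an absence of such runs corresponds to the "no significant tandem repeat" conclusion.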

Overview of the GC Content Distribution

Window size: 300 bp

[Figure: GC content distribution along Sequence 12.]

Summary: Full Length(7097bp) | A(29.67% 2106) | C(19.15% 1359) | T(29.76% 2112) | G(21.42% 1520)

Note: The sequence of the homology arms and cKO region was analyzed to determine its GC content. No region of significantly high GC content was found, so this region is suitable for PCR screening or sequencing analysis.
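The overall GC content follows directly from the base counts in the summary line, and the windowed profile can be sketched in the same way. The base counts below are taken from the summary; the function names and the sliding-window implementation are illustrative assumptions, not the tool used for the report.

```python
# Base counts from the summary line (full length 7097 bp).
counts = {"A": 2106, "C": 1359, "T": 2112, "G": 1520}


def gc_percent_from_counts(counts):
    """Overall GC percentage from a dict of base counts."""
    total = sum(counts.values())
    return 100.0 * (counts["G"] + counts["C"]) / total


def windowed_gc(seq, window=300, step=1):
    """GC percentage for each full window along seq (hypothetical helper)."""
    seq = seq.upper()
    return [100.0 * sum(b in "GC" for b in seq[i:i + window]) / window
            for i in range(0, len(seq) - window + 1, step)]


overall = gc_percent_from_counts(counts)
print(f"Overall GC content: {overall:.2f}%")
```

With the reported counts, G + C is 2879 of 7097 bp, or about 40.6%, consistent with the absence of high-GC regions noted above.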

BLAT Search Results (up)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
-----------------------------------------------------------------------------------------
YourSeq  3000   1      3000  3000   100.0%    chr10  +       59336047   59339046   3000
YourSeq  249    1258   2385  3000   89.4%     chr5   -       29445662   29729927   284266
YourSeq  168    2099   2826  3000   76.6%     chr16  +       56128358   56128908   551
YourSeq  163    916    1440  3000   84.0%     chr10  -       24636491   24636909   419
YourSeq  162    1791   2619  3000   78.5%     chr13  +       3697953    3698520    568
YourSeq  161    1234   1441  3000   89.9%     chr7   -       117680775  117680975  201
YourSeq  160    826    1441  3000   84.4%     chr5   -       122413684  122413883  200
YourSeq  160    1255   1441  3000   93.1%     chr1   +       182154876  182155066  191
YourSeq  159    1254   1439  3000   93.0%     chr9   -       110882877  110883233  357
YourSeq  157    1234   1441  3000   89.1%     chrX   -       104138473  104138677  205
YourSeq  157    1232   1441  3000   86.2%     chr11  +       105440459  105440658  200
YourSeq  156    1254   1441  3000   91.5%     chr7   -       100585378  100585565  188
YourSeq  156    1253   1441  3000   90.9%     chr12  -       3757583    3757769    187
YourSeq  156    1253   1446  3000   93.9%     chr11  -       115391147  115391649  503
YourSeq  155    1255   1441  3000   93.9%     chr1   -       181036974  181037173  200
YourSeq  155    1233   1435  3000   87.2%     chr9   +       86041864   86042062   199
YourSeq  154    1256   1441  3000   93.3%     chr8   +       105581961  105797174  215214
YourSeq  154    1250   1441  3000   89.2%     chr2   +       165892164  165892351  188
YourSeq  153    1236   1442  3000   87.6%     chr5   -       33263228   33263425   198
YourSeq  153    1255   1485  3000   87.7%     chr2   +       170237792  170237998  207

Note: The 3000 bp section upstream of Exon 3 is BLAT searched against the genome. No significant similarity is found.

BLAT Search Results (down)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
-----------------------------------------------------------------------------------------
YourSeq  3000   1      3000  3000   100.0%    chr10  +       59339644   59342643   3000
YourSeq  232    1189   1649  3000   87.7%     chr18  +       90228294   90229111   818
YourSeq  231    1171   1647  3000   87.9%     chrX   -       68737333   68738149   817
YourSeq  231    1076   1514  3000   82.7%     chr14  +       60314149   60314593   445
YourSeq  230    1186   1774  3000   87.9%     chr16  +       30110064   30110784   721
YourSeq  228    1075   1849  3000   90.8%     chr14  +       20274251   20275164   914
YourSeq  225    1073   1622  3000   84.7%     chr12  +       98598194   98599091   898
YourSeq  223    1077   1646  3000   88.3%     chr15  -       16840456   16996618   156163
YourSeq  222    1185   1807  3000   87.5%     chr5   -       129376426  129377249  824
YourSeq  219    1176   1788  3000   89.1%     chr6   +       137107361  137108227  867
YourSeq  213    1080   1514  3000   82.6%     chr7   +       99053002   99053438   437
YourSeq  213    1184   1505  3000   89.4%     chr15  +       37806655   37807039   385
YourSeq  211    1079   1723  3000   86.1%     chr7   -       38176466   38177412   947
YourSeq  205    1187   1658  3000   86.8%     chr9   -       54578363   54579061   699
YourSeq  204    1192   1536  3000   86.3%     chr3   +       42100798   42101137   340
YourSeq  204    1184   1534  3000   87.2%     chr13  +       21606514   21606873   360
YourSeq  203    1080   1536  3000   88.6%     chr6   -       140747322  140747812  491
YourSeq  203    1185   1669  3000   84.7%     chr11  +       81481514   81482234   721
YourSeq  201    1073   1651  3000   86.4%     chr9   -       75691421   75692241   821
YourSeq  201    1209   1811  3000   85.1%     chr13  +       102475504  102476308  805

Note: The 3000 bp section downstream of Exon 3 is BLAT searched against the genome. No significant similarity is found.
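As a consistency check, the on-target (100.0% identity) hits of the upstream and downstream BLAT searches can be compared with the stated cKO region size. The coordinates below are the chr10 hits from the two result tables; the sketch assumes they are 1-based and inclusive, which agrees with each hit spanning exactly 3000 bp. The function names are hypothetical.

```python
# On-target chr10 hits from the two BLAT searches (1-based, inclusive; assumed).
upstream_hit = (59336047, 59339046)
downstream_hit = (59339644, 59342643)


def span(start, end):
    """Length of an inclusive interval."""
    return end - start + 1


def gap_between(left, right):
    """Number of bases strictly between two non-overlapping inclusive intervals."""
    return right[0] - left[1] - 1


assert span(*upstream_hit) == 3000
assert span(*downstream_hit) == 3000

gap = gap_between(upstream_hit, downstream_hit)
print(f"Region between the two flanks: {gap} bp")
```

The interval between the two flanking hits is 597 bp, which matches the effective cKO region size of ~597 bp given in the strategy summary.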

Gene and protein information:

P4ha1 procollagen-proline, 2-oxoglutarate 4-dioxygenase (proline 4-hydroxylase), alpha 1 polypeptide [Mus musculus (house mouse)]
Gene ID: 18451, updated on 21-Oct-2019

Gene summary

Official Symbol: P4ha1 (provided by MGI)
Official Full Name: procollagen-proline, 2-oxoglutarate 4-dioxygenase (proline 4-hydroxylase), alpha 1 polypeptide (provided by MGI)
Primary source: MGI:MGI:97463
See related: Ensembl:ENSMUSG00000019916
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: P4ha; AL022634
Expression: Ubiquitous expression in limb E14.5 (RPKM 22.1), placenta adult (RPKM 17.9) and 25 other tissues
Orthologs: human

Genomic context

Location: 10; 10 B4

Exon count: 16

Annotation release  Status             Assembly                      Chr  Location
108                 current            GRCm38.p6 (GCF_000001635.26)  10   NC_000076.6 (59323197..59373304)
Build 37.2          previous assembly  MGSCv37 (GCF_000001635.18)    10   NC_000076.5 (58786044..58836052)

Chromosome 10 - NC_000076.6

Transcript information: This gene has 3 transcripts

Gene: P4ha1 ENSMUSG00000019916

Description: procollagen-proline, 2-oxoglutarate 4-dioxygenase (proline 4-hydroxylase), alpha 1 polypeptide [Source:MGI Symbol;Acc:MGI:97463]
Gene Synonyms: P4ha
Location: 59,323,296-59,373,304 forward strand. GRCm38:CM001003.2
About this gene: This gene has 3 transcripts (splice variants), 207 orthologues, 3 paralogues, is a member of 1 Ensembl protein family and is associated with 8 phenotypes.

Transcripts

Name       Transcript ID          bp    Protein  Translation ID        Biotype         CCDS       UniProt  Flags
P4ha1-201  ENSMUST00000009789.14  4043  534aa    ENSMUSP00000009789.8  Protein coding  CCDS23863  Q60715   TSL:1, GENCODE basic, APPRIS P3
P4ha1-203  ENSMUST00000105466.2   2918  534aa    ENSMUSP00000101106.2  Protein coding  CCDS83698  Q60715   TSL:1, GENCODE basic, APPRIS ALT1
P4ha1-202  ENSMUST00000092512.10  2387  454aa    ENSMUSP00000090170.4  Protein coding  CCDS83699  E9Q7B0   TSL:1, GENCODE basic

[Figure: Ensembl genomic view, 70.01 kb, forward and reverse strands. Tracks: Genes (Comprehensive set) showing P4ha1-201, P4ha1-202 and P4ha1-203 (protein coding); Contigs AC157089.5 and AC022699.11; Regulatory Build. Regulation legend: CTCF; Enhancer; Open Chromatin; Promoter; Promoter Flank; Transcription Factor Binding Site. Gene legend: Protein coding (Ensembl protein coding; merged Ensembl/Havana).]

Transcript: ENSMUST00000009789

[Figure: Ensembl protein view of P4ha1-201 (protein coding; 50.01 kb, forward strand), scale 0 to 534 aa. Annotated features: low complexity (Seg); cleavage site; Superfamily: Tetratricopeptide-like helical domain superfamily; SMART: Prolyl 4-hydroxylase, alpha subunit; Pfam: Prolyl 4-hydroxylase alpha-subunit, N-terminal, and Oxoglutarate/iron-dependent dioxygenase; PROSITE profiles: Tetratricopeptide repeat-containing domain, Oxoglutarate/iron-dependent dioxygenase, and Tetratricopeptide repeat; PANTHER: PTHR10869:SF101 and PTHR10869; Gene3D: Tetratricopeptide-like helical domain superfamily and 2.60.120.620; sequence variants (dbSNP and all other sources). Variant legend: missense variant; synonymous variant; stop retained variant.]

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
