https://www.alphaknockout.com

Mouse Padi1 Conditional Knockout Project (CRISPR/Cas9)

Objective: To create a Padi1 conditional knockout mouse model (C57BL/6J) by CRISPR/Cas-mediated genome engineering.

Strategy summary: The Padi1 gene (NCBI Reference Sequence: NM_011059; Ensembl: ENSMUSG00000025329) is located on mouse chromosome 4. Sixteen exons are identified, with the ATG start codon in exon 1 and the TGA stop codon in exon 16 (Transcript: ENSMUST00000026378). Exon 2 will be selected as the conditional knockout region (cKO region). Deletion of this region should result in loss of function of the mouse Padi1 gene. To engineer the targeting vector, homologous arms and the cKO region will be generated by PCR using BAC clone RP23-131M6 as template. Cas9, gRNA and targeting vector will be co-injected into fertilized eggs for cKO mouse production. The pups will be genotyped by PCR followed by sequencing analysis.

Note: Exon 2 starts from about 4.68% of the coding region. The knockout of exon 2 will result in a frameshift of the gene. The size of intron 1 for 5'-loxP site insertion is 13087 bp, and the size of intron 2 for 3'-loxP site insertion is 774 bp. The size of the effective cKO region is ~681 bp. The cKO region does not contain any other known gene.
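The arithmetic behind the note can be sketched as follows. This is a minimal illustration, not the vendor's method: it assumes the 4.68% figure is measured against the coding sequence including the stop codon, and it uses the 662 aa protein length listed in the transcript table. The exon lengths passed to `causes_frameshift` are hypothetical, since the report does not state the length of exon 2.

```python
# Protein length of Padi1-201 as listed in the transcript table (662 aa).
PROTEIN_LENGTH_AA = 662

# Coding sequence length in bp, counting the stop codon (assumption about
# how the percentage in the note was computed).
cds_length_bp = (PROTEIN_LENGTH_AA + 1) * 3

# Fraction of the coding region that precedes exon 2, taken from the note.
EXON2_START_FRACTION = 0.0468

# Approximate number of coding bases that lie upstream of exon 2.
coding_bp_before_exon2 = round(cds_length_bp * EXON2_START_FRACTION)


def causes_frameshift(deleted_coding_bp: int) -> bool:
    """Return True if deleting this many coding bases shifts the reading frame."""
    return deleted_coding_bp % 3 != 0


print(cds_length_bp)            # 1989
print(coding_bp_before_exon2)   # 93

# Hypothetical exon lengths, for illustration only.
print(causes_frameshift(100))   # True: 100 is not a multiple of 3
print(causes_frameshift(99))    # False: 99 is a multiple of 3
```

Under these assumptions, roughly 93 coding bases precede exon 2, so a frameshift at exon 2 would disrupt nearly the whole protein.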


Overview of the Targeting Strategy

[Figure: schematic of the wildtype allele (exons 1, 2, 3, 4 ... 16, with the 5' and 3' gRNA regions marked), the targeting vector, the targeted allele, and the constitutive KO allele (after Cre recombination). Legend: exon of mouse Padi1; homologous arm; cKO region; loxP site.]

Overview of the Dot Plot

Window size: 10 bp

[Figure: dot plot of sequence 12 aligned against itself, showing forward and reverse-complement matches.]

Note: The sequence of homologous arms and cKO region is aligned with itself to determine if there are tandem repeats. No significant tandem repeat is found in the dot plot matrix. So this region is suitable for PCR screening or sequencing analysis.
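A self-comparison of this kind can be sketched in a few lines. The function below is an illustrative assumption, not the tool used for the report: it records every pair of positions whose 10 bp windows are identical, forward strand only, and ignores the trivial main diagonal. The `toy` sequence is made up to show what a repeat looks like.

```python
def dot_plot_matches(seq: str, window: int = 10) -> list[tuple[int, int]]:
    """Return (i, j) pairs with i < j whose windows of the given size are identical."""
    # Group window start positions by the window's sequence.
    positions: dict[str, list[int]] = {}
    for i in range(len(seq) - window + 1):
        positions.setdefault(seq[i:i + window], []).append(i)

    # Every pair of starts sharing the same window is one off-diagonal dot.
    matches = []
    for starts in positions.values():
        for a in range(len(starts)):
            for b in range(a + 1, len(starts)):
                matches.append((starts[a], starts[b]))
    return sorted(matches)


# Made-up sequence: a 20 bp block repeated after a 4 bp spacer.
toy = "ACGTACGTAGCTAGGATCCA" + "TTTT" + "ACGTACGTAGCTAGGATCCA"
hits = dot_plot_matches(toy, window=10)

print(len(hits))   # 11
print(hits[0])     # (0, 24)
```

A run of dots at a constant offset, here 24, is the signature of a direct repeat. An empty result corresponds to the "no significant tandem repeat" outcome described in the note.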

Overview of the GC Content Distribution

Window size: 300 bp

[Figure: GC content distribution along sequence 12.]

Summary: Full Length (7181 bp) | A (24.04%, 1726) | C (25.37%, 1822) | T (21.19%, 1522) | G (29.40%, 2111)
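The summary figures can be cross-checked with a short calculation. The sketch below uses only the base counts reported above. The overall GC percentage is derived here and is not stated in the report.

```python
# Base counts as reported in the summary line.
counts = {"A": 1726, "C": 1822, "T": 1522, "G": 2111}

# The counts should add up to the reported full length (7181 bp).
total = sum(counts.values())

# Per-base percentages, rounded to two decimals as in the report.
percent = {base: round(100 * n / total, 2) for base, n in counts.items()}

# Overall GC content, derived from the counts.
gc_percent = round(100 * (counts["G"] + counts["C"]) / total, 2)

print(total)       # 7181
print(percent)     # {'A': 24.04, 'C': 25.37, 'T': 21.19, 'G': 29.4}
print(gc_percent)  # 54.77
```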

Note: The sequence of homologous arms and cKO region is analyzed to determine the GC content. No significant high GC-content region is found. So this region is suitable for PCR screening or sequencing analysis.
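A windowed GC profile like the one described can be sketched as below. The 300 bp window comes from the report. The step size, the `HIGH_GC_THRESHOLD` cutoff and the toy sequence are assumptions for illustration, since the report does not say what it counts as a significant high-GC region.

```python
def gc_windows(seq: str, window: int = 300, step: int = 50) -> list[float]:
    """Return the GC fraction of each window, sliding by `step` bases."""
    seq = seq.upper()
    fractions = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        fractions.append((chunk.count("G") + chunk.count("C")) / window)
    return fractions


# Hypothetical cutoff for calling a window "high GC" (not from the report).
HIGH_GC_THRESHOLD = 0.70

# Made-up 600 bp sequence: an AT-only half followed by a GC-only half.
toy = "AT" * 150 + "GC" * 150
profile = gc_windows(toy, window=300, step=150)
high = [i for i, frac in enumerate(profile) if frac > HIGH_GC_THRESHOLD]

print(profile)  # [0.0, 0.5, 1.0]
print(high)     # [2]
```

Windows above the cutoff are the ones that could complicate PCR or sequencing. An empty `high` list corresponds to the outcome reported in the note.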


BLAT Search Results (up)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
----------------------------------------------------------------------------------------
YourSeq  3000   1      3000  3000   100.0%    chr4   -       140832723  140835722  3000
YourSeq  65     12     82    3000   95.8%     chr5   -       71722535   71722605   71
YourSeq  43     28     76    3000   95.8%     chr1   +       87962883   87962938   56
YourSeq  42     31     76    3000   95.7%     chr1   +       78594397   78594442   46
YourSeq  41     2674   2930  3000   57.2%     chr7   -       28064655   28064771   117
YourSeq  37     38     76    3000   97.5%     chr8   +       122788701  122788739  39
YourSeq  37     43     81    3000   97.5%     chr13  +       34101590   34101628   39
YourSeq  33     43     75    3000   100.0%    chr5   -       122485881  122485913  33
YourSeq  31     2861   2895  3000   94.3%     chr6   -       83781032   83781066   35
YourSeq  29     2706   2734  3000   100.0%    chr6   +       8733989    8734017    29
YourSeq  29     46     76    3000   96.8%     chr1   +       69140438   69140468   31
YourSeq  26     2571   2598  3000   96.5%     chr1   +       35914674   35914701   28
YourSeq  23     2546   2571  3000   83.4%     chr2   +       73608488   73608511   24
YourSeq  23     54     76    3000   100.0%    chr1   +       60921209   60921231   23
YourSeq  22     54     75    3000   100.0%    chr1   +       89044320   89044341   22
YourSeq  20     111    136   3000   88.5%     chr3   -       90846351   90846376   26

Note: The 3000 bp section upstream of Exon 2 is BLAT searched against the genome. No significant similarity is found.

BLAT Search Results (down)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
------------------------------------------------------------------------------------------
YourSeq  3000   1      3000  3000   100.0%    chr4   -       140829042  140832041  3000
YourSeq  118    1436   1673  3000   96.2%     chrX   -       74259155   74259562   408
YourSeq  114    1549   1716  3000   94.6%     chr9   -       78517083   78517642   560
YourSeq  113    1549   1674  3000   96.0%     chr2   +       151992459  151992644  186
YourSeq  112    1550   1675  3000   96.0%     chr1   +       87335959   87336255   297
YourSeq  111    1549   1673  3000   95.9%     chr5   -       114167574  114167762  189
YourSeq  110    1549   1674  3000   94.4%     chr17  +       23627120   23627283   164
YourSeq  109    1549   1673  3000   95.1%     chr11  -       98191544   98191726   183
YourSeq  109    1549   1675  3000   95.2%     chr11  +       82767933   82768135   203
YourSeq  109    1550   1673  3000   95.1%     chr1   +       112576081  112576262  182
YourSeq  108    1549   1672  3000   95.1%     chr19  -       3595477    3595660    184
YourSeq  107    1549   1674  3000   93.6%     chr13  -       91440614   91440897   284
YourSeq  107    1549   1675  3000   94.4%     chr1   -       77033103   77314218   281116
YourSeq  107    1556   1675  3000   95.8%     chr1   -       74408099   74408288   190
YourSeq  107    1549   1675  3000   95.0%     chr5   +       123724921  123725109  189
YourSeq  107    1549   1676  3000   94.3%     chr17  +       45620859   45621050   192
YourSeq  107    1549   1678  3000   92.2%     chr16  +       48341805   48341988   184
YourSeq  107    1556   1672  3000   96.6%     chr15  +       103210082  103210256  175
YourSeq  107    1553   1685  3000   94.3%     chr12  +       102755946  102756131  186
YourSeq  106    1549   1673  3000   95.0%     chr11  -       84927727   84927884   158

Note: The 3000 bp section downstream of Exon 2 is BLAT searched against the genome. No significant similarity is found.
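The two self-hits on chr4 also allow a consistency check on the size of the region between the flanks. The sketch below parses the top row of each BLAT result and measures the gap between them. `BlatHit` and `parse_hit` are hypothetical helpers written for this example, and treating that gap as the effective cKO region is an assumption. Because Padi1 lies on the reverse strand, the upstream flank has the higher coordinates.

```python
from typing import NamedTuple


class BlatHit(NamedTuple):
    """One row of the BLAT table (hypothetical helper type)."""
    score: int
    q_start: int
    q_end: int
    q_size: int
    identity: float
    chrom: str
    strand: str
    t_start: int
    t_end: int
    span: int


def parse_hit(row: str) -> BlatHit:
    """Parse a whitespace-separated BLAT row that starts with the query name."""
    f = row.split()
    return BlatHit(int(f[1]), int(f[2]), int(f[3]), int(f[4]),
                   float(f[5].rstrip("%")), f[6], f[7],
                   int(f[8]), int(f[9]), int(f[10]))


# Top (self) hits copied from the upstream and downstream tables.
upstream_self = parse_hit(
    "YourSeq 3000 1 3000 3000 100.0% chr4 - 140832723 140835722 3000")
downstream_self = parse_hit(
    "YourSeq 3000 1 3000 3000 100.0% chr4 - 140829042 140832041 3000")

# Bases lying strictly between the two 3000 bp flanks.
gap_bp = upstream_self.t_start - downstream_self.t_end - 1

print(gap_bp)  # 681
```

The 681 bp gap agrees with the ~681 bp effective cKO region given in the strategy note.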


Gene and information: Padi1 peptidyl arginine deiminase, type I [Mus musculus (house mouse)]
Gene ID: 18599, updated on 12-Aug-2019

Gene summary

Official Symbol: Padi1 (provided by MGI)
Official Full Name: peptidyl arginine deiminase, type I (provided by MGI)
Primary source: MGI:MGI:1338893
See related: Ensembl:ENSMUSG00000025329
Gene type: protein coding
RefSeq status: PROVISIONAL
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: Pdi1; AV236283
Expression: Biased expression in ovary adult (RPKM 15.4), subcutaneous fat pad adult (RPKM 3.0) and 1 other tissue
Orthologs: human, all

Genomic context

Location: 4 D3; 4 72.62 cM

Exon count: 16

Annotation release  Status             Assembly                      Chr  Location
108                 current            GRCm38.p6 (GCF_000001635.26)  4    NC_000070.6 (140812981..140845778, complement)
Build 37.2          previous assembly  MGSCv37 (GCF_000001635.18)    4    NC_000070.5 (140368896..140401693, complement)

[Figure: genomic context of Padi1 on chromosome 4 (NC_000070.6).]


Transcript information: This gene has 1 transcript

Gene: Padi1 ENSMUSG00000025329

Description: peptidyl arginine deiminase, type I [Source:MGI Symbol;Acc:MGI:1338893]
Gene Synonyms: Pad type 1, Pdi1
Location: Chromosome 4: 140,812,983-140,845,778 reverse strand. GRCm38:CM000997.2
About this gene: This gene has 1 transcript (splice variant), 93 orthologues, 4 paralogues and is a member of 1 Ensembl protein family.

Transcripts

Name       Transcript ID         bp    Protein  Translation ID        Biotype         CCDS       UniProt        Flags
Padi1-201  ENSMUST00000026378.3  3754  662aa    ENSMUSP00000026378.3  Protein coding  CCDS18856  Q544I4 Q9Z185  TSL:1, GENCODE basic, APPRIS P1

[Figure: Ensembl genome browser view of a 52.80 kb region of chromosome 4 (140.81 Mb to 140.85 Mb), showing contig AL807805.7, Gm13032-201 (lncRNA, forward strand), Padi3-201, Padi3-202 and Padi1-201 (protein coding, reverse strand), and the Regulatory Build track. Regulation legend: CTCF; open chromatin; promoter; promoter flank; transcription factor binding site. Gene legend: protein coding (merged Ensembl/Havana; Ensembl protein coding); non-protein coding (RNA gene).]

Transcript: ENSMUST00000026378

[Figure: Ensembl protein summary for Padi1-201 (protein coding, reverse strand, 32.80 kb; ENSMUSP00000026...), scale 0 to 662 aa. Tracks: low complexity (Seg); Superfamily (Cupredoxin; Protein-arginine deiminase, central domain superfamily; SSF55909); Pfam (Protein-arginine deiminase (PAD) N-terminal; Protein-arginine deiminase, C-terminal; Protein-arginine deiminase (PAD), central domain); PIRSF (Protein-arginine deiminase); PANTHER (Protein-arginine deiminase; PTHR10837:SF11); Gene3D (PAD, N-terminal domain superfamily; 3.75.10.10; Protein-arginine deiminase, central domain superfamily); sequence variants (dbSNP and all other sources). Variant legend: missense variant; splice region variant; synonymous variant.]

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
