https://www.alphaknockout.com

Mouse Pcdha9 Knockout Project (CRISPR/Cas9)

Objective: To create a Pcdha9 knockout mouse model (C57BL/6J) by CRISPR/Cas-mediated genome engineering.

Strategy summary: The Pcdha9 gene (NCBI Reference Sequence: NM_138661; Ensembl: ENSMUSG00000103770) is located on mouse chromosome 18. Four exons are identified, with the ATG start codon in exon 1 and the TGA stop codon in exon 4 (Transcript: ENSMUST00000115659). Exon 1 will be selected as the target site. Cas9 and gRNA will be co-injected into fertilized eggs for KO mouse production. The pups will be genotyped by PCR followed by sequencing analysis.

Note: Exon 1 starts from the coding region. Exon 1 covers 84.47% of the coding region. The size of the effective KO region is ~2481 bp. The KO region does not contain any other known gene.
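The coverage figure in the note can be sanity-checked against the protein length given in the transcript table of this report (979 aa). The sketch below assumes the coding region is counted as 3 nt per residue with the stop codon excluded. That convention and the variable names are assumptions for illustration; the report does not state how the percentage was derived.

```python
# Consistency check of the KO-region coverage (assumption-based sketch).
PROTEIN_LENGTH_AA = 979   # Pcdha9-201 protein length from the transcript table
KO_REGION_BP = 2481       # approximate effective KO region from the note

# Assumption: coding length = 3 nt per residue, stop codon not counted.
coding_length_bp = PROTEIN_LENGTH_AA * 3
coverage_pct = round(100 * KO_REGION_BP / coding_length_bp, 2)

print(f"coding length: {coding_length_bp} bp, KO coverage: {coverage_pct}%")
```

Under this assumption the ~2481 bp KO region and the 84.47% coverage agree with each other.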


Overview of the Targeting Strategy

[Figure: wildtype allele drawn 5' to 3', showing exons 1 to 4 of mouse Pcdha9 with two gRNA regions labeled. Legend: exon of mouse Pcdha9; knockout region.]


Overview of the Dot Plot (up)

Window size: 15 bp

[Figure: dot plot of the sequence against itself. Legend: Forward; Reverse Complement.]

Note: The 2000 bp section of Exon 1 is aligned with itself to determine if there are tandem repeats. No significant tandem repeat is found in the dot plot matrix, so this region is suitable for PCR screening or sequencing analysis.
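A self-comparison dot plot of this kind can be sketched as follows. This is a minimal illustration that assumes exact-match words of the stated window size. The function names are hypothetical, and the tool used to produce the report's plots may score windows differently.

```python
from collections import defaultdict

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def self_dot_plot(seq, window=15):
    """Compare a sequence with itself using exact word matches.

    Returns two lists of (i, j) start positions: forward matches
    (excluding the trivial main diagonal i == j) and matches of the
    word at i against the reverse complement found at j.
    """
    seq = seq.upper()
    n_words = len(seq) - window + 1
    index = defaultdict(list)
    for i in range(n_words):
        index[seq[i:i + window]].append(i)

    forward, reverse = [], []
    for i in range(n_words):
        word = seq[i:i + window]
        forward.extend((i, j) for j in index[word] if j != i)
        reverse.extend((i, j) for j in index.get(reverse_complement(word), []))
    return forward, reverse


# Toy example: a 7-bp unit repeated in tandem shows up as off-diagonal dots.
fwd, rev = self_dot_plot("ACGGTCAACGGTCA", window=7)
print(fwd, rev)
```

A tandem repeat appears as off-diagonal forward matches. An empty forward list at the chosen window size corresponds to the "no significant tandem repeat" case.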

Overview of the Dot Plot (down)

Window size: 15 bp

[Figure: dot plot of the sequence against itself. Legend: Forward; Reverse Complement.]

Note: The 2000 bp section of Exon 1 is aligned with itself to determine if there are tandem repeats. No significant tandem repeat is found in the dot plot matrix, so this region is suitable for PCR screening or sequencing analysis.


Overview of the GC Content Distribution (up)

Window size: 300 bp

[Figure: GC content distribution along the sequence.]

Summary: Full Length(2000bp) | A(24.0% 480) | C(22.85% 457) | T(24.3% 486) | G(28.85% 577)

Note: The 2000 bp section of Exon 1 is analyzed to determine the GC content. No significant high GC-content region is found, so this region is suitable for PCR screening or sequencing analysis.
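The overall GC percentage implied by the summary line, and a windowed GC profile like the one plotted, can be computed as in the sketch below. The base counts are taken from the summary above. The function names and the step size are assumptions for illustration, not the settings of the tool that produced the report.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a DNA string (0.0 for an empty string)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def sliding_gc(seq, window=300, step=1):
    """GC fraction of each full window of the given size along the sequence."""
    return [
        gc_fraction(seq[i:i + window])
        for i in range(0, len(seq) - window + 1, step)
    ]


# Base counts from the summary line of the "up" region.
counts_up = {"A": 480, "C": 457, "T": 486, "G": 577}
total_up = sum(counts_up.values())
gc_up_pct = 100 * (counts_up["G"] + counts_up["C"]) / total_up

print(total_up, round(gc_up_pct, 2))

# Toy usage of the windowed profile on a short sequence.
profile = sliding_gc("GGCCAATT", window=4, step=2)
print(profile)
```

The counts sum to the stated 2000 bp, and G plus C account for 51.7% of the region.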

Overview of the GC Content Distribution (down)

Window size: 300 bp

[Figure: GC content distribution along the sequence.]

Summary: Full Length(2000bp) | A(23.45% 469) | C(24.1% 482) | T(24.1% 482) | G(28.35% 567)

Note: The 2000 bp section of Exon 1 is analyzed to determine the GC content. No significant high GC-content region is found, so this region is suitable for PCR screening or sequencing analysis.


BLAT Search Results (up)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
---------------------------------------------------------------------------------------
YourSeq  2000   1      2000  2000   100.0%    chr18  +       36997882   36999881   2000
YourSeq  991    166    1982  2000   84.8%     chr18  +       36967833   37012751   44919
YourSeq  980    166    1980  2000   82.1%     chr18  +       36939395   36989076   49682
YourSeq  957    166    1980  2000   82.2%     chr18  +       36930365   37007217   76853
YourSeq  907    166    1920  2000   87.8%     chr18  +       36974001   37022061   48061
YourSeq  806    166    1785  2000   91.5%     chr18  +       36946275   37012551   66277
YourSeq  768    166    1982  2000   91.6%     chr18  +       36960517   36962327   1811
YourSeq  582    1220   2000  2000   89.3%     chr18  +       36931419   36948103   16685
YourSeq  445    1226   1771  2000   91.0%     chr18  +       36975061   36981094   6034
YourSeq  316    166    571   2000   89.0%     chr18  +       37020307   37020712   406
YourSeq  290    166    477   2000   96.5%     chr18  +       37010935   37011246   312
YourSeq  286    167    480   2000   95.6%     chr18  +       37005401   37005714   314
YourSeq  278    167    474   2000   95.2%     chr18  +       36980135   36980442   308
YourSeq  112    167    480   2000   76.7%     chr18  +       36992545   37766508   773964
YourSeq  73     1842   1982  2000   88.5%     chr18  +       36932032   36994363   62332
YourSeq  34     460    571   2000   97.3%     chr18  -       4198591    4198706    116
YourSeq  25     891    916   2000   100.0%    chr11  -       39916564   39916591   28
YourSeq  25     778    803   2000   100.0%    chr1   +       185724471  185724504  34
YourSeq  23     1665   1693  2000   77.0%     chr11  -       74858147   74858172   26
YourSeq  22     354    377   2000   95.9%     chr18  +       37747447   37747470   24

Note: The 2000 bp section of Exon 1 is BLAT searched against the genome. No significant similarity is found.
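For downstream checks, rows of the BLAT result table can be parsed into records. The sketch below assumes whitespace-separated fields in the column order of the table header and uses three rows copied from the table above. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BlatHit:
    score: int
    q_start: int
    q_end: int
    q_size: int
    identity: float   # percent identity
    chrom: str
    strand: str
    t_start: int
    t_end: int
    span: int


def parse_blat_row(row):
    """Parse one row: QUERY SCORE START END QSIZE IDENTITY CHROM STRAND START END SPAN."""
    f = row.split()
    # f[0] is the query name ("YourSeq") and is not stored.
    return BlatHit(
        int(f[1]), int(f[2]), int(f[3]), int(f[4]),
        float(f[5].rstrip("%")), f[6], f[7],
        int(f[8]), int(f[9]), int(f[10]),
    )


rows = [
    "YourSeq 2000 1 2000 2000 100.0% chr18 + 36997882 36999881 2000",
    "YourSeq 991 166 1982 2000 84.8% chr18 + 36967833 37012751 44919",
    "YourSeq 25 891 916 2000 100.0% chr11 - 39916564 39916591 28",
]
hits = [parse_blat_row(r) for r in rows]

# In these rows SPAN equals END - START + 1 on the genome coordinates.
spans_consistent = all(h.span == h.t_end - h.t_start + 1 for h in hits)
print(spans_consistent)
```

The first row is the query matching its own locus on chr18 at 100% identity over the full 2000 bp.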

BLAT Search Results (down)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
---------------------------------------------------------------------------------------
YourSeq  2000   1      2000  2000   100.0%    chr18  +       36998361   37000360   2000
YourSeq  922    45     1924  2000   77.9%     chr18  +       36992902   37022547   29646
YourSeq  906    309    1948  2000   78.0%     chr18  +       36930987   36948524   17538
YourSeq  845    555    1930  2000   80.9%     chr18  +       36953711   37013184   59474
YourSeq  797    639    1930  2000   83.2%     chr18  +       36947227   36976247   29021
YourSeq  687    726    1780  2000   87.8%     chr18  +       36981173   37013028   31856
YourSeq  646    752    1918  2000   86.7%     chr18  +       36961582   36970064   8483
YourSeq  557    689    1441  2000   87.0%     chr18  +       37021309   37022061   753
YourSeq  520    741    1864  2000   91.6%     chr18  +       36968887   36989439   20553
YourSeq  473    747    1306  2000   92.4%     chr18  +       36975061   37012551   37491
YourSeq  225    1363   1931  2000   81.0%     chr18  +       36932032   36955090   23059
YourSeq  138    1681   1918  2000   79.0%     chr18  +       36962505   36962742   238
YourSeq  25     412    437   2000   100.0%    chr11  -       39916564   39916591   28
YourSeq  25     299    324   2000   100.0%    chr1   +       185724471  185724504  34
YourSeq  23     1957   1979  2000   100.0%    chr11  -       10312830   10312852   23
YourSeq  21     1485   1523  2000   77.0%     chr18  +       36994345   36994383   39
YourSeq  20     436    455   2000   100.0%    chr10  +       27426837   27426856   20

Note: The 2000 bp section of Exon 1 is BLAT searched against the genome. No significant similarity is found.


Gene and information:

Pcdha9 protocadherin alpha 9 [Mus musculus (house mouse)]

Gene ID: 192161, updated on 11-Sep-2019

Gene summary

Official Symbol: Pcdha9 (provided by MGI)
Official Full Name: protocadherin alpha 9 (provided by MGI)
Primary source: MGI:MGI:2447322
See related: Ensembl:ENSMUSG00000103770
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Expression: Biased expression in CNS E18 (RPKM 19.3), cortex adult (RPKM 12.0) and 7 other tissues
Orthologs: human; all

Genomic context

Location: 18; 18 B2-B3

Exon count: 4

Annotation release  Status             Assembly                     Chr  Location
108                 current            GRCm38.p6 (GCF_000001635.26)  18   NC_000084.6 (36997880..37187657)
Build 37.2          previous assembly  MGSCv37 (GCF_000001635.18)    18   NC_000084.5 (37157534..37347311)

Chromosome 18 - NC_000084.6


Transcript information: This gene has 1 transcript

Gene: Pcdha9 ENSMUSG00000103770

Description: protocadherin alpha 9 [Source:MGI Symbol;Acc:MGI:2447322]
Location: Chromosome 18: 36,997,880-37,187,657 forward strand. GRCm38:CM001011.2
About this gene: This gene has 1 transcript (splice variant), 24 orthologues, 69 paralogues, is a member of 1 Ensembl protein family and is associated with 4 phenotypes.

Transcripts

Name        Transcript ID         bp    Protein  Translation ID        Biotype         CCDS       UniProt  Flags
Pcdha9-201  ENSMUST00000115659.5  5341  979aa    ENSMUSP00000111323.3  Protein coding  CCDS37777  Q91Y11   TSL:1, GENCODE basic, APPRIS P1

[Figure: Ensembl genome browser view of chromosome 18, 37.00 Mb to 37.15 Mb (209.78 kb), forward and reverse strands.
Forward-strand genes (Comprehensive set): Pcdha1 to Pcdha9, Pcdha11, Pcdha12, Pcdhac1 and Pcdhac2 transcripts (protein coding, with Pcdha6-203 nonsense mediated decay and Pcdha11-205/-206 retained intron); Gm37013-201, Gm42416-201 and Gm37388-201 (protein coding); Gm36858-201 and Gm18150-201 (unprocessed pseudogene); Gm19035-201 (lncRNA) and Gm19035-202 (transcribed processed pseudogene).
Contigs: AC020972.3, AC020973.3.
Reverse-strand genes (Comprehensive set): Gm10545-201, Gm10544-201, Gm10544-202 and Gm38097-201 (lncRNA); Gm10544-203 (retained intron).
Tracks: Regulatory Build.
Regulation legend: CTCF; Enhancer; Open Chromatin; Promoter; Promoter Flank.
Gene legend: Protein Coding (Ensembl protein coding; merged Ensembl/Havana); Non-Protein Coding (RNA gene; pseudogene; processed transcript).]


Transcript: ENSMUST00000115659

[Figure: protein domain view of Pcdha9-201 (protein coding; 189.78 kb, forward strand; protein ENSMUSP00000111...), with a scale bar from 0 to 979.
Annotation tracks: Transmembrane heli...; MobiDB lite; Low complexity (Seg); Coiled-coils (Ncoils).
Superfamily: Cadherin-like superfamily.
SMART: Cadherin-like.
Prints: Cadherin-like.
Pfam: Cadherin, N-terminal; Cadherin-like; Cadherin, C-terminal catenin-binding domain.
PROSITE profiles: PS50268.
PROSITE patterns: Cadherin conserved site.
PANTHER: PTHR24028:SF124; PTHR24028.
Gene3D: 2.60.40.60.
CDD: cd11304.
Variant tracks: All sequence SNPs/i...; sequence variants (dbSNP and all other sources).
Variant legend: frameshift variant; missense variant; synonymous variant.]

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
