https://www.alphaknockout.com

Mouse Pcgf1 Conditional Knockout Project (CRISPR/Cas9)

Objective: To create a Pcgf1 conditional knockout Mouse model (C57BL/6J) by CRISPR/Cas-mediated genome engineering.

Strategy summary: The Pcgf1 gene (NCBI Reference Sequence: NM_197992; Ensembl: ENSMUSG00000069678) is located on Mouse chromosome 6. Nine exons are identified, with the ATG start codon in exon 1 and the TAG stop codon in exon 9 (Transcript: ENSMUST00000092614). Exons 1~9 will be selected as the conditional knockout region (cKO region). Deletion of this region should result in the loss of function of the Mouse Pcgf1 gene. To engineer the targeting vector, homologous arms and the cKO region will be generated by PCR using BAC clone RP24-277H1 as template. Cas9, gRNA and the targeting vector will be co-injected into fertilized eggs for cKO Mouse production. The pups will be genotyped by PCR followed by sequencing analysis.

Note: Exons 1~9 cover 100.0% of the coding region. The start codon is in exon 1, and the stop codon is in exon 9. The size of the effective cKO region is ~2972 bp. The cKO region does not contain any other known gene.
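The conditional design relies on Cre recombinase removing the sequence between two loxP sites that share the same orientation, leaving a single loxP site behind. The toy sketch below illustrates that excision step on a made-up allele. The function name cre_excise and the short placeholder sequences are hypothetical and are not taken from the actual targeting vector; only the 34 bp loxP sequence is the canonical one.

```python
# Canonical 34 bp loxP site (13 bp arm, 8 bp spacer, 13 bp arm).
LOXP = "ATAACTTCGTATAGCATACATTATACGAAGTTAT"


def cre_excise(allele: str, site: str = LOXP) -> str:
    """Toy model of Cre excision between two same-orientation loxP sites.

    Returns the allele unchanged when fewer than two sites are present.
    """
    first = allele.find(site)
    if first == -1:
        return allele
    second = allele.find(site, first + len(site))
    if second == -1:
        return allele
    # Keep everything up to and including the first site, then resume
    # after the second site: the floxed segment and one site are removed.
    return allele[: first + len(site)] + allele[second + len(site):]


# Hypothetical placeholder sequences, for illustration only.
five_arm = "GGGG"
cko_region = "ATGAAATAG"
three_arm = "CCCC"

wildtype = five_arm + cko_region + three_arm
targeted = five_arm + LOXP + cko_region + LOXP + three_arm
constitutive_ko = cre_excise(targeted)
```

In this toy model the wildtype allele passes through unchanged, while the targeted allele loses the floxed segment and keeps one loxP site between the two arms.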


Overview of the Targeting Strategy

[Figure: schematic of the wildtype allele, targeting vector, targeted allele, and constitutive KO allele (after Cre recombination). Exons 1 to 9 are shown with the ATG start codon, and gRNA regions are marked at the 5' and 3' ends of the wildtype allele. Legend: homology arm; exon of mouse Pcgf1; cKO region; loxP site.]


Overview of the Dot Plot

Window size: 10 bp

[Figure: dot plot of the sequence against itself, showing forward and reverse-complement matches.]

Note: The sequence of homologous arms and cKO region is aligned with itself to determine if there are tandem repeats. Tandem repeats are found in the dot plot matrix. It may be difficult to construct this targeting vector.
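A self-alignment dot plot of this kind can be approximated by indexing every 10 bp word of the sequence and reporting pairs of positions that share a word, either directly or as a reverse complement. The sketch below is a minimal exact-match version under that assumption; the tool and scoring actually used for the report are not stated, and the function names and toy sequence are hypothetical.

```python
from collections import defaultdict

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def self_dot_plot(seq: str, window: int = 10):
    """Return (forward, reverse) lists of (i, j) dots for exact word matches.

    forward: seq[i:i+window] == seq[j:j+window] with i < j
    reverse: seq[i:i+window] == revcomp(seq[j:j+window])
    """
    index = defaultdict(list)
    for i in range(len(seq) - window + 1):
        index[seq[i:i + window]].append(i)

    forward = sorted(
        (i, j) for pos in index.values() for i in pos for j in pos if i < j
    )
    reverse = sorted(
        (i, j)
        for j in range(len(seq) - window + 1)
        for i in index.get(revcomp(seq[j:j + window]), [])
    )
    return forward, reverse


# Toy input: a 10 bp unit repeated three times in tandem (hypothetical).
unit = "ACGGTCATTG"
toy = unit * 3
forward, reverse = self_dot_plot(toy, window=10)
```

Tandem repeats show up as off-diagonal runs of forward dots at a constant offset (here, offsets of 10 and 20), which is the pattern the note above refers to.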

Overview of the GC Content Distribution

Window size: 300 bp

[Figure: GC content distribution along the sequence.]

Summary: Full Length(8333bp) | A(24.79% 2066) | C(24.58% 2048) | T(24.92% 2077) | G(25.71% 2142)

Note: The sequence of homologous arms and cKO region is analyzed to determine the GC content. Regions of significantly high GC content are found. It may be difficult to construct this targeting vector.
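The base counts in the summary give the overall GC content directly, and a sliding window shows where it is locally high. The sketch below recomputes the overall figure from the reported counts and defines a simple windowed GC function. The function name gc_windows and the step parameter are assumptions for illustration; the report states only the 300 bp window size.

```python
# Base counts as reported in the summary line (full length 8333 bp).
counts = {"A": 2066, "C": 2048, "T": 2077, "G": 2142}
total = sum(counts.values())
gc_percent = 100 * (counts["G"] + counts["C"]) / total


def gc_windows(seq: str, window: int = 300, step: int = 1) -> list[float]:
    """GC fraction of each full-length window along the sequence."""
    seq = seq.upper()
    return [
        (seq[i:i + window].count("G") + seq[i:i + window].count("C")) / window
        for i in range(0, len(seq) - window + 1, step)
    ]


# Toy usage on a short made-up sequence with non-overlapping 4 bp windows.
toy_profile = gc_windows("GGCCAATT", window=4, step=4)
```

The overall value works out to roughly 50.3%, so the difficulty flagged in the note comes from local peaks in the windowed profile rather than from the average composition.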


BLAT Search Results (up)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
---------------------------------------------------------------------------------------
YourSeq  3000   1      3000  3000   100.0%    chr6   +       83075423   83078422   3000
YourSeq  190    917    1457  3000   91.0%     chr4   +       83555395   83555967   573
YourSeq  181    944    1453  3000   83.1%     chr9   +       65918802   65919099   298
YourSeq  176    1209   1452  3000   92.3%     chr8   -       117013135  117013589  455
YourSeq  168    909    1453  3000   82.5%     chr7   -       126729893  126730240  348
YourSeq  164    944    1457  3000   86.4%     chr16  -       32088844   32089327   484
YourSeq  164    1216   1448  3000   92.0%     chr11  -       75708858   75709241   384
YourSeq  144    1223   1452  3000   90.4%     chr18  -       79691844   79692151   308
YourSeq  140    1291   1457  3000   93.3%     chr1   -       16567794   16567963   170
YourSeq  135    1191   1417  3000   92.0%     chr9   -       108650872  108651246  375
YourSeq  135    1254   1456  3000   92.0%     chr15  -       79331160   79331460   301
YourSeq  131    1207   1457  3000   85.5%     chr11  -       5208204    5208395    192
YourSeq  130    1294   1453  3000   92.4%     chr3   -       103720388  103720558  171
YourSeq  130    1192   1457  3000   81.4%     chr12  -       21438586   21438750   165
YourSeq  130    1289   1457  3000   87.2%     chr10  -       38891853   38892012   160
YourSeq  129    1079   1456  3000   80.9%     chr5   -       12456765   12456934   170
YourSeq  129    1291   1449  3000   94.6%     chr12  -       4896215    4896378    164
YourSeq  128    1199   1457  3000   82.7%     chr14  -       84678422   84678577   156
YourSeq  128    1207   1457  3000   83.6%     chr16  +       92541392   92541553   162
YourSeq  126    1209   1457  3000   82.6%     chr12  -       78870286   78870434   149

Note: The 3000 bp section upstream of Exon 1 is BLAT searched against the genome. No significant similarity is found.

BLAT Search Results (down)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
---------------------------------------------------------------------------------------
YourSeq  3000   1      3000  3000   100.0%    chr6   +       83080756   83083755   3000
YourSeq  275    594    1662  3000   89.3%     chr13  -       46912673   47002822   90150
YourSeq  215    594    1020  3000   91.3%     chr14  -       31935196   31936075   880
YourSeq  206    866    1200  3000   91.0%     chr6   +       73039231   73039805   575
YourSeq  201    597    1200  3000   82.5%     chr9   -       83094638   83095054   417
YourSeq  183    925    1202  3000   89.0%     chr3   -       80375732   80376253   522
YourSeq  174    752    1200  3000   83.6%     chr11  -       29364031   29364267   237
YourSeq  167    875    1191  3000   89.7%     chr3   -       88935460   88936025   566
YourSeq  167    600    1200  3000   79.7%     chr1   -       46281666   46282080   415
YourSeq  159    657    1213  3000   80.8%     chr19  -       43919827   43920167   341
YourSeq  158    1489   1891  3000   89.7%     chr16  -       91166437   91166866   430
YourSeq  154    1488   1674  3000   91.9%     chr2   +       25385030   25385218   189
YourSeq  152    1476   1668  3000   91.0%     chr11  -       98860101   98860297   197
YourSeq  152    1455   1667  3000   91.4%     chr3   +       88018249   88018668   420
YourSeq  151    597    1200  3000   79.9%     chr14  -       30476683   30477088   406
YourSeq  150    594    1196  3000   79.5%     chr6   +       87748423   87748883   461
YourSeq  150    1485   1668  3000   91.3%     chr1   +       33723127   33723315   189
YourSeq  149    1479   1668  3000   92.1%     chr4   +       151998843  151999200  358
YourSeq  149    1481   1668  3000   90.5%     chr4   +       6372474    6372668    195
YourSeq  149    619    1191  3000   78.4%     chr3   +       120855983  120856367  385

Note: The 3000 bp section downstream of Exon 9 is BLAT searched against the genome. No significant similarity is found.
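One way to read the BLAT tables is to separate the expected self hit (full score, 100% identity) from the secondary hits and check how small the best secondary score is relative to the query size. The sketch below does this for the first three rows of the upstream table. The report does not state what threshold defines "significant" similarity, so no cutoff is applied here; the parsing helper and field names are hypothetical.

```python
from typing import NamedTuple


class BlatHit(NamedTuple):
    score: int
    q_start: int
    q_end: int
    q_size: int
    identity: float  # percent
    chrom: str
    strand: str
    t_start: int
    t_end: int
    span: int


def parse_hit(line: str) -> BlatHit:
    """Parse one whitespace-separated row; the first field is the query name."""
    f = line.split()
    return BlatHit(
        int(f[1]), int(f[2]), int(f[3]), int(f[4]), float(f[5].rstrip("%")),
        f[6], f[7], int(f[8]), int(f[9]), int(f[10]),
    )


# First three rows of the upstream BLAT table, copied from the report.
ROWS = """\
YourSeq 3000 1 3000 3000 100.0% chr6 + 83075423 83078422 3000
YourSeq 190 917 1457 3000 91.0% chr4 + 83555395 83555967 573
YourSeq 181 944 1453 3000 83.1% chr9 + 65918802 65919099 298
"""

hits = [parse_hit(row) for row in ROWS.splitlines()]
# The self hit is the one that matches the whole query perfectly.
self_hits = [h for h in hits if h.score == h.q_size and h.identity == 100.0]
secondary = [h for h in hits if h not in self_hits]
best_secondary = max(secondary, key=lambda h: h.score)
# Best secondary score as a fraction of the query length.
relative_score = best_secondary.score / best_secondary.q_size
```

For these rows the best secondary hit scores only a small fraction of the 3000 bp query, which is consistent with the notes stating that no significant similarity is found.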


Gene and information: Pcgf1 polycomb group ring finger 1 [ Mus musculus (house mouse) ] Gene ID: 69837, updated on 12-Aug-2019

Gene summary

Official Symbol: Pcgf1 (provided by MGI)
Official Full Name: polycomb group ring finger 1 (provided by MGI)
Primary source: MGI:MGI:1917087
See related: Ensembl:ENSMUSG00000069678
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: Nspc1; AU024121; 2010002K04Rik
Expression: Ubiquitous expression in CNS E14 (RPKM 15.0), testis adult (RPKM 12.7) and 28 other tissues
Orthologs: human

Genomic context

Location: 6 C3; 6 35.94 cM

Exon count: 10

Annotation release  Status             Assembly                      Chr  Location
108                 current            GRCm38.p6 (GCF_000001635.26)  6    NC_000072.6 (83077552..83080855)
Build 37.2          previous assembly  MGSCv37 (GCF_000001635.18)    6    NC_000072.5 (83028384..83030849)

Chromosome 6 - NC_000072.6


Transcript information: This gene has 8 transcripts

Gene: Pcgf1 ENSMUSG00000069678

Description: polycomb group ring finger 1 [Source:MGI Symbol;Acc:MGI:1917087]
Gene Synonyms: 2010002K04Rik, Nspc1
Location: Chromosome 6: 83,077,869-83,080,855 forward strand. GRCm38:CM000999.2
About this gene: This gene has 8 transcripts (splice variants), 192 orthologues, 7 paralogues, is a member of 1 Ensembl protein family and is associated with 20 phenotypes.

Transcripts

Name       Transcript ID         bp   Protein     Translation ID        Biotype                  CCDS       UniProt     Flags
Pcgf1-201  ENSMUST00000092614.8  866  247aa       ENSMUSP00000090277.2  Protein coding           CCDS20270  A0A0R4J141  TSL:1 GENCODE basic
Pcgf1-202  ENSMUST00000165164.8  918  259aa       ENSMUSP00000130614.2  Protein coding           -          Q8R023      TSL:1 GENCODE basic APPRIS P1
Pcgf1-207  ENSMUST00000177177.7  854  176aa       ENSMUSP00000135291.1  Protein coding           -          H3BK85      TSL:1 GENCODE basic
Pcgf1-205  ENSMUST00000176100.1  549  52aa        ENSMUSP00000135882.1  Protein coding           -          H3BLR0      CDS 3' incomplete TSL:3
Pcgf1-204  ENSMUST00000176089.1  492  108aa       ENSMUSP00000135268.1  Protein coding           -          H3BK63      TSL:5 GENCODE basic
Pcgf1-203  ENSMUST00000176027.7  731  106aa       ENSMUSP00000135664.1  Nonsense mediated decay  -          H3BL61      TSL:5
Pcgf1-208  ENSMUST00000204211.1  580  No protein  -                     Retained intron          -          -           TSL:5
Pcgf1-206  ENSMUST00000176372.1  426  No protein  -                     Retained intron          -          -           TSL:5


[Figure: Ensembl genomic view of a 22.99 kb region on chromosome 6 (83.07Mb to 83.09Mb). Forward strand: Pcgf1-201, Pcgf1-202, Pcgf1-204, Pcgf1-205 and Pcgf1-207 (protein coding), Pcgf1-203 (nonsense mediated decay), Pcgf1-206 and Pcgf1-208 (retained intron), Lbx2-201 (protein coding), Lbx2-202 (lncRNA). Contig: AC104324.21. Reverse strand: Tlx2-201 and Tlx2-202 (protein coding), Gm37092-201 (TEC), Mir3470a-201 (miRNA). Regulatory Build legend: CTCF, Open Chromatin, Promoter, Promoter Flank, Transcription Factor Binding Site. Gene legend: protein coding (Ensembl protein coding, merged Ensembl/Havana); non-protein coding (RNA gene, processed transcript).]


Transcript: ENSMUST00000092614

[Figure: protein domain view of Pcgf1-201 (protein coding; 2.46 kb, forward strand; scale bar 0 to 247 aa). Annotation tracks: Low complexity (Seg); Superfamily SSF57850; SMART Zinc finger, RING-type; Pfam PF13923 and RAWUL domain; PROSITE profiles Zinc finger, RING-type; PROSITE patterns Zinc finger, RING-type, conserved site; PANTHER PTHR10825 and PTHR10825:SF29; Gene3D Zinc finger, RING/FYVE/PHD-type and 3.10.20.90; CDD cd16733 and cd17081. Sequence variants from dbSNP and all other sources. Variant legend: missense variant, splice region variant, synonymous variant.]

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
