https://www.alphaknockout.com

Mouse Gcfc2 Conditional Knockout Project (CRISPR/Cas9)

Objective: To create a Gcfc2 conditional knockout mouse model (C57BL/6J) by CRISPR/Cas9-mediated genome engineering.

Strategy summary: The Gcfc2 gene (NCBI Reference Sequence: NM_177884; Ensembl: ENSMUSG00000035125) is located on mouse chromosome 6. Seventeen exons have been identified, with the ATG start codon in exon 1 and the TGA stop codon in exon 17 (transcript: ENSMUST00000043195). Exon 5 is selected as the conditional knockout region (cKO region); deletion of this region should result in loss of function of the mouse Gcfc2 gene. To engineer the targeting vector, the homology arms and cKO region will be generated by PCR using BAC clone RP23-442L23 as template. Cas9, gRNA and the targeting vector will be co-injected into fertilized eggs to produce cKO mice. Pups will be genotyped by PCR followed by sequencing analysis.

Note: Exon 5 begins at approximately 29.69% of the coding region, and knockout of exon 5 will result in a frameshift. Intron 4, the site of the 5' loxP insertion, is 2994 bp; intron 5, the site of the 3' loxP insertion, is 3358 bp. The effective cKO region is ~616 bp and does not contain any other known gene.
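The frameshift claim depends only on the length of the deleted coding sequence: if exon 5's length is not a multiple of 3, every downstream codon is read out of frame (assuming exons 4 and 6 splice together after Cre excision). The Python sketch below encodes that rule. The report does not give exon 5's length, so the exon lengths used are hypothetical; the CDS arithmetic assumes the 29.69% figure is measured along the full Gcfc2-201 CDS (769 codons plus the TGA stop), which the report does not state.

```python
# Sketch: frame check for an exon deletion. Exon lengths are hypothetical;
# the report does not list exon 5's length.


def causes_frameshift(deleted_coding_len: int) -> bool:
    """Deleting coding sequence whose length is not a multiple of 3 shifts
    the reading frame of everything downstream (assuming the flanking exons
    splice together)."""
    return deleted_coding_len % 3 != 0


def cds_length_nt(protein_len_aa: int) -> int:
    """CDS length in nucleotides: one codon per residue plus the stop codon."""
    return 3 * (protein_len_aa + 1)


def approx_cds_position(fraction: float, cds_len: int) -> int:
    """Nucleotide position reached at a given fraction of the CDS (rounded)."""
    return round(fraction * cds_len)


if __name__ == "__main__":
    cds = cds_length_nt(769)                 # Gcfc2-201 encodes 769 aa
    print(cds)                               # 2310
    print(approx_cds_position(0.2969, cds))  # 686 (assumed interpretation)
    print(causes_frameshift(100))            # True: 100 % 3 == 1
    print(causes_frameshift(99))             # False: in-frame deletion
```

Under that assumed reading, exon 5 would begin near CDS nucleotide 686, inside codon 229.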


Overview of the Targeting Strategy

[Figure: schematic of the wild-type allele (exons 1, 5 and 17 shown, with the 5' and 3' gRNA regions), the targeting vector, the targeted allele, and the constitutive KO allele after Cre recombination. Legend: exons of mouse Gcfc2, homology arms, cKO region, loxP sites.]


Overview of the Dot Plot (window size: 10 bp)

[Figure: dot plot of the sequence aligned against itself, forward and reverse-complement strands.]

Note: The sequence of the homology arms and cKO region was aligned with itself to check for tandem repeats. No significant tandem repeats were found in the dot plot matrix, so this region is suitable for PCR screening and sequencing analysis.
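A self dot plot marks every pair of positions whose windows match; the main diagonal is the trivial self-match, and extra diagonals running parallel to it are the signature of tandem repeats. The Python sketch below assumes exact matching of 10 bp windows (the report's window size); the tool and scoring actually used are not stated, so this is illustrative only.

```python
from collections import defaultdict

_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def revcomp(seq: str) -> str:
    """Reverse complement of an A/C/G/T/N sequence."""
    return "".join(_COMPLEMENT[b] for b in reversed(seq.upper()))


def self_dot_plot(seq: str, window: int = 10, reverse: bool = False):
    """(i, j) pairs where the window at i in seq exactly matches the window
    at j in seq, or in its reverse complement when reverse=True."""
    seq = seq.upper()
    other = revcomp(seq) if reverse else seq
    index = defaultdict(list)  # window string -> start positions in `other`
    for j in range(len(other) - window + 1):
        index[other[j:j + window]].append(j)
    return [
        (i, j)
        for i in range(len(seq) - window + 1)
        for j in index.get(seq[i:i + window], [])
    ]


def off_diagonal_hits(seq: str, window: int = 10):
    """Forward-strand matches off the main diagonal (i != j). Runs of these
    parallel to the diagonal suggest tandem repeats."""
    return [(i, j) for i, j in self_dot_plot(seq, window) if i != j]


if __name__ == "__main__":
    repeat = "AGCTTACGGA" * 3  # a 10 bp unit repeated three times
    print(off_diagonal_hits(repeat)[:3])
```

An empty `off_diagonal_hits` result corresponds to the report's "no significant tandem repeat" finding; real tools usually tolerate mismatches within a window, which this exact-match version does not.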

Overview of the GC Content Distribution (window size: 300 bp)

[Figure: GC content distribution along the sequence.]

Summary: full length 7116 bp; A: 1884 (26.48%), C: 1411 (19.83%), T: 2305 (32.39%), G: 1516 (21.30%); overall GC content 41.13%.

Note: The sequence of the homology arms and cKO region was analyzed for GC content. No region of significantly high GC content was found, so this region is suitable for PCR screening and sequencing analysis.
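The summary line and the windowed profile can be reproduced from the raw sequence. The Python sketch below computes per-base composition and GC fraction in 300 bp windows; the non-overlapping step and the 70% "high GC" cutoff are assumptions, since the report states only the window size.

```python
from collections import Counter


def composition(seq: str) -> dict:
    """Count and percentage (2 d.p.) of each base, like the report's summary."""
    seq = seq.upper()
    counts = Counter(seq)
    return {b: (counts[b], round(100 * counts[b] / len(seq), 2)) for b in "ACGT"}


def gc_fraction(seq: str) -> float:
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def windowed_gc(seq: str, window: int = 300, step: int = 300):
    """(start, GC fraction) for each full window. The 300 bp window follows
    the report; the non-overlapping step is an assumption."""
    return [(s, gc_fraction(seq[s:s + window]))
            for s in range(0, len(seq) - window + 1, step)]


def high_gc_windows(windows, threshold: float = 0.70):
    """Start positions of windows above a hypothetical 70% GC cutoff."""
    return [s for s, gc in windows if gc > threshold]


if __name__ == "__main__":
    # Overall GC from the report's summary counts (C = 1411, G = 1516, 7116 bp)
    print(round(100 * (1411 + 1516) / 7116, 2))  # 41.13
    demo = "GC" * 150 + "AT" * 150
    print(high_gc_windows(windowed_gc(demo)))    # [0]
```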


BLAT Search Results (upstream)

Q_START/Q_END are positions within the 3000 bp query; T_START/T_END are positions on the hit chromosome.

QUERY    SCORE  Q_START  Q_END  QSIZE  IDENTITY  CHROM  STRAND    T_START      T_END  SPAN
YourSeq   3000        1   3000   3000    100.0%  chr6   +        81932657   81935656  3000
YourSeq    173     1493   1667   3000     99.5%  chrX   -        36466504   36466678   175
YourSeq    104      351    827   3000     80.0%  chr5   +       147746729  147747173   445
YourSeq     91      709    899   3000     88.9%  chr14  -        70822160   70822354   195
YourSeq     74      355    831   3000     71.7%  chr17  -        48742224   48742662   439
YourSeq     72      761    909   3000     87.5%  chr11  +        69315796   69315949   154
YourSeq     64      709    867   3000     81.4%  chr5   +       117186629  117186957   329
YourSeq     59      725    867   3000     87.4%  chr13  -        76014115   76014257   143
YourSeq     59      711    830   3000     84.8%  chr1   +       166108468  166108748   281
YourSeq     58      730    830   3000     82.4%  chr1   -        79662300   79662391    92
YourSeq     55      728    830   3000     86.6%  chr6   -        82809716   82809816   101
YourSeq     54      709    830   3000     92.2%  chr1   -       192660416  192660570   155
YourSeq     54      734    844   3000     89.8%  chr1   +        87303822   87303934   113
YourSeq     53      707    775   3000     85.3%  chr7   +        91524710   91524777    68
YourSeq     53     1145   1236   3000     79.4%  chr13  +        43692395   43692487    93
YourSeq     52      732    842   3000     86.2%  chrX   -        36032664   36032776   113
YourSeq     52      698    775   3000     83.4%  chr19  -        31156124   31156201    78
YourSeq     52     1148   1350   3000     93.3%  chr12  +        79417985   79418412   428
YourSeq     50      709    780   3000     84.8%  chrX   -       163213165  163213236    72
YourSeq     50     1201   1264   3000     89.1%  chr5   -       129667498  129667561    64

Note: The 3000 bp section upstream of exon 5 was BLAT-searched against the genome. Apart from the expected full-length self-match on chr6, no significant similarity was found.
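Reading the table comes down to setting aside the full-length self-hit and checking whether any remaining hit covers a meaningful share of the query. The Python sketch below parses rows in the column order shown above and applies a hypothetical 10%-of-query score cutoff; the report does not say what criterion it used for "significant".

```python
from typing import List, NamedTuple


class BlatHit(NamedTuple):
    score: int
    q_start: int
    q_end: int
    q_size: int
    identity: float  # percent identity
    chrom: str
    strand: str
    t_start: int
    t_end: int
    span: int


def parse_hit(line: str) -> BlatHit:
    """Parse one row laid out as QUERY SCORE Q_START Q_END QSIZE IDENTITY
    CHROM STRAND T_START T_END SPAN (the QUERY name is discarded)."""
    f = line.split()
    return BlatHit(int(f[1]), int(f[2]), int(f[3]), int(f[4]),
                   float(f[5].rstrip("%")), f[6], f[7],
                   int(f[8]), int(f[9]), int(f[10]))


def is_self_hit(h: BlatHit) -> bool:
    """Full-length, 100% identity hit: the query's own locus."""
    return h.score == h.q_size and h.identity == 100.0


def significant_off_targets(hits: List[BlatHit], min_fraction: float = 0.1):
    """Non-self hits scoring at least min_fraction of the query length.
    The 10% cutoff is a hypothetical choice, not the report's criterion."""
    return [h for h in hits
            if not is_self_hit(h) and h.score >= min_fraction * h.q_size]


ROWS = """\
YourSeq 3000 1 3000 3000 100.0% chr6 + 81932657 81935656 3000
YourSeq 173 1493 1667 3000 99.5% chrX - 36466504 36466678 175
YourSeq 104 351 827 3000 80.0% chr5 + 147746729 147747173 445
"""

if __name__ == "__main__":
    hits = [parse_hit(line) for line in ROWS.splitlines()]
    print(significant_off_targets(hits))  # []
    best = max(h.score for h in hits if not is_self_hit(h))
    print(best, best / 3000)
```

A score-based cutoff treats the 175 bp chrX hit at 99.5% identity as minor; a primer-placement check might weigh short, high-identity hits differently.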

BLAT Search Results (downstream)

QUERY    SCORE  Q_START  Q_END  QSIZE  IDENTITY  CHROM  STRAND    T_START      T_END  SPAN
YourSeq   3000        1   3000   3000    100.0%  chr6   +        81936273   81939272  3000
YourSeq    138      360    558   3000     86.2%  chr10  -        93496906   93497078   173
YourSeq    131      383    616   3000     86.6%  chr16  -        24959408   24959640   233
YourSeq    130      355    558   3000     81.6%  chr15  -        42876589   42876759   171
YourSeq    124      360    558   3000     84.8%  chr12  +        95099058   95099236   179
YourSeq    123      382    558   3000     89.0%  chr16  +        88055032   88055201   170
YourSeq    121      393    558   3000     88.1%  chr3   -        88705253   88705420   168
YourSeq    118      383    558   3000     86.1%  chr18  +        27993938   27994121   184
YourSeq    118      383    558   3000     82.0%  chr11  +        58202233   58202391   159
YourSeq    117      407    558   3000     90.6%  chr1   -       171432537  171432693   157
YourSeq    117       37    558   3000     78.5%  chr13  +        38051846   38052032   187
YourSeq    117      381    558   3000     81.8%  chr13  +        29001380   29001549   170
YourSeq    116      408    558   3000     89.4%  chr11  -        20809404   20809559   156
YourSeq    116      409    558   3000     93.4%  chr5   +       122409518  122409679   162
YourSeq    116      382    547   3000     83.5%  chr18  +        22898691   22898838   148
YourSeq    115      392    558   3000     84.6%  chr6   -        51521896   51522059   164
YourSeq    115      407    575   3000     88.4%  chr3   +        62568991   62569169   179
YourSeq    115      414    558   3000     91.0%  chr15  +        79207112   79207260   149
YourSeq    115      381    558   3000     85.7%  chr12  +        68156597   68156766   170
YourSeq    115      383    558   3000     88.3%  chr10  +        38382942   38383128   187

Note: The 3000 bp section downstream of exon 5 was BLAT-searched against the genome. Apart from the expected full-length self-match on chr6, no significant similarity was found.
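The two full-length chr6 self-hits locate the upstream and downstream 3000 bp sections in the genome, so the number of bases lying between them can be checked directly. The Python sketch below assumes the displayed BLAT coordinates are 1-based and inclusive; under that assumption the gap comes to 616 bp, matching the ~616 bp effective cKO region given in the strategy notes.

```python
def interval_length(start: int, end: int) -> int:
    """Length of a 1-based, inclusive interval."""
    return end - start + 1


def bases_between(up_end: int, down_start: int) -> int:
    """Bases strictly between an upstream interval ending at up_end and a
    downstream interval starting at down_start (1-based, inclusive)."""
    return down_start - up_end - 1


UPSTREAM = (81932657, 81935656)    # chr6 self-hit, upstream section
DOWNSTREAM = (81936273, 81939272)  # chr6 self-hit, downstream section

if __name__ == "__main__":
    print(interval_length(*UPSTREAM))                 # 3000
    print(interval_length(*DOWNSTREAM))               # 3000
    print(bases_between(UPSTREAM[1], DOWNSTREAM[0]))  # 616
```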


Gene information: Gcfc2, GC-rich sequence DNA binding factor 2 [Mus musculus (house mouse)], Gene ID: 330361, updated on 12-Aug-2019

Gene summary

Official Symbol: Gcfc2 (provided by MGI)
Official Full Name: GC-rich sequence DNA binding factor 2 (provided by MGI)
Primary source: MGI:MGI:2141656
See related: Ensembl:ENSMUSG00000035125
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: GCF2; Tcf9; AW146020; A130099G21
Expression: Ubiquitous expression in CNS E11.5 (RPKM 5.4), limb E14.5 (RPKM 4.3) and 27 other tissues
Orthologs: human, all

Genomic context

Location: chromosome 6; 6 C3

Exon count: 22

Annotation release  Status             Assembly                      Chr  Location
108                 current            GRCm38.p6 (GCF_000001635.26)  6    NC_000072.6 (81910562..81959098)
Build 37.2          previous assembly  MGSCv37 (GCF_000001635.18)    6    NC_000072.5 (81873663..81909092)
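A quick interval check confirms that the targeted sequence (from the start of the upstream BLAT section to the end of the downstream one) sits inside the annotated Gcfc2 locus. The Python sketch below assumes the BLAT coordinates are on GRCm38, the assembly of the current NCBI annotation and the Ensembl location, and that all coordinates are 1-based and inclusive; the report states neither explicitly.

```python
from typing import Tuple

Interval = Tuple[int, int]  # 1-based, inclusive (start, end)


def length(iv: Interval) -> int:
    return iv[1] - iv[0] + 1


def contains(outer: Interval, inner: Interval) -> bool:
    """True if inner lies entirely within outer."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


NCBI_GRCM38 = (81910562, 81959098)   # annotation release 108, NC_000072.6
ENSEMBL_GENE = (81923669, 81959915)  # Ensembl gene location, GRCm38
TARGET = (81932657, 81939272)        # upstream section start .. downstream section end

if __name__ == "__main__":
    print(length(NCBI_GRCM38))   # 48537
    print(length(ENSEMBL_GENE))  # 36247
    print(contains(NCBI_GRCM38, TARGET), contains(ENSEMBL_GENE, TARGET))  # True True
```

The NCBI and Ensembl gene boundaries differ, so the span reported for the gene depends on which annotation is used.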

[Figure: Chromosome 6, NC_000072.6]


Transcript information: This gene has 8 transcripts

Gene: Gcfc2 ENSMUSG00000035125

Description: GC-rich sequence DNA binding factor 2 [Source:MGI Symbol;Acc:MGI:2141656]
Gene synonyms: AW146020
Location: Chromosome 6: 81,923,669-81,959,915, forward strand (GRCm38:CM000999.2)
About this gene: This gene has 8 transcripts (splice variants), 227 orthologues and 1 paralogue, and is a member of 1 Ensembl protein family.

Transcripts

Name       Transcript ID            bp  Protein     Translation ID        Biotype                  CCDS       UniProt     Flags
Gcfc2-201  ENSMUST00000043195.10  4192  769aa       ENSMUSP00000035644.4  Protein coding           CCDS20260  Q8BKT3      TSL:1, GENCODE basic, APPRIS P1
Gcfc2-207  ENSMUST00000203959.1    630  175aa       ENSMUSP00000144868.1  Protein coding           -          A0A0N4SUY0  CDS 5' incomplete, TSL:3
Gcfc2-206  ENSMUST00000152996.7   3519  263aa       ENSMUSP00000138136.1  Nonsense mediated decay  -          S4R198      TSL:1
Gcfc2-204  ENSMUST00000132301.1   2429  No protein  -                     Retained intron          -          -           TSL:1
Gcfc2-202  ENSMUST00000127949.1   1953  No protein  -                     Retained intron          -          -           TSL:1
Gcfc2-203  ENSMUST00000129678.1    641  No protein  -                     Retained intron          -          -           TSL:3
Gcfc2-205  ENSMUST00000147673.1    612  No protein  -                     Retained intron          -          -           TSL:3
Gcfc2-208  ENSMUST00000204691.1    453  No protein  -                     Retained intron          -          -           TSL:NA
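The report designs against ENSMUST00000043195 (Gcfc2-201) without saying how that transcript was chosen. One plausible heuristic, assumed here rather than taken from the report, ranks transcripts by protein-coding biotype, then the APPRIS P1 and GENCODE basic flags, then transcript support level (TSL), then protein length. The Python sketch below applies that ranking to rows from the table.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Transcript:
    name: str
    transcript_id: str
    length_bp: int
    protein_aa: Optional[int]
    biotype: str
    flags: Tuple[str, ...]


def tsl(t: Transcript) -> int:
    """Transcript support level as an int (1 = best); NA or missing sorts last."""
    for f in t.flags:
        if f.startswith("TSL:") and f[4:].isdigit():
            return int(f[4:])
    return 99


def pick_reference(transcripts):
    """Hypothetical ranking: protein coding, then APPRIS P1, then GENCODE
    basic, then best TSL, then longest protein."""
    return max(
        transcripts,
        key=lambda t: (
            t.biotype == "Protein coding",
            "APPRIS P1" in t.flags,
            "GENCODE basic" in t.flags,
            -tsl(t),
            t.protein_aa or 0,
        ),
    )


TRANSCRIPTS = [
    Transcript("Gcfc2-201", "ENSMUST00000043195.10", 4192, 769, "Protein coding",
               ("TSL:1", "GENCODE basic", "APPRIS P1")),
    Transcript("Gcfc2-207", "ENSMUST00000203959.1", 630, 175, "Protein coding",
               ("CDS 5' incomplete", "TSL:3")),
    Transcript("Gcfc2-206", "ENSMUST00000152996.7", 3519, 263,
               "Nonsense mediated decay", ("TSL:1",)),
    Transcript("Gcfc2-204", "ENSMUST00000132301.1", 2429, None,
               "Retained intron", ("TSL:1",)),
]

if __name__ == "__main__":
    print(pick_reference(TRANSCRIPTS).name)  # Gcfc2-201
```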


[Figure: Ensembl view of the Gcfc2 region (56.25 kb; chr6, 81.92 to 81.96 Mb). Forward strand: Gcfc2-201 and Gcfc2-207 (protein coding), Gcfc2-206 (nonsense mediated decay), Gcfc2-202, -203, -204, -205 and -208 (retained intron). Reverse strand: Mrpl19-201 (protein coding), Mrpl19-202 (retained intron) and Mrpl19-203 (lncRNA). Additional tracks: contig AC129024.4 and the Regulatory Build (CTCF, open chromatin, promoter, promoter flank, binding site).]


Transcript: ENSMUST00000043195

[Figure: Ensembl protein view of Gcfc2-201 (transcript span 36.25 kb, forward strand; protein ENSMUSP00000035644, 769 aa). Feature tracks: MobiDB lite, low complexity (Seg), coiled-coils (Ncoils), Pfam GC-rich sequence DNA-binding factor-like domain, and PANTHER GC-rich sequence DNA-binding factor (PTHR12214:SF3). Variant track: sequence variants from dbSNP and other sources (in-frame deletion, missense and synonymous variants).]

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
