https://www.alphaknockout.com

Mouse Cog4 Knockout Project (CRISPR/Cas9)

Objective: To create a Cog4 knockout mouse model (C57BL/6J) by CRISPR/Cas9-mediated genome engineering.

Strategy summary: The Cog4 gene (NCBI Reference Sequence: NM_133973; Ensembl: ENSMUSG00000031753) is located on mouse chromosome 8. Nineteen exons are identified, with the ATG start codon in exon 1 and the TAG stop codon in exon 19 (Transcript: ENSMUST00000034203). Exons 2 to 6 will be selected as the target site. Cas9 and gRNA will be co-injected into fertilized eggs for KO mouse production. The pups will be genotyped by PCR followed by sequencing analysis.

Note: Exon 2 starts at about 6.79% of the coding region. Exons 2 to 6 cover 28.58% of the coding region. The size of the effective KO region is ~8175 bp. The KO region does not contain any other known gene.
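The percentages in the note can be converted into approximate nucleotide counts. The sketch below assumes the 785 aa protein length listed for ENSMUST00000034203 in the transcript table of this report. Whether the stop codon was counted in the coding-region length is not stated, so both conventions are shown. The function names are illustrative only, and the results are estimates derived from rounded percentages, not exact exon boundaries.

```python
# Rough conversion of coding-region percentages into nucleotide counts.
# Assumption: protein length of 785 aa (ENSMUST00000034203).

PROTEIN_AA = 785


def cds_length(protein_aa, include_stop=True):
    """Coding-region length in nt, with or without the stop codon."""
    codons = protein_aa + 1 if include_stop else protein_aa
    return codons * 3


def percent_to_nt(percent, cds_nt):
    """Approximate number of coding nucleotides for a given percentage."""
    return round(percent / 100 * cds_nt)


for include_stop in (True, False):
    cds_nt = cds_length(PROTEIN_AA, include_stop)
    before_exon2 = percent_to_nt(6.79, cds_nt)   # coding nt upstream of exon 2
    in_exons_2_6 = percent_to_nt(28.58, cds_nt)  # coding nt in exons 2 to 6
    print(include_stop, cds_nt, before_exon2, in_exons_2_6, in_exons_2_6 % 3)
```

The last printed value is the remainder of the estimated deleted coding length modulo 3. A non-zero remainder would be consistent with a frameshift, but only the exact exon coordinates can confirm this.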

Overview of the Targeting Strategy

[Figure: schematic of the wildtype allele drawn 5' to 3', with exons 1 to 6 and exon 19 labeled and two gRNA regions marked. Legend: exon of mouse Cog4; knockout region.]

Overview of the Dot Plot (up)

Window size: 15 bp

[Figure: dot plot of the sequence aligned against itself, showing forward and reverse-complement matches.]

Note: The 2000 bp section upstream of Exon 2 is aligned with itself to determine if there are tandem repeats. Tandem repeats are found in the dot plot matrix. The gRNA site is selected outside of these tandem repeats.
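A self-alignment dot plot of this kind can be sketched as follows. This is a minimal illustration that assumes exact matching of 15 bp windows. The tool that produced the plot in this report is not named and may score windows differently, so the function names and the matching rule are assumptions.

```python
# Minimal self dot plot: report window positions that match exactly,
# on the forward strand and against the reverse complement.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def revcomp(seq):
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def self_dot_plot(seq, window=15):
    """Return (forward, reverse) lists of (i, j) window start positions.

    forward: window at i equals window at j (i != j, the trivial diagonal is skipped)
    reverse: window at i equals the reverse complement of the window at j
    """
    seq = seq.upper()
    index = {}
    last_start = len(seq) - window
    for i in range(last_start + 1):
        index.setdefault(seq[i:i + window], []).append(i)

    forward, reverse = [], []
    for i in range(last_start + 1):
        kmer = seq[i:i + window]
        forward.extend((i, j) for j in index[kmer] if j != i)
        reverse.extend((i, j) for j in index.get(revcomp(kmer), []))
    return forward, reverse


# Toy example: a 7 bp unit repeated after a 3 bp spacer.
fwd, rev = self_dot_plot("GATTACACCCGATTACA", window=7)
print(fwd)  # off-diagonal dots indicate a repeat
print(rev)
```

Runs of off-diagonal dots parallel to the main diagonal indicate direct or tandem repeats, which is the kind of signal the note above refers to when choosing gRNA sites outside the repeats.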

Overview of the Dot Plot (down)

Window size: 15 bp

[Figure: dot plot of the sequence aligned against itself, showing forward and reverse-complement matches.]

Note: The 494 bp section downstream of Exon 6 is aligned with itself to determine if there are tandem repeats. No significant tandem repeat is found in the dot plot matrix. So this region is suitable for PCR screening or sequencing analysis.

Overview of the GC Content Distribution (up)

Window size: 300 bp

[Figure: GC content distribution along the upstream sequence.]

Summary: Full Length(2000bp) | A(29.1% 582) | C(18.5% 370) | T(32.2% 644) | G(20.2% 404)

Note: The 2000 bp section upstream of Exon 2 is analyzed to determine the GC content. No significant high GC-content region is found. So this region is suitable for PCR screening or sequencing analysis.

Overview of the GC Content Distribution (down)

Window size: 300 bp

[Figure: GC content distribution along the downstream sequence.]

Summary: Full Length(494bp) | A(29.55% 146) | C(19.64% 97) | T(31.98% 158) | G(18.83% 93)

Note: The 494 bp section downstream of Exon 6 is analyzed to determine the GC content. No significant high GC-content region is found. So this region is suitable for PCR screening or sequencing analysis.
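The overall GC content of each flank follows directly from the base counts in the two summaries. The sketch below recomputes it and also shows one possible way to compute a windowed profile. The window step and the function names are assumptions, since the report states only the 300 bp window size.

```python
# GC content from the base counts in the summaries, plus a sliding-window
# profile for an arbitrary sequence.

def gc_percent(counts):
    """Overall GC percentage from a dict of base counts."""
    total = sum(counts.values())
    return 100.0 * (counts["G"] + counts["C"]) / total


def windowed_gc(seq, window=300, step=1):
    """GC percentage of each full window along seq (step is an assumption)."""
    seq = seq.upper()
    profile = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        profile.append(100.0 * (chunk.count("G") + chunk.count("C")) / window)
    return profile


upstream = {"A": 582, "C": 370, "T": 644, "G": 404}   # 2000 bp upstream of exon 2
downstream = {"A": 146, "C": 97, "T": 158, "G": 93}   # 494 bp downstream of exon 6

print(round(gc_percent(upstream), 2))    # 38.7
print(round(gc_percent(downstream), 2))  # 38.46
```

Both flanks come out at roughly 38 to 39% GC overall, in line with the notes that no significant high-GC region is found.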

BLAT Search Results (up)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
-----------------------------------------------------------------------------------------
YourSeq  2000   1      2000  2000   100.0%    chr8   +       110847869  110849868  2000
YourSeq  76     213    341   2000   82.7%     chr11  -       106966604  106966734  131
YourSeq  69     250    350   2000   84.7%     chr13  -       64965985   64966083   99
YourSeq  68     217    338   2000   87.1%     chr10  -       109421438  109421600  163
YourSeq  68     250    341   2000   87.0%     chr6   +       35160732   35160823   92
YourSeq  64     250    341   2000   84.8%     chr17  -       30467170   30467261   92
YourSeq  64     250    341   2000   84.8%     chr3   +       112628602  112628693  92
YourSeq  64     250    341   2000   84.8%     chr16  +       66931840   66931931   92
YourSeq  64     250    343   2000   84.1%     chr13  +       91440978   91441071   94
YourSeq  63     196    341   2000   88.7%     chr10  +       62927149   62927483   335
YourSeq  62     209    316   2000   89.9%     chr5   -       29451523   29451631   109
YourSeq  61     212    355   2000   81.4%     chr2   -       17963872   17964005   134
YourSeq  59     258    341   2000   90.6%     chr13  -       61660435   61660522   88
YourSeq  55     268    341   2000   88.8%     chr13  -       27668700   27668774   75
YourSeq  55     216    350   2000   81.7%     chr4   +       41340748   41340867   120
YourSeq  55     260    341   2000   89.9%     chr15  +       79474730   79474814   85
YourSeq  54     268    343   2000   85.6%     chr1   -       71757875   71757950   76
YourSeq  53     263    341   2000   88.5%     chr1   -       160866130  160866208  79
YourSeq  48     250    316   2000   86.6%     chr11  +       69074720   69074790   71
YourSeq  48     264    341   2000   86.4%     chr10  +       32345410   32345487   78

Note: The 2000 bp section upstream of Exon 2 is BLAT searched against the genome. No significant similarity is found.

BLAT Search Results (down)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
-----------------------------------------------------------------------------------------
YourSeq  494    1      494   494    100.0%    chr8   +       110858044  110858537  494
YourSeq  28     196    229   494    83.4%     chr13  +       45967032   45967062   31
YourSeq  23     465    494   494    85.2%     chr15  -       53416492   53416520   29
YourSeq  22     176    198   494    100.0%    chr10  -       111757556  111757583  28
YourSeq  22     237    260   494    87.0%     chr11  +       9578525    9578547    23
YourSeq  22     471    493   494    100.0%    chr10  +       21874140   21874164   25
YourSeq  21     442    462   494    100.0%    chr2   +       54639103   54639123   21
YourSeq  20     443    462   494    100.0%    chr13  -       100708290  100708309  20
YourSeq  20     444    463   494    100.0%    chr1   +       106366527  106366546  20

Note: The 494 bp section downstream of Exon 6 is BLAT searched against the genome. No significant similarity is found.
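The self-hits in the two BLAT tables (the 100.0% identity rows on chr8) give the genomic coordinates of both flanks. The sketch below derives the size of the region between them. It assumes that the upstream flank ends immediately before exon 2, that the downstream flank begins immediately after exon 6, and that the coordinates are 1-based and inclusive, which agrees with the SPAN column. The variable and function names are illustrative.

```python
# Size of the region between the two flanking sequences, using the chr8
# self-hit coordinates from the BLAT tables (1-based, inclusive).

upstream_flank = (110847869, 110849868)    # 2000 bp upstream of exon 2
downstream_flank = (110858044, 110858537)  # 494 bp downstream of exon 6


def span(interval):
    """Length of a 1-based inclusive interval."""
    start, end = interval
    return end - start + 1


# Assumption: the targeted region is everything strictly between the flanks.
ko_region = (upstream_flank[1] + 1, downstream_flank[0] - 1)

print(span(upstream_flank))    # 2000
print(span(downstream_flank))  # 494
print(ko_region, span(ko_region))
```

Under these assumptions the region spans chr8:110849869-110858043, which is 8175 bp and matches the effective KO region size quoted in the strategy summary.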

Gene information:

Cog4 component of oligomeric golgi complex 4 [Mus musculus (house mouse)]
Gene ID: 102339, updated on 12-Aug-2019

Gene summary

Official Symbol: Cog4 (provided by MGI)
Official Full Name: component of oligomeric golgi complex 4 (provided by MGI)
Primary source: MGI:MGI:2142808
See related: Ensembl:ENSMUSG00000031753
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: AW554810; D8Ertd515e
Expression: Ubiquitous expression in subcutaneous fat pad adult (RPKM 20.1), testis adult (RPKM 19.3) and 28 other tissues
Orthologs: human

Genomic context

Location: 8 E1; 8 57.76 cM
Exon count: 20

Annotation release  Status             Assembly                      Chr  Location
108                 current            GRCm38.p6 (GCF_000001635.26)  8    NC_000074.6 (110846600..110882234)
Build 37.2          previous assembly  MGSCv37 (GCF_000001635.18)    8    NC_000074.5 (113370924..113406134)

[Figure: genomic context of Cog4 on chromosome 8 (NC_000074.6).]

Transcript information: This gene has 14 transcripts

Gene: Cog4 ENSMUSG00000031753

Description: component of oligomeric golgi complex 4 [Source:MGI Symbol;Acc:MGI:2142808]
Gene Synonyms: D8Ertd515e
Location: Chromosome 8: 110,846,600-110,882,227 forward strand. GRCm38:CM001001.2
About this gene: This gene has 14 transcripts (splice variants), 201 orthologues, 1 paralogue and is a member of 1 Ensembl protein family.

Transcripts

Name      Transcript ID          bp    Protein     Translation ID        Biotype                  CCDS       UniProt  Flags
Cog4-201  ENSMUST00000034203.16  2737  785aa       ENSMUSP00000034203.9  Protein coding           CCDS40477  Q8R1U1   TSL:1, GENCODE basic, APPRIS P3
Cog4-202  ENSMUST00000165867.7   2442  712aa       ENSMUSP00000128518.1  Protein coding           CCDS80931  Q7TSQ9   TSL:1, GENCODE basic, APPRIS ALT2
Cog4-210  ENSMUST00000174398.7   2292  763aa       ENSMUSP00000133297.1  Protein coding           -          G3UWH9   CDS 5' incomplete, TSL:5
Cog4-211  ENSMUST00000174679.1   542   180aa       ENSMUSP00000133458.1  Protein coding           -          G3UWX2   CDS 5' and 3' incomplete, TSL:3
Cog4-205  ENSMUST00000172668.7   368   117aa       ENSMUSP00000134252.1  Protein coding           -          G3UYW8   CDS 3' incomplete, TSL:3
Cog4-204  ENSMUST00000172542.1   1280  157aa       ENSMUSP00000133283.1  Nonsense mediated decay  -          G3XA67   TSL:5
Cog4-206  ENSMUST00000172897.1   609   59aa        ENSMUSP00000133583.1  Nonsense mediated decay  -          G3UX78   TSL:5
Cog4-209  ENSMUST00000174165.7   590   76aa        ENSMUSP00000134306.1  Nonsense mediated decay  -          G3UZ15   TSL:5
Cog4-213  ENSMUST00000174723.7   531   96aa        ENSMUSP00000133471.1  Nonsense mediated decay  -          G3UWY4   TSL:5
Cog4-207  ENSMUST00000173501.1   881   No protein  -                     Retained intron          -          -        TSL:2
Cog4-212  ENSMUST00000174702.1   577   No protein  -                     Retained intron          -          -        TSL:2
Cog4-203  ENSMUST00000172497.1   538   No protein  -                     Retained intron          -          -        TSL:2
Cog4-208  ENSMUST00000173794.1   379   No protein  -                     Retained intron          -          -        TSL:2
Cog4-214  ENSMUST00000174800.1   480   No protein  -                     lncRNA                   -          -        TSL:5

[Figure: Ensembl genome browser view of a 55.63 kb region of chromosome 8 (110.84 Mb to 110.89 Mb). Forward strand: the 14 Cog4 transcripts, Cog4-201 to Cog4-214 (protein coding: 201, 202, 205, 210, 211; nonsense mediated decay: 204, 206, 209, 213; retained intron: 203, 207, 208, 212; lncRNA: 214). Reverse strand: contig AC132945.3 and the transcripts Sf3b3-201 (protein coding), Sf3b3-202 to Sf3b3-205 (retained intron), Fcsk-201 (protein coding), Fcsk-204 (nonsense mediated decay), Snord111-201 (snoRNA) and Gm22193-201 (snoRNA). Regulatory Build legend: CTCF, enhancer, open chromatin, promoter, promoter flank, transcription factor binding site. Gene legend: protein coding (merged Ensembl/Havana, Ensembl protein coding); non-protein coding (RNA gene, processed transcript).]

Transcript: ENSMUST00000034203

[Figure: Ensembl protein view of Cog4-201 (protein coding; 35.19 kb, forward strand), product ENSMUSP00000034..., with a scale bar from 0 to 785. Annotation tracks: MobiDB lite; low complexity (Seg); coiled-coils (Ncoils); SMART and Pfam (Conserved oligomeric Golgi complex, subunit 4); PANTHER (PTHR24016, PTHR24016:SF0); Gene3D (1.10.287.1060, 1.20.58.1970); sequence variants (dbSNP and all other sources). Variant legend: stop gained, missense variant, splice region variant, synonymous variant.]

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
