https://www.alphaknockout.com

Mouse Arhgap31 Knockout Project (CRISPR/Cas9)

Objective: To create an Arhgap31 knockout mouse model (C57BL/6J) by CRISPR/Cas-mediated genome engineering.

Strategy summary: The Arhgap31 gene (NCBI Reference Sequence: NM_020260; Ensembl: ENSMUSG00000022799) is located on mouse chromosome 16. Twelve exons are identified, with the ATG start codon in exon 1 and the TGA stop codon in exon 12 (Transcript: ENSMUST00000023487). Exons 2~3 will be selected as the target site. Cas9 and gRNA will be co-injected into fertilized eggs for KO mouse production. The pups will be genotyped by PCR followed by sequencing analysis.

Note: Exon 2 starts at about 2.36% of the coding region. Exons 2~3 cover 5.8% of the coding region. The size of the effective KO region is ~3453 bp. The KO region does not contain any other known gene.
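The percentages in the note can be turned into rough base-pair figures. The sketch below is illustrative only: it assumes the coding region is 3 x 1425 bp (the 1425 aa protein length reported for ENSMUST00000023487, stop codon not counted), and the helper name percent_to_bp is hypothetical. Because the reported percentages are rounded, the results are estimates, not exon boundaries.

```python
# Rough conversion of the reported coding-region percentages into base pairs.
PROTEIN_LENGTH_AA = 1425                 # reported for ENSMUST00000023487
CDS_LENGTH_BP = PROTEIN_LENGTH_AA * 3    # assumption: stop codon not counted


def percent_to_bp(percent: float, total_bp: int = CDS_LENGTH_BP) -> int:
    """Convert a percentage of the coding region into a rounded bp count."""
    return round(total_bp * percent / 100)


exon2_offset_bp = percent_to_bp(2.36)    # approximate CDS offset where exon 2 begins
exons_2_3_bp = percent_to_bp(5.8)        # approximate coding bp covered by exons 2~3

print(f"Exon 2 starts about {exon2_offset_bp} bp into the coding region")
print(f"Exons 2~3 cover about {exons_2_3_bp} coding bp")
```

Under these assumptions exon 2 begins roughly 101 bp into the coding region and exons 2~3 account for roughly 248 coding bp.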


Overview of the Targeting Strategy

[Figure: wild-type allele, 5' to 3', showing exons 1, 2, 3 and 12 and two gRNA regions. Legend: exon of mouse Arhgap31; knockout region.]


Overview of the Dot Plot (up)

Window size: 15 bp

[Figure: dot plot of the sequence aligned with itself. Legend: Forward; Reverse Complement.]

Note: The 2000 bp section upstream of Exon 2 is aligned with itself to determine if there are tandem repeats. Tandem repeats are found in the dot plot matrix. The gRNA site is selected outside of these tandem repeats.
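A self-alignment dot plot of this kind can be sketched as follows. This is a minimal illustration, not the tool used for the report: it assumes exact word matching with the 15 bp window, whereas real dot-plot programs may tolerate mismatches, and the function names are hypothetical. Off-diagonal forward matches point to direct or tandem repeats; reverse matches point to inverted repeats.

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def self_dot_plot(seq: str, window: int = 15):
    """Return (forward, reverse) lists of (i, j) window matches of seq against itself.

    forward: windows at i and j are identical (main diagonal i == j excluded).
    reverse: window at j equals the reverse complement of the window at i.
    """
    seq = seq.upper()
    n_windows = len(seq) - window + 1

    # Index every window-sized word by its start positions.
    index = defaultdict(list)
    for i in range(n_windows):
        index[seq[i:i + window]].append(i)

    forward, reverse = [], []
    for i in range(n_windows):
        word = seq[i:i + window]
        forward.extend((i, j) for j in index[word] if j != i)
        reverse.extend((i, j) for j in index.get(reverse_complement(word), []))
    return forward, reverse


# A made-up 15 bp unit repeated three times gives off-diagonal forward dots.
unit = "ACGGTCATTGCAGGA"
forward_dots, reverse_dots = self_dot_plot(unit * 3)
print(len(forward_dots), "forward dots;", len(reverse_dots), "reverse dots")
```

A gRNA site would then be chosen in a stretch of the sequence that produces no off-diagonal dots.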

Overview of the Dot Plot (down)

Window size: 15 bp

[Figure: dot plot of the sequence aligned with itself. Legend: Forward; Reverse Complement.]

Note: The 2000 bp section downstream of Exon 3 is aligned with itself to determine if there are tandem repeats. Tandem repeats are found in the dot plot matrix. The gRNA site is selected outside of these tandem repeats.


Overview of the GC Content Distribution (up)

Window size: 300 bp

[Figure: GC content distribution along the sequence.]

Summary: Full Length(2000bp) | A(28.2% 564) | C(19.7% 394) | T(30.6% 612) | G(21.5% 430)

Note: The 2000 bp section upstream of Exon 2 is analyzed to determine the GC content. No region of significantly high GC content is found, so this region is suitable for PCR screening or sequencing analysis.
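A GC content distribution with a 300 bp window can be computed with a simple sliding window. The sketch below is an assumption about how such a profile might be produced, not the report's actual pipeline; the function name gc_windows and the step parameter are hypothetical.

```python
def gc_windows(seq: str, window: int = 300, step: int = 1):
    """Return a list of (start, gc_percent) for each window along seq.

    start is a 0-based offset; gc_percent is the share of G and C bases
    within seq[start:start + window].
    """
    seq = seq.upper()
    profile = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc_count = chunk.count("G") + chunk.count("C")
        profile.append((start, 100.0 * gc_count / window))
    return profile


# Toy input: a GC-only block followed by an AT-only block.
toy = "GC" * 150 + "AT" * 150
for start, gc in gc_windows(toy, window=300, step=300):
    print(start, gc)
```

Windows whose GC percentage stands well above the rest of the profile would be the ones to avoid when placing screening primers.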

Overview of the GC Content Distribution (down)

Window size: 300 bp

[Figure: GC content distribution along the sequence.]

Summary: Full Length(2000bp) | A(32.5% 650) | C(18.5% 370) | T(23.85% 477) | G(25.15% 503)

Note: The 2000 bp section downstream of Exon 3 is analyzed to determine the GC content. No region of significantly high GC content is found, so this region is suitable for PCR screening or sequencing analysis.
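The overall GC content of each 2000 bp section follows directly from the base counts in the two summaries. The short sketch below only redoes that arithmetic; the dictionary and function names are illustrative.

```python
def gc_percent(counts: dict) -> float:
    """Overall GC percentage from a dict of base counts (keys A, C, G, T)."""
    total = sum(counts.values())
    return 100.0 * (counts["G"] + counts["C"]) / total


# Base counts copied from the two summary lines of the report.
UPSTREAM_COUNTS = {"A": 564, "C": 394, "T": 612, "G": 430}
DOWNSTREAM_COUNTS = {"A": 650, "C": 370, "T": 477, "G": 503}

print(f"Upstream GC:   {gc_percent(UPSTREAM_COUNTS):.2f}%")
print(f"Downstream GC: {gc_percent(DOWNSTREAM_COUNTS):.2f}%")
```

Both sets of counts sum to 2000 bp, giving about 41.2% GC upstream of Exon 2 and about 43.65% GC downstream of Exon 3.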


BLAT Search Results (up)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
---------------------------------------------------------------------------------------
YourSeq  2000   1      2000  2000   100.0%    chr16  -       38640350   38642349   2000
YourSeq  34     210    392   2000   59.0%     chr8   +       54558807   54558876   70
YourSeq  34     1532   1597  2000   92.5%     chr2   +       70226619   70226702   84
YourSeq  34     1562   1598  2000   97.3%     chr12  +       57765361   57765399   39
YourSeq  32     1562   1625  2000   75.0%     chr3   +       85094536   85094599   64
YourSeq  32     1448   1551  2000   94.5%     chr15  +       47799875   47799979   105
YourSeq  31     1563   1596  2000   97.1%     chr2   +       167762801  167762836  36
YourSeq  30     1578   1623  2000   82.7%     chr18  -       62532411   62532456   46
YourSeq  30     1524   1560  2000   91.7%     chr15  -       50999694   50999735   42
YourSeq  25     1783   1811  2000   96.3%     chr15  -       94642594   94642624   31
YourSeq  25     1571   1599  2000   93.2%     chr10  -       53487458   53487486   29
YourSeq  25     1562   1587  2000   100.0%    chr5   +       139661765  139661792  28
YourSeq  24     1572   1599  2000   92.9%     chr5   -       97975019   97975046   28
YourSeq  24     1018   1044  2000   96.3%     chr13  -       58939441   58939468   28
YourSeq  23     1571   1597  2000   92.6%     chr5   -       124722985  124723011  27
YourSeq  23     1571   1597  2000   92.6%     chr3   -       95037316   95037342   27
YourSeq  23     1572   1598  2000   92.6%     chr5   +       139114906  139114932  27

Note: The 2000 bp section upstream of Exon 2 is BLAT searched against the genome. No significant similarity is found.
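Rows of a BLAT result table like the one above can be parsed and ranked to see how far the best secondary hit falls below the self-hit. This is a sketch under assumptions: the field names in BlatHit are my own labels for the eleven columns (the first START/END pair taken as query coordinates, the second as genomic coordinates), and the leading "browser details" link labels are skipped when present.

```python
from typing import List, NamedTuple


class BlatHit(NamedTuple):
    query: str
    score: int
    q_start: int
    q_end: int
    q_size: int
    identity: float  # percent
    chrom: str
    strand: str
    t_start: int
    t_end: int
    span: int


def parse_blat_row(row: str) -> BlatHit:
    """Parse one whitespace-separated BLAT result row into a BlatHit."""
    fields = row.split()
    if fields[:2] == ["browser", "details"]:  # drop link labels if present
        fields = fields[2:]
    (query, score, q_start, q_end, q_size,
     identity, chrom, strand, t_start, t_end, span) = fields
    return BlatHit(query, int(score), int(q_start), int(q_end), int(q_size),
                   float(identity.rstrip("%")), chrom, strand,
                   int(t_start), int(t_end), int(span))


def secondary_hits(hits: List[BlatHit]) -> List[BlatHit]:
    """Return all hits except the top-scoring one, highest score first."""
    ranked = sorted(hits, key=lambda h: h.score, reverse=True)
    return ranked[1:]


# First three rows of the upstream table.
rows = [
    "YourSeq 2000 1 2000 2000 100.0% chr16 - 38640350 38642349 2000",
    "YourSeq 34 210 392 2000 59.0% chr8 + 54558807 54558876 70",
    "YourSeq 34 1532 1597 2000 92.5% chr2 + 70226619 70226702 84",
]
hits = [parse_blat_row(r) for r in rows]
best_secondary = secondary_hits(hits)[0]
print(best_secondary.score, best_secondary.chrom)
```

For the upstream section the self-hit scores 2000 while the best secondary hit scores 34, which is the gap behind the "no significant similarity" conclusion.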

BLAT Search Results (down)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
---------------------------------------------------------------------------------------
YourSeq  2000   1      2000  2000   100.0%    chr16  -       38634897   38636896   2000
YourSeq  253    1073   1585  2000   92.6%     chr13  +       74561373   74562115   743
YourSeq  244    1069   1672  2000   91.5%     chr1   -       171493687  172044940  551254
YourSeq  242    1303   1585  2000   95.9%     chr5   +       139485080  139485443  364
YourSeq  230    1069   1466  2000   93.0%     chr5   +       35481868   35542425   60558
YourSeq  227    1303   1716  2000   93.3%     chr11  -       72596661   72642388   45728
YourSeq  227    1303   1706  2000   88.9%     chr10  +       81099376   81099719   344
YourSeq  226    1303   1699  2000   88.5%     chr8   -       106208090  106208417  328
YourSeq  212    1175   1580  2000   90.9%     chr2   -       91244239   91244907   669
YourSeq  211    1303   1546  2000   95.3%     chr1   -       59908947   60360442   451496
YourSeq  209    1315   1697  2000   92.0%     chr19  +       4428237    4428721    485
YourSeq  206    1077   1470  2000   92.0%     chrX   -       38567661   38568507   847
YourSeq  204    1315   1581  2000   92.3%     chrX   -       52794888   52795194   307
YourSeq  203    1070   1697  2000   86.9%     chr7   +       80691950   80692331   382
YourSeq  203    1315   1705  2000   91.8%     chr10  +       41321206   41754430   433225
YourSeq  202    1315   1712  2000   89.1%     chr10  +       4352319    4352655    337
YourSeq  199    1070   1470  2000   92.8%     chr9   -       59792023   59792522   500
YourSeq  197    1069   1436  2000   95.0%     chr7   +       44653197   44653761   565
YourSeq  187    1303   1569  2000   92.4%     chr4   +       100800285  100800568  284
YourSeq  184    1263   1680  2000   91.2%     chr5   -       122176776  122177387  612

Note: The 2000 bp section downstream of Exon 3 is BLAT searched against the genome. No significant similarity is found.
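The chr16 coordinates of the two self-hits can be used to estimate the knockout interval. The sketch below assumes that the upstream and downstream 2000 bp sections sit immediately next to the KO region, which the report does not state explicitly; the function name gap_between is hypothetical. Because Arhgap31 lies on the reverse strand, the section upstream of Exon 2 has the higher genomic coordinates.

```python
def gap_between(section_a, section_b):
    """Return (start, end, length) of the interval strictly between two
    non-overlapping 1-based inclusive intervals on the same chromosome."""
    low, high = sorted([section_a, section_b])
    start = low[1] + 1
    end = high[0] - 1
    return start, end, end - start + 1


# chr16 coordinates of the 100% self-hits in the two BLAT tables.
UPSTREAM_SECTION = (38640350, 38642349)    # upstream of Exon 2
DOWNSTREAM_SECTION = (38634897, 38636896)  # downstream of Exon 3

ko_start, ko_end, ko_length = gap_between(UPSTREAM_SECTION, DOWNSTREAM_SECTION)
print(f"chr16:{ko_start}-{ko_end} ({ko_length} bp)")
# prints "chr16:38636897-38640349 (3453 bp)"
```

Under that assumption the interval is 3453 bp long, in line with the ~3453 bp effective KO region quoted in the strategy summary.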


Gene and protein information:

Arhgap31 Rho GTPase activating protein 31 [ Mus musculus (house mouse) ]
Gene ID: 12549, updated on 27-Aug-2019

Gene summary

Official Symbol: Arhgap31 (provided by MGI)
Official Full Name: Rho GTPase activating protein 31 (provided by MGI)
Primary source: MGI:MGI:1333857
See related: Ensembl:ENSMUSG00000022799
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: Cdgap; AU041750; 5830477L08Rik; D330026I07Rik
Expression: Broad expression in lung adult (RPKM 18.2), ovary adult (RPKM 10.6) and 20 other tissues
Orthologs: human; all

Genomic context

Location: 16; 16 B4
Exon count: 12

Annotation release  Status             Assembly                      Chr  Location
108                 current            GRCm38.p6 (GCF_000001635.26)  16   NC_000082.6 (38598343..38713035, complement)
Build 37.2          previous assembly  MGSCv37 (GCF_000001635.18)    16   NC_000082.5 (38598456..38713148, complement)

[Figure: genomic context, Chromosome 16 - NC_000082.6]


Transcript information: This gene has 3 transcripts

Gene: Arhgap31 ENSMUSG00000022799

Description: Rho GTPase activating protein 31 [Source:MGI Symbol;Acc:MGI:1333857]
Gene Synonyms: CdGAP
Location: Chromosome 16: 38,598,340-38,713,274 reverse strand. GRCm38:CM001009.2
About this gene: This gene has 3 transcripts (splice variants), 206 orthologues, 3 paralogues, is a member of 1 Ensembl protein family and is associated with 1 phenotype.

Transcripts

Name          Transcript ID         bp    Protein     Translation ID        Biotype          CCDS       UniProt         Flags
Arhgap31-201  ENSMUST00000023487.4  7964  1425aa      ENSMUSP00000023487.4  Protein coding   CCDS28172  A6X8Z5, B2RSI0  TSL:1, GENCODE basic, APPRIS P1
Arhgap31-203  ENSMUST00000132697.1  2956  No protein  -                     Retained intron  -          -               TSL:1
Arhgap31-202  ENSMUST00000124866.1  1557  No protein  -                     Retained intron  -          -               TSL:1
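The Ensembl gene location quoted above (38,598,340-38,713,274) can be converted into a span as a quick consistency check. The sketch below assumes 1-based inclusive coordinates; the helper names are hypothetical.

```python
def parse_location(text: str):
    """Parse a 'start-end' coordinate string with thousands separators."""
    start_text, end_text = text.replace(",", "").split("-")
    return int(start_text), int(end_text)


def span_bp(start: int, end: int) -> int:
    """Length of a 1-based inclusive interval."""
    return end - start + 1


ENSEMBL_LOCATION = "38,598,340-38,713,274"  # Arhgap31 on chromosome 16
start, end = parse_location(ENSEMBL_LOCATION)
print(span_bp(start, end))  # prints 114935
```

The result, 114,935 bp, agrees with the 114.94 kb reverse-strand span shown in the transcript view of this report.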


[Figure: Ensembl region view of chromosome 16, 38.60 Mb to 38.70 Mb (134.94 kb).
Forward strand: Tmem39a-204, Tmem39a-201, Tmem39a-214, Tmem39a-209, Tmem39a-211, Tmem39a-207 (protein coding); Tmem39a-212 (lncRNA); Tmem39a-206 (retained intron).
Contigs: AC154425.2, CT009576.9.
Reverse strand: Arhgap31-201 (protein coding); Arhgap31-202 and Arhgap31-203 (retained intron); Gm15530-201 (processed pseudogene).
Regulatory Build legend: CTCF; Enhancer; Open Chromatin; Promoter; Promoter Flank; Transcription Factor Binding Site.
Gene legend: Protein Coding (merged Ensembl/Havana; Ensembl protein coding); Non-Protein Coding (processed transcript; RNA gene; pseudogene).]


Transcript: ENSMUST00000023487

[Figure: protein domain view of Arhgap31-201 (protein coding; reverse strand, 114.94 kb), translation ENSMUSP00000023...
Tracks: MobiDB lite; Low complexity (Seg); Superfamily: Rho GTPase activation protein; SMART, Pfam and PROSITE profiles: Rho GTPase-activating protein domain; PANTHER: PTHR15729:SF3, PTHR15729; Gene3D: Rho GTPase activation protein; CDD: cd04384; All sequence SNPs/i...: Sequence variants (dbSNP and all other sources).
Variant legend: missense variant; splice region variant; synonymous variant.
Scale bar: 0 to 1425.]

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
