https://www.alphaknockout.com

Mouse Stard8 Knockout Project (CRISPR/Cas9)

Objective: To create a Stard8 knockout mouse model (C57BL/6J) by CRISPR/Cas-mediated genome engineering.

Strategy summary: The Stard8 gene (NCBI Reference Sequence: NM_199018; Ensembl: ENSMUSG00000031216) is located on mouse chromosome X. 14 exons are identified, with the ATG start codon in exon 4 and the TGA stop codon in exon 14 (Transcript: ENSMUST00000036606). Exons 4~14 will be selected as the target site. Cas9 and gRNA will be co-injected into fertilized eggs for KO mouse production. The pups will be genotyped by PCR followed by sequencing analysis.

Note: Exon 4 starts from about 0.03% of the coding region. Exons 4~14 cover 100.0% of the coding region. The size of the effective KO region is ~8797 bp. The KO region does not contain any other known gene.


Overview of the Targeting Strategy

Figure: wildtype allele drawn 5' to 3', with exons numbered 1 to 14 and two gRNA regions marked. Legend: exon of mouse Stard8; knockout region.


Overview of the Dot Plot (up)

Window size: 15 bp

Figure: self-alignment dot plot of the upstream sequence. Legend: Forward; Reverse Complement.

Note: The 2000 bp section upstream of the start codon is aligned with itself to determine whether there are tandem repeats. No significant tandem repeat is found in the dot plot matrix, so this region is suitable for PCR screening or sequencing analysis.

Overview of the Dot Plot (down)

Window size: 15 bp

Figure: self-alignment dot plot of the downstream sequence. Legend: Forward; Reverse Complement.

Note: The 2000 bp section downstream of the stop codon is aligned with itself to determine whether there are tandem repeats. No significant tandem repeat is found in the dot plot matrix, so this region is suitable for PCR screening or sequencing analysis.
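
The self-alignment behind the two dot plots can be illustrated with a minimal Python sketch. It is not the tool used for this report: the function names `self_dot_plot` and `reverse_complement` and the toy input are hypothetical, and only the 15 bp window is taken from the text. Each reported pair (i, j) is a 15 bp word that occurs at two positions, either on the same strand (forward) or as its reverse complement. A tandem repeat would appear as a run of forward pairs with a constant offset j - i.

```python
from collections import defaultdict

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def self_dot_plot(seq, window=15):
    """Return the off-diagonal dots of a self-alignment as (forward, reverse).

    forward: pairs (i, j), i < j, where seq[i:i+window] == seq[j:j+window].
    reverse: pairs (i, j) where seq[i:i+window] equals the reverse
             complement of seq[j:j+window].
    """
    # Index every window-sized word by its start positions.
    positions = defaultdict(list)
    for i in range(len(seq) - window + 1):
        positions[seq[i:i + window]].append(i)

    forward, reverse = [], []
    for word, starts in positions.items():
        # The same word seen more than once on the forward strand.
        forward.extend((i, j) for n, i in enumerate(starts) for j in starts[n + 1:])
        # A word whose reverse complement also occurs in the sequence.
        for j in positions.get(reverse_complement(word), []):
            reverse.extend((i, j) for i in starts)
    return sorted(forward), sorted(reverse)


# Toy input (hypothetical): a 20 bp unit repeated twice in tandem.
unit = "ACGTTGCAAGGCTTACCGTA"
forward_dots, reverse_dots = self_dot_plot(unit * 2, window=15)
print(forward_dots)
```

For the toy input, every forward pair has offset 20, the length of the repeated unit. For a real 2000 bp flank, an empty or sparse result corresponds to the "no significant tandem repeat" conclusion in the notes.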


Overview of the GC Content Distribution (up)

Window size: 300 bp

Figure: GC content distribution along the upstream sequence.

Summary: Full Length(2000bp) | A(22.85% 457) | C(26.95% 539) | T(30.0% 600) | G(20.2% 404)

Note: The 2000 bp section upstream of the start codon is analyzed to determine the GC content. No significant high GC-content region is found, so this region is suitable for PCR screening or sequencing analysis.

Overview of the GC Content Distribution (down)

Window size: 300 bp

Figure: GC content distribution along the downstream sequence.

Summary: Full Length(2000bp) | A(27.9% 558) | C(22.55% 451) | T(25.5% 510) | G(24.05% 481)

Note: The 2000 bp section downstream of the stop codon is analyzed to determine the GC content. No significant high GC-content region is found, so this region is suitable for PCR screening or sequencing analysis.
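
The overall GC content of each flank follows directly from the base counts in the two Summary lines, and the windowed profile can be sketched in the same way. The Python below is an illustration only: the helper names `gc_percent_from_counts` and `sliding_gc` are hypothetical, the 300 bp window is taken from the report, and the step size of 1 bp is an assumption because the report does not state it.

```python
def gc_percent_from_counts(counts):
    """GC percentage from a dict of base counts, e.g. {"A": 457, ...}."""
    total = sum(counts.values())
    return 100.0 * (counts["G"] + counts["C"]) / total


def sliding_gc(seq, window=300, step=1):
    """GC percentage of each window along seq (step size is an assumption)."""
    seq = seq.upper()
    return [
        100.0 * sum(base in "GC" for base in seq[i:i + window]) / window
        for i in range(0, len(seq) - window + 1, step)
    ]


# Base counts copied from the two Summary lines of this report.
upstream = {"A": 457, "C": 539, "T": 600, "G": 404}
downstream = {"A": 558, "C": 451, "T": 510, "G": 481}

upstream_gc = gc_percent_from_counts(upstream)      # (539 + 404) / 2000
downstream_gc = gc_percent_from_counts(downstream)  # (451 + 481) / 2000
print(round(upstream_gc, 2), round(downstream_gc, 2))
```

With these counts the overall GC content is 47.15% upstream and 46.6% downstream. `sliding_gc` would be applied to the actual flank sequences, which are not reproduced in this report.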


BLAT Search Results (up)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
---------------------------------------------------------------------------------------
YourSeq  2000   1      2000  2000   100.0%    chrX   +       99062402   99064401   2000
YourSeq  114    616    903   2000   85.2%     chr19  +       35926390   35926903   514
YourSeq  113    605    755   2000   88.1%     chrX   -       36006273   36006425   153
YourSeq  111    605    755   2000   90.6%     chr11  -       19672503   19672656   154
YourSeq  111    595    753   2000   89.6%     chr4   +       128944196  128944371  176
YourSeq  111    602    755   2000   91.3%     chr12  +       84792516   85329486   536971
YourSeq  111    592    764   2000   81.7%     chr10  +       91541403   91541568   166
YourSeq  110    605    755   2000   88.3%     chrX   -       74354019   74354170   152
YourSeq  110    605    753   2000   91.2%     chr19  +       32328419   32328570   152
YourSeq  109    605    754   2000   86.7%     chr11  -       12226948   12227098   151
YourSeq  108    605    765   2000   89.8%     chr14  +       62729350   62729511   162
YourSeq  107    612    755   2000   91.6%     chr3   -       122193917  122194062  146
YourSeq  106    605    755   2000   90.3%     chr2   -       58751241   58751394   154
YourSeq  106    612    754   2000   90.9%     chr13  -       41917598   41917744   147
YourSeq  105    489    751   2000   87.3%     chr7   -       117177479  117178079  601
YourSeq  105    605    755   2000   88.5%     chr1   -       138463572  138463724  153
YourSeq  104    605    757   2000   89.4%     chr9   -       54305487   54305641   155
YourSeq  104    605    755   2000   87.9%     chr16  -       35371825   35371978   154
YourSeq  103    605    754   2000   88.1%     chr5   -       38701513   38701663   151
YourSeq  103    605    763   2000   86.7%     chr2   +       158188574  158188735  162

Note: The 2000 bp section upstream of the start codon is BLAT searched against the genome. Apart from the query's own locus on chrX (score 2000, 100.0% identity), no significant similarity is found.

BLAT Search Results (down)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
---------------------------------------------------------------------------------------
YourSeq  2000   1      2000  2000   100.0%    chrX   +       99073199   99075198   2000
YourSeq  139    836    1031  2000   85.9%     chr5   -       149892706  149892888  183
YourSeq  139    836    1018  2000   86.6%     chr17  +       70493024   70493199   176
YourSeq  139    836    1022  2000   86.0%     chr11  +       75130044   75130226   183
YourSeq  138    836    1062  2000   92.1%     chr9   -       120182532  120182898  367
YourSeq  138    836    1008  2000   92.2%     chr18  +       35232367   35232541   175
YourSeq  137    836    1037  2000   85.4%     chr5   +       142830447  142830636  190
YourSeq  136    837    1007  2000   93.7%     chr4   +       144004244  144390856  386613
YourSeq  135    836    1030  2000   86.1%     chr4   -       28646755   28646949   195
YourSeq  135    853    1066  2000   83.4%     chr6   +       31040001   31040175   175
YourSeq  135    837    1031  2000   90.0%     chr1   +       118350341  118350885  545
YourSeq  134    836    1022  2000   90.0%     chr8   +       124565709  124565901  193
YourSeq  134    836    1031  2000   85.3%     chr4   +       133388609  133388796  188
YourSeq  134    837    1032  2000   83.5%     chr10  +       58586540   58586720   181
YourSeq  133    836    1031  2000   81.8%     chr9   -       21534429   21534610   182
YourSeq  133    836    1031  2000   83.4%     chr11  +       59836129   59836311   183
YourSeq  132    836    1030  2000   88.4%     chr19  -       3391331    3391834    504
YourSeq  132    836    1031  2000   82.5%     chr17  -       65631883   65632058   176
YourSeq  132    836    1030  2000   82.8%     chr19  +       10884419   10884601   183
YourSeq  131    836    1031  2000   84.3%     chr1   +       16371377   16371557   181

Note: The 2000 bp section downstream of the stop codon is BLAT searched against the genome. Apart from the query's own locus on chrX (score 2000, 100.0% identity), no significant similarity is found.
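
As a cross-check, the self-hit coordinates from the two BLAT searches can be related to the stated size of the effective KO region. The Python sketch below assumes that the upstream 2000 bp query ends immediately before the ATG and that the downstream query begins immediately after the TGA, which the notes imply but do not state explicitly. The `BlatHit` type and the helper names are hypothetical, and the rows are copied from the BLAT results in this report.

```python
from typing import NamedTuple


class BlatHit(NamedTuple):
    score: int
    q_start: int
    q_end: int
    q_size: int
    identity: float
    chrom: str
    strand: str
    t_start: int
    t_end: int
    span: int


def parse_hit(row):
    """Parse one whitespace-separated BLAT row that begins with the query name."""
    f = row.split()
    return BlatHit(int(f[1]), int(f[2]), int(f[3]), int(f[4]),
                   float(f[5].rstrip("%")), f[6], f[7],
                   int(f[8]), int(f[9]), int(f[10]))


def is_self_hit(hit):
    """A full-length hit at 100% identity is taken to be the query's own locus."""
    return hit.identity == 100.0 and hit.q_start == 1 and hit.q_end == hit.q_size


# Self-hits of the upstream and downstream queries, plus the best other
# downstream hit, copied from the BLAT results in this report.
up_self = parse_hit("YourSeq 2000 1 2000 2000 100.0% chrX + 99062402 99064401 2000")
down_self = parse_hit("YourSeq 2000 1 2000 2000 100.0% chrX + 99073199 99075198 2000")
best_other = parse_hit("YourSeq 139 836 1031 2000 85.9% chr5 - 149892706 149892888 183")

# Assumption: the KO region is the interval between the two 2000 bp flanks.
ko_start = up_self.t_end + 1
ko_end = down_self.t_start - 1
ko_size = ko_end - ko_start + 1
print(up_self.chrom, ko_start, ko_end, ko_size)
```

Under that assumption the interval between the two flanks spans chrX:99064402-99073198, which is 8797 bp and agrees with the effective KO region size given in the strategy summary.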


Gene and information:

Stard8 START domain containing 8 [Mus musculus (house mouse)]
Gene ID: 236920, updated on 14-Aug-2019

Gene summary

Official Symbol: Stard8 (provided by MGI)
Official Full Name: START domain containing 8 (provided by MGI)
Primary source: MGI:MGI:2448556
See related: Ensembl:ENSMUSG00000031216
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: Dlc3; mKIAA0189
Expression: Broad expression in lung adult (RPKM 27.0), kidney adult (RPKM 19.8) and 20 other tissues
Orthologs: human; all

Genomic context

Location: X; X C3
Exon count: 19

Annotation release  Status             Assembly                      Chr  Location
108                 current            GRCm38.p6 (GCF_000001635.26)  X    NC_000086.7 (99002933..99074728)
Build 37.2          previous assembly  MGSCv37 (GCF_000001635.18)    X    NC_000086.6 (96237920..96270067)

Chromosome X - NC_000086.7


Transcript information: This gene has 4 transcripts

Gene: Stard8 ENSMUSG00000031216

Description: START domain containing 8 [Source:MGI Symbol;Acc:MGI:2448556]
Location: Chromosome X: 99,003,248-99,074,728 forward strand. GRCm38:CM001013.2
About this gene: This gene has 4 transcripts (splice variants), 143 orthologues, 3 paralogues and is a member of 1 Ensembl protein family.

Transcripts

Name        Transcript ID          bp    Protein     Translation ID        Biotype         CCDS       UniProt  Flags
Stard8-201  ENSMUST00000036606.13  4963  1019aa      ENSMUSP00000044491.7  Protein coding  CCDS30297  Q8K031   TSL:1, GENCODE basic, APPRIS P1
Stard8-204  ENSMUST00000149999.7   631   57aa        ENSMUSP00000114897.1  Protein coding  -          B1AZJ1   CDS 3' incomplete, TSL:3
Stard8-202  ENSMUST00000127361.1   759   No protein  -                     lncRNA          -          -        TSL:3
Stard8-203  ENSMUST00000145820.1   468   No protein  -                     lncRNA          -          -        TSL:2

Figure: genome browser view of the Stard8 locus (91.48 kb, chromosome X, 99.00Mb to 99.08Mb). Forward strand: Stard8-201 and Stard8-204 (protein coding), Stard8-202 and Stard8-203 (lncRNA). Contigs: AL672103.10, AL954636.9. Other genes: Gm5760-201 (processed pseudogene). Tracks: Regulatory Build (CTCF, Enhancer, Open Chromatin, Promoter, Promoter Flank, Transcription Factor Binding Site). Gene legend: protein coding (merged Ensembl/Havana, Ensembl protein coding); non-protein coding (pseudogene, RNA gene).


Transcript: ENSMUST00000036606

Figure: protein domain view of Stard8-201 (protein coding; 32.15 kb, forward strand) for ENSMUSP00000044... Tracks: MobiDB lite, Low complexity (Seg), Coiled-coils (Ncoils), Superfamily (Rho GTPase activation protein, SSF55961), SMART, Pfam and PROSITE profiles (START domain, Rho GTPase-activating protein domain), PANTHER (PTHR12659, PTHR12659:SF3), Gene3D (Rho GTPase activation protein, START-like domain superfamily), and sequence variants (dbSNP and all other sources). Variant legend: inframe insertion, inframe deletion, missense variant, synonymous variant. Scale bar: 0 to 1019 amino acids.

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
