https://www.alphaknockout.com

Mouse Stard10 Knockout Project (CRISPR/Cas9)

Objective: To create a Stard10 knockout mouse model (C57BL/6J) by CRISPR/Cas9-mediated genome engineering.

Strategy summary: The Stard10 gene (NCBI Reference Sequence: NM_019990; Ensembl: ENSMUSG00000030688) is located on mouse chromosome 7. Seven exons are identified in transcript ENSMUST00000032927, with the ATG start codon in exon 2 and the TGA stop codon in exon 7. Exons 3 to 5 are selected as the target site. Cas9 and gRNA will be co-injected into fertilized eggs to produce KO mice. The pups will be genotyped by PCR followed by sequencing analysis. Note: Mice homozygous for a knockout allele exhibit altered bile acid homeostasis.

Exon 3 starts at about 23.83% of the coding region, and exons 3 to 5 cover 42.38% of the coding region. The size of the effective KO region is ~1537 bp. The KO region does not overlap any other known gene.
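The ~1537 bp effective KO region matches the gap between the two flanking sequences analyzed in this report: their BLAT self-matches place the 2000 bp upstream section at chr7:101340499-101342498 and the 1600 bp downstream section at chr7:101344036-101345635. The sketch below reproduces that arithmetic and, for PCR genotyping, the expected wildtype and knockout band sizes. It assumes the deletion spans exactly the bases between the two sections (the real endpoints depend on the Cas9 cut sites), and the primer coordinates are hypothetical placeholders, not the project's screening primers.

```python
# Coordinates are 1-based and inclusive on chr7, taken from the BLAT self-matches.
UPSTREAM_FLANK = (101_340_499, 101_342_498)    # 2000 bp upstream of exon 3
DOWNSTREAM_FLANK = (101_344_036, 101_345_635)  # 1600 bp downstream of exon 5


def gap_between(left: tuple[int, int], right: tuple[int, int]) -> tuple[int, int, int]:
    """Return (start, end, length) of the bases strictly between two intervals."""
    start = left[1] + 1
    end = right[0] - 1
    return start, end, end - start + 1


def pcr_band_sizes(fwd_start: int, rev_end: int, deletion_len: int) -> tuple[int, int]:
    """Expected (wildtype, knockout) amplicon sizes for primers outside the deletion."""
    wildtype = rev_end - fwd_start + 1
    return wildtype, wildtype - deletion_len


ko_start, ko_end, ko_len = gap_between(UPSTREAM_FLANK, DOWNSTREAM_FLANK)
print(f"KO region chr7:{ko_start}-{ko_end}, {ko_len} bp")  # chr7:101342499-101344035, 1537 bp

# Hypothetical primers: forward inside the upstream flank, reverse inside the downstream flank.
wt, ko = pcr_band_sizes(fwd_start=101_342_001, rev_end=101_344_500, deletion_len=ko_len)
print(f"wildtype band {wt} bp, knockout band {ko} bp")  # 2500 bp and 963 bp
```

With primers outside the deleted interval, a homozygous knockout shows only the shorter band, a wildtype animal only the longer one, and a heterozygote both.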

Page 1 of 9 https://www.alphaknockout.com

Overview of the Targeting Strategy

Figure: wildtype Stard10 allele (exons 1, 3, 4, 5 and 7 labeled) with the 5' and 3' gRNA regions and the knockout region (exons 3 to 5). Legend: exon of mouse Stard10; knockout region.

Overview of the Dot Plot (up) Window size: 15 bp

Figure: self-alignment dot plot of the 2000 bp upstream sequence (forward and reverse complement panels).

Note: The 2000 bp section upstream of exon 3 is aligned against itself to detect tandem repeats. Tandem repeats are found in the dot plot matrix, so the gRNA site is selected outside of them.
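The self dot plot described above can be sketched as an exact-word comparison: a dot is drawn at (x, y) wherever the 15 bp word starting at x equals the word starting at y (forward panel) or the reverse complement of that word (reverse-complement panel). Tandem repeats show up as short diagonals parallel to the main diagonal, offset by the repeat period. This is a minimal sketch assuming exact matching; the tool behind the report may score windows differently, and the `min_dots` cutoff is a hypothetical parameter.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string (A/C/G/T/N only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def word_positions(seq: str, k: int) -> dict[str, list[int]]:
    """Map every k-bp word to the positions where it starts."""
    index: dict[str, list[int]] = {}
    for i in range(len(seq) - k + 1):
        index.setdefault(seq[i:i + k], []).append(i)
    return index


def self_dot_plot(seq: str, k: int = 15) -> tuple[set, set]:
    """Return (forward, reverse) dot sets of an exact-word self dot plot."""
    seq = seq.upper()
    n = len(seq)
    index = word_positions(seq, k)
    forward = set()
    for x in range(n - k + 1):
        for y in index[seq[x:x + k]]:
            if x != y:  # skip the trivial main diagonal
                forward.add((x, y))
    reverse = set()
    rc = revcomp(seq)
    for i in range(n - k + 1):
        # rc[i:i+k] is the reverse complement of seq[n-i-k : n-i]
        for y in index.get(rc[i:i + k], []):
            reverse.add((n - i - k, y))
    return forward, reverse


def repeat_periods(forward: set, min_dots: int = 10) -> dict[int, int]:
    """Count forward dots per diagonal offset; busy short offsets suggest tandem repeats."""
    counts = Counter(y - x for x, y in forward if y > x)
    return {d: c for d, c in sorted(counts.items()) if c >= min_dots}


fwd, rev = self_dot_plot("AC" * 20)  # a 40 bp (AC)n tandem repeat
print(repeat_periods(fwd))  # {2: 24, 4: 22, 6: 20, 8: 18, 10: 16, 12: 14, 14: 12, 16: 10}
```

The period-2 diagonal and its multiples dominate, which is the signature a 15 bp window picks up for a dinucleotide repeat.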

Overview of the Dot Plot (down) Window size: 15 bp

Figure: self-alignment dot plot of the 1600 bp downstream sequence (forward and reverse complement panels).

Note: The 1600 bp section downstream of exon 5 is aligned against itself to detect tandem repeats. No significant tandem repeats are found in the dot plot matrix, so this region is suitable for PCR screening or sequencing analysis.

Overview of the GC Content Distribution (up) Window size: 300 bp

Figure: GC content profile of the 2000 bp upstream sequence.

Summary: full length 2000 bp | A 519 (25.95%) | C 481 (24.05%) | T 506 (25.30%) | G 494 (24.70%)
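The composition summaries can be checked, and the combined GC fraction derived, directly from the base counts. A minimal sketch; the 0.01 percentage-point tolerance is an assumption chosen to absorb the report's two-decimal rounding.

```python
def composition(counts: dict[str, int]) -> dict[str, float]:
    """Percent of each base, plus combined GC percent, from raw base counts."""
    total = sum(counts.values())
    percents = {base: 100 * n / total for base, n in counts.items()}
    percents["GC"] = 100 * (counts["G"] + counts["C"]) / total
    return percents


def matches_report(counts: dict[str, int], reported: dict[str, float],
                   length: int, tol: float = 0.01) -> bool:
    """Counts must sum to the stated length and reproduce the reported percentages."""
    pct = composition(counts)
    return sum(counts.values()) == length and all(
        abs(pct[base] - value) <= tol for base, value in reported.items())


upstream = {"A": 519, "C": 481, "T": 506, "G": 494}
downstream = {"A": 322, "C": 450, "T": 407, "G": 421}
print(f"upstream GC {composition(upstream)['GC']:.2f}%")      # upstream GC 48.75%
print(f"downstream GC {composition(downstream)['GC']:.2f}%")  # downstream GC 54.44%
```

Both flanks sit near 50% GC, consistent with the notes that no high-GC region was found.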

Note: The GC content of the 2000 bp section upstream of exon 3 is analyzed. No significant high-GC region is found, so this region is suitable for PCR screening or sequencing analysis.

Overview of the GC Content Distribution (down) Window size: 300 bp

Figure: GC content profile of the 1600 bp downstream sequence.

Summary: full length 1600 bp | A 322 (20.13%) | C 450 (28.13%) | T 407 (25.44%) | G 421 (26.31%)

Note: The GC content of the 1600 bp section downstream of exon 5 is analyzed. No significant high-GC region is found, so this region is suitable for PCR screening or sequencing analysis.
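The GC profiles above use a 300 bp window. A minimal sketch of such a sliding-window profile, updated with a running count so each step costs O(1); the step size and the 70% "high GC" threshold are hypothetical parameters, since the report does not state the cutoff it applies.

```python
def gc_windows(seq: str, window: int = 300, step: int = 1) -> list[tuple[int, float]]:
    """Return (start, GC fraction) for each full window, stepping by `step`."""
    seq = seq.upper()
    if len(seq) < window:
        return []
    is_gc = [1 if base in "GC" else 0 for base in seq]
    running = sum(is_gc[:window])
    results = []
    for start in range(len(seq) - window + 1):
        if start > 0:
            # slide right by one base: drop the base that left, add the one that entered
            running += is_gc[start + window - 1] - is_gc[start - 1]
        if start % step == 0:
            results.append((start, running / window))
    return results


def high_gc_starts(seq: str, window: int = 300, threshold: float = 0.70) -> list[int]:
    """Window starts whose GC fraction reaches the (hypothetical) threshold."""
    return [start for start, gc in gc_windows(seq, window) if gc >= threshold]


# Synthetic 800 bp test sequence: 400 bp AT-only, then 400 bp GC-only.
seq = "AT" * 200 + "GC" * 200
high = high_gc_starts(seq)
print(high[0], high[-1])  # 310 500
```

A region would be flagged only if some window crosses the threshold; on the real flanks, whose overall GC is near 50%, a profile like this is what supports the "no significant high GC-content region" conclusion.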

BLAT Search Results (up)

QUERY    SCORE  QSTART  QEND  QSIZE  IDENTITY  CHROM  STRAND  TSTART     TEND       SPAN
YourSeq  2000   1       2000  2000   100.0%    chr7   +       101340499  101342498  2000
YourSeq  133    480     789   2000   94.7%     chr2   -       26124516   26125166   651
YourSeq  130    373     588   2000   92.9%     chr11  +       80291471   80291802   332
YourSeq  124    477     637   2000   94.4%     chr19  +       44412458   44412857   400
YourSeq  121    486     655   2000   92.4%     chr2   -       165941470  165941920  451
YourSeq  109    478     593   2000   98.3%     chr10  -       75496608   75496762   155
YourSeq  104    479     593   2000   97.4%     chr2   +       76559473   76559626   154
YourSeq  103    480     591   2000   97.4%     chr7   -       34331941   34332089   149
YourSeq  103    478     591   2000   97.3%     chr3   -       85964986   85965138   153
YourSeq  102    478     591   2000   95.7%     chr7   -       116073486  116073636  151
YourSeq  102    478     591   2000   98.2%     chr4   -       34388456   34388606   151
YourSeq  102    480     593   2000   96.4%     chr10  +       43925751   43925901   151
YourSeq  100    520     655   2000   94.7%     chr4   +       86723651   86724149   499
YourSeq  98     478     591   2000   97.2%     chr11  -       103870636  103870793  158
YourSeq  98     478     593   2000   95.4%     chr18  +       17611547   17611697   151
YourSeq  97     478     592   2000   95.4%     chr18  +       80389285   80389436   152
YourSeq  95     478     585   2000   94.4%     chr10  -       80010602   80010746   145
YourSeq  95     478     593   2000   93.6%     chr1   -       32562249   32562401   153
YourSeq  95     480     589   2000   94.6%     chr2   +       76232433   76232581   149
YourSeq  93     478     591   2000   94.4%     chr19  -       42137976   42138128   153

Note: The 2000 bp section upstream of exon 3 is BLAT searched against the genome. Apart from the expected full-length self-match on chr7, no significant similarity is found.

BLAT Search Results (down)

QUERY    SCORE  QSTART  QEND  QSIZE  IDENTITY  CHROM  STRAND  TSTART     TEND       SPAN
YourSeq  1600   1       1600  1600   100.0%    chr7   +       101344036  101345635  1600
YourSeq  35     555     599   1600   97.4%     chr2   -       155917869  155917924  56
YourSeq  33     563     616   1600   97.2%     chr7   -       105561674  105561728  55
YourSeq  29     530     562   1600   83.4%     chr8   +       89573315   89573344   30
YourSeq  27     525     552   1600   100.0%    chr7   -       76277295   76277328   34
YourSeq  27     528     554   1600   100.0%    chr10  +       60044267   60044293   27
YourSeq  25     528     552   1600   100.0%    chrX   -       105210145  105210169  25
YourSeq  25     528     552   1600   100.0%    chr3   +       23370312   23370336   25
YourSeq  25     562     587   1600   100.0%    chr2   +       168094421  168094449  29
YourSeq  24     531     554   1600   100.0%    chr13  -       101630526  101630549  24
YourSeq  24     531     554   1600   100.0%    chr14  +       122698789  122698812  24
YourSeq  23     533     555   1600   100.0%    chr5   -       117796020  117796042  23
YourSeq  23     533     555   1600   100.0%    chr2   -       144743199  144743221  23
YourSeq  23     561     583   1600   100.0%    chr12  -       28603125   28603147   23
YourSeq  23     532     554   1600   100.0%    chr1   -       55860519   55860541   23
YourSeq  23     528     553   1600   96.0%     chrY   +       10804716   10804749   34
YourSeq  23     528     551   1600   100.0%    chr5   +       92311741   92311766   26
YourSeq  23     559     581   1600   100.0%    chr15  +       37727729   37727751   23
YourSeq  22     531     552   1600   100.0%    chr4   -       72012536   72012557   22
YourSeq  22     296     317   1600   100.0%    chr12  -       20367380   20367401   22

Note: The 1600 bp section downstream of exon 5 is BLAT searched against the genome. Apart from the expected full-length self-match on chr7, no significant similarity is found.
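The BLAT screens can be summarized programmatically: drop the full-length self-match, then ask how large the best remaining hit is relative to the query. A minimal sketch, assuming whitespace-separated rows in the column order of the BLAT tables in this report; the 10% score-to-query-size cutoff is a hypothetical rule of thumb, not the criterion the report used.

```python
from dataclasses import dataclass


@dataclass
class BlatHit:
    score: int
    q_start: int
    q_end: int
    q_size: int
    identity: float  # percent
    chrom: str
    strand: str
    t_start: int
    t_end: int
    span: int


def parse_blat_rows(text: str) -> list[BlatHit]:
    """Parse rows whose first field is the query name 'YourSeq'; skip everything else."""
    hits = []
    for line in text.splitlines():
        f = line.split()
        if len(f) != 11 or f[0] != "YourSeq":
            continue  # header or blank line
        hits.append(BlatHit(int(f[1]), int(f[2]), int(f[3]), int(f[4]),
                            float(f[5].rstrip("%")), f[6], f[7],
                            int(f[8]), int(f[9]), int(f[10])))
    return hits


def is_self_hit(hit: BlatHit) -> bool:
    """The query aligned end to end at 100% identity, i.e. its own locus."""
    return hit.score == hit.q_size and hit.identity == 100.0


def off_target_check(hits: list[BlatHit], max_fraction: float = 0.10):
    """Return (best off-target hit, score / query size, flagged?) under a hypothetical cutoff."""
    others = [h for h in hits if not is_self_hit(h)]
    if not others:
        return None, 0.0, False
    best = max(others, key=lambda h: h.score)
    fraction = best.score / best.q_size
    return best, fraction, fraction >= max_fraction


sample = """QUERY SCORE QSTART QEND QSIZE IDENTITY CHROM STRAND TSTART TEND SPAN
YourSeq 2000 1 2000 2000 100.0% chr7 + 101340499 101342498 2000
YourSeq 133 480 789 2000 94.7% chr2 - 26124516 26125166 651
YourSeq 130 373 588 2000 92.9% chr11 + 80291471 80291802 332"""

best, fraction, flagged = off_target_check(parse_blat_rows(sample))
print(best.chrom, best.score, f"{fraction:.2%}", flagged)  # chr2 133 6.65% False
```

Applied to the full tables, the best off-target hit scores 133 of 2000 (6.65%) upstream and 35 of 1600 (about 2.2%) downstream.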

Gene information: Stard10, START domain containing 10 [Mus musculus (house mouse)], Gene ID: 56018, updated on 10-Oct-2019

Gene summary

Official Symbol: Stard10 (provided by MGI)
Official Full Name: START domain containing 10 (provided by MGI)
Primary source: MGI:MGI:1860093
See related: Ensembl:ENSMUSG00000030688
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: PCTP2; Pctpl; CGI-52; PC-TP2; TISP-81; AV048538; NY-C0-28; Sdccag28; SdccagG28
Expression: Biased expression in liver adult (RPKM 468.9), colon adult (RPKM 328.7) and 14 other tissues
Orthologs: human, all

Genomic context

Location: 7; 7 E2
Exon count: 9

Annotation release  Status             Assembly                      Chr  Location
108                 current            GRCm38.p6 (GCF_000001635.26)  7    NC_000073.6 (101317086..101346626)
Build 37.2          previous assembly  MGSCv37 (GCF_000001635.18)    7    NC_000073.5 (108469833..108494826)
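The gene spans in this table can be checked against the KO region with simple interval arithmetic. A minimal sketch; the KO interval chr7:101342499-101344035 is an assumption derived from the flanking-sequence coordinates in the BLAT self-matches, and coordinates from the two assemblies are not directly comparable with each other.

```python
Interval = tuple[int, int]  # 1-based, inclusive (start, end)

GENE_GRCM38: Interval = (101_317_086, 101_346_626)   # NC_000073.6, annotation release 108
GENE_MGSCV37: Interval = (108_469_833, 108_494_826)  # NC_000073.5, Build 37.2
KO_REGION: Interval = (101_342_499, 101_344_035)     # assumed, GRCm38 coordinates


def length(iv: Interval) -> int:
    """Number of bases in an inclusive interval."""
    return iv[1] - iv[0] + 1


def contains(outer: Interval, inner: Interval) -> bool:
    """True if `inner` lies entirely within `outer`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


print(length(GENE_GRCM38), length(GENE_MGSCV37))  # 29541 24994
print(contains(GENE_GRCM38, KO_REGION))            # True
```

Only the GRCm38 span is compared with the KO interval, because both are expressed on the same assembly.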

Chromosome 7 - NC_000073.6

Transcript information: This gene has 11 transcripts

Gene: Stard10 ENSMUSG00000030688

Description: START domain containing 10 [Source:MGI Symbol;Acc:MGI:1860093]
Gene Synonyms: CGI-52, NY-C0-28, PC-TP2, PCTP2, Pctpl, Sdccag28, TISP-81
Location: Chromosome 7: 101,317,086-101,346,626 forward strand. GRCm38:CM001000.2
About this gene: This gene has 11 transcripts (splice variants), 191 orthologues, 2 paralogues, is a member of 1 Ensembl protein family and is associated with 21 phenotypes.
Transcripts

Name         Transcript ID          bp    Protein     Translation ID        Biotype                  CCDS       UniProt         Flags
Stard10-201  ENSMUST00000032927.13  1400  291aa       ENSMUSP00000032927.7  Protein coding           CCDS21509  Q0VG22, Q9JMD3  TSL:1, GENCODE basic, APPRIS P1
Stard10-203  ENSMUST00000164479.8   1375  291aa       ENSMUSP00000133002.2  Protein coding           CCDS21509  Q0VG22, Q9JMD3  TSL:1, GENCODE basic, APPRIS P1
Stard10-211  ENSMUST00000210192.1   1258  291aa       ENSMUSP00000148114.1  Protein coding           CCDS21509  Q0VG22, Q9JMD3  TSL:5, GENCODE basic, APPRIS P1
Stard10-202  ENSMUST00000163799.8   1723  364aa       ENSMUSP00000129408.2  Protein coding           -          E9PVP0          TSL:1, GENCODE basic
Stard10-204  ENSMUST00000167888.8   1049  265aa       ENSMUSP00000127962.2  Protein coding           -          G3UW37          CDS 5' incomplete, TSL:1
Stard10-210  ENSMUST00000174291.7   903   283aa       ENSMUSP00000133985.1  Protein coding           -          G3UY87          CDS 5' incomplete, TSL:5
Stard10-205  ENSMUST00000172630.7   789   150aa       ENSMUSP00000134138.1  Protein coding           -          G3UYM0          CDS 3' incomplete, TSL:2
Stard10-207  ENSMUST00000173270.7   670   160aa       ENSMUSP00000133955.1  Protein coding           -          G3UY59          TSL:5, GENCODE basic
Stard10-206  ENSMUST00000172662.1   430   87aa        ENSMUSP00000134156.1  Protein coding           -          G3UYN6          CDS 5' incomplete, TSL:1
Stard10-209  ENSMUST00000174140.1   649   49aa        ENSMUSP00000134430.1  Nonsense mediated decay  -          G3UZB9          CDS 5' incomplete, TSL:5
Stard10-208  ENSMUST00000174083.1   731   No protein  -                     Retained intron          -          -               TSL:2
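The coding-region percentages in the strategy summary can be turned back into base counts for Stard10-201 (291 aa). This sketch assumes the denominator is the 873 bp of sense codons (291 × 3, stop codon excluded); the report does not state it, but with 876 bp (stop included) 42.38% would come to about 371.25 bases, not a whole number, which suggests 873. The question of interest is whether the deleted coding length is a multiple of 3; if it is not, and exon 2 splices directly to exon 6 in the knockout transcript (also an assumption), the downstream reading frame shifts.

```python
# Hypothetical denominator: 291 codons x 3 = 873 coding bases of Stard10-201,
# stop codon excluded (the report does not state which length it uses).
CODING_BP = 291 * 3


def percent_to_bases(percent: float, total: int = CODING_BP) -> int:
    """Convert a percentage of the coding region back to a whole number of bases."""
    return round(percent / 100 * total)


exon3_offset = percent_to_bases(23.83)  # coding bases before exon 3
deleted_bp = percent_to_bases(42.38)    # coding bases in exons 3 to 5
frameshift = deleted_bp % 3 != 0        # not a whole number of codons

print(exon3_offset, deleted_bp, deleted_bp % 3, frameshift)  # 208 370 1 True
```

Because the percentages are given to two decimals, each value pins the base count to within about ±0.04 bases of 873, so the rounding is unambiguous under this assumption.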

Figure: Ensembl view of the Stard10 region on chromosome 7 (101.31 to 101.35 Mb, 49.54 kb), showing the Stard10 transcripts (201 to 211), the neighboring Arap1-204 transcript, the processed pseudogene Gm45344-201, the lncRNAs Gm20476-201 and Gm45620-201, the TEC Gm45548-201, contigs AC107638.18 and AC129079.5, and the Ensembl Regulatory Build (CTCF, enhancer, open chromatin, promoter and promoter flank features).

Figure: Ensembl view of transcript ENSMUST00000032927 (Stard10-201, protein coding; 25.20 kb, forward strand).

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
