https://www.alphaknockout.com

Mouse Fam83h Knockout Project (CRISPR/Cas9)

Objective: To create a Fam83h knockout mouse model (C57BL/6J) by CRISPR/Cas-mediated genome engineering.

Strategy summary: The Fam83h gene (NCBI Reference Sequence: NM_001168253; Ensembl: ENSMUSG00000046761) is located on mouse chromosome 15. Five exons are identified, with the ATG start codon in exon 2 and the TGA stop codon in exon 5 (Transcript: ENSMUST00000170153). Exons 2~5 will be selected as the target site. Cas9 and gRNA will be co-injected into fertilized eggs for KO mouse production. The pups will be genotyped by PCR followed by sequencing analysis.

Note: Exon 2 starts from about 0.03% of the coding region. Exons 2~5 cover 100.0% of the coding region. The size of the effective KO region is ~4686 bp. The KO region does not contain any other known gene.
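The PCR genotyping step can be reasoned about with simple arithmetic: a primer pair that flanks the KO region gives a long product from the wildtype allele and a product shorter by roughly the deleted length from the KO allele. The sketch below illustrates this. The flank distances (300 bp and 350 bp) are hypothetical values chosen only for illustration, since this report does not give primer positions, and the real KO product can differ by a few bases because of small insertions or deletions at the junction.

```python
# Approximate size of the effective KO region, taken from the note above.
KO_REGION_BP = 4686


def expected_amplicons(upstream_flank_bp, downstream_flank_bp,
                       deletion_bp=KO_REGION_BP):
    """Estimate PCR product sizes for primers flanking the KO region.

    upstream_flank_bp / downstream_flank_bp are the distances from each
    primer to the nearest edge of the deleted region (hypothetical inputs).
    """
    knockout = upstream_flank_bp + downstream_flank_bp
    wild_type = knockout + deletion_bp
    return {"wild_type": wild_type, "knockout": knockout}


# Hypothetical primer placement, not taken from the report.
sizes = expected_amplicons(300, 350)
print(sizes)
```

A wildtype product of several kilobases may amplify poorly under standard conditions, which is one reason sequencing of the shorter product is used to confirm the deletion.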


Overview of the Targeting Strategy

[Figure: wildtype allele drawn 5' to 3', showing exons 1 to 5 of mouse Fam83h, two gRNA regions, and the knockout region. Legend: exon of mouse Fam83h; knockout region.]


Overview of the Dot Plot (up)

Window size: 15 bp

[Figure: dot plot showing forward and reverse-complement matches. Axis label: Sequence 12.]

Note: The 2000 bp section upstream of start codon is aligned with itself to determine if there are tandem repeats. No significant tandem repeat is found in the dot plot matrix. So this region is suitable for PCR screening or sequencing analysis.
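A self dot plot of this kind can be sketched as follows. This is a minimal illustration of the general technique, assuming exact matching of 15 bp words; the tool used for this report is not named here and may tolerate mismatches. The function and variable names are illustrative only.

```python
from collections import defaultdict

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Reverse complement; any non-ACGT base is mapped to N."""
    return "".join(COMPLEMENT.get(base, "N") for base in reversed(seq.upper()))


def self_dot_plot(seq, window=15):
    """Return (forward, reverse) lists of dots for a sequence against itself.

    A forward dot (i, j) with j > i means the words starting at i and j
    are identical. A reverse dot (i, j) means the word at j equals the
    reverse complement of the word at i.
    """
    seq = seq.upper()
    last_start = len(seq) - window + 1
    index = defaultdict(list)
    for i in range(last_start):
        index[seq[i:i + window]].append(i)

    forward, reverse = [], []
    for i in range(last_start):
        word = seq[i:i + window]
        forward.extend((i, j) for j in index[word] if j > i)
        reverse.extend((i, j) for j in index.get(reverse_complement(word), []))
    return forward, reverse


# Toy input with one tandem duplication, using a shorter window.
forward, reverse = self_dot_plot("GATTACAGATTACA", window=7)
print(forward, reverse)
```

Tandem repeats show up as runs of forward dots at a constant, small offset j - i from the main diagonal. A region with no such runs has unique 15 bp words, which is the property that makes it convenient for primer design.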

Overview of the Dot Plot (down)

Window size: 15 bp

[Figure: dot plot showing forward and reverse-complement matches. Axis label: Sequence 12.]

Note: The 2000 bp section downstream of stop codon is aligned with itself to determine if there are tandem repeats. Tandem repeats are found in the dot plot matrix. The gRNA site is selected outside of these tandem repeats.


Overview of the GC Content Distribution (up)

Window size: 300 bp

[Figure: GC content distribution plot. Axis label: Sequence 12.]

Summary: Full Length(2000bp) | A(23.75% 475) | C(25.7% 514) | T(21.8% 436) | G(28.75% 575)

Note: The 2000 bp section upstream of start codon is analyzed to determine the GC content. No significant high GC-content region is found. So this region is suitable for PCR screening or sequencing analysis.
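The GC analysis can be reproduced with a sliding window. The sketch below is a minimal illustration, assuming a window that advances in fixed steps (the step size used for the plot in this report is not stated). It also derives the overall GC fraction from the base counts in the summary line: the upstream counts C = 514 and G = 575 out of 2000 bp give 54.45% GC.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a sequence (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def sliding_gc(seq, window=300, step=1):
    """GC fraction of each window; step is an assumed parameter."""
    return [gc_fraction(seq[i:i + window])
            for i in range(0, len(seq) - window + 1, step)]


def gc_from_counts(a, c, t, g):
    """Overall GC fraction from per-base counts."""
    return (g + c) / (a + c + t + g)


# Base counts from the upstream summary line of this report.
print(gc_from_counts(a=475, c=514, t=436, g=575))

# Toy sequence with a short window to show the profile shape.
print(sliding_gc("GGGGAAAA", window=4, step=2))
```

Applied to the downstream summary (C = 582, G = 444), the same calculation gives 51.3% GC.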

Overview of the GC Content Distribution (down)

Window size: 300 bp

[Figure: GC content distribution plot. Axis label: Sequence 12.]

Summary: Full Length(2000bp) | A(22.95% 459) | C(29.1% 582) | T(25.75% 515) | G(22.2% 444)

Note: The 2000 bp section downstream of stop codon is analyzed to determine the GC content. No significant high GC-content region is found. So this region is suitable for PCR screening or sequencing analysis.


BLAT Search Results (up)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
----------------------------------------------------------------------------------------
YourSeq  2000   1      2000  2000   100.0%    chr15  -       76006548   76008547   2000
YourSeq  79     891    990   2000   90.0%     chr15  -       76006187   76006297   111
YourSeq  40     1013   1146  2000   68.2%     chr14  +       11506071   11506141   71
YourSeq  35     1012   1067  2000   84.7%     chr3   -       65466626   65466690   65
YourSeq  32     1017   1051  2000   87.9%     chr2   -       80934283   80934315   33
YourSeq  32     1014   1054  2000   77.2%     chr13  -       104184386  104184420  35
YourSeq  28     1013   1046  2000   91.2%     chr8   +       26778439   26778472   34
YourSeq  27     1022   1051  2000   85.8%     chr15  -       58056154   58056181   28
YourSeq  27     1022   1051  2000   96.7%     chr11  -       62803739   62803772   34
YourSeq  26     1014   1044  2000   85.8%     chr11  -       49599863   49599891   29
YourSeq  26     1014   1045  2000   93.4%     chr14  +       74920050   74920087   38
YourSeq  24     1021   1046  2000   100.0%    chr3   -       148794933  148794962  30
YourSeq  23     1542   1569  2000   80.8%     chr14  -       77948999   77949024   26
YourSeq  21     1350   1370  2000   100.0%    chrX   -       155757106  155757126  21
YourSeq  21     1031   1051  2000   100.0%    chr11  -       18854163   18854183   21
YourSeq  21     83     103   2000   100.0%    chr4   +       124190716  124190736  21
YourSeq  21     1547   1567  2000   100.0%    chr14  +       20563956   20563976   21

Note: The 2000 bp section upstream of start codon is BLAT searched against the genome. No significant similarity is found.
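One way to read a BLAT table like this is to set aside the full-length self match and check whether any remaining hit scores high enough to matter. The sketch below parses three rows copied from the upstream results and applies a cutoff. The cutoff of 100 is a hypothetical value chosen for illustration; this report does not state which criterion was used to call a hit significant.

```python
from typing import NamedTuple


class BlatHit(NamedTuple):
    score: int
    q_start: int
    q_end: int
    q_size: int
    identity: float  # percent
    chrom: str
    strand: str
    t_start: int
    t_end: int
    span: int


def parse_hit(line):
    """Parse one row: SCORE START END QSIZE IDENTITY CHROM STRAND START END SPAN."""
    f = line.split()
    return BlatHit(int(f[0]), int(f[1]), int(f[2]), int(f[3]),
                   float(f[4].rstrip("%")), f[5], f[6],
                   int(f[7]), int(f[8]), int(f[9]))


def is_self_match(hit):
    """The query aligned to its own locus: full length at 100% identity."""
    return hit.q_start == 1 and hit.q_end == hit.q_size and hit.identity == 100.0


def off_target_hits(hits, min_score=100):
    """Hits other than the self match at or above min_score (assumed cutoff)."""
    return [h for h in hits if h.score >= min_score and not is_self_match(h)]


rows = [
    "2000 1 2000 2000 100.0% chr15 - 76006548 76008547 2000",
    "79 891 990 2000 90.0% chr15 - 76006187 76006297 111",
    "40 1013 1146 2000 68.2% chr14 + 11506071 11506141 71",
]
hits = [parse_hit(r) for r in rows]
print(off_target_hits(hits))
```

With this cutoff the list is empty, because the best secondary hit scores 79 against a 2000 bp query.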

BLAT Search Results (down)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
----------------------------------------------------------------------------------------
YourSeq  2000   1      2000  2000   100.0%    chr15  -       75999860   76001859   2000
YourSeq  39     1540   1594  2000   85.5%     chr6   -       83157041   83157095   55
YourSeq  38     1524   1584  2000   85.2%     chr1   -       10858572   10858636   65
YourSeq  37     1540   1612  2000   74.4%     chr12  -       80973993   80974048   56
YourSeq  36     1551   1597  2000   93.1%     chr11  -       106993377  106993426  50
YourSeq  36     1525   1575  2000   91.0%     chr1   -       153783820  153783878  59
YourSeq  31     1525   1565  2000   91.9%     chr11  +       75553660   75553704   45
YourSeq  30     917    948   2000   100.0%    chrX   -       144688143  144688346  204
YourSeq  30     1542   1584  2000   94.2%     chr1   +       87401810   87401855   46
YourSeq  28     1589   1616  2000   100.0%    chr6   -       115945899  115945926  28
YourSeq  23     1589   1611  2000   100.0%    chr8   +       19181520   19181542   23
YourSeq  22     1540   1561  2000   100.0%    chr8   +       53612220   53612241   22
YourSeq  21     214    234   2000   100.0%    chr4   -       67322126   67322146   21
YourSeq  21     210    232   2000   95.7%     chr1   -       161847524  161847546  23
YourSeq  21     1447   1469  2000   95.7%     chr1   +       30920254   30920276   23

Note: The 2000 bp section downstream of stop codon is BLAT searched against the genome. No significant similarity is found.


Gene and information:

Fam83h family with sequence similarity 83, member H [ Mus musculus (house mouse) ]
Gene ID: 105732, updated on 17-Aug-2019

Gene summary

Official Symbol: Fam83h (provided by MGI)
Official Full Name: family with sequence similarity 83, member H (provided by MGI)
Primary source: MGI:MGI:2145900
See related: Ensembl:ENSMUSG00000046761
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: AA409316
Expression: Broad expression in colon adult (RPKM 22.6), small intestine adult (RPKM 16.0) and 16 other tissues
Orthologs: human, all

Genomic context

Location: 15; 15 D3
Exon count: 8

Annotation release  Status             Assembly                       Chr  Location
108                 current            GRCm38.p6 (GCF_000001635.26)   15   NC_000081.6 (76001092..76014336, complement)
Build 37.2          previous assembly  MGSCv37 (GCF_000001635.18)     15   NC_000081.5 (75831522..75844766, complement)

[Figure: genomic context of Fam83h on Chromosome 15 - NC_000081.6.]


Transcript information: This gene has 3 transcripts

Gene: Fam83h ENSMUSG00000046761

Description: family with sequence similarity 83, member H [Source:MGI Symbol;Acc:MGI:2145900]
Location: Chromosome 15: 76,001,093-76,014,336 reverse strand. GRCm38:CM001008.2
About this gene: This gene has 3 transcripts (splice variants), 242 orthologues, 7 paralogues, is a member of 1 Ensembl protein family and is associated with 11 phenotypes.

Transcripts

Name        Transcript ID          bp    Protein  Translation ID        Biotype         CCDS       UniProt  Flags
Fam83h-202  ENSMUST00000170153.1   4534  1209aa   ENSMUSP00000126453.1  Protein coding  CCDS27559  Q148V8   TSL:1 GENCODE basic APPRIS P2
Fam83h-201  ENSMUST00000060807.11  4502  1209aa   ENSMUSP00000059839.5  Protein coding  CCDS27559  Q148V8   TSL:1 GENCODE basic APPRIS P2
Fam83h-203  ENSMUST00000238313.1   4994  1409aa   ENSMUSP00000158845.1  Protein coding  -          -        GENCODE basic APPRIS ALT2

[Figure: Ensembl region view, 33.24 kb, chromosome 15 from 76.00 Mb to 76.02 Mb. Forward strand: Mapk15-201 and Mapk15-205 (protein coding), Mapk15-202 (retained intron), Mapk15-203 and Mapk15-204 (lncRNA), Iqank1-204 (protein coding), Iqank1-202 (retained intron), Iqank1-201 (lncRNA). Contig: AC116487.14. Reverse strand: Fam83h-201, Fam83h-202 and Fam83h-203 (protein coding). Regulatory Build track legend: CTCF, Enhancer, Open Chromatin, Promoter, Promoter Flank. Gene legend: protein coding (merged Ensembl/Havana, Ensembl protein coding); non-protein coding (RNA gene, processed transcript).]


Transcript: ENSMUST00000170153

[Figure: Fam83h-202 (protein coding), reverse strand, 8.41 kb, with protein annotation tracks for ENSMUSP00000126...: MobiDB lite; Low complexity (Seg); Superfamily SSF56024; Pfam FAM83, N-terminal; PANTHER PTHR16181 and PTHR16181:SF8; Gene3D 3.30.870.10; CDD FAM83H, N-terminal phospholipase D-like domain. Sequence variants (dbSNP and all other sources) are marked as missense or synonymous. Scale bar: 0 to 1209.]

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
