https://www.alphaknockout.com

Mouse Tmem106c Knockout Project (CRISPR/Cas9)

Objective: To create a Tmem106c knockout mouse model (C57BL/6J) by CRISPR/Cas-mediated genome engineering.

Strategy summary: The Tmem106c gene (NCBI Reference Sequence: NM_201359; Ensembl: ENSMUSG00000052369) is located on mouse chromosome 15. Eight exons are identified, with the ATG start codon in exon 2 and the TAG stop codon in exon 8 (Transcript: ENSMUST00000064200). Exons 2~8 will be selected as the target site. Cas9 and gRNA will be co-injected into fertilized eggs for KO mouse production. The pups will be genotyped by PCR followed by sequencing analysis.

Note:

Exon 2 starts from about 0.13% of the coding region. Exons 2~8 cover 100.0% of the coding region. The size of the effective KO region: ~4867 bp. The KO region does not contain any other known gene.
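The two figures in this note can be cross-checked against values given elsewhere in this report. The sketch below is only a consistency check, not the vendor's calculation: it assumes the KO region is the span between the 2000 bp upstream and downstream sections listed in the BLAT tables, and it assumes (as one possible reading) that the 0.13% figure is the first coding base expressed as a percentage of a coding sequence of 260 codons plus a stop codon. All variable and function names are illustrative.

```python
# Coordinates copied from the BLAT tables in this report (chr15, + strand).
UPSTREAM_SECTION_END = 97_964_850      # last base of the 2000 bp upstream section
DOWNSTREAM_SECTION_START = 97_969_718  # first base of the 2000 bp downstream section

# Protein length of transcript ENSMUST00000064200 from the transcript table.
PROTEIN_LENGTH_AA = 260


def span_between(upstream_end: int, downstream_start: int) -> int:
    """Number of bases strictly between two flanking sections."""
    return downstream_start - upstream_end - 1


# Assumption: the effective KO region runs from the ATG to the stop codon.
ko_span = span_between(UPSTREAM_SECTION_END, DOWNSTREAM_SECTION_START)

# Assumption: coding length = (amino acids + stop codon) * 3 bases.
cds_length_bp = (PROTEIN_LENGTH_AA + 1) * 3
first_base_pct = 100 / cds_length_bp

print(ko_span)                   # 4867
print(round(first_base_pct, 2))  # 0.13
```

Under these assumptions both values agree with the note: the flanking coordinates leave 4867 bp between them, and one base out of 783 is about 0.13%.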


Overview of the Targeting Strategy

[Figure: Wildtype allele drawn 5' to 3' with exons 1 to 8; a gRNA region is marked on each side of the knockout region. Legend: exon of mouse Tmem106c; knockout region.]


Overview of the Dot Plot (up)

Window size: 15 bp

[Figure: Dot plot of the upstream sequence against itself. Legend: forward; reverse complement.]

Note: The 2000 bp section upstream of start codon is aligned with itself to determine if there are tandem repeats. No significant tandem repeat is found in the dot plot matrix. So this region is suitable for PCR screening or sequencing analysis.
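A self-alignment dot plot of this kind can be approximated by marking every pair of positions whose 15 bp windows are identical; runs of marks close to the main diagonal point to tandem repeats. The sketch below is a simplified illustration, not the tool used for this report: it assumes exact window matches on the forward strand only, and the `max_offset` cutoff for calling a repeat "tandem" is a hypothetical parameter.

```python
from collections import defaultdict


def self_dot_plot(seq: str, window: int = 15) -> list[tuple[int, int]]:
    """Return (i, j) pairs, i < j, where the windows starting at i and j are identical."""
    positions = defaultdict(list)
    for i in range(len(seq) - window + 1):
        positions[seq[i:i + window]].append(i)

    hits = []
    for starts in positions.values():
        # Every pair of occurrences of the same window is one off-diagonal dot.
        for a in range(len(starts)):
            for b in range(a + 1, len(starts)):
                hits.append((starts[a], starts[b]))
    return sorted(hits)


def has_tandem_repeat(seq: str, window: int = 15, max_offset: int = 50) -> bool:
    """True if any matching window pair lies within max_offset bases (hypothetical cutoff)."""
    return any(j - i <= max_offset for i, j in self_dot_plot(seq, window))


# Toy inputs, not sequences from this project.
repetitive = "ACGTTGCA" * 5                              # 8 bp unit repeated in tandem
non_repetitive = "A" * 15 + "C" * 15 + "G" * 15 + "T" * 15  # every 15 bp window is unique

print(has_tandem_repeat(repetitive))      # True
print(has_tandem_repeat(non_repetitive))  # False
```

A production dot plot would normally also compare against the reverse complement and tolerate mismatches within the window; both are omitted here to keep the idea visible.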

Overview of the Dot Plot (down)

Window size: 15 bp

[Figure: Dot plot of the downstream sequence against itself. Legend: forward; reverse complement.]

Note: The 2000 bp section downstream of stop codon is aligned with itself to determine if there are tandem repeats. Tandem repeats are found in the dot plot matrix. The gRNA site is selected outside of these tandem repeats.


Overview of the GC Content Distribution (up)

Window size: 300 bp

[Figure: GC content distribution along the upstream sequence.]

Summary: Full Length(2000bp) | A(23.75% 475) | C(25.3% 506) | T(25.05% 501) | G(25.9% 518)

Note: The 2000 bp section upstream of start codon is analyzed to determine the GC content. Significant high GC-content regions are found. The gRNA site is selected outside of these high GC-content regions.
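The base counts in the summary lines give the overall GC content directly, and the windowed profile can be sketched as a 300 bp sliding average. The code below is a minimal illustration under stated assumptions: the window advances one base at a time, and the function names are hypothetical. The report does not state what threshold counts as a "significant high GC-content region", so none is applied here.

```python
def gc_percent_from_counts(counts: dict[str, int]) -> float:
    """Overall GC percentage from per-base counts."""
    total = sum(counts.values())
    return 100 * (counts["G"] + counts["C"]) / total


def sliding_gc(seq: str, window: int = 300) -> list[float]:
    """GC fraction of every window of the given size, advancing one base at a time."""
    seq = seq.upper()
    if len(seq) < window:
        return []
    is_gc = [1 if base in "GC" else 0 for base in seq]
    count = sum(is_gc[:window])
    profile = [count / window]
    for i in range(window, len(seq)):
        # Slide the window: add the incoming base, drop the outgoing one.
        count += is_gc[i] - is_gc[i - window]
        profile.append(count / window)
    return profile


# Base counts copied from the two summary lines in this report.
UPSTREAM = {"A": 475, "C": 506, "T": 501, "G": 518}
DOWNSTREAM = {"A": 480, "C": 507, "T": 527, "G": 486}

print(gc_percent_from_counts(UPSTREAM))    # 51.2
print(gc_percent_from_counts(DOWNSTREAM))  # 49.65

# Toy sequence (not from this project): a GC-only block followed by an AT-only block.
toy_profile = sliding_gc("GC" * 150 + "AT" * 150, window=300)
print(toy_profile[0], toy_profile[-1])     # 1.0 0.0
```

Both flanks average close to 50% GC overall, so any high-GC regions flagged in the notes are local features that only show up in the windowed profile.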

Overview of the GC Content Distribution (down)

Window size: 300 bp

[Figure: GC content distribution along the downstream sequence.]

Summary: Full Length(2000bp) | A(24.0% 480) | C(25.35% 507) | T(26.35% 527) | G(24.3% 486)

Note: The 2000 bp section downstream of stop codon is analyzed to determine the GC content. No significant high GC-content region is found. So this region is suitable for PCR screening or sequencing analysis.


BLAT Search Results (up)

QUERY    SCORE  START    END  QSIZE  IDENTITY  CHROM  STRAND      START        END  SPAN
-----------------------------------------------------------------------------------------
YourSeq   2000      1   2000   2000    100.0%  chr15  +        97962851   97964850  2000
YourSeq    132     95    689   2000     79.0%  chr9   -        53542846   53543090   245
YourSeq    132    536    713   2000     89.3%  chr2   +       145955648  145955831   184
YourSeq    128    545    716   2000     88.8%  chr11  -        74732800   74732970   171
YourSeq    127    545    716   2000     90.0%  chr10  -        61403512   61403890   379
YourSeq    126    544    713   2000     90.9%  chr14  +       102397628  102397798   171
YourSeq    126    545    716   2000     88.9%  chr1   +        36163767   36163946   180
YourSeq    125    536    713   2000     83.6%  chrX   +        25501643   25501809   167
YourSeq    125    549    716   2000     87.5%  chr2   +        71390408   71390737   330
YourSeq    124    548    713   2000     87.5%  chrX   -       144270550  144270711   162
YourSeq    123    545    707   2000     90.2%  chr4   +       139849381  139849544   164
YourSeq    122    400    689   2000     88.7%  chr8   -        32769572   32770137   566
YourSeq    121    545    687   2000     92.4%  chr8   -        88244061   88244203   143
YourSeq    121    545    689   2000     91.8%  chr12  +        84509301   84509445   145
YourSeq    120    549    716   2000     90.6%  chr1   -        54965170   54965342   173
YourSeq    119    543    689   2000     90.5%  chr1   -         9841486    9841632   147
YourSeq    119    545    689   2000     91.1%  chr4   +       143030059  143030203   145
YourSeq    118    542    689   2000     89.9%  chr9   -       106597930  106598077   148
YourSeq    117    545    713   2000     89.4%  chrX   -       101181833  101182008   176
YourSeq    117    545    689   2000     90.4%  chr1   -        80299696   80299840   145

Note: The 2000 bp section upstream of start codon is BLAT searched against the genome. No significant similarity is found.
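The report does not say what makes a BLAT hit "significant". One plausible way to screen a result table like the one above is to look at how much of the 2000 bp query each off-target hit covers, together with its identity. The sketch below parses a few rows copied from the table and applies two thresholds; `MIN_IDENTITY` and `MIN_COVERAGE` are hypothetical values chosen for illustration, not the vendor's criteria.

```python
from typing import NamedTuple


class BlatHit(NamedTuple):
    score: int
    q_start: int
    q_end: int
    q_size: int
    identity: float
    chrom: str
    strand: str
    t_start: int
    t_end: int
    span: int


def parse_hit(line: str) -> BlatHit:
    """Parse one whitespace-separated row: QUERY SCORE START END QSIZE IDENTITY CHROM STRAND START END SPAN."""
    f = line.split()
    return BlatHit(int(f[1]), int(f[2]), int(f[3]), int(f[4]),
                   float(f[5].rstrip("%")), f[6], f[7],
                   int(f[8]), int(f[9]), int(f[10]))


def query_coverage(hit: BlatHit) -> float:
    """Fraction of the query sequence covered by the hit."""
    return (hit.q_end - hit.q_start + 1) / hit.q_size


def is_self_hit(hit: BlatHit) -> bool:
    """The query aligned to its own locus: full score at 100% identity."""
    return hit.score == hit.q_size and hit.identity == 100.0


# Hypothetical thresholds for calling an off-target hit significant.
MIN_IDENTITY = 90.0
MIN_COVERAGE = 0.5

# Rows copied from the upstream BLAT table in this report.
ROWS = """\
YourSeq 2000 1 2000 2000 100.0% chr15 + 97962851 97964850 2000
YourSeq 132 95 689 2000 79.0% chr9 - 53542846 53543090 245
YourSeq 132 536 713 2000 89.3% chr2 + 145955648 145955831 184
YourSeq 121 545 687 2000 92.4% chr8 - 88244061 88244203 143
"""

hits = [parse_hit(line) for line in ROWS.splitlines()]
off_targets = [h for h in hits if not is_self_hit(h)]
flagged = [h for h in off_targets
           if h.identity >= MIN_IDENTITY and query_coverage(h) >= MIN_COVERAGE]

print(flagged)                                      # []
print(max(query_coverage(h) for h in off_targets))  # 0.2975
```

With these illustrative thresholds none of the sampled off-target hits is flagged: the longest one covers under 30% of the query at 79.0% identity, and the higher-identity hits cover well under 10%.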

BLAT Search Results (down)

QUERY    SCORE  START    END  QSIZE  IDENTITY  CHROM  STRAND      START        END  SPAN
-----------------------------------------------------------------------------------------
YourSeq   2000      1   2000   2000    100.0%  chr15  +        97969718   97971717  2000
YourSeq    139   1845   2000   2000     94.9%  chrX   +         6858350    6858507   158
YourSeq    137   1849   2000   2000     95.4%  chr19  +        21537646   21537799   154
YourSeq    136   1849   2000   2000     96.0%  chr17  -        82389282   82389434   153
YourSeq    136   1848   2000   2000     95.4%  chr2   +        33888926   33889078   153
YourSeq    135   1832   2000   2000     87.7%  chr8   -        94724245   94724406   162
YourSeq    135   1849   2000   2000     94.8%  chr5   -        63543892   63544045   154
YourSeq    135   1847   2000   2000     93.4%  chr17  -        53464865   53465017   153
YourSeq    135   1849   2000   2000     94.8%  chr6   +        35185430   35185582   153
YourSeq    135   1842   2000   2000     93.1%  chr5   +       117340337  117340502   166
YourSeq    135   1849   2000   2000     94.8%  chr4   +        35213982   35214135   154
YourSeq    135   1848   1999   2000     94.8%  chr13  +        92262611   92262764   154
YourSeq    134   1849   2000   2000     95.4%  chr7   -        96164071   96164223   153
YourSeq    134   1849   2000   2000     94.1%  chr2   -         3497712    3497863   152
YourSeq    134   1849   2000   2000     94.1%  chr15  -        35322514   35322665   152
YourSeq    134   1848   2000   2000     94.2%  chr14  -        45265181   45265335   155
YourSeq    134   1848   2000   2000     93.4%  chr14  +       106164371  106164522   152
YourSeq    134   1845   2000   2000     91.7%  chr10  +        59299196   59299350   155
YourSeq    133   1848   2000   2000     94.7%  chr8   -       120675421  120675573   153
YourSeq    133   1847   2000   2000     93.6%  chr3   -        53775410   53775565   156

Note: The 2000 bp section downstream of stop codon is BLAT searched against the genome. No significant similarity is found.


Gene and protein information:

Tmem106c transmembrane protein 106C [Mus musculus]
Gene ID: 380967, updated on 24-Oct-2019

Gene summary

Official Symbol: Tmem106c (provided by MGI)
Official Full Name: transmembrane protein 106C (provided by MGI)
Primary source: MGI:MGI:1196384
See related: Ensembl:ENSMUSG00000052369
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: AI046681; BC046621; D15Ertd405e
Expression: Ubiquitous expression in genital fat pad adult (RPKM 24.2), bladder adult (RPKM 18.3) and 28 other tissues

Genomic context

Location: 15 F1; 15 53.96 cM
Exon count: 8

Annotation release  Status             Assembly                        Chr  Location
108                 current            GRCm38.p6 (GCF_000001635.26)    15   NC_000081.6 (97963177..97970286)
Build 37.2          previous assembly  MGSCv37 (GCF_000001635.18)      15   NC_000081.5 (97794710..97800706)

Chromosome 15 - NC_000081.6


Transcript information: This gene has 9 transcripts

Gene: Tmem106c ENSMUSG00000052369

Description: transmembrane protein 106C [Source:MGI Symbol;Acc:MGI:1196384]
Gene Synonyms: D15Ertd405e
Location: Chromosome 15: 97,964,200-97,970,275 forward strand. GRCm38:CM001008.2
About this gene: This gene has 9 transcripts (splice variants), 166 orthologues and is a member of 1 Ensembl protein family.

Transcripts

Name          Transcript ID         bp    Protein     Translation ID        Biotype          CCDS       UniProt     Flags
Tmem106c-201  ENSMUST00000064200.8  1623  260aa       ENSMUSP00000069764.7  Protein coding   CCDS27785  Q80VP8      TSL:1, GENCODE basic, APPRIS P1
Tmem106c-202  ENSMUST00000229428.1  1455  260aa       ENSMUSP00000154819.1  Protein coding   CCDS27785  Q80VP8      GENCODE basic, APPRIS P1
Tmem106c-209  ENSMUST00000231144.1  734   166aa       ENSMUSP00000155384.1  Protein coding   -          A0A2R8VHS8  CDS 3' incomplete
Tmem106c-203  ENSMUST00000229433.1  631   165aa       ENSMUSP00000154837.1  Protein coding   -          A0A2R8VJM7  CDS 3' incomplete
Tmem106c-205  ENSMUST00000230072.1  401   116aa       ENSMUSP00000155091.1  Protein coding   -          A0A2R8W6L2  CDS 3' incomplete
Tmem106c-204  ENSMUST00000230005.1  927   No protein  -                     Retained intron  -          -           -
Tmem106c-206  ENSMUST00000230144.1  727   No protein  -                     Retained intron  -          -           -
Tmem106c-208  ENSMUST00000231079.1  571   No protein  -                     Retained intron  -          -           -
Tmem106c-207  ENSMUST00000230361.1  585   No protein  -                     lncRNA           -          -           -


[Figure: Ensembl genomic view of the Tmem106c locus, 26.08 kb, chromosome 15 from 97.96 Mb to 97.98 Mb. Forward strand: Tmem106c-201, -202, -203, -205 and -209 (protein coding); Tmem106c-204, -206 and -208 (retained intron); Tmem106c-207 (lncRNA). Contigs: AC158787.14, AC134554.6. Reverse strand: Col2a1-201 and Col2a1-202 (protein coding); Col2a1-205 (lncRNA). Regulatory Build legend: CTCF, open chromatin, promoter, promoter flank. Gene legend: protein coding (Ensembl protein coding; merged Ensembl/Havana); non-protein coding (processed transcript; RNA gene).]


Transcript: ENSMUST00000064200

[Figure: Protein view of Tmem106c-201 (protein coding), 6.05 kb, forward strand, with a scale bar from 0 to 260 amino acids. Tracks: ENSMUSP00000069...; transmembrane helices; low complexity (Seg); Pfam: Protein of unknown function DUF1356, TMEM106; PANTHER: PTHR28556:SF5, Protein of unknown function DUF1356, TMEM106; sequence variants (dbSNP and all other sources). Variant legend: missense variant; synonymous variant.]

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
