https://www.alphaknockout.com

Mouse Tmem106a Knockout Project (CRISPR/Cas9)

Objective: To create a Tmem106a knockout mouse model (C57BL/6J) by CRISPR/Cas9-mediated genome engineering.

Strategy summary: The Tmem106a gene (NCBI Reference Sequence: NM_144830; Ensembl: ENSMUSG00000034947) is located on mouse chromosome 11. Nine exons are identified, with the ATG start codon in exon 3 and the TGA stop codon in exon 9 (Transcript: ENSMUST00000039581). Exons 3~9 will be selected as the target site. Cas9 and gRNA will be co-injected into fertilized eggs for KO mouse production. The pups will be genotyped by PCR followed by sequencing analysis.

Note: Exons 3~9 cover 100.0% of the coding region. The size of the effective KO region is ~8403 bp. The KO region does not contain any other known gene.

Overview of the Targeting Strategy

Figure: Wildtype allele (5' to 3') with exons 1-9 of mouse Tmem106a. The two gRNA regions and the knockout region are marked.

Legends: Exon of mouse Tmem106a; Knockout region

Overview of the Dot Plot (up)

Window size: 15 bp

Figure: Self-alignment dot plot of the upstream sequence. Legend: Forward; Reverse Complement.

Note: The 2000 bp section upstream of start codon is aligned with itself to determine if there are tandem repeats. Tandem repeats are found in the dot plot matrix. The gRNA site is selected outside of these tandem repeats.
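The self-alignment behind a dot plot of this kind can be sketched as follows. This is only an illustration, assuming exact matching of 15 bp windows; the report does not state which tool or scoring scheme was used, and the names `revcomp`, `self_dotplot` and the toy sequence are hypothetical.

```python
from collections import defaultdict


def revcomp(seq):
    """Return the reverse complement of a DNA sequence."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def self_dotplot(seq, window=15):
    """Compare a sequence with itself using exact window matches.

    Returns two lists of (i, j) dots:
      forward: seq[i:i+window] == seq[j:j+window], with i != j
      reverse: seq[i:i+window] == reverse complement of seq[j:j+window]
    The trivial main diagonal (i == j) is left out of the forward list.
    """
    seq = seq.upper()
    # Index every window by its start positions.
    index = defaultdict(list)
    for i in range(len(seq) - window + 1):
        index[seq[i:i + window]].append(i)

    forward, reverse = [], []
    for i in range(len(seq) - window + 1):
        kmer = seq[i:i + window]
        forward.extend((i, j) for j in index[kmer] if j != i)
        reverse.extend((i, j) for j in index.get(revcomp(kmer), []))
    return forward, reverse


# Toy example (not the Tmem106a sequence): a 20 bp unit repeated in tandem.
unit = "ACGGTCATTGCAGTACCGTA"
toy = "GGG" + unit + unit + "CCC"
forward, reverse = self_dotplot(toy, window=15)
print(sorted(forward))
```

A tandem repeat shows up as forward dots on a diagonal offset from the main diagonal by the repeat unit length, which is the pattern the note above refers to.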

Overview of the Dot Plot (down)

Window size: 15 bp

Figure: Self-alignment dot plot of the downstream sequence. Legend: Forward; Reverse Complement.

Note: The 2000 bp section downstream of stop codon is aligned with itself to determine if there are tandem repeats. No significant tandem repeat is found in the dot plot matrix. So this region is suitable for PCR screening or sequencing analysis.

Overview of the GC Content Distribution (up)

Window size: 300 bp

Figure: GC content distribution along the upstream sequence.

Summary: Full Length(2000bp) | A(27.05% 541) | C(23.4% 468) | T(28.35% 567) | G(21.2% 424)

Note: The 2000 bp section upstream of start codon is analyzed to determine the GC content. No significant high GC-content region is found. So this region is suitable for PCR screening or sequencing analysis.
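A sliding-window GC profile like the one summarized above can be computed along these lines. This is a minimal sketch: the 300 bp window comes from the report, while the step size, the function name `gc_windows` and the toy input are assumptions made for illustration.

```python
def gc_windows(seq, window=300, step=1):
    """Yield (start, gc_fraction) for each window along the sequence.

    start is 0-based; gc_fraction is (G + C) / window for that window.
    """
    seq = seq.upper()
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        yield start, (chunk.count("G") + chunk.count("C")) / window


# Toy example with a small window so the result is easy to follow.
for start, gc in gc_windows("GGCCAATT", window=4, step=2):
    print(start, gc)
```

For the 2000 bp flanks in this report, one would call `gc_windows(flank_sequence, window=300)` and look for windows with a markedly high GC fraction; the threshold for "high" is not given in the report.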

Overview of the GC Content Distribution (down)

Window size: 300 bp

Figure: GC content distribution along the downstream sequence.

Summary: Full Length(2000bp) | A(26.45% 529) | C(23.05% 461) | T(25.1% 502) | G(25.4% 508)

Note: The 2000 bp section downstream of stop codon is analyzed to determine the GC content. No significant high GC-content region is found. So this region is suitable for PCR screening or sequencing analysis.
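The base counts in the two summary lines can be cross-checked and turned into an overall GC percentage with a few lines of code. The sketch below uses only the numbers printed in the summaries; the helper names `parse_summary` and `gc_percent` are hypothetical.

```python
import re


def parse_summary(line):
    """Parse 'A(27.05% 541) | C(...) ...' into {base: (percent, count)}."""
    pattern = r"([ACGT])\((\d+(?:\.\d+)?)% (\d+)\)"
    return {b: (float(p), int(n)) for b, p, n in re.findall(pattern, line)}


def gc_percent(summary):
    """Overall GC percentage computed from the base counts."""
    total = sum(count for _, count in summary.values())
    return 100 * (summary["G"][1] + summary["C"][1]) / total


up = parse_summary("A(27.05% 541) | C(23.4% 468) | T(28.35% 567) | G(21.2% 424)")
down = parse_summary("A(26.45% 529) | C(23.05% 461) | T(25.1% 502) | G(25.4% 508)")

for name, summary in (("up", up), ("down", down)):
    total = sum(count for _, count in summary.values())
    print(name, "total bases:", total, "GC %:", round(gc_percent(summary), 2))
```

From the reported counts, the upstream flank has (468 + 424) / 2000 = 44.6% GC and the downstream flank has (461 + 508) / 2000 = 48.45% GC.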

BLAT Search Results (up)

QUERY    SCORE  START   END  QSIZE  IDENTITY  CHROM  STRAND      START        END    SPAN
-----------------------------------------------------------------------------------------
YourSeq   2000      1  2000   2000    100.0%  chr11  +       101581578  101583577    2000
YourSeq    142   1338  1492   2000     96.2%  chr19  -        60882143   60882299     157
YourSeq    141   1344  1492   2000     97.4%  chr1   -        43931428   43931576     149
YourSeq    140   1341  1498   2000     95.0%  chr17  -        27673450   27673780     331
YourSeq    138   1335  1492   2000     96.1%  chr8   -        60646801   60646962     162
YourSeq    138   1342  1490   2000     96.7%  chr19  -        32173026   32173191     166
YourSeq    137   1340  1492   2000     95.5%  chr16  -        37531309   37707330  176022
YourSeq    137   1343  1489   2000     96.6%  chr16  +        33186521   33186667     147
YourSeq    136   1341  1492   2000     94.8%  chr12  -        37666774   37666925     152
YourSeq    136   1345  1492   2000     96.0%  chr17  +        36902606   36902753     148
YourSeq    135   1341  1492   2000     94.8%  chr8   +        25344407   25344559     153
YourSeq    135   1338  1490   2000     94.7%  chr2   +       153251307  153251473     167
YourSeq    134   1338  1490   2000     94.2%  chr11  +        32660869   32661022     154
YourSeq    133   1338  1492   2000     93.6%  chr13  -         3572344    3572505     162
YourSeq    133   1338  1492   2000     93.0%  chr4   +       130696187  130696341     155
YourSeq    133   1345  1492   2000     95.3%  chr11  +        87996467   87996619     153
YourSeq    132   1341  1489   2000     95.4%  chr4   -       135852550  135852917     368
YourSeq    132   1346  1492   2000     93.2%  chr12  -        78597779   78597923     145
YourSeq    132   1341  1490   2000     94.7%  chr17  +        23607715   23607982     268
YourSeq    131   1349  1492   2000     95.9%  chr2   -        32465770   32466396     627

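Note: The 2000 bp section upstream of start codon is BLAT searched against the genome. Apart from the query locus itself on chr11, only short partial matches are returned (scores of 142 or lower, all within query positions 1335-1498). No significant similarity is found.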
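One way to read such a BLAT listing is to compute how much of the 2000 bp query each hit covers. The sketch below parses a few rows copied from the upstream results; the `BlatHit` type and the `parse_hit` and `query_coverage` helpers are hypothetical names, and the field order is assumed to follow the header row (SCORE, query START/END, QSIZE, IDENTITY, CHROM, STRAND, target START/END, SPAN).

```python
from typing import NamedTuple


class BlatHit(NamedTuple):
    score: int
    q_start: int
    q_end: int
    q_size: int
    identity: float  # percent identity
    chrom: str
    strand: str
    t_start: int
    t_end: int
    span: int


def parse_hit(row, query="YourSeq"):
    """Parse one result row; any text before the query name is ignored."""
    fields = row.split()
    f = fields[fields.index(query) + 1:]
    return BlatHit(int(f[0]), int(f[1]), int(f[2]), int(f[3]),
                   float(f[4].rstrip("%")), f[5], f[6],
                   int(f[7]), int(f[8]), int(f[9]))


def query_coverage(hit):
    """Fraction of the query covered by the hit (coordinates inclusive)."""
    return (hit.q_end - hit.q_start + 1) / hit.q_size


rows = [
    "YourSeq 2000 1 2000 2000 100.0% chr11 + 101581578 101583577 2000",
    "YourSeq 142 1338 1492 2000 96.2% chr19 - 60882143 60882299 157",
    "YourSeq 141 1344 1492 2000 97.4% chr1 - 43931428 43931576 149",
]
hits = [parse_hit(r) for r in rows]
for h in hits:
    print(h.chrom, h.strand, h.score, f"{query_coverage(h):.2%}")
```

The first row is the query locus itself (full coverage); the chr19 hit spans query positions 1338-1492, that is 155 of 2000 bp.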

BLAT Search Results (down)

QUERY    SCORE  START   END  QSIZE  IDENTITY  CHROM  STRAND      START        END    SPAN
-----------------------------------------------------------------------------------------
YourSeq   2000      1  2000   2000    100.0%  chr11  +       101590452  101592451    2000
YourSeq    174   1754  1976   2000     92.3%  chr11  +       101341013  101341519     507
YourSeq    167   1744  1930   2000     92.7%  chr1   -        60387421   60387599     179
YourSeq    167   1729  1914   2000     96.7%  chr10  +        41408571   41408758     188
YourSeq    166   1726  1914   2000     96.2%  chr12  +        62883599   62883788     190
YourSeq    161   1766  1968   2000     92.1%  chr12  -        12887484   12888001     518
YourSeq    159   1649  1913   2000     94.4%  chr13  -        56552359   56552791     433
YourSeq    158   1649  1913   2000     93.9%  chr10  -        63464560   63465064     505
YourSeq    156   1745  1930   2000     90.7%  chr1   -       178831960  178832136     177
YourSeq    154   1740  1914   2000     94.3%  chr1   -         9522354    9522582     229
YourSeq    153   1744  1914   2000     95.3%  chr10  +        24726210   24726426     217
YourSeq    153   1744  1930   2000     88.8%  chr1   +        17390293   17390461     169
YourSeq    152   1745  1914   2000     92.1%  chr12  +        76184074   76184237     164
YourSeq    151   1745  1915   2000     93.5%  chr17  +        28610103   28610272     170
YourSeq    150   1726  1913   2000     88.4%  chr13  -        21269993   21270164     172
YourSeq    149   1745  1904   2000     97.5%  chr16  -        20492066   20492228     163
YourSeq    149   1750  1915   2000     95.8%  chr11  -       116037170  116037338     169
YourSeq    149   1744  1914   2000     92.3%  chr8   +       108604535  108604689     155
YourSeq    149   1744  1932   2000     88.6%  chr11  +       112388536  112388716     181
YourSeq    148   1744  1914   2000     92.7%  chr12  -       100685871  100686036     166

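Note: The 2000 bp section downstream of stop codon is BLAT searched against the genome. Apart from the query locus itself on chr11, only short partial matches are returned (scores of 174 or lower, all within query positions 1649-1976). No significant similarity is found.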
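The top BLAT hits for the two flanks also give a rough check on the region sizes quoted in this report. The arithmetic below is a sketch under an assumption the report does not state explicitly: that the upstream flank ends immediately before the ATG and the downstream flank starts immediately after the stop codon. The gene coordinates are those listed for GRCm38 in the gene information section; all variable names are illustrative.

```python
# Top BLAT hits for the 2000 bp flanks (chr11, forward strand), from the tables above.
UP_FLANK = (101581578, 101583577)
DOWN_FLANK = (101590452, 101592451)
# Tmem106a gene coordinates on chr11 (GRCm38), as listed in the gene information.
GENE = (101582242, 101591788)


def length(interval):
    """Length of a closed (start, end) interval in bp."""
    start, end = interval
    return end - start + 1


# Assumption: the flanks abut the start and stop codons directly.
start_to_stop = (UP_FLANK[1] + 1, DOWN_FLANK[0] - 1)

print("flank lengths:", length(UP_FLANK), length(DOWN_FLANK))
print("start codon to stop codon:", length(start_to_stop), "bp")
print("annotated gene:", length(GENE), "bp")
```

Under that assumption the start-to-stop span is 6874 bp and the annotated gene is 9547 bp. The stated effective KO region of ~8403 bp lies between the two, which would be consistent with gRNA sites placed outside the coding span; the exact gRNA positions are not given in this report.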

Gene information:

Tmem106a transmembrane protein 106A [Mus musculus]

Gene ID: 217203, updated on 24-Oct-2019

Gene summary

Official Symbol: Tmem106a (provided by MGI)
Official Full Name: transmembrane protein 106A (provided by MGI)
Primary source: MGI:MGI:1922056
See related: Ensembl:ENSMUSG00000034947
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: AI043106; BC022145; 0610008L10Rik
Expression: Broad expression in adult (RPKM 64.7), placenta adult (RPKM 25.6) and 15 other tissues

Genomic context

Location: 11; 11 D
Exon count: 9

Annotation release  Status             Assembly                      Chr  Location
108                 current            GRCm38.p6 (GCF_000001635.26)  11   NC_000077.6 (101582242..101591788)
Build 37.2          previous assembly  MGSCv37 (GCF_000001635.18)    11   NC_000077.5 (101443556..101453099)

Chromosome 11 - NC_000077.6

Transcript information: This gene has 6 transcripts

Gene: Tmem106a ENSMUSG00000034947

Description: transmembrane protein 106A [Source:MGI Symbol;Acc:MGI:1922056]
Gene Synonyms: 0610008L10Rik
Location: Chromosome 11: 101,582,242-101,591,788 forward strand. GRCm38:CM001004.2
About this gene: This gene has 6 transcripts (splice variants), 108 orthologues and is a member of 1 Ensembl protein family.

Transcripts

Name          Transcript ID          bp    Protein     Translation ID        Biotype         CCDS       UniProt  Flags
Tmem106a-202  ENSMUST00000100403.8   2428  261aa       ENSMUSP00000097971.2  Protein coding  CCDS25476  Q8VC04   TSL:5 GENCODE basic APPRIS P1
Tmem106a-201  ENSMUST00000039581.13  2304  261aa       ENSMUSP00000045832.7  Protein coding  CCDS25476  Q8VC04   TSL:1 GENCODE basic APPRIS P1
Tmem106a-204  ENSMUST00000128614.1   893   156aa       ENSMUSP00000122218.1  Protein coding  -          A2A4M9   CDS 3' incomplete TSL:3
Tmem106a-203  ENSMUST00000107194.7   664   171aa       ENSMUSP00000102812.1  Protein coding  -          A2A4N0   CDS 3' incomplete TSL:2
Tmem106a-205  ENSMUST00000128659.1   1069  No protein  -                     lncRNA          -          -        TSL:1
Tmem106a-206  ENSMUST00000143045.7   1021  No protein  -                     lncRNA          -          -        TSL:3

Figure: Ensembl genomic view of a 29.55 kb region of chromosome 11 (101.58-101.60 Mb). Forward strand: Tmem106a transcripts (Tmem106a-201 to Tmem106a-206) and transcripts of the neighboring gene Nbr1. Reverse strand: Gm11634-201 and Gm11634-202 (protein coding). Contig: AL590994.13. The Regulatory Build track marks CTCF, open chromatin, promoter and promoter flank features.

Transcript: ENSMUST00000039581

Figure: Ensembl protein view of Tmem106a-201 (protein coding; 9.55 kb, forward strand), scale 0-261 aa. Tracks: transmembrane helices; MobiDB lite; Pfam (Protein of unknown function DUF1356, TMEM106); PANTHER (PTHR28556:SF3); sequence variants (dbSNP and all other sources), with missense and synonymous variants marked.

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
