https://www.alphaknockout.com

Mouse Tom1l2 Knockout Project (CRISPR/Cas9)

Objective: To create a Tom1l2 knockout mouse model (C57BL/6J) by CRISPR/Cas-mediated genome engineering.

Strategy summary: The Tom1l2 gene (NCBI Reference Sequence: NM_153080; Ensembl: ENSMUSG00000000538) is located on mouse chromosome 11. Fifteen exons are identified, with the ATG start codon in exon 1 and the TGA stop codon in exon 15 (Transcript: ENSMUST00000102683). Exons 2 to 4 will be selected as the target site. Cas9 and gRNA will be co-injected into fertilized eggs for KO mouse production. The pups will be genotyped by PCR followed by sequencing analysis.

Note: Mice homozygous for a hypomorphic gene trap allele show malocclusion, kyphosis, hydrocephaly, patchy hair, splenomegaly, high B- and T-cell counts, thrombopenia, impaired humoral responses, a high frequency of infections and tumors, renal cysts, skin lesions, freezing behavior and sporadic bleeding.

Exon 2 starts at about 3.48% of the coding region. Exons 2 to 4 cover 20.64% of the coding region. The size of the effective KO region is ~9966 bp. The KO region does not contain any other known gene.
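As a rough consistency check on these percentages, the sketch below converts them into approximate base counts. It assumes the coding region is 507 codons long (the protein length listed for ENSMUST00000102683) with the stop codon excluded, i.e. 1521 bp. That convention and the helper name are assumptions made for illustration and are not stated in this report. Because the percentages are rounded, the results are estimates.

```python
# Hypothetical helper: convert "percent of coding region" into base pairs.
CODING_BP = 507 * 3  # assumption: 507 codons, stop codon excluded


def percent_to_bp(percent, coding_bp=CODING_BP):
    """Return the nearest whole number of coding bases for a percentage."""
    return round(percent / 100 * coding_bp)


bp_before_exon2 = percent_to_bp(3.48)   # coding bases upstream of exon 2
bp_exons_2_to_4 = percent_to_bp(20.64)  # coding bases removed by the knockout

# A deleted coding length that is not a multiple of 3 shifts the reading frame.
causes_frameshift = bp_exons_2_to_4 % 3 != 0

print(bp_before_exon2, bp_exons_2_to_4, causes_frameshift)
```

Under these assumptions the deletion removes roughly 314 coding bases, which is not a multiple of three, so sequence downstream of the deletion would be read out of frame.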


Overview of the Targeting Strategy

[Figure: wildtype allele drawn 5' to 3', showing exons 1, 2, 3, 4 and 15 of mouse Tom1l2, the two gRNA regions, and the knockout region. Legend: exon of mouse Tom1l2; knockout region.]


Overview of the Dot Plot (up) Window size: 15 bp

[Figure: dot plot of the sequence against itself. Legend: Forward; Reverse Complement.]

Note: The 2000 bp section upstream of Exon 2 is aligned with itself to determine whether there are tandem repeats. No significant tandem repeat is found in the dot plot matrix, so this region is suitable for PCR screening or sequencing analysis.
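The self-alignment behind a dot plot can be sketched as an exact word match: every position pair where two windows of the sequence are identical becomes a dot. The code below is a minimal illustration using exact 15 bp matches. The tool and scoring scheme actually used for this report are not stated, so the function names and the exact-match rule are assumptions.

```python
from collections import defaultdict


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def kmer_matches(a, b, window=15):
    """Return sorted (i, j) pairs where a[i:i+window] equals b[j:j+window]."""
    index = defaultdict(list)
    for j in range(len(b) - window + 1):
        index[b[j:j + window]].append(j)
    return sorted(
        (i, j)
        for i in range(len(a) - window + 1)
        for j in index.get(a[i:i + window], [])
    )


def self_repeats(seq, window=15):
    """Return (forward, reverse_complement) dots for seq compared with itself."""
    seq = seq.upper()
    # Keep i < j only: this drops the main diagonal and mirrored duplicates.
    forward = [(i, j) for i, j in kmer_matches(seq, seq, window) if i < j]
    reverse = kmer_matches(seq, reverse_complement(seq), window)
    return forward, reverse


# Toy example with a short window: a 7 bp unit repeated in tandem.
forward, reverse = self_repeats("GATTACAGATTACA", window=7)
print(forward, reverse)
```

An empty forward list for a real 2000 bp flank would correspond to a dot plot with no off-diagonal lines, which is the situation the note describes.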

Overview of the Dot Plot (down) Window size: 15 bp

[Figure: dot plot of the sequence against itself. Legend: Forward; Reverse Complement.]

Note: The 2000 bp section downstream of Exon 4 is aligned with itself to determine if there are tandem repeats. Tandem repeats are found in the dot plot matrix. The gRNA site is selected outside of these tandem repeats.


Overview of the GC Content Distribution (up) Window size: 300 bp

[Figure: GC content distribution plot.]

Summary: Full Length(2000bp) | A(24.2% 484) | C(20.5% 410) | T(32.45% 649) | G(22.85% 457)

Note: The 2000 bp section upstream of Exon 2 is analyzed to determine the GC content. No significant high GC-content region is found, so this region is suitable for PCR screening or sequencing analysis.

Overview of the GC Content Distribution (down) Window size: 300 bp

[Figure: GC content distribution plot.]

Summary: Full Length(2000bp) | A(26.05% 521) | C(21.45% 429) | T(26.35% 527) | G(26.15% 523)

Note: The 2000 bp section downstream of Exon 4 is analyzed to determine the GC content. No significant high GC-content region is found, so this region is suitable for PCR screening or sequencing analysis.
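The overall GC content of each flank follows directly from the base counts in the two Summary lines, and the plotted distribution can be approximated with a sliding window. The sketch below shows both steps. The function names and the one-base step size are assumptions, since the report gives only the 300 bp window size.

```python
def gc_percent_from_counts(a, c, t, g):
    """Return GC content (percent) from per-base counts."""
    total = a + c + t + g
    return 100 * (g + c) / total


def sliding_gc(seq, window=300, step=1):
    """Yield (start, GC percent) for each full window along the sequence."""
    seq = seq.upper()
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        yield start, 100 * (chunk.count("G") + chunk.count("C")) / window


# Base counts copied from the two Summary lines above.
upstream_gc = gc_percent_from_counts(a=484, c=410, t=649, g=457)
downstream_gc = gc_percent_from_counts(a=521, c=429, t=527, g=523)

print(round(upstream_gc, 2), round(downstream_gc, 2))
```

With the reported counts this gives about 43.35% GC upstream of Exon 2 and about 47.6% GC downstream of Exon 4.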


BLAT Search Results (up)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
---------------------------------------------------------------------------------------
YourSeq  2000   1      2000  2000   100.0%    chr11  -       60280284   60282283   2000
YourSeq  156    295    494   2000   91.1%     chr15  +       76021600   76021799   200
YourSeq  155    294    477   2000   95.4%     chrX   +       134646665  134646857  193
YourSeq  147    323    494   2000   95.1%     chrX   -       60266922   60267472   551
YourSeq  147    352    1532  2000   92.0%     chr9   +       80455187   80567612   112426
YourSeq  147    300    475   2000   93.6%     chr16  +       17982702   17982890   189
YourSeq  145    315    475   2000   95.1%     chr6   -       30583183   30583343   161
YourSeq  145    297    475   2000   91.8%     chr15  -       93515967   93516144   178
YourSeq  145    295    476   2000   91.5%     chr11  -       101252059  101252244  186
YourSeq  145    295    467   2000   92.0%     chr11  -       70722487   70722659   173
YourSeq  145    319    503   2000   91.5%     chr3   +       113768188  113768375  188
YourSeq  145    318    490   2000   94.0%     chr11  +       74997454   74997768   315
YourSeq  144    323    494   2000   94.5%     chr11  -       68948642   68948814   173
YourSeq  144    296    476   2000   90.0%     chrX   +       13144799   13144980   182
YourSeq  144    319    505   2000   89.2%     chr5   +       36901399   36901595   197
YourSeq  143    295    482   2000   90.9%     chr1   +       43676601   43676792   192
YourSeq  142    296    474   2000   92.3%     chr19  +       43097316   43097528   213
YourSeq  142    306    495   2000   86.5%     chr14  +       108959070  108959246  177
YourSeq  141    306    480   2000   90.9%     chr11  -       30473194   30473911   718
YourSeq  140    297    467   2000   91.3%     chr5   +       65463937   65464266   330

Note: The 2000 bp section upstream of Exon 2 is BLAT searched against the genome. No significant similarity is found.

BLAT Search Results (down)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
---------------------------------------------------------------------------------------
YourSeq  2000   1      2000  2000   100.0%    chr11  -       60268318   60270317   2000
YourSeq  134    864    1012  2000   95.3%     chr1   +       72677026   72677188   163
YourSeq  131    865    1013  2000   95.2%     chr2   +       29588817   29589149   333
YourSeq  128    857    1012  2000   95.2%     chr2   -       158429099  158429273  175
YourSeq  121    858    1341  2000   83.5%     chr16  +       24035194   24035572   379
YourSeq  117    856    984   2000   95.4%     chr18  +       78556728   78556856   129
YourSeq  117    857    984   2000   96.1%     chr17  +       8328408    8328536    129
YourSeq  116    856    984   2000   95.4%     chr8   +       115616331  115616460  130
YourSeq  114    857    984   2000   94.6%     chr17  -       47170769   47170896   128
YourSeq  114    833    984   2000   92.5%     chr14  -       19724410   19724561   152
YourSeq  114    857    984   2000   94.6%     chr7   +       125293112  125293239  128
YourSeq  113    856    984   2000   93.8%     chr19  -       41312605   41312733   129
YourSeq  113    866    984   2000   97.5%     chr2   +       154637670  154637788  119
YourSeq  113    857    984   2000   92.1%     chr12  +       72831122   72831247   126
YourSeq  113    857    985   2000   93.8%     chr1   +       156504308  156504436  129
YourSeq  112    857    984   2000   93.8%     chr7   -       3182016    3182143    128
YourSeq  112    864    984   2000   94.2%     chr2   -       48837258   48837376   119
YourSeq  112    855    984   2000   93.1%     chr18  -       8208227    8208356    130
YourSeq  112    865    1012  2000   93.1%     chr10  -       116708515  116708663  149
YourSeq  112    843    984   2000   88.5%     chr19  +       56931656   56931791   136

Note: The 2000 bp section downstream of Exon 4 is BLAT searched against the genome. No significant similarity is found.
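The top hit in each BLAT table (score 2000, 100.0% identity) places the corresponding 2000 bp flank on chr11, so the distance between the two flanks can be compared with the reported size of the effective KO region. The sketch below does that arithmetic. It assumes the coordinates are 1-based and inclusive, and the function name is hypothetical. Because Tom1l2 lies on the reverse strand, the upstream flank has the higher coordinates.

```python
# Top BLAT hits (100.0% identity) for the two 2000 bp flanks, from the tables above.
UPSTREAM_FLANK = (60280284, 60282283)    # chr11, minus strand, upstream of Exon 2
DOWNSTREAM_FLANK = (60268318, 60270317)  # chr11, minus strand, downstream of Exon 4


def gap_between(flank_a, flank_b):
    """Return the number of bases strictly between two non-overlapping intervals.

    Assumes 1-based, inclusive (start, end) coordinates on the same chromosome.
    """
    (_, first_end), (second_start, _) = sorted([flank_a, flank_b])
    return second_start - first_end - 1


gap_bp = gap_between(UPSTREAM_FLANK, DOWNSTREAM_FLANK)
print(gap_bp)
```

Under these assumptions the gap is 9966 bp, which agrees with the ~9966 bp effective KO region given in the strategy summary.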


Gene information: Tom1l2 target of myb1-like 2 (chicken) [Mus musculus (house mouse)]

Gene ID: 216810, updated on 24-Oct-2019

Gene summary

Official Symbol: Tom1l2 (provided by MGI)
Official Full Name: target of myb1-like 2 (chicken) (provided by MGI)
Primary source: MGI:MGI:2443306
See related: Ensembl:ENSMUSG00000000538
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: Srebf1; AU042072; 2900016I08Rik; A730055F12Rik
Expression: Ubiquitous expression in adrenal adult (RPKM 35.6), cortex adult (RPKM 28.0) and 26 other tissues
Orthologs: human

Genomic context

Location: 11; 11 B2
Exon count: 21

Annotation release  Status             Assembly                       Chr  Location
-----------------------------------------------------------------------------------------------------------------
108                 current            GRCm38.p6 (GCF_000001635.26)   11   NC_000077.6 (60226714..60352932, complement)
Build 37.2          previous assembly  MGSCv37 (GCF_000001635.18)     11   NC_000077.5 (60040216..60166407, complement)

Chromosome 11 - NC_000077.6


Transcript information: This gene has 11 transcripts

Gene: Tom1l2 ENSMUSG00000000538

Description: target of myb1-like 2 (chicken) [Source:MGI Symbol;Acc:MGI:2443306]
Gene Synonyms: 2900016I08Rik, A730055F12Rik, myb1-like protein 2
Location: Chromosome 11: 60,226,714-60,352,905 reverse strand. GRCm38:CM001004.2
About this gene: This gene has 11 transcripts (splice variants), 274 orthologues, 10 paralogues, is a member of 1 Ensembl protein family and is associated with 40 phenotypes.

Transcripts

Name        Transcript ID          bp    Protein     Translation ID        Biotype         CCDS       UniProt  Flags
--------------------------------------------------------------------------------------------------------------------------------------------
Tom1l2-206  ENSMUST00000102683.10  4999  507aa       ENSMUSP00000099744.4  Protein coding  CCDS24786  Q5SRX1   TSL:1 GENCODE basic APPRIS P3
Tom1l2-204  ENSMUST00000095254.11  4940  487aa       ENSMUSP00000092884.5  Protein coding  CCDS36171  Q5SRX1   TSL:1 GENCODE basic APPRIS ALT1
Tom1l2-205  ENSMUST00000102682.4   2256  440aa       ENSMUSP00000099743.4  Protein coding  CCDS24787  Q5SRX1   TSL:1 GENCODE basic
Tom1l2-203  ENSMUST00000093048.12  4864  462aa       ENSMUSP00000090736.6  Protein coding  -          Q5SXA4   TSL:5 GENCODE basic
Tom1l2-202  ENSMUST00000093046.12  4849  457aa       ENSMUSP00000090734.6  Protein coding  -          Q5SXA5   TSL:5 GENCODE basic
Tom1l2-201  ENSMUST00000064019.14  2151  450aa       ENSMUSP00000063414.8  Protein coding  -          Q5SRX1   TSL:1 GENCODE basic
Tom1l2-207  ENSMUST00000133420.7   790   199aa       ENSMUSP00000117623.1  Protein coding  -          F6ZDJ1   CDS 5' incomplete TSL:3
Tom1l2-209  ENSMUST00000143124.7   729   179aa       ENSMUSP00000121936.1  Protein coding  -          F6RBX1   CDS 5' incomplete TSL:3
Tom1l2-211  ENSMUST00000153920.1   3791  No protein  -                     lncRNA          -          -        TSL:1
Tom1l2-208  ENSMUST00000142225.1   3013  No protein  -                     lncRNA          -          -        TSL:1
Tom1l2-210  ENSMUST00000151284.1   886   No protein  -                     lncRNA          -          -        TSL:2


[Figure: Ensembl genome browser view of a 146.19 kb region of chromosome 11 (60.22Mb to 60.36Mb).
Forward strand: Gm12265-201 (lncRNA); Gm12266-201 (processed pseudogene); Drc3-201, Drc3-202, Drc3-203, Drc3-204 (protein coding); Drc3-205 (lncRNA).
Contigs: AL669954.6, AL596090.11.
Reverse strand: Srebf1-201 (protein coding); Srebf1-204 (lncRNA); Tom1l2-201, Tom1l2-202, Tom1l2-203, Tom1l2-204, Tom1l2-205, Tom1l2-206, Tom1l2-207, Tom1l2-209 (protein coding); Tom1l2-208, Tom1l2-210, Tom1l2-211 (lncRNA); Gm27711-201 (miRNA).
Regulatory Build legend: CTCF; Enhancer; Open Chromatin; Promoter; Promoter Flank; Transcription Factor Binding Site.
Gene legend: Protein Coding (Ensembl protein coding; merged Ensembl/Havana); Non-Protein Coding (pseudogene; RNA gene).]


Transcript: ENSMUST00000102683

[Figure: Ensembl protein summary for Tom1l2-206 (protein coding; reverse strand, 126.19 kb), with a scale bar from 0 to 507.
Tracks: MobiDB lite; Low complexity (Seg); Coiled-coils (Ncoils).
Superfamily: SSF89009, ENTH/VHS.
SMART: VHS domain.
Pfam: VHS domain; GAT domain.
PROSITE profiles: VHS domain; GAT domain.
PIRSF: Target of Myb protein 1.
PANTHER: PTHR13856, Target of Myb1-like 2.
Gene3D: ENTH/VHS; GAT domain superfamily.
CDD: cd16996; cd14238.
Sequence variants (dbSNP and all other sources). Variant legend: missense variant; synonymous variant.]

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
