https://www.alphaknockout.com

Mouse Tonsl Knockout Project (CRISPR/Cas9)

Objective: To create a Tonsl knockout mouse model (C57BL/6J) by CRISPR/Cas-mediated genome engineering.

Strategy summary: The Tonsl gene (NCBI Reference Sequence: NM_183091; Ensembl: ENSMUSG00000059323) is located on mouse chromosome 15. 26 exons are identified, with the ATG start codon in exon 1 and the TGA stop codon in exon 26 (Transcript: ENSMUST00000168185). Exons 4~24 will be selected as the target site. Cas9 and gRNA will be co-injected into fertilized eggs for KO mouse production. The pups will be genotyped by PCR followed by sequencing analysis.

Note: Exon 4 starts from about 6.48% of the coding region. Exons 4~24 cover 85.89% of the coding region. The size of the effective KO region is ~9767 bp. The KO region does not contain any other known gene.
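The percentages in the note can be turned into approximate nucleotide counts. The sketch below assumes the coding length is three nucleotides per residue of the 1363 aa protein listed for ENSMUST00000168185, with the stop codon excluded. The report does not say which length it used, so the results are estimates only, and the function name is illustrative.

```python
# Assumption: coding length = 3 nt per amino acid of the 1363 aa protein,
# stop codon excluded. The report does not state the exact denominator.
PROTEIN_LENGTH_AA = 1363
CDS_LENGTH_NT = 3 * PROTEIN_LENGTH_AA


def percent_to_nt(percent, cds_length=CDS_LENGTH_NT):
    """Convert a percentage of the coding region to a rounded nucleotide count."""
    return round(percent / 100 * cds_length)


# Approximate coding position at which exon 4 begins.
exon4_offset_nt = percent_to_nt(6.48)
# Approximate amount of coding sequence removed with exons 4~24.
deleted_coding_nt = percent_to_nt(85.89)

print(CDS_LENGTH_NT, exon4_offset_nt, deleted_coding_nt)
```

Under this assumption, roughly the first 265 coding nucleotides lie upstream of the deletion.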


Overview of the Targeting Strategy

Figure: Wildtype allele (5' to 3') showing the exons of mouse Tonsl, with gRNA regions flanking the knockout region (exons 4~24).

Legends: Exon of mouse Tonsl; Knockout region


Overview of the Dot Plot (up)

Window size: 15 bp. Legend: Forward; Reverse Complement.

Note: The 424 bp section upstream of Exon 4 is aligned with itself to check for tandem repeats. No significant tandem repeat is found in the dot plot matrix, so this region is suitable for PCR screening or sequencing analysis.

Overview of the Dot Plot (down)

Window size: 15 bp. Legend: Forward; Reverse Complement.

Note: The 374 bp section downstream of Exon 24 is aligned with itself to check for tandem repeats. No significant tandem repeat is found in the dot plot matrix, so this region is suitable for PCR screening or sequencing analysis.
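A self dot plot with a 15 bp window can be sketched as follows. This is a minimal illustration that reports exact window matches only; the tool used for the report is not named and may tolerate mismatches. The function names and the example sequence are hypothetical.

```python
from collections import defaultdict

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def self_dot_plot(seq, window=15):
    """List (i, j, strand) for window starts i < j (0-based) where the two
    windows are identical ('+') or window j is the reverse complement of
    window i ('-'). Exact matches only."""
    seq = seq.upper()
    n_windows = len(seq) - window + 1
    # Index every window by its sequence so matches are found by lookup.
    positions = defaultdict(list)
    for i in range(n_windows):
        positions[seq[i:i + window]].append(i)
    dots = []
    for j in range(n_windows):
        kmer = seq[j:j + window]
        for i in positions.get(kmer, []):
            if i < j:
                dots.append((i, j, "+"))
        for i in positions.get(reverse_complement(kmer), []):
            if i < j:
                dots.append((i, j, "-"))
    return dots


# Hypothetical 20 nt unit repeated twice: a tandem repeat shows up as an
# off-diagonal run of '+' dots at a distance of one repeat unit.
unit = "ACGTTGCAAGCTTAGGCTCA"
for dot in self_dot_plot(unit * 2):
    print(dot)
```

An empty result for a flanking sequence corresponds to the "no significant tandem repeat" conclusion in the notes, within the limits of exact matching.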


Overview of the GC Content Distribution (up)

Window size: 300 bp

Summary: Full Length(424bp) | A(24.06% 102) | C(21.46% 91) | T(26.89% 114) | G(27.59% 117)

Note: The 424 bp section upstream of Exon 4 is analyzed to determine the GC content. No region of significantly high GC content is found, so this region is suitable for PCR screening or sequencing analysis.

Overview of the GC Content Distribution (down)

Window size: 300 bp

Summary: Full Length(374bp) | A(18.18% 68) | C(30.21% 113) | T(27.81% 104) | G(23.8% 89)

Note: The 374 bp section downstream of Exon 24 is analyzed to determine the GC content. No region of significantly high GC content is found, so this region is suitable for PCR screening or sequencing analysis.
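The base counts in the two summaries are enough to recompute the per-base percentages and the overall GC content of each flank. The sketch below also includes a sliding-window GC function matching the 300 bp window size named above. The flanking sequences themselves are not included in the report, so that function is shown only on a short made-up string, and all function names are illustrative.

```python
# Base counts copied from the two "Summary" lines of the report.
UPSTREAM_COUNTS = {"A": 102, "C": 91, "T": 114, "G": 117}    # 424 bp
DOWNSTREAM_COUNTS = {"A": 68, "C": 113, "T": 104, "G": 89}   # 374 bp


def base_percentages(counts):
    """Percentage of each base, rounded to two decimals."""
    total = sum(counts.values())
    return {base: round(100 * n / total, 2) for base, n in counts.items()}


def gc_percent(counts):
    """Overall GC content in percent, rounded to two decimals."""
    total = sum(counts.values())
    return round(100 * (counts["G"] + counts["C"]) / total, 2)


def sliding_gc(seq, window=300):
    """GC percentage of every full window along seq, using a running count."""
    seq = seq.upper()
    if len(seq) < window:
        return []
    is_gc = [1 if base in "GC" else 0 for base in seq]
    current = sum(is_gc[:window])
    result = [100 * current / window]
    for start in range(1, len(seq) - window + 1):
        # Slide by one base: add the entering base, drop the leaving one.
        current += is_gc[start + window - 1] - is_gc[start - 1]
        result.append(100 * current / window)
    return result


print(gc_percent(UPSTREAM_COUNTS), gc_percent(DOWNSTREAM_COUNTS))
print(sliding_gc("GGCCAATT", window=4))  # made-up sequence for illustration
```

From the reported counts, the upstream flank is about 49% GC and the downstream flank about 54% GC.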


BLAT Search Results (up)

QUERY    SCORE  START  END  QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
YourSeq  424    1      424  424    100.0%    chr15  -       76639024   76639447   424
YourSeq  28     341    369  424    100.0%    chr7   -       128276504  128276781  278
YourSeq  25     7      38   424    73.1%     chr12  +       53621888   53621913   26
YourSeq  23     391    413  424    100.0%    chr8   -       60075800   60075822   23
YourSeq  23     391    413  424    100.0%    chr4   -       13013810   13013832   23
YourSeq  23     391    413  424    100.0%    chr14  -       83731465   83731487   23
YourSeq  23     391    413  424    100.0%    chrX   +       29663111   29663133   23
YourSeq  23     391    413  424    100.0%    chr3   +       159295345  159295367  23
YourSeq  23     391    413  424    100.0%    chr15  +       70656322   70656344   23
YourSeq  23     391    413  424    100.0%    chr14  +       91803222   91803244   23
YourSeq  23     391    413  424    100.0%    chr1   +       112581485  112581507  23
YourSeq  22     391    412  424    100.0%    chr10  -       16482736   16482757   22
YourSeq  21     238    268  424    83.9%     chr7   -       131946146  131946176  31
YourSeq  21     134    154  424    100.0%    chr4   -       149609587  149609607  21
YourSeq  21     393    413  424    100.0%    chr13  +       87104611   87104631   21
YourSeq  20     318    337  424    100.0%    chr3   -       68740532   68740551   20
YourSeq  20     246    265  424    100.0%    chr14  -       104847975  104847994  20
YourSeq  20     280    299  424    100.0%    chr17  +       86950966   86950985   20

Note: The 424 bp section upstream of Exon 4 is BLAT searched against the genome. Apart from the expected full-length match on chr15, no significant similarity is found.

BLAT Search Results (down)

QUERY    SCORE  START  END  QSIZE  IDENTITY  CHROM  STRAND  START     END       SPAN
YourSeq  374    1      374  374    100.0%    chr15  -       76628883  76629256  374
YourSeq  27     254    286  374    75.9%     chr13  +       47164273  47164301  29
YourSeq  25     128    153  374    100.0%    chr10  +       14381482  14381686  205
YourSeq  21     341    363  374    95.7%     chr17  +       91811150  91811172  23
YourSeq  20     202    221  374    100.0%    chr18  +       71088795  71088814  20
YourSeq  20     191    210  374    100.0%    chr14  +       16574812  16574831  20

Note: The 374 bp section downstream of Exon 24 is BLAT searched against the genome. Apart from the expected full-length match on chr15, no significant similarity is found.
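The two full-length chr15 hits give the genomic positions of the flanking sections, so the number of bases lying between them can be computed directly. The sketch below assumes the BLAT coordinates are 1-based and inclusive, which agrees with the SPAN column (end - start + 1). Reading the result as the "effective KO region" size is an inference from the numbers, not a statement made in the report, and the type and function names are illustrative.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    chrom: str
    strand: str
    start: int  # assumed 1-based, inclusive
    end: int    # assumed 1-based, inclusive


# Full-length hits taken from the two BLAT tables.
UPSTREAM_FLANK = Hit("chr15", "-", 76639024, 76639447)    # 424 bp, upstream of exon 4
DOWNSTREAM_FLANK = Hit("chr15", "-", 76628883, 76629256)  # 374 bp, downstream of exon 24


def span(hit):
    """Length of a hit under 1-based inclusive coordinates."""
    return hit.end - hit.start + 1


def gap_between(a, b):
    """Number of bases strictly between two non-overlapping hits."""
    if a.chrom != b.chrom:
        raise ValueError("hits are on different chromosomes")
    left, right = sorted((a, b), key=lambda h: h.start)
    if right.start <= left.end:
        raise ValueError("hits overlap")
    return right.start - left.end - 1


print(span(UPSTREAM_FLANK), span(DOWNSTREAM_FLANK))
print(gap_between(UPSTREAM_FLANK, DOWNSTREAM_FLANK))
```

Because Tonsl lies on the reverse strand, the upstream flank has the higher genomic coordinates. The gap of 9767 bases matches the ~9767 bp effective KO region quoted in the strategy summary.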


Gene and information:

Tonsl tonsoku-like, DNA repair protein [Mus musculus (house mouse)]
Gene ID: 72749, updated on 1-Oct-2019

Gene summary

Official Symbol: Tonsl (provided by MGI)
Official Full Name: tonsoku-like, DNA repair protein (provided by MGI)
Primary source: MGI:MGI:1919999
See related: Ensembl:ENSMUSG00000059323
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: Nfkbil2; 2810439M11Rik
Expression: Ubiquitous expression in duodenum adult (RPKM 12.6), large intestine adult (RPKM 12.1) and 26 other tissues

Genomic context

Location: 15; 15 D3
Exon count: 27

Annotation release  Status             Assembly                      Chr  Location
108                 current            GRCm38.p6 (GCF_000001635.26)  15   NC_000081.6 (76626237..76639929, complement)
Build 37.2          previous assembly  MGSCv37 (GCF_000001635.18)    15   NC_000081.5 (76456667..76470359, complement)

Chromosome 15 - NC_000081.6


Transcript information: This gene has 8 transcripts

Gene: Tonsl ENSMUSG00000059323

Description: tonsoku-like, DNA repair protein [Source:MGI Symbol;Acc:MGI:1919999]
Gene Synonyms: 2810439M11Rik, Nfkbil2
Location: Chromosome 15: 76,626,002-76,639,958 reverse strand. GRCm38:CM001008.2
About this gene: This gene has 8 transcripts (splice variants), 174 orthologues, 4 paralogues, is a member of 1 Ensembl protein family and is associated with 6 phenotypes.

Transcripts

Name       Transcript ID         bp    Protein     Translation ID        Biotype                  CCDS       UniProt  Flags
Tonsl-206  ENSMUST00000168185.7  4235  1363aa      ENSMUSP00000129597.1  Protein coding           CCDS27580  G3UW83   TSL:1, GENCODE basic, APPRIS P1
Tonsl-203  ENSMUST00000165163.7  743   86aa        ENSMUSP00000131229.1  Protein coding           -          F7A0T4   CDS 5' incomplete, TSL:3
Tonsl-205  ENSMUST00000166974.1  471   64aa        ENSMUSP00000126362.1  Protein coding           -          E9PZL7   CDS 3' incomplete, TSL:2
Tonsl-204  ENSMUST00000165190.1  2932  417aa       ENSMUSP00000131368.1  Nonsense mediated decay  -          E9Q2M2   TSL:1
Tonsl-202  ENSMUST00000163990.1  502   No protein  -                     Retained intron          -          -        TSL:2
Tonsl-201  ENSMUST00000163161.1  491   No protein  -                     Retained intron          -          -        TSL:3
Tonsl-207  ENSMUST00000168432.1  420   No protein  -                     Retained intron          -          -        TSL:3
Tonsl-208  ENSMUST00000171478.1  713   No protein  -                     lncRNA                   -          -        TSL:3


Figure: Ensembl region view of chromosome 15, 76.62Mb to 76.64Mb (33.96 kb), contigs AC157566.6 and AC156550.5. Reverse-strand transcripts shown:

Slc39a4: 201 (protein coding), 204 (protein coding), 203 (lncRNA)
Vps28: 205, 203, 204 (retained intron); 201, 202 (protein coding)
Tonsl: 208 (lncRNA); 201, 207, 202 (retained intron); 204 (nonsense mediated decay); 205, 203, 206 (protein coding)
Cyhr1: 202, 203, 205 (protein coding); 207 (retained intron); 206 (lncRNA)

Regulation Legend: CTCF; Open Chromatin; Promoter; Promoter Flank
Gene Legend: Protein Coding (merged Ensembl/Havana; Ensembl protein coding); Non-Protein Coding (processed transcript; RNA gene)


Transcript: ENSMUST00000168185

Figure: Protein domain view of Tonsl-206 (protein coding; reverse strand, 13.72 kb), with a scale bar from 0 to 1363 aa.

Feature tracks: MobiDB lite; Low complexity (Seg); Coiled-coils (Ncoils); Superfamily; SMART; Prints; Pfam; PROSITE profiles; PANTHER; Gene3D.
Annotated domains: Ankyrin repeat and Ankyrin repeat-containing domain; Tetratricopeptide repeat and Tetratricopeptide-like helical domain; Leucine-rich repeat and Leucine-rich repeat domain superfamily.
Identifiers shown: SSF52047, SM00368, PTHR46358.
Sequence variants (dbSNP and all other sources). Variant Legend: missense variant; splice region variant; synonymous variant.

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
