https://www.alphaknockout.com

Mouse Themis Knockout Project (CRISPR/Cas9)

Objective: To create a Themis knockout Mouse model (C57BL/6J) by CRISPR/Cas-mediated genome engineering.

Strategy summary: The Themis gene (NCBI Reference Sequence: NM_178666; Ensembl: ENSMUSG00000049109) is located on mouse chromosome 10. Six exons are identified, with the ATG start codon in exon 1 and the TGA stop codon in exon 5 (Transcript: ENSMUST00000056097). Exon 4 will be selected as the target site. Cas9 and gRNA will be co-injected into fertilized eggs for KO mouse production. The pups will be genotyped by PCR followed by sequencing analysis. Note: Homozygous null mice have defects in T cell positive selection that lead to very few alpha-beta T cells being found in the periphery.

Exon 4 starts at about 37.37% of the coding region and covers 55.14% of it. The size of the effective KO region is ~1052 bp. The KO region does not contain any other known gene.
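The percentages above can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only. It assumes that the coding region is counted without the stop codon (636 aa x 3 bp, using the protein length listed for ENSMUST00000056097 in this report) and that the ~1052 bp KO region corresponds to coding sequence of exon 4. The function name `ko_summary` is hypothetical.

```python
# Figures taken from this report; their interpretation here is an assumption.
PROTEIN_LENGTH_AA = 636         # protein length listed for ENSMUST00000056097
KO_REGION_BP = 1052             # stated size of the effective KO region
EXON4_START_FRACTION = 0.3737   # exon 4 starts at about 37.37% of the coding region

# Assumption: the coding region excludes the stop codon.
CDS_BP = PROTEIN_LENGTH_AA * 3


def ko_summary(cds_bp, ko_bp, start_fraction):
    """Summarize how a deleted coding segment relates to the full coding region."""
    return {
        # Share of the coding region removed, in percent.
        "covered_percent": round(100 * ko_bp / cds_bp, 2),
        # Approximate number of coding bases upstream of the deleted segment.
        "upstream_bp": round(start_fraction * cds_bp),
        # A deletion whose length is not a multiple of 3 shifts the reading frame.
        "frameshift": ko_bp % 3 != 0,
    }


summary = ko_summary(CDS_BP, KO_REGION_BP, EXON4_START_FRACTION)
print(summary)
```

Under these assumptions the computed coverage agrees with the 55.14% quoted above, and the deleted length is not a multiple of three, so any transcript that skips the deleted segment would be read out of frame.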


Overview of the Targeting Strategy

[Figure: wildtype allele drawn 5' to 3', with exons 1, 4 and 6 labeled and two gRNA regions marked. Legend: exon of mouse Themis; knockout region.]


Overview of the Dot Plot (up)

Window size: 15 bp

[Figure: dot plot of the sequence against itself, showing forward and reverse-complement matches.]

Note: The 2000 bp section upstream of Exon 4 is aligned with itself to determine if there are tandem repeats. Tandem repeats are found in the dot plot matrix. The gRNA site is selected outside of these tandem repeats.
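A self-comparison of this kind can be sketched as follows. This is a minimal illustration, not the tool used to produce the plot: it reports only exact forward matches of a fixed-length window (reverse-complement matches are not covered), and the function names `self_dot_plot` and `repeat_offsets` are hypothetical. The demo uses a toy sequence rather than the Themis flanking sequence.

```python
from collections import defaultdict


def self_dot_plot(seq, window=15):
    """Return sorted (i, j) pairs with i < j whose window-length words are identical."""
    positions = defaultdict(list)
    for i in range(len(seq) - window + 1):
        positions[seq[i:i + window]].append(i)
    dots = []
    for starts in positions.values():
        # Every pair of occurrences of the same word is one off-diagonal dot.
        for a in range(len(starts)):
            for b in range(a + 1, len(starts)):
                dots.append((starts[a], starts[b]))
    return sorted(dots)


def repeat_offsets(dots):
    """Count dots per diagonal offset (j - i); well-populated offsets suggest repeats."""
    counts = defaultdict(int)
    for i, j in dots:
        counts[j - i] += 1
    return dict(counts)


# Toy example: a 7 bp unit repeated three times in tandem, scanned with a 7 bp window.
toy = "GATTACA" * 3
toy_dots = self_dot_plot(toy, window=7)
toy_offsets = repeat_offsets(toy_dots)
print(toy_offsets)
```

In a tandem repeat the dots line up on diagonals whose offset is a multiple of the repeat unit length, which is the pattern one looks for when placing a gRNA site outside repetitive sequence.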

Overview of the Dot Plot (down)

Window size: 15 bp

[Figure: dot plot of the sequence against itself, showing forward and reverse-complement matches.]

Note: The 2000 bp section downstream of Exon 4 is aligned with itself to determine if there are tandem repeats. Tandem repeats are found in the dot plot matrix. The gRNA site is selected outside of these tandem repeats.


Overview of the GC Content Distribution (up)

Window size: 300 bp

[Figure: GC content distribution along the sequence.]

Summary: Full Length(2000bp) | A(33.25% 665) | C(15.0% 300) | T(34.0% 680) | G(17.75% 355)

Note: The 2000 bp section upstream of Exon 4 is analyzed to determine the GC content. No significant high GC-content region is found. So this region is suitable for PCR screening or sequencing analysis.
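The overall GC content follows directly from the base counts in the summary, and a windowed profile can be computed with a simple sliding window. The sketch below is illustrative: the function names `gc_fraction` and `sliding_gc` and the step size are assumptions, and the sliding-window demo runs on a toy sequence because the flanking sequence itself is not reproduced in this report.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def sliding_gc(seq, window=300, step=1):
    """GC fraction of each full window, moving `step` bases at a time."""
    return [
        gc_fraction(seq[i:i + window])
        for i in range(0, len(seq) - window + 1, step)
    ]


# Base counts reported for the 2000 bp section upstream of exon 4.
upstream_counts = {"A": 665, "C": 300, "T": 680, "G": 355}
upstream_total = sum(upstream_counts.values())
upstream_gc = (upstream_counts["G"] + upstream_counts["C"]) / upstream_total
print(f"upstream GC content: {upstream_gc:.2%}")

# Toy sequence to show the windowed profile (window and step chosen for the demo).
toy_profile = sliding_gc("GGCCAATT", window=4, step=2)
print(toy_profile)
```

The same calculation applies to any other 2000 bp section once its base counts or sequence are supplied.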

Overview of the GC Content Distribution (down)

Window size: 300 bp

[Figure: GC content distribution along the sequence.]

Summary: Full Length(2000bp) | A(32.1% 642) | C(16.35% 327) | T(30.8% 616) | G(20.75% 415)

Note: The 2000 bp section downstream of Exon 4 is analyzed to determine the GC content. No significant high GC-content region is found. So this region is suitable for PCR screening or sequencing analysis.


BLAT Search Results (up)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
---------------------------------------------------------------------------------------
YourSeq  2000   1      2000  2000   100.0%    chr10  +       28779692   28781691   2000
YourSeq  30     764    849   2000   94.2%     chrX   +       53347115   53347202   88
YourSeq  23     749    777   2000   89.7%     chr1   -       132087700  132087728  29
YourSeq  23     751    775   2000   87.5%     chr2   +       86204200   86204223   24
YourSeq  20     1383   1402  2000   100.0%    chr2   +       105299604  105299623  20

Note: The 2000 bp section upstream of Exon 4 is BLAT searched against the genome. No significant similarity is found.

BLAT Search Results (down)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
---------------------------------------------------------------------------------------
YourSeq  2000   1      2000  2000   100.0%    chr10  +       28782744   28784743   2000
YourSeq  124    1259   1382  2000   100.0%    chr17  +       94030130   94030253   124
YourSeq  118    1259   1380  2000   98.4%     chr9   -       107383449  107383570  122
YourSeq  113    1259   1373  2000   99.2%     chr5   +       12983446   12983560   115
YourSeq  112    1258   1380  2000   96.0%     chr6   -       28146081   28146204   124
YourSeq  112    1262   1373  2000   100.0%    chr4   +       33071707   33071818   112
YourSeq  111    1259   1369  2000   100.0%    chr3   +       48906684   48906794   111
YourSeq  110    1259   1371  2000   99.2%     chr8   -       42352268   42352380   113
YourSeq  110    1262   1373  2000   99.2%     chr1   -       142409106  142409217  112
YourSeq  109    1259   1370  2000   99.1%     chr12  -       78064424   78064537   114
YourSeq  108    1259   1366  2000   100.0%    chr14  +       81587685   81587792   108
YourSeq  107    1259   1365  2000   100.0%    chr6   -       5875411    5875517    107
YourSeq  105    1258   1366  2000   98.2%     chr14  -       56279944   56280052   109
YourSeq  105    1259   1363  2000   100.0%    chr12  +       90094162   90094266   105
YourSeq  104    1259   1364  2000   99.1%     chr19  +       51365077   51365182   106
YourSeq  103    1259   1363  2000   99.1%     chr17  -       61027844   61027948   105
YourSeq  103    1259   1361  2000   100.0%    chr1   -       160035146  160035248  103
YourSeq  103    1259   1363  2000   99.1%     chr9   +       90559869   90559973   105
YourSeq  101    1259   1359  2000   100.0%    chr8   -       83219255   83219355   101
YourSeq  101    1259   1366  2000   97.2%     chr12  -       28133152   28133260   109

Note: The 2000 bp section downstream of Exon 4 is BLAT searched against the genome. No significant similarity is found.
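The top hit of each BLAT search locates the corresponding 2000 bp flank on chr10, so the interval between the two flanks can be derived from the reported coordinates. The sketch below is a consistency check under one assumption: that the coordinates are 1-based and inclusive, which matches the reported SPAN values. The `Hit` type and the helper names are hypothetical.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    chrom: str
    strand: str
    start: int  # assumed 1-based, inclusive
    end: int    # assumed 1-based, inclusive


# Top-scoring hits from the two BLAT result tables in this report.
upstream_flank = Hit("chr10", "+", 28779692, 28781691)
downstream_flank = Hit("chr10", "+", 28782744, 28784743)


def span(hit):
    """Length of a hit on the genome under the inclusive-coordinate assumption."""
    return hit.end - hit.start + 1


def gap_between(left, right):
    """Number of bases strictly between two hits on the same chromosome."""
    if left.chrom != right.chrom:
        raise ValueError("hits are on different chromosomes")
    return right.start - left.end - 1


flank_gap_bp = gap_between(upstream_flank, downstream_flank)
print(span(upstream_flank), span(downstream_flank), flank_gap_bp)
```

With these coordinates the interval between the two flanks comes to 1052 bp, which matches the stated size of the effective KO region.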


Gene and information: Themis thymocyte selection associated [ Mus musculus (house mouse) ] Gene ID: 210757, updated on 24-Oct-2019

Gene summary

Official Symbol: Themis (provided by MGI)
Official Full Name: thymocyte selection associated (provided by MGI)
Primary source: MGI:MGI:2443552
See related: Ensembl:ENSMUSG00000049109
Gene type: protein coding
RefSeq status: REVIEWED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: Gasp; Spot; Tsepa; thylex; E430004N04Rik
Summary: This gene encodes a protein that plays a regulatory role in both positive and negative T-cell selection during late thymocyte development. The protein functions through T-cell antigen receptor signaling, and is necessary for proper lineage commitment and maturation of T-cells. Alternative splicing results in multiple transcript variants. [provided by RefSeq, Mar 2015]
Expression: Restricted expression toward thymus adult (RPKM 8.8)
Orthologs: human, all

Genomic context

Location: 10; 10 A4
Exon count: 10

Annotation release  Status             Assembly                      Chr  Location
108                 current            GRCm38.p6 (GCF_000001635.26)  10   NC_000076.6 (28668327..28883820)
Build 37.2          previous assembly  MGSCv37 (GCF_000001635.18)    10   NC_000076.5 (28388201..28602555)

[Figure: genomic context of Themis on Chromosome 10 - NC_000076.6.]


Transcript information: This gene has 8 transcripts

Gene: Themis ENSMUSG00000049109

Description: thymocyte selection associated [Source:MGI Symbol;Acc:MGI:2443552]
Gene Synonyms: E430004N04Rik, Gasp, Tsepa
Location: Chromosome 10: 28,668,360-28,883,818 forward strand. GRCm38:CM001003.2
About this gene: This gene has 8 transcripts (splice variants), 153 orthologues, 2 paralogues, is a member of 1 Ensembl protein family and is associated with 32 phenotypes.

Transcripts

Name        Transcript ID          bp    Protein     Translation ID        Biotype                  CCDS       UniProt  Flags
Themis-201  ENSMUST00000056097.10  5082  636aa       ENSMUSP00000060129.4  Protein coding           CCDS23758  Q8BGW0   TSL:1, GENCODE basic, APPRIS P2
Themis-203  ENSMUST00000105516.8   2361  595aa       ENSMUSP00000101155.2  Protein coding           -          Q8BGW0   TSL:1, GENCODE basic, APPRIS ALT2
Themis-202  ENSMUST00000060409.12  2278  605aa       ENSMUSP00000055315.6  Protein coding           -          Q8BGW0   TSL:1, GENCODE basic, APPRIS ALT2
Themis-205  ENSMUST00000161345.1   636   203aa       ENSMUSP00000123894.1  Protein coding           -          E0CYT7   CDS 3' incomplete, TSL:3
Themis-204  ENSMUST00000159927.7   4827  94aa        ENSMUSP00000123919.1  Nonsense mediated decay  -          E0CY68   TSL:1
Themis-206  ENSMUST00000162202.7   1252  94aa        ENSMUSP00000124451.1  Nonsense mediated decay  -          E0CY68   TSL:1
Themis-208  ENSMUST00000219119.1   3461  No protein  -                     Retained intron          -          -        TSL:NA
Themis-207  ENSMUST00000162343.1   3403  No protein  -                     Retained intron          -          -        TSL:1


[Figure: Ensembl browser view of a 235.46 kb region of chromosome 10 (axis ticks at 28.70 Mb, 28.75 Mb, 28.80 Mb and 28.85 Mb), forward and reverse strands. Transcript tracks: Themis-201 (protein coding), Themis-202 (protein coding), Themis-207 (retained intron), Themis-203 (protein coding), Themis-204 (nonsense mediated decay), Themis-205 (protein coding), Themis-206 (nonsense mediated decay), Themis-208 (retained intron). Contigs: AC152983.2, AC159472.6. Other genes shown: Gm47834-201 (processed pseudogene), 4930519F09Rik-201 (lncRNA). Regulatory Build track with regulation legend (CTCF, open chromatin, promoter, promoter flank, transcription factor binding site) and gene legend (protein coding: merged Ensembl/Havana, Ensembl protein coding; non-protein coding: pseudogene, processed transcript, RNA gene).]


Transcript: ENSMUST00000056097

[Figure: Themis-201 (protein coding) on the forward strand, 215.46 kb, with protein annotation tracks for ENSMUSP00000060...: MobiDB lite, low complexity (Seg), Pfam CABIT domain, PANTHER PTHR15215:SF1 (Protein THEMIS). Variant track: sequence variants (dbSNP and all other sources), with legend entries for missense and synonymous variants. Scale bar from 0 to 636.]

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
