https://www.alphaknockout.com

Mouse Matk Knockout Project (CRISPR/Cas9)

Objective: To create a Matk knockout mouse model (C57BL/6J) by CRISPR/Cas9-mediated genome engineering.

Strategy summary: The Matk gene (NCBI Reference Sequence: NM_010768; Ensembl: ENSMUSG00000004933) is located on mouse chromosome 10. Fourteen exons have been identified, with the ATG start codon in exon 2 and the TGA stop codon in exon 14 (transcript: ENSMUST00000117488). Exons 2 to 14 will be selected as the target site. Cas9 and gRNA will be co-injected into fertilized eggs to produce KO mice. The pups will be genotyped by PCR followed by sequencing analysis.

Note: Homozygous mutant mice are viable, fertile, and appear normal. Unchallenged mutant mice exhibit no hematopoietic defects. SPKLS cell numbers are elevated. IL-7-induced BM cell proliferation and pre-B cell colony formation are enhanced. Antigen-induced IFN-gamma secretion is reduced.

Exon 2 starts at about 0.07% of the coding region. Exons 2 to 14 cover 100.0% of the coding region. The effective KO region is ~4624 bp. No other known gene lies within the KO region.
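The ~4624 bp figure can be cross-checked against the BLAT coordinates of the two 2000 bp flanks given in the BLAT search results: the upstream flank ends at chr10:81,258,236 and the downstream flank starts at chr10:81,262,861 (GRCm38). The Python sketch below assumes 1-based inclusive coordinates and flanks that abut the start and stop codons exactly; the constant and function names are hypothetical.

```python
# Sketch only: assumes 1-based inclusive GRCm38 coordinates and flanks that
# end/start immediately at the start and stop codons. Names are hypothetical.
UPSTREAM_FLANK_END = 81_258_236      # end of the upstream BLAT self-hit (chr10)
DOWNSTREAM_FLANK_START = 81_262_861  # start of the downstream BLAT self-hit (chr10)


def ko_span(upstream_end: int, downstream_start: int) -> tuple[int, int, int]:
    """Return (first base, last base, length) of the region between two flanks."""
    first = upstream_end + 1
    last = downstream_start - 1
    return first, last, last - first + 1


first, last, length = ko_span(UPSTREAM_FLANK_END, DOWNSTREAM_FLANK_START)
print(f"chr10:{first}-{last} ({length} bp)")
```

Under these assumptions the span is chr10:81258237-81262860, 4624 bp, which matches the stated effective KO region.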


Overview of the Targeting Strategy

[Figure: wild-type Matk allele showing exons 1 to 14, with a 5' gRNA region and a 3' gRNA region flanking the knockout region. Legend: exon of mouse Matk; knockout region.]
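gRNA candidates for the 5' and 3' gRNA regions are usually found by scanning for protospacer-adjacent motifs (PAMs). The report does not name the Cas9 variant; this sketch assumes the common SpCas9 rule (a 20 nt protospacer followed by an NGG PAM) and scans both strands. `find_spcas9_sites` and `gc_fraction` are hypothetical helpers, not the vendor's design tool, and they do no off-target or efficiency scoring.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase ACGT sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_spcas9_sites(seq: str, protospacer_len: int = 20) -> list[tuple[str, int, str, str]]:
    """Return (strand, start, protospacer, pam) for every N{20}NGG site (assumed SpCas9).

    `start` is the 0-based forward-strand coordinate of the leftmost base of
    the protospacer+PAM footprint, so sites on both strands share one axis.
    """
    seq = seq.upper()
    site_len = protospacer_len + 3
    # A lookahead lets finditer report overlapping sites.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % protospacer_len)
    sites = []
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for m in pattern.finditer(strand_seq):
            start = m.start()
            if strand == "-":
                # Map the minus-strand window back onto forward coordinates.
                start = len(seq) - start - site_len
            sites.append((strand, start, m.group(1), m.group(2)))
    return sorted(sites, key=lambda site: site[1])


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a sequence."""
    return sum(base in "GC" for base in seq) / len(seq)
```

Candidates found this way would still need filtering, for example by discarding any that fall inside the tandem repeats or high-GC windows noted in this report.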


Overview of the Dot Plot (up), window size: 15 bp

[Figure: dot plot of the upstream sequence aligned against itself, forward and reverse-complement strands.]

Note: The 2000 bp section upstream of the start codon is aligned with itself to detect tandem repeats. Tandem repeats are found in the dot plot matrix, so the gRNA site is selected outside of these repeats.
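A self dot plot marks every pair of positions whose windows match. This sketch is a simplified word-match version using exact 15-mer identity, on the forward strand and against the reverse complement; the tool used for the report may apply a scoring threshold instead. Tandem repeats appear as many forward hits on a short diagonal offset, which `tandem_offsets` counts. All names and the `max_period` default are hypothetical.

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def kmer_index(seq: str, k: int) -> dict[str, list[int]]:
    """Map each k-mer to the 0-based positions where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def self_dot_plot(seq: str, k: int = 15) -> tuple[set, set]:
    """Return (forward, reverse-complement) sets of (i, j) word-match hits."""
    seq = seq.upper()
    n = len(seq)
    index = kmer_index(seq, k)
    # Forward hits off the main diagonal: the same word occurs at i and j.
    forward = {(i, j) for positions in index.values()
               for i in positions for j in positions if i != j}
    # Reverse hits: the word at i equals the reverse complement of the word at n-k-j.
    rc = reverse_complement(seq)
    reverse = set()
    for j in range(n - k + 1):
        for i in index.get(rc[j:j + k], []):
            reverse.add((i, n - k - j))
    return forward, reverse


def tandem_offsets(forward_hits: set, max_period: int = 200) -> dict[int, int]:
    """Count forward hits per diagonal offset d = j - i (0 < d <= max_period).

    A dense run of hits at one short offset is a line parallel to the main
    diagonal, the usual dot-plot signature of a tandem repeat.
    """
    counts = defaultdict(int)
    for i, j in forward_hits:
        if 0 < j - i <= max_period:
            counts[j - i] += 1
    return dict(counts)
```

The same functions apply unchanged to the downstream 2000 bp section; there, no offset would be expected to stand out.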

Overview of the Dot Plot (down), window size: 15 bp

[Figure: dot plot of the downstream sequence aligned against itself, forward and reverse-complement strands.]

Note: The 2000 bp section downstream of the stop codon is aligned with itself to detect tandem repeats. No significant tandem repeats are found in the dot plot matrix, so this region is suitable for PCR screening or sequencing analysis.


Overview of the GC Content Distribution (up), window size: 300 bp

[Figure: GC content along the upstream sequence.]

Summary: Full Length(2000bp) | A(19.1% 382) | C(29.15% 583) | T(23.45% 469) | G(28.3% 566)

Note: The 2000 bp section upstream of the start codon is analyzed to determine its GC content. Regions of significantly high GC content are found, so the gRNA site is selected outside of them.
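The summary line and the GC distribution plot can be reproduced with a base count plus a sliding window. This sketch assumes a 300 bp window as stated; the 1 bp step and the 70% "high GC" threshold are hypothetical, since the report gives neither.

```python
from collections import Counter


def base_composition(seq: str) -> dict[str, tuple[int, float]]:
    """Return {base: (count, percent)} in the A, C, T, G order of the summary line."""
    seq = seq.upper()
    counts = Counter(seq)
    return {b: (counts[b], round(100 * counts[b] / len(seq), 2)) for b in "ACTG"}


def gc_windows(seq: str, window: int = 300, step: int = 1) -> list[tuple[int, float]]:
    """Return (start, GC fraction) for every full window whose start is a multiple of step."""
    seq = seq.upper()
    if len(seq) < window:
        return []
    gc = sum(base in "GC" for base in seq[:window])
    result = [(0, gc / window)]
    for start in range(1, len(seq) - window + 1):
        # Slide one base: drop the base that leaves, add the base that enters.
        gc -= int(seq[start - 1] in "GC")
        gc += int(seq[start + window - 1] in "GC")
        if start % step == 0:
            result.append((start, gc / window))
    return result


def high_gc_starts(windows: list[tuple[int, float]], threshold: float = 0.70) -> list[int]:
    """Starts of windows whose GC fraction exceeds a hypothetical threshold."""
    return [start for start, frac in windows if frac > threshold]
```

The incremental update keeps the scan linear in sequence length instead of recounting all 300 bases at every step.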

Overview of the GC Content Distribution (down), window size: 300 bp

[Figure: GC content along the downstream sequence.]

Summary: Full Length(2000bp) | A(22.85% 457) | C(30.5% 610) | T(19.1% 382) | G(27.55% 551)

Note: The 2000 bp section downstream of the stop codon is analyzed to determine its GC content. No region of significantly high GC content is found, so this region is suitable for PCR screening or sequencing analysis.
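The two summary lines are internally consistent: each set of counts sums to 2000 bp, and each percentage equals count / 2000. The same arithmetic gives the overall GC content, (C + G) / total. A short check, with counts copied from the summaries and an illustrative function name:

```python
SUMMARIES = {  # counts copied from the two summary lines
    "upstream": {"A": 382, "C": 583, "T": 469, "G": 566},
    "downstream": {"A": 457, "C": 610, "T": 382, "G": 551},
}


def summarize(counts: dict[str, int]) -> tuple[int, dict[str, float], float]:
    """Return (total length, percent per base, overall GC percent)."""
    total = sum(counts.values())
    percents = {base: 100 * n / total for base, n in counts.items()}
    gc_percent = 100 * (counts["C"] + counts["G"]) / total
    return total, percents, gc_percent


for name, counts in SUMMARIES.items():
    total, percents, gc = summarize(counts)
    print(f"{name}: {total} bp, GC = {gc:.2f}%")
```

This gives 57.45% GC upstream and 58.05% downstream. Because the downstream section is, if anything, slightly richer in GC overall, the high-GC calls presumably rest on the local 300 bp windows rather than on whole-section composition.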


BLAT Search Results (up)

SCORE | Q START | Q END | QSIZE | IDENTITY | CHROM | STRAND | START | END | SPAN
2000 | 1 | 2000 | 2000 | 100.0% | chr10 | + | 81256237 | 81258236 | 2000
149 | 2 | 914 | 2000 | 90.3% | chr10 | + | 80195592 | 80460159 | 264568
106 | 2 | 286 | 2000 | 79.6% | chr11 | + | 86918060 | 86918295 | 236
86 | 3 | 280 | 2000 | 87.1% | chrX | + | 53206711 | 53207076 | 366
54 | 800 | 902 | 2000 | 98.3% | chr10 | - | 12908811 | 12908929 | 119
53 | 207 | 281 | 2000 | 96.7% | chr1 | - | 119067103 | 119401854 | 334752
52 | 207 | 299 | 2000 | 84.5% | chr12 | + | 52498020 | 52498258 | 239
51 | 215 | 277 | 2000 | 91.9% | chr17 | - | 45948879 | 45948953 | 75
51 | 118 | 259 | 2000 | 85.0% | chr10 | - | 80338830 | 80441256 | 102427
51 | 69 | 259 | 2000 | 88.3% | chr11 | + | 88468860 | 88469153 | 294
49 | 124 | 277 | 2000 | 85.6% | chr11 | - | 78725160 | 78725313 | 154
48 | 207 | 277 | 2000 | 84.6% | chr14 | - | 67117668 | 67117739 | 72
47 | 859 | 933 | 2000 | 81.6% | chr11 | - | 53240844 | 53240916 | 73
47 | 807 | 906 | 2000 | 92.8% | chr11 | + | 68704491 | 68705026 | 536
46 | 206 | 294 | 2000 | 88.4% | chr15 | - | 78909225 | 78909314 | 90
46 | 213 | 280 | 2000 | 91.3% | chr13 | - | 37680680 | 37680753 | 74
46 | 214 | 280 | 2000 | 86.0% | chr5 | + | 67257608 | 67257675 | 68
45 | 208 | 282 | 2000 | 81.4% | chr11 | - | 54744250 | 54744326 | 77
45 | 854 | 914 | 2000 | 86.9% | chr1 | + | 74827224 | 74827284 | 61
44 | 223 | 277 | 2000 | 90.8% | chr8 | - | 70093853 | 70093908 | 56

Note: The 2000 bp section upstream of the start codon is BLAT-searched against the genome. Apart from the full-length match at the Matk locus itself, no significant similarity is found.

BLAT Search Results (down)

SCORE | Q START | Q END | QSIZE | IDENTITY | CHROM | STRAND | START | END | SPAN
2000 | 1 | 2000 | 2000 | 100.0% | chr10 | + | 81262861 | 81264860 | 2000
41 | 807 | 957 | 2000 | 66.0% | chr1 | - | 192734599 | 192734667 | 69
34 | 885 | 931 | 2000 | 97.3% | chr8 | + | 5634168 | 5634223 | 56
28 | 804 | 845 | 2000 | 69.7% | chr17 | + | 88259843 | 88259876 | 34
27 | 909 | 943 | 2000 | 86.3% | chr4 | + | 84412680 | 84412712 | 33
27 | 1175 | 1213 | 2000 | 75.9% | chr11 | + | 61092046 | 61092078 | 33
26 | 1918 | 1947 | 2000 | 96.5% | chr11 | + | 107236881 | 107236913 | 33
24 | 1925 | 1949 | 2000 | 100.0% | chr8 | - | 85078711 | 85078737 | 27
22 | 1112 | 1133 | 2000 | 100.0% | chr5 | + | 74808930 | 74808951 | 22

Note: The 2000 bp section downstream of the stop codon is BLAT-searched against the genome. Apart from the full-length match at the Matk locus itself, no significant similarity is found.
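To make the "no significant similarity" call concrete, the sketch below takes the top rows of the two BLAT tables, excludes the self-match at the Matk locus, and flags remaining hits whose BLAT score is at least 10% of the query size. That 10% cut-off is a hypothetical rule chosen for illustration; the report does not state its criterion, and `off_target_hits` is an illustrative name.

```python
from typing import NamedTuple


class BlatHit(NamedTuple):
    score: int
    qstart: int
    qend: int
    qsize: int
    identity: float  # percent identity as reported
    chrom: str
    strand: str
    tstart: int
    tend: int


def off_target_hits(hits: list[BlatHit], self_locus: tuple[str, int, int],
                    min_score_fraction: float = 0.10) -> list[BlatHit]:
    """Hits other than the self-match whose score reaches min_score_fraction * qsize.

    The 10% default is a hypothetical cut-off for illustration only.
    """
    chrom, start, end = self_locus
    return [
        h for h in hits
        if not (h.chrom == chrom and h.tstart == start and h.tend == end)
        and h.score / h.qsize >= min_score_fraction
    ]


# Top rows copied from the two BLAT tables (GRCm38).
UPSTREAM_TOP = [
    BlatHit(2000, 1, 2000, 2000, 100.0, "chr10", "+", 81256237, 81258236),
    BlatHit(149, 2, 914, 2000, 90.3, "chr10", "+", 80195592, 80460159),
    BlatHit(106, 2, 286, 2000, 79.6, "chr11", "+", 86918060, 86918295),
]
DOWNSTREAM_TOP = [
    BlatHit(2000, 1, 2000, 2000, 100.0, "chr10", "+", 81262861, 81264860),
    BlatHit(41, 807, 957, 2000, 66.0, "chr1", "-", 192734599, 192734667),
]

print(off_target_hits(UPSTREAM_TOP, ("chr10", 81256237, 81258236)))
print(off_target_hits(DOWNSTREAM_TOP, ("chr10", 81262861, 81264860)))
```

With the 10% rule both lists come back empty, in line with the report's conclusion; lowering the cut-off to 5% would flag the 149- and 106-score upstream hits, so the verdict depends on the threshold chosen.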


Gene information: Matk megakaryocyte-associated tyrosine kinase [Mus musculus (house mouse)], Gene ID: 17179, updated on 10-Oct-2019

Gene summary

Official Symbol: Matk (provided by MGI)
Official Full Name: megakaryocyte-associated tyrosine kinase (provided by MGI)
Primary source: MGI:MGI:99259
Related: Ensembl:ENSMUSG00000004933
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: CHK; HYL; Ntk; p56ntk
Expression: Biased expression in cortex adult (RPKM 63.6), frontal lobe adult (RPKM 48.8) and 6 other tissues

Genomic context

Location: chromosome 10, 39.72 cM; 10 C1
Exon count: 18

Annotation release | Status | Assembly | Chr | Location
108 | current | GRCm38.p6 (GCF_000001635.26) | 10 | NC_000076.6 (81252935..81262985)
Build 37.2 | previous assembly | MGSCv37 (GCF_000001635.18) | 10 | NC_000076.5 (80720290..80725726)



Transcript information: This gene has 11 transcripts

Gene: Matk ENSMUSG00000004933

Description: megakaryocyte-associated tyrosine kinase [Source: MGI Symbol; Acc: MGI:99259]
Gene synonyms: CHK, Csk homologous kinase, HYL, Ntk
Location: Chromosome 10: 81,252,935-81,263,365, forward strand (GRCm38:CM001003.2)
About this gene: This gene has 11 transcripts (splice variants), 172 orthologues and 32 paralogues, is a member of 1 Ensembl protein family, and is associated with 4 phenotypes.

Transcripts

Name | Transcript ID | bp | Protein | Translation ID | Biotype | CCDS | UniProt | Flags
Matk-201 | ENSMUST00000105328.9 | 2228 | 465aa | ENSMUSP00000100965.3 | Protein coding | CCDS70078 | A0A0R4J1P8 | TSL:1, GENCODE basic, APPRIS ALT1
Matk-202 | ENSMUST00000117488.7 | 1980 | 505aa | ENSMUSP00000113221.1 | Protein coding | CCDS24048 | A0A0R4J1N6 | TSL:1, GENCODE basic
Matk-203 | ENSMUST00000119547.7 | 1916 | 465aa | ENSMUSP00000113576.1 | Protein coding | CCDS70078 | A0A0R4J1P8 | TSL:1, GENCODE basic, APPRIS ALT1
Matk-205 | ENSMUST00000121205.7 | 1841 | 466aa | ENSMUSP00000113043.1 | Protein coding | CCDS70077 | D3Z4T5 | TSL:1, GENCODE basic, APPRIS P4
Matk-204 | ENSMUST00000120265.1 | 1618 | 466aa | ENSMUSP00000113666.1 | Protein coding | CCDS70077 | D3Z4T5 | TSL:5, GENCODE basic, APPRIS P4
Matk-208 | ENSMUST00000130282.7 | 495 | 101aa | ENSMUSP00000114233.1 | Protein coding | - | D3YVQ8 | CDS 3' incomplete, TSL:3
Matk-207 | ENSMUST00000128576.7 | 2599 | 162aa | ENSMUSP00000122445.1 | Nonsense mediated decay | - | D6RGA0 | TSL:1
Matk-210 | ENSMUST00000150605.7 | 2983 | No protein | - | Retained intron | - | - | TSL:1
Matk-211 | ENSMUST00000151660.7 | 1761 | No protein | - | Retained intron | - | - | TSL:1
Matk-206 | ENSMUST00000126720.7 | 1075 | No protein | - | Retained intron | - | - | TSL:1
Matk-209 | ENSMUST00000148735.1 | 860 | No protein | - | Retained intron | - | - | TSL:3
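Protein lengths in the table translate directly into coding-sequence lengths: three nucleotides per codon, plus the stop codon. A minimal sketch, assuming complete CDSs; Matk-208 (CDS 3' incomplete) and the non-coding transcripts are left out, and the dictionary and function names are illustrative.

```python
# Protein lengths (aa) of the complete protein-coding transcripts in the table.
PROTEIN_LENGTHS = {
    "Matk-201": 465,
    "Matk-202": 505,  # ENSMUST00000117488, the transcript used for the KO design
    "Matk-203": 465,
    "Matk-204": 466,
    "Matk-205": 466,
}


def cds_length_nt(aa: int, includes_stop: bool = True) -> int:
    """Coding length in nucleotides: 3 nt per codon, plus the stop codon if counted."""
    return 3 * (aa + (1 if includes_stop else 0))


for name, aa in PROTEIN_LENGTHS.items():
    print(f"{name}: {aa} aa -> {cds_length_nt(aa)} nt CDS")
```

For Matk-202 this gives 1518 nt (1515 nt of sense codons plus the TGA). If the ~4624 bp KO region runs from the ATG to the TGA, roughly 4624 - 1518 = 3106 bp of it would be intronic; this is an estimate under that assumption only.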


[Figure: Ensembl view of a 30.43 kb region of chromosome 10 (81.25 to 81.27 Mb, comprehensive transcript set). Forward strand: Zfr2 (Zfr2-201, -204, -205, -207), all 11 Matk transcripts, Apba3 (Apba3-201 to -211) and Mir3057-201. Reverse strand: Mrpl54 (Mrpl54-201, -202) and Tjp3 (Tjp3-201, -205). Contig: AC155932.6. Regulatory Build legend: CTCF, open chromatin, promoter, promoter flank, transcription factor binding site. Gene legend: protein coding (Ensembl protein coding, merged Ensembl/Havana); non-protein coding (RNA gene, processed transcript).]


Transcript: ENSMUST00000117488 (Matk-202, protein coding; 5.68 kb, forward strand)

[Figure: domain map of the encoded 505 aa protein. SH3, SH2 and kinase catalytic domains are annotated by Superfamily, SMART, Prints, Pfam and PROSITE profiles; PROSITE patterns mark the tyrosine-protein kinase active site and the protein kinase ATP-binding site. Other tracks: MobiDB lite, low complexity (Seg), PIRSF (PIRSF000615), PANTHER (PTHR24418, PTHR24418:SF399), Gene3D (2.30.30.40, 1.10.510.10, SH2 domain superfamily), CDD (CSK-like, SH2 domain), and sequence variants from dbSNP and other sources (missense and synonymous).]

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
