https://www.alphaknockout.com

Mouse Hs2st1 Knockout Project (CRISPR/Cas9)

Objective: To create an Hs2st1 knockout mouse model (C57BL/6J) by CRISPR/Cas-mediated genome engineering.

Strategy summary: The Hs2st1 gene (NCBI Reference Sequence: NM_011828; Ensembl: ENSMUSG00000040151) is located on mouse chromosome 3. Seven exons are identified, with the ATG start codon in exon 1 and the TGA stop codon in exon 7 (Transcript: ENSMUST00000043325). Exon 2 will be selected as the target site. Cas9 and gRNA will be co-injected into fertilized eggs for KO mouse production. The pups will be genotyped by PCR followed by sequencing analysis. Note: A mutation in this gene causes bilateral renal agenesis, bone defects, eye development abnormalities and cataracts in homozygous mice.

Exon 2 starts about 11.7% of the way into the coding region and covers 22.38% of it. The effective KO region is ~239 bp. The KO region does not contain any other known gene.
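The percentage above can be reproduced with a short calculation. The sketch below assumes that the coding region is counted as 356 codons x 3 bp (stop codon excluded) and that all ~239 bp of the KO region are coding; neither assumption is stated in this report, and the function names are illustrative only. It also checks whether the deleted length is a multiple of three, since a deletion that is not would be expected to shift the reading frame.

```python
# Assumptions (not stated in the report): coding length = 356 codons * 3 bp,
# stop codon excluded, and the whole ~239 bp KO region is coding sequence.
PROTEIN_LENGTH_AA = 356   # protein length listed for ENSMUST00000043325
KO_REGION_BP = 239        # effective KO region size from the report


def coding_fraction(region_bp: int, protein_aa: int) -> float:
    """Percentage of the coding region covered by a region of region_bp."""
    cds_bp = protein_aa * 3
    return 100.0 * region_bp / cds_bp


def causes_frameshift(region_bp: int) -> bool:
    """True if removing region_bp bases is not a whole number of codons."""
    return region_bp % 3 != 0


fraction = coding_fraction(KO_REGION_BP, PROTEIN_LENGTH_AA)
print(f"{fraction:.2f}% of the coding region")           # 22.38% of the coding region
print("frameshift expected:", causes_frameshift(KO_REGION_BP))  # True
```

Under these assumptions the result matches the 22.38% quoted above, which suggests the figure was derived from the 356-aa protein length.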


Overview of the Targeting Strategy

[Figure: wildtype allele of mouse Hs2st1 with exons 1, 2 and 7 labeled and the 5' and 3' gRNA regions marked. Legend: exon of mouse Hs2st1; knockout region.]


Overview of the Dot Plot (up)

Window size: 15 bp

[Figure: dot plot of the upstream sequence against itself, showing forward and reverse-complement matches.]

Note: The 2000 bp section upstream of Exon 2 is aligned with itself to determine if there are tandem repeats. No significant tandem repeat is found in the dot plot matrix. So this region is suitable for PCR screening or sequencing analysis.

Overview of the Dot Plot (down)

Window size: 15 bp

[Figure: dot plot of the downstream sequence against itself, showing forward and reverse-complement matches.]

Note: The 2000 bp section downstream of Exon 2 is aligned with itself to determine if there are tandem repeats. Tandem repeats are found in the dot plot matrix. The gRNA site is selected outside of these tandem repeats.
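The self-alignment described in the two notes above can be illustrated with a minimal sketch. It is not the tool used to produce the plots: it reports only exact forward-strand matches of a fixed window length (reverse-complement matches are omitted), and the function name `self_dot_plot` is hypothetical. Off-diagonal matches at a short, regular offset are the pattern that tandem repeats produce.

```python
from collections import defaultdict


def self_dot_plot(seq: str, window: int = 15) -> list[tuple[int, int]]:
    """Return off-diagonal (i, j) pairs, i < j, where the window-length
    words starting at i and j are identical (forward strand only)."""
    seq = seq.upper()
    positions = defaultdict(list)
    for i in range(len(seq) - window + 1):
        positions[seq[i:i + window]].append(i)

    dots = []
    for starts in positions.values():
        for a, i in enumerate(starts):
            for j in starts[a + 1:]:
                dots.append((i, j))
    return sorted(dots)


# Toy sequences with a short window, for illustration only.
repeated = "GATTACAGATTACA"   # the 7-mer GATTACA occurs twice in tandem
unique = "AAACCCGGGTTT"       # no repeated 4-mer

print(self_dot_plot(repeated, window=7))  # [(0, 7)]
print(self_dot_plot(unique, window=4))    # []
```

For a 2000 bp flank, the same function would be called with the default `window=15`; an empty or sparse result corresponds to the "no significant tandem repeat" case.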


Overview of the GC Content Distribution (up)

Window size: 300 bp

[Figure: GC content along the upstream sequence.]

Summary: Full Length(2000bp) | A(24.35% 487) | C(21.95% 439) | T(34.2% 684) | G(19.5% 390)

Note: The 2000 bp section upstream of Exon 2 is analyzed to determine the GC content. No significant high GC-content region is found. So this region is suitable for PCR screening or sequencing analysis.

Overview of the GC Content Distribution (down)

Window size: 300 bp

[Figure: GC content along the downstream sequence.]

Summary: Full Length(2000bp) | A(26.7% 534) | C(21.1% 422) | T(31.0% 620) | G(21.2% 424)

Note: The 2000 bp section downstream of Exon 2 is analyzed to determine the GC content. No significant high GC-content region is found. So this region is suitable for PCR screening or sequencing analysis.
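As a worked check of the two summaries, the overall GC content can be computed from the base counts, and a sliding-window profile can be sketched for any sequence. The window size of 300 bp comes from the report; the step size and the function names are assumptions made for illustration.

```python
def gc_from_counts(counts: dict[str, int]) -> float:
    """Overall GC percentage from per-base counts."""
    total = sum(counts.values())
    return 100.0 * (counts["G"] + counts["C"]) / total


def sliding_gc(seq: str, window: int = 300, step: int = 1) -> list[float]:
    """GC percentage of each window of the given size along seq."""
    seq = seq.upper()
    profile = []
    for i in range(0, len(seq) - window + 1, step):
        chunk = seq[i:i + window]
        profile.append(100.0 * (chunk.count("G") + chunk.count("C")) / window)
    return profile


# Base counts copied from the two summary lines in the report.
upstream = {"A": 487, "C": 439, "T": 684, "G": 390}
downstream = {"A": 534, "C": 422, "T": 620, "G": 424}

print(f"upstream GC:   {gc_from_counts(upstream):.2f}%")    # 41.45%
print(f"downstream GC: {gc_from_counts(downstream):.2f}%")  # 42.30%

# Toy sequence with a short window, for illustration only.
print(sliding_gc("GGCCAATT", window=4, step=2))  # [100.0, 50.0, 0.0]
```

Both flanks come out at roughly 41-42% GC overall, consistent with the notes that no significant high GC-content region is found.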


BLAT Search Results (up)

QUERY    SCORE  QSTART  QEND  QSIZE  IDENTITY  CHROM  STRAND  TSTART     TEND       SPAN
YourSeq  2000   1       2000  2000   100.0%    chr3   -       144465286  144467285  2000
YourSeq  144    164     401   2000   87.9%     chr11  -       94888089   94888341   253
YourSeq  136    164     413   2000   84.7%     chr11  -       25890878   25891126   249
YourSeq  134    163     402   2000   85.8%     chr16  -       52407321   52407563   243
YourSeq  132    172     402   2000   88.4%     chr1   -       67790068   67790318   251
YourSeq  128    170     386   2000   88.1%     chrX   +       105225534  105225752  219
YourSeq  128    162     397   2000   86.4%     chr5   +       66821725   66821957   233
YourSeq  125    163     396   2000   88.5%     chr5   +       141339476  141339716  241
YourSeq  123    161     401   2000   83.9%     chr2   -       123359857  123360373  517
YourSeq  121    174     397   2000   87.5%     chr3   +       67649182   67649438   257
YourSeq  120    166     402   2000   91.1%     chr15  -       101467555  101467832  278
YourSeq  120    177     413   2000   87.9%     chr11  +       91370774   91371022   249
YourSeq  119    170     402   2000   92.4%     chr14  -       72191045   72191278   234
YourSeq  119    171     402   2000   87.5%     chr2   +       150033346  150033576  231
YourSeq  119    175     402   2000   91.7%     chr10  +       14799219   14799450   232
YourSeq  118    171     386   2000   82.4%     chrX   -       135137139  135137342  204
YourSeq  118    171     386   2000   82.4%     chrX   +       135534852  135535055  204
YourSeq  117    171     402   2000   84.0%     chr18  +       52432823   52433038   216
YourSeq  117    167     368   2000   87.0%     chr18  +       14545892   14546231   340
YourSeq  116    164     402   2000   80.4%     chr3   -       15145216   15145440   225

Note: The 2000 bp section upstream of Exon 2 is BLAT searched against the genome. No significant similarity is found.

BLAT Search Results (down)

QUERY    SCORE  QSTART  QEND  QSIZE  IDENTITY  CHROM  STRAND  TSTART     TEND       SPAN
YourSeq  2000   1       2000  2000   100.0%    chr3   -       144463047  144465046  2000
YourSeq  58     291     390   2000   77.5%     chr19  -       11880225   11880315   91
YourSeq  53     281     381   2000   92.2%     chr1   +       133509657  133509773  117
YourSeq  49     91      145   2000   88.5%     chr19  +       60562110   60562161   52
YourSeq  49     282     372   2000   77.6%     chr19  +       31632669   31632772   104
YourSeq  49     280     382   2000   73.8%     chr11  +       78728562   78728664   103
YourSeq  48     282     383   2000   74.0%     chr1   +       181245851  181245954  104
YourSeq  48     278     372   2000   75.6%     chr1   +       111450326  111450430  105
YourSeq  47     299     375   2000   80.6%     chr7   -       127913416  127913492  77
YourSeq  47     299     382   2000   92.8%     chr13  +       97479694   97479777   84
YourSeq  47     291     375   2000   94.6%     chr10  +       93979865   93979951   87
YourSeq  46     288     383   2000   74.8%     chr19  -       48631557   48631654   98
YourSeq  46     280     376   2000   74.8%     chr18  -       61115760   61115864   105
YourSeq  46     289     376   2000   96.0%     chr6   +       114974290  114974385  96
YourSeq  45     281     372   2000   71.2%     chr6   -       48213654   48213743   90
YourSeq  45     280     372   2000   83.1%     chr1   -       183347576  183347664  89
YourSeq  45     280     381   2000   88.5%     chr11  +       94732051   94732151   101
YourSeq  45     281     376   2000   73.5%     chr11  +       63697415   63697518   104
YourSeq  44     299     381   2000   77.2%     chr14  +       56653219   56653302   84
YourSeq  44     288     383   2000   73.0%     chr14  +       43955388   43955483   96

Note: The 2000 bp section downstream of Exon 2 is BLAT searched against the genome. No significant similarity is found.
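The two self-hits in the BLAT results (the 100% identity rows on chr3) give the genomic positions of the flanking sections. The sketch below computes the gap between them, assuming that both 2000 bp flanks abut the targeted region directly and that the coordinates are 1-based and inclusive; these are assumptions, and the names are illustrative.

```python
# Self-hit coordinates copied from the first row of each BLAT table
# (chr3, minus strand). Assumed to be 1-based, inclusive.
UPSTREAM_FLANK = (144465286, 144467285)
DOWNSTREAM_FLANK = (144463047, 144465046)


def gap_between(flank_a: tuple[int, int], flank_b: tuple[int, int]) -> int:
    """Number of bases lying strictly between two non-overlapping intervals."""
    lower, upper = sorted([flank_a, flank_b])
    return upper[0] - lower[1] - 1


print(gap_between(UPSTREAM_FLANK, DOWNSTREAM_FLANK))  # 239
```

Under these assumptions the gap is 239 bp, which agrees with the ~239 bp effective KO region given in the strategy summary. Because the gene is on the reverse strand, the upstream flank has the higher coordinates.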


Gene and information: Hs2st1 heparan sulfate 2-O-sulfotransferase 1 [ Mus musculus (house mouse) ] Gene ID: 23908, updated on 27-Aug-2019

Gene summary

Official Symbol: Hs2st1 (provided by MGI)
Official Full Name: heparan sulfate 2-O-sulfotransferase 1 (provided by MGI)
Primary source: MGI:MGI:1346049
See related: Ensembl:ENSMUSG00000040151
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: 2OST; Hs2st; AW214369; mKIAA0448
Expression: Ubiquitous expression in lung adult (RPKM 8.3), whole brain E14.5 (RPKM 6.1) and 28 other tissues
Orthologs: human

Genomic context

Location: 3; 3 H2
Exon count: 8

Annotation release  Status             Assembly                        Chr  Location
108                 current            GRCm38.p6 (GCF_000001635.26)    3    NC_000069.6 (144429701..144570216, complement)
Build 37.2          previous assembly  MGSCv37 (GCF_000001635.18)      3    NC_000069.5 (144094071..144233180, complement)

[Figure: genomic context on Chromosome 3 - NC_000069.6]


Transcript information: This gene has 3 transcripts

Gene: Hs2st1 ENSMUSG00000040151

Description: heparan sulfate 2-O-sulfotransferase 1 [Source:MGI Symbol;Acc:MGI:1346049]
Gene Synonyms: Hs2st
Location: Chromosome 3: 144,429,706-144,570,181 reverse strand. GRCm38:CM000996.2
About this gene: This gene has 3 transcripts (splice variants), 250 orthologues, 1 paralogue, is a member of 1 Ensembl protein family and is associated with 16 phenotypes.

Transcripts

Name        Transcript ID         bp    Protein     Translation ID        Biotype                  CCDS       UniProt  Flags
Hs2st1-201  ENSMUST00000043325.8  6177  356aa       ENSMUSP00000043066.7  Protein coding           CCDS17883  Q8R3H7   TSL:1, GENCODE basic, APPRIS P1
Hs2st1-202  ENSMUST00000160690.1  598   75aa        ENSMUSP00000123816.1  Nonsense mediated decay  -          E0CYX6   TSL:3
Hs2st1-203  ENSMUST00000199680.1  2321  No protein  -                     Retained intron          -          -        TSL:NA

[Figure: Ensembl view of a 160.48 kb region of chromosome 3 (144.42 Mb to 144.58 Mb). Forward strand: Selenof-201, Selenof-202 and Selenof-204 (protein coding), Selenof-203 (lncRNA), Selenof-206 (retained intron). Contigs: AC159976.4 (forward), AC123880.18 (reverse). Reverse strand: Gm5857-201 (processed pseudogene), Gm43707-201 (lncRNA), Gm43560-201 (lncRNA), Hs2st1-201 (protein coding), Hs2st1-202 (nonsense mediated decay), Hs2st1-203 (retained intron). Regulatory Build legend: CTCF, enhancer, open chromatin, promoter, promoter flank, transcription factor binding site. Gene legend: protein coding (merged Ensembl/Havana, Ensembl protein coding); non-protein coding (processed transcript, RNA gene, pseudogene).]


Transcript: ENSMUST00000043325

[Figure: Hs2st1-201 (protein coding), reverse strand, 140.48 kb, with protein annotation for ENSMUSP00000043... on a scale of 0 to 356 aa. Tracks: low complexity (Seg); coiled-coils (Ncoils); Superfamily: P-loop containing nucleoside triphosphate hydrolase; Pfam: Sulfotransferase; PANTHER: PTHR12129:SF14, Heparan sulphate 2-O-sulfotransferase; Gene3D: 3.40.50.300; sequence variants (dbSNP and all other sources). Variant legend: missense variant, synonymous variant, stop retained variant.]

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
