https://www.alphaknockout.com

Mouse Hdgf Knockout Project (CRISPR/Cas9)

Objective: To create an Hdgf knockout mouse model (C57BL/6J) by CRISPR/Cas9-mediated genome engineering.

Strategy summary: The Hdgf gene (NCBI Reference Sequence: NM_008231; Ensembl: ENSMUSG00000004897) is located on mouse chromosome 3. Six exons are identified, with the ATG start codon in exon 1 and the TAG stop codon in exon 6 (transcript: ENSMUST00000005017). Exons 1-6 will be selected as the target site. Cas9 and gRNA will be co-injected into fertilized eggs to produce KO mice. The pups will be genotyped by PCR followed by sequencing analysis. Note: Mice homozygous for a targeted disruption of this gene are viable and fertile and display no major morphological, biochemical or behavioral phenotypes except for a significant reduction in rearing activity.

Exon 1 starts at about 0.14% of the coding region. Exons 1-6 cover 100.0% of the coding region. The effective KO region is ~8278 bp in size. The KO region does not contain any other known gene.
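As a cross-check, the ~8278 bp figure equals the distance from the first base of the ATG to the last base of the stop codon, provided one assumes that the 2000 bp upstream and downstream analysis windows sit directly against the codons. The chr3 self-hits in the BLAT results place those windows at 87904629-87906628 and 87914907-87916906 (GRCm38, 1-based, inclusive; Hdgf is on the forward strand). A minimal sketch of that arithmetic, where the function names are illustrative only:

```python
# Flanking 2000 bp windows, taken from the chr3 self-hits in the BLAT
# results (GRCm38, 1-based, inclusive coordinates).
UPSTREAM_WINDOW = (87904629, 87906628)
DOWNSTREAM_WINDOW = (87914907, 87916906)


def interval_length(start: int, end: int) -> int:
    """Length of a 1-based, closed interval [start, end]."""
    return end - start + 1


def ko_span(upstream: tuple[int, int], downstream: tuple[int, int]) -> tuple[int, int, int]:
    """Assumes (hypothetically) that the windows abut the start and stop codons
    of a forward-strand gene. Returns (first base of ATG, last base of the
    stop codon, span length in bp)."""
    atg_start = upstream[1] + 1
    stop_end = downstream[0] - 1
    return atg_start, stop_end, interval_length(atg_start, stop_end)


if __name__ == "__main__":
    start, end, size = ko_span(UPSTREAM_WINDOW, DOWNSTREAM_WINDOW)
    print(f"chr3:{start}-{end} ({size} bp)")
```

Run as a script, it prints `chr3:87906629-87914906 (8278 bp)`, which is consistent with the stated size of the effective KO region.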

Overview of the Targeting Strategy

Figure: Targeting strategy. Wild-type Hdgf allele (5' to 3') showing exons 1-6, with gRNA regions at each end of the knockout region. Legend: exon of mouse Hdgf; knockout region.

Overview of the Dot Plot (up), window size: 15 bp

Figure: self dot plot of the 2000 bp upstream sequence (forward and reverse complement panels).

Note: The 2000 bp section upstream of the start codon is aligned with itself to determine whether there are tandem repeats. Tandem repeats are found in the dot plot matrix, so the gRNA site is selected outside these tandem repeats.
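A self dot plot compares every window of a sequence against every other window of the same sequence, once directly and once against the reverse complement. A minimal sketch, assuming exact matches of 15 bp words (the report's window size); the vendor's tool may score windows differently or tolerate mismatches, and `dot_plot` is a hypothetical helper name:

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA string (A, C, G, T, N)."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def dot_plot(seq: str, window: int = 15) -> tuple[set[tuple[int, int]], set[tuple[int, int]]]:
    """Exact-match self dot plot.

    Returns two sets of (i, j) 0-based window starts:
      forward: seq[i:i+window] == seq[j:j+window], with i != j
               (the trivial main diagonal is dropped);
      revcomp: seq[j:j+window] is the reverse complement of seq[i:i+window].
    """
    seq = seq.upper()
    # Index every window start by its word so lookups are O(1) per window.
    starts: dict[str, list[int]] = {}
    for j in range(len(seq) - window + 1):
        starts.setdefault(seq[j:j + window], []).append(j)

    forward: set[tuple[int, int]] = set()
    revcomp: set[tuple[int, int]] = set()
    for i in range(len(seq) - window + 1):
        word = seq[i:i + window]
        for j in starts.get(word, []):
            if i != j:
                forward.add((i, j))
        for j in starts.get(reverse_complement(word), []):
            revcomp.add((i, j))
    return forward, revcomp
```

Plotting the `forward` and `revcomp` sets as points gives the two panels; a repeated segment shows up as a short line of points off the main diagonal.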

Overview of the Dot Plot (down), window size: 15 bp

Figure: self dot plot of the 2000 bp downstream sequence (forward and reverse complement panels).

Note: The 2000 bp section downstream of the stop codon is aligned with itself to determine whether there are tandem repeats. Tandem repeats are found in the dot plot matrix, so the gRNA site is selected outside these tandem repeats.
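In a self dot plot, a tandem repeat with period p appears as a run of matches parallel to the main diagonal at offset p. A rough sketch of detecting such runs, assuming exact 15 bp word matches; `max_period` and `min_run` are hypothetical parameters, not values taken from the report:

```python
def tandem_repeat_runs(seq: str, window: int = 15, max_period: int = 200,
                       min_run: int = 10) -> list[tuple[int, int, int]]:
    """Find runs of consecutive exact window matches on off-diagonals.

    For each offset p in 1..max_period, scans i and records maximal runs
    where seq[i:i+window] == seq[i+p:i+p+window]. Returns
    (period, start, run_length) for runs of at least min_run windows;
    start is the 0-based position of the first matching window.
    """
    seq = seq.upper()
    n = len(seq)
    runs: list[tuple[int, int, int]] = []
    for p in range(1, max_period + 1):
        run_start, run_len = 0, 0
        for i in range(n - window - p + 1):
            if seq[i:i + window] == seq[i + p:i + p + window]:
                if run_len == 0:
                    run_start = i
                run_len += 1
            else:
                if run_len >= min_run:
                    runs.append((p, run_start, run_len))
                run_len = 0
        # Close a run that reaches the end of the sequence.
        if run_len >= min_run:
            runs.append((p, run_start, run_len))
    return runs
```

For example, a 60 bp stretch of `"CGA" * 20` scanned with `max_period=3` yields a single run at period 3 starting at position 0.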

Overview of the GC Content Distribution (up), window size: 300 bp

Figure: GC content profile of the 2000 bp upstream sequence.

Summary: full length 2000 bp; A 511 (25.55%), C 516 (25.80%), T 429 (21.45%), G 544 (27.20%)

Note: The 2000 bp section upstream of the start codon is analyzed to determine its GC content. Regions of significantly high GC content are found, so the gRNA site is selected outside these regions.
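The GC content distribution is a sliding-window calculation. A minimal sketch with the report's 300 bp window; the 0.7 threshold used to call a window "high GC" is a hypothetical cut-off, since the report does not state its criterion:

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G + C among A/C/G/T bases (N and other codes ignored)."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def gc_profile(seq: str, window: int = 300, step: int = 1) -> list[tuple[int, float]]:
    """GC fraction of every window, keyed by its 0-based start position."""
    return [(i, gc_fraction(seq[i:i + window]))
            for i in range(0, len(seq) - window + 1, step)]


def high_gc_windows(seq: str, threshold: float = 0.7, window: int = 300) -> list[int]:
    """Start positions of windows whose GC fraction exceeds threshold.
    The 0.7 default is hypothetical, not the report's criterion."""
    return [i for i, gc in gc_profile(seq, window) if gc > threshold]
```

Candidate gRNA sites would then be chosen from positions not covered by any window returned by `high_gc_windows`.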

Overview of the GC Content Distribution (down), window size: 300 bp

Figure: GC content profile of the 2000 bp downstream sequence.

Summary: full length 2000 bp; A 515 (25.75%), C 500 (25.00%), T 546 (27.30%), G 439 (21.95%)

Note: The 2000 bp section downstream of the stop codon is analyzed to determine its GC content. No region of significantly high GC content is found, so this region is suitable for PCR screening or sequencing analysis.
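The two composition summaries can also be reduced to a single overall GC figure per window. Using only the base counts reported above, the upstream section is 53.00% GC and the downstream section 46.95% GC. A small sketch of that calculation:

```python
# Base counts from the two composition summaries (2000 bp each).
UPSTREAM_COUNTS = {"A": 511, "C": 516, "T": 429, "G": 544}
DOWNSTREAM_COUNTS = {"A": 515, "C": 500, "T": 546, "G": 439}


def gc_percent(counts: dict[str, int]) -> float:
    """Overall GC content (%) from per-base counts."""
    total = sum(counts.values())
    return 100.0 * (counts["G"] + counts["C"]) / total


if __name__ == "__main__":
    for name, counts in (("upstream", UPSTREAM_COUNTS),
                         ("downstream", DOWNSTREAM_COUNTS)):
        print(f"{name}: {sum(counts.values())} bp, GC {gc_percent(counts):.2f}%")
```

The overall figure hides local peaks, which is why the windowed profile, not this average, decides where the gRNA can go.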

BLAT Search Results (up)

Columns: QSTART/QEND are positions within the 2000 bp query; TSTART/TEND are positions on the target chromosome; SPAN is the target span in bp.

QUERY | SCORE | QSTART | QEND | QSIZE | IDENTITY | CHROM | STRAND | TSTART | TEND | SPAN
YourSeq | 2000 | 1 | 2000 | 2000 | 100.0% | chr3 | + | 87904629 | 87906628 | 2000
YourSeq | 306 | 57 | 1023 | 2000 | 88.2% | chr11 | + | 72554951 | 72555777 | 827
YourSeq | 299 | 381 | 949 | 2000 | 89.6% | chr9 | - | 64278066 | 64278567 | 502
YourSeq | 280 | 25 | 934 | 2000 | 87.0% | chr19 | - | 3536737 | 3537385 | 649
YourSeq | 248 | 75 | 736 | 2000 | 90.0% | chr16 | - | 32318739 | 32319476 | 738
YourSeq | 241 | 381 | 972 | 2000 | 86.5% | chr1 | + | 21334986 | 21335354 | 369
YourSeq | 233 | 28 | 972 | 2000 | 85.1% | chr9 | + | 64181686 | 64182071 | 386
YourSeq | 232 | 381 | 906 | 2000 | 91.2% | chr16 | - | 17221241 | 17222174 | 934
YourSeq | 232 | 383 | 708 | 2000 | 90.9% | chr7 | + | 3269861 | 3270181 | 321
YourSeq | 223 | 74 | 585 | 2000 | 87.9% | chr17 | - | 29272343 | 29272831 | 489
YourSeq | 222 | 381 | 959 | 2000 | 83.2% | chr8 | + | 36672417 | 36672765 | 349
YourSeq | 222 | 383 | 970 | 2000 | 84.3% | chr12 | + | 112990100 | 112990474 | 375
YourSeq | 217 | 381 | 990 | 2000 | 86.0% | chr4 | + | 155977777 | 155978106 | 330
YourSeq | 213 | 92 | 871 | 2000 | 90.9% | chr11 | + | 70243176 | 70243966 | 791
YourSeq | 211 | 381 | 910 | 2000 | 92.4% | chr11 | + | 21233422 | 21234101 | 680
YourSeq | 210 | 381 | 950 | 2000 | 82.4% | chr11 | + | 94606962 | 94607345 | 384
YourSeq | 208 | 14 | 584 | 2000 | 79.8% | chr12 | - | 59263902 | 59264305 | 404
YourSeq | 207 | 381 | 708 | 2000 | 92.7% | chr6 | - | 148824716 | 148825348 | 633
YourSeq | 200 | 381 | 949 | 2000 | 85.6% | chr8 | + | 13970338 | 13970682 | 345
YourSeq | 199 | 474 | 968 | 2000 | 85.0% | chr15 | - | 76665486 | 76665794 | 309

Note: The 2000 bp section upstream of the start codon is searched against the genome with BLAT. No significant similarity is found.
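One way to make the "no significant similarity" judgement explicit is to drop the self hit and flag any remaining hit whose score reaches a set fraction of the query size. A minimal sketch using the first three upstream rows; the `BlatHit` record and the 50% score cut-off are hypothetical illustrations, not the criterion used in this report:

```python
from dataclasses import dataclass


@dataclass
class BlatHit:
    score: int
    qstart: int
    qend: int
    qsize: int
    identity: float  # percent
    chrom: str
    strand: str
    tstart: int
    tend: int


def off_target_hits(hits: list[BlatHit], self_chrom: str, self_start: int,
                    min_score_fraction: float = 0.5) -> list[BlatHit]:
    """Non-self hits whose score is at least min_score_fraction * qsize.

    The self hit is recognised by its chromosome and target start.
    The 0.5 default is a hypothetical cut-off.
    """
    return [h for h in hits
            if not (h.chrom == self_chrom and h.tstart == self_start)
            and h.score >= min_score_fraction * h.qsize]


# First three rows of the upstream BLAT table above.
UPSTREAM_TOP_HITS = [
    BlatHit(2000, 1, 2000, 2000, 100.0, "chr3", "+", 87904629, 87906628),
    BlatHit(306, 57, 1023, 2000, 88.2, "chr11", "+", 72554951, 72555777),
    BlatHit(299, 381, 949, 2000, 89.6, "chr9", "-", 64278066, 64278567),
]
```

With the default cut-off, `off_target_hits(UPSTREAM_TOP_HITS, "chr3", 87904629)` returns an empty list: the best non-self hit scores 306 of a possible 2000.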

BLAT Search Results (down)

Columns: QSTART/QEND are positions within the 2000 bp query; TSTART/TEND are positions on the target chromosome; SPAN is the target span in bp.

QUERY | SCORE | QSTART | QEND | QSIZE | IDENTITY | CHROM | STRAND | TSTART | TEND | SPAN
YourSeq | 2000 | 1 | 2000 | 2000 | 100.0% | chr3 | + | 87914907 | 87916906 | 2000
YourSeq | 1004 | 1 | 1219 | 2000 | 92.8% | chrX | - | 79629989 | 79631163 | 1175
YourSeq | 81 | 1447 | 1787 | 2000 | 87.8% | chr5 | + | 129100325 | 129100828 | 504
YourSeq | 75 | 1471 | 1838 | 2000 | 91.3% | chr3 | + | 108281951 | 108584251 | 302301
YourSeq | 57 | 1392 | 1514 | 2000 | 78.0% | chr1 | - | 189216157 | 189216269 | 113
YourSeq | 53 | 1425 | 1993 | 2000 | 71.5% | chr17 | + | 26930154 | 26930636 | 483
YourSeq | 51 | 1442 | 1514 | 2000 | 85.0% | chr3 | - | 53292674 | 53292746 | 73
YourSeq | 49 | 1439 | 1510 | 2000 | 91.7% | chr2 | + | 106131020 | 106131091 | 72
YourSeq | 48 | 1440 | 1517 | 2000 | 83.7% | chr18 | + | 36094076 | 36094149 | 74
YourSeq | 48 | 1432 | 1514 | 2000 | 79.3% | chr10 | + | 97392595 | 97392680 | 86
YourSeq | 47 | 1441 | 1513 | 2000 | 82.2% | chr3 | - | 40507493 | 40507565 | 73
YourSeq | 47 | 1441 | 1514 | 2000 | 82.5% | chr2 | + | 52443075 | 52443152 | 78
YourSeq | 47 | 1445 | 1520 | 2000 | 77.1% | chr10 | + | 76618191 | 76618264 | 74
YourSeq | 46 | 1443 | 1514 | 2000 | 86.0% | chr1 | - | 24287946 | 24288018 | 73
YourSeq | 46 | 1471 | 1572 | 2000 | 89.7% | chr8 | + | 70050803 | 70050907 | 105
YourSeq | 46 | 1444 | 1510 | 2000 | 92.8% | chr10 | + | 121520294 | 121520365 | 72
YourSeq | 45 | 1447 | 1514 | 2000 | 90.0% | chr11 | + | 103680859 | 103680924 | 66
YourSeq | 45 | 1447 | 1514 | 2000 | 79.0% | chr11 | + | 83528007 | 83528067 | 61
YourSeq | 44 | 1465 | 1514 | 2000 | 94.0% | chr4 | - | 150979366 | 150979415 | 50
YourSeq | 44 | 1445 | 1514 | 2000 | 84.5% | chr12 | + | 84033257 | 84033325 | 69

Note: The 2000 bp section downstream of the stop codon is searched against the genome with BLAT. No significant similarity is found.

Gene and protein information: Hdgf, heparin binding growth factor [Mus musculus (house mouse)]; Gene ID: 15191, updated on 10-Oct-2019

Gene summary

Official Symbol: Hdgf (provided by MGI)
Official Full Name: heparin binding growth factor (provided by MGI)
Primary source: MGI:MGI:1194494
See related: Ensembl:ENSMUSG00000004897
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: AI118077; D3Ertd299e
Expression: Ubiquitous expression in liver E14 (RPKM 92.7), CNS E11.5 (RPKM 90.6) and 28 other tissues

Genomic context

Location: 3 F1; 3 38.78 cM
Exon count: 8

Annotation release | Status | Assembly | Chr | Location
108 | current | GRCm38.p6 (GCF_000001635.26) | 3 | NC_000069.6 (87906090..87916132)
Build 37.2 | previous assembly | MGSCv37 (GCF_000001635.18) | 3 | NC_000069.5 (87710243..87720054)

Chromosome 3 - NC_000069.6
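The two annotation rows describe the same locus on different assemblies, so their coordinates cannot be compared directly; only the interval lengths are assembly-independent. Treating the ranges as 1-based and inclusive, the NCBI GRCm38.p6 range spans 10,043 bp, the MGSCv37 range 9,812 bp, and the Ensembl GRCm38 location (87,906,321-87,916,132) also 9,812 bp, matching the 9.81 kb Ensembl transcript view. A small sketch of the arithmetic, with an illustrative helper name:

```python
def closed_interval_length(start: int, end: int) -> int:
    """Length of a 1-based, inclusive genomic interval start..end."""
    if end < start:
        raise ValueError("end must not precede start")
    return end - start + 1


# Gene coordinates as reported by NCBI and Ensembl in this document.
LOCI = {
    "NCBI, GRCm38.p6 (NC_000069.6)": (87906090, 87916132),
    "NCBI, MGSCv37 (NC_000069.5)": (87710243, 87720054),
    "Ensembl, GRCm38 (CM000996.2)": (87906321, 87916132),
}

if __name__ == "__main__":
    for label, (start, end) in LOCI.items():
        print(f"{label}: {closed_interval_length(start, end):,} bp")
```

The 231 bp difference between the two GRCm38 entries comes entirely from where NCBI and Ensembl place the gene start; both end at 87,916,132.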

Transcript information: This gene has 6 transcripts

Gene: Hdgf ENSMUSG00000004897

Description: heparin binding growth factor [Source:MGI Symbol;Acc:MGI:1194494]
Gene Synonyms: D3Ertd299e
Location: Chromosome 3: 87,906,321-87,916,132 forward strand (GRCm38:CM000996.2)
About this gene: This gene has 6 transcripts (splice variants), 185 orthologues, 4 paralogues, is a member of 1 Ensembl protein family and is associated with 1 phenotype.

Transcripts

Name | Transcript ID | bp | Protein | Protein ID | Biotype | CCDS | UniProt | Flags
Hdgf-201 | ENSMUST00000005017.14 | 2245 | 237 aa | ENSMUSP00000005017.8 | Protein coding | CCDS17457 | P51859 | TSL:1, GENCODE basic, APPRIS P1
Hdgf-202 | ENSMUST00000159492.7 | 933 | 202 aa | ENSMUSP00000124803.1 | Protein coding | - | E0CXA0 | CDS 3' incomplete, TSL:3
Hdgf-206 | ENSMUST00000162631.1 | 763 | 84 aa | ENSMUSP00000123832.1 | Nonsense mediated decay | - | E0CYW7 | TSL:3
Hdgf-204 | ENSMUST00000160312.1 | 582 | No protein | - | Retained intron | - | - | TSL:2
Hdgf-203 | ENSMUST00000160198.1 | 377 | No protein | - | lncRNA | - | - | TSL:2
Hdgf-205 | ENSMUST00000161616.7 | 365 | No protein | - | lncRNA | - | - | TSL:3
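The design transcript named in the strategy summary, ENSMUST00000005017 (Hdgf-201), is the only entry that is protein coding and carries the TSL:1, GENCODE basic and APPRIS P1 flags. One way to make that choice explicit is a ranking over the table's columns. The sketch below is hypothetical: the ranking order is an assumption for illustration, not a documented vendor rule.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Transcript:
    name: str
    transcript_id: str
    length_bp: int
    protein_aa: Optional[int]  # None when no protein is annotated
    biotype: str
    flags: set[str] = field(default_factory=set)


def tsl_rank(flags: set[str]) -> int:
    """Transcript support level as an integer; lower is better, missing sorts last."""
    for level in range(1, 6):
        if f"TSL:{level}" in flags:
            return level
    return 99


def pick_reference(transcripts: list[Transcript]) -> Transcript:
    """Hypothetical ranking: protein coding only, then APPRIS P1,
    then GENCODE basic, then best TSL, then longest protein."""
    coding = [t for t in transcripts if t.biotype == "Protein coding"]
    return min(coding, key=lambda t: (
        "APPRIS P1" not in t.flags,   # False sorts before True
        "GENCODE basic" not in t.flags,
        tsl_rank(t.flags),
        -(t.protein_aa or 0),
    ))


# Rows from the transcript table above.
TRANSCRIPTS = [
    Transcript("Hdgf-201", "ENSMUST00000005017.14", 2245, 237, "Protein coding",
               {"TSL:1", "GENCODE basic", "APPRIS P1"}),
    Transcript("Hdgf-202", "ENSMUST00000159492.7", 933, 202, "Protein coding",
               {"CDS 3' incomplete", "TSL:3"}),
    Transcript("Hdgf-206", "ENSMUST00000162631.1", 763, 84, "Nonsense mediated decay", {"TSL:3"}),
    Transcript("Hdgf-204", "ENSMUST00000160312.1", 582, None, "Retained intron", {"TSL:2"}),
    Transcript("Hdgf-203", "ENSMUST00000160198.1", 377, None, "lncRNA", {"TSL:2"}),
    Transcript("Hdgf-205", "ENSMUST00000161616.7", 365, None, "lncRNA", {"TSL:3"}),
]
```

Here `pick_reference(TRANSCRIPTS)` selects Hdgf-201; since exons 1-6 of that transcript cover the whole coding region, the other protein-coding isoform (Hdgf-202) falls inside the same KO region.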

Figure: Ensembl region view around Hdgf (29.81 kb, chromosome 3, 87.90-87.92 Mb; comprehensive transcript set). Forward strand: Hdgf-201 and Hdgf-202 (protein coding), Hdgf-206 (nonsense mediated decay), Hdgf-205 and Hdgf-203 (lncRNA), Hdgf-204 (retained intron); Mrpl24-201, Mrpl24-202, Mrpl24-205, Mrpl24-204 and Mrpl24-203 (protein coding), Mrpl24-206 (retained intron). Reverse strand: Rrnad1-208 and Rrnad1-211 (nonsense mediated decay); Rrnad1-202, Rrnad1-212, Rrnad1-209, Rrnad1-203 and Rrnad1-204 (retained intron); Rrnad1-201, Rrnad1-206 and Rrnad1-207 (protein coding); Rrnad1-205 and Rrnad1-210 (lncRNA). Contig: AC158233.2. Regulatory Build legend: CTCF, promoter, promoter flank. Gene legend: protein coding (Ensembl protein coding; merged Ensembl/Havana); non-protein coding (RNA gene; processed transcript).

Transcript: ENSMUST00000005017

Figure: Ensembl transcript and protein view for Hdgf-201 (protein coding; 9.81 kb, forward strand). Protein ENSMUSP00000005... (scale 0-237 aa) annotations: MobiDB lite; low complexity (Seg); Superfamily SSF63748; SMART, Pfam and PROSITE profiles: PWWP domain; PANTHER PTHR12550 and PTHR12550:SF41; Gene3D 2.30.30.140; CDD: HDGF-related, PWWP domain. Variant track: sequence variants (dbSNP and all other sources); legend: inframe insertion, synonymous variant.

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
