US008148129B2

(12) United States Patent
Frankel et al.
(10) Patent No.: US 8,148,129 B2
(45) Date of Patent: Apr. 3, 2012

(54) GENERATION OF POTENT DOMINANT NEGATIVE TRANSCRIPTIONAL INHIBITORS

(75) Inventors: Alan Frankel, Mill Valley, CA (US); Robert Nakamura, San Francisco, CA (US); Chandreyee Das, Brookline, MA (US); Ivan D’Orso, San Francisco, CA (US); Jocelyn Grunwell, San Mateo, CA (US)

(73) Assignee: The Regents of the University of California, Oakland, CA (US)

(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 806 days.

(21) Appl. No.: 11/765,592

(22) Filed: Jun. 20, 2007

(65) Prior Publication Data
US 2008/0096813 A1 Apr. 24, 2008

Related U.S. Application Data

(60) Provisional application No. 60/817,927, filed on Jun. 30, 2006.

(51) Int. Cl.
C12N 7/00 (2006.01)

(52) U.S. Cl. ...... 435/235.1

(58) Field of Classification Search ...... None
See application file for complete search history.

(56) References Cited

U.S. PATENT DOCUMENTS

6,599,692 B1 7/2003 Case et al.
6,607,882 B1 8/2003 Cox, III et al.
6,777,185 B2 8/2004 Case et al.
6,824,978 B1 11/2004 Cox, III et al.
6,933,113 B2 8/2005 Case et al.
6,979,539 B2 12/2005 Cox, III et al.
7,013,219 B2 3/2006 Case et al.
7,070,934 B2 7/2006 Cox, III et al.
7,163,824 B2 1/2007 Cox, III et al.
7,220,719 B2 5/2007 Case et al.
7,235,354 B2 6/2007 Case et al.
7,262,054 B2 8/2007 Jamieson et al.
7,273,923 B2 9/2007 Jamieson et al.
2003/0082552 A1* 5/2003 Wolffe et al. ...... 435/6

OTHER PUBLICATIONS

Cramer et al., Coupling of Transcription with Alternative Splicing: RNA Pol II Promoters Modulate SF2/ASF and 9G8 Effects on an Exonic Splicing Enhancer, Molecular Cell, 1999, 4:251-258.*
Gama-Carvalho et al., Nucleocytoplasmic shuttling of heterodimeric splicing factor U2AF, JBC. Published on Dec. 15, 2000 as Manuscript M008759200.*
Rosonina et al., Expression: The Close Coupling of Transcription and Splicing, Current Biology, vol. 12, R319-R321, Apr. 30, 2002.*
Peled-Zehavi et al., 2001, Molecular and Cellular Biology, 21(15):5232-5241.*
Zou et al., Journal of Biological Chemistry, 2000, 275(9):6051-6054.*

* cited by examiner

Primary Examiner: Zachariah Lucas
Assistant Examiner: Nicole Kinsey White
(74) Attorney, Agent, or Firm: Morgan, Lewis & Bockius LLP; Annette S. Parent

(57) ABSTRACT

The present invention provides methods and compositions for regulating using transcription factors linked to that localize to the transcriptional machinery.

21 Claims, 22 Drawing Sheets

U.S. Patent Apr. 3, 2012 Sheet 1 of 22 US 8,148,129 B2

[Drawing Sheets 2-22 of 22: figure images not reproduced. Labels legible on the sheets: Fig. 2a (Sheet 5); immunoprecipitation and blot panels labeled S5P-CTD RNAP II, GFP, and P-TEFb, with mock, input, and +Ab lanes (Sheet 7); Fig. 3c, percent elongation (Sheet 9); T-Rev and T-U2AF65-GFP titration at 0, 0.2, and 1 (Sheet 10); p24 (ng/ml) plots (Sheets 12 and 21); Fig. 6, blot with kDa markers and bands labeled T-U2AF65-HA, T-Rev-HA, GFP, and Nucleolin (Sheet 16); Fig. 8 (Sheet 18).]

GENERATION OF POTENT DOMINANT NEGATIVE TRANSCRIPTIONAL INHIBITORS

CROSS-REFERENCES TO RELATED APPLICATIONS

The present application claims the benefit of U.S. Ser. No. 60/817,927, filed Jun. 30, 2006, herein incorporated by reference in its entirety.

STATEMENT AS TO RIGHTS TO INVENTIONS MADE UNDER FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT

This invention was made with government support under Grant Nos. R01 AI29135 and R41 CA103407, awarded by the National Institutes of Health. The government has certain rights in this invention.

BACKGROUND OF THE INVENTION

The regulation of gene expression by transcription factors is a fundamental aspect of the physiology of all cells, whether prokaryotic or eukaryotic. In eukaryotic organisms, for instance, a variety of transcription factors govern cell growth, differentiation, and death. The appropriate spatial and temporal expression of specific transcription factors governs development. As examples, transcription factors such as and control progression through the cell cycle; homeodomain, paired box, and forkhead transcription factors, among others, are involved in embryonic development; is involved with tumor suppression and cell death; steroid hormone receptors, such as sex hormone, glucocorticoid, mineralocorticoid, and thyroid hormone receptors, have pleiotropic effects on various aspects of physiology.

The aberrant expression of transcription factors can lead to abnormal development and various disease states. The inappropriate expression of proto-oncogenes such as c-Myc through chromosomal translocation can lead to cancers such as Burkitt's lymphoma. The formation of a PML-RARα fusion protein has been shown to be responsible for acute promyelocytic leukemia. Loss of p53 expression results in increased susceptibility to various cancers. The inappropriate expression or loss of expression of heart specific transcription factors such as Tbx1, Tbx5, Nkx2.5, Gata4, Sal4, and Eya4 have been shown to result in congenital heart defects.

Improved methods for regulating gene expression by modulating transcription factor function would result in more optimal treatment of many diseases.

One disease which might be approached by modulating transcription factor function is acquired immune deficiency syndrome (AIDS). Human immunodeficiency virus (HIV) has been identified as the etiological agent responsible for AIDS, a fatal disease characterized by destruction of the immune system and the inability to fight off life threatening opportunistic infections. Recent statistics indicate that as many as 33 million people worldwide are infected with the virus. In addition to the large number of individuals already infected, the virus continues to spread. Estimates from 1998 point to close to 6 million new infections in that year alone. In the same year there were approximately 2.5 million deaths associated with HIV and AIDS.

HIV is a member of the class of viruses known as retroviruses. The retroviral genome is composed of RNA, which is converted to DNA by reverse transcription. This retroviral DNA is then stably integrated into a host cell's chromosome and, employing the replication machinery of the host cells, produces new retroviral particles and advances the infection to other cells. HIV appears to have a particular affinity for the T4 lymphocyte cell, which plays a vital role in the body’s immune system. HIV infection of these white blood cells depletes this white cell population. Eventually, the immune system is rendered inoperative and ineffective against various opportunistic diseases such as, among others, pneumocystic carini pneumonia, Kaposi's sarcoma, and cancer of the lymph system.

There are currently a number of antiviral drugs available to combat the infection. These drugs can be divided into four classes based on the viral they target and their mode of action. In particular, one class of such antiviral drugs are competitive inhibitors of the aspartyl protease expressed by HIV. Other agents are nucleoside reverse transcriptase inhibitors that behave as substrate mimics to halt viral cDNA synthesis. A class of non-nucleoside reverse transcriptase inhibitors inhibit the synthesis of viral cDNA via a non-competitive (or uncompetitive) mechanism. Another class are drugs that block viral fusion. Used alone, these drugs show effectiveness in reducing viral replication. However, the effects are only temporary as the virus readily develops resistance to all known agents.

As indicated above, a number of critical points in the HIV life cycle have been identified as possible targets for antiviral drugs including (1) the initial attachment of the virion to the T4 lymphocyte or macrophage site; (2) the transcription of viral RNA to viral DNA (reverse transcriptase, RT); and (3) the processing of gag-pol protein by HIV protease. An additional, potentially attractive therapeutic target is transcription of the HIV genome. Transcription of the HIV genome is essential for replication of the virus after integration of viral DNA into a host cell . However, attempts to target HIV transcription have been hampered, in part, by the fact that transcription of the integrated HIV genome utilizes the host cell transcriptional machinery as well as viral transcription factors. Thus, therapies that attempt to target the transcription of the HIV genome may also interfere with transcription of normal host cell . Attempts have been made to target specifically HIV transcription by the generation of dominant negative forms of Tat, a virally encoded . However, these dominant forms have been shown to have poor activity at inhibiting HIV transcription and viral replication.

Effective new methods to target underexploited aspects of the HIV lifecycle, such as transcription of the HIV genome, would be desirable.

BRIEF SUMMARY OF THE INVENTION

The present application demonstrates that potent dominant negative regulators of transcription can be generated by linking a transcription factor to a protein that localizes to the transcriptional machinery.

In one embodiment, a method of regulating transcription of a gene is provided in which a nucleic acid construct is expressed in a cell in an amount sufficient for modulation of transcription, where the construct contains a first nucleic acid sequence encoding a transcription factor protein or a fragment thereof linked to a second nucleic acid sequence encoding a protein or a fragment thereof that localizes to the transcriptional machinery. In various aspects, the transcription factor protein can be viral transcription factors, nuclear proto-oncogene or oncogene proteins, nuclear tumor suppressor proteins, heart specific transcription factors, and immune system transcription factors. In some further aspects, the viral transcription factors can be HIV-Tat, HPV-E2, HPV-E7, BPV-E2, Adenovirus IVa2, HSV-1 ICP4, EBNA-LP, EBNA-2, EBNA-3A, EBNA-3B, EBNA-3C, BZLF-1, CMV-IE-1, CMV-IE2, HHSV-8 K bZIP, HBV Hbx, Poxvirus Vaccinia, VETF, HCV NS5A, T-Ag, Adenovirus E1A, Herpesvirus VP16, HTLV Tax, Hepadnavirus X protein, or Baculovirus AcNPV IE-1. In some further aspects, the nuclear proto-oncogene or oncogene proteins can be Abl, Myc, Myb, Rel, Jun, Fos, Sp1, Ap1, NF-κB, STAT3 or 5, β-catenin, Notch, GLI, or PML-RARα. In some further aspects, heart specific transcription factors can be Nkx 2, 3, 4, or 5, TBX5, GATA4, 5, or 6, or . In some further aspects, the immune cell specific transcription factor can be Ikaros, PU.1, PAX-5, Oct-2, or BOB.1/OBF.1.

In various embodiments, the transcription factor can be a dominant negative transcription factor, or fragment thereof. In further embodiments, the transcription factor can be either a transcriptional activator or repressor. In yet further embodiments, the transcription factor can be an activation domain (AD) fragment of the transcription factor. In yet further embodiments, the transcription factor can be Tat or an activation domain fragment or other fragment of Tat.

In some embodiments, the protein or a fragment thereof that localizes to the transcriptional machinery is a protein with nuclear localization, a component of the transcriptional machinery, or a protein that functions in co-transcriptional processing of RNA. In some aspects, the protein that functions in co-transcriptional processing of RNA is a capping factor, a splicing factor, a polyadenylation factor, an RNA export factor, or a translation factor. In some aspects, the splicing factor is an RS domain containing protein. In yet other aspects, the splicing factor is SF1, U2AF65, or 9G8, and the polyadenylation factor is CstF1.

In some embodiments, the modulation of transcription is inhibition of transcription by at least 25%, or at least 50%, or at least 75%, or at least 95%. In some aspects, the modulation of transcription is by inhibition of transcriptional initiation, or elongation, or termination. In some embodiments, the modulation of transcription is activation of transcription.

In some embodiments, the cell is a T-cell infected with an immunodeficiency virus that can be HIV, FIV, SIV, or BIV. In yet further embodiments, the cell is a cancer cell, heart cell, or immune system cell. In some aspects, the cancer cell is a carcinoma, sarcoma, adenocarcinoma, lymphoma, leukemia, or solid tumors of the kidney, breast, lung, bladder, colon, ovarian, prostate, pancreas, stomach, brain, head and neck, skin, uterine, testicular, glioma, esophagus, or liver. In some aspects, the immune system cell can be a B-cell, T-cell, macrophage, or dendritic cell.

Also included as embodiments are vectors and cells containing the nucleic acids of the embodiments above, as well as, the proteins encoded by these nucleic acids. In further aspects, a composition comprising the nucleic acid construct or protein of the above embodiments and a physiologically acceptable carrier is provided.

In yet further embodiments, a method of regulating transcription of a gene is provided by expressing a nucleic acid construct in a cell in an amount sufficient for modulation of transcription, in which the construct contains a first nucleic acid sequence encoding a transcription factor protein or a fragment thereof linked to a second nucleic acid sequence encoding a splicing factor or a fragment thereof.

In still further embodiments, a method of inhibiting replication of an immunodeficiency virus by expressing a nucleic acid construct in a cell in an amount sufficient for modulation of viral transcription, in which the construct contains a first nucleic acid sequence encoding a transcription factor protein or a fragment thereof linked to a second nucleic acid sequence encoding a protein or a fragment thereof that localizes to the transcriptional machinery.

In another embodiment, provided is a method of inhibiting replication of an immunodeficiency virus by expressing in a cell a nucleic acid construct in an amount sufficient for modulation of viral transcription, in which the construct contains a first nucleic acid sequence encoding a Tat protein or a fragment thereof linked to a second nucleic acid sequence encoding a protein or a fragment thereof that localizes to the transcriptional machinery.

In another embodiment, provided is a method of inhibiting replication of an immunodeficiency virus by expressing in a cell a nucleic acid construct in an amount sufficient for modulation of viral transcription, in which the construct contains a first nucleic acid sequence encoding a transcription factor protein or a fragment thereof linked to a second nucleic acid sequence encoding a splicing factor or a fragment thereof.

In another embodiment, provided is a method of treating a subject infected with an immunodeficiency virus by administering a nucleic acid construct in an amount sufficient for inhibition of viral transcription, in which the construct contains a first nucleic acid sequence encoding a transcription factor or a fragment thereof linked to a second nucleic acid sequence encoding a protein or a fragment thereof that localizes to the transcriptional machinery. In some aspects, the treating is with a protein of the embodiments above.

In another embodiment, provided is a method of inhibiting transcription of a HIV genome in a cell by expressing in the cell a nucleic acid construct in an amount sufficient for inhibition of the transcription of the HIV genome, in which the construct contains a first nucleic acid sequence encoding a Tat protein or a fragment thereof linked to a second nucleic acid sequence encoding a U2AF65 protein or a fragment thereof.

In another embodiment, provided is a method of treating a subject with cancer by expressing in the subject a nucleic acid construct in an amount sufficient for modulation of transcription, in which the construct contains a first nucleic acid sequence encoding a transcription factor protein or a fragment thereof linked to a second nucleic acid sequence encoding a protein or a fragment thereof that localizes to the transcriptional machinery. In some aspects, the treating is with a protein of the embodiments above.

In another embodiment, provided is a method of treating or preventing a disease in a subject by expressing in the subject a nucleic acid construct in an amount sufficient for modulation of transcription, in which the construct contains a first nucleic acid sequence encoding a transcription factor protein or a fragment thereof linked to a second nucleic acid sequence encoding a protein or a fragment thereof that localizes to the transcriptional machinery, where the disease is viral infection, cancer, heart disease, and inflammation.

In another embodiment, provided is a method of validating a target by expressing a nucleic acid construct in a cell in an amount sufficient for modulation of transcription of the gene for the target, in which the construct contains a first nucleic acid sequence encoding a transcription factor protein or a fragment thereof linked to a second nucleic acid sequence encoding a protein or a fragment thereof that localizes to the transcriptional machinery, where altered expression of the gene for the target provides target confirmation.

DESCRIPTION OF THE DRAWINGS

FIG. 1 shows a potent dominant negative Tat inhibitor identified in a reporter assay. a, Left, schematic of a dual reporter fluorescence assay in which T-BIV (HIV Tat with the BIV Tat RBD) is used to activate an HIV LTR-DsRed reporter engineered with BIV TAR RNA in place of HIV TAR. The T-SF1 fusion protein is used to activate an HIV LTR-GFP reporter engineered with a BPS RNA site. Right, HeLa cells were co-transfected with both reporters and T-SF1 or T-BIV expressors as indicated and sorted by flow cytometry. Expression of GFP is shown in green and DsRed in red. Numbers in each quadrant represent fold activation, calculated as the number of cells in the quadrant multiplied by their average fluorescence, relative to the same values calculated for the reporters alone. b, Dose response curves of T-SF1 activation on an LTR-HTAR-FFL reporter and T-SF1-mediated inhibition of T-BIV activity on a LTR-BTAR-RL reporter. c, Potent inhibition by T-U2AF65 is independent of the RNA-protein interaction. Left, dose response curves showing inhibition of T-BIV-mediated activation of a BIV TAR reporter by Tat, and T-U2AF65. Right, dose response curves showing inhibition of T-Rev-mediated activation of a RREIIB reporter by Tat, and T-U2AF65. The arrows indicate stoichiometric DNA concentrations of inhibitor and activator (5 ng). d, Promoter specificity of T-U2AF65. HeLa cells were transiently transfected with reporter, activator, and several concentrations of T-U2AF65 plasmids at the ratios indicated. For the heat shock response, endogenous HSF1 was activated 24 hr post-transfection cells with 50 µM AsNO2 for 12 hr. p53 activity was measured on SAOS2 cells. Activities of all activators were normalized to a cotransfected CMV-RL reporter control.

FIG. 2 shows contributions of subcellular localization and protein domains to dominant negative activity. a, HeLa cells were transiently co-transfected with an LTR-RREIIB-FFL reporter plasmid, T-Rev activator, and various inhibitors at 1:0.25 (grey bar) or 1:1 (black bar) ratios of activator to inhibitor. Activation levels are plotted relative to T-Rev without inhibitor, and confocal images of each GFP-tagged inhibitor are shown below the plot, including 3x magnification images (of boxed cells above) to highlight the subcellular compartments. T-NLS contains the 8 amino acid NLS of SV40 T-Ag (PPKKKRKV) (SEQ ID NO 1). b, Relative activities of T-U2AF65 RS domain and Tat AD variants, as determined in panel a, with corresponding confocal images. T-U2AF65ΔRS tagged with HA contains a deletion of the first 90 amino acids of U2AF65 and T-RS contains only residues 2-73 of U2AF65. K41A denotes a Tat AD mutation that abolishes interactions with cyclin T1. Confocal images of each HA-tagged inhibitor are shown below the plot, including 3x magnification images (of boxed cells above) to highlight the subcellular compartments.

FIG. 3 shows recruitment of the dominant negative to the HIV promoter via RNAP II blocks transcription elongation. a, T-U2AF65 interacts with RNAP II and P-TEFb. GFP-tagged T-U2AF65, T(K41A)-U2AF65 and T-NLS proteins were immunoprecipitated from cell extracts and analyzed by Western blot using the indicated antibodies. b, T-U2AF65 colocalizes with RNAP II and SC35. Following HeLa cell transfection, GFP-tagged T-U2AF65 was visualized by confocal microscopy along with immunostained RNAP II and SC35. c, T-U2AF65 blocks transcription elongation. Cells were transfected with Tat or T-U2AF65 as indicated, and RNase protection was performed with a promoter proximal (Pp) probe directed to the LTR and a promoter distal (Pd) probe directed to the FFL ORF to quantify transcription rates in these regions of the LTR-HTAR-FFL reporter. d, Recruitment of RNAP II and T-U2AF65 to the HIV promoter. Left, activation and inhibition levels of a HeLa LTR-RREIIB-FFL reporter cell line used for ChIP assays, with the ratio of inhibitor to activation indicated. Right, ChIP assays from cells transfected with the HA- or GFP-tagged proteins indicated (panels 2-5) or untransfected cells (panel 1), using antibodies directed against HA, GFP, or RNAP II and monitoring the Pp and Pd regions. Mock lanes used normal rabbit IgG for the IP as a specificity control, and input refers to PCR reactions from isolated samples prior to the IP. e, Promoter-specific recruitment of T-U2AF65. ChIP assays were carried out in HeLa LTR-RRE-IIB-FFL cells transfected with T-U2AF65 or a T-NLS control using primers for HIV, gapdh, hsp70, p21/CIP, HLA-DRA and cad promoters. Known transcription factors that activate each promoter are indicated in parentheses. The percent of input DNA is shown for each individual ChIP experiment, and the amount of DNA used in the GFP lane is twice that for RNAP II.

FIG. 4 shows expression of the Tat dominant negative blocks HIV replication and generates a latency-like state. SupT1 cells stably expressing the Tat domains or fusion proteins indicated were infected with either HIV Tat-TAR-dependent (a) or BIV Tat-TAR-dependent (b) viruses (18) at an m.o.i. of 1 and the kinetics of p24 antigen expression were monitored by ELISA. Viruses emerging from the inhibitor-expressing cell lines were harvested at day 30 (arrows) and used to re-infect the same cell lines from which they were derived, and identical replication rates were observed.

FIG. 5. Tat RBD is dispensable for dominant negative activity. a, Dose response curves showing inhibition of BIV Tat-TAR-mediated activation by Tat Tat, T-U2AF65 and T-HIV-U2AF65. The arrow indicates the position of stoichiometric DNA concentrations (5 ng) of inhibitor and activator. b, Dose response curves showing inhibition of HIV Tat-TAR-mediated activation by Tat T-BIV T-U2AF65 and T-BIV-U2AF65.

FIG. 6 shows relative expression levels of Tat activator and dominant negative. HeLa cells were co-transfected with HA-tagged versions of the T-Rev activator and/or the T-U2AF65 inhibitor along with a GFP-expressor to normalize for transfection efficiency. Nuclear extracts were probed for expression levels with an anti-HA antibody, an anti-GFP antibody, and an anti-C23 nucleolin antibody to provide a protein loading control.

FIG. 7 shows inhibition activities of other T-fusions. HeLa cells were co-transfected with an HIV LTR-RREIIB-FFL reporter plasmid along with the T-Rev activator in the absence or presence of the N-terminal T-fusions at sub-stoichiometric 1:0.25 (black bars) or stoichiometric 1:1 (gray bars) activator:inhibitor ratios. The data shown is normalized to activation by T-Rev alone. Nuclear DAPI staining and indirect immunofluorescence confocal images of the activator and each Tat fusion protein are shown above, using an anti-Tat antibody and Alexa-488 or Alexa-546 coupled anti-mouse antibodies.

FIG. 8 shows subnuclear localization of U2AF65, T-U2AF65, and variants. HeLa cells were transfected with pEGFP-N3 plasmids expressing GFP fused to: U2AF65, T-U2AF65 (active dominant negative), T(K41A)-U2AF65 (inactive dominant negative), RS (U2AF65 RS domain only), T-RS (active dominant negative), T(K41A)-RS (inactive dominant negative), U2AF65ΔRS, and T-U2AF65ΔRS.

FIG. 9 shows promoter-specificity of the dominant negative. a, Characterization of the SupT1 cell lines by luciferase reporter assays. The indicated SupT1 cell lines were co-transfected with an appropriate activator and reporter pairs as shown. b, Total RNA was extracted from SupT1-Tat, and SupT1-T-U2AF65 stable cell populations and relative mRNA levels of the nine genes shown were quantitated; B- (actin), glyceraldehyde-3-phosphate dehydrogenase (GAPDH), eukaryotic translation elongation factor 1 gamma (EEF1G), heterogeneous nuclear ribonucleoprotein A1 (hnRNPA1), TATA box binding protein (TBP), hypoxanthine phosphoribosyltransferase 1 (HPRT1), HLA-DQA1 major histocompatibility complex, class II, (MHCII), Interleukin 8 (IL-8), and androgen (AR). Like HIV, the IL-8, , and HLA-DQA1 promoters require P-TEFb.

FIG. 10 shows re-infection of dominant negative-expressing cells with slowly-replicating viruses shows the same growth kinetics as the initial infection. SupT1 cells expressing Tat, and T-U2AF65 were re-infected with viral stocks harvested from day 30 of the first set of infections (see arrows in FIGS. 4a and 4b). a, Re-infection using the HIV Tat-TAR dependent virus. b, Re-infection using the BIV Tat-TAR dependent virus.

DETAILED DESCRIPTION OF THE INVENTION

Introduction

The gene product of a dominant negative mutation interferes with the function of a normal, wild-type gene product within the same cell. This usually occurs if the gene product of the dominant negative mutation can still interact with the same elements as the wild-type product, but blocks some aspect of the wild-type protein’s function. As an example, in the case of multi-subunit protein complexes, an inactive dominant negative protein can bind to wild-type components of the complex rendering the resulting complex less active or inactive. Genetic engineering has allowed the construction of dominant negative forms of many different types of proteins. In the case of transcription factors, one approach has been to generate transcription factors that lack a gene activation domain but which retain a DNA binding domain. When expressed in cells, such dominant negative proteins are able to bind to their cognate DNA recognition sites thus preventing the binding of a wild type transcription factor and leading to reduced expression of a target gene. However, typically, for dominant negative inhibition to occur, a great excess of dominant negative protein must be expressed in order to effectively out compete the wild-type protein.

A dominant negative approach has previously been used in an attempt to inhibit transcription of the HIV genome and thus viral replication. When a truncated form of Tat, lacking the basic domain, was tested in transient co-transfection experiments, it was found that an 8-30 fold molar excess of the dominant negative Tat over wild-type Tat was required to inhibit the expression of a reporter gene under the control of the HIV-LTR.

The inventors have devised a new method of generating potent dominant negative transcriptional inhibitors for pharmaceutical treatment of diseases, gene therapy, target validation, disease diagnosis, and mechanistic studies of transcription, among other applications. As discussed above, previously described dominant negative transcription factors typically act by competing with other interacting factors or by creating defective oligomers, thus requiring a large excess of inhibitor while providing only a modest amount of inhibition. The inventors have discovered that linking a protein which localizes to the transcriptional machinery to a transcription factor can effectively target and generate high local concentrations of a dominant negative protein, thereby efficiently out-competing wild-type protein when expressed at stoichiometric amounts. In particular, the inventors have made the unexpected finding that fusion of the Tat protein or a fragment thereof, such as the Tat activation domain (Tat AD), to a protein that localizes to the transcriptional machinery, results in a potent inhibitor of transcription of the HIV genome. In particular, when Tat or Tat AD is fused to the splicing factors, SF1 or U2AF65, a potent dominant negative effect is observed. While one embodiment of this invention as described below in the Examples relates to the inhibition of HIV transcription and viral replication, it will be clear to the skilled artisan that the methods of the present invention can be used to generate dominant negative forms of other transcription factors and other classes of proteins.

Dominant Negative Tat

Immediately after HIV infects a cell, the viral RNA is copied into DNA, and the proviral genome is transported to the nucleus where it is integrated into the host genome. Once integrated into the host chromosome, the HIV proviral genome is subject to regulation by a variety of cellular transcription factors, as well as, by virally encoded factors. Among these virally encoded factors, the trans-activator protein (Tat) provides the primary control of HIV transcription. Transcription of the HIV genome begins at the viral LTR when the host cell RNA polymerase complex binds to the HIV promoter. The HIV LTR, however, is a poor promoter in the absence of Tat. In the absence of Tat, only non-processive (basal) transcription of the HIV genome is observed. However, upon recruitment of Tat to the transcriptional complex at the promoter, transcription of the HIV genome is greatly stimulated. Recruitment of Tat to the HIV promoter is mediated at least in part by the binding of Tat to a short RNA sequence that forms a stem-loop, termed the transactivation responsive region (TAR), which lies just downstream of the initiation site for transcription. Transcription of TAR by the basal transcriptional machinery to form the TAR RNA stem loop allows Tat to join the complex and stimulate transcription. Upon binding of Tat, it is believed that other cellular factors are recruited to the transcriptional complex that convert the complex into a form that is competent for processive transcript elongation.

In one embodiment of this invention, the inventors have made a fusion of the Tat protein or a fragment thereof, such as the Tat activation domain (Tat AD), to proteins that localize to the transcriptional machinery. When Tat or Tat AD is fused to splicing factors, such as, SF1 or U2AF65, a potent dominant negative effect is observed. Without limiting themselves to any particular mechanism of action, and as explained below in greater detail, the inventors have found that the fused splicing factor proteins act as tethering domains, directing the Tat fusion protein to RNA polymerase at the HIV-1 promoter thus blocking the activity of incoming wild-type Tat proteins. This results in a high local concentration of the inhibiting fusion protein at the site of action.

DEFINITIONS

As used herein, the following terms have the meanings ascribed to them unless specified otherwise.

A “dominant negative” gene product or protein is one that interferes with the function of another gene product or protein. The other gene product affected can be the same or different from the dominant negative protein. Dominant negative gene products can be of many forms, including truncations, full length proteins with point mutations or fragments thereof, or fusions of full length wild type or mutant proteins or fragments thereof with other proteins. The level of inhibition observed can be very low. For example, it may require a large excess of the dominant negative protein compared to the functional protein or proteins involved in a process in order to see an effect. It may be difficult to see effects under normal biological assay conditions.

A “transcription factor” is a protein that regulates transcription. Transcription factors may bind directly to DNA or RNA or may interact with the transcriptional machinery via protein-protein interactions with no direct nucleic acid contact to modulate transcription. Transcription factors in general are reviewed in Barnes and Adcock, Clin. Exp. Allergy 25 Suppl. 2:46-9 (1995), Roeder, Methods Enzymol. 273:165-71 (1996), and Brivanlou and Darnell, Science 1 Feb. 2002: 813-818 (2002), among other sources.

A “promoter” is defined as an array of nucleic acid control sequences that direct transcription. As used herein, a promoter typically includes necessary nucleic acid sequences near the start site of transcription, such as, in the case of certain RNA polymerase II type promoters, a TATA element, enhancer, CCAAT box, SP-1 site, etc. As used herein, a promoter also optionally includes distal enhancer or repressor elements, which can be located as much as several thousand base pairs from the start site of transcription. The promoters often have an element that is responsive to transactivation by a DNA-binding moiety such as a polypeptide, e.g., a nuclear receptor, Gal4, the lac repressor and the like.

A “target site” is the nucleic acid sequence recognized by a transcription factor protein. A single target site typically has about four to about ten or more base pairs. The target site is in

brought into the RNA polymerase complex and can be exemplified by the order in which the TAFs (TBP Associated Factors) attach to form a polymerase complex on a promoter. TBP (TATA Binding Protein) and an attached complex of TAFs, collectively known as TFIID (Transcription Factor for polymerase II D), bind at the TATA box, although not all promoters have the TATA box. TFIIA (three subunits) binds TFIID and DNA, stabilizing the first interactions. TFIIB binds between TFIID and the location of Pol II binding in the near future. TFIIB binds partially sequence specifically, with some preference for BRE. TFIIF and Pol II (two subunits, RAP30 and RAP74, showing some similarity to bacterial sigma factors) enter the complex together. TFIIF helps to speed up the polymerization process. TFIIE enters the complex, and helps to open and close the Pol II's jaw-like structure, which enables movement down the DNA strand. TFIIE and TFIIH enter concomitantly. Finally TFIIH binds. TFIIH is a large protein complex that contains among others the CDK7/cyclin H kinase complex and a DNA helicase. TFIIH has three functions: it binds specifically to the template strand to ensure that the correct strand of DNA is transcribed and melts or unwinds the DNA (ATP dependently) to separate the two strands using its helicase activity.
It has a kinase activity any position that allows regulation of gene expression, e.g., 25 that phosphorylates the C-terminal domain (CTD) of Pol II at adjacent to, up- or downstream of the transcription initiation the amino acid serine. This switches the RNA polymerase to site; proximal to an enhancer or other transcriptional regula start producing RNA, which marks the end of initiation and tion element such as a repressor (e.g., SP-1 binding sites, the start of elongation. Finally it is essential for Nucleotide hypoxia response elements, recognition ele Excision Repair (NER) of damaged DNA. TFIIH and TFIIE ments, p53 binding sites, etc.), RNA polymerase pause sites: 30 strongly interact with one another. TFIIE affects TFIIH’s and intron/exon boundaries. catalytic activity. Without TFIIE, TFIIH will not unwind the "Linking” or “fusing as used in this application refers to promoter. Mediator then encases all the transcription factors entities that are directly linked, or linked via an amino acid and the Pol II. Mediator interacts with enhancers, areas very linker, the size and composition of which can vary, or linked far away (upstream or downstream) that help regulate tran via a chemical linker. 35 Scription. The term “transcriptional machinery' generally refers to A “protein that localizes to the transcriptional machinery' the complex of cellular components responsible for making is one that is capable of associating or interacting with the RNA from a DNA template and related co-transcriptional transcriptional machinery as described above or a component RNA processing. The complex responsible for transcription thereof. The association or interaction may be non-covalent in a cell is referred to as RNA polymerase. During transcrip 40 or covalent and may be reversible or non-reversible. 
tion, a variety of factors join the RNA polymerase complex to Examples of proteins that localize to the transcriptional effect various aspects of transcription and co-transcriptional machinery include nuclear localized proteins, RNA process RNA processing as described below. In eukaryotic cells, three ing proteins, components of the transcriptional machinery, forms of RNA polymerase exist, termed RNA polymerases I, and proteins involved in co-transcriptional processes. Among II, and III. RNA polymerase I synthesizes a pre-rRNA 45S, 45 the co-transcriptional processes that are Subjects of the inven which matures into 28 S, 18S and 5, 8 S rRNAs which form tion are capping, splicing, polyadenylation, RNA export, the major RNA portions of the ribosome. RNA polymerase II translation. synthesizes precursors of mRNAs and most snRNA. Because An RS domain containing protein (also referred to in the of the large variety of cellular genes are transcribed by thus literature as an SR protein) is a protein with a domain that polymerase, RNAP II is subject to the highest level of control, 50 contains multiple arginine and serine di-peptides (single-let requiring a wide range of transcription factors depending on ter code RS) and/or serine and arginine di-peptides (single the promoter. RNA polymerase III is responsible for the syn letter code SR). RS domains are found in a number of cellular thesis oftBNAs, rRNA 5S and other small RNAs found in the proteins, particularly those involved with pre-mRNA splicing nucleus and cytosol. Additionally, other RNA polymerase and RNA processing events. types are found in mitochondria and chloroplasts. 55 A “transcriptional activator' and a “transcriptional repres A 550 kDa complex of 12 subunits, RNAP II is the most sor refer to proteins or effector domains of proteins that have intensively studied type of RNA polymerase. 
A wide range of the ability to modulate transcription, by binding directly to transcription factors are required for it to bind to its promoters DNA or RNA or by interacting with the transcriptional and to begin transcription. In the process of transcription, machinery via protein-protein interactions with no direct there are three main stages: (1) initiation, which requires 60 nucleic acid contact. Such proteins include, e.g., transcription construction of the RNA polymerase complex on the gene's factors and co-factors (e.g., KRAB, MAD, ERD, SID, promoter; (2) elongation, during which the RNA transcript is nuclear factor kappa B subunit p65, early growth response made from the DNA template; (3) and termination, the step at factor 1, and nuclear hormone receptors, VP16, VP64), endo which the formation of the RNA transcript is completed and nucleases, integrases, recombinases, methyltransferases, his disassembly of the RNA polymerase complex occurs. 65 tone acetyltransferases, histone deacetylases etc. Activators The components of the transcriptional machinery that may and repressors include co-activators and co-repressors (see, be targeted by this invention comprise any factor that is e.g., Utley et al., Nature 394:498-502 (1998)). US 8,148,129 B2 11 12 The terms “modulating transcription’ “inhibiting tran nucleic acid sequence). See, e.g., Ausubel, Supra, for an intro Scription' and “activating transcription of a gene refer to the duction to recombinant techniques. ability of a dominant negative to activate or inhibit transcrip The term “recombinant when used with reference, e.g., to tion of a gene. 
Activation includes prevention of transcrip a cell, or nucleic acid, protein, or vector, indicates that the cell, tional inhibition (i.e., prevention of repression of gene expres nucleic acid, protein or vector, has been modified by the sion) and inhibition includes prevention of transcriptional introduction of a heterologous nucleic acid or protein or the activation (i.e., prevention of gene activation). alteration of a native nucleic acid or protein, or that the cell is Modulation can be assayed by determining any parameter derived from a cell so modified. Thus, for example, recombi that is indirectly or directly affected by the expression of the nant cells express genes that are not found within the native 10 (naturally occurring) form of the cellor express a second copy target gene. Such parameters include, e.g., changes in RNA or of a native gene that is otherwise normally or abnormally protein levels, changes in protein activity, changes in product expressed, under expressed or not expressed at all. levels, changes in downstream gene expression, changes in An 'expression vector is a nucleic acid construct, gener reporter gene transcription (luciferase, CAT, 3-galactosidase, ated recombinantly or synthetically, with a series of specified B-glucuronidase, GFP (see, e.g., Mistili & Spector, Nature 15 nucleic acid elements that permit transcription of a particular Biotechnology 15:961-964 (1997)); changes in signal trans nucleic acid in a host cell, and optionally integration or rep duction, phosphorylation and dephosphorylation, receptor lication of the expression vector in a host cell. The expression ligand interactions, second messenger concentrations (e.g., vector can be part of a plasmid, virus, or nucleic acid frag cGMP. cAMP, IP3, and Ca"), cell growth, and neovascular ment, of viral or non-viral origin. Typically, the expression ization. These assays can be in vitro, in Vivo, and ex vivo. 
vector includes an “expression cassette, which comprises a Such functional effects can be measured by any means known nucleic acid to be transcribed operably linked to a promoter. to those skilled in the art, e.g., measurement of RNA or The term expression vector also encompasses naked DNA protein levels, measurement of RNA stability, identification operably linked to a promoter. of downstream or reporter gene expression, e.g., via chemi By “host cell' is meant a cell that contains an expression luminescence, fluorescence, colorimetric reactions, antibody 25 vector or nucleic acid encoding a dominant negative protein binding, inducible markers, ligand binding assays; changes in of the invention. The host cell typically supports the replica intracellular second messengers such as cCMP and inositol tion or expression of the expression vector. Host cells may be triphosphate (IP3); changes in intracellular calcium levels: prokaryotic cells such as E. coli, or eukaryotic cells such as cytokine release, and the like. yeast, fungal, protozoal, higher plant, insect, or amphibian To determine the level of gene expression modulation by a 30 cells, or mammalian cells such as CHO, HeLa, 293, COS-1, dominant negative construct, cells contacted with nucleic and the like, e.g., cultured cells (in vitro), explants and pri acids encoding dominant negative or dominant negative pro mary cultures (in vitro and ex vivo), and cells in vivo. teins are compared to control cells which have not received “Nucleic acid refers to deoxyribonucleotides or ribo this treatment. Control samples are assigned a relative gene nucleotides and polymers thereof in either single- or double expression activity value of 100%. Modulation/inhibition of 35 Stranded form. 
The term encompasses nucleic acids contain gene expression is achieved when the gene expression activ ing known nucleotide analogs or modified backbone residues ity value relative to the control is about 80%, preferably 50% or linkages, which are synthetic, naturally occurring, and (i.e., 0.5x the activity of the control), more preferably 25%, non-naturally occurring, which have similar binding proper more preferably 5-0%. Modulation/activation of gene expres ties as the reference nucleic acid, and which are metabolized sion is achieved when the gene expression activity value 40 in a manner similar to the reference nucleotides. Examples of relative to the control is 110%, more preferably 150% (i.e., Such analogs include, without limitation, phosphorothioates, 1.5x the activity of the control), more preferably 200-500%, phosphoramidates, methyl phosphonates, chiral-methyl more preferably 1000-2000% or more. phosphonates, 2-O-methyl ribonucleotides, peptide-nucleic The term "heterologous is a relative term, which when acids (PNAs). used with reference to portions of a nucleic acid indicates that 45 Unless otherwise indicated, a particular nucleic acid the nucleic acid comprises two or more Subsequences that are sequence also implicitly encompasses conservatively modi not found in the same relationship to each other in nature. For fied variants thereof (e.g., degenerate codon Substitutions) instance, a nucleic acid that is recombinantly produced typi and complementary sequences, as well as the sequence cally has two or more sequences from unrelated genes Syn explicitly indicated. Specifically, degenerate codon Substitu thetically arranged to make a new functional nucleic acid, 50 tions may be achieved by generating sequences in which the e.g., a promoter from one source and a coding region from third position of one or more selected (or all) codons is sub another source. 
The two nucleic acids are thus heterologous stituted with mixed-base and/or deoxyinosine residues to each other in this context. When added to a cell, the recom (Batzer et al., Nucleic Acid Res. 19:5081 (1991); Ohtsuka et binant nucleic acids would also be heterologous to the endog al., J. Biol. Chem. 260:2605-2608 (1985); Rossolini et al., enous genes of the cell. Thus, in a chromosome, a heterolo 55 Mol. Cell. Probes 8:91-98 (1994)). The term nucleic acid is gous nucleic acid would include annon-native (non-naturally used interchangeably with gene, cDNA, mRNA, oligonucle occurring) nucleic acid that has integrated into the chromo otide, and polynucleotide. Some, or a non-native (non-naturally occurring) extrachro The terms “polypeptide.” “peptide' and “protein’ are used mosomal nucleic acid. In contrast, a naturally translocated interchangeably herein to refer to a polymer of amino acid piece of chromosome would not be considered heterologous 60 residues. The terms also apply to amino acid polymers in in the context of this patent application, as it comprises an which one or more amino acid residues is an artificial chemi endogenous nucleic acid sequence that is native to the cal mimetic of a corresponding naturally occurring amino mutated cell. acid, as well as to naturally occurring amino acid polymers Similarly, a heterologous protein indicates that the protein and non-naturally occurring amino acid polymer. comprises two or more Subsequences that are not found in the 65 The term “amino acid refers to naturally occurring and same relationship to each other in nature (e.g., a “fusion synthetic amino acids, as well as amino acid analogs and protein, where the two Subsequences are encoded by a single amino acid mimetics that function in a manner similar to the US 8,148,129 B2 13 14 naturally occurring amino acids. 
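As a rough illustration of the relative-activity thresholds given above for modulation/inhibition and modulation/activation, the following Python sketch normalizes a measured expression value to an untreated control (assigned 100%) and labels the result. The function name `classify_modulation` is hypothetical, and treating 80% and 110% as hard cutoffs is an assumption made for illustration; the text describes these values only approximately ("about 80%").

```python
def classify_modulation(treated, control):
    """Return (relative activity in % of control, label).

    Assumption for illustration: <= 80% of control counts as inhibition and
    >= 110% counts as activation, following the approximate values in the text.
    """
    if control <= 0:
        raise ValueError("control activity must be positive")
    relative = 100.0 * treated / control  # control is defined as 100%
    if relative <= 80.0:
        label = "inhibition"
    elif relative >= 110.0:
        label = "activation"
    else:
        label = "no modulation"
    return relative, label


# Example: a reporter reading of 50 units against a control reading of 100 units
print(classify_modulation(50, 100))   # (50.0, 'inhibition'), i.e. 0.5x the control
print(classify_modulation(150, 100))  # (150.0, 'activation'), i.e. 1.5x the control
```

Any real assay would also need replicates and a statistical test; the sketch only shows the normalization step.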
Naturally occurring amino acids are those encoded by the genetic code, as well as those amino acids that are later modified, e.g., hydroxyproline, γ-carboxyglutamate, and O-phosphoserine. Amino acid analogs refers to compounds that have the same basic chemical structure as a naturally occurring amino acid, i.e., an α carbon that is bound to a hydrogen, a carboxyl group, an amino group, and an R group, e.g., homoserine, norleucine, methionine sulfoxide, methionine methyl sulfonium. Such analogs have modified R groups (e.g., norleucine) or modified peptide backbones, but retain the same basic chemical structure as a naturally occurring amino acid. Amino acid mimetics refers to chemical compounds that have a structure that is different from the general chemical structure of an amino acid, but that functions in a manner similar to a naturally occurring amino acid.

Amino acids may be referred to herein by either their commonly known three letter symbols or by the one-letter symbols recommended by the IUPAC-IUB Biochemical Nomenclature Commission. Nucleotides, likewise, may be referred to by their commonly accepted single-letter codes.

7) Serine (S), Threonine (T); and
8) Cysteine (C), Methionine (M)
(see, e.g., Creighton, Proteins (1984)).

The term "substantially identical" indicates that two or more nucleotide sequences share a majority of their sequence. Generally, this will be at least about 90% of their sequence and preferably about 95% of their sequence. Another indication that sequences are substantially identical is if they hybridize to the same nucleotide sequence under stringent conditions (see, e.g., Sambrook and Russell, eds, Molecular Cloning: A Laboratory Manual, 3rd Ed, vols. 1-3, Cold Spring Harbor Laboratory Press, 2001; and Current Protocols in Molecular Biology, Ausubel, ed. John Wiley & Sons, Inc. New York, 1997). Stringent conditions are sequence dependent and will be different in different circumstances. Generally, stringent conditions are selected to be about 5°C (or less) lower than the thermal melting point (Tm) for the specific sequence at a defined ionic strength and pH. The Tm of a DNA duplex is defined as the temperature at which 50% of the nucleotides are paired and corresponds to the midpoint of the spectroscopic hyperchromic absorbance shift during DNA melting. The Tm indicates the transition from double helical to random coil.

Typically, stringent conditions will be those in which the salt concentration is about 0.2xSSC at pH 7 and the temperature is at least about 60°C. For example, a nucleic acid of the invention or fragment thereof can be identified in standard filter hybridizations using the nucleic acids disclosed here under stringent conditions, which for purposes of this disclosure, include at least one wash (usually 2) in 0.2xSSC at a temperature of at least about 60°C, usually about 65°C, sometimes 70°C for 20 minutes, or equivalent conditions. For PCR, an annealing temperature of about 5°C below Tm is typical for low stringency amplification, although annealing temperatures may vary between about 32°C and 72°C,

"Conservatively modified variants" applies to both amino acid and nucleic acid sequences. With respect to particular nucleic acid sequences, conservatively modified variants refers to those nucleic acids which encode identical or essentially identical amino acid sequences, or where the nucleic acid does not encode an amino acid sequence, to essentially identical sequences. Because of the degeneracy of the genetic code, a large number of functionally identical nucleic acids encode any given protein. For instance, the codons GCA, GCC, GCG and GCU all encode the amino acid alanine. Thus, at every position where an alanine is specified by a codon, the codon can be altered to any of the corresponding codons described without altering the encoded polypeptide.
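The alanine example can be made concrete with a small lookup table. The Python fragment below is only a sketch: the table `CODONS` is deliberately partial (alanine, methionine and tryptophan only, in RNA notation), and the helper name `silent_variants` is hypothetical.

```python
# Partial RNA codon table (illustrative subset, not the full genetic code).
CODONS = {
    "GCA": "Ala", "GCC": "Ala", "GCG": "Ala", "GCU": "Ala",
    "AUG": "Met",  # ordinarily the only codon for methionine
    "UGG": "Trp",  # ordinarily the only codon for tryptophan
}


def silent_variants(codon):
    """Return the other codons in CODONS that encode the same amino acid."""
    amino_acid = CODONS[codon]
    return sorted(c for c, aa in CODONS.items() if aa == amino_acid and c != codon)


print(silent_variants("GCA"))  # ['GCC', 'GCG', 'GCU']
print(silent_variants("AUG"))  # [] because no silent substitution is available
```

Swapping a codon for any entry returned by `silent_variants` changes the nucleic acid sequence without changing the encoded polypeptide, which is what makes the variation "silent".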
Such nucleic acid variations are "silent variations," which are one species of conservatively modified variations. Every nucleic acid sequence herein which encodes a polypeptide also describes every possible silent variation of the nucleic acid. One of skill will recognize that each codon in a nucleic acid (except AUG, which is ordinarily the only codon for methionine, and TGG, which is ordinarily the only codon for tryptophan) can be modified to yield a functionally identical molecule. Accordingly, each silent variation of a nucleic acid which encodes a polypeptide is implicit in each described sequence.

e.g., 40°C, 42°C, 45°C, 52°C, 55°C, 57°C, or 62°C, depending on primer length and nucleotide composition. For high stringency PCR amplification, a temperature at, or slightly (up to 5°C) above, primer Tm is typical, although high stringency annealing temperatures can range from about 50°C to about 72°C, and are often 72°C, depending on the primer and buffer conditions (Ahsen et al., Clin Chem. 47:1956-61, 2001). Typical cycle conditions for both high and low stringency amplifications include a denaturation phase of 90°C-95°C for 30 sec-2 min., an annealing phase lasting 30 sec.-10 min., and an extension phase of about 72°C for 1-15 min.
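The annealing-temperature guidance for PCR can be summarized in a few lines of Python. This is a minimal sketch under stated assumptions: the primer Tm is supplied by the caller (the text does not say how it is computed), the function and constant names are hypothetical, and for high stringency the sketch simply returns the primer Tm, the lower end of the "at, or up to 5°C above" window.

```python
# Typical annealing ranges quoted in the text, in degrees Celsius.
LOW_STRINGENCY_RANGE_C = (32.0, 72.0)
HIGH_STRINGENCY_RANGE_C = (50.0, 72.0)


def suggest_annealing_temp(primer_tm_c, high_stringency=False):
    """Suggest an annealing temperature from a caller-supplied primer Tm.

    Low stringency: about 5 degrees C below Tm.
    High stringency: at Tm (the text allows up to 5 degrees C above).
    """
    if high_stringency:
        return primer_tm_c
    return primer_tm_c - 5.0


def within_typical_range(temp_c, high_stringency=False):
    """Check a temperature against the typical range for the chosen stringency."""
    low, high = HIGH_STRINGENCY_RANGE_C if high_stringency else LOW_STRINGENCY_RANGE_C
    return low <= temp_c <= high


print(suggest_annealing_temp(60.0))                      # 55.0
print(suggest_annealing_temp(60.0, high_stringency=True))  # 60.0
print(within_typical_range(45.0, high_stringency=True))    # False
```

In practice the chosen temperature would still be adjusted empirically for the primer and buffer conditions.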
As to amino acid sequences, one of skill will recognize that individual substitutions, deletions or additions to a nucleic acid, peptide, polypeptide, or protein sequence which alters, adds or deletes a single amino acid or a small percentage of amino acids in the encoded sequence is a "conservatively modified variant" where the alteration results in the substitution of an amino acid with a chemically similar amino acid. Conservative substitution tables providing functionally similar amino acids are well known in the art. Such conservatively modified variants are in addition to and do not exclude polymorphic variants, interspecies homologs, and alleles of the invention.

The following eight groups each contain amino acids that are conservative substitutions for one another:
1) Alanine (A), Glycine (G);
2) Aspartic acid (D), Glutamic acid (E);
3) Asparagine (N), Glutamine (Q);
4) Arginine (R), Lysine (K);
5) Isoleucine (I), Leucine (L), Methionine (M), Valine (V);
6) Phenylalanine (F), Tyrosine (Y), Tryptophan (W);

nucleic acid is considered similar to a reference sequence if the smallest sum probability in a comparison of the test nucleic acid to the reference nucleic acid is less than about 0.2, more preferably less than about 0.01, and most preferably less than about 0.001.

"Administering" an expression vector, nucleic acid, protein, or a delivery vehicle to a cell comprises transducing, transfecting, electroporating, translocating, fusing, phagocytosing, shooting or ballistic methods, etc., i.e., any means by which a protein or nucleic acid can be transported across a cell membrane and preferably into the nucleus of a cell.

A "delivery vehicle" refers to a compound, e.g., a liposome, toxin, or a membrane translocation polypeptide, which is used to administer dominant negative proteins. Delivery vehicles can also be used to administer nucleic acids encoding

The terms "identical" or percent "identity," in the context of two or more nucleic acids, refer to two or more sequences or subsequences that are the same or have a specified percentage of nucleotides that are the same (i.e., at least 70% identity, preferably at least 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity, over a specified region, when compared and aligned for maximum correspondence over a comparison window, or designated region as measured using a BLAST or BLAST 2.0 sequence comparison algorithms with default parameters described below, or by manual alignment and visual inspection. Such sequences are then said to be "substantially identical." This definition also refers to the complement of a test sequence. Preferably, the identity exists over a region that is at least about 15, 20 or 25 nucleotides in length, or more preferably over a region that is 50-100 nucleotides in length.

For sequence comparison, typically one sequence acts as a reference sequence, to which test sequences are compared. When using a sequence comparison algorithm, test and reference sequences are entered into a computer, subsequence coordinates are designated, if necessary, and sequence algorithm program parameters are designated. Default program parameters can be used, or alternative parameters can be designated. The sequence comparison algorithm then calculates the percent sequence identities for the test sequences relative to the reference sequence, based on the program parameters.

A "comparison window", as used herein, includes reference to a segment of any one of the number of contiguous positions selected from the group consisting of from 15 to 600, usually about 20 to about 200, more usually about 50 to about 150 in which a sequence may be compared to a reference sequence of the same number of contiguous positions after the two sequences are optimally aligned. Methods of alignment of sequences for comparison are well-known in the art.
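To make the notions of percent identity and comparison window concrete, the following Python sketch scores two sequences that are assumed to be already aligned, ungapped, and of equal length. It is not the BLAST procedure referred to in the text; the function names are hypothetical, and only the 15 to 600 position window bounds are taken from the definition above.

```python
def percent_identity(reference, test):
    """Percent of positions at which two pre-aligned, equal-length sequences match."""
    if len(reference) != len(test):
        raise ValueError("sequences must be aligned to the same length")
    if not reference:
        raise ValueError("sequences must not be empty")
    matches = sum(1 for r, t in zip(reference, test) if r == t)
    return 100.0 * matches / len(reference)


def valid_window(length, minimum=15, maximum=600):
    """True if a window length falls in the 15 to 600 position range from the text."""
    return minimum <= length <= maximum


reference = "ACGTACGTAC"
test = "ACGTTCGTAC"  # one mismatch at position 5
print(percent_identity(reference, test))  # 90.0
print(valid_window(len(reference)))       # False: 10 positions is below the minimum of 15
```

A real comparison would first compute the optimal alignment (for example with one of the algorithms cited in this section) and then apply a calculation of this kind over the aligned region.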
Optimal alignment of sequences for comparison can be conducted, e.g., by the local algorithm of Smith & Waterman, Adv. Appl. Math. 2:482 (1981), by the homology alignment algorithm of Needleman & Wunsch, J. Mol. Biol. 48:443 (1970), by the search for similarity method of Pearson & Lipman, Proc. Natl. Acad. Sci. USA 85:2444 (1988), by computerized implementations of these algorithms (GAP, BESTFIT, FASTA, and TFASTA in the Wisconsin Genetics Software Package, Genetics Computer Group, 575 Science Dr., Madison, Wis.), or by manual alignment and visual inspection (see, e.g., Current Protocols in Molecular Biology (Ausubel et al., eds. 1995 supplement)).

dominant negative proteins of the invention, e.g., a lipid:nucleic acid complex, an expression vector, a virus, and the like.

Design of Dominant Negative Proteins

The dominant negative proteins of the invention comprise any of a number of possible fusions of a transcription factor or other protein, or fragment thereof, with a protein that is capable of localization to the transcriptional machinery, such as nuclear localized proteins, RNA processing proteins, components of the transcriptional machinery, and proteins involved in co-transcriptional processes. Among the co-transcriptional processes that are subjects of the invention are capping, splicing, polyadenylation, RNA export, translation. The transcription factor can be derived from any of a number of species including, and not limited to, viruses, HIV, bacteria, yeast, Drosophila, C. elegans, Xenopus, mouse, monkey, and human. For human applications, a human TF is generally preferred. One of skill in the art will recognize that a wide variety of transcription factor proteins known in the art may be used in this invention. See Goodrich et al., Cell 84:825-30 (1996), Barnes & Adcock, Clin. Exp. Allergy 25 Suppl. 2:46-9 (1995), and Roeder, Methods Enzymol. 273:165-71 (1996) for general reviews of transcription factors. Databases dedicated to transcription factors are known (see, e.g., Science 269:630 (1995)). Nuclear transcription factors are described in, for example, Rosen et al., J. Med. Chem. 38:4855-74 (1995). The C/EBP family of transcription factors are reviewed in Wedel et al., Immunobiology 193:171-85 (1995). Coactivators and co-repressors that mediate transcription regulation by nuclear hormone receptors are reviewed in, for example, Meier, Eur. J. Endocrinol. 134(2):158-9 (1996); Kaiser et al., Trends Biochem. Sci. 21:342-5 (1996); and Utley et al., Nature 394:498-502 (1998)). GATA transcription factors, which are involved in regulation of hematopoiesis, are described in, for example, Simon, Nat. Genet. 11:9-11 (1995); Weiss et al., Exp. Hematol. 23:99-107. TATA box binding protein (TBP) and its associated TAF polypeptides (which include TAF30, TAF55, TAF80, TAF110, TAF150, and TAF250) are described in Goodrich & Tjian, Curr. Opin. Cell Biol. 6:403-9 (1994) and Hurley, Curr. Opin. Struct. Biol. 6:69-75 (1996). The STAT family of transcription factors are reviewed in, for example, Barahmand-Pour et al., Curr. Top. Microbiol. Immunol. 211:121-8 (1996). Transcription factors involved in disease are reviewed in Aso et al., J. Clin. Invest. 97:1561-9 (1996).

As further examples, the transcription factor may be chosen from any of a number of different classes of known

A preferred example of algorithm that is suitable for determining percent sequence identity and sequence similarity are the BLAST and BLAST 2.0 algorithms, which are described in Altschul et al., Nuc. Acids Res. 25:3389-3402 (1997) and Altschul et al., J. Mol. Biol. 215:403-410 (1990), respectively. BLAST and BLAST 2.0 are used, with the default parameters described herein, to determine percent sequence identity for the nucleic acids described herein. Software for performing BLAST analyses is publicly available through the National Center for Biotechnology Information. This algorithm involves first identifying high scoring sequence pairs (HSPs) by identifying short words of length W in the query sequence, which either match or satisfy some positive-valued threshold score T when aligned with a word of the same length in a database sequence. T is referred to as the neighborhood word score threshold (Altschul et al., supra). These initial neighborhood word hits act as seeds for initiating searches to find longer HSPs containing them. The word hits are extended in both directions along each sequence for as far as the cumulative alignment score can be increased. Cumulative scores are calculated using, for nucleotide sequences, the parameters M (reward score for a pair of matching residues; always >0) and N (penalty score for mismatching residues; always <0). Extension of the word hits in each direction are halted when: the cumulative alignment score falls off by the quantity X from its maximum achieved value; the cumulative score goes to zero or below, due to the accumulation of one or more negative-scoring residue alignments; or the end of either sequence is reached. The BLAST algorithm parameters W, T, and X determine the sensitivity and speed of the alignment. The BLASTN program (for nucleotide sequences) uses as defaults a word length (W) of 11, an expectation (E) of 10, M=5, N=-4 and a comparison of both strands.

The BLAST algorithm also performs a statistical analysis of the similarity between two sequences (see, e.g., Karlin & Altschul, Proc. Natl. Acad. Sci. USA 90:5873-5787 (1993)).
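The three stopping conditions for word-hit extension can be sketched in Python as follows. This is a simplified, ungapped, one-direction illustration and not the BLAST implementation itself: the function name `extend_right` is hypothetical, and the drop-off value `x_drop=20` is an assumed placeholder, since the text gives BLASTN defaults only for W, E, M and N.

```python
def extend_right(query, subject, q_start, s_start, match=5, mismatch=-4, x_drop=20):
    """Extend an ungapped hit to the right and return (best score, best length).

    match/mismatch follow the BLASTN defaults quoted in the text (M=5, N=-4);
    x_drop is an assumed illustrative value for the quantity X.
    """
    score = 0
    best_score = 0
    best_length = 0
    i = 0
    # Stop when the end of either sequence is reached.
    while q_start + i < len(query) and s_start + i < len(subject):
        score += match if query[q_start + i] == subject[s_start + i] else mismatch
        i += 1
        if score > best_score:
            best_score, best_length = score, i
        # Stop when the score falls X below its maximum, or drops to zero or below.
        if best_score - score >= x_drop or score <= 0:
            break
    return best_score, best_length


print(extend_right("ACGTACGT", "ACGTACGT", 0, 0))            # (40, 8)
print(extend_right("ACGTTTTT", "ACGTAAAA", 0, 0, x_drop=10))  # (20, 4)
```

In the second call the first four positions match (score 20); the following mismatches lower the running score until it has fallen 12 below the maximum, which exceeds the drop-off of 10, so the extension is trimmed back to the best-scoring prefix.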
One measure of similarity provided by the BLAST algorithm is the smallest sum probability (P(N)), which provides an indication of the probability by which a match between two nucleotide sequences would occur by chance. For example, a

transcription factors such as those that contain homeodomains, POU domains, Helix-Loop-Helix (HLH), Zinc Fingers, Leucine Zippers, or Winged Helix, to name but a few of the structural motifs found in transcription factors. Currently, there are about 2000 known transcription factors. See, e.g., Brivanlou and Darnell, Science, 295:813-818 (2002). Among some of the better known transcription factors include: c-Myc and Max, c-Fos and c-Jun, CREB, c-ErbA, c-Ets, GATA, c-Myb, MyoD, NF-kB, RAR, and SRF, to name a few.

Among the classes of transcription factors that find use in this invention are viral transcription factors, nuclear proto-oncogene or oncogene proteins, nuclear tumor suppressor proteins, heart specific transcription factors, and immune cell transcription factors. The viral transcription factors useful in the practice of this invention include: HIV-Tat, HPV-E2, HPV-E7, BPV-E2, Adenovirus IVa2, HSV-1 ICP4, EBNA-LP, EBNA-2, EBNA-3A, EBNA-3B, EBNA-3C, BZLF-1, CMV-IE-1, CMV-IE2, HHSV-8Kb7IP, HBV Hbx, Poxvirus Vaccinia, VETF, HCV NS5A, T-Ag, Adenovirus E1A, Herpesvirus VP16, HTLV Tax, Hepadnavirus X protein, and Baculovirus AcNPV IE-1, among others. The nuclear proto-oncogene or oncogene proteins and nuclear tumor suppressor proteins transcription factors useful in the practice of this invention include: Abl, Myc, Myb, Rel, Jun, Fos, Sp1, Ap1, NF-kB, STAT3 or 5, β-catenin, Notch, GLI, PML-RARα, and p53, among others. The heart specific transcription factors useful in the practice of this invention include: Nkx 2, 3, 4, or 5, TBX5, GATA 4, 5, or 6, and MEF2, among others. The immune cell specific transcription factors useful in the practice of this invention include: Ikaros, PU.1, PAX-5, Oct-2, and BOB.1/OBF.1, among others. A nonlimiting list of transcription factors that may be used in the practice of this invention is provided in Table 3. The transcription factors useful in the practice of this invention can be human as well as derived from yeast or higher eukaryotes such as viruses, HIV, Drosophila, C. elegans, Xenopus, or mouse, among other species.

In the practice of this invention, the transcription factor can be either a transcriptional activator or repressor, examples of which are well known in the art. Non-limiting examples of transcriptional activators and repressors are provided in Table 3.

Proteins that localize to the transcriptional machinery include: components of the transcriptional machinery, nuclear localized proteins, RNA processing proteins, components of the transcriptional machinery, and proteins involved in co-transcriptional processes and RNA processing.

Among the components of the transcriptional machinery that may be used in the practice of this invention are TAFs, CDK7, cyclin H, DNA helicase, unwinding enzymes, transcription factors, among others.

The RS domain is a structural and functional feature characteristic of many nuclear proteins, particularly splicing factors. A large number of RS domain proteins are known in the art, and many have been identified through a genome-wide survey of RS domain proteins from various species. See Boucher et al., RNA 7:1693-1701 (2001). Among the classes of known RS domain containing proteins that may be used in the practice of the invention are those listed in the table below.

In one embodiment of the invention, HIV Tat protein, or a fragment thereof, can be used as the transcription factor in a dominant negative fusion protein as described herein. The human Tat protein is an 86 amino acid protein that is required for efficient viral gene expression. The Tat sequence has been subdivided into several distinct regions based on structure and function: a N-terminal activation region (amino acids 1-19), a cysteine-rich domain (amino acids 20-31), a core region (amino acids 32-47), a basic region (amino acids 48-57), and a glutamine-rich region (amino acids 60-76). See Karn, J. (ref). In one particular embodiment, a full length Tat is linked to the splicing factors SF1 or U2AF65. In another embodiment, the Tat activation domain (Tat AD) is linked to the splicing factors SF1 or U2AF65.

Generation of Nucleic Acids Encoding Dominant Negative Proteins

Dominant negative polypeptides and nucleic acids of the invention can be made using routine techniques in the field of recombinant genetics. Basic texts disclosing the general methods of use in this invention include Sambrook et al., Molecular Cloning, A Laboratory Manual (2nd ed. 1989); Kriegler, Gene Transfer and Expression: A Laboratory Manual (1990); and Current Protocols in Molecular Biology (Ausubel et al., eds., 1994)). In addition, essentially any nucleic acid can be custom ordered from any of a variety of commercial sources. Similarly, peptides and antibodies can be custom ordered from any of a variety of commercial sources.

Expression Vectors for Nucleic Acids Encoding Dominant Negative Proteins

A nucleic acid encoding a dominant negative protein is typically cloned into intermediate vectors for transformation into prokaryotic or eukaryotic cells for replication and/or expression. Intermediate vectors are typically prokaryote vectors, e.g., plasmids, or shuttle vectors, or insect vectors, for storage or manipulation of the nucleic acid encoding dominant negative proteins or production of protein. The nucleic acid encoding a dominant negative protein is also
typically cloned into an expression vector, for administration A wide range of proteins have been shown to localize to the to a plant cell, animal cell, preferably a mammalian cell or a nucleus and may be used in the practice of this invention. A human cell, fungal cell, bacterial cell, or protozoal cell. non-limiting list of such proteins is provided in Table 1. 50 To obtain expression of a cloned gene or nucleic acid, a Among the co-transcriptional processes and RNA process nucleic acid encoding a dominant negative protein is typically ing activities that are subjects of the invention are capping, Subcloned into an expression vector that contains a promoter splicing, polyadenylation, RNA export, and translation. to direct transcription. Suitable bacterial and eukaryotic pro Accordingly, proteins involved in capping, splicing, polyade moters are well known in the art and described, e.g., in Sam nylation, RNA export, and translation may be used in the 55 brook et al., Molecular Cloning, A Laboratory Manual (2nd practice of this invention. Splicing factors represent one par ed. 1989); Kriegler, Gene Transfer and Expression: A Labo ticular class of proteins involved in co-transcriptional pro ratory Manual (1990); and Current Protocols in Molecular cessing of RNA and are suitable for the practice of this inven Biology (Ausubel et al., eds., 1994). Bacterial expression tion. As many as 300 factors are known to comprise the systems for expressing a dominant negative protein are avail spliceosome. The protein components of spliceosomes are 60 able in, e.g., E. coli, Bacillus sp., and Salmonella (Palva et al., disclosed in RappSilber, J., Ryder, U. Lamond, A. I., and Gene 22:229-235 (1983)). Kits for such expression systems Mann, M. (2002) Genome Res 12(8), 1231-1245 and Zhou, are commercially available. Eukaryotic expression systems Z. Licklider, L. J., Gygi, S. P., and Reed, R. 
(2002) Nature for mammalian cells, yeast, and insect cells are well known in 419(6903), 182-185, among other sources. Many splicing the art and are also commercially available. factors useful for the practice of this invention are compiled in 65 The promoter used to direct expression of a nucleic acid Table 2. Particular examples of splicing factors useful in the encoding a dominant negative protein depends on the particu practice of this invention include SF1, U2AF65, and 9G8. lar application. For example, a strong constitutive promoter is US 8,148,129 B2 19 20 typically used for expression and purification of a dominant Standard transfection methods are used to produce bacte negative protein. In contrast, when a dominant negative pro rial, mammalian, yeast or insect cell lines that express large tein is administered in vivo for gene regulation, either a con quantities of protein, which are then purified using standard stitutive or an inducible promoter is used, depending on the techniques (see, e.g., Colley et al., J. Biol. Chem. 264:17619 particular use of the dominant negative protein. In addition, a 17622 (1989); Guide to Protein Purification, in Methods in preferred promoter for administration of a dominant negative Enzymology, Vol. 182 (Deutscher, ed., 1990)). Transforma protein can be a weak promoter, such as HSV TK or a pro tion of eukaryotic and prokaryotic cells are performed moter having similar activity. The promoter typically can also according to standard techniques (see, e.g., Morrison, J. Bact. include elements that are responsive to transactivation, e.g., 132:349-351 (1977); Clark-Curtiss & Curtiss, Methods in 10 Enzymology 101:347-362 (Wu et al., eds, 1983). hypoxia response elements, GalA response elements, lac Any of the well known procedures for introducing foreign repressor response element, and Small molecule control sys nucleotide sequences into host cells may be used. 
These tems such as tet-regulated systems and the RU-486 system include the use of calcium phosphate transfection, polybrene, (see, e.g., Gossen & Bujard, PNAS89:5547 (1992); Oligino et protoplast fusion, electroporation, liposomes, microinjec al., Gene Ther: 5:491-496 (1998); Wang et al., Gene Ther. 15 tion, naked DNA, plasmid vectors, viral vectors, both episo 4:432-441 (1997); Neering et al., Blood 88:1147-1155 mal and integrative, and any of the other well known methods (1996); and Rendahl et al., Nat. Biotechnol. 16:757-761 for introducing cloned genomic DNA, cDNA, synthetic DNA (1998)). or other foreign genetic material into a host cell (see, e.g., In addition to the promoter, the expression vector typically Sambrook et al., Supra). It is only necessary that the particular contains a transcription unit or expression cassette that con genetic engineering procedure used be capable of Success tains all the additional elements required for the expression of fully introducing at least one gene into the host cell capable of the nucleic acid in host cells, either prokaryotic or eukaryotic. expressing the protein of choice. A typical expression cassette thus contains a promoter oper Assays for Determining Regulation of Gene Expression by ably linked, e.g., to the nucleic acid sequence encoding the Dominant Negative Proteins dominant negative protein, and signals required, e.g., for effi 25 A variety of assays can be used to determine the level of cient polyadenylation of the transcript, transcriptional termi gene expression regulation by dominant negative proteins. nation, ribosome binding sites, or translation termination. The activity of a particular dominant negative protein can be Additional elements of the cassette may include, e.g., enhanc assessed using a variety of in vitro and in vivo assays, by ers, and heterologous spliced intronic signals. 
measuring, e.g., protein or mRNA levels, product levels, The particular expression vector used to transport the 30 enzyme activity, tumor growth; transcriptional activation or genetic information into the cell is selected with regard to the repression of a reporter gene Such as a fluorescent protein intended use of the dominant negative protein, e.g., expres (e.g., GFP); second messenger levels (e.g., c6MP. cAMP sion in plants, animals, bacteria, fungus, protozoa etc. (see IP3, DAG, Ca"): cytokine and hormone production levels; expression vectors described below). Standard bacterial and neovascularization, using, e.g., immunoassays (e.g., expression vectors include plasmids such as pBR322 based 35 ELISA and immunohistochemical assays with antibodies), plasmids, pSKF, pFT23D, and commercially available fusion hybridization assays (e.g., RNase protection, northerns, in expression systems such as GST and Lacz. A preferred fusion situ hybridization, oligonucleotide array studies), colorimet protein is the maltose binding protein, “MBP.” Such fusion ric assays, amplification assays, enzyme activity assays, proteins are used for purification of the dominant negative tumor growth assays, phenotypic assays, and the like. protein. Epitope tags can also be added to recombinant pro 40 Dominant negative proteins are typically first tested for teins to provide convenient methods of isolation, for moni activity in vitro using cultured cells, e.g., 293 cells, CHO toring expression, and for monitoring cellular and Subcellular cells, VERO cells, BHK cells, HeLa cells, COS cells, and the localization, e.g., c-myc or FLAG. like. Preferably, human cells are used. 
The dominant negative Expression vectors containing regulatory elements from protein is often first tested using a transient expression system eukaryotic viruses are often used in eukaryotic expression 45 with a reporter gene, and then regulation of the target endog vectors, e.g., SV40 vectors, papilloma virus vectors, and vec enous gene is tested in cells and in animals, both in vivo and tors derived from Epstein-Barr virus. Other exemplary ex vivo. The dominant negative protein can be recombinantly eukaryotic vectors include pMSG, paV009/A+, pMTO10/ expressed in a cell, recombinantly expressed in cells trans A+, pMAMneo-5, baculovirus plSVE, and any other vector planted into an animal, or recombinantly expressed in a trans allowing expression of proteins under the direction of the 50 genic animal, as well as administered as a protein to an animal SV40 early promoter, SV40 late promoter, metallothionein or cell using delivery vehicles described below. The cells can promoter, murine mammary tumor virus promoter, Roussar be immobilized, be in Solution, be injected into an animal, or coma virus promoter, polyhedrin promoter, or other promot be naturally occurring in a transgenic or non-transgenic ani ers shown effective for expression in eukaryotic cells. mal. Some expression systems have markers for selection of 55 Modulation of gene expression is tested using one of the in stably transfected cell lines such as thymidine kinase, hygro vitro or in vivo assays described herein. Samples or assays are mycin B phosphotransferase, and dihydrofolate reductase. treated with a dominant negative protein and compared to High yield expression systems are also suitable. Such as using control samples without the test compound, to examine the a baculovirus vector in insect cells, with a dominant negative extent of modulation. 
protein encoding sequence under the direction of the polyhe 60 The effects of the dominant negative proteins can be mea drin promoter or other strong baculovirus promoters. Sured by examining any of the parameters described above. The elements that are typically included in expression vec Any Suitable gene expression, phenotypic, or physiological tors also include a replicon that functions in E. coli, a gene change can be used to assess the influence of a dominant encoding antibiotic resistance to permit selection of bacteria negative protein. When the functional consequences are that harbor recombinant plasmids, and unique restriction sites 65 determined using intact cells or animals, one can also mea in nonessential regions of the plasmid to allow insertion of Sure a variety of effects such as tumor growth, neovascular recombinant sequences. ization, hormone release, transcriptional changes to both US 8,148,129 B2 21 22 known and uncharacterized genetic markers (e.g., northern target tissues. Such methods can be used to administer nucleic blots or oligonucleotide array studies), changes in cell acids encoding dominant negative proteins to cells in vitro. metabolism such as cell growth or pH changes, and changes Preferably, the nucleic acids encoding dominant negative pro in intracellular second messengers such as cCMP. teins are administered for in vivo orex vivo gene therapy uses. Assays for dominant negative protein regulation of endog Non-viral vector delivery systems include DNA plasmids, enous gene expression can be performed in vitro. In one naked nucleic acid, and nucleic acid complexed with a deliv preferred in vitro assay format, dominant negative protein ery vehicle such as a liposome. Viral vector delivery systems regulation of endogenous gene expression in cultured cells is include DNA and RNA viruses, which have either episomal measured by examining protein production using an ELISA or integrated genomes after delivery to the cell. 
For a review assay (see Examples VI and VII). The test sample is compared 10 of gene therapy procedures, see Anderson, Science 256:808 to control cells treated with an empty vector or an unrelated 813 (1992); Nabel & Feigner, TIBTECH 11:211-217 (1993); dominant negative protein that is targeted to another gene. Mitani & Caskey, TIBTECH 11:162-166 (1993); Dillon, In another embodiment, dominant negative protein regula TIBTECH 11:167-175 (1993); Miller, Nature 357:455-460 tion of endogenous gene expression is determined in vitro by (1992); Van Brunt, Biotechnology 6(10):1149-1154 (1988); measuring the level of target gene mRNA expression. The 15 Vigne, Restorative Neurology and Neuroscience 8:35-36 level of gene expression is measured using amplification, e.g., (1995); Kremer & Perricaudet, British Medical Bulletin using PCR, LCR, or hybridization assays, e.g., northern 51(1):31-44 (1995); Haddada et al., in Current Topics in hybridization, RNase protection, dot blotting. RNase protec Microbiology and Immunology Doerfler and Böhm (eds) tion is used in one embodiment (see Example VIII and FIG. (1995); and Yu et al., Gene Therapy 1:13-26 (1994). 10). The level of protein or mRNA is detected using directly Methods of non-viral delivery of nucleic acids encoding or indirectly labeled detection agents, e.g., fluorescently or engineered dominant negative proteins include lipofection, radioactively labeled nucleic acids, radioactively or enzy microinjection, ballistics, Virosomes, liposomes, immunoli matically labeled antibodies, and the like, as described herein. posomes, polycation or lipid:nucleic acid conjugates, naked Alternatively, a reporter gene system can be devised using DNA, artificial virions, and agent-enhanced uptake of DNA. the target gene promoter operably linked to a reporter gene 25 Lipofection is described in e.g., U.S. Pat. No. 5,049.386, U.S. Such as luciferase, green fluorescent protein, CAT, or 3-gal. Pat. No. 4,946,787; and U.S. Pat. No. 
4,897,355) and lipofec The reporter construct is typically co-transfected into a cul tion reagents are sold commercially (e.g., TransfectamTM and tured cell. After treatment with the dominant negative protein LipofectinTM). Cationic and neutral lipids that are suitable for of choice, the amount of reporter gene transcription, transla efficient receptor-recognition lipofection of polynucleotides tion, or activity is measured according to standard techniques 30 include those of Feigner, WO 91/17424, WO 91/16024. known to those of skill in the art. Delivery can be to cells (ex vivo administration) or target Another example of an assay format useful for monitoring tissues (in vivo administration). dominant negative protein regulation of endogenous gene The preparation of lipid:nucleic acid complexes, including expression is performed in vivo. This assay is particularly targeted liposomes such as immunolipid complexes, is well useful for examining dominant negative proteins that inhibit 35 known to one of skill in the art (see, e.g., Crystal, Science expression of tumor promoting genes, genes involved in 270:404-410 (1995); Blaese et al., Cancer Gene Ther. 2:291 tumor Support, such as neovascularization (e.g., VEGF), or 297 (1995); Behr et al., Bioconjugate Chem. 5:382-389 that activate tumor Suppressor genes such as p53. In this (1994); Remy et al., Bioconjugate Chem. 5:647-654 (1994); assay, cultured tumor cells expressing the dominant negative Gao et al., Gene Therapy 2:710-722 (1995); Ahmad et al., protein of choice are injected Subcutaneously into an immune 40 Cancer Res. 52:4817-4820 (1992); U.S. Pat. Nos. 4,186,183, compromised mouse Such as an athymic mouse, an irradiated 4,217,344, 4,235,871, 4,261,975, 4,485,054, 4,501,728, mouse, or a SCID mouse. After a suitable length of time, 4,774,085, 4,837,028, and 4,946,787). 
preferably 4-8 weeks, tumor growth is measured, e.g., by The use of RNA or DNA viral based systems for the deliv Volume or by its two largest dimensions, and compared to the ery of nucleic acids encoding engineered dominant negative control. Tumors that have statistically significant reduction 45 protein take advantage of highly evolved processes for target (using, e.g., Student's T test) are said to have inhibited ing a virus to specific cells in the body and trafficking the viral growth. Alternatively, the extent of tumor neovascularization payload to the nucleus. Viral vectors can be administered can also be measured. Immunoassays using endothelial cell directly to patients (in vivo) or they can be used to treat cells specific antibodies are used to stain for vascularization of the in vitro and the modified cells are administered to patients (ex tumor and the number of vessels in the tumor. Tumors that 50 vivo). Conventional viral based systems for the delivery of have a statistically significant reduction in the number of dominant negative proteins could include retroviral, lentivi vessels (using, e.g., Student's T test) are said to have inhibited rus, adenoviral, adeno-associated and herpes simplex virus neovascularization. vectors for gene transfer. Viral vectors are currently the most Transgenic and non-transgenic animals are also used as a efficient and Versatile method of gene transfer in target cells preferred embodiment for examining regulation of endog 55 and tissues. Integration in the host genome is possible with enous gene expression in vivo. Transgenic animals typically the retrovirus, lentivirus, and adeno-associated virus gene express the dominant negative protein of choice. Alterna transfer methods, often resulting in long term expression of tively, animals that transiently express the dominant negative the inserted transgene. 
Additionally, high transduction effi protein of choice, onto which the dominant negative protein ciencies have been observed in many different cell types and has been administered in a delivery vehicle, can be used. 60 target tissues. Regulation of endogenous gene expression is tested using any The tropism of a retrovirus can be altered by incorporating one of the assays described herein. foreign envelope proteins, expanding the potential target Nucleic Acids Encoding Dominant Negative Proteins and population of target cells. Lentiviral vectors are retroviral Gene Therapy vector that are able to transduce or infect non-dividing cells Conventional viral and non-viral based genetransfer meth 65 and typically produce high viral titers. Selection of a retrovi ods can be used to introduce nucleic acids encoding engi ral gene transfer system would therefore depend on the target neered dominant negative proteins in mammalian cells or tissue. Retroviral vectors are comprised of cis-acting long US 8,148,129 B2 23 24 terminal repeats with packaging capacity for up to 6-10kb of cells Such as those found in the liver, kidney and muscle foreign sequence. The minimum cis-acting LTRS are Suffi system tissues. Conventional Advectors have a large carrying cient for replication and packaging of the vectors, which are capacity. An example of the use of an Ad Vector in a clinical then used to integrate the therapeutic gene into the target cell trial involved polynucleotide therapy for antitumor immuni to provide permanent transgene expression. Widely used ret Zation with intramuscular injection (Sterman et al., Hum. roviral vectors include those based upon murine leukemia Gene Ther. 7:1083-9 (1998)). Additional examples of the use virus (Mul V), gibbon ape leukemia virus (Gal. 
V), Simian of adenovirus vectors for gene transfer in clinical trials Immuno deficiency virus (SIV), human immuno deficiency include Rosenecker et al., Infection 24:15-10 (1996); Ster virus (HIV), and combinations thereof (see, e.g., Buchscher man et al., Hum. Gene Ther: 9:71083-1089 (1998); Welsh et et al., J. Virol. 66:2731-2739 (1992); Johann et al., J. Virol. 10 al., Hum. Gene Ther. 2:205-18 (1995); Alvarez et al., Hum. 66:1635-1640 (1992); Sommerfelt et al., Virol. 176:58-59 Gene Ther. 5:597-613 (1997); Topfet al., Gene Ther. 5:507 (1990); Wilson et al., J. Virol. 63:2374-2378 (1989); Milleret 513 (1998); Sterman et al., Hum. Gene Ther. 7:1083-1089 al., J. Virol. 65:2220-2224 (1991); PCT/US94/05700). (1998). In applications where transient expression of the dominant Packaging cells are used to form virus particles that are negative protein is preferred, adenoviral based systems are 15 capable of infecting a host cell: Such cells include 293 cells, typically used. Adenoviral based vectors are capable of very which package adenovirus, and p2 cells or PA317 cells, high transduction efficiency in many cell types and do not which package retrovirus. Viral vectors used in gene therapy require cell division. With such vectors, high titer and levels are usually generated by producer cell line that packages a of expression have been obtained. This vector can be pro nucleic acid vector into a viral particle. The vectors typically duced in large quantities in a relatively simple system. Adeno contain the minimal viral sequences required for packaging associated virus (AAV) vectors are also used to transduce and Subsequent integration into a host, other viral sequences cells with target nucleic acids, e.g., in the in vitro production being replaced by an expression cassette for the protein to be of nucleic acids and peptides, and for in vivo and ex vivo gene expressed. 
The missing viral functions are Supplied in trans therapy procedures (see, e.g., West et al., Virology 160:38-47 by the packaging cell line. For example, AAV vectors used in (1987); U.S. Pat. No. 4,797,368; WO 93/24641; Kotin, 25 gene therapy typically only possess ITR sequences from the Human Gene Therapy 5:793-801 (1994); Muzyczka, J. Clin. AAV genome which are required for packaging and integra Invest. 94:1351 (1994). Construction of recombinant AAV tion into the host genome. Viral DNA is packaged in a cell vectors are described in a number of publications, including line, which contains a helperplasmid encoding the other AAV U.S. Pat. No. 5,173,414: Tratschin et al., Mol. Cell. Biol. genes, namely rep and cap, but lacking ITR sequences. The 5:3251-3260 (1985); Tratschin, et al., Mol. Cell. Biol. 4:2072 30 cell line is also infected with adenovirus as a helper. The 2081 (1984); Hermonat & Muzyczka, PNAS 81:6466-6470 helper virus promotes replication of the AAV vector and (1984); and Samulski et al., J. Virol. 63:03822-3828 (1989). expression of AAV genes from the helper plasmid. The helper In particular, at least six viral vector approaches are cur plasmid is not packaged in significant amounts due to a lack rently available for gene transfer in clinical trials, with retro of ITR sequences. Contamination with adenovirus can be viral vectors by far the most frequently used system. All of 35 reduced by, e.g., heat treatment to which adenovirus is more these viral vectors utilize approaches that involve comple sensitive than AAV. mentation of defective vectors by genes inserted into helper In many gene therapy applications, it is desirable that the cell lines to generate the transducing agent. gene therapy vector be delivered with a high degree of speci pIASN and MFG-S are examples are retroviral vectors ficity to a particular tissue type. 
A viral vector is typically that have been used in clinical trials (Dunbar et al., Blood 40 modified to have specificity for a given cell type by express 85:3048-305 (1995); Kohn et al., Nat. Med. 1: 1017-102 ing a ligand as a fusion protein with a viral coat protein on the (1995); Malech et al., PNAS 94:22 12133-12138 (1997)). viruses outer surface. The ligand is chosen to have affinity for PA317/pLASN was the first therapeutic vector used in a gene a receptor known to be present on the cell type of interest. For therapy trial. (Blaese et al., Science 270:475-480 (1995)). example, Han et al., PNAS 92.9747-9751 (1995), reported Transduction efficiencies of 50% or greater have been 45 that Moloney murine leukemia virus can be modified to observed for MFG-S packaged vectors. (Ellem et al., Immu express human heregulin fused to gp70, and the recombinant nol Immunother. 44(1):10-20 (1997); Dranoff et al., Hum. virus infects certain human breast cancer cells expressing Gene Ther: 1:111-2 (1997). human epidermal growth factor receptor. This principle can Recombinant adeno-associated virus vectors (rAAV) are a be extended to other pairs of virus expressing a ligand fusion promising alternative gene delivery systems based on the 50 protein and target cell expressing a receptor. For example, defective and nonpathogenic parvovirus adeno-associated filamentous phage can be engineered to display antibody type 2 virus. All vectors are derived from a plasmid that fragments (e.g., FAB or FV) having specific binding affinity retains only the AAV 145 by inverted terminal repeats flank for virtually any chosen cellular receptor. Although the above ing the transgene expression cassette. Efficient gene transfer description applies primarily to viral vectors, the same prin and stable transgene delivery due to integration into the 55 ciples can be applied to nonviral vectors. 
Such vectors can be genomes of the transduced cell are key features for this vector engineered to contain specific uptake sequences thought to system. (Wagner et al., Lancet 351:91.17 1702-3 (1998), favor uptake by specific target cells. Kearns et al., Gene Ther: 9:748-55 (1996)). Gene therapy vectors can be delivered in vivo by adminis Replication-deficient recombinant adenoviral vectors (Ad) tration to an individual patient, typically by Systemic admin are predominantly used for colon cancer gene therapy, 60 istration (e.g., intravenous, intraperitoneal, intramuscular, because they can be produced at high titer and they readily Subdermal, or intracranial infusion) or topical application, as infect a number of different cell types: Most adenovirus vec described below. Alternatively, vectors can be delivered to tors are engineered such that a transgene replaces the Ad E1a, cells ex vivo. Such as cells explanted from an individual E1b, and E3 genes; subsequently the replication defector patient (e.g., lymphocytes, bone marrow aspirates, tissue vector is propagated in human 293 cells that supply deleted 65 biopsy) or universal donor hematopoietic stem cells, fol gene function in trans. Ad Vectors can transduce multiply lowed by reimplantation of the cells into a patient, usually types of tissues in vivo, including nondividing, differentiated after selection for cells which have incorporated the vector. US 8,148,129 B2 25 26 Ex vivo cell transfection for diagnostics, research, or for found to be the third helix of the protein, from amino acid gene therapy (e.g., via re-infusion of the transfected cells into position 43 to 58 (see, e.g., Prochiantz, Current Opinion in the host organism) is well known to those of skill in the art. In Neurobiology 6:629-634 (1996)). 
a preferred embodiment, cells are isolated from the subject organism, transfected with a dominant negative protein nucleic acid (gene or cDNA), and re-infused back into the subject organism (e.g., patient). Various cell types suitable for ex vivo transfection are well known to those of skill in the art (see, e.g., Freshney et al., Culture of Animal Cells, A Manual of Basic Technique (3rd ed. 1994), and the references cited therein for a discussion of how to isolate and culture cells from patients).

In one embodiment, stem cells are used in ex vivo procedures for cell transfection and gene therapy. The advantage to using stem cells is that they can be differentiated into other cell types in vitro, or can be introduced into a mammal (such as the donor of the cells) where they will engraft in the bone marrow. Methods for differentiating CD34+ cells in vitro into clinically important immune cell types using cytokines such as GM-CSF, IFN-γ and TNF-α are known (see Inaba et al., J. Exp. Med. 176:1693-1702 (1992)).

Stem cells are isolated for transduction and differentiation using known methods. For example, stem cells are isolated from bone marrow cells by panning the bone marrow cells with antibodies which bind unwanted cells, such as CD4+ and CD8+ (T cells), CD45+ (panB cells), GR-1 (granulocytes), and Iad (differentiated antigen presenting cells) (see Inaba et al., J. Exp. Med. 176:1693-1702 (1992)).

Vectors (e.g., retroviruses, adenoviruses, liposomes, etc.) containing therapeutic dominant negative protein nucleic acids can also be administered directly to the organism for transduction of cells in vivo. Alternatively, naked DNA can be administered. Administration is by any of the routes normally used for introducing a molecule into ultimate contact with blood or tissue cells. Suitable methods of administering such nucleic acids are available and well known to those of skill in the art, and, although more than one route can be used to administer a particular composition, a particular route can often provide a more immediate and more effective reaction than another route.

Pharmaceutically acceptable carriers are determined in part by the particular composition being administered, as well as by the particular method used to administer the composition. Accordingly, there is a wide variety of suitable formulations of pharmaceutical compositions of the present invention, as described below (see, e.g., Remington's Pharmaceutical Sciences, 17th ed., 1989).

Delivery Vehicles for Dominant Negative Proteins

An important factor in the administration of polypeptide compounds, such as the dominant negative proteins of the present invention, is ensuring that the polypeptide has the ability to traverse the plasma membrane of a cell, or the membrane of an intra-cellular compartment such as the nucleus. Cellular membranes are composed of lipid-protein bilayers that are freely permeable to small, nonionic lipophilic compounds and are inherently impermeable to polar compounds, macromolecules, and therapeutic or diagnostic agents. However, proteins and other compounds such as liposomes have been described, which have the ability to translocate polypeptides such as dominant negative proteins across a cell membrane.

For example, "membrane translocation polypeptides" have amphiphilic or hydrophobic amino acid subsequences that have the ability to act as membrane-translocating carriers. In one embodiment, homeodomain proteins have the ability to translocate across cell membranes. The shortest internalizable peptide of a homeodomain protein, Antennapedia, was

Another subsequence, the h (hydrophobic) domain of signal peptides, was found to have similar cell membrane translocation characteristics (see, e.g., Lin et al., J. Biol. Chem. 270:14255-14258 (1995)).

Examples of peptide sequences which can be linked to a dominant negative protein of the invention, for facilitating uptake of dominant negative protein into cells, include, but are not limited to: an 11 amino acid peptide of the tat protein of HIV; a 20 residue peptide sequence which corresponds to amino acids 84-103 of the p16 protein (see Fahraeus et al., Current Biology 6:84 (1996)); the third helix of the 60-amino acid long homeodomain of Antennapedia (Derossi et al., J. Biol. Chem. 269:10444 (1994)); the h region of a signal peptide such as the Kaposi fibroblast growth factor (K-FGF) h region (Lin et al., supra); or the VP22 translocation domain from HSV (Elliot & O'Hare, Cell 88:223-233 (1997)). Other suitable chemical moieties that provide enhanced cellular uptake may also be chemically linked to dominant negative proteins. For example, nuclear localization signals may be appended to enhance uptake into the nuclear compartment of cells.

Toxin molecules also have the ability to transport polypeptides across cell membranes. Often, such molecules are composed of at least two parts (called "binary toxins"): a translocation or binding domain or polypeptide and a separate toxin domain or polypeptide. Typically, the translocation domain or polypeptide binds to a cellular receptor, and then the toxin is transported into the cell. Several bacterial toxins, including Clostridium perfringens iota toxin, diphtheria toxin (DT), Pseudomonas exotoxin A (PE), pertussis toxin (PT), Bacillus anthracis toxin, and pertussis adenylate cyclase (CYA), have been used in attempts to deliver peptides to the cell cytosol as internal or amino-terminal fusions (Arora et al., J. Biol. Chem., 268:3334-3341 (1993); Perelle et al., Infect. Immun., 61:5147-5156 (1993); Stenmark et al., J. Cell Biol. 113:1025-1032 (1991); Donnelly et al., PNAS 90:3530-3534 (1993); Carbonetti et al., Abstr. Annu. Meet. Am. Soc. Microbiol. 95:295 (1995); Sebo et al., Infect. Immun. 63:3851-3857 (1995); Klimpel et al., PNAS U.S.A. 89:10277-10281 (1992); and Novak et al., J. Biol. Chem. 267:17186-17193 (1992)).

Such subsequences can be used to translocate dominant negative proteins across a cell membrane. Dominant negative proteins can be conveniently fused to or derivatized with such sequences. Typically, the translocation sequence is provided as part of a fusion protein. Optionally, a linker can be used to link the dominant negative protein and the translocation sequence. Any suitable linker can be used, e.g., a peptide linker.

The dominant negative protein can also be introduced into an animal cell, preferably a mammalian cell, via liposomes and liposome derivatives such as immunoliposomes. The term "liposome" refers to vesicles comprised of one or more concentrically ordered lipid bilayers, which encapsulate an aqueous phase. The aqueous phase typically contains the compound to be delivered to the cell, i.e., a dominant negative protein.

The liposome fuses with the plasma membrane, thereby releasing the drug into the cytosol. Alternatively, the liposome is phagocytosed or taken up by the cell in a transport vesicle. Once in the endosome or phagosome, the liposome either degrades or fuses with the membrane of the transport vesicle and releases its contents.

In current methods of drug delivery via liposomes, the liposome ultimately becomes permeable and releases the encapsulated compound (in this case, a dominant negative protein) at the target tissue or cell. For systemic or tissue specific delivery, this can be accomplished, for example, in a passive manner wherein the liposome bilayer degrades over time through the action of various agents in the body. Alternatively, active drug release involves using an agent to induce a permeability change in the liposome vesicle. Liposome membranes can be constructed so that they become destabilized when the environment becomes acidic near the liposome membrane (see, e.g., PNAS 84:7851 (1987); Biochemistry 28:908 (1989)). When liposomes are endocytosed by a target cell, for example, they become destabilized and release their contents. This destabilization is termed fusogenesis. Dioleoylphosphatidylethanolamine (DOPE) is the basis of many "fusogenic" systems.

Such liposomes typically comprise a dominant negative protein and a lipid component, e.g., a neutral and/or cationic lipid, optionally including a receptor-recognition molecule such as an antibody that binds to a predetermined cell surface receptor or ligand (e.g., an antigen). A variety of methods are available for preparing liposomes as described in, e.g., Szoka et al., Ann. Rev. Biophys. Bioeng. 9:467 (1980), U.S. Pat. Nos. 4,186,183, 4,217,344, 4,235,871, 4,261,975, 4,485,054, 4,501,728, 4,774,085, 4,837,028, 4,946,787, PCT Publication No. WO 91/17424, Deamer & Bangham, Biochim. Biophys. Acta 443:629-634 (1976); Fraley, et al., PNAS 76:3348-3352 (1979); Hope et al., Biochim. Biophys. Acta 812:55-65 (1985); Mayer et al., Biochim. Biophys. Acta 858:161-168 (1986); Williams et al., PNAS 85:242-246 (1988); Liposomes (Ostro (ed.), 1983, Chapter 1); Hope et al., Chem. Phys. Lip. 40:89 (1986); Gregoriadis, Liposome Technology (1984) and Lasic, Liposomes: from Physics to Applications (1993). Suitable methods include, for example, sonication, extrusion, high pressure/homogenization, microfluidization, detergent dialysis, calcium-induced fusion of small liposome vesicles and ether-fusion methods, all of which are well known in the art.

In certain embodiments of the present invention, it is desirable to target the liposomes of the invention using targeting moieties that are specific to a particular cell type, tissue, and the like. Targeting of liposomes using a variety of targeting moieties (e.g., ligands, receptors, and monoclonal antibodies) has been previously described (see, e.g., U.S. Pat. Nos. 4,957,773 and 4,603,044).

Examples of targeting moieties include monoclonal antibodies specific to antigens associated with neoplasms, such as prostate cancer specific antigen and MAGE. Tumors can also be diagnosed by detecting gene products resulting from the activation or over-expression of oncogenes, such as ras or c-erbB2. In addition, many tumors express antigens normally expressed by fetal tissue, such as the alphafetoprotein (AFP) and carcinoembryonic antigen (CEA). Sites of viral infection can be diagnosed using various viral antigens such as hepatitis B core and surface antigens (HBVc, HBVs), hepatitis C antigens, Epstein-Barr virus antigens, human immunodeficiency type-1 virus (HIV1) and papilloma virus antigens. Inflammation can be detected using molecules specifically recognized by surface molecules which are expressed at sites of inflammation such as integrins (e.g., VCAM-1), selectin receptors (e.g., ELAM-1) and the like.

Standard methods for coupling targeting agents to liposomes can be used. These methods generally involve incorporation into liposomes of lipid components, e.g., phosphatidylethanolamine, which can be activated for attachment of targeting agents, or derivatized lipophilic compounds, such as lipid derivatized bleomycin. Antibody targeted liposomes can be constructed using, for instance, liposomes which incorporate protein A (see Renneisen et al., J. Biol. Chem., 265:16337-16342 (1990) and Leonetti et al., PNAS 87:2448-2451 (1990)).

Doses of Dominant Negative Proteins

For therapeutic applications of dominant negative proteins, the dose administered to a patient, in the context of the present invention, should be sufficient to effect a beneficial therapeutic response in the patient over time. In addition, particular dosage regimens can be useful for determining phenotypic changes in an experimental setting, e.g., in functional genomics studies, and in cell or animal models. The dose will be determined by the condition of the patient, as well as the body weight or surface area of the patient to be treated. The size of the dose also will be determined by the existence, nature, and extent of any adverse side-effects that accompany the administration of a particular compound or vector in a particular patient.

The appropriate dose of an expression vector encoding a dominant negative protein can also be calculated by taking into account the average rate of dominant negative protein expression from the promoter and the average rate of dominant negative protein degradation in the cell. Preferably, a weak promoter such as a wild-type or mutant HSV TK is used.

In determining the effective amount of a dominant negative protein to be administered in the treatment or prophylaxis of disease, the physician evaluates circulating plasma levels of the dominant negative protein or nucleic acid encoding the dominant negative protein, potential dominant negative protein toxicities, progression of the disease, and the production of anti-dominant negative protein antibodies. Administration can be accomplished via single or divided doses.

Pharmaceutical Compositions and Administration

Dominant negative proteins and expression vectors encoding dominant negative proteins can be administered directly to the patient for modulation of gene expression and for therapeutic or prophylactic applications, for example, cancer, ischemia, diabetic retinopathy, macular degeneration, rheumatoid arthritis, psoriasis, HIV infection, sickle cell anemia, Alzheimer's disease, muscular dystrophy, neurodegenerative diseases, vascular disease, cystic fibrosis, stroke, and the like. Examples of microorganisms that can be inhibited by dominant negative protein gene therapy include pathogenic bacteria, e.g., chlamydia, rickettsial bacteria, mycobacteria, staphylococci, streptococci, pneumococci, meningococci and gonococci, klebsiella, proteus, serratia, pseudomonas, legionella, diphtheria, salmonella, bacilli, cholera, tetanus, botulism, anthrax, plague, leptospirosis, and Lyme disease bacteria; infectious fungus, e.g., Aspergillus, Candida species; protozoa such as sporozoa (e.g., Plasmodia), rhizopods (e.g., Entamoeba) and flagellates (Trypanosoma, Leishmania, Trichomonas, Giardia, etc.); viral diseases, e.g., hepatitis (A, B, or C), herpesvirus (e.g., VZV, HSV-1, HSV-6, HSV-II, CMV, and EBV), HIV, Ebola, adenovirus, influenza virus, flaviviruses, echovirus, rhinovirus, coxsackie virus, coronavirus, respiratory syncytial virus, mumps virus, rotavirus, measles virus, rubella virus, parvovirus, vaccinia virus, HTLV virus, dengue virus, papillomavirus, poliovirus, rabies virus, and arboviral encephalitis virus, etc.

Administration of therapeutically effective amounts is by any of the routes normally used for introducing dominant negative protein into ultimate contact with the tissue to be treated. The dominant negative proteins are administered in any suitable manner, preferably with pharmaceutically acceptable carriers. Suitable methods of administering such modulators are available and well known to those of skill in the art, and, although more than one route can be used to administer a particular composition, a particular route can often provide a more immediate and more effective reaction than another route.

Pharmaceutically acceptable carriers are determined in part by the particular composition being administered, as well as by the particular method used to administer the composition. Accordingly, there is a wide variety of suitable formulations of pharmaceutical compositions of the present invention (see, e.g., Remington's Pharmaceutical Sciences, 17th ed. 1985).

The dominant negative proteins, alone or in combination with other suitable components, can be made into aerosol formulations (i.e., they can be "nebulized") to be administered via inhalation. Aerosol formulations can be placed into pressurized acceptable propellants, such as dichlorodifluoromethane, propane, nitrogen, and the like.

Formulations suitable for parenteral administration, such as, for example, by intravenous, intramuscular, intradermal, and subcutaneous routes, include aqueous and non-aqueous, isotonic sterile injection solutions, which can contain antioxidants, buffers, bacteriostats, and solutes that render the formulation isotonic with the blood of the intended recipient, and aqueous and non-aqueous sterile suspensions that can include suspending agents, solubilizers, thickening agents, stabilizers, and preservatives. In the practice of this invention, compositions can be administered, for example, by intravenous infusion, orally, topically, intraperitoneally, intravesically or intrathecally. The formulations of compounds can be presented in unit-dose or multi-dose sealed containers, such as ampules and vials. Injection solutions and suspensions can be prepared from sterile powders, granules, and tablets of the kind previously described.

Functional Genomics Assays

Dominant negative proteins also have use for assays to determine the phenotypic consequences and function of gene expression. The recent advances in analytical techniques, coupled with focused mass sequencing efforts, have created the opportunity to identify and characterize many more molecular targets than were previously available. This new information about genes and their functions will speed along basic biological understanding and present many new targets for therapeutic intervention. In some cases analytical tools have not kept pace with the generation of new data. An example is provided by recent advances in the measurement of global differential gene expression. These methods, typified by gene expression microarrays, differential cDNA cloning frequencies, subtractive hybridization and differential display methods, can very rapidly identify genes that are up or down-regulated in different tissues or in response to specific stimuli. Increasingly, such methods are being used to explore biological processes such as transformation, tumor progression, the inflammatory response, neurological disorders, etc. One can now very easily generate long lists of differentially expressed genes that correlate with a given physiological phenomenon, but demonstrating a causative relationship between an individual differentially expressed gene and the phenomenon is difficult. Until now, simple methods for assigning function to differentially expressed genes have not kept pace with the ability to monitor differential gene expression.

Using conventional molecular approaches, over-expression of a candidate gene can be accomplished by cloning a full-length cDNA, subcloning it into a mammalian expression vector and transfecting the recombinant vector into an appropriate host cell. This approach is straightforward but labor intensive, particularly when the initial candidate gene is represented by a simple expressed sequence tag (EST). Under-expression of a candidate gene by "conventional" methods is yet more problematic. Antisense methods and methods that rely on targeted ribozymes are unreliable, succeeding for only a small fraction of the targets selected. Gene knockout by homologous recombination works fairly well in recombinogenic stem cells but very inefficiently in somatically derived cell lines. In either case large clones of syngeneic genomic DNA (on the order of 10 kb) should be isolated for recombination to work efficiently.

The dominant negative protein technology can be used to rapidly analyze differential gene expression studies. Engineered dominant negative proteins can be readily used to up or down-regulate any endogenous target gene. This makes the dominant negative protein technology ideal for analysis of long lists of poorly characterized differentially expressed genes.

This specific example of using engineered dominant negative proteins to add functional information to genomic data is merely illustrative. Any experimental situation that could benefit from the specific up or down-regulation of a gene or genes could benefit from the reliability and ease of use of engineered dominant negative proteins.

Additionally, greater experimental control can be imparted by dominant negative proteins than can be achieved by more conventional methods. This is because the production and/or function of an engineered dominant negative protein can be placed under small molecule control. Examples of this approach are provided by the Tet-On system, the ecdysone-regulated system and a system incorporating a chimeric factor including a mutant . These systems are all capable of indirectly imparting small molecule control on any endogenous gene of interest or any transgene by placing the function and/or expression of a dominant negative protein under small molecule control.

Transgenic Mice

A further application of the dominant negative protein technology is manipulating gene expression in transgenic animals. Conventional down-regulation of gene expression in transgenic animals is plagued by technical difficulties. Gene knockout by homologous recombination is the method most commonly applied currently. This method requires a relatively long genomic clone of the gene to be knocked out (ca. 10 kb). Typically, a selectable marker is inserted into an exon of the gene of interest to effect the gene disruption, and a second counter-selectable marker is provided outside of the region of homology to select homologous versus non-homologous recombinants. This construct is transfected into embryonic stem cells and recombinants are selected in culture. Recombinant stem cells are combined with very early stage embryos, generating chimeric animals. If the chimerism extends to the germline, homozygous knockout animals can be isolated by back-crossing. When the technology is successfully applied, knockout animals can be generated in approximately one year. Unfortunately, two common issues often prevent the successful application of the knockout technology: embryonic lethality and developmental compensation. Embryonic lethality results when the gene to be knocked out plays an essential role in development. This can manifest itself as a lack of chimerism, lack of germline transmission or the inability to generate homozygous back crosses. Genes can play significantly different physiological roles during development versus in adult animals. Therefore, embryonic lethality is not considered a rationale for dismissing a gene target as a useful target for therapeutic intervention in adults. Embryonic lethality most often simply means that the gene of interest cannot be easily studied in mouse models using conventional methods.

Developmental compensation is the substitution of a related gene product for the gene product being knocked out. Genes often exist in extensive families. Selection or induction during the course of development can in some cases trigger the substitution of one family member for another mutant member. This type of functional substitution may not be possible in the adult animal. A typical result of developmental compensation would be the lack of a phenotype in a knockout mouse when the ablation of that gene's function in an adult would otherwise cause a physiological change. This is a kind of false negative result that often confounds the interpretation of conventional knockout mouse models.

A few new methods have been developed to avoid embryonic lethality. These methods are typified by an approach using the cre recombinase and lox DNA recognition elements. The recognition elements are inserted into a gene of interest using homologous recombination (as described above) and the expression of the recombinase is induced in adult mice post-development. This causes the deletion of a portion of the target gene and avoids developmental complications. The method is labor intensive and suffers from chimerism due to non-uniform induction of the recombinase.

The use of engineered dominant negative proteins to manipulate gene expression can be restricted to adult animals using the small molecule regulated systems described in the previous section. Expression and/or function of a dominant negative protein can be switched off during development and switched on at will in the adult animals. This approach relies on the expression of the dominant negative protein only; homologous recombination is not required. Because the dominant negative proteins are trans dominant, there is no concern about germline transmission or homozygosity. These issues dramatically affect the time and labor required to go from a poorly characterized gene candidate (a cDNA or EST clone) to a mouse model. This ability can be used to rapidly identify and/or validate gene targets for therapeutic intervention, generate novel model systems and permit the analysis of complex physiological phenomena (development, hematopoiesis, transformation, neural function etc.). Chimeric targeted mice can be derived according to Hogan et al., Manipulating the Mouse Embryo: A Laboratory Manual (1988); Teratocarcinomas and Embryonic Stem Cells: A Practical Approach, Robertson, ed., (1987); and Capecchi et al., Science 244:1288 (1989).

All publications and patent applications cited in this specification are herein incorporated by reference as if each individual publication or patent application were specifically and individually indicated to be incorporated by reference.

Although the foregoing invention has been described in some detail by way of illustration and example for purposes of clarity of understanding, it will be readily apparent to one of ordinary skill in the art in light of the teachings of this invention that certain changes and modifications may be made thereto without departing from the spirit or scope of the appended claims.

EXAMPLES

The following examples are provided by way of illustration only and not by way of limitation. Those of skill in the art will readily recognize a variety of noncritical parameters that could be changed or modified to yield essentially similar results.

Example 1

Dominant Negative Inhibition of Transcription by Linking Tat to a Protein that Localizes to the Transcriptional Machinery

The Tat-hybrid assay, in which Tat fused to a heterologous RNA-binding domain (RBD) elicits activation of an HIV-1 LTR reporter plasmid containing a cognate RNA-binding site, has been useful for studying RNA-protein interactions in living cells. However, as with other types of fusion protein assays, dominant negative proteins can be generated unintentionally that score as false negatives. We discovered a novel class of highly potent dominant negatives, exemplified by Tat fusions to splicing factors, whose potency appears to be dictated by cotranscriptional recruitment to the HIV promoter.

We devised a dual-fluorescence Tat-hybrid assay to monitor RNA-binding specificity using two pairs of orthogonal reporters and Tat fusions, herein referred to as T-fusions. To calibrate the assay, T-BIV, a fusion between the HIV Tat activation domain (AD) and the RBD of bovine immunodeficiency virus (BIV) Tat, was used to activate a BIV TAR (BTAR)-DsRed reporter, while T-SF1, a Tat fusion to human splicing factor SF1, was used to activate a branch point sequence (BPS)-GFP reporter (FIG. 1a). When transfected on their own, both T-BIV and T-SF1 strongly activated only their cognate RNA reporters. Strikingly, however, activation via the T-BIV-BTAR interaction was strongly inhibited when both T-fusions were co-transfected (3-fold activation) whereas activation via the T-SF1-BPS interaction was unaffected (170-fold).

Using a more quantitative luciferase reporter, we found that inhibition was remarkably potent, with a stoichiometric amount of T-SF1 plasmid DNA (5 ng) sufficient to almost completely block activation mediated by the BIV Tat-BTAR interaction (FIG. 1b). The dose response of inhibition by T-SF1 mirrors activation of a BPS reporter (FIG. 1b), demonstrating that T-SF1 functions as an activator through its cognate RNA-binding site. We confirmed that the high potency observed in the transfection experiments accurately reflected relative protein stoichiometries by Western blot analysis of HA-tagged Tat activator and dominant negative proteins (FIG. 6). It is clear that the high potency results from the fusion, as SF1 alone does not inhibit Tat activation (data not shown) and it is known that the Tat AD without an RBD is a very weak dominant negative. Given that several splicing factors, including SF1 and U2AF65, interact with CTD-associated factors or directly with RNAP II, we hypothesized that the SF1 moiety targets the T-fusion to RNAP II. We propose a model in which this recruitment step increases the local concentration of the non-activating T-fusion at the HIV promoter, thereby out-competing the wild-type Tat activator (see below).

If the targeting hypothesis is correct, then T-fusions to other RNAP II-localized splicing factors might show a similar phenotype. Indeed, T-U2AF65 is an even more potent inhibitor (FIG. 1c, left panel). U2AF65 fusions to either full-length Tat or the Tat AD are equally potent (FIG. 5), showing that the Tat RBD is dispensable for the dominant negative function. T-U2AF65 also is a potent inhibitor of Tat activation when mediated by the Rev-RRE IIB RNA interaction (FIG. 1c, right panel), further demonstrating that the inhibitor functions independently of the RNA-protein interaction. The Tat AD alone is a poor inhibitor (FIG. 1c), again showing the requirement of the targeting moiety. Besides splicing factors, other proteins interact with RNAP II before, during, or after pre-initiation complex (PIC) formation, including other RNA-processing proteins that are co-transcriptionally recruited to the CTD. T-fusions to some, but not all, of these factors inhibited Tat-mediated activation to different extents, but none was as potent as T-U2AF65 (FIG. 7).

The specificity of inhibition for the HIV promoter was assessed by measuring effects of T-U2AF65 on other reporter-activator combinations. No inhibition was observed in any case, including activation by the P-TEFb-dependent MHC class II transactivator (CIITA) and heat-shock factor 1 (HSF1), as well as p53 and GAL4-VP16, and constitutive expression from the cytomegalovirus (CMV) promoter (FIG. 1d). Furthermore, no inhibition of cellular promoters was observed in stable cell lines expressing T-U2AF65 (FIG. 9).

Example 2

Effect of Localization of the Dominant Negative Protein on Inhibition of Transcription

To begin examining the effect of localization on inhibitor activity, we first asked whether nuclear localization alone might account for some of its potency, particularly because a variety of T-fusions showed activity, albeit not as strong as T-U2AF65 (FIG. 7). We generated T-fusions to GFP with or without a nuclear localization signal (NLS) and observed very weak dominant negative activity for the AD fusion alone (T-GFP) and only slightly enhanced inhibition for T-NLS-GFP (FIG. 2a). This result is consistent with the mild dominant negative phenotype observed for a Tat 1-53 truncation mutant that deletes part of the RNA-binding domain but still retains an NLS. In contrast, T-U2AF65-GFP is a highly potent inhibitor (FIG. 2a), indicating that nuclear localization is not the major factor contributing to potency. T-GFP is distributed in the cytoplasm and nucleus, like unfused GFP, whereas T-NLS-GFP is greater than 95% nuclear and absent from the , as expected (FIG. 2a). T-U2AF65-GFP shows a striking subnuclear pattern of "speckle-associated patches" (FIG. 2a). Related patterns are seen with RS-domain-containing proteins, which include U2AF65, prompting us to examine the domains of T-U2AF65 important for inhibition.

The RS domains of U2AF65 and other splicing factors help recruit these proteins to regions of active splicing within the nucleus and also are believed to interact with RNAP II during transcription complex assembly. The presence of RNAP II and splicing and mRNA-export factors suggests an active role for the "speckle-associated patches" in mRNA processing, although they are otherwise considered mainly as storage sites for factors involved in mRNA metabolism. To test the possible involvement of RS domains in dominant negative inhibition, we generated a T-fusion lacking the RS domain (T-U2AF65ΔRS, which contains U2AF65 residues 91-475) and a second with the RS domain alone (T-RS, which contains U2AF65 residues 2-73). Of these, only T-RS remained a potent inhibitor (FIG. 2b). T-RS shows a speckle pattern even more striking than full-length T-U2AF65, with T-RS concentrated in only about 10-30 speckles. To confirm that the Tat AD also is important for inhibition, we generated T-U2AF65 and T-RS mutants with a Lys41-to-Ala substitution in the AD that disrupts interactions with transcriptional co-activators, particularly P-TEFb. Both are inactive as inhibitors despite having the same localization patterns as the non-mutant versions (FIGS. 2b and 8). U2AF65 RS-domain fusions to other transcriptional ADs, including VP16 and E1A, do not inhibit Tat-mediated activation (D'Orso and Frankel, unpublished observations), further demonstrating the specificity of inhibition and the requirement for the Tat AD. Thus, both an RS domain and a functional Tat AD are necessary and sufficient to generate the potent dominant negative phenotype. We envisage a model in which the U2AF65 RS-domain targets the T-fusion to subnuclear compartments (speckles) where transcription complexes are assembling, thereby facilitating the interaction of the Tat AD with one or more factors of the transcriptional machinery assembling at the HIV promoter.

Example 3

Recruitment of the Dominant Negative Protein to the Transcriptional Machinery

To examine the recruitment of T-U2AF65 to the transcriptional machinery, we first analyzed possible interactions with RNAP II by co-immunoprecipitation using antibodies against the Ser5-phosphorylated CTD (Ser5P-CTD), known as RNAP IIa. T-U2AF65-GFP, as well as the K41A Tat AD mutant, are complexed with RNAP IIa in an RNA-independent manner (FIG. 3a). Strikingly, no interaction is seen with T-NLS-GFP lacking the U2AF65 moiety despite the reported interaction of Tat with RNAP II in vitro. Identical results were obtained using antibodies that recognize RNAP II with unphosphorylated CTD (data not shown). Thus, it appears that the U2AF65 RS moiety localizes the inhibitor to transcription complexes more efficiently than the Tat AD, consistent with the observations that U2AF65 interacts with RNAP II and that fusing an RS domain to a cytoplasmic reporter protein results in nuclear localization and interaction with RNAP II. The interaction with RNAP II was confirmed by immunofluorescence, in which T-U2AF65-GFP was seen to co-localize with both unphosphorylated and Ser5P-CTD forms of polymerase (FIG. 3b). Partial co-localization (~18%) was observed with SC35, a marker of speckle-associated patches. Consistent with the hypothesis that the RS domain drives the interaction with RNAP II, T-RS-GFP showed the same co-localization as the full-length U2AF65 T-fusion (data not shown). In addition to interacting with RNAP II, T-U2AF65-GFP also is complexed to P-TEFb (FIG. 3a), as is the Tat AD fusion without the U2AF65 moiety. The Tat AD K41A mutation, known to abrogate the Tat-cyclin T1 interaction, eliminates the interaction of T-U2AF65-GFP with P-TEFb, supporting the hypothesis that inhibitor potency results from bivalent interactions involving both the Tat AD and RS domain.

Example 4

Targeting of the Dominant Negative Protein to the HIV Promoter

A primary function of Tat is to enhance transcription elongation but it also participates in pre-initiation complex assembly. RNase protection experiments using promoter proximal (Pp) and distal (Pd) probes indicate that the T-U2AF65 dominant negative primarily inhibits elongation (FIG. 3c). Tat transfected into HeLa cells substantially enhances transcription in the Pd but not Pp region of a luciferase reporter (compare lanes 1 and 2), as previously reported, whereas a stoichiometric amount of co-transfected T-U2AF65 reduces transcription in the Pd region to basal levels but does not affect Pp transcription (lane 3). Inhibition is dose responsive (data not shown) and requires the U2AF65 moiety, as the Tat AD alone shows little inhibition (lane 4). We next used chromatin immunoprecipitation (ChIP) assays to examine recruitment of RNAP II, Tat, and T-U2AF65 to the HIV promoter and to test the hypothesis that the inhibitor is efficiently localized to the promoter. To assess complex assembly in an integrated chromatin context, we generated a stable HeLa cell line carrying an LTR-RREIIB-FFL reporter, which was strongly activated by T-Rev (215-fold) and inhibited by T-U2AF65 in a dose-responsive manner (FIG. 3d). In the absence of T-Rev, RNAP II is detected in the Pp but not Pd region (panel 1), implying a block to elongation, while RNAP II is seen in both regions following T-Rev transfection (panel 2), as previously reported. The level of RNAP II detected in the Pp region increases ~5-fold in the presence of Tat, consistent with the proposed role of Tat in transcription complex assembly. The T-Rev-HA activator was also detected in the Pp region (panel 2) but, notably, the T-U2AF65-GFP inhibitor showed even higher occupancy (panel 3), consistent with the observation that U2AF65 can be detected in the Pp region in the absence of Tat. To more directly evaluate competition between the activator and inhibitor, we co-transfected both plasmids and observed strong occupancy of T-U2AF65-GFP in the Pp region whereas no T-Rev-HA could be detected (panel 4). Furthermore, the Tat AD alone, without the U2AF65 moiety, was not detectable at the promoter (panel 5, T-NLS-GFP). Thus, the ChIP experiments support the hypothesis that the T-U2AF65 inhibitor is recruited to the HIV promoter through an interaction with RNAP II, efficiently pre-loading the inhibitor into transcription complexes and blocking entry of the Tat activator.

The specificity of dominant negative inhibition for the HIV promoter is clear (FIG. 1d), but the co-localization data (FIG. 3b) suggest that a substantial amount of RNAP II interacts with the inhibitor, prompting us to test whether T-U2AF65 is recruited to other promoters. Of five cellular promoters analyzed by ChIP, including the P-TEFb-dependent MHC class II and hsp70 promoters, only hsp70 showed any detectable T-U2AF65-GFP, unlike the high occupancy observed at the HIV promoter (FIG. 3e). These data indicate that the efficiency of T-U2AF65 recruitment involves interactions other than to RNAP II, likely including interactions with P-TEFb and other factors in the transcription machinery.

Example 5

Use of the Tat Dominant Negative to Inhibit HIV Replication

The high potency of the Tat dominant negatives and the requirement of Tat for viral replication suggested that they might be effective HIV inhibitors. To analyze this we generated SupT1 lymphocyte cell lines stably expressing

viruses, suggesting that Tat levels in these viruses are limiting and/or Tat may benefit viral adaptability. In the inhibitor cell lines, virus that emerged after 18-20 days displayed slow replication kinetics and reached a low plateau of p24 expression that remained constant for at least 110 days (FIGS. 4a and 4b) without producing cytopathic effects. Viral stocks harvested from these cell lines after 30 days displayed identical growth kinetics as the original stock upon re-infection (FIG. 10). Sequencing of integrated viral DNA showed no mutations in the LTR or Tat, indicating that the viruses do not acquire resistance mutations during this time period but rather grow poorly under these conditions of dominant negative inhibitor expression.

Example 6

Tat RBD is Dispensable for Dominant Negative Activity

To assess whether the RBD of Tat contributes to the dominant negative activity, we generated U2AF65 fusions to full-length Tat or Tat, and measured their effects using an LTR-BTAR-RL reporter and Tat-BIV activator. Indeed, both T-HIV-U2AF65 and T-U2AF65 inhibited activation more than 10-fold at sub-stoichiometric plasmid DNA levels relative to the activator (FIG. 5a). Tat, without tethered U2AF65, showed little inhibition. Similarly, full-length Tat is a weak dominant negative inhibitor of BIV Tat-TAR-mediated activation, consistent with a previous report. In a converse experiment, activation of an LTR-HTAR-FFL reporter by Tat-HIV is potently inhibited by T-U2AF65 and T-BIV-U2AF65 but not by un-fused Tat, or T-BIV (FIG. 5b). Additional control experiments showed that T-HIV-U2AF65 and T-BIV-U2AF65 fusion proteins activated expression of their cognate reporters to about 50% of the un-fused protein levels and that expression of non-Tat fused U2AF65 did not inhibit activation (data not shown).

Example 7

Relative Expression Levels of Tat Activator and Dominant Negative

Immunofluorescence experiments showed that the T-Rev activator and T-U2AF65 dominant negative were expressed similarly and localized to the nucleus (FIG. 2a). We analyzed protein levels more quantitatively by Western blot using HA-tagged proteins and confirmed that stoichiometric plasmid levels express similar amounts of protein (FIG. 6).
Thus, the T-U2AF65, T-HIV-U2AF65, or T-BIV-U2AF65 high potency of TU2AF65 is striking given that the best dominant negatives or the non-fusion controls, Tat Tat, reported dominant negative Tat inhibitors require more than T-BIV, or U2AF65, and monitored HIV replication rates 5-fold higher inhibitor levels to reduce activation by less than using viruses dependent on either the HIV or BIV Tat-TAR 10-fold 2-4. interactions'. We observed striking specificity of the domi 55 nant negatives in which replication was inhibited only in Example 8 viruses driven by a non-cognate RNA-protein interaction. Expression of TU2AF65, which contains no TAR RNA Inhibition Activities of Other T-Fusions binding domain, markedly Suppressed replication of both viruses compared to the Tat, or U2AF65 controls, with no 60 The potent inhibition observed with T-SF1 and T-U2AF65 p24 antigen detectable until 18-20 days after infection (FIGS. prompted us to evaluate whetherfusions to other transcription 4a and 4b). Expression of T-HIV-U2AF65 or T-BIV or RNA processing factors might also act as dominant nega U2AF65 inhibited replication of the non-cognate virus to a tives. While T-SF1 was slightly less potent than T-U2AF65, a similar extent as TU2AF65 and showed only a slight inhibi fusion to the SR-protein 9G8 (T-9G8) was nearly as potent as tory effect on the cognate virus (FIGS. 4a and 4b). Interest 65 T-U2AF65 (FIG. 7). Fusions to the CstF1 polyadenylation ingly, expression of the Tat or T-BIV activators actually factor known to be recruited to the CTD 5.6 and to an accelerated replication of the cognate, but not non-cognate hnRNPA1 fusion containing RRM RBDs also showed some US 8,148,129 B2 37 38 modest inhibition (about 4 fold). 
In contrast, fusions to the stable SupT1 populations expressing the full-length Tat moi DNA-binding transcription factors Spl or RelA showed rela ety activated an LTR-HTAR-GFP reporter, varying from 9-20 tively little inhibition (about 2 fold), consistent with a previ fold, while cell lines expressing Tat did not activate. Stable ous report showing little inhibitory effect by fusing Tat to cell populations expressing T-BIV activated an LTR other DNA-binding factors7. TTAF8 also showed no inhi BTAR-GFP reporter about 7-9 fold but not an LTR-HTAR bition, consistent with the proposal that Tat-activation is GFP reporter. The Tat-U2AF65-expressing population exerted through a TFIID-containing TBP complex but inde weakly activated an LTR-BPS-GFP reporter, through its pendent of TBP-associated factors (TAFs) 8. All T-fusions polypyrimidine tract (PPT) binding site 9. This weak activ were nuclear and expressed at similar levels as judged by ity likely reflected the generally lower activation observed indirect immunofluorescence (FIG. 7), except that T-RelA 10 with the U2AF65-PPT interaction 9 and, probably, the low showed more prominent perinuclear localization in the transfection efficiency of the SupT1 cells. Thus, expression of absence of TNF-C. activation. Thus, T-fusions to splicing each Tator Tat-fusion protein could be confirmed by RT-PCR factors containing RS domains (T-U2AF65 and T-9G8) are and functional assays, but expression levels generally the most potent inhibitors. 15 appeared low, as expected for a stable cell population trans Example 9 duced by a retrovirus but not clonally selected 10. Weak expression was further confirmed by Western blot and immu Possible Contribution of Subnuclear Localization to nofluorescence analysis using an anti-Tat antibody where Dominant Negative Activity expression was virtually undetectable (data not shown). 
We also estimated the activities of the integrated dominant Deleting the RS domain of TU2AF65 eliminates dominant negatives in the SupT1 cell lines using functional assays. negative activity (see T-U2AF65ARS in FIG. 2b) and its Cells were co-transfected with a fixed amount of the LTR subcellular localization is strikingly different (FIG. 2b). HTAR-FFL or LTR-BTAR-FFL reporter and varying concen While TU2AF65 shows speckle-associated patches typical trations of the corresponding Tat activator and levels of inhi of splicing factors, U2AF65ARS is spread throughout the 25 bition were measured. For example, SupT1 cells expressing . To evaluate whether the Tat or U2AF65 moi T-U2AF65 and T-BIV-U2AF65 were co-transfected with eties were responsible for these localization patterns, we first the LTR-HTAR-FFL reporter and HIV Tat, and no significant compared localization of T-U2AF65-GFP, U2FA65-GFP. activation was observed at low levels (0.1-1 ng) of transfected and the inactive T(K41A)-U2AF65-GFP variant (FIG. 8). All activator (FIG. 9a). Significant activation was observed with three are localized similarly in speckles (Spk), implying that 30 higher (5-20 ng) plasmid amounts, further confirming that the U2AF65 drives the localization of the dominant negative and cell lines do not express very large amount of protein and that localization is necessary but not sufficient for inhibition. consequently do not block Tat activity completely. It seems An even more striking Subnuclear localization pattern is probable that more highly expressing dominant negative cell seen for T-RS-GFP bearing only the U2AF65 RS-domain in lines can be identified through cloning that would result in which only a few (10-30) bright clusters are observed (FIG. 35 even more effective viral inhibition than observed (FIG. 4). 8). Again, the Tat AD K41A mutation does not alter its local ization. 
Interestingly, an RS-GFP fusion lacking the Tat AD is Example 11 no longer localized to speckles but rather to nucleoli (FIG. 8), suggesting that both the AD and RS domains of T-RS con Lack of Dominant Negative Activity on Cellular tribute to its speckle localization in this shorter context. Dele 40 Promoters tion of the RS-domain in both U2AF65 and T-U2AF65 also eliminates localization to speckles and shows a nuclear pat Transcriptional squelching has been described for many tern with nucleolar exclusion (FIG. 8), further highlighting dominant negative transcription factors, such as yeast Gal4, the importance of the RS domain for speckle localization. and herpes simplex virus VP16, where common components 45 of the transcriptional apparatus become “titrated of of pro Example 10 moters 11,12. Typically, these dominant negatives are rather promiscuous because the target co-activators do not need to Dominant Negative Expression Levels and be bound to the specific promoter. For HIV Tat, for example, Functional Activity in Stable Supt1 Populations it has been shown that Tat over-expression leads to decreased 50 transcription from an MHC class II promoter, because both To assess expression levels of the Tat activators (Tat, Tat Tat and the class II transactivator (CIITA) require P-TEFb to and T-BIV) and dominant negative inhibitors function 13. Because the Tat dominant negatives described (T-U2AF65, T-HIV-U2AF65, and T-BIV-U2AF65) here apparently operate via co-transcriptional recruitment to in the stable SupT1 populations used for the viral replication the HIV promoter, we suspected that they might display pro assays, we first determined mRNA steady-state levels for 55 moter specificity, unlike the more traditional dominant nega each protein by quantitative real-time RT-PCR, using two sets tives. Reporter experiments show that T-U2AF65 has speci of primers that amplify Tat or U2AF65 portions of the ficity for the HIV promoter versus other P-TEFb-regulated mRNAs. 
While the RNA expression levels varied widely promoters (FIG. 1). To further analyze promoter specificity, between samples, all were clearly detectable, with the SupT1 we compared the relative expression levels of nine endog Tat population expressing the highest levels (normalized 60 enous transcripts in the SupT1-Tat (non-inhibitor)- and expression level of 370 units), followed by T-BIV (120 SupT1-T-U2AF65 (inhibitor)-expressing stable cell lines units), Tat (100 units), Tat-U2AF65 and T-BIV using quantitative RT-PCR and observed no significant dif U2AF65 (35 units), and Tat-U2AF65 (7 units). We next char ferences in RNA levels from any of these promoters (FIG. acterized expression in a more functional assay in vivo by 9b). The tested genes encode housekeeping proteins (actin, transfecting each stable cell population with an activatable 65 GAPDH, HPRT1), regulatory factors (TBP. hnRNPA1, GFP reporter, depending on the Tat protein expressed, and EEF1G), and include an MHC class II (HLA-DQA1) and two monitored activity by flow cytometry (data not shown). All other P-TEFb regulated genes (IL-8 and AR) 14. Thus, US 8,148,129 B2 39 40 whereas expression of TU2AF65 effectively blocks Tat acti HIV replication is substantially inhibited by low-level vation and HIV replication, it shows no significant effect on expression of the dominant negative in stable cell lines (FIG. cellular promoters. 4), even without optimizing and selecting for lines with high activity (FIG. 9). It is interesting that these cells establish a Example 12 chronic infection without cytopathic effects, reminiscent of other cellular environments that may resemble latent stages of Virus Emerging from the HIV infection. The balance of Tat clearly affects viral rep Dominant-Negative-Induced Latency-Like State lication rates' and also can drive phenotypic diversity, and Behaves as the Original Stock here we show that expression of the dominant negative pro 10 vides another means to alter the Tat balance. 
Other dominant We observed that virus eventually emerged after 18-20 negative HIV proteins have been used to suppress HIV rep days in the inhibitor-containing cell lines but with low repli lication, including the nuclear export-deficient Rev M10 cation kinetics and reaching a low steady-state plateau of p24 mutant, but resistance mutations have been found and expression (FIG. 4). No mutations were found in these emer relatively high expression levels are required for inhibition gent viruses in the LTR or Tat coding region (data not shown), 15 despite the oligomeric nature of Rev. It will beinteresting Suggesting the cellular expression of the dominant negative to examine mechanisms by which resistance to the Tat domi inhibitor continuously suppressed replication. To test this, we nant negative might arise and to evaluate its therapeutic harvested viruses that emerged after 30 days and performed a potential. re-infection experiment to compare the kinetics of the origi Methods and Materials nal and emergent viruses. Indeed, identical growth kinetics Transcriptional Activation and Inhibition Reporter Assays were observed when the initial or new viral stocks were used HeLa cells were transfected with GFP or firefly luciferase to infect the SupT1-T-U2AF65 inhibitor cell line, reaching (FFL) reporter plasmids (typically 25 ng), appropriate the same chronic p24-expressing plateau, whereas rapid amounts of Tatactivator and inhibitor plasmids, and 5 ng of a growth was observed for both stocks in the SupT1-Tat, CMV-Renilla luciferase (RL) plasmid using the Polyfect lipid control cells (FIG. 10). As expected, inhibitor-expressing 25 transfection reagent (Qiagen) in a 48-well format. Reporter cells infected at a high m.o. i. (10 versus 1) showed a cyto activity was measured 48 hr post-transfection using a Becton pathic effect, although again slower replication kinetics was Dickinson FACS Calibur (FIG. 
1a) or Dual-Glo luciferase observed than in the control cells (data not shown). Thus, assay (Promega). All LTR reporter plasmids used contained even with low inhibitor expression and a high m.o.i. Some an internal ribosome entry site (IRES) upstream of the FFL protective effect still is seen, highlighting the efficacy of the 30 gene to ensure efficient translation irrespective of the 5' UTR inhibitor and the balance between activator and inhibitor sequence used, and RL activity was used to normalize for observed upon transfection of the SupT1 cell lines (FIG. 9a). transfection efficiencies. For experiments presented in FIG. Conclusions 2, cells were transfected with 10 ng of activator and 2.5 or 10 The potent Tat dominant negative inhibitors described in ng of Tat-fusion plasmids. All activation assays were per this work represent a new mechanistic class in which we 35 formed in triplicate, and error bars’ represent the SD of the hypothesize that a transcription factor AD is efficiently Ca. recruited to its promoter via a tethering signal, in this case an Microsopy RS domain, among other specific contacts with the transcrip HeLa or stably-integrated HeLa LTR-RREIIB-FFL cells tional apparatus. Unlike other dominant negatives, these Tat were grown to 50% confluence on glass cover slips, trans inhibitors function at stoichiometric or even sub-stoichiomet 40 fected with 100 ng of plasmid DNAs, fixed in 4% paraform ric levels and do not require the considerable over-expression aldehyde in 1x PBS buffer (pH 7.6) 24 hr post-transfection, typically required for Squelching or other simple competition rinsed twice with PBS, and permeabilized with PBS-Triton mechanisms''. We speculate that their specificity and 0.5% for 10 min at 4° C. 
Nonspecific antibody sites were potency is imposed by localization, first at the Sub-cellular blocked in 1x PBS, 3% goat serum, and 4% BSA for 1 hr at and sub-nuclear levels and second by efficient recruitment to 45 room temperature, cells were incubated with primary anti the promoter. Ptashne and Gann proposed the concept of bodies for 1 hr at room temperature, washed three times with “regulated localization', where specificity typically is PBS, incubated with appropriate Alexa 488- or Alexa 546 imposed by simple binding interactions between a locator, the coupled secondary antibodies (Molecular Probes) for 1 hr at transcriptional machinery, and the DNA. We propose that room temperature, and washed three times with PBS. Cells combining localization functions within a single polypeptide 50 were mounted on DAPI-containing Vecta-shield slides (Vec can Substantially enhance activity. In the case of the tor Labs). Light microscopy was done using an LSM510 T-U2AF65 inhibitor, it appears that the Tat AD provides the confocal microscope (Zeiss) and images were processed dominant negative function, in part through interactions with using LSM(Zeiss) software. P-TEFb at the HIV promoter, while the RS domain, provides Co-Immunoprecipitation additional localization and timing functions utilizing co-tran 55 To examine association of dominant negative inhibitors Scriptional mechanisms that RNA-processing factors, includ with RNAP II, HeLa cells were transiently transfected with ing SR proteins, use to load into transcription complexes''. T-U2AF65-GFP, T(K41A)-U2AF65-GFP, or T-NLS-GFP. This hypothesis is supported by the observations that RS and nuclear extracts were prepared with RIPA buffer. 
Half of domain-containing proteins localize to Sub-nuclear speckles, the extract was used directly for the immunoprecipitation and which are thought to anchor splicing factors to the nuclear 60 the remaining half was treated with 1 g of RNAse A, which matrix and facilitate assembly with RNAP II, and that Tat was sufficient to quantitatively digest the RNA from 10° and P-TEFb co-localize to nuclear speckles’. It remains to be HeLa cells. RNAP II was immunoprecipitated using agarose determined if other transcription factors, including those that conjugated to 8WG16 and H14 antibodies overnight at 4°C. do not function at the elongation step, can be efficiently with mild shaking. Similarly, GFP-tagged proteins were localized and assembled into transcription complexes in a 65 immunoprecipitated using agarose-conjugated GFP-antibod similar manner, and if other types of targeting domains may ies. After centrifuging and washing the beads immunocom be used. plexes were dissociated by boiling for 10 min in 2x gel US 8,148,129 B2 41 42 loading buffer, samples were separated by 10% SDSPAGE, Dominant Negative-Expressing SupT1 Cells and Viral Rep transferred to PVDF, and analyzed by Western blot. lication Kinetics. RNase Protection Assay Plasmids expressing Tat Tat, T-BIV, T-U2AF65. HeLa cells were transfected with the plTR-HTAR-FFL T-HIV-U2AF65, T-BIV-U2AF65, and U2AF65 were reporter alone or with activator and inhibitor-expressing plas constructed in a pBMN retroviral vector (kindly provided by mids, total RNA was extracted using TRIZol (Invitrogen), and G. Nolan), using an SV40 promoter to express the Tat or 15ug of each sample was hybridized with proximal and distal Tat-fusion proteins. Plasmids were transfected into ONX probes corresponding to HIV promoter and luciferase ORF packaging cells using the Polyfect reagent, and the retrovirus regions, respectively. 
The antisense probes were synthesized containing Supernatant recovered after 48 hr was used to 10 transduce human CD4+ SupT1 cells. Populations of stable using a T3/T7 MaxiScript kit (Ambion) from plasmid tem integrants were selected by growing cells in 2 mg/ml G418 plates linearized at a Kipni site, hybridization was performed (Invitrogen) for at least 4 weeks. Relative expression levels with approximately 10,000 cpm of 'P-CTP-labeled probe for each protein were assessed by real-time RT-PCR, tran (in 80% formamide, 40 mM PIPES, 400 mM. NaCl, 1 mM Scriptional activation of transfected reporter plasmids and EDTA) incubated at 42°C. overnight, RNase digestion was 15 Western bloting (Supplementary Information). Each stable performed for 1.5 hr at 30° C. (in 10 mM Tris pH 8.0,300 mM SupT1 population was infected with an HIV Tat-TAR-depen NaCl, 5 mM EDTA, 11 units/ml of RNase A, 11 units/ml dent (R7HTat/HTAR) or BIV Tat-TAR-dependent (R7 RNaseT1), samples were treated with proteinase K, extracted HBTat/BTAR) virus' at an m.o.i of 1. Supernatant samples with phenol/chloroform, and RNA duplexes were precipi were harvested at different intervals following infection and tated with ethanol and glycogen carrier. RNAS were separated the amount of viral replication was monitored by p24 antigen on a 6% polyacrylamide/8 M urea gel and visualized and expression using ELISA (Immuno Diagnostics, Inc.) over a quantified using a Typhoon phosphorimager (Molecular period of 110 days. Each experiment was performed in dupli Dynamics). Experiments were performed in duplicate, with cate and mean values of p24 were calculated. errors bars representing the SD of the mean. RNA. Isolation and Expression Levels by Quantitative Real Selection of a HeLA LTR-RREIIB-FFL Reporter Cell Line 25 Time RT-PCR and ChIP Assays Total RNA was isolated from cells using the Trizol reagent HeLa cells were transfected in 6-well plates with a according to manufacturer instructions (Invitrogen). 
Ran pcDNA3.1-derived plasmid (Invitrogen) bearing the LTR domly primed cDNA was prepared from 1 ug of total RNA RREIIB-FFL using Polyfect reagent (Qiagen). Clones were using MMULV reverse transcriptase (New England Biolabs). 30 One twentieth of the resultant cDNA was amplified in 35 ul Selected over more than four weeks in D-MEM-10% FBS reactions containing 1.25 units of Taq DNA polymerase supplemented with 750 ug/ml of G418 (Gibco). Twenty (ABI), 1.5 mMMgCl, 300 nM of each primer, 0.5 mM dNTP clones were analyzed for activation by pSV-T-Rev-HA by mix, and 0.2x SYBR green I dye (Molecular Probes) in 1x luciferase assays and a single highly active clone was chosen Taq polymerase buffer. Real-time PCR was performed in an for ChIP analyses. ChIP assays were performed as 35 Opticon 2 DNA Engine (MJ Research) and analyzed using described with minor modifications. HeLa LTR-RREIIB the Ct method (Applied Biosystems Prism 7700). FFL reporter cells were transfected with various expressor Expression Analysis by Western Blot plasmids (5 ug each) using 30 ul of Lipofectamine 2000 To more quantitatively assess relative inhibitor and activa (Invitrogen) per 25 cm culture dish, incubated for 36 hr, and tor expression levels, HeLa cells were co-transfected with washed in PBS. Chromatin was cross-linked with 1% form 40 300 ng of pEGFPN3 (Clontech) and either 1.35ug of pSV2 aldehyde for 15 min at RT and the reaction stopped by adding T-Rev-HA, 1.35 ug pSV2-T-U2AF65-HA, or both plasmids glycine to 125 mM. Cells were washed with PBS and har in 6-well plates. Nuclear extracts were prepared using NE vested in RIPA buffer, and samples were sonicated to gener PER reagents (Pierce), samples were separated on a 12.5% ate DNA fragments <500 bp. For immuno-precipitations, 1 SDS-PAGE gel, transferred to nitrocellulose, and probed with mg of protein extract was pre-cleared for 2 hr with 40 ul of a 45 anti-HA, anti-GFP, or anti-nucleolin antibodies. 
50% slurry of 50:50 protein A/G-agarose and -then incubated Functional Analysis of Protein Expression and Activity in with protein A/G-agarose and the appropriate antibodies SupT1 Cell Lines overnight at 4°C. preblocked with 1 mg/ml and 0.3 mg/ml of Stable SupT1 G418-resistant cell populations (3x10° cells) salmon sperm DNA. Immunocomplexes were recovered were transfected by electroporation (Bio-Rad, 250V, 0.975 using anti-rabbit IgG/protein A/G-agarose beads (Santa 50 uF) with LTR-HTAR-GFP or LTR-BPS-BTAR-GFP report Cruz), beads were washed twice with RIPA buffer, four times ers to assess the activities of integrated plasmids expressing with ChIP wash buffer (100 mM Tris-HCl, pH 8.5, 500 mM Tat or T-fusion proteins. After 48 hours, cells were analyzed LiCL, 1% v?v Nonidet P-40, 1% w/v deoxycholic acid), twice by flow cytometry and GFP activity was quantitated using with RIPA buffer, and twice with 1x TE buffer. Immunocom Celquest software (Becton Dickinson). Populations express plexes were eluted in 1% SDS for 10 min at 65° C. and 55 ing Tat and derivatives were transfected with LTR-HTAR cross-linking was reversed by adjusting to 200 mMNaCl and GFP. populations expressing T-BIV, and derivatives were incubating for 5 hr at 65° C. A fraction of purified DNA was transfected with LTR-BPS-BTAR-GFP, and populations used for PCR amplification, with 25-32 cycles performed in expressing U2AF65 fusions were transfected with LTR-BPS the exponential range depending on the particular primers BTAR-GFP, which contains a BPS and PPT that binds and antibodies. To ensure linearity, control PCR reactions 60 U2AF65 cooperatively with SF1 9. For luciferase assays, were performed for one cycleusing twice and half the amount we used the LTR-HTAR-FFL or LTR-BTAR-FFL reporters of sample. PCR products (100-250 bp) were quantified by and CMV-RL as an internal control for data normalization. 
incorporation of SyBr Green and fluorescence detection (MJ Genomic DNA Extraction from SupT1-Infected Cells and Research) and by visualization on 2% agarose gels stained Viral Genome Sequencing with ethidium bromide, using PCR products from known 65 SupT1-T-U2AF65, SupT1-T-BIV-U2AF65, and input DNAS as standards and IQMac 1.2 for analysis. Primer SupT1-T-HIV-U2AF65 infected populations (about sequences are provided in Supplementary Information. 1x10° cells) were harvested 25 days post-infection and US 8,148,129 B2 43 44 genomic DNA was extracted using Flexigene according to TABLE 1-continued manufacturer instructions (Qiagen). DNA was amplified by PCR using Turbo Pfu (Stratagene), with primer pair specific Nuclear localized proteins Last updated: 2006 Feb. 26 to regions of the HIV LTR promoter and surrounding Tat nucleus coding sequence. PCR-amplified DNA was gel purified 5 (Qiagen) and cloned into a TOPO vector (Invitrogen). Eight GO:0005634: nucleus (<10723) GO:0043226: organelle (<55511) clones from each cell population were sequenced, and GO:0043229: intracellular organelle (<55495) sequences were compared to the original viral isolate, HXB2, GO:0043231: intracellular membrane-bound organelle (<51579) using the NCBI BLAST algorithm. GO:0005634: nucleus (<10723) 10 GO:0043227: membrane-bound organelle (<51596) SUPPLEMENTARY REFERENCES GO:0043231: intracellular membrane-bound organelle (<51579) GO:0005634: nucleus (<10723) External References 1. Carol, R. et al., J Virol, 66:2000-7 (1992). 2. Gren, M. et al., Cell, 58:215-23 (1989). InterPro (333) 3 . Pearson, L. et al., Proc Natl Acad Sci USA, 87:5079-83 15 MIPS funcat (1) Pfam (221) (1990). PRINTS (94) . Caputo, A. et al., Gene Ther. 3:235-45 (1996). ProDom (25) . McCracken, S. et al., Nature, 385:357-61 (1997). PROSITE (99) . Fong, N. & Bentley, D. L., Gene Dev, 15:1783-95 (2001). SMART (46) SP KW (1) . Fraisier, C. et al., Gene Ther 5:946-54 (1998). TIGR role (1) . Raha, T. et al., PLoS Biol, 3:e44 (2005). 
All Gene Product Associations 9. Peled-Zehaviet al., Mol Cell Biol, 21:5232-41 (2001). (1790 results) 10. Hamm, T. E. et al., J Virol, 73:5741-7 (1999). Get ALL associations here: 11. Hope, I. A. & Struhl, K., Cell, 46:885-94 (1986). Direct Associations All Associations All Associations With Terms 12. Friedman, A. et al., Nature, 335:452-4 (1988). 25 Filter Associations 13. Kanazawa, S. et al., Immunity, 12:61-70 (2000). DataSource 14. Luecke, H. F. &Yamamoto, K.R., Gene Dev, 19:1116-27 AllFlyBaseSGDMGIgenedb spombeUniProtTAIRdictyBaseWorm baseFinsemblRGDTIGR CMRTIGRFAMSTIGR Athl TIGR Tba1 (2005). Gramenegenedb tsetsegemedb thruceigenedb pfalciparumgenedb major ZFIN TABLE 1. 30 Evidence Code All Curator ApprovedICIMPIGIIPIISSIDAIEPTASNAS Nuclear localized proteins Species Last updated: 2006 Feb. 26 All A. japonica A. niger A. platyrhynchos A. thaliana A. trivirgatus nucleus B. anthracis str. Am B. coronavirus B. indicus B. mori B. taurus C. aethiops C. albicans C. briggsae C. burnetii RSA 493 C. carpio Accession: GO:0005634 C. elegans Cfamiliaris C. griseus Ciacchus C. jejuni RM1221 Ontology: cellular component 35 C. porcelius C. torquatus atys D. discoideum D. erecta D. ethenogenes 195 Synonyms: None D. mauritiana D. melanogaster D. pseudoobscura D. rerio D. Sechelia Definition: D. simulans D. sp. D. virilis D. yakuba E. cabalius F. Catus G. gallus A membrane-bounded organelle of eukaryotic cells in which G. gorilla G. gorilla gorilla G. sulfurreducens PCH. lar H. sapiens are housed and replicated. In most cells, the nucleus contains all of the L. major L. monocytogenes str M. at trait is M. capsulatus Str. B cell's chromosomes except the organellar chromosomes, and is the site of 40 M. fascicularis M. fuscata fiscata M. monax M. mulatta RNA synthesis and processing. In some species, or in specialized cell M. musculus M. musculus castianett M. musculus domestic M. musculius types, RNA metabolism or DNA replication may be absent. Comment: None molossin M. 
musculus musculus M. natalensis M. nemestrina M. parviflora Term Lineage M. tinguiculatus O. aries O. clinicatius O. kitabensis O. longistaminata O. mykiss O. nivara O. officinalis O. sativa O. sativa (indica cu O. sativa Graphical View (japonica O. vulgaris Paniibis Pfalcipartin P. monodon P. pygmaeus all: all (<167657) 45 Psativlin Psyringae pv. toma Psyringae pv. toma Piroglodytes GO:0005575: cellular component (<105038) Panictim R. norvegicus R. sp. S. cerevisiae S. Coronavirus S. oedipus GO:0005623: cell (<75863) S. Oneidensis S. Oneidensis MR-1 S. pombe S. pomeroyi DSS-3 GO:0005622: intracellular (<61387) S. sciureus S. scrofa S. sp. PCC 6803 T. bruceii T. bruceii TREU927 GO:0043229: intracellular organelle (<55495) T. Cambridgei T. vulpecula V. arvensis V. cholerae O1 biova V. Odorata GO:0043231: intracellular membrane-bound organelle (<51579) X. laevis X. tropicalis GO:0005634: nucleus (<10723)

Symbol | Qualifier | Sequence/GOst Information | Source | Evidence | Reference
2A5D_HUMAN | | Splice Isoform Delta-1 of Serine/threonine protein phosphatase 2A, 56 kDa regulatory subunit, delta isoform, protein from Homo sapiens | UniProt | TAS | PMID: 8703017
2A5G_HUMAN | | Splice Isoform Gamma-3 of Serine/threonine protein phosphatase 2A, 56 kDa regulatory subunit, gamma isoform, protein from Homo sapiens | UniProt | IDA | PMID: 8703017

2AAA_HUMAN | | Serine/threonine protein phosphatase 2A, 65 kDa regulatory subunit A, alpha isoform, protein from Homo sapiens | UniProt | NAS | PMID: 11007961
2AAB_HUMAN | | Serine/threonine protein phosphatase 2A, 65 kDa regulatory subunit A, beta isoform, protein from Homo sapiens | UniProt | ISS | UniProt: P30154
2ACC_HUMAN | | Protein phosphatase 2, regulatory subunit B", isoform 1, protein from Homo sapiens | UniProt | TAS | PMID: 10629059
4ET_HUMAN | | Splice Isoform 1 of Eukaryotic translation initiation factor 4E transporter, protein from Homo sapiens | UniProt | TAS | PMID: 10856257
AATF_HUMAN | | Protein AATF, protein from Homo sapiens | UniProt | IDA | PMID: 12429849
AB2BP_HUMAN | | Splice Isoform 1 of Amyloid beta A4 protein-binding family A member 2-binding protein, protein from Homo sapiens | UniProt | NAS | PMID: 10833507
ABCCD_HUMAN | | Splice Isoform 1 of Putative ATP-binding cassette transporter C13, protein from Homo sapiens | UniProt | NAS | UniProt: Q9NSE7
ABL1_HUMAN | | Splice Isoform IA of Proto-oncogene tyrosine-protein kinase ABL1, protein from Homo sapiens | UniProt | NAS | PMID: 8242749
ACINU_HUMAN | | Splice Isoform 1 of Apoptotic chromatin condensation inducer in the nucleus, protein from Homo sapiens | UniProt | IDA | PMID: 10490026
ACL6B_HUMAN | | Actin-like protein 6B, protein from Homo sapiens | UniProt | IDA | PMID: 10380635
ACTN4_HUMAN | | Alpha-actinin 4, protein from Homo sapiens | UniProt | TAS | PMID: 9508771
ADA10_HUMAN | | ADAM 10 precursor, protein from Homo sapiens | UniProt | ISS | UniProt: O14672
ADA2_HUMAN | | Transcriptional adapter 2-like, protein from Homo sapiens | UniProt | TAS | PMID: 8552087
AF9_HUMAN | | Protein AF-9, protein from Homo sapiens | UniProt | TAS | PMID: 8506309
AFF3_HUMAN | | AF4/FMR2 family member 3, protein from Homo sapiens | UniProt | TAS | PMID: 8555498
AHNK_HUMAN | | Neuroblast differentiation-associated protein AHNAK, protein from Homo sapiens | UniProt | NAS | PMID: 1608957
AHR_HUMAN | | Aryl hydrocarbon receptor precursor, protein from Homo sapiens | UniProt | IDA | PMID: 10395741
AIF1_HUMAN | | Allograft inflammatory factor 1, protein from Homo sapiens | UniProt | TAS | PMID: 9614071
AIPL1_HUMAN | | Aryl-hydrocarbon-interacting protein-like 1, protein from Homo sapiens | UniProt | IDA | PMID: 12374762
AIRE_HUMAN | | Splice Isoform 1 of , protein from Homo sapiens | UniProt | NAS | PMID: 9398840
AKAP8_HUMAN | | A-kinase anchor protein 8, protein from Homo sapiens | UniProt | TAS | PMID: 9473338
AKIP_HUMAN | | Aurora kinase A-interacting protein, protein from Homo sapiens | UniProt | IDA | PMID: 12244051
AKP8L_HUMAN | | A-kinase anchor protein-like protein 8, protein from Homo sapiens | UniProt | TAS | PMID: 10761695

ALP_HUMAN    N-acetyltransferase-like protein, protein from Homo sapiens    UniProt    NAS    PMID: 11214970
ALX4_HUMAN    protein aristaless-like 4, protein from Homo sapiens    UniProt    NAS    PMID: 11137991
AN32A_HUMAN    Acidic leucine-rich nuclear phosphoprotein 32 family member A, protein from Homo sapiens    UniProt    IDA    PMID: 11555662
AN32E_HUMAN    Acidic leucine-rich nuclear phosphoprotein 32 family member E, protein from Homo sapiens    UniProt    ISS    UniProt: Q9BTT0
ANDR_HUMAN    Androgen receptor, protein from Homo sapiens    UniProt    IDA    PMID: 15572661
ANKR2_HUMAN    Splice Isoform 1 of Ankyrin repeat domain protein 2, protein from Homo sapiens    UniProt    ISS    PMID: 1204005
ANM1_HUMAN    Splice Isoform 1 of Protein arginine N-methyltransferase 1, protein from Homo sapiens    UniProt    IDA    PMID: 10749851
ANM2_HUMAN    Protein arginine N-methyltransferase 2, protein from Homo sapiens    UniProt    TAS    PMID: 9545638
AP2A_HUMAN    OTTHUMP00000016011, protein from Homo sapiens    UniProt    TAS    PMID: 8321221
APBB1_HUMAN    Splice Isoform 1 of Amyloid beta A4 precursor protein-binding family B member 1, protein from Homo sapiens    UniProt    ISS    UniProt: O00213
    ISS    UniProt: Q96A93
APBB2_HUMAN    Splice Isoform 1 of Amyloid beta A4 precursor protein-binding family B member 2, protein from Homo sapiens    UniProt    ISS    UniProt: Q92870
APBP2_HUMAN    Amyloid protein-binding protein 2, protein from Homo sapiens    UniProt    NAS    PMID: 11742091
APC_HUMAN    Splice Isoform Long of Adenomatous polyposis coli protein, protein from Homo sapiens    UniProt    IDA    PMID: 12072559
APEG1_HUMAN    Hypothetical protein FLJ46856, protein from Homo sapiens    UniProt    TAS    PMID: 8663449
APEX1_HUMAN    DNA-(apurinic or apyrimidinic site) lyase, protein from Homo sapiens    UniProt    IDA    PMID: 9119221
APLP2_HUMAN    Splice Isoform 1 of Amyloid-like protein 2 precursor, protein from Homo sapiens    UniProt    IDA    PMID: 7702756
ARD1EHEHUMAN    N-terminal acetyltransferase complex ARD1 subunit homolog, protein from Homo sapiens    UniProt    IDA    PMID: 15496142
ARI1A_HUMAN    Splice Isoform 1 of AT-rich interactive domain-containing protein 1A, protein from Homo sapiens    UniProt    NAS    UniProt: O14497
ARI2_HUMAN    Ariadne-2 protein homolog, protein from Homo sapiens    UniProt    TAS    PMID: 10422847
ARI3A_HUMAN    AT-rich interactive domain-containing protein 3A, protein from Homo sapiens    UniProt    NAS    UniProt: Q99856
ARI4A_HUMAN    Splice Isoform I of AT-rich interactive domain-containing protein 4A, protein from Homo sapiens    UniProt    TAS    PMID: 8414517

    IDA    PMID: 11283269
ARI5B_HUMAN    Splice Isoform 1 of AT-rich interactive domain-containing protein 5B, protein from Homo sapiens    UniProt    IC    PMID: 15640446
ARL4A_HUMAN    ADP-ribosylation factor-like protein 4A, protein from Homo sapiens    UniProt    TAS    PMID: 10462049
ARL4C_HUMAN    ADP ribosylation factor-like protein 7, protein from Homo sapiens    UniProt    TAS    PMID: 10462049
ARNT2_HUMAN    Aryl hydrocarbon receptor nuclear translocator 2, protein from Homo sapiens    UniProt    IDA    PMID: 12239177
ARNT_HUMAN    Splice Isoform Long of Aryl hydrocarbon receptor nuclear translocator, protein from Homo sapiens    UniProt    TAS    PMID: 1317062
ASCL2_HUMAN    Achaete-scute homolog 2, protein from Homo sapiens    UniProt    NAS    PMID: 8751384
ASH2L_HUMAN    Splice Isoform 1 of Set1/Ash2 histone methyltransferase complex subunit ASH2, protein from Homo sapiens    UniProt    IDA    PMID: 15199122
ASPP1_HUMAN    Apoptosis stimulating of p53 protein 1, protein from Homo sapiens    UniProt    IDA    PMID: 11684014
ATBF1_HUMAN    Splice Isoform A of Alpha-fetoprotein enhancer binding protein, protein from Homo sapiens    UniProt    TAS    PMID: 1719379
ATE1_HUMAN    Splice Isoform ATE1-1 of Arginyl-tRNA-protein transferase 1, protein from Homo sapiens    UniProt    IDA    PMID: 9858543
ATF4_HUMAN    Cyclic AMP-dependent transcription factor ATF-4, protein from Homo sapiens    UniProt    ISS    UniProt: P18848
ATF6B_HUMAN    Splice Isoform 1 of Cyclic AMP-dependent transcription factor ATF-6 beta, protein from Homo sapiens    UniProt    TAS    PMID: 8586413
ATN1_HUMAN    Atrophin-1, protein from Homo sapiens    UniProt    TAS    PMID: 10814707
ATRX_HUMAN    Splice Isoform 4 of Transcriptional regulator ATRX, protein from Homo sapiens    UniProt    TAS    PMID: 7874112
ATX1_HUMAN    Ataxin-1, protein from Homo sapiens    UniProt    TAS    PMID: 7647801
ATX2_HUMAN    Ataxin 2, protein from Homo sapiens    UniProt    TAS    PMID: 10973246
ATX7_HUMAN    Splice Isoform a of Ataxin 7, protein from Homo sapiens    UniProt    TAS    PMID: 10441328
AURKC_HUMAN    Splice Isoform 1 of Serine/threonine-protein kinase 13, protein from Homo sapiens    UniProt    TAS    PMID: 10066797
AXN1_HUMAN    Axin 1, protein from Homo sapiens    UniProt    IDA    PMID: 12072559
AXN2_HUMAN    Axin-2, protein from Homo sapiens    UniProt    IDA    PMID: 12072559
BAP1_HUMAN    Ubiquitin carboxyl-terminal hydrolase BAP1, protein from Homo sapiens    UniProt    TAS    PMID: 9528852
BARD1_HUMAN    BRCA1-associated RING domain protein 1, protein from Homo sapiens    UniProt    IMP    PMID: 15632137
    IDA    PMID: 15265711
BAZ1B_HUMAN    Splice Isoform 1 of Bromodomain adjacent to domain protein 1B, protein from Homo sapiens    UniProt    NAS    PMID: 11124022

BC11A_HUMAN    Splice Isoform 1 of B-cell lymphoma/leukemia 11A, protein from Homo sapiens    UniProt    NAS    UniProt: Q9H165
BCL6_HUMAN    B-cell lymphoma 6 protein, protein from Homo sapiens    UniProt    IDA    PMID: 10898795
BCLF1_HUMAN    Splice Isoform 1 of Bcl-2-associated transcription factor 1, protein from Homo sapiens    UniProt    NAS    UniProt: Q9NYF8
BCOR_HUMAN    Splice Isoform 1 of BCoR protein, protein from Homo sapiens    UniProt    IDA    PMID: 10898795
BHLH2_HUMAN    Class B basic helix-loop-helix protein 2, protein from Homo sapiens    UniProt    NAS    PMID: 9240428
BHLH3_HUMAN    Class B basic helix-loop-helix protein 3, protein from Homo sapiens    UniProt    NAS    UniProt: Q9C0J9
BI1_HUMAN    Bax inhibitor-1, protein from Homo sapiens    UniProt    TAS    PMID: 8530040
BINCA_HUMAN    Splice Isoform 1 of Bcl10-interacting CARD protein, protein from Homo sapiens    UniProt    NAS    PMID: 15637807
BLMH_HUMAN    Bleomycin hydrolase, protein from Homo sapiens    UniProt    TAS    PMID: 8639621
BNC1_HUMAN    Zinc finger protein basonuclin-1, protein from Homo sapiens    UniProt    TAS    PMID: 8034748
BNIPL_HUMAN    Splice Isoform 1 of Bcl-2/adenovirus E1B 19 kDa interacting protein 2-like protein, protein from Homo sapiens    UniProt    IDA    PMID: 11741952
BRCA1_HUMAN    Breast cancer type 1 susceptibility protein, protein from Homo sapiens    UniProt    TAS    PMID: 10918303
BRCA2_HUMAN    Breast cancer type 2 susceptibility protein, protein from Homo sapiens    UniProt    IDA    PMID: 9560268
BRD1_HUMAN    Bromodomain-containing protein 1, protein from Homo sapiens    UniProt    TAS    PMID: 10602503
BRD3_HUMAN    Splice Isoform 1 of Bromodomain-containing protein 3, protein from Homo sapiens    UniProt    NAS    UniProt: Q15059
BRD8_HUMAN    Splice Isoform 1 of Bromodomain-containing protein 8, protein from Homo sapiens    UniProt    NAS    PMID: 8611617
BRPF1_HUMAN    Peregrin, protein from Homo sapiens    UniProt    TAS    PMID: 7906940
BRSK1_HUMAN    Splice Isoform 1 of BR serine/threonine-protein kinase 1, protein from Homo sapiens    UniProt    IDA    PMID: 15150265
BSN_HUMAN    Bassoon protein, protein from Homo sapiens    UniProt    TAS    PMID: 9806829
BT3L2_HUMAN    Transcription factor BTF3 homolog 2, protein from Homo sapiens    UniProt    NAS    UniProt: Q13891
BT3L3_HUMAN    Transcription factor BTF3 homolog 3, protein from Homo sapiens    UniProt    NAS    UniProt: Q13892
BTAF1_HUMAN    TATA-binding-protein associated factor 172, protein from Homo sapiens    UniProt    NAS    UniProt: O14981
BTG1_HUMAN    BTG1 protein, protein from Homo sapiens    UniProt    IMP    PMID: 11420681
    IEP    PMID: 9820826
CABIN_HUMAN    Calcineurin-binding protein Cabin 1, protein from Homo sapiens    UniProt    NAS    UniProt: Q9Y6J0
CAF1B_HUMAN    Chromatin assembly factor subunit B, protein from Homo sapiens    UniProt    NAS    PMID: 9614144

Symbol Evi Refer Qualifier Sequence/GOst Information Source dence ence CARM1 HUMAN Splice Isoform 1 of Oni IDA PM D: 15221992 Histone-arginine methyltransferase CARM1, protein from Homo sapiens CASCS HUMAN Splice Isoform 1 of Oni IDA PM D: 10980622 Cancer susceptibility candidate gene 5 protein, protein from Homo sapiens CASL HUMAN Enhancer of filamentation Oni TAS PM D: 86681.48 1, protein from Homo Sapiens CBP HUMAN CREB-binding protein, Oni TAS PM D: 79132O7 protein from Homo sapiens CBX2 HUMAN Splice Isoform 1 of Oni NAS Oni Prot: Q14781 Chromobox protein homolog 2, protein from Homo sapiens CBX3 HUMAN Chromobox protein homolog Oni TAS PM D: 86.63349 3, protein from Homo sapiens CBX4 HUMAN Chromobox homolog 4, Oni TAS PM D: 9315667 protein from Homo sapiens CC14A HUMAN Splice Isoform 1 of Dual Oni TAS PM D: 936,7992 specificity protein phosphatase CDC14A, protein from Homo sapiens CC14B HUMAN Splice Isoform 2 of Dual Oni IDA PM D: 936,7992 specificity protein phosphatase CDC14B, protein from Homo sapiens CC4SL HUMAN CDC45-related protein, Oni TAS PM D: 966O782 protein from Homo sapiens CCNE1 HUMAN Splice Isoform E1L of Oni NAS Oni Prot: P24864 G1/S-specific cyclin-E1, protein from Homo sapiens CCNH HUMAN Cyclin-H, protein from Oni TAS PM D: 7936635 Homo sapiens CCP1 HUMAN Calcipressin 1 large isoform, Oni TAS PM D: 8S95418 protein from Homo sapiens CD2A1 HUMAN Splice Isoform 1 of Cyclin Oni NR Oni Prot: P42771 dependent kinase inhibitor 2A, isoforms 1/2/3, protein from Homo sapiens CD2L1 HUMAN Splice Isoform SV9 of Oni IEP PM D: 8195233 PITSLRE serine/threonine-protein kinase CDC2L1, protein rom Homo sapiens CD2L2 HUMAN Splice Isoform SV6 of UniProt IEP PMID: 8195233 PITSLRE serine/threonine-protein kinase CDC2L2, protein rom Homo sapiens CD2L7 HUMAN Cell division cycle 2-related Oni IDA PM D: 116833.87 protein kinase 7, protein rom Homo sapiens CDC2HUMAN Hypothetical protein Oni TAS PM D:1076.7298 DKFZp686L20222, protein from Homo sapiens CDC6HUMAN Cell 
division control protein Oni TAS PM D: 95.66895 6 homolog, protein from Homo sapiens CDC7HUMAN Cell division cycle 7 Oni TAS PM related protein kinase, protein from Homo sapiens CDCAS HUMAN Sororin, protein from Oni ISS Oni Prot: Q96FF9 Homo sapiens CDK1 HUMAN Cyclin-dependent kinase Oni TAS PM D: 9506968 2-associated protein 1, protein from Homo sapiens CDK2 HUMAN Cell division protein kinase Oni TAS PM D:1076.7298 2, protein from Homo sapiens CDKS HUMAN Cell division protein kinase Oni ISS PROT: Q00535 5, protein from Homo sapiens CDK7 HUMAN Cell division protein kinase Oni TAS PM D: 7936635 7, protein from Homo sapiens CDK9 HUMAN Splice Isoform 1 of Cell Oni TAS PM D: 817.0997 division protein kinase 9, protein from Homo sapiens US 8,148,129 B2 55 56 -continued

Symbol Evi- Refer Qualifier Sequence/GOst Information Source dence ence DN1A HUMAN Cyclin-dependent kinase Oni TAS PM D: 96.60939 inhibitor 1, protein from Homo sapiens DN1B. HUMAN Cyclin-dependent kinase Oni IDA PM D: 12093740 inhibitor 1B, protein from Homo sapiens DN2CHUMAN Cyclin-dependent kinase 6 Oni NR Oni Prot: P42773 inhibitor, protein from Homo sapiens DN2D HUMAN Cyclin-dependent kinase 4 Oni TAS PM D: 8741839 inhibitor D, protein from Homo sapiens DR2 HUMAN Cerebellar degeneration Oni NAS Oni Prot: Q13977 related protein 2, protein rom Homo sapiens DT1 HUMAN DNA replication factor Cdt1, Oni IDA PM D: 11125.146 protein from Homo sapiens EBPA HUMAN CCAAT?enhancer binding Oni NAS PM D: 7575.576 protein alpha, protein rom Homo sapiens EBPB HUMAN CCAAT?enhancer binding Oni TAS PM D: 10821850 protein beta, protein from Homo sapiens EBPG HUMAN CCAAT?enhancer binding Oni ISS PM D: 7SO1458 protein gamma, protein rom Homo sapiens EBPZ HUMAN CCAAT?enhancer binding Oni TAS PM D: 2247079 protein Zeta, protein from Homo sapiens ENA1 HUMAN Centaurin-alpha 1, Oni IDA PM D: 10448098 protein from Homo sapiens IDA PM : 10333475 ENG1 HUMAN Centaurin-gamma 1, Oni ISS PM : 11136977 protein from Homo sapiens ENPA HUMAN protein A, Oni TAS PM D: 7962047 protein from Homo sapiens ENPE HUMAN Centromere protein E, Oni IMP PM D: 976342O protein from Homo sapiens EZ1 HUMAN Zinc finger protein CeZanne, Oni IDA PM D: 11463,333 protein from Homo sapiens HD6 HUMAN Splice Isoform 1 O Oni NAS PM D: 12592387 Chromodomain-helicase DNA-binding protein 6, protein from Homo sapiens HD8 HUMAN Chromodomain-helicase Oni NAS Oni Prot: Q9HCK8 DNA-binding protein 8, protein from Homo sapiens HK2 HUMAN Splice Isoform 1 O Oni NAS Oni Prot: O96O17 Serine/threonine-protein kinase Chk2, protein rom Homo sapiens TE2 HUMAN Splice Isoform 2 o Oni NAS PM D: 10552.932 Cbp/p300-interacting transactivator 2, protein rom Homo sapiens Z1 HUMAN Splice Isoform 1 of Cip1 Oni TAS PM D: 105293.85 interacting zinc 
finger protein, protein from Homo sapiens KOO1 HUMAN Protein C11orf1, protein Oni NAS PM D: 10873569 rom Homo sapiens LAT HUMAN Splice Isoform M of Oni TAS PM D: 10861222 Choline O-acetyltransferase, protein from Homo sapiens LIC2HUMAN Chloride intracellular Oni TAS PM D: 16130169 channel protein 2, protein rom Homo sapiens LIC3 HUMAN Chloride intracellular Oni IDA PM D: 9880541 channel protein 3, protein rom Homo sapiens CNOO4 HUMAN Protein C14orf4, protein Oni NAS PM D: 11095982 rom Homo sapiens CND1 HUMAN complex subunit Oni NAS PM D: 10958.694 protein from Homo sapiens CND3 HUMAN Condensin complex subunit Oni NAS PM D: 1091OO72 3, protein from Homo sapiens US 8,148,129 B2 57 58 -continued

Symbol Evi- Refer Qualifier Sequence/GOst Information Source dence ence CNOT2 HUMAN Splice Isoform 1 of Oni NAS PM D: 10637334 CCR4-NOT transcription complex subunit 2, protein from Homo sapiens CNOT7 HUMAN CCR4-NOT transcription Oni IEP PM D:982O826 complex subunit 7, protein from Homo sapiens CNOT8 HUMAN CCR4-NOT transcription Oni NAS PM D: 10036195 complex subunit 8, protein from Homo sapiens COF1 HUMAN Cofilin, non-muscle isoform, Oni TAS PM D: 16130169 protein from Homo sapiens COT2 HUMAN COUP transcription factor Oni TAS PM D: 1899293 2, protein from Homo sapiens CREB1 HUMAN Splice Isoform. CREB-A Oni TAS PM D: 109.09971 of cAMP response element binding protein, protein from Homo sapiens CREB3 HUMAN Splice Isoform 1 of Cyclic Oni NAS PM D: 9271389 AMP-responsive element binding protein 3, protein rom Homo sapiens HUMAN Splice Isoform 1 of cAMP Oni IC PM D: 8378O84 response element-binding protein 5, protein from Homo sapiens CREM HUMAN CAMP responsive element Oni NAS Oni Prot: Q16114 modulator, protein from Homo sapiens CRKHUMAN Splice Isoform Crk-II of Oni TAS PM D: 10748058 Proto-oncogene C-crk, protein from Homo sapiens CRNL1 HUMAN Crn, crooked neck-like 1, Oni NAS Oni Prot: Q9BZI9 protein from Homo sapiens CRSP2 HUMAN CRSP complex subunit 2, Oni IDA PM D: 1023S267 protein from Homo sapiens CRSP6 HUMAN CRSP complex subunit 6, Oni IDA PM D: 1023S267 protein from Homo sapiens CRYAB HUMAN Alpha crystallin B chain, Oni NR Oni Prot: PO2S11 protein from Homo sapiens CSDC2HUMAN Cold shock domain protein Oni NAS Oni C2, protein from Homo sapiens CSE1 HUMAN Splice Isoform 1 of Importin Oni TAS PM D: 93.231.34 alpha re-exporter, protein from Homo sapiens CSR2B HUMAN Splice Isoform 1 of Oni IPI PM D: 10924.333 Cysteine-rich protein 2 binding protein, protein rom Homo sapiens CSRP2 HUMAN Cysteine and glycine-rich Oni NAS PM D: 96215313 protein 2, protein from Homo sapiens CSTF1 HUMAN Cleavage stimulation Oni TAS PM D: 1358.884 actor, 50 kDa subunit, protein from Homo 
sapiens CSTF2HUMAN Splice Isoform 1 of Oni TAS PM D: 1741396 Cleavage stimulation actor, 64 kDa subunit, protein from Homo sapiens CSTF3 HUMAN Cleavage stimulation Oni TAS PM D: 7984242 actor, 77 kDa subunit, protein from Homo sapiens CTCF HUMAN Transcriptional repressor Oni IDA PM D: 94O7128 CTCF, protein from Homo sapiens CTDS1 HUMAN Carboxy-terminal domain Oni TAS PM D: 10967134 RNA polymerase II polypeptide A Small phosphatase 1, protein rom Homo sapiens CTNB1 HUMAN Splice Isoform 1 of Beta UniProt TAS PMID: 90654O1 catenin, protein from Homo sapiens CTND1 HUMAN Splice Isoform 1 ABC of UniProt NAS PMID: 983.17528 Catenin delta-1, protein from Homo sapiens US 8,148,129 B2 59 60 -continued

Symbol Evi- Refer Qualifier Sequence/GOst Information Source dence ence CUGB1 HUMAN Splice Isoform 2 of CUG. Oni NAS PM D: 10893.231 triplet repeat RNA binding protein 1, protein from Homo sapiens CUTL2 HUMAN Homeobox protein cut-like Oni NAS Oni Prot: O14529 2, protein from Homo sapiens CX4NB. HUMAN Neighbor of COX4, Oni TAS PM D: 10337626 protein from Homo sapiens CXCC1 HUMAN CpG binding protein, Oni IDA PM D: 10688657 protein from Homo sapiens DAPK3 HUMAN Death-associated protein Oni ISS Oni Prot: O43293 kinase 3, protein from Homo sapiens DAXXHUMAN Splice Isoform 1 of Death Oni IDA PM D: 15572661 domain-associated protein 6, protein from Homo sapiens DCTN4 HUMAN Dynactin Subunit 4, Oni TAS PM D: 10671518 protein from Homo sapiens DDX17 HUMAN Splice Isoform 1 of Oni TAS PM D: 8871553 Probable ATP-dependent RNA helicase DDX17, protein from Homo sapiens DDX39 HUMAN ATP-dependent RNA Oni ISS PM D: 15047853 helicase DDX39, protein rom Homo sapiens DDX3X HUMAN ATP-dependent RNA Oni IDA PM D: 10329544 helicase DDX3X, protein rom Homo sapiens DDXS4 HUMAN ATP-dependent RNA Oni ISS Oni Prot: Q9BRZ1 helicase DDX54, protein rom Homo sapiens DDXS HUMAN Probable ATP-dependent Oni NAS PM D: 2451786 RNA helicase DDX5, protein from Homo sapiens DEAF1 HUMAN Splice Isoform 1 of Deformed Oni TAS PM D: 9773984 epidermal autoregulatory actor 1 homolog, protein rom Homo sapiens DEKHUMAN Protein DEK, protein Oni TAS PM D: 90SO861 rom Homo sapiens DFFA HUMAN Splice Isoform DFF45 of Oni IDA PM D: 15572351 DNA fragmentation factor alpha Subunit, protein rom Homo sapiens DFFB. 
HUMAN Splice Isoform Alpha of Oni IDA PM D: 15572351 DNA fragmentation factor 40 kDa subunit, protein rom Homo sapiens DGC14 HUMAN DGCR14 protein, protein Oni ISS PM D: 8703114 rom Homo sapiens DGCR8 HUMAN Splice Isoform 1 of Oni IDA PM D: 15574589 DGCR8 protein, protein rom Homo sapiens DGKIHUMAN Diacylglycerol kinase, iota, Oni TAS PM D: 98.30018 protein from Homo sapiens DGKZHUMAN Splice Isoform Long of Oni TAS PM D: 9716136 Diacylglycerol kinase, Zeta, protein from Homo sapiens DHRS2 HUMAN Dehydrogenase/reductase, Oni TAS PM D: 75561.96 protein from Homo sapiens DHX1S HUMAN Putative pre-mRNA splicing Oni TAS PM D: 9388478 actor ATP-dependent RNA helicase DHX15, protein from Homo sapiens DHX16 HUMAN Putative pre-mRNA splicing Oni TAS PM D: 9547260 actor ATP-dependent RNA helicase DHX16, protein from Homo sapiens DHX9 HUMAN DEAH (Asp-Glu-Ala-His) Oni TAS PM D: 911 1062 box polypeptide 9 isoform 1, protein from Homo sapiens DLG7 HUMAN Splice Isoform 2 of Discs Oni IDA PM D: 12S27899 arge homolog 7, protein rom Homo sapiens DLX1 HUMAN Homeobox protein DLX Oni NAS Oni Prot: P56.177 protein from Homo sapiens US 8,148,129 B2 61 62 -continued

DMAP1_HUMAN    DNA methyltransferase 1 associated protein 1, protein from Homo sapiens    UniProt    NAS    PMID: 10888872
DNJC1_HUMAN    DnaJ homolog subfamily C member 1, protein from Homo sapiens    UniProt    ISS    UniProt: Q96KC8
DNL1_HUMAN    DNA ligase I, protein from Homo sapiens    UniProt    TAS    PMID: 8696349
DNL3_HUMAN    Ligase III, DNA, ATP-dependent, isoform alpha, protein from Homo sapiens    UniProt    TAS    PMID: 7565692
DNL4_HUMAN    DNA ligase IV, protein from Homo sapiens    UniProt    TAS    PMID: 8798671
DNM3A_HUMAN    DNA, protein from Homo sapiens    UniProt    ISS    PMID: 12138111
DNM3B_HUMAN    Splice Isoform 1 of DNA, protein from Homo sapiens    UniProt    TAS    PMID: 10433969
DNM3L_HUMAN    DNA (cytosine-5)-methyltransferase 3-like, protein from Homo sapiens    UniProt    NAS    PMID: 12202768
DNMT1_HUMAN    Splice Isoform 1 of DNA, protein from Homo sapiens    UniProt    TAS    PMID: 8940105
DP13A_HUMAN    DCC-interacting protein 13 alpha, protein from Homo sapiens    UniProt    IDA    PMID: 15016378
DP13B_HUMAN    DCC-interacting protein 13 beta, protein from Homo sapiens    UniProt    IDA    PMID: 15016378
DPF3_HUMAN    Zinc-finger protein DPF3, protein from Homo sapiens    UniProt    NAS    UniProt: Q92784
DPOA2_HUMAN    DNA polymerase alpha subunit B, protein from Homo sapiens    UniProt    NAS    UniProt: Q14181
DPOD2_HUMAN    DNA polymerase delta subunit 2, protein from Homo sapiens    UniProt    TAS    PMID: 8530069
DPOD4_HUMAN    DNA polymerase delta subunit 4, protein from Homo sapiens    UniProt    TAS    PMID: 10751307
DPOE3_HUMAN    DNA polymerase epsilon subunit 3, protein from Homo sapiens    UniProt    TAS    PMID: 10801849
DPOE4_HUMAN    DNA polymerase epsilon subunit 4, protein from Homo sapiens    UniProt    TAS    PMID: 10801849
DPOLA_HUMAN    DNA polymerase alpha catalytic subunit, protein from Homo sapiens    UniProt    NAS    UniProt: P09884
DPOLL_HUMAN    DNA polymerase lambda, protein from Homo sapiens    UniProt    NAS    PMID: 10982892
DRBP1_HUMAN    Splice Isoform 1 of Developmentally regulated RNA-binding protein 1, protein from Homo sapiens    UniProt    IDA    PMID: 12220514
DRR1_HUMAN    DRR1 protein, protein from Homo sapiens    UniProt    IDA    PMID: 10564580
DSRAD_HUMAN    Splice Isoform 1 of Double-stranded RNA specific adenosine deaminase, protein from Homo sapiens    UniProt    TAS    PMID: 7565688
DTBP1_HUMAN    Splice Isoform 1 of Dystrobrevin-binding protein 1, protein from Homo sapiens    UniProt    ISS    UniProt: Q96EV8
DUS10_HUMAN    Dual specificity protein phosphatase 10, protein from Homo sapiens    UniProt    TAS    PMID: 10391943
DUS11_HUMAN    Splice Isoform 1 of RNA/RNP complex-1 interacting phosphatase, protein from Homo sapiens    UniProt    TAS    PMID: 9685386
DUS16_HUMAN    Dual specificity protein phosphatase 16, protein from Homo sapiens    UniProt    TAS    PMID: 11489891

DUS21_HUMAN    Dual specificity protein phosphatase 21, protein    UniProt    IDA    PMID: 12408986
DUS2_HUMAN    UniProt    TAS    PMID: 8107850
DUS4_HUMAN    UniProt    TAS    PMID: 7535768
DUS9_HUMAN    Dual specificity protein phosphatase 9, protein    UniProt    TAS    PMID: 9030581
DUT_HUMAN    Splice Isoform DUT-M of Deoxyuridine 5'-triphosphate nucleotidohydrolase, mitochondrial precursor, protein from Homo sapiens    UniProt    TAS    PMID: 8631816
DYR1A_HUMAN    Splice Isoform Long of Dual specificity tyrosine phosphorylation regulated kinase 1A, protein from Homo sapiens    UniProt    IDA    PMID: 9748265
DYR1B_HUMAN    Splice Isoform 1 of Dual specificity tyrosine phosphorylation regulated kinase 1B, protein from Homo sapiens    UniProt    TAS    PMID: 9918863
DZIP1_HUMAN    Splice Isoform 1 of Zinc finger protein DZIP1, protein from Homo sapiens    UniProt    IDA    PMID: 15081113
ECM29_HUMAN    PREDICTED: KIAA0368 protein, protein from Homo sapiens    UniProt    IDA    PMID: 15496406
EDD1_HUMAN    Ubiquitin-protein ligase EDD1, protein from Homo sapiens    UniProt    IDA    PMID:
EDF1_HUMAN    Splice Isoform 1 of Endothelial differentiation related factor 1, protein from Homo sapiens    UniProt    IDA    PMID: 10567391
EGF_HUMAN    Pro-epidermal growth factor precursor, protein from Homo sapiens    UniProt    NR    UniProt: P01133
EGFR_HUMAN    Splice Isoform 1 of Epidermal growth factor receptor precursor, protein from Homo sapiens    UniProt    IDA    PMID: 2828935
EGLN2_HUMAN    Egl nine homolog 2, protein from Homo sapiens    UniProt    IDA    PMID: 1850811
EHD2_HUMAN    Similar to EH-domain containing protein 2, protein from Homo sapiens    UniProt    TAS    PMID: O673336
EHD3_HUMAN    EH-domain containing protein 3, protein from Homo sapiens    UniProt    TAS    PMID: O673336
EHD4_HUMAN    EH-domain containing protein 4, protein from Homo sapiens    UniProt    TAS    PMID: O673336
EHMT1_HUMAN    Splice Isoform 2 of Histone-lysine N-methyltransferase, H3 lysine-9 specific 5, protein from Homo sapiens    UniProt    IC    PMID: 2004135
ELF1_HUMAN    ETS-related transcription factor Elf-1, protein from Homo sapiens    UniProt    NAS    UniProt: P32519
ELF2_HUMAN    Splice Isoform 1 of ETS-related transcription factor Elf-2, protein from Homo sapiens    UniProt    IC    PMID: 14970218
ELL3_HUMAN    RNA polymerase II elongation factor ELL3, protein from Homo sapiens    UniProt    IDA    PMID: 10882741
EMX1_HUMAN    Homeobox protein EMX1, protein from Homo sapiens    UniProt    NAS    UniProt: Q04741

Symbol Evi- Refer Qualifier Sequence/GOst information Source dence ence EMX2HUMAN Homeobox protein EMX2, Oni NAS UniProt: Q04743 protein from Homo sapiens ENC1 HUMAN Ectoderm-neural cortex 1 Oni TAS PMID: 9566959 protein, protein from Homo sapiens ENLHUMAN ENL protein, protein Oni TAS PMID: 8080983 rom Homo sapiens E P300 HUMAN E1A-associated protein p300, Oni IDA PMID: 91.9456S protein from Homo sapiens E PC1 HUMAN Splice Isoform 1 of Enhancer Oni IDA PMID: 10976108 of polycomb homolog 1, protein from Homo sapiens E RCC2HUMAN TFIIH basal transcription Oni NAS UniProt: P18074 actor complex helicase Subunit, protein from Homo sapiens E RCC3 HUMAN TFIIH basal transcription Oni TAS PMID: 8663148 actor complex helicase XPB subunit, protein rom Homo sapiens E RG HUMAN Splice Isoform ERG-2 of Oni TAS PMID: 85O2479 Transcriptional regulator ERG, protein from Homo apiens E RR1 UMAN Oni TAS PMID: 92867OO ERR1, protein from Homo sapiens E RR3 HUMAN Splice Isoform 1 of Estrogen Oni ISS UniProt: P62508 related receptor gamma, protein from Homo sapiens ESR2 HUMAN Splice Isoform 1 of Estrogen Oni TAS PMID: 11181953 receptor beta, protein from Homo sapiens ETV3 HUMAN Splice Isoform 1 of ETS Oni NAS UniProt: P41162 translocation variant 3, protein from Homo sapiens E TV4 HUMAN TS translocation variant Oni NAS UniProt: P43268 protein from Homo sapiens E TV7 HUMAN plice Isoform B of Oni TAS PMID: 10828O14 Cranscription factor ETV7, protein from Homo sapiens EVI1 HUMAN Splice Isoform 1 of Oni NAS UniProt: Q03112 Ecotropic virus integration site protein, protein from Homo sapiens EVX2 HUMAN Homeobox even-skipped Oni NAS UniProt: Q03828 homolog protein 2, protein from Homo sapiens EXOS2 HUMAN Exosome complex Oni TAS PMID: 86OOO32 exonuclease RRP4, protein from Homo sapiens EASOAHUMAN Protein FAM50A, protein Oni TAS PMID: 93.393.79 rom Homo sapiens EAF1 HUMAN Splice Isoform Long of Oni IDA PMID: 155964SO FAS-associated factor 1, protein from Homo sapiens FALZHUMAN Fetal Alzheimer 
antigen, Oni IDA PMID: 10727212 protein from Homo sapiens FANCA HUMAN Splice Isoform 1 of Fanconi Oni TAS PMID: 9398857 anemia group A protein, protein from Homo sapiens EANCCHUMAN Fanconi anemia group C Oni TAS PMID: 9398857 protein, protein from Homo sapiens EANCE HUMAN Fanconi anemia group E. Oni NAS PMID: 11 OO1585 protein, protein from Homo sapiens EANC HUMAN Splice Isoform 1 of Fanconi Oni NAS PMID: 11301 010 anemia group J protein, protein from Homo sapiens FGF10 HUMAN Fibroblast growth factor 10 Oni IDA PMID: 11923.311 precursor, protein rom Homo sapiens FHL2 HUMAN FHL2 isoform 5, protein Oni TAS PMID: 9150430 rom Homo sapiens US 8,148,129 B2 67 68 -continued

Symbol Evi Refer Qualifier Sequence/GOst information Source dence ence FHOD1 HUMAN FH1 FH2 domain Uni Prot TAS PM D: 10352228 containing protein, protein rom Homo sapiens FIBP HUMAN Splice Isoform Short of Uni Prot TAS PM D: 9806903 Acidic fibroblast growth actor intracellular binding protein, protein rom Homo sapiens FIZ1 HUMAN Flt3-interacting Zinc Oni ISS Oni Prot: Q96SL8 finger protein 1, protein rom Homo sapiens FMR1 HUMAN Splice Isoform 6 of Fragile Oni TAS PM D: 8515814 X mental retardation 1 protein, protein from Homo Sapiens FOSHUMAN Proto-oncogene protein c Oni TAS PM D: 9443941 os, protein from Homo Sapiens FOSL1 HUMAN Fos-related antigen 1, Oni TAS PM D: 109.1858O protein from Homo sapiens FOSL2 HUMAN Fos-related antigen 2, Oni TAS PM D: 89S4781 protein from Homo sapiens FOXC1 HUMAN Forkhead box protein C1, Oni NAS Oni Prot: Q9BYM1 protein from Homo sapiens FOXD3 HUMAN Forkhead box protein D3, Oni ISS Oni Prot: Q9UJU5 protein from Homo sapiens FOXD4 HUMAN Forkhead box protein D4, Oni NAS Oni Prot: O43638 protein from Homo sapiens FOX E 3 H UMAN Forkhead box protein E3, Oni NAS Oni Prot: Q13461 protein from Homo sapiens FOXF1 HUMAN Forkhead box protein F1, Oni TAS PM D: 9722567 protein from Homo sapiens FOXF2HUMAN Forkhead box protein F2, Oni TAS PM D: 9722567 protein from Homo sapiens FOXGCHUMAN Forkhead box protein G1C, Oni NAS Oni Prot: Q14488 protein from Homo sapiens FOXI1 HUMAN Forkhead box I1 isoForma, Oni NAS Oni Prot: Q12951 protein from Homo sapiens FOXJ1 HUMAN Forkhead box protein J1, Oni TAS PM D:9073514 protein from Homo sapiens FOXK2 HUMAN Splice Isoform 1 of Oni TAS PM D: 1909027 Forkhead box protein K2, protein from Homo sapiens FOXL1 HUMAN Forkhead box protein L1, Oni NAS Oni Prot: Q12952 protein from Homo sapiens FOXL2 HUMAN FOXL2, protein from Oni NAS Oni Prot: PS8012 Homo sapiens FOXN1 HUMAN Forkhead box protein N1, Oni TAS PM D:10767081 protein from Homo sapiens FOXO3 HUMAN Forkhead box protein O3A, Oni TAS PM D: 10102273 protein 
from Homo sapiens FOXO4 HUMAN Splice Isoform 1 of Oni TAS PM D: 9010221 Putative forkhead domain transcription factor AFX1, protein from Homo sapiens FOXP3 HUMAN Splice Isoform 1 of Oni NAS Oni Prot: Q9BZS1 Forkhead box protein P3, protein from Homo sapiens FREA HUMAN Forkhead-related Oni NAS Oni Prot: O43638 transcription factor 10, protein from Homo sapiens FRKHUMAN Tyrosine-protein kinase FRK, Oni TAS PM D: 7696.183 protein from Homo sapiens FUBP3 HUMAN Splice Isoform 2 of Far Oni NAS PM D: 894O189 upstream element-binding protein 3, protein from Homo sapiens FUS HUMAN Fus-like protein, protein Oni TAS PM D: 8510758 rom Homo sapiens FUSIPHUMAN Splice Isoform 1 of FUS Oni ISS Oni Prot: Q96P17 interacting serine arginine-rich protein 1, protein from Homo sapiens IC PM D: 9774382 FXL10 HUMAN Splice Isoform 1 of F UniProt NAS PM D: 10799292 box/LRR-repeat protein 10, protein from Homo sapiens US 8,148,129 B2 69 70 -continued

Symbol Evi- Refer Qualifier Sequence/GOst information Source (Ce Ce FXR2 HUMAN Fragile X mental retardation Oni TAS PM D: 10888S99 syndrome-related protein 2, protein from Homo sapiens FYB HUMAN Splice Isoform FYB-120 Oni TAS PM D: 9207119 of FYN-binding protein, protein from Homo sapiens G10 HUMAN G10 protein homolog, Oni TAS PM D: 78412O2 protein from Homo sapiens G3BP HUMAN Ras-GTPase-activating Oni TAS PM D: 9889278 protein binding protein 1, protein from Homo sapiens GA45AHUMAN Growth arrest and DNA Oni TAS PM D: 7798.274 damage-inducible protein GADD45 alpha, protein rom Homo sapiens GABP2 HUMAN Splice Isoform 1 of GA Oni TAS PM D: 901 6666 binding protein beta chain, protein from Homo sapiens GABPA HUMAN GA binding protein alpha Oni TAS PM D: 901 6666 chain, protein from Homo GATA1 HUMAN Splice Isoform 1 of Oni TAS PM D: 2300555 Erythroid transcription actor, protein from Homo sapiens GATA2 HUMAN Endothelial transcription Oni TAS PM D: 1370462 actor GATA-2, protein rom Homo sapiens GATA4 HUMAN Transcription factor GATA-4, Oni NAS PM D: 1284.5333 protein from Homo sapiens GCFCHUMAN Splice Isoform A of GC Oni NAS Oni Prot: Q9YSB6 rich sequence DNA-binding actor homolog, protein from Homo sapiens GCM1 HUMAN Chorion-specific transcription Oni TAS PM D: 89621SS actor GCMa, protein rom Homo sapiens GCM2 HUMAN Chorion-specific transcription Oni TAS PM D: 992.8992 actor GCMb, protein rom Homo sapiens GCRHUMAN Splice Isoform Alpha of Oni TAS PM D: 98.73O44 , protein from Homo sapiens GLI3 HUMAN Zinc finger protein GLI3, Oni TAS PM D: 10077605 protein from Homo sapiens GLI4 HUMAN Zinc finger protein GLI4, Oni NAS Oni Prot: P10075 protein from Homo sapiens GLIS1 HUMAN Zinc finger protein GLIS1, Oni ISS Oni Prot: Q8NBF1 protein from Homo sapiens GLIS3 HUMAN Zinc finger protein GLIS3, Oni ISS Oni Prot: Q8NEA6 protein from Homo sapiens GLRX2 HUMAN Splice Isoform 1 of Oni IEP PM D: 11297.543 Glutaredoxin-2, mitochondrial precursor, protein from Homo sapiens GMEB1 HUMAN 
Splice soform 1 of Oni TAS PM D: 10386S84 Glucocorticoid modulatory element-binding protein 1, protein from Homo sapiens GMEB2 HUMAN Glucocorticoid modulatory Oni TAS PM D: 105236.63 element-binding protein 2, protein from Homo sapiens GNEFRHUMAN Splice Isoform 2 of Oni IDA PM D: 10571079 Guanine nucleotide exchange actor-related protein, protein from Homo sapiens GNL3 HUMAN Splice Isoform 1 of Oni ISS Oni Guanine nucleotide binding protein-like 3, protein from Homo sapiens GO4S HUMAN Splice Isoform 1 of Oni NAS PM D: 91291.47 Golgin 45, protein from Homo sapiens GRAA HUMAN Granzyme A precursor, Oni TAS PM D: 119.09973 protein from Homo sapiens GRAB HUMAN Endogenous granzyme B, Oni TAS PM D: 119.09973 protein from Homo sapiens US 8,148,129 B2 71 72 -continued

Symbol Evi- Refer Qualifier Sequence/GOst information Source dence ence GRLF1 HUMAN Glucocorticoid receptor DNA Oni IC PMID: 1894621 bindinC factor 1 isoform b, protein from Homo sapiens GRP78 HUMAN 78 kDa glucose-regulated Oni IDA PMID: 1266S508 protein precursor, protein rom Homo sapiens GSCHUMAN Homeobox protein goosecoid, Oni NAS UniProt: PS6915 protein from Homo sapiens GT2D1 HUMAN Splice Isoform 1 of General Oni NAS PMID: 11438732 transcription factor II-I repeat domain-containing protein 1, protein from Homo sapiens 2AFXHUMAN Histone H2A.X., protein Oni IDA PMID: 15604234 rom Homo sapiens IDA PMID: 11331621 AIRHUMAN Splice Isoform 1 of Oni NAS PMID: 94.4548O Hairless protein, protein rom Homo sapiens ASPHUMAN Splice Isoform 1 of Oni IEP PMID: 11228240 Serine/threonine-protein kinase Haspin, protein rom Homo sapiens AT1 HUMAN Histone acetyltransferase Oni TAS PMID: 9427644 type B catalytic subunit, protein from Homo sapiens BXAP HUMAN Remodeling and spacing Oni IDA PMID: 12972596 actoR1, protein from Homo sapiens IDA PMID: 11788S98 CC1 HUMAN Hec-1, Oni NAS PMID: 113561.93 protein from Homo sapiens CFC1 HUMAN Splice Isoform 1 of Host Oni IDA PMID: 78762O3 cell factor, protein from Homo sapiens CFC2HUMAN Host cell factor 2, protein Oni IDA PMID: 10196.288 rom Homo sapiens DA10 HUMAN Splice Isoform 1 of Oni IDA PMID: 11861901 Histone deacetylase 10, protein from Homoi. sapiens DA11 HUMAN Histone deacetylase 11, Oni IDA PMID: 11948178 protein from Homoi. sapiens DAC1 HUMAN Histone deacetylase 1, Oni TAS PMID: 12711221 protein from Homoi. sapiens DAC2 HUMAN Histone deacetylase 2, Oni TAS PMID: 12711221 protein from Homoi. 
sapiens DAC3 HUMAN Splice Isoform 1 of Oni TAS PMID: 12711221 Histone deacetylase 3, protein from Homo sapiens DAC4 HUMAN Histone deacetylase 4, Oni NAS UniProt: PS6524 protein from Homo sapiens DACS HUMAN Splice Isoform 1 of Oni TAS PMID: 12711221 Histone deacetylase 5, protein from Homo sapiens DAC6HUMAN Histone deacetylase 6, Oni NAS UNIPROT: Q9UBN7 protein from Homo sapiens DACA HUMAN Histone deacetylase, Oni TAS PMID: 12711221 protein from Homo sapiens DAC9HUMAN Splice Isoform 1 of Oni NAS UniProt: Q9UKVO Histone deacetylase 9, protein from Homo sapiens DGR3 HUMAN Hepatoma-derived growth Oni IDA PMID: 10581,169 actor-related protein 3, protein from Homo sapiens DHUMAN Huntingtin, protein from Oni TAS PMID: 9778.247 Homo sapiens ELIHUMAN Splice Isoform 1 of Zinc Oni NAS UniProt: Q9UKS7 finger protein Helios, protein from Homo sapiens C1 HUMAN Splice Isoform 2 of Hyper Oni NAS UniProt: Q14526 methylated in cancer 1 protein, protein from Homo sapiens C2HUMAN Splice Isoform 1 of Hyper Oni IEP PMID: 11554746 methylated in cancer 2 protein, protein from Homo sapiens F1A HUMAN Hypoxia-inducible factor Oni IDA PMID: 15261140 alpha, protein from Homo sapiens US 8,148,129 B2 73 74 -continued

Symbol Evi- Refer Qualifier Sequence/GOst information Source dence ence HINT1 HUMAN Histidine triad nucleotide Oni TAS PMID: 9770345 binding protein 1, protein rom Homo sapiens HIPK2 HUMAN Splice Isoform 1 of Oni IDA PMID: 12220523 Homeodomain-interacting protein kinase 2, protein rom Homo sapiens HIPK3 HUMAN Splice Isoform 1 of Oni IDA PMID: 11034606 Homeodomain-interacting protein kinase 3, protein rom Homo sapiens HIRA HUMAN Splice Isoform Long of Oni TAS PMID: 9710638 HIRA protein, protein rom Homo sapiens HIRP3 HUMAN Splice Isoform 1 of Oni TAS PMID: 9710638 HIRA-interacting protein 3, protein from Homo sapiens HKR1 HUMAN Krueppel-related Zinc Oni NAS UniProt: P1OO72 finger protein 1, protein rom Homo sapiens HKR2 HUMAN Krueppel-related Zinc Oni NAS UniProt: P1OO73 finger protein 2, protein rom Homo sapiens HLF HUMAN Hepatic leukemia factor, Oni TAS PMID: 1386.162 protein from Homo sapiens HLXB9 HUMAN Homeobox protein HB9, Oni NAS UniProt: PSO219 protein from Homo sapiens NAS UniProt: Q9Y648 M2OB. HUMAN SWISNF-related matrix Oni NAS UniProt: Q9POW2 associated actin-dependent regulator of chromatin Subfamily E member 1-related, protein from Homo sapiens M2L1 HUMAN High mobility group Oni NAS UniProt: Q9UGU5 protein 2-like 1, protein from Homo sapiens MG17 HUMAN Nonhistone chromosomal Oni NAS UniProt: POS2O4 protein HMG-17, protein from Homo sapiens MG1 HUMAN High mobility group Oni TAS PMID: 16130169 protein 1, protein from Homo sapiens MGN3 HUMAN High mobility group Oni NAS UniProt: Q15651 nucleosome binding domain 3, protein from Homo sapiens HNF1B. 
HUMAN Splice Isoform A of Oni TAS PMID: 1677179 Hepatocyte nuclear factor -beta, protein from Homo sapiens HNF3G HUMAN epatocyte nuclear factor Oni TAS PMID: 7739897 -gamma, protein from omo sapiens HNF4A HUMAN epatocyte nuclear factor Oni TAS PMID: 9.048927 alpha isoform b, otein from Homo sapiens HNF6 HUMAN epatocyte nuclear factor Oni NAS UniProt: Q9UBCO 6, protein from Homo sapiens HNRPD HUMAN Splice Isoform 1 of Oni NAS PMID: 1433497 eterogeneous nuclear ribonucleoprotein D0, protein from Homo sapiens HNRPQ HUMAN Sp ice Isoform 1 of Oni TAS PMID: 9847309 eterogeneous nuclear ribonucleoprotein Q, protein from Homo sapiens RXHUMAN Splice Isoform 1 of Zinc Oni IDA PMID: 11313484 finger protein HRX, protein from Homo sapiens S74L HUMAN Heat shock 70 kDa Oni ISS UniProt: O957S7 protein 4L, protein from Homo sapiens SBP1. HUMAN binding Oni TAS PMID: 9649SO1 protein 1, protein from Homo sapiens US 8,148,129 B2 75 -continued Symbol Evi- Refer Qualifier Sequence/GOst information Source dence ence HSP71 HUMAN Heat shock 70 kDa UniProt TAS PMID: 10205060 protein 1, protein from Homo sapiens HTRA2 HUMAN Splice Isoform 1 of Serine UniProt TAS PMID: 10971580 protease HTRA2, mitochondrial precursor, protein from Homo sapiens HUWE1 HUMAN Splice Isoform 1 of UniProt ISS UniProt: Q7Z6Z7 HECT, UBA and WWE domain containing protein protein from Homo sapiens HXA5 HUMAN Homeobox protein Hox-A5, UniProt NAS UniProt: P2O719 protein from Homo sapiens HXB1 HUMAN Homeobox protein Hox-B1, UniProt NAS UniProt: P14653 protein from Homo sapiens HXB4 HUMAN Homeobox protein Hox-B4, UniProt NAS UniProt: P17483 protein from Homo sapiens HXB6 HUMAN Splice Isoform 1 of UniProt NAS UniProt: P17509 Homeobox protein Hox-B6, protein from Homo sapiens HXB7 HUMAN Homeobox protein Hox-B7, UniProt NAS PMID: 1678287 protein from Homo sapiens HXB8 HUMAN Homeobox protein Hox-B8, UniProt NAS UniProt: P17481 protein from Homo sapiens HXB9 HUMAN Homeobox protein Hox-B9, UniProt NAS UniProt: P17482 protein from 
Homo sapiens HXC13 HUMAN Homeobox protein Hox-C13, UniProt NAS UniProt: P31276 protein from Homo sapiens HXC4 HUMAN Homeobox protein Hox-C4, UniProt NAS UniProt: PO901.7 protein from Homo sapiens HXC8 HUMAN Homeobox protein Hox-C8, UniProt NAS UniProt: P31273 protein from Homo sapiens HXD11 HUMAN Homeobox protein Hox-D11, UniProt NAS UniProt: P31277 protein from Homo sapiens HXD4 HUMAN Homeobox protein Hox-D4, UniProt NAS UniProt: PO9016 protein from Homo sapiens HXD8 HUMAN Homeobox protein Hox-D8, UniProt NAS PMID: 2S68311 protein from Homo sapiens ASPPHUMAN Splice Isoform 1 of RelA- UniProt TAS PMID: 10336463 associated inhibitor, protein from Homo sapiens F16 HUMAN Splice Isoform 2 of UniProt IDA PMID: 7536752 Gamma-interferon inducible protein If-16, protein from Homo sapiens F6 HUMAN Eukaryotic translation UniProt TAS PMID: 9374S18 initiation factor 6, protein rom Homo sapiens KBA HUMAN NF-kappaB inhibitor alpha, UniProt IDA PMID: 7679069 protein from Homo sapiens LF2HUMAN interleukin enhancer- UniProt IDA PMID: 751.9613 binding factor 2, protein rom Homo sapiens LF3 HUMAN Splice Isoform 1 of UniProt IDA PMID: 11739746 interleukin enhancer-binding actor 3, protein rom Homo sapiens NAS PMID: 104OO669 MA2 HUMAN Importin alpha-2 subunit, UniProt TAS PMID:902O106 protein from Homo sapiens MB3 HUMAN Importin beta-3, protein UniProt TAS PMID: 91.14010 rom Homo sapiens MUP HUMAN Similar to Immortalization- UniProt TAS PMID: 11080599 upregulated protein, protein from Homo sapiens N3S HUMAN Splice Isoform 1 of UniProt IDA PMID: 828.8566 interferon-induced 35 kDa protein, protein from Homo sapiens NG1 HUMAN Splice Isoform 1 of Inhibitor UniProt NAS PMID: 108663O1 of growth protein 1, protein from Homo sapiens NG2 HUMAN inhibitor of growth protein 2, UniProt IEP PMID: 15243141 protein from Homo sapiens NG4 HUMAN Splice Isoform 1 of Inhibitor UniProt IDA PMID: 150291.97 of growth protein 4, protein from Homo sapiens US 8,148,129 B2 77 78 -continued

Symbol | Qualifier | Sequence/GOst Information | Source | Evidence | Reference
IP6K1_HUMAN | | Inositol hexaphosphate kinase 1, protein from Homo sapiens | UniProt | TAS | PMID: 11502751
IP6K2_HUMAN | | Splice Isoform 1 of Inositol hexakisphosphate kinase 2, protein from Homo sapiens | UniProt | ISS | UniProt: Q9UHH9
IP6K3_HUMAN | | Inositol hexaphosphate kinase 3, protein from Homo sapiens | UniProt | IDA | PMID: 11502751
 | | | | | UniProt: Q96PC2
IRF4_HUMAN | | Splice Isoform 1 of Interferon regulatory factor 4, protein from Homo sapiens | UniProt | | PMID: 12374808
IRF7_HUMAN | | Splice Isoform A of Interferon regulatory factor 7, protein from Homo sapiens | UniProt | ISS | UniProt: Q92985
IRS1_HUMAN | | Insulin receptor substrate protein from Homo sapiens | UniProt | ISS | UniProt: P35568
IRTF_HUMAN | | Transcriptional regulator ISGF3 gamma subunit, protein from Homo sapiens | UniProt | TAS | PMID: 1630447
ITF2_HUMAN | | Splice Isoform SEF2-1B of Transcription factor 4, protein from Homo sapiens | UniProt | TAS | PMID: 1681116
JAD1A_HUMAN | | Jumonji/ARID domain containing protein 1A, protein from Homo sapiens | UniProt | TAS | PMID: 8414517
JERKL_HUMAN | | Jerky homolog-like, protein from Homo sapiens | UniProt | TAS | PMID: 9240447
KCY_HUMAN | | UMP-CMP kinase, protein from Homo sapiens | UniProt | TAS | PMID: 10462544
KIF22_HUMAN | | Kinesin-like protein KIF22, protein from Homo sapiens | UniProt | TAS | PMID: 8599929
KLF10_HUMAN | | Transforming growth factor-beta-inducible early growth response protein 1, protein from Homo sapiens | UniProt | TAS | PMID: 9748269
KLF11_HUMAN | | Transforming growth factor-beta-inducible early growth response protein 2, protein from Homo sapiens | UniProt | TAS | PMID: 9748269
KLF2_HUMAN | | Kruppel-like factor 2, protein from Homo sapiens | UniProt | NAS | UniProt: Q9Y5W3
KLF4_HUMAN | | Kruppel-like factor 4, protein from Homo sapiens | UniProt | ISS | PMID: 9422764
KLF6_HUMAN | | Splice Isoform 1 of Core promoter element-binding protein, protein from Homo sapiens | UniProt | TAS | PMID: 9689109
KNTC1_HUMAN | | Kinetochore-associated protein 1, protein from Homo sapiens | UniProt | NAS | PMID: 11146660
KPCI_HUMAN | | Protein kinase C, iota type, protein from Homo sapiens | UniProt | ISS | UniProt: P41743
KR18_HUMAN | | Zinc finger protein Kr18, protein from Homo sapiens | UniProt | NAS | UniProt: Q9HCG1
KS6A2_HUMAN | | Ribosomal protein S6 kinase alpha 2, protein from Homo sapiens | UniProt | TAS | PMID: 7623830
KS6A4_HUMAN | | Ribosomal protein S6 kinase alpha 4, protein from Homo sapiens | UniProt | IEP | PMID: 9792677
 | | | | ISS | UniProt: O75585
 | | | | IDA | PMID: 11035004
KS6A5_HUMAN | | Ribosomal protein S6 kinase alpha 5, protein from Homo sapiens | UniProt | IEP | PMID: 9687510
KU70_HUMAN | | ATP-dependent DNA helicase II, 70 kDa subunit, protein from Homo sapiens | UniProt | TAS | PMID: 10508516
KU86_HUMAN | | ATP-dependent DNA helicase II 80 kDa subunit, protein from Homo sapiens | UniProt | TAS | PMID: 7957065
LANC2_HUMAN | | LanC-like protein 2, protein from Homo sapiens | UniProt | IDA | PMID: 12566319

Symbol | Qualifier | Sequence/GOst information | Source | Evidence | Reference
LAP2_HUMAN | | Splice Isoform 1 of LAP2 protein, protein from Homo sapiens | UniProt | ISS | PMID: 11375975
 | | | | IDA | PMID: 11375975
LATS2_HUMAN | | Serine/threonine-protein kinase LATS2, protein from Homo sapiens | UniProt | NAS | PMID: 10673337
LDOC1_HUMAN | | Protein LDOC1, protein from Homo sapiens | UniProt | TAS | PMID: 10403563
LEG3_HUMAN | | LGALS3 protein, protein from Homo sapiens | UniProt | IDA | PMID: 14961764
LHX3_HUMAN | | Splice Isoform A of LIM homeobox protein Lhx3, protein from Homo sapiens | UniProt | TAS | PMID: 10598593
LIMK2_HUMAN | | Splice Isoform LIMK2a of LIM domain kinase 2, protein from Homo sapiens | UniProt | TAS | PMID: 8954941
LMBL2_HUMAN | | Splice Isoform 1 of Lethal(3)malignant brain tumor-like 2 protein, protein from Homo sapiens | UniProt | NAS | UniProt: Q969R5
LMBL3_HUMAN | | Splice Isoform 1 of Lethal(3)malignant brain tumor-like 3 protein, protein from Homo sapiens | UniProt | NAS | UniProt: Q96JM7
LMO7_HUMAN | | Splice Isoform 3 of LIM domain only protein 7, protein from Homo sapiens | UniProt | TAS | PMID: 9826547
LMX1B_HUMAN | | Splice Isoform Short of LIM homeobox transcription factor 1 beta, protein from Homo sapiens | UniProt | NAS | UniProt: O60663
 | | | | IDA | PMID: 10767331
LPIN1_HUMAN | | Lipin-1, protein from Homo sapiens | UniProt | ISS | PMID: 11138012
LPPRC_HUMAN | | 130 kDa leucine-rich protein, protein from Homo sapiens | UniProt | IDA | PMID: 12832482
 | | | | ISS | UniProt: P42704
LSM1_HUMAN | | U6 snRNA-associated Sm-like protein LSm1, protein from Homo sapiens | UniProt | TAS | PMID: 10369684
LSM2_HUMAN | | | UniProt | NAS | UniProt: Q9Y333

LSM3_HUMAN | | protein from Homo sapiens | UniProt | TAS | PMID: 10369684
LSM5_HUMAN | | U6 snRNA-associated Sm-like protein LSm5, protein from Homo sapiens | UniProt | TAS | PMID: 10369684
LSM7_HUMAN | | | UniProt | NAS | UniProt: Q9UK45

LSM8 HUMAN UniPro NAS Oni Prot: O95777 Sm-like protein LSm8, protein from Homo sapiens LZTS1 HUMAN , putative UniPro NAS Oni tumor Suppressor 1, protein from Homo sapiens NAS Oni Prot: Q9YSW2 NAS Oni Prot: Q9YSW1 NAS Oni Prot: Q9YSWO NAS Oni Prot: Q9YSV8 MAD HUMAN MAD protein, protein UniProt TAS PM D: 8425218 from Homo sapiens MAFB. HUMAN Transcription factor MafB, UniProt TAS PM D: 8001130 protein from Homo sapiens MAGC2HUMAN Melanoma-associated UniProt IDA PM D: 1292O247 antigen C2, protein from Homo sapiens MAGE1 HUMAN Melanoma-associated UniProt ISS PM D: 14623885 antigen E1, protein from Homo sapiens MAML1 HUMAN Mastermind-like protein UniProt IDA PM D: 11101851 1, protein from Homo sapiens MAML2 HUMAN MasterMind-like 2, UniProt IDA PM D: 12370315 protein from Homo sapiens US 8,148,129 B2 81 -continued Symbol Evi- Refer ualifier SequencefC GOst information Source dence ence MAML3 HUMAN MasterMind-like 3, UniProt IDA PMID: 12370315 protein from Homo sapiens MAPK2 HUMAN Splice Isoform 1 of MAP UniProt TAS PMID: 8280084 kinase-activated protein kinase 2, protein from Homo sapiens MAPK3 HUMAN MAP kinase-activated UniProt TAS PMID: 10781029 protein kinase 3, protein rom Homo sapiens MBB1A HUMAN Splice Isoform 1 of Myb- UniProt ISS UniProt: Q9BQGO binding protein 1A, protein from Homo sapiens MBD1 HUMAN Splice Isoform 1 O UniProt NAS PMID: 10454587 Methyl-CpG-binding domain protein 1, protein rom Homo sapiens MBD2 HUMAN Splice Isoform 1 O UniProt NAS PMID: 10441743 Methyl-CpG-binding domain protein 2, protein rom Homo sapiens MBD4 HUMAN Splice Isoform 1 O UniProt TAS PMID: 9774669 Methyl-CpG-binding domain protein 4, protein rom Homo sapiens MBNL HUMAN Splice Isoform EXP35 of UniProt IDA PMID: 1097.0838 Muscleblind-like protein, protein from Homo sapiens MCA3 HUMAN Eukaryotic translation UniProt ISS UniProt: O43324 elongation factor 1 epsilon-1, protein from Homo sapiens MCE1 HUMAN Splice Isoform 1 of UniProt TAS PMID: 94.73487 mRNA capping enzyme, protein from 
Homo sapiens MCM2 HUMAN DNA replication licensing UniProt TAS PMID: 8175912 actor MCM2, protein rom Homo sapiens MCM3A HUMAN 80 kola MCM3-associated UniProt TAS PMID: 9712829 protein, protein from Homo sapiens MCM4 HUMAN DNA replication licensing UniProt NAS PMID: 8265339 actor MCM4, protein rom Homo sapiens MCM5 HUMAN DNA replication licensing UniProt TAS PMID: 8751386 actor MCM5, protein rom Homo sapiens MCM6 HUMAN DNA replication licensing UniProt NAS PMID: 9286856 actor MCM6, protein rom Homo sapiens MD2BP HUMAN MAD2L1 binding protein, UniProt IDA PMID: 10942595 protein from Homo sapiens MDC1 HUMAN Splice Isoform 1 o UniProt IDA PMID: 15604234 Mediator of DNA damage checkpoint protein 1, protein from Homo sapiens MDM4 HUMAN Molm4 protein, protein UniProt NAS PMID: 9226370 rom Homo sapiens MDN1 HUMAN Midasin, protein from UniProt NAS PMID: 12102729 Homo sapiens MECP2 HUMAN Methyl-CpG-binding protein UniProt TAS PMID: 10773092 2, protein from Homo sapiens MECT1 HUMAN Splice Isoform 1 of UniProt IDA PMID: 14506290 Mucoepidermoid carcinoma translocated protein 1, protein from Homo sapiens MED12 HUMAN Mediator of RNA polymerase UniProt IDA PMID: 10235.267 II transcription subunit 12, protein from Homo sapiens MED4 HUMAN Mediator complex subunit UniProt IDA PMID: 10235.267 4, protein from Homo sapiens MED6 HUMAN RNA polymerase UniProt TAS PMID: 10O24883 transcriptional regulation mediator, Subunit 6 homolog, protein from Homo sapiens US 8,148,129 B2 83 84 -continued

Symbol Evi- Refer Qualifier Sequence/GOst Information Source dence ence MEF2A HUMAN Splice Isoform MEF2 of UniPro TAS PMID: 1516833 Myocyte-specific enhancer factor 2A, protein from Homo sapiens MEF2B. HUMAN Myocyte-specific UniPro TAS PMID: 1516833 enhancer factor 2B, protein from Homo sapiens MEFWHUMAN Splice Isoform 1 of Pyrin, UniPro IDA PMID: 11115844 protein from Homo sapiens MEN1 HUMAN Splice Isoform 1 of Menin, UniPro IDA PMID: 151991 22 protein from Homo sapiens MERL HUMAN Splice Isoform 1 of Merlin, UniPro IDA PMID: 104O1 OO6 protein from Homo sapiens MGMT HUMAN Methylated-DNA-- UniPro TAS PMID: 2188979 protein-cysteine methyltransferase, protein rom Homo sapiens MGN HUMAN Mago nashi protein homolog, UniPro NAS UniProt: P61326 protein from Homo sapiens MITF HUMAN Splice Isoform A2 of UniPro NAS PMID: 9647758 Microphthalmia associated transcription actor, protein from Homo sapiens NAS PMID: 10578055 MK14 HUMAN Mitogen-activated protein UniProt ISS UniProt: Q16539 kinase 14 isoforM2, protein from Homo sapiens MKL2HUMAN Splice Isoform 1 of UniProt IC PMID: 14565952 MKL/myocardin-like protein 2, protein from Homo sapiens MLE3 HUMAN Splice Isoform 1 of DNA UniProt TAS PMID: 1061S123 mismatch repair protein MIh3, protein from Homo Sapiens MLL2HUMAN Splice Isoform 1 of UniProt NAS PMID: 9247308 Myeloid/lymphoid or mixed-lineage leukemia protein 2, protein from Homo sapiens MLL4 HUMAN Splice Isoform 1 of UniProt NAS UniProt: Q9UMN6 Myeloid/lymphoid or mixed-lineage leukemia protein 4, protein from Homo sapiens MLXHUMAN Splice Isoform Gamma of UniPro IDA PMID: 10918.583 MAX-like protein X, protein from Homo sapiens MLZE HUMAN Melanoma-derived UniPro NAS PMID: 11223S43 eucine Zipper-containing extranuclear factor, protein from Homo sapiens MO4L1 HUMAN Similar to Testis UniPro NAS UniProt: Q9UBU8 expressed gene 189, protein from Homo sapiens MO4L2 HUMAN Mortality factor 4-like UniPro NAS UniProt: Q15014 protein 2, protein from Homo sapiens MOL1A HUMAN Mps one binder 
kinase UniPro IDA PMID: 15067004 activator-like 1A, protein rom Homo sapiens MOS1A HUMAN Splice Isoform 1 of UniPro NAS PMID: 97.31530 Molybdenum cofactor biosynthesis protein 1 A, protein from Homo sapiens MPP8 HUMAN M-phase phosphoprotein UniPro IDA PMID: 8885239 8, protein from Homo sapiens MRE11 HUMAN Splice Isoform 1 of UniPro TAS PMID: 10802669 Double-strand break repair protein MRE 11A, protein from Homo sapiens MS3L1 HUMAN Splice Isoform 1 of Male UniPro TAS PMID: 103958O2 specific lethal 3-like 1, protein from Homo sapiens US 8,148,129 B2 85 86 -continued

Symbol Evi- Refer Qualifier Sequence/GOst Information Source dence ence MSH2 HUMAN DNA mismatch repair Oni NAS PM D: 7923193 protein Msh2, protein from Homo sapiens MSH4 HUMAN MutS protein homolog 4, Oni TAS PM D: 9299235 protein from Homo sapiens MSMBHUMAN Splice Isoform PSP94 of Oni TAS PM D: 7566962 Beta-microseminoprotein precursor, protein from Homo sapiens MTA70 HUMAN Splice Isoform 1 of N6 Oni IDA PM D: 94O9616 adenosine-methyltransferase 70 kDa subunit, protein from Homo sapiens MTF1 HUMAN Metal-regulatory Oni TAS PM D: 3208749 transcription factor 1, protein from Homo sapiens MTMR2 HUMAN Myotubularin-related Oni IDA PM D: 12837694 protein 2, protein from Homo sapiens MUSCHUMAN Musculin, protein from Oni TAS PM D: 9584154 Homo sapiens MUTYH HUMAN Splice Isoform Alpha-1 of Oni TAS PM D: 7823963 A G-specific adenine DNA glycosylase, protein rom Homo sapiens MVP HUMAN Major vault protein, Oni TAS PM D: 7585.126 protein from Homo sapiens MX2HUMAN interferon-induced GTP Oni TAS PM D: 8798.556 binding protein MX2, protein from Homo sapiens MXI1 HUMAN Splice Isoform 1 of MAX Oni TAS PM D: 8425219 interacting protein 1, protein from Homo sapiens MYBA HUMAN Myb-related protein A, Oni NAS PM D: 8058310 protein from Homo sapiens MYCHUMAN Myc proto-oncogene protein, Oni IDA PM D: 15994.933 protein from Homo sapiens MYCBP HUMAN C-Myc binding protein, Oni TAS PM D: 97974.56 protein from Homo sapiens MYCNHUMAN N-myc proto-oncogene Oni TAS PM D: 37966O7 protein, protein from Homo sapiens MYF6 HUMAN Myogenic factor 6, Oni TAS PM D: 2311584 protein from Homo sapiens MYOD1 HUMAN Myoblast determination Oni TAS PM D: 3175662 protein 1, protein from Homo sapiens MYST2HUMAN Histone acetyltransferase Oni TAS PM D: 10438470 MYST2, protein from Homo sapiens MYT1 HUMAN Myelin transcription factor Oni NAS PM D: 128O325 protein from Homo sapiens NAB1 HUMAN Splice Isoform Long of Oni NAS PM D: 8668170 NGFI-A binding protein protein from Homo sapiens NARG1 HUMAN Splice Isoform 1 of NMDA 
Oni IDA PM D: 12145306 receptor regulated protein 1, protein from Homo sapiens IDA PM D: 1214O756 NARGLHUMAN Splice Isoform 1 of NMDA Oni ISS Oni Prot: Q6NO69 receptor regulated 1-like protein, protein from Homo sapiens NASPHUMAN Splice Isoform 1 of Oni TAS PM D: 1426632 Nuclear autoantigenic sperm protein, protein from Homo sapiens NCBP2 HUMAN Nuclear cap binding Oni NAS PM D: 7651522 protein subunit 2, protein from Homo sapiens NCOA1 HUMAN Splice Isoform 1 of Oni TAS PM D: 9223431 Nuclear receptor coactivator 1, protein from Homo sapiens NCOA2 HUMAN Nuclear receptor coactivator Oni NAS Oni Prot: Q15596 2, protein from Homo sapiens US 8,148,129 B2 87 88 -continued

Symbol Evi- Refer Qualifier Sequence/GOst Information Source dence ence NCOA3 HUMAN Splice Isoform 1 of Oni NAS PM D: 97.410321 Nuclear receptor coactivator 3, protein from Homo sapiens NAS Oni NCOA4 HUMAN Splice Isoform Alpha of Oni TAS PM Nuclear receptor coactivator 4, protein from Homo sapiens NCOA6 HUMAN Nuclear receptor coactivator Oni IDA PM D: 1144.3112 6, protein from Homo sapiens NAS PM D: 105674.04 NCOR2 HUMAN Nuclear receptor co-repressor Oni TAS PM D: 10097068 2, protein from Homo sapiens NDKAHUMAN Nucleoside diphosphate Oni NAS Oni Prot: P15531 kinase A, protein from Homo sapiens TAS PM D: 16130169 NDKBHUMAN Nucleoside diphosphate Oni NAS Oni Prot: P22392 kinase B, protein from Homo sapiens NEDD8 HUMAN NEDD8 precursor, Oni TAS PM D: 9353319 protein from Homo sapiens NEK1 HUMAN Splice Isoform 1 of Oni IDA PM D: 15604234 Serine/threonine-protein kinase Nek1, protein rom Homo sapiens NEK3 HUMAN Serine/threonine-protein Oni NAS PM D: 7522O34 kinase Nek3, protein rom Homo sapiens NELFE HUMAN Splice Isoform 1 of Negative Oni NAS PM D: 2119325 elongation factor E, protein from Homo sapiens NFAC2 HUMAN Splice Isoform C of Oni TAS PM D: 8668213 Nuclear factor of activated T-cells, cytoplasmic 2, protein from Homo sapiens HUMAN Splice Isoform C of Nuclear Oni TAS PM D: 10051678 actor of activated T cells 5, protein from Homo sapiens NFE2 HUMAN Transcription factor NF Oni TAS PM D: 7774O11 E245 kDa subunit, protein from Homo sapiens NFIA HUMAN Nuclear factor 1 A-type, Oni NAS PM D: 7590749 protein from Homo sapiens NFIBHUMAN Splice Isoform 1 of Oni TAS PM D: 7590749 Nuclear factor 1 B-type, protein from Homo sapiens NFKB2 HUMAN Splice Isoform 1 of Oni IDA PM D: 15677444 Nuclear factor NF-kappa B p100 subunit, protein rom Homo sapiens NFS1 HUMAN Cysteine desulfurase, Oni TAS PM D: 98.85568 mitochondrial precursor, protein from Homo sapiens NFYAHUMAN Splice Isoform Long of Oni IDA PM D: 15243141 Nuclear transcription actorY subunit alpha, protein from Homo sapiens 
NFYB_HUMAN | | Nuclear transcription factor Y subunit beta, protein from Homo sapiens | UniProt | IEP | PMID: 15243141
NFYC_HUMAN | | Splice Isoform 3 of Nuclear transcription factor Y subunit gamma, protein from Homo sapiens | UniProt | IEP | PMID: 15243141
NHRF2_HUMAN | | Splice Isoform 1 of Na(+)/H(+) exchange regulatory cofactor NHE-RF2, protein from Homo sapiens | UniProt | TAS | PMID: 9054412
NKRF_HUMAN | | NF-kappa-B-repressing factor, protein from Homo sapiens | UniProt | IDA | PMID: 10562553
NKX31_HUMAN | | Splice Isoform 1 of Homeobox protein Nkx-3.1, protein from Homo sapiens | UniProt | NAS | PMID: 11137288
NLK_HUMAN | | Serine/threonine kinase NLK, protein from Homo sapiens | UniProt | ISS | UniProt: Q9UBE8

Symbol Evi- Refer Qualifier Sequence/GOst Information Source dence ence NMES1 HUMAN Normal mucosa of UniPro IDA PM : 12209954 esophagus-specific gene 1 protein, protein from Homo sapiens NMNA1 HUMAN Nicotinamide UniPro IDA PM : 11248244 mononucleotide adenylyltransferase 1, protein from Homo sapiens NNP1 HUMAN NNP-1 protein, protein UniPro TAS PM : 9192856 rom Homo sapiens NOCT HUMAN Nocturnin, protein from UniPro TAS PM : 105.21507 Homo sapiens NOG2 HUMAN Nucleolar GTP-binding UniPro TAS PM : 8822211 protein 2, protein from Homo sapiens NONO HUMAN Non-POU domain UniPro TAS PM : 9360842 containing octamer binding protein, protein rom Homo sapiens NOTC1 HUMAN Neurogenic notch UniPro TAS PM : 10713164 homolog protein 1 precursor, protein from Homo sapiens NOTC2 HUMAN Neurogenic locus notch UniPro IDA PM : 1303260 homolog protein 2 precursor, protein from Homo sapiens NOTC4 HUMAN Splice Isoform 1 of UniPro TAS PM : 8681805 Neurogenic locus notch homolog protein 4 precursor, protein from Homo sapiens NP1L2 HUMAN Nucleosome assembly UniPro TAS PM : 8789438 protein 1-like 2, protein rom Homo sapiens NPM2 HUMAN Nucleoplasmin-2, protein UniPro IEP PM : 1271.4744 rom Homo sapiens NPMHUMAN Nucleophosmin, protein UniPro IDA PM : 1208O348 rom Homo sapiens TAS PM : 16130169 NR1D1 HUMAN Orphan nuclear receptor UniPro TAS PM : 8622974 NR1D1, protein from Homo sapiens NR1D2 HUMAN Orphan nuclear receptor UniPro TAS PM : 7997240 NR1D2, protein from Homo sapiens NR1EH2 HUMAN Oxysterols receptor LXR UniPro TAS PM : 7926814 beta, protein from Homo Sapiens NR1E3 HUMAN Splice Isoform 1 of UniPro TAS PM : 7744246 Oxysterols receptor LXR alpha, protein from Homo Sapiens NR2E3 HUMAN Splice Isoform Long of UniPro TAS PM : 1022O376 Photoreceptor-specific nuclear receptor, protein from Homo sapiens NR4A2 HUMAN Orphan nuclear receptor UniPro TAS PM : 7877627 NR4A2, protein from Homo sapiens NR4A3 HUMAN Nuclear receptor UniPro NAS PM : 8634690 Subfamily 4, group A, member 3 isoform b, protein 
from Homo sapiens NRSA2 HUMAN Splice Isoform 2 of Orphan UniPro TAS PM : 9786908 nuclear receptor NR5A2, protein from Homo sapiens NRIF3 HUMAN Splice Isoform 2 of UniPro TAS PM : 10490654 Nuclear receptor interacting factor 3, protein from Homo sapiens NRIP1 HUMAN Nuclear receptor UniPro IDA PM : 7641693 interacting protein 1, protein from Homo sapiens IDA PM : 12773562 IDA PM : 11266503 NRL HUMAN Neural retina-specific UniProt TAS PM : 89398.91 eucine Zipper protein, protein from Homo sapiens US 8,148,129 B2 91 92 -continued

Symbol Evi- Refer Qualifier Sequence/GOst information Source dence ence NSBP1. HUMAN Nucleosomal binding protein Oni NAS PM D: 11161810 protein from Homo sapiens NSG1 HUMAN Neuron-specific protein Oni TAS PM D:901.3775 amily member 1, protein rom Homo sapiens NTSCHUMAN Splice Isoform 1 of 5"(3)- Oni TAS PM D: 10702291 deoxyribonucleotidase, cytosolic type, protein rom Homo sapiens NTHL1 HUMAN Endonuclease III-like protein Oni IDA PM D: 12531031 1, protein from Homo sapiens NUMA1 HUMAN Splice Isoform 1 of Nuclear Oni TAS PM D: 1541636 mitotic apparatus protein 1, protein from Homo sapiens NUPL2 HUMAN Splice Isoform 1 O Oni TAS PM D: 10358091 -like 2, protein from Homo sapiens NUPR1 HUMAN Nuclear protein 1, protein Oni IDA PM D: 10092.851 rom Homo sapiens NWL HUMAN Splice Isoform 1 O Oni TAS PM D: 9286697 Nuclear valosin containing protein-like, protein from Homo sapiens NXF2HUMAN Nuclear RNA export factor Oni ISS Oni Prot: Q9GZYO 2, protein from Homo sapiens NXF3 HUMAN Nuclear RNA export factor Oni IDA PM D: 11545741 3, protein from Homo sapiens NXFS HUMAN Splice Isoform A of Nuclear Oni IDA PM D: 11566096 RNA export factor 5, protein from Homo sapiens OOO290 Adenovirus E3-14.7K Oni IDA PM D: 11073942 interacting protein 1, protein from Homo sapiens OOO366 Putative p150, protein Oni ISS Oni Prot: OOO366 rom Homo sapiens O14777 Retinoblastoma Oni TAS PM D: 9315664 associated protein HEC, protein from Homo sapiens O14789 Testis-specific BRDT protein, Oni TAS PM D: 93.67677 protein from Homo sapiens O15125 Alternative spliced form Oni IDA PM D: 9230210 of p15 CDK inhibitor, protein from Homo sapiens O15150 Cerebrin-50, protein from Oni TAS PM D: 93.73037 Homo sapiens O151.83 Trinucleotide repeat DNA Oni TAS PM D: 8626781 binding protein p20-CGGBP, protein from Homo sapiens CAGH3, protein from Oni TAS PM D: 922598O Homo sapiens O43148 MRNA (Guanine-7-) Oni TAS PM D: 9705270 methyltransferase, protein from Homo sapiens O43.245 Protein pé5, protein from Oni NAS PM D: 
87O6045 Homo sapiens O43663 Protein regulating Oni TAS PM D: 98.85575 cytokinesis 1, protein rom Homo sapiens O43719 HIV TAT specific factor Oni TAS PM D: 104.54543 protein from Homo sapiens O43.809 Pre-mRNA cleavage Oni TAS PM D: 965992.1 actor I 25 kDa subunit, protein from Homo sapiens O43812 Homeobox protein DUX3, Oni TAS PM D: 97.36770 protein from Homo sapiens O60519 Crebinding protein-like 2, Oni TAS PM D: 9693O48 protein from Homo sapiens O60592 Arg, Abl-interacting Oni TAS PM D: 921 1900 protein ArgBP2a, protein rom Homo sapiens O60593 SORBS2 protein, protein Oni NAS PM D: 921 1900 rom Homo sapiens O60671 Cell cycle checkpoint Oni IC PM D: 966O799 protein Hrad1, protein rom Homo sapiens O60870 Kin17 protein, protein Oni TAS PM D: 1923.796 rom Homo sapiens US 8,148,129 B2 93 94 -continued

Symbol | Qualifier | Sequence/GOst information | Source | Evidence | Reference
O75525 | | T-Star, protein from Homo sapiens | UniProt | TAS | PMID: 10332027
O75530 | | Embryonic ectoderm development protein homolog, protein from Homo sapiens | UniProt | NAS | PMID: 9584199
O75766 | | TRIP protein, protein from Homo sapiens | UniProt | TAS | PMID: 9705290
O75799 | | Transcription repressor, protein from Homo sapiens | UniProt | NAS | PMID: 9705290
O75805 | | HOXA-9A, protein from Homo sapiens | UniProt | NAS | UniProt: O75805
O75806 | | HOXA-9B, protein from Homo sapiens | UniProt | NAS | UniProt: O75806
O94992 | | HEXIM1 protein, protein from Homo sapiens | UniProt | IDA | PMID: 12581153
O95082 | | EH-binding protein, protein from Homo sapiens | UniProt | TAS | PMID: 10644451
O95133 | | SOX-29 protein, protein from Homo sapiens | UniProt | NAS | UniProt: O95133
O95268 | | Origin recognition complex subunit ORC5T, protein from Homo sapiens | UniProt | NAS | PMID: 9765232
O95273 | | D-type cyclin-interacting protein 1, protein from Homo sapiens | UniProt | IDA | PMID: 12437976
O95391 | | Step II splicing factor SLU7, protein from Homo sapiens | UniProt | NR | UNIPROT: O95391
O95443 | | AT rich interactive domain 3B (BRIGHT-like) protein, protein from Homo sapiens | UniProt | NAS | PMID: 10446990
O95480 | | Hypothetical protein, protein from Homo sapiens | UniProt | NAS | UniProt: O95480
O95926 | | Hypothetical protein DKFZp564O2082, protein from Homo sapiens | UniProt | NAS | PMID: 11118353
OGT1_HUMAN | | Splice Isoform 2 of UDP-N-acetylglucosamine--peptide N-acetylglucosaminyltransferase 110 kDa subunit, protein from Homo sapiens | UniProt | TAS | PMID: 9083067
OI106_HUMAN | | Splice Isoform 1 of 106 kDa O-GlcNAc transferase-interacting protein, protein from Homo sapiens | UniProt | ISS | UniProt: Q9UPV9
ORC1_HUMAN | | Origin recognition complex subunit 1, protein from Homo sapiens | UniProt | TAS | PMID: 7502077
ORC2_HUMAN | | protein from Homo sapiens | UniProt | TAS | PMID: 8808289
ORC4_HUMAN | | Origin recognition complex subunit 4, protein from Homo sapiens | UniProt | TAS | PMID: 9353276
ORC5_HUMAN | | Origin recognition complex subunit 5, protein from Homo sapiens | UniProt | TAS | PMID: 9765232
OTX1_HUMAN | | Homeobox protein OTX1, protein from Homo sapiens | UniProt | NAS | UniProt: P32242
OTX2_HUMAN | | Homeobox protein OTX2, protein from Homo sapiens | UniProt | NAS | UniProt: P32243
OVOL1_HUMAN | | Putative transcription factor Ovo-like 1, protein from Homo sapiens | UniProt | NAS | UniProt: O14753
OZF_HUMAN | | Zinc finger protein OZF, protein from Homo sapiens | UniProt | TAS | PMID: 8665923
P53_HUMAN | | Splice Isoform 1 of Cellular tumor antigen p53, protein from Homo sapiens | UniProt | IDA | PMID: 7720704
P66A_HUMAN | | Splice Isoform 1 of Transcriptional repressor p66 alpha, protein from Homo sapiens | UniProt | IDA | PMID: 12183469

Symbol Evi Refer Qualifier Sequence/GOst Information Source dence ence ISS UniProt: Q96F28 P73L HUMAN Splice Isoform 1 of UniPro IDA PMID: 12446.779 Tumor protein -like, protein from Homo sapiens P78365 Polyhomeotic 2 homolog, UniPro TAS PMID: 9121482 protein from Homo sapiens P80CHUMAN , protein from UniPro TAS PMID: 7971277 Homo sapiens PA2G4 HUMAN Proliferation-associated UniPro IDA PMID: 15073182 protein 2G4, protein from Homo sapiens PAPOAHUMAN Poly(A) Polymerase alPha, UniPro TAS PMID: 8302877 protein from Homo sapiens PAR6A HUMAN Splice Isoform 1 of UniPro ISS UniProt: Q9NPB6 Partitioning defective 6 homolog alpha, protein rom Homo sapiens PARK7 HUMAN Protein DJ-1, protein UniPro IDA PMID: 12446870 rom Homo sapiens PARN HUMAN Poly(A)-specific UniPro TAS PMID: 97.3662O ribonuclease PARN, protein from Homo sapiens PARP1 HUMAN Poly ADP-ribose UniPro TAS PMID: 2S13174 polymerase 1, protein rom Homo sapiens PARP4 HUMAN Poly ADP-ribose UniPro NAS PMID: 106444.54 polymerase 4, protein rom Homo sapiens PARP9 HUMAN Splice Isoform 1 of Poly UniPro TAS PMID: 11110709 ADP-ribose polymerase 9, protein from Homo sapiens PAWR HUMAN PRKC apoptosis WT1 UniPro NAS UniProt: Q96IZO regulator protein, protein rom Homo sapiens PAX8 HUMAN Splice Isoform 1 of Paired UniPro NAS UniProt: Q16339 box protein Pax-8, protein from Homo sapiens PAX9 HUMAN Paired box protein Pax-9, UniPro NAS UniProt: P55771 protein from Homo sapiens PBX1 HUMAN Splice Isoform PBX1a of UniPro ISS UniProt: P40424 Pre-B-cell leukemia transcription factor 1, protein from Homo sapiens PBX3 HUMAN Splice Isoform PBX3a of UniPro ISS UniProt: P40426 Pre-B-cell leukemia transcription factor 3, protein from Homo sapiens PBX4 HUMAN Pre-B-cell leukemia UniPro ISS UniProt: Q9BYU1 transcriPtion factor 4, protein from Homo sapiens PCAF HUMAN Histone acetyltransferase UniPro TAS PMID: 10891SO8 PCAF, protein from Homo sapiens PCBP1 HUMAN Poly(rC)-binding protein UniPro NAS UNIPROT: Q15365 protein from Homo sapiens PCBP2 
HUMAN Poly(rC)-binding protein 2, protein from Homo sapiens UniProt NAS UniProt: Q15366
PDCD8 HUMAN Splice Isoform 1 of Programmed cell death protein 8, mitochondrial precursor, protein from Homo sapiens UniProt TAS PMID: 9989411
PDZK3 HUMAN Splice Isoform 1 of PDZ domain containing protein 3, protein from Homo sapiens UniProt ISS PMID: 12671685
PEPP1 HUMAN Paired-like homeobox protein PEPP-1, protein from Homo sapiens UniProt IDA PMID: 11980563
PERM HUMAN Splice Isoform H17 of Myeloperoxidase precursor, protein from Homo sapiens UniProt TAS PMID: 2829220
PFD5 HUMAN Prefoldin subunit 5, protein from Homo sapiens UniProt TAS PMID: 9792694
PFTK1 HUMAN Splice Isoform 1 of Serine/threonine-protein kinase PFTAIRE-1, protein from Homo sapiens UniProt TAS PMID: 9202329

Symbol Evi Refer Qualifier Sequencef GOst information Source dence ence PGEA1 HUMAN Chibby protein, protein Oni IDA PM D: 12712206 rom Homo sapiens PGH1 HUMAN Cyclooxygenase 1b3, Oni ISS Oni Prot: P23219 protein from Homo sapiens PGH2 HUMAN Prostaglandin G/H Oni ISS Oni Prot: P35354 synthase 2 precursor, protein from Homo sapiens PHB HUMAN Prohibitin, protein from Oni TAS PM D: 16130169 Homo sapiens PHC1 HUMAN Polyhomeotic-like protein Oni TAS PM D: 9121482 protein from Homo sapiens PHF12 HUMAN Splice Isoform 2 of PHD Oni IDA PM D: 11390640 finger protein 12, protein rom Homo sapiens PHF2HUMAN PHD finger protein 2, Oni TAS PM D: 10051327 protein from Homo sapiens PIAS1 HUMAN Protein inhibitor of Oni TAS PM D: 9724754 activated STAT protein 1, protein from Homo sapiens PIAS4 HUMAN Protein inhibitor of Oni IDA PM D: 1124.8056 activated STAT protein 4, protein from Homo sapiens NAS PM : 9724754 PIN1 HUMAN Peptidyl-prolyl cis-trans Oni TAS PM : 8606777 isomerase NIMA-interacting protein from Homo sapiens PIREHUMAN Pirin, protein from Homo Oni TAS PM D:90796.76 Sapiens PKP1 HUMAN Splice Isoform 2 of Oni NAS PM D: 936.9526 Plakophilin-1, protein rom Homo sapiens PKP2 HUMAN Splice Isoform 2 of Oni NAS PM D: 8922383 Plakophilin-2, protein rom Homo sapiens PLCB1 HUMAN Splice Isoform A of 1 Oni NAS PM D:1076O467 phosphatidylinositol-4,5- bisphosphate phosphodiesterase beta 1, protein from Homo sapiens PMLHUMAN Splice Isoform PML-1 of Oni IDA PM D: 92941.97 Probable transcription actor PML, protein from Homo sapiens PMS1 HUMAN PMS1 protein homolog 1, Oni TAS PM D: 80725.30 protein from Homo sapiens PMS2 HUMAN Postmeiotic segregation Oni TAS PM D: 80725.30 increased 2 nirs variant 2, protein from Homo sapiens PNKP HUMAN Bifunctional polynucleotide Oni IDA PM D: 10446193 phosphatase kinase, protein from Homo sapiens PNRC1 HUMAN Proline-rich nuclear Oni TAS PM D: 7578250 receptor coactivator 1, protein from Homo sapiens PO2F1 HUMAN Splice Isoform 1 of POU Oni IDA PM D: 11891224 
domain, class 2, transcription factor 1, protein from Homo sapiens
POSFL HUMAN POU domain, class 5, transcription factor 1-like protein 1, protein from Homo sapiens UniProt TAS UniProt: Q06416
PO6F2 HUMAN Splice Isoform 1 of POU domain, class 6, transcription factor 2, protein from Homo sapiens UniProt IC PMID: 8601806
POLS HUMAN DNA polymerase sigma, protein from Homo sapiens UniProt IDA PMID: 10066793
POP7 HUMAN Ribonuclease P protein subunit p20, protein from Homo sapiens UniProt TAS PMID: 9630247
PP2AA HUMAN Serine/threonine protein phosphatase 2A, catalytic subunit, alpha isoform, protein from Homo sapiens UniProt NAS PMID: 11007961

Symbol Evi- Refer Qualifier Sequence/GOst information Source dence ence PP2CD HUMAN Protein phosphatase 2C Oni TAS PM D: 9177166 isoform delta, protein rom Homo sapiens PP2CE HUMAN Splice Isoform 1 of Oni ISS Oni Prot: Q96 MI6 Protein phosphatase 2C isoform eta, protein from Homo sapiens PP2CG HUMAN Protein phosphatase 2C Oni TAS PM 92.71424 isoform gamma, protein rom Homo sapiens PP4CHUMAN Serine/threonine protein Oni NAS Oni Prot: P60510 phosphatase 4 catalytic Subunit, protein from Homo sapiens PPARAHUMAN Peroxisome proliferator Oni TAS PM : 16271 724 activated receptor alpha, protein from Homo sapiens PPARB HUMAN Splice Isoform 1 of Oni NAS PM : 11551955 Peroxisome proliferator activated receptor delta, protein from Homo sapiens PPIE HUMAN Splice Isoform A of Oni IDA PM : 11313484 Peptidyl-prolyl cis-trans isomerase E, protein from Homo sapiens Splice Oni TAS PM : 91.533O2 Peptidyl-prolyl cis-trans isomerase G., protein rom Homo sapiens Splice Isoform 1 of Oni TAS PM Peptidyl-prolyl cis-trans isomerase-like 2, protein rom Homo sapiens PPP5 HUMAN Serine/threonine protein Oni TAS PM : 7925273 phosphatase 5, protein rom Homo sapiens PPRB. 
HUMAN Splice Isoform 1 of Oni IDA PM : 10235.267 Peroxisome proliferator activated receptor-binding protein, protein from Homo sapiens PQBP1 HUMAN Splice Isoform 1 of Oni TAS PM : 10198427 Polyglutamine-binding protein 1, protein from Homo sapiens PRD1S HUMAN PR-domain zinc finger Oni NAS Oni Prot: P57071 protein 15, protein from Homo sapiens PRD16 HUMAN Splice Isoform 1 of PR Oni IC PM 11OSOOOS domain Zinc finger protein 16, protein from Homo sapiens PRDM2 HUMAN Splice Isoform 1 of PR Oni NAS PM : 7590293 domain Zinc finger protein 2, protein from Homo sapiens NAS PM : 7538672 PREB HUMAN Prolactin regulatory Oni TAS PM : 10194769 element-binding protein, protein from Homo sapiens PRGC1 HUMAN Peroxisome proliferator Oni TAS PM : 12588.810 activated receptor gamma coactivator 1-alpha, protein from Homo sapiens PRP16 HUMAN Pre-mRNA splicing factor Oni NAS PM : 95241.31 ATP-dependent RNA helicase PRP16, protein rom Homo sapiens PRS6A HUMAN 26S protease regulatory Oni TAS PM : 2194290 subunit 6A, protein from Homo sapiens PSA1 HUMAN Splice Isoform Short of Oni TAS PM : 7681.138 Proteasome subunit alpha type 1, protein from Homo sapiens PSA3 HUMAN Proteasome subunit alpha Oni TAS PM : 16130169 type 3, protein from Homo sapiens US 8,148,129 B2 101 102 -continued

Symbol Evi- Refer Qualifier Sequence/GOst information Source dence ence PSB4 HUMAN Proteasome subunit beta Uni Prot TAS PM D: 16130169 type 4 precursor, protein rom Homo sapiens PSF1 HUMAN DNA replication complex Uni Prot IDA PM D: 109425.95 GINS protein PSF1, protein from Homo sapiens PTDSRHUMAN Splice Isoform 1 of Uni Prot IDA PM D: 1472906S Protein PTDSR, protein rom Homo sapiens PTHR1 HUMAN Parathyroid Uni Prot TAS PM D: 10709993 hormone/parathyroid hormone-related peptide receptor precursor, protein from Homo sapiens PTMAHUMAN Prothymosin alpha, Oni TAS PM D: 10854O63 protein from Homo sapiens PTMS HUMAN Parathymosin, protein Oni TAS PM D: 10854O63 rom Homo sapiens PTTG1 HUMAN Securin, protein from Oni TAS PM D: 9811450 Homo sapiens PTTG HUMAN Pituitary tumor-transforming Oni IDA PM D: 10781-616 gene 1 protein-interacting protein precursor, protein rom Homo sapiens PWP1 HUMAN Periodic tryptophan Oni TAS PM D: 7828893 protein 1 homolog, protein from Homo sapiens Q02313 Kruppel-related Zinc Oni NAS Oni Prot: Q02313 finger protein, protein rom Homo sapiens Q03989 ARID5A protein, protein Oni IC PM D: 1564O446 rom Homo sapiens 2771 P37 AUF1, protein from Oni NAS PM D: 8246982 Homo sapiens 2869 Rkappa B, protein from Oni NR Oni Prot: Q12869 Homo sapiens 3028 Homeo box protein, Oni NAS PM D: 7647458 protein from Homo sapiens 3051 , protein Oni NAS PM D: 87992OO rom Homo sapiens 3127 REST protein, protein Oni TAS PM D: 7697725 rom Homo sapiens 3137 NDP52, protein from Oni TAS PM D: 7S4O613 Homo sapiens 3395 TAR RNA loop binding Oni TAS PM D: 8846792 protein, protein from Homo sapiens 3826 Autoantigen, protein Oni IDA PM D: 752O377 rom Homo sapiens 3862 DNA-binding protein, Oni TAS PM D: 7887923 protein from Homo sapiens 3901 Hypothetical protein C1D, Oni TAS PM D: 94698.21 protein from Homo sapiens 4211 E4BP4 protein, protein Oni TAS PM D: 75.65758 rom Homo sapiens 4333 Facioscapulohumeral Oni NAS Oni Prot: Q14333 muscular dystrophy, protein from Homo sapiens HCREM 
1alpha protein, protein from Homo sapiens UniProt NAS PMID: 8206879
Q14503 HCREM 2beta-a protein, protein from Homo sapiens UniProt NAS PMID: 8206879
Q14548 HOX2.8 protein, protein from Homo sapiens UniProt IDA PMID: 1871139
Q14561 HPX-5 protein, protein from Homo sapiens UniProt NAS PMID: 7518789
Q14655 C-MYC promoter-binding protein IRLB, protein from Homo sapiens UniProt NAS UniProt: Q14655
Q14820 ZFM1 protein, alternatively spliced product, protein from Homo sapiens UniProt NAS PMID: 7912130
Q14869 MSSP-2 protein, protein from Homo sapiens UniProt NAS PMID: 7838710
Q14901 Myc protein, protein from Homo sapiens UniProt NAS PMID: 2834731

Symbol Evi- Refer Qualifier Sequence/GOst information Source dence ence Q15156 PML-RAR protein, UniPro NAS UniProt: Q15156 protein from Homo sapiens 5170 Pp21 protein, protein UniPro TAS PMID: 7971997 rom Homo sapiens 5270 HPX-153 protein, protein UniPro NAS UniProt: Q15270 rom Homo sapiens S288 No distinctive protein UniPro NAS PMID: 8543184 motifs; ORF, protein rom Homo sapiens 5299 RARB protein, protein UniPro IDA PMID: 2177841 rom Homo sapiens 5325 DNA-binding protein, UniPro TAS PMID: 3174636 protein from Homo sapiens 5327 Nuclear protein, protein UniPro TAS PMID: 7730328 rom Homo sapiens S361 Transcription factor, UniPro NAS PMID: 7597036 protein from Homo sapiens 5376 Y-chromosome RNA UniPro NAS PMID: 9598.316 recognition motif protein, protein from Homo sapiens S381 Y-chromosome RNA UniPro TAS PMID: 9598.316 recognition motif protein, protein from Homo sapiens 5435 Yeastsds22 homolog, UniPro TAS PMID: 7498485 protein from Homo sapiens 5552 CACCC box-binding protein, UniPro TAS PMID: 8355710 protein from Homo sapiens 5574 Hypothetical protein TAF1B, UniPro NAS PMID: 78O1123 protein from Homo sapiens 5736 Zinc finger protein 223, UniPro NAS UniProt: Q9UMWO protein from Homo sapiens S936 Zinc-finger protein, UniPro NAS UniProt: Q15936 protein from Homo sapiens 6247 Histone H1 transcription UniPro NAS PMID: 79691.68 actor large subunit 2A, protein from Homo sapiens GATA-4 transcription factor, UniPro NAS PMID: 7791790 protein from Homo sapiens Q16464 Chromosome 17q21 UniPro NAS UniProt: Q16464 mRNA clone 694:2. 
protein from Homo sapiens Q16624 Long overlapping ORF, UniPro NAS PMID: 3265124 protein from Homo sapiens Q16630 HPBRII-4 mRNA, UniPro TAS PMID: 96599.21 protein from Homo sapiens Q16670 Transcriptional regulator UniPro NAS PMID: 1569959 SCAN domain containing protein, protein from Homo sapiens OTTHUMPOOOOOO28668, UniPro ISS UniProt: QSW1B6 protein from Homo sapiens Hypothetical protein UniPro ISS UniProt: Q6ZNA8 FLJ16262, protein from Homo sapiens ZNF367, protein from UniPro IDA PMID: 15344908 Homo sapiens Discs large homolog7; UniPro ISS UniProt: Q86T11 Drosophila Discs large-1 tumor Suppressor-like; hepatoma up-regulateD protein, protein from Homo sapiens TCFL5 protein, protein UniProt ISS UniProt: Q86TP4 rom Homo sapiens BRUNOL4 protein, UniProt ISS UniProt: Q86XB9 protein from Homo sapiens DNA cytosine UniProt ISS PMID: 12138111 methyltransferase 3alpha, isoforma, protein from Homo sapiens P621, protein from Homo UniProt IDA PMID: 1266SS82 Sapiens Peroxisome proliferator UniProt ISS UniProt: Q86YN6 activated receptor gamma coactivator 1beta-1a, protein from Homo sapiens US 8,148,129 B2 105 106 -continued

Symbol Evi- Refer Qualifier Sequence/GOst information Source dence ence CGI-121 L1 isoform, Oni Pro NAS PM D: 1265983O protein from Homo sapiens Early hematopoietic Zinc Oni Pro DA PM D: 12393497 finger, protein from Homo sapiens DNA cytosine Oni DA PM D: 12138111 methyltransferase 3alpha isoform b, protein from Homo sapiens KLF4 protein, protein Oni SS Oni Prot: Q8N717 rom Homo sapiens Hypothetical protein Oni SS Oni Prot: Q8N9B5 FLJ37870, protein from Homo sapiens Homeoprotein MBX-L, Oni SS Oni protein from Homo sapiens Homeoprotein MBX-S, Oni SS Oni protein from Homo sapiens V- musculoaponeurotic Oni DA PM D: 12368,292 fibrosarcoma oncogene homologA, protein from Homo sapiens Q8TALO PPARGC1B protein, Oni SS Oni Prot: Q8TALO protein from Homo sapiens TRAF6-binding zinc Oni DA PM D: 11751921 finger protein, protein rom Homo sapiens PGC-1-related estrogen Oni SS PM D: 1071316S receptor alpha coactivator short isoform, protein rom Homo sapiens DA PM D: 11854.298 Adaptor protein FE65a2, Oni SS Oni Prot: Q8TEY4 protein from Homo sapiens Myoneurin, protein from Oni DA PM D: 11598.191 Homo sapiens Brain-muscle-ARNT-like Oni DA PM D: 12055078 transcription factor 2a, protein from Homo sapiens Q92657 HP8 peptide, protein Oni NAS PM D: 8758458 rom Homo sapiens Q92728 RB1 protein, protein from Oni NAS PM D: 3413073 Homo sapiens S100P binding protein Oni DA PM D: 15632OO2 Riken, isoform a protein rom Homo sapiens Hypothetical protein Oni DA PM D: 158434OS FLJ32915, protein from Homo sapiens Transcription factor Oni DA PM D: 15994.933 RAM2 splice variant c, protein from Homo sapiens TRAP/Mediator complex Oni DA PM D: 10235.267 component TRAP25, protein from Homo sapiens JADE1L protein, protein Oni SS Oni rom Homo sapiens Muscle alpha-kinase, Oni SS Oni Prot: Q96L96 protein from Homo sapiens Hypothetical protein Oni SS Oni Prot: Q96MH2 FLJ32384, protein from Homo sapiens Mid-1-related chloride Oni SS PM D: 11279057 channel 1, protein from Homo sapiens Hypothetical protein Oni 
IDA PMID: 12169691 FLJ14714, protein from Homo sapiens
Q99419 ICSAT transcription factor, protein from Homo sapiens UniProt ISS UniProt: Q99419
Q99638 RAD9A protein, protein from Homo sapiens UniProt TAS PMID: 8943031
Q99718 ESE-1a, protein from Homo sapiens UniProt NAS PMID: 9234700
LOC55974 protein, protein from Homo sapiens UniProt IC PMID:

Symbol Evi- Refer Qualifier Sequence/GOst information Source dence ence Breast cancer antigen NY UniProt NAS PMID: 11280766 BR-1, protein from Homo Sapiens BHLH factor Hes7, UniProt NAS PMID: 1126O262 protein from Homo sapiens Nucleophosmin B23.2, UniProt ISS UniProt: Q9BYG9 protein from Homo sapiens MORF/CBP protein, UniProt NAS PMID: 111578O2 protein from Homo sapiens Putative chromatin UniProt NAS UniProt: Q9BZ95 modulator, protein from Homo sapiens Bruno-like 4, RNA UniProt ISS Binding protein; RNA Binding protein BRUNOL-5; CUG-BP and ETR-3 like factor 4, protein from Homo sapiens Trinucleotide repeat UniPro NAS PMID: 11158314 containing 4, protein rom Homo sapiens Kappa B and V(D)J UniPro NAS UniProt: Q9BZSO recombination signal sequences binding protein, protein from Homo sapiens NK6 transcription factor UniPro NAS PMID: 112101.86 related, locus 2, protein rom Homo sapiens CTCL tumor antigen se20-4, UniPro IDA PMID: 1139S479 protein from Homo sapiens alpha, UniPro NAS UniProt: Q9H2M1 protein from Homo sapiens Cycle-like factor CLIF, UniPro NAS PMID: 11018023 protein from Homo sapiens Zinc finger transcription UniPro TAS PMID: 10978.333 actor Eos, protein from Homo sapiens ARTS protein, protein UniPro NAS PMID: 11146656 rom Homo sapiens Probable ATP-dependent UniPro NAS PMID: 110241.37 RNA helicase DDX47, protein from Homo sapiens DJ875K15.1.1, protein UniPro ISS UniProt: Q9H509 rom Homo sapiens GTPase-interacting protein 2, UniPro IDA PMID: 11073942 protein from Homo sapiens Beta protein 1 BP1, UniPro NAS PMID: 11069021 protein from Homo sapiens Lim-homeobox transcription UniPro NAS UniProt: Q9HBU2 actor LHX3, protein from Homo sapiens Pre-B-cell leukemia UniPro NAS PMID: 1082S160 transcription factor interacting protein 1, protein from Homo sapiens High-mobility group 20A UniPro NAS PMID: 10773667 variant, protein from Homo sapiens Mesenchymal stem cell UniPro NAS PMID: 1111832O protein DSC92, protein rom Homo sapiens OTTHUMPOOOOOO16853, UniPro IDA PMID: 
11073942 protein from Homo sapiens
Doublesex and mab-3 related transcription factor 3, protein from Homo sapiens UniProt NAS UniProt: Q9NQL9
Ashl, protein from Homo sapiens UniProt TAS PMID: 10860993
p21SNFT, protein from Homo sapiens UniProt TAS PMID: 10878360
K562 cell-derived leucine-zipper-like protein 1, protein from Homo sapiens UniProt NAS PMID: 10873651
Hypothetical protein FLJ20503, protein from Homo sapiens UniProt ISS UniProt: Q9NX07

Symbol Evi- Refer Qualifier Sequence/GOst information Source dence ence RB-associated KRAB Oni TAS PM D: 10702291 repressor, protein from Homo sapiens Ets domain transcription Oni ISS Oni Prot: Q9NZC4 actor, protein from Homo sapiens THY28 protein, protein Oni ISS PM D: 146O1557 rom Homo sapiens open Oni NAS PM D: 10570909 reading frame 5, protein rom Homo sapiens KIAA1536 protein, Oni ISS Oni protein from Homo sapiens SRp25 nuclear protein Oni TAS PM D: 10708573 isoform 2, protein from Homo sapiens Hypothetical protein Oni NR Oni Prot: Q9P2S7 FLJ11063, protein from Homo sapiens 22 Kruppel-related zinc Oni NAS Oni Prot: Q9UC05 finger protein, protein rom Homo sapiens Nuclear receptor subfamily Oni NAS PM D: 7479914 5, group A, member 1, protein from Homo sapiens GHDTA = GROWTH Oni NAS PM D: 7642589 hormone gene-derived transcriptional activatorihepatic nuclear actor-1 alpha homolog, protein from Homo sapiens Surfactant protein B Oni NAS PM D: 7887923 binding protein, protein rom Homo sapiens LBP-1A transcription Oni TAS PM D: 81.14710 actor protein, protein rom Homo sapiens ATF-AO transcription Oni NAS PM D: 8288576 actor protein, protein rom Homo sapiens Cell cycle checkpoint protein, Oni ISS Oni Prot: Q9UEP1 protein from Homo sapiens Putative secreted ligand, Oni NR Oni Prot: Q9UGK6 protein from Homo sapiens RB-binding protein, Oni TAS PM D: 12657635 protein from Homo sapiens Bromodomain protein Oni TAS PM D: 105261.52 CELTIX1, protein from Homo sapiens Nuclear fragile X mental Oni TAS PM D: 105563.05 retardation protein interacting protein 1, protein from Homo sapiens Basal transcriptional Oni TAS PM D: 10648.625 activator haBT1, protein rom Homo sapiens Zinc finger protein 2, isoform Oni NAS Oni Prot: Q9UMC5 a, protein from Homo sapiens ASF1A protein, protein Oni NAS PM D: 107598.93 rom Homo sapiens P53TG1-B, protein from Oni NAS Oni Prot: Q9Y2A1 Homo sapiens P53TG1-C, protein from Oni NAS Oni Prot: Q9Y2A2 Homo sapiens P53TG1-D, protein from Oni NAS Oni Prot: 
Q9Y2A3 Homo sapiens
Testis zinc finger protein, protein from Homo sapiens UniProt TAS PMID: 10572087
CGI-21 protein, protein from Homo sapiens UniProt NR UniProt: Q9Y310
My019 protein, protein from Homo sapiens UniProt IDA PMID: 12659830
Androgen-induced prostate proliferative shutoff associated protein, protein from Homo sapiens UniProt TAS PMID: 10215036

Symbol Evi- Refer Qualifier Sequence/GOst Information Source dence ence Zinc-finger motif Oni TAS PM D: 93.05772 enhancer binding-protein 1, protein from Homo sapiens Centrosomal protein 1, Oni TAS PM D: 10359848 protein from Homo sapiens Zinc-finger helicase, Oni NAS PM D: 96.88266 protein from Homo sapiens MAB21L2 protein, Oni NR Oni Prot: Q9Y586 protein from Homo sapiens Splice Isoform 1 of Oni TAS PM D: 26O1707 Myelin expression factor 2, protein from Homo sapiens Actin-associated protein Oni TAS PM : 1372044 2E4/kaptin, protein from Homo sapiens SNRPN upstream reading Oni NAS PM : 10318.933 rame protein, protein rom Homo sapiens PTD014, protein from Oni IDA PM : 11073990 Homo sapiens MORC1 protein, protein Oni TAS PM : 1036986S rom Homo sapiens HUEL, protein from Oni IDA PM : 10409434 Homo sapiens Collectin Sub-family Oni ISS PM : 1245O124 member 10, protein from Homo sapiens RS1A1 HUMAN Splice Isoform 1 of RAD51 Oni IC PM : 93968O1 associated protein 1, protein rom Homo sapiens RAS 1B. 
HUMAN Splice Isoform 2 of DNA Oni TAS PM : 95.12535 repair protein RAD51 homolog2, protein from Homo sapiens RAS1C HUMAN DNA repair protein RAD51 Oni TAS PM : 9469824 homolog 3, protein from Homo sapiens RAS1D HUMAN Splice Isoform 1 of DNA Oni TAS PM : 9570954 repair protein RAD51 homolog4, protein from Homo sapiens RAB3IHUMAN Splice Isoform 2 of RAB3A Oni IDA PM : 12007 189 interacting protein, protein from Homo sapiens RAD18 HUMAN Postreplication repair Oni NAS PM : 1088.4424 protein RAD18, protein rom Homo sapiens RADS1 HUMAN Splice Isoform 1 of DNA Oni ISS Oni Prot: Q06609 repair protein RAD51 homolog 1, protein from Homo sapiens IDA PM D: 12442171 RADS2 HUMAN RAD52 homolog isoform Oni TAS PM D: 7774.919 alpha, protein from Homo Sapiens RADS4 HUMAN DNA repair and Oni TAS PM D: 88053O4 recombination protein RAD54-like, protein rom Homo sapiens RAE1L HUMAN mRNA-associated protein Oni TAS PM D: 93.70289 mirnp 41, protein from Homo sapiens RAG2 HUMAN V(D)J recombination Oni NAS Oni Prot: P55895 , protein from Homo sapiens RANB3 HUMAN Splice Isoform 1 of Ran Oni TAS PM D: 9637251 binding protein 3, protein from Homo sapiens RANB9 HUMAN Splice Isoform 1 of Ran Oni IDA PM D: 12220523 binding protein 9, protein from Homo sapiens RANG HUMAN Ran-specific GTPase Oni TAS PM D: 16130169 activating protein, protein from Homo sapiens RASF1 HUMAN Splice Isoform D of Ras Oni IEP PM D: 14743218 association domain family 1, protein from Homo sapiens US 8,148,129 B2 113 114 -continued

Symbol Evi- Refer Qualifier Sequence/GOst Information Source dence ence RASF7 HUMAN Splice Isoform 1 of Ras Oni NAS Oni Prot: Q02833 association domain protein 7, protein from Homo sapiens B. HUMAN Retinoblastoma Oni TAS PM D: 3657987 associated protein, protein from Homo sapiens R BBP4 HUMAN Chromatin assembly Oni TAS PM D: actor 1 subunit C, protein from Homo sapiens R BBPS HUMAN Retinoblastoma-binding Oni IDA PM D: 151991 22 protein 5, protein from Homo sapiens R BBP8 HUMAN RBBP8 protein, protein Oni TAS PM D: 1076481.1 rom Homo sapiens R BM10 HUMAN RNA binding motif Oni NAS Oni Prot: P981.75 protein 10, isoform 1, protein from Homo sapiens R BMS HUMAN RNA-binding protein 5, Oni TAS PM 10352938 protein from Homo sapiens R BM6 HUMAN RNA-binding protein 6, Oni TAS PM D: 10352938 protein from Homo sapiens R BM8A HUMAN Splice Isoform 1 of RNA Oni NAS PM : 11013075 binding protein 8A, protein from Homo sapiens NAS PM : 11030346 R BM9 HUMAN Splice Isoform 1 of RNA Oni IDA PM : 11875103 binding protein 9, protein rom Homo sapiens R BX2 HUMAN Splice Isoform 1 of Oni NAS PM : 10O82581 RING-box protein 2, protein from Homo sapiens BY1A HUMAN RNA-binding motif Oni TAS PM : 9598.316 protein, Y chromosome, amily 1 member A1, protein from Homo sapiens RCL HUMAN c-Myc-responsive protein Oni TAS PM : 9271375 Rcl, protein from Homo Sapiens R UV excision repair protein Oni TAS PM : 81684.82 RAD23 homolog B, protein from Homo sapiens ECQ1 HUMAN ATP-dependent DNA Oni TAS PM : 796.1977 helicase Q1, protein from Homo sapiens ED1 HUMAN Splice Isoform 1 of Oni TAS PM : 899.5285 Double-stranded RNA specific editase 1, protein rom Homo sapiens EN3A HUMAN Splice Isoform 1 of Oni NAS PM 11631.87 Regulator of nonsense transcripts 3A, protein rom Homo sapiens EN3BHUMAN Splice Isoform 1 of Oni NAS PM 11631.87 Regulator of nonsense transcripts 3B, protein rom Homo sapiens ERE HUMAN Splice Isoform 1 of Oni NAS PM O814707 Arginine-glutamic acid dipeptide repeats protein, protein from Homo 
sapiens
RERG HUMAN Ras-related and estrogen regulated growth inhibitor, protein from Homo sapiens UniProt IDA PM 1533059
REXO4 HUMAN Splice Isoform 1 of RNA exonuclease 4, protein from Homo sapiens UniProt NAS PM O908561
RFX3 HUMAN Splice Isoform 1 of Transcription factor RFX3, protein from Homo sapiens UniProt IC PM 241.1430
RFX5 HUMAN DNA-binding protein RFX5, protein from Homo sapiens UniProt TAS PMID: 9806546
RHOB HUMAN Rho-related GTP-binding protein RhoB, protein from Homo sapiens UniProt ISS UniProt: P62745

Symbol Evi- Refer Qualifier Sequence/GOst information Source dence ence RING1 HUMAN Polycomb complex Oni IDA PM D: 9.199346 protein RING1, protein rom Homo sapiens RM19 HUMAN 39S ribosomal protein Oni IDA PM D: 109425.95 L19, mitochondrial precursor, protein from Homo sapiens RM4O HUMAN 39S ribosomal protein Oni TAS PM D: 9790763 L40, mitochondrial precursor, protein from Homo sapiens RMP HUMAN RNA polymerase II Oni TAS PM D: 9878255 Subunit 5-mediating protein, protein from Homo sapiens RNF14 HUMAN RING finger protein 14, Oni IDA PM D: 11322894 protein from Homo sapiens RNF4 HUMAN RING finger protein 4, Oni TAS PM D: 9710597 protein from Homo sapiens RNPS1 HUMAN Splice Isoform 1 of RNA Oni TAS PM D: 9580558 binding protein with serine-rich domain 1, protein from Homo sapiens RP14 HUMAN Ribonuclease P protein Oni TAS PM D: 10O24167 Subunit p14, protein from Homo sapiens RP3OHUMAN Ribonuclease P protein Oni TAS PM D: 963O247 subunit p30, protein from Homo sapiens RPB1 HUMAN DNA-directed RNA Oni NAS PM D: 7622O68 polymerase II largest Subunit, protein from Homo sapiens RPB8EHUMAN DNA-directed RNA Oni TAS Oni Prot: PS2434 polymerases I, II, and III 7.1 kDa polypeptide, protein from Homo sapiens RPGFS HUMAN Splice Isoform 1 of Rap Oni IDA PM D: 10486569 guanine nucleotide exchange factor 5, protein rom Homo sapiens RPP38 HUMAN Ribonuclease P protein Oni TAS PM D: 963O247 subunit p38, protein from Homo sapiens RPP4OHUMAN Ribonuclease P protein Oni TAS PM D: 963O247 subunit p40, protein from Homo sapiens RRPS HUMAN RRP5 protein homolog, Oni IDA PM D: 14624448 protein from Homo sapiens RSSAHUMAN 40S ribosomal protein SA, Oni TAS PM D: 16130169 protein from Homo sapiens RUNX1 HUMAN Splice Isoform AML-1B Oni NAS Oni Prot: O60473 of Runt-related transcription actor 1, protein from Homo sapiens RUNX3 HUMAN Splice Isoform 1 of Runt Oni NAS Oni Prot: Q13761 related transcription factor 3, protein from Homo sapiens RUVB1 HUMAN RuvB-like 1, protein from Oni IDA PM D: 98.43967 
Homo sapiens
RUVB2 HUMAN RuvB-like 2, protein from Homo sapiens UniProt IDA PMID: 10524211
S100P HUMAN S-100P protein, protein from Homo sapiens UniProt IDA PMID: 15632002
S10AB HUMAN Calgizzarin, protein from Homo sapiens UniProt TAS PMID: 10851017
S14L2 HUMAN Splice Isoform 1 of SEC14-like protein 2, protein from Homo sapiens UniProt NAS PMID: 11444841
S2A4R HUMAN Splice Isoform 1 of GLUT4 enhancer factor DNA binding domain, protein from Homo sapiens UniProt NAS PMID: 10825161
SAFB1 HUMAN Scaffold attachment factor B, protein from Homo sapiens UniProt TAS PMID: 1324173
SALL2 HUMAN Sal-like protein 2, protein from Homo sapiens UniProt NAS UniProt:

Symbol Evi Refer Qualifier Sequence/GOst information Source dence ence SAM68 HUMAN Splice Isoform 1 of KH UniProt IDA PMID: 13746.86 domain containing, RNA binding, signal transduction associated protein 1, protein from Homo sapiens SAS10 HUMAN Something about silencing Oni ISS protein 10, protein from Homo sapiens SATB1 HUMAN DNA-binding protein SATB1, Oni TAS PMID: 15OSO28 protein from Homo sapiens SCMH1 HUMAN Splice Isoform 1 of Oni IC PMID: 10524249 Polycomb protein SCMH1, protein from Homo sapiens SCND1 HUMAN SCAN domain-containing Oni ISS UniProt: PS7086 protein 1, protein from Homo sapiens SCRN1 HUMAN Secernin-1, protein from Oni IDA PMID: 10942595 Homo sapiens SCRT1 HUMAN Transcriptional repressor Oni NAS Scratch 1, protein from Homo sapiens SDCB1 HUMAN Syntenin-1, protein from Oni NAS PMID: 111794.19 Homo sapiens SELEB HUMAN Selenocysteine-specific Oni NAS UniProt: P57772 elongation factor, protein from Homo sapiens SENP1 HUMAN Sentrin/SUMO-specific Oni TAS PMID: 10652325 protease 1, protein from Homo sapiens SENP7 HUMAN Similar to SUMO-1- Oni ISS PMID: 10652325 specific protease, protein from Homo sapiens SEPT2 HUMAN Septin-2, protein from Oni IDA PMID: 10942595 Homo sapiens SEPT7 HUMAN Septin-7, protein from Oni IDA PMID: 15485874 Homo sapiens SESN1 HUMAN Splice Isoform T1 of Oni TAS PMID: 9926927 Sestrin-1, protein from Homo sapiens SET7 HUMAN Histone-lysine N Oni NAS UniProt: Q8WTS6 methyltransferase, H3 ysine-4 specific SET7, protein from Homo sapiens SET HUMAN Splice Isoform 1 of SET Oni IDA PMID: 11555662 protein, protein from Homo sapiens SFR11 HUMAN Splicing factor Oni TAS PMID: 1896467 arginine serine-rich 11, protein from Homo sapiens SFRS2 HUMAN Splicing factor, Oni IDA PMID: 156.52350 arginine serine-rich 2, protein from Homo sapiens SFRS4 HUMAN Splicing factor, Oni TAS PMID: 8321209 arginine serine-rich 4, protein from Homo sapiens SFRS7 HUMAN Splice Isoform 1 of Splicing Oni TAS PMID: 8O13463 actor, arginine?serine-rich 7, protein from Homo 
sapiens
| SH3L1_HUMAN | SH3 domain-binding glutamic acid-rich-like protein, protein from Homo sapiens | UniProt | TAS | PMID: 16130169 |
| SIN3A_HUMAN | Paired amphipathic helix protein Sin3a, protein from Homo sapiens | UniProt | ISS | UniProt: Q96ST3 |
| SIP1_HUMAN | Zinc finger homeobox protein 1b, protein from Homo sapiens | UniProt | IC | PMID: 9853615 |
| SIPA1_HUMAN | Signal-induced proliferation-associated protein 1, protein from Homo sapiens | UniProt | IC | PMID: 9183624 |
| SIRT6_HUMAN | Splice Isoform 1 of Mono-ADP-ribosyltransferase sirtuin-6, protein from Homo sapiens | UniProt | ISS | |
| SKI_HUMAN | Ski oncogene, protein from Homo sapiens | UniProt | NAS | UniProt: P12755 |
| SKIL_HUMAN | Splice Isoform SNON of Ski-like protein, protein from Homo sapiens | UniProt | ISS | UniProt: P12757 |
| SLUG_HUMAN | Zinc finger protein SLUG, protein from Homo sapiens | UniProt | TAS | PMID: 10866665 |
| SMAD1_HUMAN | Mothers against decapentaplegic homolog 1, protein from Homo sapiens | UniProt | ISS | UniProt: Q15797 |
| | | | NAS | PMID: 9759503 |
| SMAD2_HUMAN | Splice Isoform Long of Mothers against decapentaplegic homolog 2, protein from Homo sapiens | UniProt | ISS | UniProt: Q15796 |
| SMAD4_HUMAN | Mothers against decapentaplegic homolog 4, protein from Homo sapiens | UniProt | TAS | PMID: 10980615 |
| SMAD5_HUMAN | Mothers against decapentaplegic homolog 5, protein from Homo sapiens | UniProt | NAS | PMID: 9759503 |
| SMC1A_HUMAN | Structural maintenance of chromosome 1-like 1 protein, protein from Homo sapiens | UniProt | IDA | PMID: 11076961 |
| SMC2_HUMAN | Splice Isoform 1 of Structural maintenance of chromosome 2-like 1 protein, protein from Homo sapiens | UniProt | TAS | PMID: 9789013 |
| SMC4_HUMAN | Splice Isoform 1 of Structural maintenance of chromosomes 4-like 1 protein, protein from Homo sapiens | UniProt | TAS | PMID: 11850403 |
| SMCA1_HUMAN | SWI/SNF related, matrix associated, actin dependent regulator of chromatin, subfamily a, member 1, protein from Homo sapiens | UniProt | TAS | PMID: 1408766 |
| SMCA4_HUMAN | Possible global transcription activator SNF2L4, protein from Homo sapiens | UniProt | TAS | PMID: 8232556 |
| SMCA5_HUMAN | SWI/SNF-related matrix-associated actin-dependent regulator of chromatin subfamily A member 5, protein from Homo sapiens | UniProt | IDA | PMID: 12972596 |
| SMRA3_HUMAN | SWI/SNF-related matrix-associated actin-dependent regulator of chromatin subfamily A member 3, protein from Homo sapiens | UniProt | TAS | PMID: 7876228 |
| SMRD3_HUMAN | Splice Isoform 1 of SWI/SNF-related matrix-associated actin-dependent regulator of chromatin subfamily D member 3, protein from Homo sapiens | UniProt | IDA | PMID: 14701856 |
| SMUF2_HUMAN | Smad ubiquitination regulatory factor 2, protein from Homo sapiens | UniProt | NAS | PMID: 11163210 |
| SND1_HUMAN | Staphylococcal nuclease domain-containing protein 1, protein from Homo sapiens | UniProt | TAS | PMID: 7651391 |
| SNPC2_HUMAN | snRNA-activating protein complex subunit 2, protein from Homo sapiens | UniProt | TAS | PMID: 7715707 |
| SNPC3_HUMAN | snRNA-activating protein complex subunit 3, protein from Homo sapiens | UniProt | TAS | PMID: 7715707 |
| SNPC5_HUMAN | Splice Isoform 1 of snRNA-activating protein complex subunit 5, protein from Homo sapiens | UniProt | TAS | PMID: 9732265 |
| SOX15_HUMAN | SOX-15 protein, protein from Homo sapiens | UniProt | NAS | PMID: 8332506 |
| SOX1_HUMAN | SOX-1 protein, protein from Homo sapiens | UniProt | NAS | PMID: 9337405 |
| SOX21_HUMAN | Transcription factor SOX-21, protein from Homo sapiens | UniProt | NAS | PMID: 1614875 |

| SOX2_HUMAN | Transcription factor SOX-2, protein from Homo sapiens | UniProt | NAS | PMID: |
| SOX6_HUMAN | HMG1/2 (high mobility group) box family protein, protein from Homo sapiens | UniProt | NAS | PMID: 1614875 |
| SOX9_HUMAN | Transcription factor SOX-9, protein from Homo sapiens | UniProt | TAS | PMID: 10805756 |
| SP100_HUMAN | Splice Isoform Sp100-HMG of Nuclear autoantigen Sp-100, protein from Homo sapiens | UniProt | TAS | PMID: 2258622 |
| SP110_HUMAN | Splice Isoform 1 of Sp110 nuclear body protein, protein from Homo sapiens | UniProt | TAS | PMID: 7693701 |
| SP1_HUMAN | Transcription factor Sp1, protein from Homo sapiens | UniProt | NAS | UniProt: P08047 |
| SP3_HUMAN | Transcription factor Sp3, protein from Homo sapiens | UniProt | NAS | UniProt: Q02447 |
| SPAST_HUMAN | Splice Isoform 1 of Spastin, protein from Homo sapiens | UniProt | TAS | PMID: 10610178 |
| SPNXA_HUMAN | Sperm protein associated with the nucleus on the X chromosome A, protein from Homo sapiens | UniProt | TAS | PMID: 10906052 |
| SPNXB_HUMAN | Sperm protein associated with the nucleus on the X chromosome B/F, protein from Homo sapiens | UniProt | TAS | PMID: 10906052 |
| SPNXC_HUMAN | Sperm protein associated with the nucleus on the X chromosome C, protein from Homo sapiens | UniProt | TAS | PMID: 10626816 |
| SPOP_HUMAN | Speckle-type POZ protein, protein from Homo sapiens | UniProt | TAS | PMID: 9414087 |
| SPT6H_HUMAN | Splice Isoform 1 of Transcription elongation factor SPT6, protein from Homo sapiens | UniProt | NAS | PMID: 8786132 |
| SRBS1_HUMAN | Splice Isoform 1 of Sorbin and SH3 domain containing protein 1, protein from Homo sapiens | UniProt | IDA | PMID: 11371513 |
| SRF_HUMAN | , protein from Homo sapiens | UniProt | TAS | PMID: 3203386 |
| SRPK1_HUMAN | Splice Isoform 1 of Serine/threonine-protein kinase SRPK1, protein from Homo sapiens | UniProt | IDA | PMID: 11509566 |
| SRPK2_HUMAN | Serine/threonine-protein kinase SRPK2, protein from Homo sapiens | UniProt | IDA | PMID: 9472028 |
| SRY_HUMAN | Sex-determining region Y protein, protein from Homo sapiens | UniProt | NAS | PMID: 8265659 |
| | | | NAS | PMID: 1425584 |
| SSBP2_HUMAN | Single-stranded DNA-binding protein 2, protein from Homo sapiens | UniProt | NAS | UniProt: P81877 |
| SSBP3_HUMAN | Splice Isoform 1 of Single-stranded DNA-binding protein 3, protein from Homo sapiens | UniProt | ISS | UniProt: |
| SSF1_HUMAN | Splice Isoform 1 of Suppressor of SWI41 homolog, protein from Homo sapiens | UniProt | IDA | PMID: 15302935 |
| SSNA1_HUMAN | Sjogren's syndrome nuclear autoantigen 1, protein from Homo sapiens | UniProt | TAS | PMID: 9430706 |
| SSX1_HUMAN | Protein SSX1, protein from Homo sapiens | UniProt | TAS | PMID: 10072425 |
| SSXT_HUMAN | Splice Isoform 1 of SSXT protein, protein from Homo sapiens | UniProt | TAS | PMID: 10072425 |
| ST17A_HUMAN | Serine/threonine-protein kinase 17A, protein from Homo sapiens | UniProt | IEP | PMID: 9786912 |

| ST17B_HUMAN | Serine/threonine-protein kinase 17B, protein from Homo sapiens | UniProt | IEP | PMID: 9786912 |
| ST65G_HUMAN | Splice Isoform 1 of STAGA complex 65 gamma subunit, protein from Homo sapiens | UniProt | NAS | PMID: 10987294 |
| STABP_HUMAN | STAM-binding protein, protein from Homo sapiens | UniProt | TAS | PMID: 10383417 |
| STAG1_HUMAN | Cohesin subunit SA-1, protein from Homo sapiens | UniProt | TAS | PMID: 9305759 |
| STAG2_HUMAN | Cohesin subunit SA-2, protein from Homo sapiens | UniProt | TAS | PMID: 9305759 |
| STAG3_HUMAN | Splice Isoform 1 of Cohesin subunit SA-3, protein from Homo sapiens | UniProt | TAS | PMID: 10698974 |
| STAT1_HUMAN | Splice Isoform Alpha of Signal transducer and activator of transcription 1-alpha/beta, protein from Homo sapiens | UniProt | TAS | PMID: 10820245 |
| STAT3_HUMAN | Splice Isoform 1 of Signal transducer and activator of transcription 3, protein from Homo sapiens | UniProt | TAS | PMID: 7512451 |
| STF1_HUMAN | , protein from Homo sapiens | UniProt | IDA | PMID: 10567391 |
| STIP1_HUMAN | Stress-induced phosphoprotein 1, protein from Homo sapiens | UniProt | TAS | PMID: 16130169 |
| STK19_HUMAN | Splice Isoform 1 of Serine/threonine-protein kinase 19, protein from Homo sapiens | UniProt | TAS | PMID: 9812991 |
| STK38_HUMAN | Serine/threonine-protein kinase 38, protein from Homo sapiens | UniProt | IDA | PMID: 12493777 |
| STK39_HUMAN | STE20/SPS1-related proline-alanine-rich protein kinase, protein from Homo sapiens | UniProt | NAS | PMID: 10980603 |
| STK6_HUMAN | Serine/threonine-protein kinase 6, protein from Homo sapiens | UniProt | TAS | PMID: 9153231 |
| STRN3_HUMAN | Splice Isoform Alpha of Striatin-3, protein from Homo sapiens | UniProt | IDA | PMID: 7910562 |
| SUFU_HUMAN | Splice Isoform 1 of Suppressor of fused homolog, protein from Homo sapiens | UniProt | TAS | PMID: 10559945 |
| SUH_HUMAN | Splice Isoform APCR-2 of Recombining binding protein suppressor of hairless, protein from Homo sapiens | UniProt | NAS | UniProt: Q06330 |
| | | | IDA | PMID: 9874765 |
| SUPT3_HUMAN | Splice Isoform 1 of Transcription initiation protein SPT3 homolog, protein from Homo sapiens | UniProt | IEP | PMID: 9726987 |
| SUV91_HUMAN | Histone-lysine N-methyltransferase, H3 lysine-9 specific 1, protein from Homo sapiens | UniProt | TAS | PMID: 10949293 |
| SVIL_HUMAN | Splice Isoform 1 of Supervillin, protein from Homo sapiens | UniProt | IDA | PMID: 12711699 |
| SYCP2_HUMAN | Synaptonemal complex protein 2, protein from Homo sapiens | UniProt | NAS | PMID: 10341103 |
| TAD3L_HUMAN | Splice Isoform 1 of Transcriptional adapter 3-like, protein from Homo sapiens | UniProt | TAS | PMID: 9674425 |
| TADBP_HUMAN | TAR DNA-binding protein 43, protein from Homo sapiens | UniProt | TAS | PMID: |

| TAF1_HUMAN | Splice Isoform 1 of Transcription initiation factor TFIID subunit 1, protein from Homo sapiens | UniProt | TAS | PMID: 7680771 |
| TAF1L_HUMAN | Transcription initiation factor TFIID 210 kDa subunit, protein from Homo sapiens | UniProt | ISS | PMID: 12217962 |
| TAF4B_HUMAN | PREDICTED: TAF4b RNA polymerase II, TATA box binding protein (TBP)-associated factor, 105 kDa, protein from Homo sapiens | UniProt | NAS | UniProt: Q92750 |
| TB182_HUMAN | 182 kDa tankyrase 1-binding protein, protein from Homo sapiens | UniProt | NAS | PMID: 11854288 |
| TBX18_HUMAN | T-box transcription factor TBX18, protein from Homo sapiens | UniProt | NAS | UniProt: O95935 |
| TBX21_HUMAN | T-box transcription factor TBX21, protein from Homo sapiens | UniProt | NAS | UniProt: Q9UL17 |
| TBX22_HUMAN | T-box transcription factor TBX22, protein from Homo sapiens | UniProt | NAS | UniProt: Q9Y458 |
| TBX4_HUMAN | T-box transcription factor TBX4, protein from Homo sapiens | UniProt | NAS | UniProt: P57082 |
| TCF20_HUMAN | Splice Isoform 1 of Transcription factor 20, protein from Homo sapiens | UniProt | NAS | PMID: 10995766 |
| TCFL5_HUMAN | Transcription factor-like 5 protein, protein from Homo sapiens | UniProt | IDA | PMID: 9763657 |
| TCRG1_HUMAN | Transcription elongation regulator 1, protein from Homo sapiens | UniProt | TAS | PMID: 9315662 |
| TEAD2_HUMAN | Transcriptional enhancer factor TEF-4, protein from Homo sapiens | UniProt | NAS | PMID: 8702974 |
| TERA_HUMAN | Transitional endoplasmic reticulum ATPase, protein from Homo sapiens | UniProt | IDA | PMID: 10855792 |
| | | | TAS | PMID: 16130169 |
| TERF1_HUMAN | Splice Isoform TRF1 of Telomeric repeat binding factor 1, protein from Homo sapiens | UniProt | NAS | PMID: 9739097 |
| | | | NAS | PMID: |
| TESK2_HUMAN | Splice Isoform 1 of Dual specificity testis-specific protein kinase 2, protein from Homo sapiens | UniProt | ISS | UniProt: |
| TF65_HUMAN | Splice Isoform 1 of Transcription factor p65, protein from Homo sapiens | UniProt | IDA | PMID: 3140380 |
| TF7L1_HUMAN | Transcription factor 7-like 1, protein from Homo sapiens | UniProt | NAS | PMID: 11085512 |
| | | | NAS | PMID: 1741298 |
| TF7L2_HUMAN | Splice Isoform 1 of Transcription factor 7-like 2, protein from Homo sapiens | UniProt | NAS | PMID: 10919662 |
| TFE2_HUMAN | Splice Isoform E12 of Transcription factor E2-alpha, protein from Homo sapiens | UniProt | NAS | PMID: 2493990 |
| TFEB_HUMAN | Splice Isoform 1 of Transcription factor EB, protein from Homo sapiens | UniProt | NAS | PMID: 2115126 |
| TGIF2_HUMAN | Homeobox protein TGIF2, protein from Homo sapiens | UniProt | TAS | PMID: 11006116 |
| THEB1_HUMAN | Thyroid hormone receptor beta-1, protein from Homo sapiens | UniProt | TAS | PMID: 1618799 |
| THEB2_HUMAN | Thyroid hormone receptor beta-2, protein from Homo sapiens | UniProt | TAS | PMID: 1618799 |
| THOC1_HUMAN | THO complex subunit 1, protein from Homo sapiens | UniProt | TAS | PMID: 7525595 |
| TIAF1_HUMAN | TGFB1-induced anti-apoptotic factor 1, protein from Homo sapiens | UniProt | NAS | PMID: 9918798 |
| TIF1A_HUMAN | Splice Isoform Long of Transcription intermediary factor 1-alpha, protein from Homo sapiens | UniProt | TAS | PMID: 9115274 |
| TIF1G_HUMAN | Splice Isoform Alpha of Transcription intermediary factor 1-gamma, protein from Homo sapiens | UniProt | NAS | UniProt: Q9UPN9 |
| TIM_HUMAN | Splice Isoform 1 of Timeless homolog, protein from Homo sapiens | UniProt | IC | PMID: 9856465 |
| TIP60_HUMAN | Splice Isoform 1 of Histone acetyltransferase HTATIP, protein from Homo sapiens | UniProt | TAS | PMID: 8607265 |
| TITF1_HUMAN | Splice Isoform 1 of Thyroid transcription factor 1, protein from Homo sapiens | UniProt | NAS | UniProt: P43699 |
| TLE1_HUMAN | Transducin-like enhancer protein 1, protein from Homo sapiens | UniProt | TAS | PMID: 1303260 |
| TLE2_HUMAN | Transducin-like enhancer protein 2, protein from Homo sapiens | UniProt | TAS | PMID: 1303260 |
| TLE3_HUMAN | Splice Isoform 1 of Transducin-like enhancer protein 3, protein from Homo sapiens | UniProt | TAS | PMID: 1303260 |
| TLE4_HUMAN | TLE4 protein, protein from Homo sapiens | UniProt | NAS | PMID: 1303260 |
| TLK1_HUMAN | Splice Isoform 1 of Serine/threonine-protein kinase tousled-like 1, protein from Homo sapiens | UniProt | IEP | PMID: 10523312 |
| | | | TAS | PMID: 9427565 |
| TLK2_HUMAN | Splice Isoform 1 of Serine/threonine-protein kinase tousled-like 2, protein from Homo sapiens | UniProt | IEP | PMID: 9427565 |
| | | | NAS | PMID: 98087437 |
| TNAP3_HUMAN | Tumor necrosis factor, alpha-induced protein 3, protein from Homo sapiens | UniProt | IDA | PMID: 11463333 |
| TNPO1_HUMAN | Importin beta-2 subunit, protein from Homo sapiens | UniProt | TAS | PMID: 9144189 |
| TNPO2_HUMAN | Splice Isoform 1 of Transportin-2, protein from Homo sapiens | UniProt | TAS | PMID: 9298975 |
| TOB2_HUMAN | Tob2 protein, protein from Homo sapiens | UniProt | TAS | PMID: 10602502 |
| TOP2A_HUMAN | Splice Isoform 1 of DNA topoisomerase 2-alpha, protein from Homo sapiens | UniProt | TAS | PMID: 6267071 |
| TOP3A_HUMAN | Splice Isoform Long of DNA topoisomerase III alpha, protein from Homo sapiens | UniProt | TAS | PMID: 8622991 |
| TOP3B_HUMAN | Splice Isoform 1 of DNA topoisomerase III beta-1, protein from Homo sapiens | UniProt | TAS | PMID: 9786843 |
| TOPB1_HUMAN | DNA topoisomerase II binding protein 1, protein from Homo sapiens | UniProt | TAS | PMID: 9461304 |
| TPX2_HUMAN | Targeting protein for Xklp2, protein from Homo sapiens | UniProt | TAS | PMID: 9207457 |
| TR100_HUMAN | Thyroid hormone receptor-associated protein complex 100 kDa component, protein from Homo sapiens | UniProt | IDA | PMID: 10235267 |
| TR150_HUMAN | Thyroid hormone receptor-associated protein complex 150 kDa component, protein from Homo sapiens | UniProt | IDA | PMID: 10235267 |

| TR240_HUMAN | Thyroid hormone receptor-associated protein complex 240 kDa component, protein from Homo sapiens | UniProt | IDA | PMID: 10235267 |
| TR95_HUMAN | Splice Isoform 1 of Thyroid hormone receptor-associated protein complex 95 kDa component, protein from Homo sapiens | UniProt | NAS | PMID: 10198638 |
| TRA2A_HUMAN | Splice Isoform Long of Transformer-2 protein homolog, protein from Homo sapiens | UniProt | IDA | PMID: 9546399 |
| TRA2B_HUMAN | Splice Isoform 1 of Arginine/serine-rich splicing factor 10, protein from Homo sapiens | UniProt | IDA | PMID: 9546399 |
| TRABD_HUMAN | TRABID protein, protein from Homo sapiens | UniProt | IDA | PMID: 11463333 |
| TRAF4_HUMAN | Splice Isoform 1 of TNF receptor-associated factor 4, protein from Homo sapiens | UniProt | TAS | PMID: 7592751 |
| TRBP2_HUMAN | TAR RNA-binding protein 2, protein from Homo sapiens | UniProt | TAS | PMID: 2011739 |
| TREF1_HUMAN | Splice Isoform 1 of Transcriptional-regulating factor 1, protein from Homo sapiens | UniProt | IDA | PMID: 11349124 |
| TRI22_HUMAN | Splice Isoform 1 of Tripartite motif protein 22, protein from Homo sapiens | UniProt | TAS | PMID: 7797467 |
| TRI32_HUMAN | Tripartite motif protein 32, protein from Homo sapiens | UniProt | TAS | PMID: 7778269 |
| TRIB3_HUMAN | Tribbles homolog 3, protein from Homo sapiens | UniProt | ISS | UniProt: Q96RU7 |
| TRIP4_HUMAN | Activating signal cointegrator 1, protein from Homo sapiens | UniProt | IDA | PMID: 10454579 |
| TRP13_HUMAN | Splice Isoform 1 of Thyroid receptor-interacting protein 13, protein from Homo sapiens | UniProt | TAS | PMID: 7776974 |
| TRRAP_HUMAN | Splice Isoform 1 of Transformation/transcription domain-associated protein, protein from Homo sapiens | UniProt | IDA | PMID: 9708738 |
| TRUA_HUMAN | tRNA pseudouridine synthase A, protein from Homo sapiens | UniProt | NAS | UniProt: Q9Y606 |
| TSN_HUMAN | Translin, protein from Homo sapiens | UniProt | TAS | PMID: 7663511 |
| TUB_HUMAN | Tubby protein homolog, protein from Homo sapiens | UniProt | TAS | PMID: 11000483 |
| TULP3_HUMAN | Tubby related protein 3, protein from Homo sapiens | UniProt | NAS | PMID: 11375483 |
| TWST2_HUMAN | Twist-related protein 2, protein from Homo sapiens | UniProt | IDA | PMID: 11062344 |
| TYDP1_HUMAN | Tyrosyl-DNA phosphodiesterase 1, protein from Homo sapiens | UniProt | NAS | PMID: 10521354 |
| U2AFL_HUMAN | U2 small nuclear ribonucleoprotein auxiliary factor 35 kDa subunit related-protein 1, protein from Homo sapiens | UniProt | NAS | UniProt: Q15695 |
| U360_HUMAN | Hypothetical protein DKFZp586N0222, protein from Homo sapiens | UniProt | NAS | PMID: 10873569 |
| UB2R1_HUMAN | Ubiquitin-conjugating enzyme E2-32 kDa complementing, protein from Homo sapiens | UniProt | NAS | PMID: 8248134 |
| UB2V1_HUMAN | Splice Isoform 1 of Ubiquitin-conjugating enzyme E2 variant 1, protein from Homo sapiens | UniProt | TAS | PMID: 9305758 |

| UB7I1_HUMAN | Splice Isoform 1 of E3 ubiquitin ligase TRIAD3, protein from Homo sapiens | UniProt | NR | UniProt: |
| UBIQ_HUMAN | Ubiquitin, protein from Homo sapiens | UniProt | IC | PMID: 14528304 |
| UBP18_HUMAN | Ubl carboxyl-terminal hydrolase 18, protein from Homo sapiens | UniProt | TAS | PMID: 10777664 |
| UBP4_HUMAN | Splice Isoform UNPEL of Ubiquitin carboxyl-terminal hydrolase 4, protein from Homo sapiens | UniProt | TAS | PMID: 8183569 |
| UBP7_HUMAN | Ubiquitin carboxyl-terminal hydrolase 7, protein from Homo sapiens | UniProt | TAS | PMID: 9130697 |
| UBQL4_HUMAN | Ubiquilin-4, protein from Homo sapiens | UniProt | IDA | PMID: 11001934 |
| UGTAP_HUMAN | Splice Isoform 1 of UGA suppressor tRNA-associated protein, protein from Homo sapiens | UniProt | ISS | UniProt: Q9HD40 |
| UHMK1_HUMAN | Splice Isoform 1 of Serine/threonine-protein kinase Kist, protein from Homo sapiens | UniProt | ISS | UniProt: Q8TAS1 |
| UK14_HUMAN | Ribonuclease UK114, protein from Homo sapiens | UniProt | TAS | PMID: 8530410 |
| ULE1A_HUMAN | Ubiquitin-like 1-activating enzyme E1A, protein from Homo sapiens | UniProt | NAS | PMID: 10187858 |
| | | | ISS | UniProt: O95717 |
| | | | ISS | UniProt: Q9PO20 |
| UNG_HUMAN | Splice Isoform 2 of Uracil-DNA glycosylase, protein from Homo sapiens | UniProt | NAS | PMID: 9016624 |
| USF1_HUMAN | Upstream stimulatory factor 1, protein from Homo sapiens | UniProt | TAS | PMID: 2249772 |
| UTP11_HUMAN | Probable U3 small nucleolar RNA-associated protein 11, protein from Homo sapiens | UniProt | IDA | PMID: 12559088 |
| VAV_HUMAN | Vav proto-oncogene, protein from Homo sapiens | UniProt | NR | UniProt: P15498 |
| VCX1_HUMAN | Variable charge X-linked protein 1, protein from Homo sapiens | UniProt | IDA | PMID: 12826317 |
| VCX3_HUMAN | Variable charge X-linked protein 3, protein from Homo sapiens | UniProt | ISS | UniProt: Q9NNX9 |
| VCXC_HUMAN | VCX-C protein, protein from Homo sapiens | UniProt | ISS | UniProt: Q9H321 |
| VGLL1_HUMAN | Transcription cofactor vestigial-like protein 1, protein from Homo sapiens | UniProt | NAS | PMID: 10518497 |
| VHL_HUMAN | Splice Isoform 1 of Von Hippel-Lindau disease tumor suppressor, protein from Homo sapiens | UniProt | TAS | PMID: |
| WBP11_HUMAN | WW domain-binding protein 11, protein from Homo sapiens | UniProt | TAS | PMID: 10593949 |
| WDFY1_HUMAN | WD repeat and FYVE domain containing protein 1, protein from Homo sapiens | UniProt | IDA | PMID: 11739631 |
| WDR33_HUMAN | WD-repeat protein 33, protein from Homo sapiens | UniProt | IDA | PMID: 11162572 |
| WDR3_HUMAN | WD-repeat protein 3, protein from Homo sapiens | UniProt | TAS | PMID: 10395803 |
| WDR50_HUMAN | WD-repeat protein 50, protein from Homo sapiens | UniProt | IDA | PMID: 15199122 |
| WEE1_HUMAN | Wee1-like protein kinase, protein from Homo sapiens | UniProt | TAS | PMID: 8348613 |
| WRB_HUMAN | Tryptophan-rich protein, protein from Homo sapiens | UniProt | TAS | PMID: 9544840 |
| WRIP1_HUMAN | Splice Isoform 1 of ATPase WRNIP1, protein from Homo sapiens | UniProt | ISS | UniProt: |

| WRN_HUMAN | Werner syndrome ATP-dependent helicase, protein from Homo sapiens | UniProt | TAS | PMID: 9288107 |
| WT1_HUMAN | Wilms tumor 1 isoform D, protein from Homo sapiens | UniProt | NAS | UniProt: Q16256 |
| | | | NAS | UniProt: P19544 |
| WTAP_HUMAN | Splice Isoform 2 of Wilms tumor 1-associating protein, protein from Homo sapiens | UniProt | IDA | PMID: 0942595 |
| WWTR1_HUMAN | WW domain containing transcription regulator 1, protein from Homo sapiens | UniProt | NAS | PMID: 1118213 |
| XAB2_HUMAN | XPA-binding protein 2, protein from Homo sapiens | UniProt | IC | PMID: 0944529 |
| XPA_HUMAN | DNA-repair protein complementing XP-A cells, protein from Homo sapiens | UniProt | TAS | PMID: |
| XPO7_HUMAN | Exportin-7, protein from Homo sapiens | UniProt | IDA | PMID: 1071879 |
| XRN2_HUMAN | 5'-3' exoribonuclease 2, protein from Homo sapiens | UniProt | ISS | UniProt: |
| YAF2_HUMAN | Splice Isoform 2 of YY1-associated factor 2, protein from Homo sapiens | UniProt | IDA | PMID: 1593398 |
| YBOX1_HUMAN | Nuclease sensitive element binding protein 1, protein from Homo sapiens | UniProt | NAS | UniProt: P67809 |
| YBOX2_HUMAN | Y-box binding protein 2, protein from Homo sapiens | UniProt | TAS | PMID: 10100484 |
| YETS4_HUMAN | YEATS domain containing protein 4, protein from Homo sapiens | UniProt | TAS | PMID: 9302258 |
| YL1_HUMAN | Protein YL-1, protein from Homo sapiens | UniProt | TAS | PMID: 7702631 |
| YYY1_HUMAN | Hypothetical protein, protein from Homo sapiens | UniProt | TAS | PMID: 8121495 |
| ZBT16_HUMAN | Splice Isoform PLZFB of Zinc finger and BTB domain containing protein 16, protein from Homo sapiens | UniProt | IDA | PMID: 9294197 |
| ZBT38_HUMAN | Zinc finger and BTB domain containing protein 38, protein from Homo sapiens | UniProt | ISS | UniProt: |
| ZBT7A_HUMAN | Zinc finger and BTB domain containing protein 7A, protein from Homo sapiens | UniProt | ISS | PMID: 15337766 |
| ..._HUMAN | Splice Isoform 1 of CSL-type zinc finger containing protein 2, protein from Homo sapiens | UniProt | IDA | PMID: 14980502 |
| ZEP1_HUMAN | Zinc finger protein 40, protein from Homo sapiens | UniProt | TAS | PMID: 2106471 |
| ZEP2_HUMAN | Human immunodeficiency virus type I enhancer binding protein 2, protein from Homo sapiens | UniProt | NAS | UniProt: P31629 |
| ..._HUMAN | Zinc finger protein 161 homolog, protein from Homo sapiens | UniProt | TAS | PMID: 9177479 |
| ZFP37_HUMAN | Zinc finger protein 37 homolog, protein from Homo sapiens | UniProt | NAS | UniProt: |
| ZFP38_HUMAN | Hypothetical protein DKFZp686H10254, protein from Homo sapiens | UniProt | NAS | UniProt: |
| ZFP95_HUMAN | Zinc finger protein 95 homolog, protein from Homo sapiens | UniProt | NAS | PMID: 10585779 |
| ZFPL1_HUMAN | Splice Isoform 1 of Zinc finger protein-like 1, protein from Homo sapiens | UniProt | NAS | PMID: 9653652 |
| ZHANG_HUMAN | Host cell factor-binding transcription factor Zhangfei, protein from Homo sapiens | UniProt | IDA | PMID: 15705566 |

| ZHX1_HUMAN | Zinc fingers and homeoboxes protein 1, protein from Homo sapiens | UniProt | IDA | PMID: 12237128 |
| ZHX2_HUMAN | Zinc fingers and homeoboxes protein 2, protein from Homo sapiens | UniProt | IDA | PMID: 12741956 |
| ZHX3_HUMAN | Zinc fingers and homeoboxes protein 3, protein from Homo sapiens | UniProt | IDA | PMID: 12659632 |
| ZIC1_HUMAN | Zinc finger protein ZIC1, protein from Homo sapiens | UniProt | IDA | PMID: 8542595 |
| ZKSC1_HUMAN | Zinc finger with KRAB and SCAN domain containing protein 1, protein from Homo sapiens | UniProt | NAS | PMID: 7557990 |
| ZMY11_HUMAN | Zinc finger MYND domain containing protein 11, protein from Homo sapiens | UniProt | TAS | PMID: 7621829 |
| ZN117_HUMAN | Zinc finger protein 117, protein from Homo sapiens | UniProt | NAS | UniProt: Q03924 |
| ZN11A_HUMAN | Zinc finger protein 11A, protein from Homo sapiens | UniProt | NAS | PMID: 8464732 |
| ZN11B_HUMAN | Zinc finger protein 11B, protein from Homo sapiens | UniProt | NAS | UniProt: Q06732 |
| ZN123_HUMAN | Zinc finger protein 123, protein from Homo sapiens | UniProt | NAS | PMID: 1339395 |
| ZN125_HUMAN | Zinc finger protein 125, protein from Homo sapiens | UniProt | NAS | PMID: 1339395 |
| ZN126_HUMAN | Zinc finger protein 126, protein from Homo sapiens | UniProt | NAS | PMID: 1339395 |
| ZN131_HUMAN | Splice Isoform 1 of Zinc finger protein 131, protein from Homo sapiens | UniProt | NAS | PMID: 7557990 |
| ZN134_HUMAN | Zinc finger protein 134, protein from Homo sapiens | UniProt | NAS | PMID: 7557990 |
| ZN135_HUMAN | Similar to Zinc finger protein 135, protein from Homo sapiens | UniProt | NAS | PMID: 7557990 |
| ZN138_HUMAN | Zinc finger protein 138, protein from Homo sapiens | UniProt | NAS | PMID: 7557990 |
| ZN154_HUMAN | Zinc finger protein 154, protein from Homo sapiens | UniProt | NAS | PMID: 7557990 |
| ZN165_HUMAN | Zinc finger protein 165, protein from Homo sapiens | UniProt | NAS | UniProt: P49910 |
| ZN169_HUMAN | KRAB box family protein, protein from Homo sapiens | UniProt | NAS | UniProt: Q14929 |
| ZN184_HUMAN | Zinc finger protein 184, protein from Homo sapiens | UniProt | NAS | UniProt: Q99676 |
| ZN195_HUMAN | Hypothetical protein DKFZp666D035, protein from Homo sapiens | UniProt | NAS | UniProt: O14628 |
| ZN200_HUMAN | Zinc finger protein 200, protein from Homo sapiens | UniProt | NAS | UniProt: P98182 |
| ZN205_HUMAN | Zinc finger protein 205, protein from Homo sapiens | UniProt | NAS | UniProt: O95201 |
| ZN207_HUMAN | Splice Isoform 1 of Zinc finger protein 207, protein from Homo sapiens | UniProt | NAS | PMID: 9799612 |
| ZN208_HUMAN | Zinc finger protein 208, protein from Homo sapiens | UniProt | NAS | UniProt: O43345 |
| ZN211_HUMAN | Zinc finger protein 211 isoform 2, protein from Homo sapiens | UniProt | NAS | UniProt: Q13398 |
| ZN212_HUMAN | Zinc finger protein 212, protein from Homo sapiens | UniProt | NAS | UniProt: Q9UDV6 |
| ZN214_HUMAN | Zinc finger protein 214, protein from Homo sapiens | UniProt | NAS | UniProt: Q9UL59 |
| ZN215_HUMAN | Zinc finger protein 215, protein from Homo sapiens | UniProt | NAS | UniProt: Q9UL58 |
| ZN219_HUMAN | Zinc finger protein 219, protein from Homo sapiens | UniProt | TAS | PMID: 10819330 |
| ZN236_HUMAN | Similar to Mszf28, protein from Homo sapiens | UniProt | NAS | UniProt: Q9UL36 |
| ZN253_HUMAN | Zinc finger protein 253, protein from Homo sapiens | UniProt | NAS | UniProt: O75346 |

| ZN257_HUMAN | Zinc finger protein 257, protein from Homo sapiens | UniProt | NAS | UniProt: Q9Y2O1 |
| ZN265_HUMAN | Splice Isoform ZIS-1 of Zinc finger protein 265, protein from Homo sapiens | UniProt | TAS | PMID: 9931435 |
| ZN268_HUMAN | Splice Isoform A of Zinc finger protein 268, protein from Homo sapiens | UniProt | NAS | PMID: 11311945 |
| ZN277_HUMAN | Zinc finger protein 277, protein from Homo sapiens | UniProt | NAS | UniProt: Q9NRM2 |
| ZN278_HUMAN | Splice Isoform 1 of Zinc finger protein 278, protein from Homo sapiens | UniProt | TAS | PMID: 10713105 |
| ZN282_HUMAN | Zinc finger protein 282, protein from Homo sapiens | UniProt | NAS | UniProt: Q9UDV7 |
| ZN297_HUMAN | Zinc finger protein 297, protein from Homo sapiens | UniProt | TAS | PMID: 9545376 |
| ZN331_HUMAN | Zinc finger protein 331, protein from Homo sapiens | UniProt | NAS | UniProt: Q9NQX6 |
| ZN33A_HUMAN | Zinc finger protein 33A, protein from Homo sapiens | UniProt | NAS | UniProt: Q06730 |
| ZN33B_HUMAN | Zinc finger protein 33B, protein from Homo sapiens | UniProt | NAS | UniProt: Q06731 |
| ZN346_HUMAN | Splice Isoform 1 of Zinc finger protein 346, protein from Homo sapiens | UniProt | TAS | PMID: 10488071 |
| ZN37A_HUMAN | Zinc finger protein 37A, protein from Homo sapiens | UniProt | NAS | PMID: 8464732 |
| ZN396_HUMAN | Splice Isoform 1 of Zinc finger protein 396, protein from Homo sapiens | UniProt | IMP | UniProt: Q96N95 |
| ZN398_HUMAN | Splice Isoform 1 of Zinc finger protein 398, protein from Homo sapiens | UniProt | NAS | PMID: 11779858 |
| ZN482_HUMAN | Zinc finger protein 482, protein from Homo sapiens | UniProt | TAS | PMID: 7958847 |
| ZNF19_HUMAN | Zinc finger protein 19, protein from Homo sapiens | UniProt | NAS | PMID: 7557990 |
| ZNF22_HUMAN | Zinc finger protein 22, protein from Homo sapiens | UniProt | ISS | UniProt: P17026 |
| ZNF24_HUMAN | Zinc finger protein 24, protein from Homo sapiens | UniProt | IC | PMID: 10585455 |
| ZNF38_HUMAN | KRAB box family protein, protein from Homo sapiens | UniProt | IC | PMID: 2288909 |
| | | | NAS | UniProt: Q9NNX8 |
| ZNF41_HUMAN | Splice Isoform 1 of Zinc finger protein 41, protein from Homo sapiens | UniProt | NAS | UniProt: P51814 |
| ZNF69_HUMAN | Zinc finger protein 69, protein from Homo sapiens | UniProt | NAS | UniProt: Q9UC07 |
| ZNF70_HUMAN | Zinc finger protein 70, protein from Homo sapiens | UniProt | NAS | UniProt: Q9UC06 |
| ZNF71_HUMAN | Endothelial zinc finger protein induced by tumor necrosis factor alpha, protein from Homo sapiens | UniProt | NAS | UniProt: Q9UC09 |
| ZNF73_HUMAN | Zinc finger protein 73, protein from Homo sapiens | UniProt | NAS | UniProt: O43830 |
| ZNF75_HUMAN | Hypothetical protein DKFZp667L2223, protein from Homo sapiens | UniProt | NAS | UniProt: P51815 |
| ZNF79_HUMAN | Zinc finger protein 79, protein from Homo sapiens | UniProt | NAS | UniProt: Q15937 |
| ZNF80_HUMAN | Zinc finger protein 80, protein from Homo sapiens | UniProt | NAS | UniProt: P51504 |
| ZNF81_HUMAN | Zinc finger protein 81, protein from Homo sapiens | UniProt | NAS | UniProt: P51508 |
| ZNF83_HUMAN | Zinc finger protein 83, protein from Homo sapiens | UniProt | NAS | UniProt: P51522 |
| ZNF84_HUMAN | Zinc finger protein 84, protein from Homo sapiens | UniProt | NAS | UniProt: P51523 |
| ZNF85_HUMAN | Zinc finger protein 85, protein from Homo sapiens | UniProt | TAS | PMID: 9839802 |
| ZNF8_HUMAN | Zinc finger protein 8, protein from Homo sapiens | UniProt | NAS | UniProt: P17098 |
| ZNF90_HUMAN | Zinc finger protein 90, protein from Homo sapiens | UniProt | NAS | UniProt: Q03938 |

| ZNF91_HUMAN | Zinc finger protein 91, protein from Homo sapiens | UniProt | NAS | UniProt: Q05481 |
| ZNF92_HUMAN | Splice Isoform 1 of Zinc finger protein 92, protein from Homo sapiens | UniProt | NAS | UniProt: Q03936 |
| ZNF93_HUMAN | Splice Isoform 1 of Zinc finger protein 93, protein from Homo sapiens | UniProt | NAS | UniProt: P35789 |
| ZPR1_HUMAN | Zinc-finger protein ZPR1, protein from Homo sapiens | UniProt | TAS | PMID: |
| ZRF1_HUMAN | Zuotin-related factor 1, protein from Homo sapiens | UniProt | NAS | UniProt: Q99543 |
| ZW10_HUMAN | Centromere/kinetochore protein zw10 homolog, protein from Homo sapiens | UniProt | NAS | PMID: 11146660 |
| ZWIA_HUMAN | ZW10 interactor, antisense, protein from Homo sapiens | UniProt | IDA | PMID: 8885239 |
| ZXDA_HUMAN | Zinc finger X-linked protein ZXDA, protein from Homo sapiens | UniProt | NAS | UniProt: P98168 |
| ZXDB_HUMAN | Zinc finger X-linked protein ZXDB, protein from Homo sapiens | UniProt | NAS | UniProt: P98169 |
| ACF_HUMAN | Splice Isoform 1 of APOBEC1 complementation factor, protein from Homo sapiens | UniProt | IDA | PMID: 10781591 |
| HILS1_HUMAN | Spermatid-specific linker histone H1-like protein, protein from Homo sapiens | UniProt | IDA | PMID: 12920187 |
| HNRH1_HUMAN | Heterogeneous nuclear ribonucleoprotein H1, protein from Homo sapiens | UniProt | TAS | PMID: 7499401 |
| HNRH2_HUMAN | Heterogeneous nuclear ribonucleoprotein H', protein from Homo sapiens | UniProt | TAS | PMID: 7499401 |
| HNRH3_HUMAN | Splice Isoform 1 of Heterogeneous nuclear ribonucleoprotein H3, protein from Homo sapiens | UniProt | NAS | PMID: 10858537 |
| HNRPC_HUMAN | Full-length cDNA clone CS0DA009YK08 of Neuroblastoma of Homo sapiens, protein from Homo sapiens | UniProt | NR | UniProt: P07910 |
| HNRPF_HUMAN | Heterogeneous nuclear ribonucleoprotein F, protein from Homo sapiens | UniProt | TAS | PMID: 7499401 |
| HNRPG_HUMAN | Heterogeneous nuclear ribonucleoprotein G, protein from Homo sapiens | UniProt | NAS | PMID: 7692398 |
| HNRPL_HUMAN | Heterogeneous nuclear ribonucleoprotein L a, protein from Homo sapiens | UniProt | TAS | PMID: 2687284 |
| HNRPR_HUMAN | Heterogeneous nuclear ribonucleoprotein R, protein from Homo sapiens | UniProt | TAS | PMID: 9421497 |
| HNRPU_HUMAN | Splice Isoform Long of Heterogenous nuclear ribonucleoprotein U, protein from Homo sapiens | UniProt | TAS | PMID: 7509195 |
| HNRU2_HUMAN | Heterogeneous nuclear ribonucleoprotein UP2, protein from Homo sapiens | UniProt | NAS | UniProt: P07029 |
| O14979 | KTBP2, protein from Homo sapiens | UniProt | TAS | PMID: 9538234 |
| | E1B-55 kDa-associated protein, protein from Homo sapiens | UniProt | TAS | PMID: 9733834 |
| PTBP1_HUMAN | Splice Isoform 1 of Polypyrimidine tract-binding protein 1, protein from Homo sapiens | UniProt | TAS | PMID: 1641332 |
| | D(TTAGGG)N-binding protein B37 = TYPE A-B heterogeneous nuclear ribonucleoprotein homolog, protein from Homo sapiens | UniProt | TAS | PMID: 8321232 |
| RALY_HUMAN | RNA binding protein, protein from Homo sapiens | UniProt | TAS | PMID: 9376072 |
| ROA0_HUMAN | Heterogeneous nuclear ribonucleoprotein A0, protein from Homo sapiens | UniProt | TAS | PMID: 7585247 |
| ROA1_HUMAN | Heterogeneous nuclear ribonucleoprotein A1 isoform b, protein from Homo sapiens | UniProt | TAS | PMID: 8521471 |
| ROA2_HUMAN | Splice Isoform B1 of Heterogeneous nuclear ribonucleoproteins A2/B1, protein from Homo sapiens | UniProt | TAS | PMID: 7789969 |
| O60934 | Nibrin, protein from Homo sapiens | UniProt | IDA | PMID: 9590181 |
| | Hypothetical protein DKFZp686G19151, protein from Homo sapiens | UniProt | ISS | UniProt: |
| RAD50_HUMAN | Splice Isoform 1 of DNA repair protein RAD50, protein from Homo sapiens | UniProt | TAS | PMID: 15279769 |
| BARX1_HUMAN | Homeobox protein BarH-like 1, protein from Homo sapiens | UniProt | NAS | UniProt: Q9HBU1 |
| GBX1_HUMAN | Homeobox protein GBX-1, protein from Homo sapiens | UniProt | NAS | UniProt: Q14549 |
| HDAC8_HUMAN | Splice Isoform 3 of Histone deacetylase 8, protein from Homo sapiens | UniProt | TAS | PMID: 10748112 |
| HMG2_HUMAN | High mobility group protein 2, protein from Homo sapiens | UniProt | TAS | PMID: 1551873 |
| HXD12_HUMAN | Homeo box D12, protein from Homo sapiens | UniProt | NAS | UniProt: P35452 |
| JUN_HUMAN | Transcription factor AP-1, protein from Homo sapiens | UniProt | TAS | PMID: 10918580 |
| PRRX2_HUMAN | Paired mesoderm homeobox protein 2, protein from Homo sapiens | UniProt | NAS | UniProt: Q99811 |
| SMC3_HUMAN | Structural maintenance of chromosome 3, protein from Homo sapiens | UniProt | NR | UniProt: |
| SMCE1_HUMAN | Splice Isoform 1 of SWI/SNF-related matrix-associated actin-dependent regulator of chromatin subfamily E member 1, protein from Homo sapiens | UniProt | TAS | PMID: 9435219 |
| TE2IP_HUMAN | Telomeric repeat binding factor 2 interacting protein 1, protein from Homo sapiens | UniProt | TAS | PMID: 10850490 |
| ZBED1_HUMAN | Zinc finger BED domain containing protein 1, protein from Homo sapiens | UniProt | TAS | PMID: 9887332 |
| ZN238_HUMAN | Zinc finger protein 238, protein from Homo sapiens | UniProt | TAS | PMID: 9756912 |
| CHK1_HUMAN | Serine/threonine-protein kinase Chk1, protein from Homo sapiens | UniProt | TAS | PMID: 9382850 |
| CHM1A_HUMAN | Splice Isoform 1 of Charged multivesicular body protein 1a, protein from Homo sapiens | UniProt | IDA | PMID: 11559747 |
| DMC1_HUMAN | Meiotic recombination protein DMC1/LIM15 homolog, protein from Homo sapiens | UniProt | TAS | PMID: 8602360 |
| MCP33_HUMAN | Metaphase chromosomal protein 1, protein from Homo sapiens | UniProt | IDA | PMID: 9543011 |
| MK67I_HUMAN | MKI67 FHA domain-interacting nucleolar phosphoprotein, protein from Homo sapiens | UniProt | IDA | PMID: 11342549 |
| NOL6_HUMAN | Splice Isoform 1 of Nucleolar protein 6, protein from Homo sapiens | UniProt | ISS | PMID: 11895476 |

| | Titin, protein from Homo sapiens | UniProt | ISS | PMID: 9548712 |
| | | | TAS | PMID: 10481174 |
| RCC1_HUMAN | RCC1 protein, protein from Homo sapiens | UniProt | IDA | PMID: 15014043 |
| RGS12_HUMAN | Splice Isoform 1 of Regulator of G-protein signaling 12, protein from Homo sapiens | UniProt | TAS | PMID: 10869340 |
| SMC1A_HUMAN | Structural maintenance of chromosome 1-like 1 protein, protein from Homo sapiens | UniProt | TAS | PMID: 7757074 |
| SUV91_HUMAN | Histone-lysine N-methyltransferase, H3 lysine-9 specific 1, protein from Homo sapiens | UniProt | TAS | PMID: 10202156 |
| TBG1_HUMAN | Tubulin gamma-1 chain, protein from Homo sapiens | UniProt | ISS | UniProt: P23258 |
| NO55_HUMAN | Nucleolar autoantigen No55, protein from Homo sapiens | UniProt | TAS | PMID: 8862517 |
| | Hypothetical protein FLJ16262, protein from Homo sapiens | UniProt | ISS | UniProt: Q6ZNA8 |
| RAD51_HUMAN | Splice Isoform 1 of DNA repair protein RAD51 homolog 1, protein from Homo sapiens | UniProt | ISS | UniProt: Q06609 |
| STAG3_HUMAN | Splice Isoform 1 of Cohesin subunit SA-3, protein from Homo sapiens | UniProt | TAS | PMID: 10698974 |
| SYCP2_HUMAN | Synaptonemal complex protein 2, protein from Homo sapiens | UniProt | NAS | PMID: 10341103 |
| | | | NAS | PMID: 9592139 |
| | PREDICTED: hypothetical protein XP_497609, protein from Homo sapiens | UniProt | ISS | PMID: 15944401 |
| | Conserved hypothetical protein, protein from Homo sapiens | UniProt | ISS | PMID: 15944401 |
| SYCP1_HUMAN | Synaptonemal complex protein 1, protein from Homo sapiens | UniProt | ISS | PMID: 15944401 |
| NPM2_HUMAN | Nucleoplasmin-2, protein from Homo sapiens | UniProt | IDA | PMID: 12714744 |
| | Hypothetical protein FLJ40400, protein from Homo sapiens | UniProt | ISS | UniProt: Q8N7S8 |
| Q96GH7 | KLHDC3 protein, protein from Homo sapiens | UniProt | ISS | UniProt: Q96GH7 |
| RCC1_HUMAN | RCC1 protein, protein from Homo sapiens | UniProt | IDA | PMID: 15014043 |
| ATRX_HUMAN | Splice Isoform 4 of Transcriptional regulator ATRX, protein from Homo sapiens | UniProt | TAS | PMID: 10570185 |
| CBX1_HUMAN | Chromobox protein homolog 1, protein from Homo sapiens | UniProt | TAS | PMID: 9169582 |
| CBX5_HUMAN | Chromobox protein homolog 5, protein from Homo sapiens | UniProt | TAS | PMID: 8663349 |
| | Heterochromatin-specific nonhistone protein, protein from Homo sapiens | UniProt | ISS | UniProt: Q9Y654 |
| TB182_HUMAN | 182 kDa tankyrase 1-binding protein, protein from Homo sapiens | UniProt | NAS | PMID: 11854288 |
| H2AW_HUMAN | Core histone macro-H2A.2, protein from Homo sapiens | UniProt | IDA | PMID: 11331621 |
| H2AY_HUMAN | H2A histone family, member Y, isoform 3, protein from Homo sapiens | UniProt | IDA | PMID: 11331621 |
| | 24432 protein, protein from Homo sapiens | UniProt | IDA | PMID: 15181449 |
| O95268 | Origin recognition complex subunit ORC5T, protein from Homo sapiens | UniProt | NAS | PMID: 9765232 |
| | Replication initiator 1, protein from Homo sapiens | UniProt | TAS | PMID: 10606657 |

| MCM3_HUMAN | DNA replication licensing factor MCM3, protein from Homo sapiens | UniProt | TAS | PMID: 15494-68 |
| DPOD3_HUMAN | DNA polymerase delta subunit 3, protein from Homo sapiens | UniProt | NAS | PMID: 10219083 |
| PCNA_HUMAN | Proliferating cell nuclear antigen, protein from Homo sapiens | UniProt | TAS | PMID: 2565339 |
| RFC3_HUMAN | Activator 1 38 kDa subunit, protein from Homo sapiens | UniProt | TAS | PMID: 7774928 |
| RFC4_HUMAN | Activator 1 37 kDa subunit, protein from Homo sapiens | UniProt | TAS | PMID: 7774928 |
| RFC5_HUMAN | Activator 1 36 kDa subunit, protein from Homo sapiens | UniProt | NAS | PMID: 8999859 |
| RFA1_HUMAN | Replication protein A 70 kDa DNA-binding subunit, protein from Homo sapiens | UniProt | TAS | PMID: 8756712 |
| RFA2_HUMAN | Replication protein A 32 kDa subunit, protein from Homo sapiens | UniProt | TAS | PMID: 2406247 |
| RFA3_HUMAN | Replication protein A 14 kDa subunit, protein from Homo sapiens | UniProt | TAS | PMID: 8454588 |
| RFA4_HUMAN | Replication protein A 30 kDa subunit, protein from Homo sapiens | UniProt | TAS | PMID: 7760808 |
| CHRC1_HUMAN | Chromatin accessibility complex protein 1, protein from Homo sapiens | UniProt | NAS | PMID: 10880450 |
| Q9P288 | TOK-1alpha, protein from Homo sapiens | UniProt | IDA | PMID: 10878006 |
| AKAP6_HUMAN | A-kinase anchor protein 6, protein from Homo sapiens | UniProt | IDA | PMID: 10413680 |
| ANX11_HUMAN | Annexin A11, protein from Homo sapiens | UniProt | NAS | PMID: 12577318 |
| ATF6A_HUMAN | Cyclic AMP-dependent transcription factor ATF-6 alpha, protein from Homo sapiens | UniProt | TAS | PMID: 10866666 |
| CBX5_HUMAN | Chromobox protein homolog 5, protein from Homo sapiens | UniProt | TAS | PMID: 8663349 |
| CENPF_HUMAN | CENP-F kinetochore protein, protein from Homo sapiens | UniProt | IDA | PMID: 12154071 |
| CLIC1_HUMAN | Chloride intracellular channel protein 1, protein from Homo sapiens | UniProt | IDA | PMID: 9139710 |
| EMD_HUMAN | Emerin, protein from Homo sapiens | UniProt | TAS | PMID: 8589715 |
| GNAZ_HUMAN | Guanine nucleotide-binding protein G(Z), alpha subunit, protein from Homo sapiens | UniProt | TAS | PMID: 2117645 |
| HAX1_HUMAN | HS1-associating protein X-1, protein from Homo sapiens | UniProt | TAS | PMID: 9058808 |
| LAP2A_HUMAN | Lamina-associated polypeptide 2 isoform alpha, protein from Homo sapiens | UniProt | TAS | PMID: 8530026 |
| LAP2B_HUMAN | Thymopoietin isoform beta, protein from Homo sapiens | UniProt | TAS | PMID: 8530026 |
| LIS1_HUMAN | Platelet-activating factor acetylhydrolase IB alpha subunit, protein from Homo sapiens | UniProt | IDA | PMID: 11940666 |
| LY10_HUMAN | Splice Isoform LYSp100-B of Nuclear body protein SP140, protein from Homo sapiens | UniProt | TAS | PMID: 8695863 |
| MYOF_HUMAN | Splice Isoform 1 of Myoferlin, protein from Homo sapiens | UniProt | TAS | PMID: 10607832 |
| PE2R3_HUMAN | Splice Isoform EP3A of Prostaglandin E2 receptor, EP3 subtype, protein from Homo sapiens | UniProt | TAS | PMID: 10336471 |
| PTGDS_HUMAN | Prostaglandin-H2 D-isomerase precursor, protein from Homo sapiens | UniProt | ISS | UniProt: P41222 |

| | Gametogenetin protein 1a, protein from Homo sapiens | UniProt | ISS | UniProt: Q86UU5 |
| | Nurim, protein from Homo sapiens | UniProt | TAS | PMID: 10402458 |
| RTN4_HUMAN | Splice Isoform 1 of Reticulon-4, protein from Homo sapiens | UniProt | IDA | PMID: 11126360 |
| S10A6_HUMAN | Calcyclin, protein from Homo sapiens | UniProt | NAS | PMID: 12577318 |
| SRBP1_HUMAN | Sterol regulatory element-binding transcription factor 1, isoform a, protein from Homo sapiens | UniProt | TAS | PMID: 8156598 |
| SYNE1_HUMAN | Splice Isoform 1 of Nesprin-1, protein from Homo sapiens | UniProt | IDA | PMID: 11792814 |
| TIP30_HUMAN | Conserved hypothetical protein, protein from Homo sapiens | UniProt | IDA | PMID: 15282309 |
| TREX1_HUMAN | Splice Isoform 1 of Three prime repair exonuclease 1, protein from Homo sapiens | UniProt | ISS | UniProt |
| | | | ISS | UniProt: Q8TEU2 |
| | | | NAS | PMID: 10391904 |
| UN84B_HUMAN | Sad1/unc-84-like protein 2, protein from Homo sapiens | UniProt | TAS | PMID: 10375507 |
| LMNB2_HUMAN | Lamin B2, protein from Homo sapiens | UniProt | NAS | UniProt: Q03252 |
| | Nuclear prelamin A recognition factor, protein from Homo sapiens | UniProt | TAS | PMID: 10514485 |
| LMNA_HUMAN | Splice Isoform A of Lamin A/C, protein from Homo sapiens | UniProt | TAS | PMID: 10080180 |
| LMNB1_HUMAN | Lamin B1, protein from Homo sapiens | UniProt | TAS | PMID: 7557986 |
| | Nuclear prelamin A recognition factor, isoform b, protein from Homo sapiens | UniProt | IDA | PMID: 10514485 |
| RM19_HUMAN | 39S ribosomal protein L19, mitochondrial precursor, protein from Homo sapiens | UniProt | IDA | PMID: 10942595 |
| SCRN1_HUMAN | Secernin-1, protein from Homo sapiens | UniProt | IDA | PMID: 10942595 |
| TAGL2_HUMAN | Transgelin-2, protein from Homo sapiens | UniProt | IDA | PMID: 10942595 |
| WTAP_HUMAN | Splice Isoform 2 of Wilms tumor 1-associating protein, protein from Homo sapiens | UniProt | IDA | PMID: 10942595 |
| AT11B_HUMAN | Probable phospholipid-transporting ATPase IF, protein from Homo sapiens | UniProt | NAS | PMID: 11790799 |
| MATR3_HUMAN | Matrin-3, protein from Homo sapiens | UniProt | TAS | PMID: 2033075 |
| LBR_HUMAN | Lamin-B receptor, protein from Homo sapiens | UniProt | TAS | PMID: 8157662 |
| MAN1_HUMAN | Inner nuclear membrane protein Man1, protein from Homo sapiens | UniProt | TAS | PMID: 10671519 |
| PSN1_HUMAN | Splice Isoform 1 of Presenilin-1, protein from Homo sapiens | UniProt | TAS | PMID: 9298903 |
| PSN2_HUMAN | Splice Isoform 1 of Presenilin-2, protein from Homo sapiens | UniProt | TAS | PMID: 9298903 |
| DHCR7_HUMAN | 7-dehydrocholesterol reductase, protein from Homo sapiens | UniProt | IDA | PMID: 9878250 |
| GUC2D_HUMAN | Retinal guanylyl cyclase 1 precursor, protein from Homo sapiens | UniProt | TAS | PMID: 7777544 |
| GUC2F_HUMAN | Retinal guanylyl cyclase 2 precursor, protein from Homo sapiens | UniProt | TAS | PMID: 7777544 |
| | All-trans-13,14-dihydroretinol saturase, protein from Homo sapiens | UniProt | ISS | PMID: 15358783 |

| RAE1L_HUMAN | mRNA-associated protein mrnp 41, protein from Homo sapiens | UniProt | TAS | PMID: 9256445 |
| colocalizes with | | | | |
| AAAS_HUMAN | , protein from Homo sapiens | UniProt | IDA | PMID: 12730363 |
| DD19B_HUMAN | Splice Isoform 1 of ATP-dependent RNA helicase DDX19B, protein from Homo sapiens | UniProt | TAS | PMID: 10428971 |
| RAE1L_HUMAN | mRNA-associated protein mrnp 41, protein from Homo sapiens | UniProt | TAS | PMID: 9256445 |
| IMA1_HUMAN | Importin alpha-1 subunit, protein from Homo sapiens | UniProt | TAS | PMID: 8052633 |
| IMA3_HUMAN | Importin alpha-3 subunit, protein from Homo sapiens | UniProt | TAS | PMID: 9154134 |
| IMB1_HUMAN | Importin beta-1 subunit, protein from Homo sapiens | UniProt | TAS | PMID: 7627554 |
| IMB3_HUMAN | Importin beta-3, protein from Homo sapiens | UniProt | TAS | PMID: 9271386 |
| colocalizes with | | | | |
| IPO4_HUMAN | Splice Isoform 1 of Importin-4, protein from Homo sapiens | UniProt | NAS | PMID: 11823430 |
| IPO7_HUMAN | Importin-7, protein from Homo sapiens | UniProt | TAS | PMID: 9214382 |
| NU107_HUMAN | Nuclear pore complex protein Nup107, protein from Homo sapiens | UniProt | IDA | PMID: 11564755 |
| | | | IDA | PMID: 11684705 |
| NU133_HUMAN | Nuclear pore complex protein Nup133, protein from Homo sapiens | UniProt | IDA | PMID: 11684705 |
| | | | IDA | PMID: 11564755 |
| NU153_HUMAN | Nuclear pore complex protein Nup153, protein from Homo sapiens | UniProt | TAS | PMID: 8110839 |
| NU160_HUMAN | kDa, protein from Homo sapiens | UniProt | IDA | PMID: 11564755 |
| | | | IDA | PMID: 11684705 |
| NU205_HUMAN | Nuclear pore complex protein Nup205, protein from Homo sapiens | UniProt | NAS | PMID: 9348540 |
| NU214_HUMAN | Nuclear pore complex protein Nup214, protein from Homo sapiens | UniProt | TAS | PM O8440 |
| NUP50_HUMAN | kDa, protein from Homo sapiens | UniProt | TAS | PMID: 10449902 |
| NUP54_HUMAN | Nucleoporin 54 kDa variant, protein from Homo sapiens | UniProt | TAS | PMID: 870784 |
| NUP62_HUMAN | Nuclear pore glycoprotein p62, protein from Homo sapiens | UniProt | IDA | PMID: 1915414 |
| NUP88_HUMAN | Nuclear pore complex protein Nup88, protein from Homo sapiens | UniProt | TAS | PMID: 9049309 |
| NUP98_HUMAN | Splice Isoform 1 of Nuclear pore complex protein Nup98-Nup96 precursor, protein from Homo sapiens | UniProt | IDA | PMID: 9348540 |
| | | | NAS | PMID: 10087256 |
| NUPL_HUMAN | Nucleoporin-like protein RIP, protein from Homo sapiens | UniProt | TAS | PMID: 7637788 |
| NXT1_HUMAN | NTF2-related export protein 1, protein from Homo sapiens | UniProt | TAS | PMID: 10567585 |
| O75761 | Ranbp3 protein, protein from Homo sapiens | UniProt | NAS | PMID: 9637251 |
| Q6GTM2 | kDa, protein from Homo sapiens | UniProt | ISS | UniProt: Q6GTM2 |
| RAE1L_HUMAN | mRNA-associated protein mrnp 41, protein from Homo sapiens | UniProt | TAS | PMID: 9256445 |
| RAN_HUMAN | GTP-binding nuclear protein RAN, protein from Homo sapiens | UniProt | NAS | PMID: 8421051 |
| RBP17_HUMAN | Ran-binding protein 17, protein from Homo sapiens | UniProt | NAS | PMID: 11024021 |

| RBP23_HUMAN | Ran-binding protein 2-like 3, protein from Homo sapiens | UniProt | NAS | PMID: 9480752 |
| RBP2_HUMAN | Ran-binding protein 2, protein from Homo sapiens | UniProt | TAS | PMID: 7603572 |
| RGP1_HUMAN | Ran GTPase-activating protein 1, protein from Homo sapiens | UniProt | TAS | PMID: 8978815 |
| RNUT1_HUMAN | SNURPORTIN1, protein from Homo sapiens | UniProt | TAS | PMID: 9670026 |
| SENP2_HUMAN | Sentrin-specific protease 2, protein from Homo sapiens | UniProt | IDA | PMID: 12192048 |
| TPR_HUMAN | Translocated promoter region, protein from Homo sapiens | UniProt | TAS | PMID: |
| XPO7_HUMAN | Exportin-7, protein from Homo sapiens | UniProt | IDA | PMID: 11024021 |
| EXOS3_HUMAN | Exosome complex exonuclease RRP40, protein from Homo sapiens | UniProt | IDA | PMID: 11110791 |
| EXOS9_HUMAN | Polymyositis/scleroderma autoantigen 1, protein from Homo sapiens | UniProt | NAS | PMID: 11879549 |
| O60934 | Nibrin, protein from Homo sapiens | UniProt | IDA | PMID: 12447371 |
| | Hypothetical protein DKFZp686G19151, protein from Homo sapiens | UniProt | ISS | UniProt: Q63HR6 |
| | Nuclear prelamin A recognition factor, isoform b, protein from Homo sapiens | UniProt | IDA | PMID: 10514485 |
| CENPF_HUMAN | CENP-F kinetochore protein, protein from Homo sapiens | UniProt | IDA | PMID: 7542657 |
| CHM1A_HUMAN | Splice Isoform 1 of Charged multivesicular body protein 1a, protein from Homo sapiens | UniProt | IDA | PMID: 11559747 |
| DNM3A_HUMAN | DNA, protein from Homo sapiens | UniProt | ISS | PMID: 12138111 |
| ERCC8_HUMAN | Splice Isoform 1 of DNA excision repair protein ERCC-8, protein from Homo sapiens | UniProt | IDA | PMID: 11782547 |
| MYB_HUMAN | Splice Isoform 1 of Myb proto-oncogene protein, protein from Homo sapiens | UniProt | NAS | PMID: 3014652 |
| P53_HUMAN | Splice Isoform 1 of Cellular tumor antigen p53, protein from Homo sapiens | UniProt | IDA | PMID: 11080164 |
| PML_HUMAN | Splice Isoform PML-1 of Probable transcription factor PML, protein from Homo sapiens | UniProt | TAS | PMID: 9294197 |
| | DNA cytosine methyltransferase 3alpha, isoform a, protein from Homo sapiens | UniProt | ISS | PMID: 12138111 |
| | DNA cytosine methyltransferase 3alpha isoform b, protein from Homo sapiens | UniProt | IDA | PMID: 12138111 |
| SMC3_HUMAN | Structural maintenance of chromosome 3, protein from Homo sapiens | UniProt | IDA | PMID: 11590136 |
| SMRCD_HUMAN | SWI/SNF-related, matrix-associated, actin-dependent regulator of chromatin subfamily A containing DEAD/H box 1, protein from Homo sapiens | UniProt | NAS | PMID: 11031099 |
| SPTN4_HUMAN | Splice Isoform 1 of Spectrin beta chain, brain 3, protein from Homo sapiens | UniProt | IDA | PMID: 11294830 |
| TEP1_HUMAN | Splice Isoform 1 of Telomerase protein component 1, protein from Homo sapiens | UniProt | IDA | PMID: 7876352 |

| ARSA1_HUMAN | Arsenical pump-driving ATPase, protein from Homo sapiens | UniProt | TAS | PMID: 9736449 |
| EXOS9_HUMAN | Polymyositis/scleroderma autoantigen 1, protein from Homo sapiens | UniProt | TAS | PMID: 2007859 |
| P53_HUMAN | Splice Isoform 1 of Cellular tumor antigen p53, protein from Homo sapiens | UniProt | IDA | PMID: 12080348 |
| DDX21_HUMAN | Splice Isoform 2 of Nucleolar RNA helicase 2, protein from Homo sapiens | UniProt | TAS | PMID: 8614622 |
| DDX54_HUMAN | ATP-dependent RNA helicase DDX54, protein from Homo sapiens | UniProt | ISS | UniProt: Q9BRZ1 |
| DDX56_HUMAN | Probable ATP-dependent RNA helicase DDX56, protein from Homo sapiens | UniProt | TAS | PMID: 10749921 |
| DEDD2_HUMAN | Splice Isoform 1 of DNA-binding death effector domain-containing protein 2, protein from Homo sapiens | UniProt | IDA | PMID: 11741985 |
| DEDD_HUMAN | Splice Isoform 1 of Death effector domain-containing protein, protein from Homo sapiens | UniProt | ISS | UniProt: O75618 |
| DKC1_HUMAN | H/ACA ribonucleoprotein complex subunit 4, protein from Homo sapiens | UniProt | TAS | PMID: 10556300 |
| DNJB9_HUMAN | DnaJ homolog subfamily B member 9, protein from Homo sapiens | UniProt | ISS | UniProt: Q9UBS3 |
| EXOS1_HUMAN | 3'-5' exoribonuclease CSL4 homolog, protein from Homo sapiens | UniProt | IDA | PMID: 11812149 |
| EXOS4_HUMAN | Exosome complex exonuclease RRP41, protein from Homo sapiens | UniProt | NAS | PMID: 11110791 |
| EXOS5_HUMAN | Exosome complex exonuclease RRP46, protein from Homo sapiens | UniProt | NAS | PMID: 11110791 |
| EXOS9_HUMAN | Polymyositis/scleroderma autoantigen 1, protein from Homo sapiens | UniProt | TAS | PMID: 2007859 |
| EXOSX_HUMAN | Splice Isoform 1 of Exosome component 10, protein from Homo sapiens | UniProt | TAS | PMID: 1383382 |
| FXR1_HUMAN | Fragile X mental retardation syndrome-related protein 1, protein from Homo sapiens | UniProt | TAS | PMID: 10888599 |
| GEM4_HUMAN | Component of gems 4, protein from Homo sapiens | UniProt | TAS | PMID: 10725331 |
| GNL3_HUMAN | Splice Isoform 1 of Guanine nucleotide binding protein-like 3, protein from Homo sapiens | UniProt | ISS | UniProt |
| IF16_HUMAN | Splice Isoform 2 of Gamma-interferon-inducible protein Ifi-16, protein from Homo sapiens | UniProt | IDA | PMID: 14654789 |
| ILF2_HUMAN | Interleukin enhancer-binding factor 2, protein from Homo sapiens | UniProt | IDA | PMID: 11790298 |
| IMP3_HUMAN | U3 small nucleolar ribonucleoprotein protein IMP3, protein from Homo sapiens | UniProt | IDA | PMID: 12655004 |
| IMP4_HUMAN | U3 small nucleolar ribonucleoprotein protein IMP4, protein from Homo sapiens | UniProt | IDA | PMID: 12655004 |
| KI67_HUMAN | Splice Isoform Long of Antigen KI-67, protein from Homo sapiens | UniProt | NR | UniProt: P46013 |
| MBB1A_HUMAN | Splice Isoform 1 of Myb-binding protein 1A, protein from Homo sapiens | UniProt | ISS | UniProt |