An Ontology-Based Approach for Facilitating Information Retrieval from Disparate Sources: Patent System as an Exemplar
Kincho H. Law
Professor of Civil and Environmental Engineering, Engineering Informatics Group, Stanford University
Collaborators:
• Jay P. Kesan, Professor, College of Law, UIUC
• Siddharth Taduri (Former Student), Stanford University
• Gloria Lau, Consulting Assoc. Professor, Stanford University
Ontology Summit, March 10, 2016
Ref: S. Taduri, Information Retrieval Across Multiple Information Sources Using Knowledge-Based Approach, Engineering Degree Thesis, Stanford University, March 2012.

Motivation
Patents: Can we obtain all relevant information (on validity, enforceability, and infringement) related to the patents in a particular sector, category, or market segment, and analyze that information?
In the patent context:
• What are the issued patents in a given space?
• What is the legal scope of protection for same/similar patents?
• Who are the competitors?
• Have any same/similar patents been challenged in court?
• Are there relevant scientific publications, prior court decisions, laws, or regulations that could be used to challenge and invalidate some patent claims?
Focus: Biomedical Patents
Other similar problems: integrating information from administrative agencies, courts, technical/scientific literature, and technical product literature in a host of law and science areas (pharmaceuticals, biofuels, …)
Problem Statement
[Figure: information sources: Issued Patents and Applications; File Wrappers; Court Cases; Technical Publications; Regulations and Laws]
Questions of patent validity and infringement/enforcement involve analyzing documents from several domains: patents, USPTO file wrappers, court documents, scientific/technical publications, and technical product literature.
• These documents are owned by disparate public (government) and private entities.
• The information is often available online, but siloed in many diverse information sources.
• Today, the analysis is done manually, and poorly, by companies offering various patent research and strategy services.

Use-Case: Erythropoietin (Repository)
• Synthetic production of the hormone has made it possible to treat diseases such as anemia.
• Core patents: U.S. Patents 5,621,080, 5,756,349, 5,955,422, 5,547,933, 5,618,698
• 135 directly related patents and over 3,000 related publications
• Around 30 court cases, including patent litigation involving major companies such as Amgen, Hoechst Marion Roussel, Inc., and Transkaryotic Therapies, Inc.
• Over 162,000 full-text scientific publications from 49 prominent biomedical journals, from the TREC 2007 Genomics dataset (http://ir.ohsu.edu/genomics/2007protocol.html)
• Comprehensive domain knowledge available

Domain Terminology is Everywhere
Excerpt from a scientific publication:
Title: Regional variability in the incidence of end-stage renal disease: an epidemiological approach.
Regional variability in the incidence of end-stage renal disease (ESRD) in Austria is reported. Our aim was …. low rates in the state of Tyrol. …. ESRD incidence data were obtained from …. Between 1995 and 1999, 4811 new cases of ESRD were recorded in the state of Tyrol (T) …. incidence of ESRD patients with type 2 diabetes mellitus …. the difference in the overall ESRD incidence …. prevalence of DM, a highly significant correlation was found between ESRD incidence and DM …. variability in the ESRD incidence in Austria is explained mainly by regional differences in DM-2. Data from similar studies and …. allocation for ESRD …

Excerpt from U.S. Patent 5,441,868:
Title: Production of recombinant erythropoietin
Abstract: Disclosed are novel polypeptides possessing part or all of the primary structural conformation and one or more of the biological properties of mammalian erythropoietin ("EPO") which are characterized in preferred forms by being the product of procaryotic or eucaryotic host expression of an exogenous DNA sequence. Illustratively, genomic DNA, cDNA and manufactured DNA sequences coding for part or all of the sequence of amino acid residues of EPO or for analogs thereof are incorporated into autonomously replicating plasmid or viral vectors employed to transform or transfect suitable procaryotic or eucaryotic host cells such as bacteria, yeast or vertebrate cells in culture. Upon isolation from culture media or cellular lysates or fragments, products of expression of the …

Excerpt from court case, Amgen, Inc. v. Chugai Pharm. …:
On June 30, 1987, the United States Patent and Trademark Office (PTO) issued to Dr. Rodney Hewick U.S. Patent 4,677,195, entitled "Method for the Purification of Erythropoietin and Erythropoietin Compositions" (the '195 patent). The patent claims both homogeneous EPO and compositions thereof and a method for purifying human EPO using reverse phase high performance liquid chromatography. The method claims are not before us.

Problem Statement
[Figure: sources (Issued Patents and Applications, File Wrappers, Court Cases, Technical Publications, Regulations and Laws) integrated through Knowledge Source 1: Patent System Ontology and Knowledge Source 2: Bio Ontology (specific technical domain)]
• Sources are diverse in structure, format, semantics, and syntax.
• How can we retrieve patent information in a particular technological space?
A knowledge-driven (ontology-based) approach:
• Knowledge of the scientific/technical domain
• Knowledge of the patent system domain

Why Ontology?
An ontology is an explicit description of a domain:
• concepts
• properties and attributes of concepts
• constraints on properties and attributes
An ontology defines:
• a common vocabulary
• a shared understanding

Domain (Bio) Ontologies
Bio ontologies serve as standards for terminology in the biomedical (science) domain (Ref: bioportal.bioontology.org, accessed March 2012).

Using Concept Hierarchy to Determine Relevancy
[Figure: Doc 1 ("… erythropoietin … colony stimulating factor …") and Doc 2 ("… EPO … growth factor …") show no direct similarity; in the bio ontology, Erythropoietin/EPO sits under Colony Stimulating Factor and Hematopoietic Growth Factor, so a superclass concept is used for relevancy.]
• Direct term-based matching cannot relate the two documents.
• The bio ontology reveals that EPO and erythropoietin are synonymous.
• The class hierarchy provides concepts (such as colony stimulating factor) that are useful for determining relevance between documents, given an appropriate weighting scheme.

Expanded Query (with domain ontology)
Original term: Erythropoietin
Synonyms: Erythropoietin, Recombinant Erythropoietin, erythropoietin receptor binding, Hematopoietin, Recombinant EPO, Erythrocyte Colony Stimulating Factor, Epoetin, EPO …
Children: Darbepoetin Alfa, Epoetin Alfa, Epoetin Beta …
Parents: Colony Stimulating Factors, cytokine receptor binding, recombinant hematopoietic growth factors …
Grandparents: hematopoietic growth factor, receptor binding, recombinant growth factor …
An appropriate ranking function is applied to balance the more general terms: heuristically, we assign a higher weight to synonyms and progressively lower weights as we traverse away from the concept node.
Resulting query: "original term" OR [synonyms]^weight OR [children]^weight OR …
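The query expansion and hierarchy-based relevancy ideas above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the system described in this talk: the ontology fragment, the names `Concept`, `expand_query`, and `hierarchy_similarity`, the numeric weights, and the 1/(1+d) similarity measure are all hypothetical choices. The output uses Lucene-style `^` boosts to mirror the "[synonyms]^weight" notation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Concept:
    """One node of a hypothetical, hand-built bio-ontology fragment."""
    name: str
    synonyms: List[str] = field(default_factory=list)
    parent: Optional[str] = None  # single parent, for simplicity


# Illustrative fragment using terms from the slide; a real system would
# load a full ontology (e.g., from BioPortal) instead of hard-coding it.
ONTOLOGY: Dict[str, Concept] = {
    "hematopoietic growth factor": Concept("hematopoietic growth factor"),
    "colony stimulating factors": Concept(
        "colony stimulating factors", parent="hematopoietic growth factor"),
    "erythropoietin": Concept(
        "erythropoietin", synonyms=["EPO", "epoetin", "hematopoietin"],
        parent="colony stimulating factors"),
    "epoetin alfa": Concept("epoetin alfa", parent="erythropoietin"),
    "darbepoetin alfa": Concept("darbepoetin alfa", parent="erythropoietin"),
}

# Assumed weights: synonyms highest, decaying away from the concept node.
WEIGHTS = {"synonyms": 0.9, "children": 0.7, "parents": 0.5, "grandparents": 0.3}


def children_of(name: str) -> List[str]:
    return [c.name for c in ONTOLOGY.values() if c.parent == name]


def ancestors(name: str) -> List[str]:
    """Return [parent, grandparent, ...] of a concept."""
    chain, node = [], ONTOLOGY[name].parent
    while node is not None:
        chain.append(node)
        node = ONTOLOGY[node].parent
    return chain


def expand_query(term: str) -> str:
    """Build 'term OR syn^w OR child^w OR parent^w OR grandparent^w'."""
    concept, anc = ONTOLOGY[term], ancestors(term)
    groups = {
        "synonyms": concept.synonyms,
        "children": children_of(term),
        "parents": anc[:1],
        "grandparents": anc[1:2],
    }
    parts = [f'"{term}"']  # original term, unboosted
    for group, terms in groups.items():
        parts.extend(f'"{t}"^{WEIGHTS[group]}' for t in terms)
    return " OR ".join(parts)


def resolve(surface: str) -> Optional[str]:
    """Map a surface term (e.g. 'EPO') to its concept name, or None."""
    s = surface.lower()
    for c in ONTOLOGY.values():
        if s == c.name or s in (x.lower() for x in c.synonyms):
            return c.name
    return None


def hierarchy_similarity(a: str, b: str) -> float:
    """1/(1+d), where d = total steps from both terms to their lowest
    common ancestor; 1.0 for synonyms, 0.0 if unknown or unrelated."""
    ca, cb = resolve(a), resolve(b)
    if ca is None or cb is None:
        return 0.0
    path_a, path_b = [ca] + ancestors(ca), [cb] + ancestors(cb)
    for i, node in enumerate(path_a):
        if node in path_b:
            return 1.0 / (1 + i + path_b.index(node))
    return 0.0
```

For example, `expand_query("erythropoietin")` yields `"erythropoietin" OR "EPO"^0.9 OR ... OR "hematopoietic growth factor"^0.3`, and `hierarchy_similarity("EPO", "erythropoietin")` returns 1.0 because both terms resolve to the same concept, while `hierarchy_similarity("erythropoietin", "colony stimulating factors")` returns 0.5 via the shared superclass.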
Patent System Ontology (patent documents, court cases, file wrappers)

Competency Questions
Patent Domain:
• Return all patent documents that contain the phrase 'recombinant erythropoietin receptor' in the claims.
• Return all patent documents that contain the phrase 'recombinant erythropoietin receptor', have at least 3 claims, were issued before 02-02-1999, and are assigned to Genetics Inc.
Court Case Domain:
• Return all court cases that contain the term 'erythropoietin'.
• Return all court cases from the District Court of Massachusetts that involve Amgen Inc. as either plaintiff or defendant.
Multi-domain:
• Return all patents that contain the term 'erythropoietin' in their claims and are involved in at least one court litigation.
• Return all court cases with the term 'erythropoietin'. From these court cases, return the patents involved. From these patents, follow the backward and forward citations to identify more important patents.

Patent Documents
• Over 8 million U.S. patents (2.2 million in force today)
• In 2009, 485,312 patent applications were filed.
• Information is contained in various sections of the documents; full-text search alone is not sufficient, and other metrics such as classification and citations need to be considered.
• Documents are available in HTML format and can be easily parsed.
[Figure: Patent System Ontology, conceptual view of patent documents]

Court Cases
• Court cases are not very well structured.
• Information is comparatively more difficult to parse.
• PACER (Public Access to Court Electronic Records): database system for U.S. courts; requires one to know the judicial district, party/assignee
Excerpt:
927 F.2d 1200 (1991)
AMGEN, INC., Plaintiff/Cross-Appellant, v. CHUGAI PHARMACEUTICAL CO., LTD., and Genetics Institute, Inc., Defendants-Appellants.
Nos. 90-1273, 90-1275.
United States Court of Appeals, Federal Circuit.
March 5, 1991.
Suggestion for Rehearing Declined May 20, 1991.
…
Before MARKEY, LOURIE and CLEVENGER, Circuit Judges.
…
THE PATENTS
On June 30, 1987, the United States Patent and Trademark Office (PTO) issued to Dr. Rodney Hewick U.S. Patent 4,677,195, entitled "Method for the Purification of Erythropoietin and Erythropoietin Compositions" (the '195 patent). The patent claims both homogeneous EPO and compositions thereof and a method for purifying human EPO using reverse phase high performance liquid chromatography. The method claims are not before us. The relevant claims of the '195 patent are:
1. Homogeneous erythropoietin characterized by a molecular weight of about 34,000