Patents OPEN

PATENTS OPEN Transparency tools in gene patenting for informing policy and practice Osmat A Jefferson, Deniz Köllhofer, Thomas H Ehrich & Richard A Jefferson The Supreme Court’s decision in Myriad highlights the need for tools enabling nuanced and precise analysis of gene patents at the global level. n the recent decision Association for Mapping the patent landscape Within ‘The Biological Lens’ facility, the IMolecular Pathology v. Myriad Genetics1, There have been numerous studies and pub- sequence database currently holds 147,565,858 the US Supreme Court held that naturally lications about the scope, social or economic million nucleotide and amino-acid sequences occurring sequences from human genomic impact, and policy and practice implications disclosed in 323,721 global patent documents DNA are not patentable subject matter. Only of patenting of biological sequences—com- comprising both applications and grants. Of certain complementary DNAs (cDNA), monly known as ‘gene patenting’3–6. Many these sequences, 67% are repeated at least once modified sequences and methods to use of these studies contain incomplete data sets, in the corpus. Some level of redundancy is to be sequences are potentially patentable. It is use analytical tools that cannot distinguish expected, as the same sequence may be either likely that this distinction will hold for all the nature of the sequence similarities or fail referenced in a single patent document for dif- DNA sequences, whether animal, plant or to parse and analyze patent claims. Few of ferent purposes or mentioned in many related microbial2. However, it is not clear whether these studies make the primary data available or unrelated patent documents. Although a this means that other naturally occurring in a form allowing review by others. In an majority of patent documents list only one informational molecules, such as polypep- effort to move beyond opinion pieces and to or a few sequences, a substantial number list tides (proteins) or polysaccharides, will also provide a facility that can be used to ask and thousands or even millions of sequences. For be excluded from patents. answer specific questions in an open, veri- example, US Pat. No. 7,777,022 discloses 4.2 The decision underscores a pressing need fiable manner, we have created a biological million sequences. As millions more sequences for precise analysis of patents that disclose facility (http://www.lens.org/lens/biologi- become available, patent offices face a difficult © 2013 Nature America, Inc. All rights reserved. America, Inc. © 2013 Nature and reference genetic sequences, especially cal_search) as a public resource within ‘The challenge to render that information accessible in the claims. Similarly, data sets, standards Lens,’ an open, global cyber infrastructure to and useable by the public. compliance and analytical tools must be dedicated to increasing the efficiency and Major patent offices claim to have sophis- npg improved—in particular, data sets and ana- fairness of the innovation system by making ticated search tools and databases that likely lytical tools must be made openly accessi- access to patent documents more transparent comprise a very substantial set of sequences; ble—in order to provide a basis for effective and inclusive. We have used this facility to however, information about the effectiveness decision making and policy setting to sup- create tools that allow for dynamic mapping of these algorithms and the scope of these port biological innovation. Here, we present and shared analysis of the scope of patent- sequence databases are not generally available a web-based platform that allows such data ing over several genomes, beginning with the to the public, and they may even be off limits aggregation, analysis and visualization in human genome. to the dozens of patent offices in jurisdictions an open, shareable facility. To demonstrate The single most important consideration with emerging intellectual property (IP) pro- the potential for the extension of this plat- in gene patenting is the critical difference tection or with limited budgets. Some com- form to global patent jurisdictions, we dis- between disclosure of sequences and claim- mercial vendors claim to offer comprehensive cuss the results of a global survey of patent ing of sequences. Before the recent decision data and sophisticated analysis, but this is an offices that shows that much progress is still in Myriad, the literature on gene patenting expensive means of accessing what is funda- needed in making these data freely available led to more confusion than insight. Although mentally public information, and provides one for aggregation in the first place. some claim that the concerns about gene pat- of many entry barriers that disadvantage small- ents were exaggerated and based on outliers to-medium enterprises (SMEs) and innova- and wrong perceptions5,7,8, others maintain tion-focused and impact-driven public sector Osmat A. Jefferson, Deniz Köllhofer, Thomas that patent protection for genetic sequences and philanthropy. In addition, these commer- H. Ehrich and Richard A. Jefferson are at was excessive and led to obvious inventions, cial databases are incomplete14. For example, Cambia, Canberra, Australia, and Queensland questionable patents and opaque innovation the millions of sequences published in the University of Technology, Brisbane, Australia. systems that may have harmed the integrity US Patent and Trademark Office’s (USPTO; e-mail: [email protected] and of the market and constrained scientific prog- Washington, DC) Patent Applications since [email protected] ress9–13. 2001 are not incorporated within GenBank 1086 VOLUME 31 NUMBER 12 DECEMBER 2013 NATURE BIOTECHNOLOGY patENTS a b 250 ◼ R ◼ E-value = 0, 75% minimum hit coverage 200 ◼ NR c ◼ E-value = 0, 50% minimum hit coverage ◼ Unique patent sequences ◼ No. of docs 150 80 ◼ E-value = 0.001, 75% minimum hit coverage 25 ◼ Declared human sequences 100 Thousands ◼ E-value = 0.001, 50% minimum hit coverage 70 50 20 0 60 90% 98% 100% 50% 95% 100% 50 15 Similarity and query length coverage 40 (millions) 10 30 % known genes 20 5 10 Nonredundant sequence counts Nonredundant 0 0 90% 50% 98% 95% 100% 100% RefSeq Gencode V16 CCDS (20130430) Similarity and query length coverage Transcriptome/proteome datasets Figure 1 Patent sequences mapped on the human genome (GRCh37 at http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/index.shtml). (a) Mapping was based on various similarity and query length coverage rates (90% 50%, 98% 95%, and 100% 100%). Unique patent sequences refer to sequences with only unique mapping locus. Although the majority of nonredundant sequences were declared human in the patent documents, around 20% were unspecified or nonhuman. (b) The internal chart shows only mapped sequences that are referenced in the granted claims (1% of the data); redundant sequence counts (R), nonredundant sequence counts (NR) and their corresponding patent grants counts. (c) Homology-based human transcriptome and proteome analysis based on two filters of E-value and percentage of minimum hit coverage. (http://www.ncbi.nlm.nih.gov/genbank/), or number of matching nucleotides between the because of the chosen Expect value (E value) in any other global public facility, yet as pub- patent sequence, and the reference genome and and minimum hit coverage percentage and lished data they must clearly be considered as the sequence coverage reflects the proportion the data set used for comparison (Fig. 1c). For potential prior art. This lack of access is also of the patent sequence that was included in the example, under our most stringent condition problematic for patent applicants, who may not alignment. Because of the high repeat rate in (75% minimum hit coverage and E value of 0), know whether the sequence for which they seek the sequence listing corpus, a non-redundant the percentage of known genes was calcu- protection has been previously claimed or not. data set of patent sequences was used for the lated as 26% based on RefSeq, 32% based on To provide a basis for better understanding mapping against the reference human genome GENENCODE v16, or 37% based on CCDS the complex landscape of gene patenting, we (assembly GRCh37) of the GRC (http://www. data set, whereas with 50% minimum hit have mapped patent-disclosed sequences onto ncbi.nlm.nih.gov/projects/genome/assembly/ coverage and 0.001 E value, the percentage the human genome and developed a patent- grc/index.shtml). For mapping highly homolo- of known genes reached 49%, 57% or 62%, sequence (PatSeq) toolkit to find, align, browse gous genomic sequences, we used the Burrows- respectively. and explore these sequences. To illuminate Wheeler Aligner suite18. Potential mRNA and A 2005 paper by Jensen and Murray20 forms © 2013 Nature America, Inc. All rights reserved. America, Inc. © 2013 Nature the scope of patenting of known genes on the protein sequences were mapped to the refer- the basis for the widespread assertion that human genome, we selected those mapped ence genome using BLAT19. 20% of the human genome had been patented sequences referenced in granted claims (GC) Under stringent conditions (100% simi- before Myriad20,21. But as Holman observed5, npg of the USPTO and performed homology- larity and 100% coverage rate), 15.6 mil- Jensen and Murray’s 20% coverage conflated based analysis with three publicly available lion sequences were matched to the human genetic sequences that were merely referenced transcriptome or proteome data sets: RefSeq15, genome at one locus or more (Fig. 1a). These in patent claims with genetic sequences that GENCODE16 and Ensembl’s Consensus correspond to 31.4 million sequence listing were explicitly claimed. In 2008, Cook-Deegan CDS17. We found that the percentage of known entries after reintroduction of the redundancy provided a more conservative estimate that genes referenced—not necessarily claimed— in the corpus. Although the majority of these ~3,000 to 5,000 human genes had been pat- ranges from 26% to 62%, depending on the sequences were declared within the patent as ented in the United States6.

Patents OPEN

The Impact of Academic Patenting on the Rate, Quality, and Direction of (Public) Research Output

Request for Comments on Proposed Amendments to Patent

Applying Library Values to Emerging Technology Decision-Making in the Age of Open Access, Maker Spaces, and the Ever-Changing Library

The Northern Powerhouse in Health Health Science Alliance Ltd Research – a Science and Innovation Audit

Understanding Research Metrics INTRODUCTION Discover How to Monitor Your Journal’S Performance Through a Range of Research Metrics

The Patent System and Research Freedom: a Comparative

Proteomics Research in India

Elsevier Research Platforms

Inventions and Patenting in Africa: Empirical Trends from 1970 to 2010

The Humanist Perspective, in Social Science : the Case of Erich Fromm.

Thomas Fritz Action Proposal ICORE 2008 1 of 24 Through the Looking

University Microfilms International 300 N, ZEEB RD., ANN ARBOR, Ml 48106