Public Disclosure of Biological Sequences in Global Patent Practice
World Patent Information 43 (2015) 12-24
Journal homepage: www.elsevier.com/locate/worpatin

Osmat A. Jefferson a,b,*, Deniz Köllhofer a,b, Prabha Ajjikuttira a,b, Richard A. Jefferson a,b
a Queensland University of Technology, Brisbane, QLD 4000, Australia
b Cambia, P.O. Box 3200, Canberra, ACT 2601, Australia
* Corresponding author. Cambia/QUT, P.O. Box 3200, Canberra ACT 2601, Australia. E-mail address: [email protected] (O.A. Jefferson).

Article history: Received 5 January 2015; Received in revised form 20 July 2015; Accepted 23 August 2015; Available online 22 October 2015

Keywords: Patent; Biological patent; Patent sequence; Patent office; Sequence listings; Patent sequence data; Patent sequence download; PatSeq tools; Patent disclosure

Abstract

Biological sequences are an important part of global patenting, with unique challenges for their effective and equitable use in practice and in policy. Because their function can only be determined with computer-aided technology, the form in which sequences are disclosed matters greatly. Similarly, the scope of patent rights sought and granted requires computer readable data and tools for comparison. Critically, the primary data provided to the national patent offices, and thence to the public, must be comprehensive, standardized, timely and meaningful. It is not yet. The proposed global Patent Sequence (PatSeq) Data platform can enable national and regional jurisdictions to meet the desired standards.

© 2015 Cambia. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
http://dx.doi.org/10.1016/j.wpi.2015.08.005

1. Introduction

In the traditional working of the patent system, an inventor secures governmental rights to exclude others from making, using, or selling his/her invention for a limited time in exchange for publicly disclosing the full details of the invention, what is called 'the teachings'. The teachings derived from the disclosure and the practice of an invention enable the public to use the invention through licensing, to use the invention freely without license outside the jurisdiction, scope and timeframe of protection, to build upon the invention through research and development, to improve upon it, or to design around it to advance scientific and technological capabilities and ultimately to benefit society.

In the contemporary use of patents to secure rights over genetic material, the quality of these teachings has come under public scrutiny and the role of patent offices in the disclosure process has been challenged [1,2].

Within patent documents, genetic sequences have been viewed both legally and practically as either chemical compounds or as information-encoding elements, and within the context of patent eligibility or infringement issues, their structure and function value has gained more importance as various jurisdictions, including the United States and Europe, attempt to balance competing interests either in favor of the inventors, as is the case in Europe, or the public, as is the case in the USA [3].

As genetic sequences are made up of combinations of four bases, designated as A, C, G, and T (U) in the case of DNA (RNA), or of 20 amino acids, each with different chemical properties and designated with single or triple letter codes in the case of protein, they can only be interpreted using specialized computer software tools. Such tools clarify the structure, function and similarity of any sequence relative to other sequences. Therefore, during the disclosure process, the applicant, the patent office, and upon publication, the public should be able to access the disclosed sequence data and use the computer tools to interrogate it within the context of all known sequence listings to interpret, understand, and value their combined effect on biological innovations. While some patent offices claim to have internal computer-mediated searching, analysis and visual tools to interpret the contextual value or meaning of patent sequences, public access is still lacking. Moreover, creating patent landscapes that can integrate sequence information with global patent rights and disclosures remains expensive, slow and cumbersome to the public and to those professionals who cannot afford the costly services of commercial providers.

Rules for handling of sequences in patent prosecution, implemented by the United States Patent and Trademark Office (USPTO) and other major patent offices in the 1990s, required the submission of any sequence (nucleotide or peptide) disclosed in any national or foreign application [4]. At that time, the disclosure standard format, known as "Sequence Listing", was simple and file submissions were accepted either electronically or on paper [5]. As sequence disclosures grew exponentially over time, more legal rulings were introduced regarding submissions and with respect to compliance with standard formats. While the major offices recommended international standards such as ST.25 [6] for the disclosure of sequence listings in the submitted patent applications, the submitted file formats remained flexible until recently (Table 1). Full compliance with ST.25 and the inclusion of the associated metadata, such as the origin of the sequence, its length and type, function, and other markup in a computer readable format [7], were actually achieved in only a few offices; variations in the readability of file formats of disclosed sequence data and in its accurate matching when transferred to public databases persist [8,9]. For example, from 2001 until 2007, most international applications did not comply with ST.25 text format rules and the disclosed sequences were in TIFF or PDF files and contained non-ASCII binary data (Table 1, "Format of published sequences" category at WIPO in 2007).

2. Availability of published patent sequences to the public

Each of the major patent offices adopted a strategy regarding the publication and provision of sequence listings to the public. Table 1, column "Format of published sequence listings", depicts the practice adopted over time by the USPTO, the World Intellectual Property Organization (WIPO), and the European Patent Office (EPO). Throughout the past 25 years, variations have existed among these offices. For example, the published sequence data from US patent documents is available for bulk downloads under various file formats; however, the USPTO does not offer a sequence search facility to interrogate the data. The office passes its published data to the National Center for Biotechnology Information (NCBI) [10]. This center provides a comprehensive public sequence search facility, BLAST, allowing contextual interrogation of sequence data, and a world-class database, GenBank [...]

[...] databases. The non-redundant databases are created at two levels and contain additional annotation, patent family information and links to patent literature [19,20].

Sequence listings disclosed in published patent documents from the Japan Patent Office (JPO) and the Korean Intellectual Property Office (KIPO) are shared through DDBJ, which is administered by the Center for Information Biology of the National Institute of Genetics in Japan. The Databank includes the nucleotide-based sequence listings from patent documents published in Japan and Korea since 1997 [21]. In 2010, two amendments were introduced into this database. First, the NCBI taxonomy ID was added to each sequence listing based on the original organism declared for that sequence in the patent application, and the newly revised entries for nucleotides and proteins were released in May 2010 with a scheduled update once per year [22]. The second amendment included the release of protein sequence listings from JPO and KIPO for ftp downloading and, later, the availability of a sequence similarity search facility for protein sequence listings from USPTO, EPO, JPO, and KIPO [23].

Other public databases that provide access to, and a search facility for, yet smaller collections of published patent sequences include the Patome@Korea database, which serves nucleotide and protein patent sequences provided by KIPO [24] from 2004 to 2008 and is maintained by the Korean Bioinformation Center (KOBIC). Similarly, NASDAP, a semi-public Chinese database, provided free sequence search services to explore Chinese gene patents (applications and grants from 1999 to February 2006), but it appeared to be no longer available in our latest search of May 2015. The database covered 123,218 sequence listings from 8563 Chinese patents acquired from the State Intellectual Property Office as hard copies or images [25].

3. Why do we need a global and transparent patent sequence dataset?

As NCBI, EMBL-EBI, and DDBJ decide which sequence listing data to include in their databases and what sequence search facility to provide on what data and when, accurate and comprehensive accounting of published sequence data as disclosed in patents is hard to achieve. Upon reviewing the maze of the available patent sequences from the public or commercial sources, Andree et al. (2008) reported that each public database still has a unique dataset and for any comprehensive searching and analysis, users may need