Downloaded in MS Word Format (.Docx)
Total Page:16
File Type:pdf, Size:1020Kb
Wang and Tseng J Cheminform (2019) 11:78 https://doi.org/10.1186/s13321-019-0401-4 Journal of Cheminformatics SOFTWARE Open Access IntelliPatent: a web-based intelligent system for fast chemical patent claim drafting Pei‑Hua Wang1 and Yufeng Jane Tseng1,2* Abstract The frst step of automating composition patent drafting is to draft the claims around a Markush structure with sub‑ stituents. Currently, this process depends heavily on experienced attorneys or patent agents, and few tools are avail‑ able. IntelliPatent was created to accelerate this process. Users can simply upload a series of analogs of interest, and IntelliPatent will automatically extract the general structural scafold and generate the patent claim text. The program can also extend the patent claim by adding commonly seen R groups from historical lists of the top 30 selling drugs in the US for all R substituents. The program takes MDL SD fle formats as inputs, and the invariable core structure and variable substructures will be identifed as the initial scafold and R groups in the output Markush structure. The results can be downloaded in MS Word format (.docx). The suggested claims can be quickly generated with IntelliPatent. This web‑based tool is freely accessible at https ://intel lipat ent.cmdm.tw/. Keywords: Web server, Markush structure, Pharmaceutical patent Introduction of visualizing and analyzing Markush structures from a Claims are the most important sections in composition composition patent [7]. Its web based application, iMa- patents [1]. In the pharmaceutical industry, claims should rVis, revised the underlying R group numbering system include key compounds and all structural derivatives that to deal with nested R group presentation [8]. An algo- are likely to have the same efects [2]. Tis is achieved by rithm based on SMIRK language is introduced to solve Markush structures [3] that describe a series of chemical the query for a compound within a Markush structure compounds having a common core structure and vari- [9]. able substituents called R groups [4]. Te next step in drafting compositional patent claims is To ensure the novelty of a new composition patent, one to maximize patent coverage. Te algorithm implemented needs to make sure that no compound in the claims is in ChemAxon can automatically generate Markush struc- taken by prior arts. One common routine is to perform tures [10]. Periscope system helps generating Markush structure searches in chemical databases [5]. One of the structures from compounds, visualization, and searching most useful free online databases for chemical prior-art specifc chemical structures [11]. However, both systems search is SureChEMBL [6]. Performing a structure search cannot expand the coverage by adding new variations using the Markush structure as input is the most efcient to the Markush structure. If the patent coverage is fully method. To generate the input Markush structure, one maximized, a monopoly state of frst-in-class drugs can be needs to study the compounds they want to patent to achieved and profts will be ensured [12, 13]. identify the input scafold and R group variations. Adding variations beyond the current Markush struc- Previous studies focused on interpretation or searching ture relies on the experience of patent attorneys. Hence, of Markush structure. For an example, MarVis is capable patent coverage becomes more or less dependent on the *Correspondence: [email protected] writers. Despite the advance in chemical informatics, 1 Graduate Institute of Biomedical Electronics and Bioinformatics, National drafting and describing Markush structures still requires Taiwan University, No. 1 Sec. 4, Roosevelt Road, Taipei 106, Taiwan substantial manual efort. In this study, we introduce a Full list of author information is available at the end of the article publicly available server, IntelliPatent, for rapid chemical © The Author(s) 2019. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creat iveco mmons .org/licen ses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/ publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. Wang and Tseng J Cheminform (2019) 11:78 Page 2 of 7 patent drafting that includes the option to recommend Expanding the coverage of the Markush structure Markush structures to expand composition patent Te scafold of the initial Markush structure, includ- claims. With an R group library built from 30 composi- ing the scafold and collections of R groups attached to tion patents, the defnition of output Markush structure the same atoms, is generated by RDKit using the Find- can be extended based on the input compounds. MCS function from the input compounds. For every R group structure, the presence of common R groups will be detected. If the R group comprises several com- Implementation mon groups, the text term describing the R group will be Dataset generated from the most distant common group mov- A library containing 269 R groups was constructed from ing to the ones near the attachment point. For R groups 30 patents of the top-selling drugs in the US in 2005. attached to a carbon chain or ring in the scafold, if the Te compounds in the “Biological Study” or “Prepara- R group structure exactly matches any of the common tion” sections were retrieved from SciFinder with the substituents on the corresponding main structure in our patent IDs listed in the study by Hattori et al. [14]. Te library, the rest of the common substituents will be added structures of the 30 drug entities were downloaded from to the same ring or carbon chain. DrugBank [15] using the trade names. Case study data Building the R group library To demonstrate how IntelliPatent works, the chemi- cal composition patents of bromazine, orphenadrine, Te scafolds of the 30 drug entities were identifed using omeprazole and timoprazole were retrieved from the Scafold Hunter-2.4.1 [16]. Using JChem Base API [17], European Patent Ofce using patent IDs US2527963A, the scafolds of the compounds were removed to obtain US2567351A, EP0005129B1 and US4045563A, respec- 5589 structural fragments, which were saved as R groups. tively. Te example compounds for bromazine and ome- We removed 5183 duplicates including optical isomers prazole and the Markush structures of the subsets from and fltered out 2 R groups having deuterium, 2 hav- the claims of orphenadrine and timoprazole were con- ing phosphorus and 133 having aliphatic carbon chains structed manually. with lengths longer than 6 atoms to have the fnal 269 R groups. Next, we identifed 28 common groups along with their User’s privacy text terms from the patent claims. Te common groups Te input fle from user is deleted immediately after include functional groups with defned structures (e.g., computation at the end of the session. Linux crontab hydroxyl, cyano and halo) and general groups that incor- command was used to check all the output fles twice porate a series of structures (e.g., alkyl, aryl, and heter- a day and remove the fles that were created more than oaryl). Te SMARTS notations for detecting the common 12 h ago. Terefore, the output data including graph, text groups were generated to categorize the R groups in the and MS Word fles will be deleted in less than a day. Since library into 57 categories, including composite catego- the server does not generate or retain any log fle, the IP ries such as alkyl–aryl. Te common groups, which are address of users will not be traceable via the server of defned as the substituents on carbon chains or rings, IntelliPatent. All web trafc is encrypted via HTTPS. were also recorded. Results and discussion R groups from approved drugs R group library analysis Te structures of 2413 approved drugs were retrieved 269 R groups are generated using the example com- from DrugBank on 2019/11/19. For those structures pounds from the patents of 30 top-selling drugs and their with more than one molecules, only the compound with scafolds were processed by Scafold Hunter, based on the the highest molecular weights were retained. We fl- concept that compounds in a Markush structure can be tered out big molecules (MW > 800 Da), tiny molecules divided into a scafold plus several R groups. (MW < 100 Da and molecules with less than 6 carbon Te top 10 R groups and their percentage of counts atoms) and duplicates including optical isomers to have in all the 218 R groups showing up at least twice among 1921 drugs. Te R groups from 1808 drugs were obtained all 269 groups are shown in Table 1. Te top dupli- by removing the scafolds from compounds as the meth- cate R group, methyl, alone accounts for nearly 20% of ods we used in building R group library in this work. occurrence. Te top 10 groups account for 63% among Tere are 364 diferent R groups from approved drugs. all the duplicate R groups and they are all small groups Wang and Tseng J Cheminform (2019) 11:78 Page 3 of 7 Table 1 The top 10 R groups in library R group SMILES Occurrence (%) Methyl *C 19.9 Fluoro *F 8.4 Methoxy *OC 7.0 Hydroxy *O 6.3 Sulfamoyl *S( O)( O)N 5.6 = = Chloro *Cl 5.1 Hydrogen *H 3.8 Ethynyl *C#C 2.8 Trifuoromethyl *C(F)(F)F 2.2 Carboxy *C( O)O 1.9 = Table 2 The top 10 most used R groups from approved drugs R group SMILES Occurrence (%) Both top 10 Methyl *C 23.6 O Fig.