
This is an open access article published under a Creative Commons Non-Commercial No Derivative Works (CC-BY-NC-ND) Attribution License, which permits copying and redistribution of the article, and creation of adaptations, all for non-commercial purposes.

Article

Cite This: ACS Omega 2019, 4, 86−94 http://pubs.acs.org/journal/acsodf

Substance-Based Bibliometrics: Identifying Research Gaps by Counting and Analyzing Substances

Robert Tomaszewski*

California State University, Fullerton, 800 North State College Blvd, Fullerton, California 92831 United States

*S Supporting Information

ABSTRACT: Identifying research gaps and generating research questions are often a first step in developing ideas for writing a research paper or grant proposal. The concept of substance-based bibliometrics uses the counts of substances in the scientific literature to better understand, assess, and clarify the state and impact of information in the chemical sciences. Connecting substances indexed to specific bioactivity or target indicators can lead to assessing the biochemical, biological, and medicinal relevance of substances as well as developing ideas for expanding drug design and discovery through identifying and modifying the structural features of molecules. This study uses Chemical Abstracts through the SciFinder database to count the occurrence of substances in the scientific literature. The study sets out search strategies for discovering potential research gaps and new ideas through visualization of chemical structures with known bioactivity and target indicators. The author recommends that subject librarians integrate research gap training in their bibliographic instruction classes, particularly for upper-level undergraduate and graduate chemistry students.

Received: August 28, 2018
Accepted: November 5, 2018
Published: January 2, 2019

© 2019 American Chemical Society. DOI: 10.1021/acsomega.8b02201

1. INTRODUCTION

A research gap is a question or a problem in a subject field that has been answered incompletely or insufficiently or that has not been answered at all. Research gap spotting and generating research questions are often a first key step to identifying and developing ideas for writing a research grant proposal or paper. The initial research questions scientists often ask include how, who, what, which, and why: How far back does the literature extend on a scientific topic, element, substance, or reaction? Who are the leading scientists working in a certain research field? What are the potential applications for specific chemical structures? Which substances show bioactivity? Which structural features are crucial for specific bioactivity or target affinity? Which reactions do not work? Why do certain functional groups or elemental species affect bioactivity? Many degree programs require students to write and defend a research proposal; however, the process for discovering gaps in the literature seldom receives any formal training. As a result, this creates an outreach opportunity for science librarians to become more involved with researchers and students through communication modes such as bibliographic instruction and online guides.

1.1. Research Gaps: Literature Orientated. Searching the literature through different document types is always a good starting point for finding information about a research topic and its current state. In this regard, although many journals are dedicated to publishing review papers solely, high profile journals generally answer and solve a problem. Performing a literature search is often the first step to finding all literature and information about a research topic. Browsing through review papers is one approach for an overview, perspective, and understanding of a research topic as well as finding leading researchers in a specific field.1 It is also important to continuously analyze recent review papers to keep abreast of the current state of a research area. Systematic reviews are types of literature reviews containing many research studies that delve deep into the literature with an objective to analyze the trends and changes in a field of study.2,3 When conducting a systematic review, there are often protocols involved in deciding what information is included and not included as well as more evaluative statements made about the studies. In more traditional review articles, the tone is more neutral and often there is no indication of how the articles referenced were chosen.

Searching peer-reviewed articles for information on future research (i.e., occasionally found in the conclusion section or under a separate heading like “Future Work”) offers another approach to identifying gaps in the literature. The research found in articles may also spark new ideas for alternative approaches to a problem. The funding or acknowledgment section can provide information about the grant source for the research and could be an indicator of where to submit similar research ideas. Searching through dissertations and browsing laboratory notebooks can provide more information on the progress of research. Various reports such as content analysis reports, analysis reports, and meta-analysis reports occasionally contain information on ideas for further research studies.

1.2. Research Gaps: Researcher Orientated. Communicating with experts in a field is another approach to understanding the state of a research area. Attending subject-specific conferences conducted by experts in a research field provides an opportunity to network and ask questions about extending research and debating ideas for future studies. In particular, conferences offer opportunities for dialogue at seminars, workshops, panel discussions, oral presentations, poster sessions, and luncheons as well as various informal gatherings. Visiting websites of leading researchers can provide ongoing information about their research and the challenges ahead. Connecting and corresponding with researchers through email, social media, or face-to-face is one direct approach for input about a research idea. Communication with the dissertation author may provide more direct insight into the state of the research in the laboratory. On the other hand, some scientists may be reluctant to discuss their ideas for fear of being scooped. Moreover, because of the publish or perish nature of academia, some researchers feel it is best to publish selected results in small amounts or at times when they feel they have fully researched the subject area. Sometimes, it is a matter of just communicating at the right time and right place.

1.3. Research Gaps: Field Orientated. Identifying and analyzing the state of a research topic or field through online tools can provide guidance and direction on the progress of research and development. General online resources, such as Web of Science, Scopus, Google Scholar, Essential Science Indicators, and Google Trends, and subject-specific resources, such as SciFinder, Reaxys, PubChem, PubMed, among many others, provide access to information about related or most cited documents, and in many cases, the popularity of a research field. Bibliometrics through collecting, counting, and analyzing publication and citation information can further be used to address trends in the literature. The use of different search strategies with databases can provide past and present information on the state of research about a scientific topic as well as background data on substances.4−6

Schummer (1997) used Chemical Abstracts, Beilstein, Gmelin, and other chemical handbooks to show an essentially exponential quantitative growth of substances between the years 1800 and 2000 with significant deviations stemming from world wars and a “catching up phenomena” during postwar periods.7 Schummer (1997) also pointed out that “there is still no saturation discernible after 200 years of exponential growth of substances.”8

Barth and Marx (2012, 2013) have introduced the idea of compound-based bibliometrics by searching for compounds containing rare-earth elements in Chemical Abstracts and linking to their corresponding publications.9,10 They stated that “the method can be applied to analyze large amounts of compounds in combination with the corresponding chemical concepts, to identify gaps in research, and hence open the door to new research in well-described compound-based areas.”10 It was concluded that “the compound-based bibliometric concept can easily be extended to organic chemistry by searching molecular substructures rather than element combinations, or to biochemistry by searching protein or nucleic sequences.”10

With the emergence of the internet and chemical databases such as SciFinder, Reaxys, ChEMBL, and PubChem, it has now become possible to quickly visualize and compare the structural makeup of property-specific substances and brainstorm ideas for continued research work.11 In this regard, SciFinder is arguably the most comprehensive database covering chemical literature (Table 1).12−18 SciFinder further allows for substance searching within the chemical literature.19

Table 1. Coverage and Content Information in SciFinderᵃ

coverage and update:
  CAplus (references, 1800s to present, updated daily)
  CAS REGISTRY (chemical substances, 1800s to present, updated daily)
  CASREACT (reactions, 1840 to present, updated daily)
  CHEMCATS (chemical suppliers, 2013 to present, updated weekly)
  CHEMLIST (regulated chemicals, 1980 to present, updated weekly)
  MARPAT (Markush structures, 1961 to present, updated daily)
  CIN (chemical industry notes, 1974 to present, updated weekly)
  MEDLINE (1946 to present, updated daily)
indexing:
  over 50 000 scientific journals; publication dates go back to 1907, but pre-1907 content (>224 000 records) is also made accessible back to the 1800s
records:
  over 47 million chemistry and other science-related research records, such as journals, patents (63 patent authorities), reports, books, conference proceedings, dissertations, and synthetic preparations
  over 144 million organic and inorganic substances
  over 67 million DNA and protein sequences
  over 8 billion property values, data tags, and spectra
  millions of single and multi-step reactions
  over 348 000 inventoried/regulated substances
  over 1.1 million Markush structures
  over 1.7 million record industry notes
searchable information:
  bioactivity and target indicators
  CAS registry number
  chemical structures
  experimental and predicted property data
  Markush
  molecular formula
  reactions
  text or numeric

ᵃCAplus Core Journal Coverage List consists of around 1500 journals from which abstracts are added and bibliographic, substance, and reaction information are indexed within a few days of publication (https://www.cas.org/support/documentation/references/corejournals).

SciFinder has assigned physical property data as well as bioactivity and target indicators to substances through the reported literature (Table 2).20−22 Bioactivity indicators are a set of controlled vocabulary used to report a particular biological activity in the literature (e.g., anti-inflammatory agents, antitumor agents, cardiovascular agents, enzyme inhibitors, immune agents, nervous system agents, reproductive control agents, wound healing promotors).16,20 Target indicators are a set of controlled vocabulary used to report particular biological targets in the literature (e.g., albuminoids, apoptosis-regulating proteins, blood-coagulation factors, cytokines, enzymes, globulins, glycoproteins, protease inhibitors, RNA formation factors).16,20 These indicators are reminiscent of National Therapeutic Indicators. A complete list of these indicators is, however, not publicly available.23 The presence or absence of bioactivity and target indicators can be used to identify potential areas for further research and drug design. By comparing the number of substances to an associated bioactivity or target indicator, scientists can analyze and evaluate relevant structural features to extend and develop research ideas. Potential research gaps and new ideas can be singled out through visual examinations and assessments of chemical structures with known bioactivity and target indicators.

Table 2. Substance Indicator and Property Search Options in SciFinder

indicator or property type: searchable options
  bioactivity indicator: over 260 bioactivity indicators (e.g., anti-inflammatory agents, antitumor agents, cardiovascular agents)
  target indicator: over 5800 target indicators (e.g., albuminoids, apoptosis-regulating proteins, blood-coagulation factors)
  experimental property: 13 different experimental property data (e.g., boiling point, magnetic moment, melting point, tensile strength)
  predicted property: 21 different predicted property data (e.g., bioconcentration factor, density, molar solubility, pKa, vapor pressure)

Figure 1. Substance searches for titanocene dichloride with corresponding bioactivity indicator identified as antitumor agents.

Table 4. Similarity Searches (Score ≥90) for Titanocene Dichloride and Similar Structures Using SciFinder

This study uses titanocene dichloride, (η⁵-C₅H₅)₂TiCl₂ (CAS registry number: 1271-19-8), as a model molecule to identify and compare substances that show antitumor activity using different search strategies with the SciFinder database. The substance searches conducted in the study include the following:
• Exact structure, substructure, and similarity (a score of 90 as a cutoff point) search for titanocene dichloride (Table 3 and Figure 1). The search is used to compare and analyze biological activity information obtained from the three different search types.

• Similarity searches on titanocene dichloride through addition and replacement of ligands around the central titanium metal (Table 4). The search uses the idea of ligand substitution to analyze and evaluate substances for biological activity.
• Substructure searches on titanocene dichloride through substitution of functional groups on the Cp ligand (Table 5). The search uses the idea of ligand modification to analyze and evaluate substances for biological activity.

Table 3. Bioactivity and Target Indicators for Titanocene Dichloride from an Exact Structure, Substructure, and Similarity (Score ≥90) Search Using SciFinder

search type: exact structure
  bioactivity indicators (number of substances): antitumor agents (9); anti-infective agents (1)
  target indicators (number of substances): enzymes (1)
search type: substructure
  bioactivity indicators (number of substances): antitumor agents (315); anti-infective agents (51); anti-inflammatory agents (27); immune agents (27); antiproliferative agents (5); enzyme inhibitors (4); reproductive control agents (2)
  target indicators (number of substances): apoptosis-regulating proteins (13); ubiquitin (13); enzymes (12); transport proteins (7); globulins (5); glycoproteins (5); hemoproteins (2); RNA formation factors (2)
search type: similarity (score ≥90)
  bioactivity indicators (number of substances): antitumor agents (37); anti-infective agents (3)
  target indicators (number of substances): enzymes (1)

2. RESULTS AND DISCUSSION

2.1. Substance Counts by Search Type. According to Miller (2002), “an exact-match search can be thought of as looking up a complete word in a dictionary. A substructure search is analogous to a wild-carded text search, and a similarity search resembles a “sound-like” search.”24 An exact structure search retrieves the substance exactly as drawn, along with component systems that contain it. A substructure search finds the exact substance and the searched structure embedded within other substances. A similarity search finds similar structures based on similarity scores from a highest (most similar) to lowest (least similar) value. In SciFinder, a similarity search uses a Tanimoto algorithm to retrieve all structures based on similarity.25,26 The algorithm uses statistical analysis to compare the structure drawn to all other structures in the database. The similarity scores have a scale range of 0−100 and are based on CAS structure descriptors such as element composition, atom count, ring count, atom sequence, bond sequence, and degree of connectivity.26 A similarity search could result in retrieving substances with functional groups or bonding arrangements that may not be obvious to the researcher. The most similar structures are identified by a higher score (e.g., a similarity score of ≥99 contains the most similar structures). According to SciFinder (2005), substances with a similarity score above 60 will usually be displayed.26

Table 5. Substructure Searches for Analogues of Titanocene Dichloride Using SciFinder

Performing an exact structure search in SciFinder does not allow for variations in atoms (e.g., generic groups or R-groups); however, variable bonds are allowed (e.g., unspecified bonds). On the other hand, similarity searches cannot be done with structures that contain R-groups, variables, repeating groups, variable attachment, multiple fragments, and stereo bonds.27,28 A substructure search is generally performed when searching for specific structural or atom requirements in a chemical structure, whereas a similarity search is used for exploring similar substances with greater variation in the literature.27,28 The results from an exact structure search are redundant with both the substructure and the similarity search because they form a subset of each of the latter two searches. However, by separating the searches, the results can be refined to focus on the substance structure (e.g., substances with the discrete structures or embedded structures) as well as to limit the number of substances in the set for easier visualization, interpretation, and comparative analysis. A similarity search also ensures finding substances within a component system such as copolymers, mixtures, and salts even though the substance may not be present in an exact form or have a CAS registry number. Performing an exact structure or similarity search separately yields fewer substances that are structurally similar. Fewer substance counts can facilitate easier and better visualization for comparing closely related structures (e.g., viewing 9 or 37 substances is easier than 315 substances, Figure 1). Further, an exact structure search retrieves multicomponent substances, where the searched molecule is present and discrete. This can lead to identifying research gaps based on modifying substance components rather than the searched substance.

The types of bioactivity and target indicators found from an exact structure, substructure, and similarity (score ≥90) search using (η⁵-C₅H₅)₂TiCl₂ are shown in Table 3. It is evident from Table 3 that antitumor agents are the most common type of bioactivity indicator from each search type. In addition, the majority of different target indicators were identified from a substructure search. Figure 1 shows the counts of substances identified as antitumor agents for (η⁵-C₅H₅)₂TiCl₂ for each search type. An exact structure search for (η⁵-C₅H₅)₂TiCl₂ yielded 190 substances, where the majority were multicomponent systems, whereas others were labeled with ¹³C or D on the rings. From the set of 190 substances, two classes of bioactivity indicators were identified, with nine substances as antitumor agents (Table 3). One substance was the parent compound, and eight substances were multicomponents in the form of copolymers. A slight modification of the backbone of the organic fragment of the polymer could inspire new research. Further, accessing articles on these substances can provide synthesis information as well as the funding sources to which research proposals could be submitted.

A substructure search for (η⁵-C₅H₅)₂TiCl₂ yielded 3224 substances, from which seven different bioactivity indicators were identified (Table 3). A total of 315 substances were classified as antitumor agents (Figure 1). References for each indicator can be retrieved by going directly to the substance “handbook format” table of information, either by clicking on the “CAS number” or via “Substance Detail” (Figure 2). Information such as specific enzymes can be located from this table (e.g., the target indicator for titanocene dichloride is enzymes, specified as glutaminase, with access to 13 references). Using the variable function from the structure editor, a substructure search conducted on (η⁵-C₅H₅)₂TiX₂ (where X = any halide) yielded 5728 substances, with a slight increase to 349 substances identified as antitumor agents (i.e., one of seven different bioactivity indicators).

Figure 2. Accessing bioactivity and target indicators from a substance table in SciFinder.

A similarity search (score ≥90) for (η⁵-C₅H₅)₂TiCl₂ yielded 711 substances, from which 37 substances were identified as antitumor agents and 3 substances as anti-infective agents (Table 3). By identifying the bioactivity and target type for each structure, drug discovery scientists can obtain supportive evidence and guidance for further studies. According to Garritano (2013), “it is also possible to identify substances that do not have indexed bioactivity or target indicators though they may be structurally similar to known compounds. These substances might be chosen for further research or investigation as to whether they would have similar, as yet undiscovered, biological activity or not.”21 Others have further stated that “these bioactivity and target indicators guide drug discovery scientists to new uses for known drugs, possible side effects, and the original literature where pharmaceutical information was reported.”22

2.2. Substance Counts by Ligand Addition and Replacement. A structural comparison of (η⁵-C₅H₅)₂TiCl₂ to other compounds through a similarity search (score ≥90) is shown in Table 4. The strategy in Table 4 was to deduce the assembly of the Cp rings and chloro groups needed for bioactivity. This can be performed by simply adding or replacing functional groups or stereochemistry around the molecule, followed by analyzing the number of substances indexed to a specific type of bioactivity or target indicator. This type of refining search strategy is much like a keyword search, where adding or removing terms in a Boolean search allows one to narrow or broaden the resulting search to retrieve the most relevant hit set. This building search process provides a strategy to identify a parent species with a bioactivity or target indicator that holds the most promise for further investigation. In this regard, the obvious choice for (η⁵-C₅H₅)₂TiCl₂ would be to remove or add more Cp and chloro groups and deduce the best combination of ligands around the titanium center for the highest counts of substances with antitumor activity. From Table 4, it can be deduced that two Cp rings and one or two chloro groups on titanium result in the highest number of antitumor substances. This hints that modification of these structural features could lead to other substances with potential biological activity and sets out an opportunity for more synthetic ideas. Likewise, the half-sandwiched species, CpTiCl₃, was found to have one substance identified as an antitumor agent. Consequently, an opportunity for further research could be envisioned through replacement or modification of the Cp rings and ancillary ligands. This in turn could lead to improved antitumor activity. A substructure or similarity search can be used to identify different metals, in addition to isotope elements and multicomponent systems such as counterions. Furthermore, trends from the periodic table (e.g., elements with the most similar properties are located in the same group) can be used to identify other elements for replacement (i.e., Ti is in the same group as Zr and Hf). This can bring forth ideas for extending the synthesis to other potential substance analogues exhibiting bioactivity.
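Section 2.1 states that SciFinder ranks similarity hits with a Tanimoto algorithm on a 0−100 scale and that this study uses a cutoff of 90. The Python sketch below illustrates only the general Tanimoto coefficient on sets of structural features. The feature labels and the function names tanimoto_score and filter_by_cutoff are hypothetical, and the CAS structure descriptors and scoring details used by SciFinder are not reproduced here.

```python
from typing import Dict, Iterable, List, Set, Tuple


def tanimoto_score(features_a: Iterable[str], features_b: Iterable[str]) -> float:
    """Tanimoto coefficient of two feature sets, scaled to 0-100."""
    a, b = set(features_a), set(features_b)
    if not a and not b:
        return 100.0  # two empty descriptions are treated as identical here
    shared = len(a & b)
    return 100.0 * shared / (len(a) + len(b) - shared)


def filter_by_cutoff(
    query: Set[str], candidates: Dict[str, Set[str]], cutoff: float = 90.0
) -> List[Tuple[str, float]]:
    """Return (name, score) pairs at or above the cutoff, most similar first."""
    scored = [(name, tanimoto_score(query, feats)) for name, feats in candidates.items()]
    kept = [pair for pair in scored if pair[1] >= cutoff]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Hypothetical feature labels, for illustration only (not CAS descriptors).
query = {"Ti", "Cp:1", "Cp:2", "Cl:1", "Cl:2"}
candidates = {
    "same features": {"Ti", "Cp:1", "Cp:2", "Cl:1", "Cl:2"},
    "one Cl replaced by Br": {"Ti", "Cp:1", "Cp:2", "Cl:1", "Br:1"},
}
hits = filter_by_cutoff(query, candidates, cutoff=90.0)
```

With these toy feature sets, a single ligand swap already drops the score well below 90, which is one way to see why a high cutoff returns a small, closely related hit set.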


Figure 3. Refining a substance set by property data in SciFinder.

2.3. Substance Counts by Ligand and Chemical Structure Modification. Table 5 shows the results of substructure searches for analogues of titanocene dichloride indexed with antitumor activity. Strategies and rationales for ligand substitution and modification can be considered when determining how to approach a substance search. Optimizing drug metabolism and pharmacokinetics can occur through solubility, stability, side effects, steric hindrance (bulkiness) and dimension, electronic effects (resonance and intrinsic inductive effects), and stereochemistry.29−31 Functional groups can have an electronic, a solubility, and a steric effect on the substance. To this extent, the cytotoxic activity of substances may well be correlated with their structure.32 The unique properties of transition metals can also affect the activity.33−35 As a result, various rationales can be drawn when deciding how to modify a structure, such as the influence of a specific group to help with water and lipid solubility and prevent unwanted side effects; adding polar groups to increase polarity yet decrease hydrophobicity; bulkiness and electronic effects from functional groups to increase the chemical and metabolic stability of a drug [e.g., changing the position of the functional group on the phenyl ring (i.e., para, ortho, and meta positions) such as an ortho group could act as a steric shield and hinder hydrolysis]; maintaining an optimized ratio of ionized to unionized drug species by controlling the pKa through functional groups; and masking groups known for toxicity or side reactions. The strategy of incorporating a group onto a Cp ring to increase bulkiness and electronic effects could be used to enhance bioactivity, or linking two Cp rings could be used to improve hydrolytic stability. Some researchers have stated that incorporating polar electron-withdrawing groups on the Cp ring could help with the solubility as well as improve antitumor activity by increasing the Lewis acidity of the Ti(IV) center to enhance binding to DNA.36,37 Sometimes, it is a matter of trial and error to see which groups or structural modifications result in the highest counts of substances with bioactivity. Moreover, from the substance counts showing activity, a sense of structural direction can emerge. This type of refining search strategy using functional group modification can be envisioned like a truncation or wild card search, where slight modification of the search term brings up the highest number and most relevant hits.

One strategy for ligand modification in Table 5 was to add a group to the Cp ring that would affect solubility and then alter the group slightly to see if the effect would increase or decrease the number of substances with antitumor activity. This effect was also examined by incorporating unreactive methoxy groups. The counts of antitumor substances were further analyzed by introducing chirality and linking two Cp rings. Replacing phenyl with a benzyl group attached to one Cp increases the number of active substances from zero to 74. Similarly, placing two benzyl groups on the same Cp ring compared to one benzyl group on each Cp ring increases the count of antitumor substances from zero to 62. A strategy to determine the best structurally fit active substance in Table 5 is to calculate the gap count between the total number of substances and the number of substances with activity (i.e., so long as there are substance counts with activity). Generally, the smaller the gap count, the more promise the active substance holds. Hence, over 45 percent of all substances with the methoxy groups have antitumor activity, and 60 percent of all substances from the ansa-bridged ligand show antitumor activity. Thus, through modification of the structural features on the Cp ligand (e.g., ansa-bridged Cp ligand, electron-donating or electron-withdrawing groups, chirality), scientists can deduce and design the structural makeup of substances that may hold promise for potential or improved bioactivity.

2.4. Substance Set Refining by Property Values. SciFinder allows the user to search and to refine by experimental and predicted property information (Table 2). Substance sets can be refined by physical property value by going to the refine menu and selecting a property value such as H donors, H acceptors, molecular weight, and solubility, among other values, within a specified range. As shown in Figure 3, selecting “Property Value” in SciFinder allows the user to enter a range for two experimental property values as well as 21 different predicted property values.
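Refining a substance set by property ranges, as described above, amounts to keeping only the records whose values fall inside every requested interval. The following is a minimal Python sketch of that idea, assuming hypothetical records and property names; SciFinder is operated through its own interface, and nothing here calls it or reflects its internal behavior.

```python
from typing import Dict, List, Optional, Tuple

# (minimum, maximum); None means the range is open on that side.
Range = Tuple[Optional[float], Optional[float]]


def in_range(value: float, bounds: Range) -> bool:
    """True when value lies within the inclusive bounds."""
    low, high = bounds
    return (low is None or value >= low) and (high is None or value <= high)


def refine(substances: List[Dict], ranges: Dict[str, Range]) -> List[Dict]:
    """Keep substances that report every listed property within its range."""
    return [
        s for s in substances
        if all(name in s and in_range(s[name], bounds) for name, bounds in ranges.items())
    ]


# Hypothetical records; the values are placeholders, not SciFinder data.
substance_set = [
    {"name": "A", "h_donors": 1, "h_acceptors": 4, "molecular_weight": 310.0},
    {"name": "B", "h_donors": 6, "h_acceptors": 9, "molecular_weight": 480.0},
    {"name": "C", "h_donors": 2, "h_acceptors": 3},  # molecular weight not reported
]
ranges = {"h_donors": (None, 5), "molecular_weight": (None, 500.0)}
refined = refine(substance_set, ranges)
```

A record that lacks one of the requested properties is excluded in this sketch; that is a design choice made here for simplicity, not a statement about how the database treats missing values.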


Selecting “Property Availability,” followed by “Any Selected Experimental Property,” allows the user to retrieve any combination of the 13 experimental property values without specifying a range. The range for all substance properties (experimental and predicted) can be specified from the “Property” search option in SciFinder; however, this search is not limited to the substance set but covers all substances within the database. Refining the substance sets by specific property data can be an important factor in developing drugs, for example by assessing the drug-likeness of a substance as outlined by Lipinski’s rule of five (i.e., no more than 5 hydrogen-bond donors, no more than 10 hydrogen-bond acceptors, molecular weight less than 500, and octanol−water partition coefficient (log P) less than 5)38 or the rule of three for fragment-based drug discovery (i.e., no more than 3 hydrogen-bond donors, no more than 3 hydrogen-bond acceptors, molecular mass less than 300, and octanol−water partition coefficient (log P) no more than 3).39

3. APPLICATIONS IN TEACHING

It is recommended that subject librarians integrate research gap training in their bibliographic instruction classes. This can be accomplished by demonstrating a search for a substance using SciFinder in the classroom, followed by analyzing the substance set by a specific bioactivity or target indicator. At this point in the class, the concept of the research gap can be introduced by asking students for potential ideas and reasons for modifying the structural features in the resulting substance set to extend the research. That way, the research gap message is short and sweet and gives students room to think critically. From personal experience, this type of approach engages students in creative brainstorming and sharing of ideas.

Synthesizing a new substance could involve something as simple as adding a methyl group to, or removing one from, the known substance. Moreover, given that the experimental procedure is already available for known substances, it could just be a matter of adding or changing a simple group in the reactant that would result in a new substance. Introducing the research gap is especially useful to chemistry students who are required to write a research proposal as part of their degree program or to come up with a quick project idea for a synthetic organic or inorganic laboratory class.

4. FUTURE WORK

Subscription-based resources such as SciFinder are seamless and user-friendly, and they maintain high quality control over their information content. These features make them all the more suited to use with the novice student researcher. However, not all information is available from a single resource. In addition, some resources contain data that are geared toward a specific science discipline. In this regard, other subscription-based resources such as the Reaxys Medicinal Chemistry database40 also contain millions of bioactivity data points and are thus worthy of future study. In addition, publicly accessible resources such as PubChem41,42 and ChEMBL43 could be compared, as they provide searching capabilities across large volumes of substances and easy access to curated and annotated data. Comparing substance-based bibliometrics between commercially and publicly available resources would, therefore, be useful for further study.44,45 Future work could also explore the bioactivity and target indicators from the elements of the periodic table (elemental-based bibliometrics)46 and reaction chemistry (reaction-based bibliometrics) for applications to drug synthesis.

5. CONCLUSIONS

The concept of substance-based bibliometrics uses the counting of elements or compounds, rather than publications and citations, in the scientific literature to better understand, assess, and clarify the state and impact of information in the chemical sciences. The assignment of specific bioactivity and target indicators to millions of substances in commercial and public databases provides an opportunity to identify gaps in drug design and discovery. With the knowledge that such information is right at our fingertips, it is hoped that scientists will become inspired to utilize and apply these searching tools and strategies in their research.

6. METHODOLOGY

The SciFinder database is a product of Chemical Abstracts Service (CAS), a division of the American Chemical Society (ACS). The coverage and content of SciFinder are shown in Table 1.12−18 The Chemical Abstracts Plus (CAplus) file of SciFinder contains millions of chemistry and other science-related research records. The registry numbers of substances are included in the Chemical Abstracts file records of the articles that employ or describe those substances. Each substance in the CAS Registry file is linked to the available literature references, spectra, experimental property data, commercial availability, synthesis and reactions, CA index names, and regulatory information for that substance. CAS has further assigned bioactivity and target indicators to substances for assessing their biological characteristics (Table 2).20−22 Bioactivity and target indicators are assigned to substances in the CAS REGISTRY database on the basis of documents in the CAplus database. These indicators are applied to a substance’s record when at least one paper is indexed as attributing a specific activity to that substance. All SciFinder searches in this study were performed between July and September 2018.

Titanocene dichloride was used to compare the substance counts from an exact structure, a substructure, and a similarity (score of 90 as cutoff point) search. To do so, the “Substance Identifier” query option was selected from the “Explore” menu in SciFinder. The name of the substance, titanocene dichloride, was typed in and searched. Titanocene dichloride was moved to the chemical editor by hovering over the chemical structure and selecting “Click for more options” (double arrow), followed by selecting “Explore by Structure” and “Chemical Structure.” This copied the substance into the chemical editor, which in turn allowed it to be further modified into other substances, as shown in Tables 4 and 5. It is also possible to draw the substances from scratch in the chemical editor or to draw them using chemical drawing software such as ChemDraw and import them as a mol file. Starting from a name or molecular formula search, followed by copying to the chemical editor, can sometimes be easier, particularly when drawing more complex groups such as η³ or η⁵ ligands, and can lead to fewer drawing errors.

By selecting exact structure, substructure, and similarity in turn, the database retrieved all relevant substances for each search type. The “Analyze by” dropdown option from the “Analyze” menu allowed different options such as “Bioactivity Indicators” or “Target Indicators” to be selected. From the bioactivity indicators, all substances classified as “Antitumor agents” were selected. By clicking on the “Show More” option, followed by “Apply,” the user can select multiple indicators and retrieve the respective substances of interest for viewing.
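The two refinement steps described in this article, tallying a substance set by indicator and narrowing it by property thresholds, can be illustrated outside SciFinder with a minimal Python sketch. It assumes a substance set that has already been exported to local records; the `Substance` class, the function names, the substance names, the indicator "Hypothetical indicator," and all property values are made up for illustration and are not SciFinder features or real data. The thresholds follow the rule of five and rule of three as worded in the text.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Substance:
    """One record of a hypothetical, locally exported substance set."""
    name: str
    h_donors: int          # hydrogen-bond donors
    h_acceptors: int       # hydrogen-bond acceptors
    mol_weight: float      # molecular weight
    log_p: float           # octanol-water partition coefficient (log P)
    indicators: frozenset = frozenset()  # bioactivity/target indicators


def passes_rule_of_five(s: Substance) -> bool:
    """Rule of five as stated in the text: <=5 donors, <=10 acceptors,
    molecular weight < 500, log P < 5."""
    return (s.h_donors <= 5 and s.h_acceptors <= 10
            and s.mol_weight < 500 and s.log_p < 5)


def passes_rule_of_three(s: Substance) -> bool:
    """Rule of three as stated in the text: <=3 donors, <=3 acceptors,
    molecular mass < 300, log P <= 3."""
    return (s.h_donors <= 3 and s.h_acceptors <= 3
            and s.mol_weight < 300 and s.log_p <= 3)


def count_by_indicator(substances) -> Counter:
    """Count how many substances carry each indicator (a substance with
    several indicators is counted once under each of them)."""
    counts = Counter()
    for s in substances:
        counts.update(s.indicators)
    return counts


# Illustrative values only; none of these are real substances or data.
SUBSTANCES = [
    Substance("substance-A", 1, 4, 320.5, 2.1,
              frozenset({"Antitumor agents"})),
    Substance("substance-B", 6, 12, 640.0, 5.8,
              frozenset({"Antitumor agents", "Hypothetical indicator"})),
    Substance("substance-C", 2, 3, 250.0, 1.5),
]

if __name__ == "__main__":
    for indicator, n in count_by_indicator(SUBSTANCES).most_common():
        print(f"{indicator}: {n} substance(s)")
    antitumor = [s for s in SUBSTANCES if "Antitumor agents" in s.indicators]
    drug_like = [s.name for s in antitumor if passes_rule_of_five(s)]
    print("Antitumor and rule-of-five compliant:", drug_like)
```

Keeping the indicator tally separate from the property filters mirrors the order of the workflow in the text: the set is first analyzed by indicator and only then refined by property data.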


A similarity search (score of 90 as cutoff point) for comparison analysis was performed for “stable titanocene” (C20H20Ti2), searched via “Molecular Formula,” whereas titanocene monochloride, titanocene trichloride, and titanium tetrachloride were searched via “Substance Identifier.” A substructure search for comparison analysis was performed by modifying titanocene dichloride in the structure editor and selecting the substructure option. The Supporting Information provides screenshots of the steps involved in searching SciFinder.

■ ASSOCIATED CONTENT

*S Supporting Information

The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acsomega.8b02201.

Screenshots of methodology (explore by substance identifier; export structure to the chemical editor; searching the chemical structure by exact structure, substructure, or similarity; analyzing chemical structures; analyzing structures by bioactivity and target indicators; selecting bioactivity indicators; and selecting similarity score) (PDF)

■ AUTHOR INFORMATION

Corresponding Author
*E-mail: [email protected]. Phone: 657-278-2976.

ORCID
Robert Tomaszewski: 0000-0001-6916-1265

Notes
The author declares no competing financial interest.

■ ACKNOWLEDGMENTS

I thank the reviewers for their very insightful and intelligent suggestions on the manuscript. I also thank Marissa Medeiros for help with designing the graphical abstract.

■ REFERENCES

(1) Müller-Bloch, C.; Kranz, J. A framework for rigorously identifying research gaps in qualitative literature reviews. Proceedings of the 36th International Conference on Information Systems, Fort Worth, Texas, 2015. https://pdfs.semanticscholar.org/89e6/a54cbe7240488d88ce49b51fc83c7186d564.pdf (accessed Aug, 2018).
(2) Robinson, K. A.; Saldanha, I. J.; Mckoy, N. A. Development of a framework to identify research gaps from systematic reviews. J. Clin. Epidemiol. 2011, 64, 1325−1330.
(3) Robinson, K. A.; Saldanha, I. J.; Mckoy, N. A. Frameworks for determining research gaps during systematic reviews. Methods Future Research Needs Report No. 2. AHRQ Publication No. 11-EHC043-EF; Agency for Healthcare Research and Quality: Rockville, MD, 2011. https://www.ncbi.nlm.nih.gov/books/NBK62478/ (accessed Aug, 2018).
(4) Tomaszewski, R. The concept of the imploded boolean search: A case study with undergraduate chemistry students. J. Chem. Educ. 2016, 93, 527−533.
(5) Wagner, A. B. Searching coordination and organometallic compounds in SciFinder. Issues Sci. Technol. Librarian. 2011, 64. DOI: 10.5062/F4G44N6W http://www.istl.org/11-fall/tips.html.
(6) Wagner, A. B. Searching inorganic substances in SciFinder. Issues Sci. Technol. Librarian. 2011, 64. DOI: 10.5062/F4QJ7F79 http://www.istl.org/11-winter/tips.html.
(7) Schummer, J. Scientometric studies on chemistry I: The exponential growth of chemical substances, 1800-1995. Scientometrics 1997, 39, 107−123.
(8) Schummer, J. Scientometric studies on chemistry II: Aims and methods of producing new chemical substances. Scientometrics 1997, 39, 125−140.
(9) Barth, A. Chemical bibliometrics; Chemistry World, April 25, 2013. https://www.chemistryworld.com/opinion/chemical-bibliometrics/6099.article (accessed Aug, 2018).
(10) Barth, A.; Marx, W. Stimulation of ideas through compound-based bibliometrics: counting and mapping chemical compounds for analyzing research topics in chemistry, physics, and materials science. ChemistryOpen 2012, 1, 276−283.
(11) Wang, H.; Yin, Y.; Wang, P.; Xiong, C.; Huang, L.; Li, S.; Li, X.; Fu, L. Current situation and future usage of anticancer drug databases. Apoptosis 2016, 21, 778−794.
(12) Chemical Abstracts Service. SciFinder, What is SciFinder? 2018. https://www.cas.org/products/scifinder (accessed Aug, 2018).
(13) Chemical Abstracts Service. SciFinder, Database and content training documentation, 2018. https://www.cas.org/support (accessed Aug, 2018).
(14) Chemical Abstracts Service. SciFinder, Training: Need to know-Structure searching, 2018. http://support.cas.org/training/scifinder/need-to-know-structure-searching (accessed Aug, 2018).
(15) Chemical Abstracts Service. SciFinder, How to... Create a substance answer, 2014. http://download.cappchem.com/Tutorial/SciFinder-Substance.pdf (accessed Aug, 2018).
(16) Chemical Abstracts Service, American Chemical Society. References: CAplus-Worldwide coverage of many scientific disciplines all in one source, 2018. https://www.cas.org/support/documentation/references (accessed Aug, 2018).
(17) Chemical Abstracts Service, American Chemical Society. CAplus-Pre-1907 coverage, 2018. https://www.cas.org/support/documentation/references/capluspre1907 (accessed Aug, 2018).
(18) Chemical Abstracts Service, American Chemical Society. CAS REGISTRY: The gold standard for chemical substance information, 2018. https://www.cas.org/support/documentation/chemical-substances (accessed Aug, 2018).
(19) Chemical Abstracts Service. SciFinder training videos: Structure searching, 2018. https://www.cas.org/support/training/scifinder/structure-search (accessed Aug, 2018).
(20) ZBChem News. Zusatzinfos zum SciFinder-Update vom Dezember 2011, 2012. https://zbchemnews.wordpress.com/2012/01/05/zusatzinfos-zum-scifinder-update-vom-dezember-2011/ (accessed Aug, 2018).
(21) Garritano, J. R. Evolution of SciFinder, 2011-2013: New features, new content. Sci. Technol. Libr. 2013, 32, 346−371.
(22) Schenck, R. J.; Zapiecki, K. R. Back to the Future: CAS and the Shape of Chemical Information To Come. In The Future of the History of Chemical Information; McEwen, L. R., Buntrock, R. E., Eds.; ACS Symposium Series; ACS Publications: Washington, DC, 2014; Vol. 1164, Chapter 9, pp 156−157.
(23) Chemical Abstracts Service. Personal communication, 2018.
(24) Miller, M. A. Chemical database techniques in drug discovery. Nat. Rev. Drug Discovery 2002, 1, 220−227.
(25) Willett, P.; Barnard, J. M.; Downs, G. M. Chemical similarity searching. J. Chem. Inf. Comput. Sci. 1998, 38, 983−996.
(26) SciFinder. Similarity search overview, 2005. http://202.127.145.151/siocl/scifinder/SF_Help/sf_only/tanimoto2.htm (accessed Aug, 2018).
(27) Currano, J. N. Searching by Structure and Substructure. In Chemical Information for Chemists: A Primer; Currano, J. N., Roth, D. L., Eds.; The Royal Society of Chemistry: Cambridge, U.K., 2014; Chapter 5, pp 109−145.
(28) Ridley, D. D. Information Retrieval: SciFinder, 2nd ed.; Wiley: Hoboken, NJ, 2009.
(29) Köpf-Maier, P.; Köpf, H. Transition and main-group metal cyclopentadienyl complexes: preclinical studies on a series of antitumor agents of different structural type. Struct. Bonding (Berlin) 1988, 70, 103−185.


(30) Metabolism, Pharmacokinetics and Toxicity of Functional groups: Impact of Chemical Building Blocks on ADMET; Smith, D. A., Ed.; Royal Society of Chemistry: Cambridge, U.K., 2010.
(31) Dörwald, F. Z. Lead Optimization for Medicinal Chemists: Pharmacokinetic Properties of Functional Groups and Organic Compounds; Wiley-VCH: Weinheim, Germany, 2012.
(32) Kerns, E. H.; Di, L. Drug-like Properties: Concepts, Structure Design and Methods from ADME to Toxicity Optimization, 2nd ed.; Elsevier: Amsterdam, 2016.
(33) Gómez-Ruiz, S.; Maksimović-Ivanić, D.; Mijatović, S.; Kaluđerović, G. N. On the discovery, biological effects, and use of cisplatin and metallocenes in anticancer chemotherapy. Bioinorg. Chem. Appl. 2012, 2012, 140284.
(34) Loza-Rosas, S. A.; Saxena, M.; Delgado, Y.; Gaur, K.; Pandrala, M.; Tinoco, A. D. A ubiquitous metal, difficult to track: towards an understanding of the regulation of titanium (IV) in humans. Metallomics 2017, 9, 346−356.
(35) Ndagi, U.; Mhlongo, N.; Soliman, M. Metal complexes in cancer therapy: An update from drug design perspective. Drug Des. Devel. Ther. 2017, 11, 599−616.
(36) Boyles, J. R.; Baird, M. C.; Campling, B. G.; Jain, N. Enhanced anti-cancer activities of some derivatives of titanocene dichloride. J. Inorg. Biochem. 2001, 84, 159−162.
(37) Gao, L. M.; Hernández, R.; Matta, J.; Meléndez, E. Synthesis, Ti (IV) intake by apotransferrin and cytotoxic properties of functionalized titanocene dichlorides. J. Biol. Inorg. Chem. 2007, 12, 959−967.
(38) Lipinski, C. A.; Lombardo, F.; Dominy, B. W.; Feeney, P. J. Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings. Adv. Drug Deliv. Rev. 1997, 23, 3−25.
(39) Congreve, M.; Carr, R.; Murray, C.; Jhoti, H. A “rule of three” for fragment-based lead discovery? Drug Discovery Today 2003, 8, 876−877.
(40) Elsevier. Reaxys Medicinal Chemistry, Empowering medicinal chemists with accurate bioactivity data, 2018. https://www.elsevier.com/solutions/reaxys/who-we-serve/pharma-rd/reaxys-medicinal-chemistry (accessed Oct, 2018).
(41) Kim, S.; Thiessen, P. A.; Bolton, E. E.; Chen, J.; Fu, G.; Gindulyte, A.; Han, L.; He, J.; He, S.; Shoemaker, B. A.; Wang, J.; Yu, B.; Zhang, J.; Bryant, S. H. PubChem substance and compound databases. Nucleic Acids Res. 2016, 44, D1202−D1213.
(42) Kim, S.; Thiessen, P. A.; Cheng, T.; Yu, B.; Shoemaker, B. A.; Wang, J.; Bolton, E. V.; Wang, Y.; Bryant, S. H. Literature information in PubChem: Associations between PubChem records and scientific articles. J. Cheminf. 2016, 8, 32.
(43) Gaulton, A.; Hersey, A.; Nowotka, M.; Bento, A. P.; Chambers, J.; Mendez, D.; Mutowo, P.; Atkinson, F.; Bellis, L. J.; Cibrián-Uhalte, E.; Davies, M.; Dedman, N.; Karlsson, A.; Magariños, M. P.; Overington, J. P.; Papadatos, G.; Smit, I.; Leach, A. R. The ChEMBL database in 2017. Nucleic Acids Res. 2016, 45, D945−D954.
(44) Lipinski, C. A.; Litterman, N. K.; Southan, C.; Williams, A. J.; Clark, A. M.; Ekins, S. Parallel worlds of public and commercial bioactive chemistry data. J. Med. Chem. 2014, 58, 2068−2076.
(45) Southan, C.; Várkonyi, P.; Muresan, S. Quantitative assessment of the expanding complementarity between public and commercial databases of bioactive compounds. J. Cheminf. 2009, 1, 10.
(46) Tomaszewski, R. Unpublished work, 2017.
