Supporting Information

Chemical Information Literacy at a Liberal Arts College

George Greco

Department of Chemistry

Goucher College

1021 Dulaney Valley Road

Baltimore, MD 21204

Table of Contents

Course Syllabus
Handout 1: The Chemical Literature
Handout 2: Types of Papers
Handout 3: Citations
Handout 4: Secondary Sources
Handout 5: SciFinder
Handout 6: Free Public Databases
Handout 7: InChI and SMILES
Handout 8: Reaxys, Beilstein and Gmelin
Handout 9: Patents
Handout 10: Reading papers for current events
Handout 11: The publication process
Handout 12: Structure and sequence databases
Handout 13: Databases of Spectral and Thermodynamic Properties
HW #1: The Chemical Literature
SciFinder Assignment
HW #4: PubMed and PubChem
HW #5: SMILES and InChI
Current Events Assignment
Note about final exam

Chemistry 245 – Spring 2015

Chemical Information Literacy

Instructor: Dr. George Greco
Office: Hoffberger 220
Phone: 410-337-6313
Email: [email protected]
Class Meetings: Tuesdays 11:45–12:35, Hoffberger 223
Office hours: Monday 11:00-12:30, Wednesday 1:30-3:00, or by appointment (e-mail me)
Optional Text: Chemical Information for Chemists: A Primer, edited by Judith N. Currano and Dana L. Roth. Published by RSC Publishing. ISBN 978-1-84973-551-3

Disclaimer: This is the first time we are offering this course. It is my first time teaching a course like this. I don’t expect everything to go 100% smoothly. Please be patient with me. Even if it is not perfect, you will learn something.

Goucher Learn: Please check the Goucher Learn site for any assignments or information that I may need to post.

Grading

Grades for this course will be computed as follows:
Class Participation and classwork: 20%
Homework Assignments: 25%
Major Project (Annotated Bibliography): 25%
Final Exam: 30%

Laptop: Please bring a laptop to each class if you have one. If you do not have a laptop, consult the instructor. A tablet should be fine for most of what we will do in class, but I would recommend something larger than a phone.

Attendance and Promptness: Attendance is highly encouraged. Since this is a one credit course and many assignments and activities will be explained during class time, it is important to attend each class meeting. Also, part of your grade is based on classwork. In addition, it is expected you will be ready for class at the scheduled time and we will start class on time. This policy avoids the problem of having people wander in late and not being able to pick up the discussion.

Homework Assignments: There will be homework assignments given most weeks. Homework may be handed out, announced in class, or posted on Goucher Learn.

Major Project: Annotated Bibliography

The final project will be to create an annotated bibliography covering one of the following:
- The work of a particular chemist working in academia
- A specific research topic
- A specific molecule, or class of molecules (including drugs)

You may choose a topic from any area of chemistry – one of the purposes of this course is to be sub-discipline independent. My goal is for you to find out what is going on in the field in an area of chemistry that interests you. If you choose an area of chemistry that I am not familiar with, I may solicit the advice of faculty members in that area when evaluating your assignment.

Specifically, an annotated bibliography is a list of references (with full citation information), each accompanied by a summary (in your own words – not the published abstract) of what that paper is about. This would be the first step towards writing a review article about a particular topic. I would like your bibliography to contain a minimum of 20 references from the primary or secondary literature. If you come across a paper that you would like to look over, but we do not subscribe to that journal, let me know – in many cases I can get access through Johns Hopkins. At least four of your references should come from the past three years, and at least 12 of your 20 references should be from the primary literature.
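The numerical requirements above can be checked mechanically. The following is only a sketch: the `Reference` record, the function name, and the reading of "past three years" as 2012 or later (counting back from 2015) are assumptions made for illustration, not part of the assignment.

```python
from dataclasses import dataclass


@dataclass
class Reference:
    year: int          # publication year
    is_primary: bool   # True for primary literature, False for secondary


def check_bibliography(refs, current_year=2015):
    """Return a list of unmet requirements (empty when all are met)."""
    problems = []
    if len(refs) < 20:
        problems.append(f"need at least 20 references, found {len(refs)}")
    # Assumption: "past three years" means current_year - 3 or later.
    recent = sum(1 for r in refs if r.year >= current_year - 3)
    if recent < 4:
        problems.append(f"need at least 4 recent references, found {recent}")
    primary = sum(1 for r in refs if r.is_primary)
    if primary < 12:
        problems.append(f"need at least 12 primary references, found {primary}")
    return problems


# Hypothetical bibliography: 12 older primary papers plus 8 reviews.
example = (
    [Reference(2005, True)] * 12
    + [Reference(2014, False)] * 4
    + [Reference(2000, False)] * 4
)
print(check_bibliography(example))
```

An empty list means the bibliography meets the three counts; it says nothing about the quality of the summaries.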

This project will be due on Tuesday April 14 (I think we are all better off if it is not too close to the end of the semester).

Learning Objectives

By the end of this course, you should 1. Understand the structure of the scientific literature in order to interpret and evaluate published results, and follow a logical path of inquiry. a. Describe the structure of the chemical literature in order to understand how the results in a particular paper fit into the published scientific record b. Compare qualities of primary/secondary/tertiary resources in order to distinguish the type and purpose of research/information found in each resource. c. Describe reasons for citing the literature in one's own writing in order to develop conventional research and publication habits. d. Interpret citations to the literature in order to follow the citation back to the original source. e. Create citations to the literature using appropriate formatting and standard abbreviations in order to present citations in a consistent manner. f. Interpret journal abbreviations in order to find the original publication. g. Describe indexing/abstracting concepts (such as controlled vocabulary, unique identifier like CAS #, etc.) in order to understand the content and organization of a database h. Describe the purpose of different types of journal publications in order to understand different venues for reporting or describing research. i. Describe the purpose of different types of journal articles in order to understand different venues for reporting or describing research. j. Describe the purpose and parts of patents in order to relate intellectual property to chemical research.

k. Describe the purpose and implications of citation counts in order to illustrate one method of measuring scholarly impact. l. Compare methods of measuring the scholarly impact of publications/research in order to evaluate the potential influence or importance of information.

2. Develop search strategies, including using unique features of the chemical literature, in order to most efficiently obtain needed information related to publications. a. Perform appropriate search strategies within databases (author, topic, numeric, structure, etc.) in order to fulfill a particular information need. b. Perform appropriate search strategies for research articles in order to collect current and/or historical information on a particular topic. c. Perform appropriate search strategies for review articles in order to collect background information on a particular topic. d. Perform appropriate search strategies within handbooks, encyclopedias, treatises, and other reference works in order to collect background information on a particular topic. e. Perform appropriate search strategies for patents in order to identify intellectual property related to a particular topic, person, or organization. f. Use appropriate database tools to perform citation searches in order to gather information on the potential scholarly impact of a publication. g. Use appropriate database tools to analyze and refine literature searches (by topic, author, year, document type, language, etc.) in order to limit to more desired search results. h. Use appropriate database tools to analyze and refine substance/reaction searches (by structure, yield, steps, classification, etc.) in order to limit to more desired search results.

3. Develop search strategies, including using unique features of the chemical literature, in order to most efficiently obtain needed information related to data. a. Locate physical and chemical properties in secondary/tertiary sources in order to identify commonly accepted data on compounds. b. Locate physical and chemical properties in the primary literature in order to identify experimental values for lesser studied compounds. c. Locate syntheses for compounds in order to determine possible outcomes for particular reactants/reagents/catalysts and/or for synthesizing desired compounds. d. Locate spectra and spectral data in the literature in order to identify data for lesser studied compounds. e. Locate crystallographic data in order to identify data for known crystal structures.

4. Understand the scientific publication process (both formal and informal), especially as it relates to issues involving ethics and accountability, in order to be well informed managers, producers and consumers of information. a. Explain the general nature of the peer review process b. Understand what Open Access publication is c. Explain issues related to intellectual property (especially copyright) and publication

d. Explain issues related to “author’s rights” and publication e. Use RefWorks to manage citations and documents in order to incorporate the use of bibliographic management software into the research process.

Order of Topics:
Topic 1: Introduction. Why this course? Structure of Chemical Information
Topic 2: Primary Literature Overview; Types of Journals; Types of Articles
Topic 3: Citations: Why and How? Using RefWorks citation manager
Topic 4: Secondary/Tertiary Literature. Review Articles, Compiled Data, Handbooks, Encyclopedias, “Comprehensive” works, Reactions, Syntheses
Topic 5: Abstracting and Indexing resources: Chemical Abstracts/SciFinder, PubMed, PubChem
Topic 6: Formulating an Effective Search Strategy: Searching by text, structure, and citations
Topic 7: Patents
Topic 8: Current Events. What kind of papers do we find in a current journal, and how do we go about reading them?
Topic 9: Databases of information: Cambridge Structure Database for X-ray crystal structures, Protein Data Bank, various spectral databases of NMR, IR, and mass spectra, UniProt for protein sequences, NIST Chemistry WebBook
Topic 10: The Publication Process (peer review, open access, intellectual property)

Reminder: All students are bound by the standards of the Academic Honor Code, found at www.goucher.edu/documents/General/AcademicHonorCode.pdf

Chemistry 245 Goucher College

The Chemical Literature (A subjective view)

Flagship Journals in the US and Europe

American Chemical Society: Journal of the American Chemical Society (JACS)

Wiley Angewandte Chemie, International Edition

Sub-discipline specific journals publishing important papers or less prestigious general journals

American Chemical Society ACS Central Science Organic Letters Journal of Organic Chemistry Inorganic Chemistry Organometallics Biochemistry Analytical Chemistry Journal of Physical Chemistry (A, B, and C) Journal of Medicinal Chemistry Macromolecules

Secondary Literature Chemical Reviews Accounts of Chemical Research Chemical and Engineering News

Royal Society of Chemistry (UK) Current: Chemical Communications Chemical Science (Inorganic)

Historical: Journal of the Chemical Society (Organic) Faraday Transactions (Physical)

Elsevier (for profit – available on Science Direct – 184 Chemistry journals) Bioorganic and Medicinal Chemistry Letters Tetrahedron Letters

Wiley Chemistry – A European Journal (Formerly Chemische Berichte) European Journal of Inorganic Chemistry European Journal of Organic Chemistry The EMBO Journal

American Institute of Physics Journal of Chemical Physics

American Society for Biochemistry and Molecular Biology Journal of Biological Chemistry

Other Journals in which you may find papers of interest

ACS Remaining ACS journals

RSC

Elsevier Analytica Chimica Acta Bioorganic and Medicinal Chemistry Coordination Chemistry Reviews Inorganica Chimica Acta Journal of the American Society for Mass Spectrometry Journal of Chromatography Journal of Magnetic Resonance Journal of Organometallic Chemistry Journal of Photochemistry and Photobiology Tetrahedron

Wiley Chemistry – An Asian Journal ChemBioChem ChemPhysChem Helvetica Chimica Acta

Thieme-Verlag Synthesis Synlett

Chemical Society of Japan Bulletin of the Chemical Society of Japan Chemistry Letters

Chemistry 245 Goucher College

Types of Papers

Full Paper (Article) – Type of article that you learn to write as a lab report. Contains Abstract, Introduction, Results, Discussion, Experimental. All details are present. Usually result of a completed project.

Example: Greco, G.E.; Schrock, R.R. “Synthesis, Structure, and Electrochemical Studies of Molybdenum and Tungsten Dinitrogen, Diazenido, and Hydrazido Complexes that Contain Aryl-Substituted Triamidoamine Ligands.” Inorg. Chem. 2001, 40, 3861-3878.

Communication – short paper that contains the most important results. 2-3 pages in JACS, 4 pages in Organic Letters. Does not contain any section headings; all experimental details and characterization data are in the Supporting Information. The most important results coming out of academic labs are always published in communications. In the old days, a full paper with full experimental details would be published a couple of years later. Now, with electronic supporting information, the follow-up full paper is usually not published.

Example: Greco, G.E.; Gleason, B.L.; Lowery, T.A.; Kier, M.J.; Hollander, L.B.; Gibbs, S.A.; Worthy, A.D. “Palladium-Catalyzed [3+2] Cycloaddition of Carbon Dioxide and Trimethylenemethane under Mild Conditions.” Org. Lett. 2007, 9, 3817-3820.

Comprehensive Review – a review of all of the papers that have appeared in the literature over a given time frame. Often not written by the most famous scientists in a particular field.

Example: Karkas, M.D.; Verho, O.; Johnston, E.V.; Akermark, B. “Artificial Photosynthesis: Molecular Systems for Catalytic Water Oxidation.” Chem. Rev. 2014, 114, 11863-12001.

Retrospective Review/Highlight – Article written by a leader in a field who describes work that has been done in his/her lab (or a small number of similar labs) over the past several years.

Example: Nocera, D.G. “The Artificial Leaf.” Acc. Chem. Res. 2012, 45, 767-776.

Note – A paper in the style of a full paper, but the length of a communication. Represents a project of limited scope and importance.

Example: J. Org. Chem. 2015, 80, 1214-1220.

CHE 245 – Spring 2015

Citations

What are citations? When do we need to use them in a paper?

You need to reference other papers:
a) In the Introduction – sets the stage for the research being described here by indicating previous related research done (both in your lab and others)
b) In the Experimental section, when you use a previously published procedure to synthesize a compound
c) In the Discussion, when you are comparing your results and interpretation to other results and interpretations of similar data
d) Make sure you cite the work of people who may review your paper!

Citations need to be in a different format for almost every journal.
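To see why a citation manager helps, here is a minimal sketch that renders one stored reference in two layouts. The two formatting functions are hypothetical illustrations (one keeps the article title, one drops it); they are not the official Organic Letters or Biochemistry styles, and the dictionary keys are assumptions chosen for the example.

```python
# One reference stored once, as structured data (taken from the
# full-paper example in the Types of Papers handout).
ref = {
    "authors": ["Greco, G. E.", "Schrock, R. R."],
    "title": ("Synthesis, Structure, and Electrochemical Studies of "
              "Molybdenum and Tungsten Dinitrogen, Diazenido, and Hydrazido "
              "Complexes that Contain Aryl-Substituted Triamidoamine Ligands"),
    "journal": "Inorg. Chem.",
    "year": 2001,
    "volume": 40,
    "pages": "3861-3878",
}


def format_with_title(r):
    """Hypothetical style A: authors, quoted title, then journal details."""
    authors = "; ".join(r["authors"])
    return '{} "{}." {} {}, {}, {}.'.format(
        authors, r["title"], r["journal"], r["year"], r["volume"], r["pages"])


def format_without_title(r):
    """Hypothetical style B: same data, article title omitted."""
    authors = "; ".join(r["authors"])
    return "{} {} {}, {}, {}.".format(
        authors, r["journal"], r["year"], r["volume"], r["pages"])


print(format_with_title(ref))
print(format_without_title(ref))
```

The point of the design is that the reference is typed once and the layout is chosen at output time, which is what a citation manager does for every style it supports.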

How do we keep track of all of our citations? And format bibliographies?

We use a Citation Manager. RefWorks is the citation manager that the Goucher library subscribes to and encourages all students and faculty to use.

Go to https://www.refworks.com/refworks

Sign up for a new account. You need to be at Goucher your first time.

The best feature of RefWorks is that you can import citations directly from a database, so you don’t have to type them in by hand.

RefWorks will then output a bibliography in whatever style you choose.

You may also download the Write N Cite add-on for MS Word so you can insert citations from your RefWorks library while typing a paper.

Homework Assignment

1. Sign up for a RefWorks account.
2. Search ACS Journals and find 4 papers written by Eric Jacobsen in 2014.
3. Add these papers to your RefWorks library. Go to RefWorks Help, and choose “Getting References into your Account,” then “Importing from online data vendors.” Choose ACS Website, and follow the instructions for downloading the citations as a .ris file, then importing the .ris file into RefWorks.
4. Prepare a bibliography of those 4 papers in both Organic Letters style and Biochemistry style. Turn in your bibliography. To add a new output style, choose “Output Style Manager” from the Bibliography menu. You can enter the output style you want into the search box, then it will come up in the list of output styles. Click on the green arrow to add that style to your Favorites. It will then come up as a style when you choose “Create Bibliography.”
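A .ris file is plain text: each line carries a two-character tag, two spaces, a hyphen, and a value, and each record ends with an `ER` tag. The sketch below parses such text into dictionaries. The sample record is hand-written for illustration from the communication cited in the Types of Papers handout (it is not an actual ACS export), and the function name is an assumption.

```python
import re

# Tag, two spaces, hyphen, optional space, then the value.
RIS_LINE = re.compile(r"^([A-Z][A-Z0-9])  - ?(.*)$")

SAMPLE_RIS = """
TY  - JOUR
AU  - Greco, G. E.
AU  - Gleason, B. L.
AU  - Lowery, T. A.
AU  - Kier, M. J.
AU  - Hollander, L. B.
AU  - Gibbs, S. A.
AU  - Worthy, A. D.
TI  - Palladium-Catalyzed [3+2] Cycloaddition of Carbon Dioxide and Trimethylenemethane under Mild Conditions
JO  - Org. Lett.
PY  - 2007
VL  - 9
SP  - 3817
EP  - 3820
ER  - 
"""


def parse_ris(text):
    """Return a list of records; each record maps a tag to a list of values."""
    records, current = [], {}
    for line in text.splitlines():
        match = RIS_LINE.match(line.rstrip())
        if not match:
            continue  # skip blank or malformed lines
        tag, value = match.groups()
        if tag == "ER":  # end of record
            records.append(current)
            current = {}
        else:
            # Tags such as AU can repeat, so every tag maps to a list.
            current.setdefault(tag, []).append(value)
    return records


records = parse_ris(SAMPLE_RIS)
print(len(records), "record;", len(records[0]["AU"]), "authors")
```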

Chemistry 245 – Goucher College

Secondary and Tertiary Sources

Secondary sources are accounts written after the fact with the benefit of hindsight. They are interpretations and evaluations of primary sources.

Tertiary sources consist of information which is a distillation and collection of primary and secondary sources.

• Dictionaries and encyclopedias
• Directories
• Fact books
• Indexes, abstracts, bibliographies used to locate primary and secondary sources
• Manuals
• Textbooks

I am not generally going to distinguish between secondary and tertiary sources. Review articles published in review journals are clearly secondary; Encyclopedias and textbooks are clearly tertiary, but there is a lot that can be classified either way.

This guide is NOT comprehensive. It lists SOME resources that I have found to be useful, and is limited to resources that Goucher students have access to.

Some resources are still available in print (What’s that?)

Without looking at the answers below, do you know where to find chemistry materials in the library? What is the call number range for chemistry? What level of the stacks do you go to?

Secondary/Tertiary sources fall into the following general categories: 1. Review articles published in journals (discussed previously)

2. Edited books. Each book covers a topic, and different researchers write chapters about their own areas of research. The quality of content in edited books is generally uneven: the result is often a collection of individual reviews rather than a well-organized set of chapters. Authors are frequently not leaders in the field, the review process is less rigorous, and chapters are written based on authors’ availability. As a result, the book as a whole frequently does not cover all of the topic.

3. Methods publications – focus on detailed procedures for carrying out specific reactions

Organic Syntheses (www.orgsyn.org)

A Publication of Reliable Methods for the Preparation of Organic Compounds. All procedures and characterization data in OrgSyn are peer-reviewed and checked for reproducibility in the laboratory of a member of the Board of Editors. Editor-in-Chief = Rick Danheiser (MIT)

Look up: Iridium-Catalyzed Enantioselective Allylic Vinylation with Potassium Alkenyltrifluoroborates James Y. Hamilton, David Sarlah, and Erick M. Carreira Org. Synth. 2015 , 92 , 1

Look at how detailed the procedure is – would you be able to go into the lab and do this?

Inorganic Syntheses. Similar to Organic Syntheses but for inorganic compounds. Goucher does not have online access (Wiley), but we have all copies up to the current volume (36) in print. (545.9 I58) Comes out every 3-4 years.

Methods in Enzymology. Preparation and assay of enzymes. 612.015 C71

A note on Dewey Decimal call numbers: 500s = pure science; 600s = applied science.

Medicinal Chemistry = 615.1 Rest of Medicine = 610-616 Environmental Chemistry = 628

This website from the University of Illinois at Urbana-Champaign offers a nice summary of the Dewey Decimal Classification System http://www.library.illinois.edu/ugl/about/dewey.html

Here’s a page that has more info about Dewey Decimal Call Numbers: https://www.oclc.org/dewey/resources/public.en.html

Shakhashiri Chemical Demonstrations 540.7 S143. (1983). 4 volumes. You should be aware of it if you are looking for cool stuff to do with Chem Club.

4. Book series about reagents or reactions

Fieser’s Reagents for Organic Synthesis. 27 volumes. We still get them. 547 F26r.

Organic Reactions. Each chapter of Organic Reactions is devoted to a particular organic chemical reaction, and chapters provide exhaustive coverage of literature work in the form of a tabular survey of known reactions. Mechanistic and experimental details, including the scope and limitations of each transformation, are also included. 85 volumes. We still get it in print, but we don’t have online access. 547 O681.

Handbook of Reagents for Organic Synthesis. 5 volumes, 1999. 547.2 H236

Larock: Comprehensive Organic Transformations: A Guide to Functional Group Preparations. 2nd edition (1999). Single volume. Over 2500 pages. 547.2 L328c

Li: Name Reactions (2006). Single volume. 652 pages. 547.2 L693n. Other similar books exist.

5. Encyclopedias and Handbooks

Macmillan Encyclopedia of Chemistry (1997) 540.3 M167. – 4 volumes. Written like an encyclopedia with A-Z entries and readable articles.

Merck Index. You are already familiar with this from other courses.

Dean’s Analytical Chemistry Handbook, 2nd Edition 2004. 543 P311d. Survey of different analytical methods.

Shugar and Dean: The Chemist’s Ready Reference Handbook. Methods and Data. 543 S562c, 1990.

6. Books/Web sites that primarily contain data about various compounds

Lange’s Handbook of Chemistry (16th edition, 2005) 540 L274h. 4 Sections: Inorganic, Organic, Spectroscopy, General Information & Conversion Tables. Over 1500 pages of data (physical and chemical properties).

Solubilities of Inorganic and Organic Compounds (1963). 541.34 M89. 4 volumes. Lots of data.

IUPAC-NIST solubility database - http://srdata.nist.gov/solubility/ (free online database)

CRC Handbook of Chemistry and Physics. Lots of physical constants and other data

CRC Handbook of Physical Properties of Organic Compounds 547 H236. 1997. Single Volume. Contains log P values.

A couple of resources that we don’t have access to at Goucher, but you should know about if you move on to a research university or go down to Hopkins.

“Comprehensive” series, published by Elsevier. Each title is a ~10 volume set, well organized to give an introduction to certain areas of chemistry. Some are up to their 3rd editions now. The Table of Contents can be browsed on Science Direct.
Comprehensive Organic Synthesis
Comprehensive Heterocyclic Chemistry
Comprehensive Medicinal Chemistry
Comprehensive Organometallic Chemistry
Comprehensive Coordination Chemistry

Patai – Chemistry of functional groups. Originally published as books, now online through Wiley. Goes through all of the organic functional groups that you’ve heard of and many that you’ve not heard of.

Elsevier now has a subject portal of major reference works that can be searched for free. See:

Reference Module in Chemistry, Molecular Sciences and Chemical Engineering http://www.sciencedirect.com/science/referenceworks/9780124095472

Reference Module in Biomedical Sciences http://www.sciencedirect.com/science/referenceworks/9780128012383

CHE 245 – Goucher College

SciFinder

What I covered in the SciFinder video:
1. What is CAS
2. How to get an account on SciFinder
3. How to search for references by topic
4. How to refine search results by language or document type
5. Sorting search results by number of citations or analyzing by author name
6. How to search for references by author name
7. Abstracts of ACS meeting papers as a document type
8. Clicking on “citing” to get list of papers that cite a particular paper
9. Searching by structure (drawing a structure in the structure editor)
10. Output of structure searches: References, Reactions, Commercial sources
12. Refining references – just preparation, for example
13. Getting useful information out of the abstract even if you don’t have access to the original paper
14. Using the reactions database to get list of reactions that use a particular compound from a particular document
15. Information that is present in the reactions database
16. Clicking on “Full Text” to go to the full text for papers that we have access to
17. Substance detail to get information (properties, spectral data) about a substance (both predicted and experimental)
18. Saving search results by exporting to pdf

What is it? The Web service that searches the databases produced by CAS plus the Medline database.

What is the Chemical Abstracts database? Basically 2 things: 1. A collection of the abstracts of every paper (over 37 million) that has ever been written in every journal even remotely related to Chemistry (over 10,000 titles)

2. A record of every known chemical substance (currently over 90 million). “CAS is the only organization in the world whose objective is to find, collect and organize all publicly disclosed substance information.” Each substance is assigned a unique number (called a CAS registry number)
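A CAS registry number has a built-in check: the final digit is a checksum of the digits before it. Reading the other digits from right to left, multiply them by 1, 2, 3, and so on, add the products, and take the last digit of the sum. The sketch below applies that rule; the function name is an assumption, and the check only confirms that a number is well formed, not that it has been assigned to a substance.

```python
def is_valid_cas_number(cas):
    """Check the format and check digit of a CAS registry number string."""
    parts = cas.split("-")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        return False
    first, second, check = parts
    # Format: 2 to 7 digits, then 2 digits, then a single check digit.
    if not (2 <= len(first) <= 7 and len(second) == 2 and len(check) == 1):
        return False
    digits = first + second
    # Weight the digits 1, 2, 3, ... starting from the rightmost one.
    total = sum(int(d) * weight
                for weight, d in enumerate(reversed(digits), start=1))
    return total % 10 == int(check)


# Aspirin is 50-78-2: 8*1 + 7*2 + 0*3 + 5*4 = 42, and 42 ends in 2.
print(is_valid_cas_number("50-78-2"))
```

A check like this catches most single-digit typing errors before you paste a number into a database search.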

For more information, visit http://www.cas.org/content

How do I access it?

Security is very tight. You need to be on the Goucher network (either on campus or via VPN)

1. Go to the following Web site and register.

[Your chemistry professor can provide you with the link]

2. CAS will e-mail you an additional confirmation code. You need to click on the link in the e-mail to accept the terms within 48 hours.

3. Once you have registered, the Web site is scifinder.cas.org.

Classes of Searches

1. References A. Research Topic – Note refine tab on right. Clicking on Full Text will provide full text (use Web-based resources) if we subscribe to the journal. Unique to SciFinder is the ability to click on Get Substances or Get Reactions after doing a topic search.

B. Author Name

2. Substances (Chemical Structure) – use the structure editor. Note that we can only do an Exact Search, not substructure searches.

Output: a. Substance Detail – CAS number, and list of experimental and predicted properties including spectra.

b. References. Can be refined further.

c. Commercial sources, including links to price and availability.

3. Reactions. Allows you to see exactly which reactions have been done with a particular compound, including reaction conditions. Especially useful if our library does not have the original paper.

Saving search records

Use the Export link instead of the Save link to keep an offline record of your results. Under For:, I choose PDF under Offline Review, and then choose how much of the abstract (if any) to save.

Homework #3:

(None of this is written to be turned in, but please do it!)

1. Follow the instructions on this handout to create an account on SciFinder (if you don’t already have one), log in, and make sure it works.

2. Watch the Panopto video on Goucher Learn that I made about SciFinder. It is about 30 minutes long. The first 30 minutes of the entire 60-minute video that is posted consists of a librarian talking about RefWorks – you can skip that part and start watching at 30:00.

3. If you are interested, go to www.cas.org/about-cas and watch the 5 minute video tour of the CAS data center. I think it is neat.

CAS now has a number of short, helpful videos on their website. See the “Need to Know” videos on this webpage: http://www.cas.org/training/scifinder

CHE 245 – Goucher College

Free Public Databases: PubMed, PubChem, and ChemSpider

PubMed http://www.ncbi.nlm.nih.gov/pubmed

The search engine for the MEDLINE database, maintained by the National Library of Medicine. PubMed comprises more than 24 million citations for biomedical literature.

Click on PubMed Quick Start Guide for help.

This engine does text-based searches – can search by topic, author, keyword.
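The same kind of text search can be expressed as a URL for NCBI's E-utilities `esearch` service, which searches PubMed programmatically. The sketch below only builds the URL and does not contact the server. The function name and the particular query are illustrative assumptions; `[Author]` and `[dp]` (date of publication) are PubMed field tags.

```python
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pubmed_query_url(term, max_results=20):
    """Build (but do not send) an esearch request URL for PubMed."""
    params = {
        "db": "pubmed",          # which Entrez database to search
        "term": term,            # the text query, with optional field tags
        "retmax": max_results,   # how many record IDs to return
        "retmode": "json",       # ask for JSON instead of XML
    }
    return f"{ESEARCH}?{urlencode(params)}"


# Papers with Jacobsen EN as an author, published in 2014.
url = build_pubmed_query_url("Jacobsen EN[Author] AND 2014[dp]")
print(url)
```

Pasting the printed URL into a browser should return a list of PubMed IDs matching the query, the same records the search box would find.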

Example: Let’s search for L67, which is the name given to the compound that I have been working with in research.

You have to know what you’re looking for, but the original paper publishing the structure can be found (Cancer Research, 2008, Reference 7).

Click on the paper title to open the page. The abstract will be displayed. If you look at the top right of the screen, there is a section entitled “Full Text Links”. This gives you some options for accessing the full text of the article. All articles will have a link to the journal publisher. Some articles will also have a link to “PMC Full Text.” This is very useful, especially at Goucher, where we don’t have subscriptions to very many journals. According to the NIH public access policy, the public should have access to the results of studies carried out with NIH (taxpayer) funding. Authors who carry out research on NIH grants are supposed to deposit a copy of the final accepted version of the article on PubMed Central a certain length of time after it is published (usually either 6 months or a year – it is a year for ACS journals). Click on it to see how it looks. You get the article in html format, but if you want the pdf version, there is a link on the top right.

If you do a search on “Jacobsen, Eric N.” you will get the papers that you found for a previous assignment. Note that papers published in 2014 in ACS journals are not available, but older ones are. This is also a good way for us to get older Angewandte Chemie articles (waiting period is 2 years), since we don’t subscribe to Angewandte .

There are all kinds of rules about when papers can go onto PMC and these rules are evolving all the time. They govern whether the Author needs to deposit an Author copy, or whether the journal will deposit a copy of the actual publication after the requisite waiting period.

Another great feature of PubMed searches is the “Related Citations in PubMed” section on the right side. Based on my perusal of the listings, these really are the most relevant articles – the ones you would want to read. Related citations are based on both papers citing the same references in their bibliographies. Note also that review articles are given a special logo so you can pick them out quickly. You can also find all of the articles that have been deposited in PubMed Central that reference the article that you are looking at. Not a comprehensive list (obviously), but pretty good as a place to start looking for information.
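The idea of relating two papers by the references they share can be sketched in a few lines. This is only an illustration of the idea as described in this handout: the scoring formula (shared references divided by all distinct references), the function names, and the reference identifiers are all hypothetical, and PubMed's actual ranking method may differ.

```python
def shared_reference_score(refs_a, refs_b):
    """Fraction of the combined reference lists that both papers cite."""
    a, b = set(refs_a), set(refs_b)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_related(target_refs, library):
    """Return paper names from `library`, most similar bibliography first."""
    return sorted(
        library,
        key=lambda name: shared_reference_score(target_refs, library[name]),
        reverse=True,
    )


# Hypothetical bibliographies, identified by made-up labels.
target = ["ref1", "ref2", "ref3"]
library = {
    "paper_B": ["ref2", "ref3", "ref4"],   # shares two references
    "paper_C": ["ref5", "ref6"],           # shares none
    "paper_D": ["ref1", "ref7", "ref8"],   # shares one
}
print(rank_related(target, library))
```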

What is PubChem?

PubChem, released in 2004, provides information on the biological activities of small molecules.

PubChem is organized as three linked databases within the NCBI's Entrez information retrieval system. These are PubChem Substance, PubChem Compound, and PubChem BioAssay. PubChem also provides a fast chemical structure similarity search tool. More information about using each component database may be found using the links in the homepage.

Links from PubChem's chemical structure records to other Entrez databases provide information on biological properties. These include links to PubMed scientific literature and NCBI's protein 3D structure resource. Links to PubChem's bioassay database present the results of biological screening.

To Search:
1. Go to https://pubchem.ncbi.nlm.nih.gov/
2. Try the new PubChem search
3. There are five categories that you can search: Compounds, Bioassays, Bioactivities, Targets, and Patents
4. When searching patents, you must put the phrase in quotes (“Human DNA Ligases”), or else you will search for Human OR DNA OR Ligases and get 52,000 hits.
5. To search for a compound by structure, choose structure, then click on the cyclohexane ring to open the structure editor.
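The difference between a quoted phrase and an unquoted query can be seen on a tiny scale. The sketch below is a toy model, not PubChem's search engine: the four titles are invented for illustration, and the two matching functions are assumptions that mimic OR matching and exact-phrase matching.

```python
# Invented titles, used only to illustrate the two matching modes.
TITLES = [
    "Human DNA ligases as drug targets",
    "Human serum albumin binding studies",
    "DNA repair in yeast",
    "Ligases from E. coli",
]


def matches_any_word(query, text):
    """Unquoted search: a hit if ANY query word appears (word1 OR word2 ...)."""
    text_words = set(text.lower().split())
    return any(word in text_words for word in query.lower().split())


def matches_phrase(query, text):
    """Quoted search: a hit only if the whole phrase appears in order."""
    return query.lower() in text.lower()


query = "Human DNA Ligases"
or_hits = [t for t in TITLES if matches_any_word(query, t)]
phrase_hits = [t for t in TITLES if matches_phrase(query, t)]
print(len(or_hits), "hits without quotes;", len(phrase_hits), "with quotes")
```

Every title matches at least one of the three words, so the unquoted query returns all of them, while the quoted phrase returns only the one title that is actually about the topic.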

Search for Aspirin. As an approved drug in clinical use, there is an incredible amount of information presented, and much of it is very useful. However, note that this information had to be deposited in PubChem. You can’t find information on synthesis, and even commercial sources are limited to vendors who have deposited their catalogs with PubChem (Alfa and Acros Organics (Fisher Scientific) are missing). This is not the place to go to find references in the literature.

Search for the following compound, which we call L67.

[Figure: chemical structure of L67]

This compound was first disclosed as a DNA ligase inhibitor in a 2008 Cancer Research paper, but you can’t find it. I have published its synthesis, but you can’t find that either. We originally purchased it from ChemDiv, and it is also sold by Specs, but you can’t find any of that. You also can’t find a CAS number because CAS and the NIH don’t get along.

Bottom Line – a public government-run database will give you some information, but not everything you get in a subscription database like CA.

ChemSpider – similar to PubChem, but run by the Royal Society of Chemistry (UK) www..com “A structure-centric database” providing fast access to over 34 million structures, properties, and associated information. By integrating and linking compounds from ~500 data sources, ChemSpider enables researchers to discover the most comprehensive view of freely available chemical data from a single online search. ChemSpider builds on the collected sources by adding additional properties, related information, and links back to original data sources. ChemSpider offers text and structure searching to find compounds of interest and provides unique services to improve this data by curation and annotation, and to integrate it with users’ applications.

Just like PubChem, it will give you information about the compound (usually calculated), but it does not give you a comprehensive list of literature references, and for L67, the patent doesn’t come up either. Any database is only as good as the information contained in it, and CAS is the best.

Some unique features: 1. In addition to being able to draw structures (Make sure Java is up to date on your computer) ChemSpider can import and recognize structures even if they are saved as pictures (.png, .jpg, or .gif), and not as “chemically intelligent” output files from a structure drawing program (.cdx, .mol, or .sdf).

2. ChemSpider Synthetic Pages. This is the “social media” alternative to Organic Syntheses. People can “publish” experimental procedures for carrying out a procedure. It is not edited, checked, or peer reviewed, but other chemists have the opportunity to post comments. You can search by text or by structure for either desired compounds or reagents. Right now, the database is not very comprehensive because there are only 453 articles, but if this catches on, it can be very useful. I had no idea it existed until I was preparing this lesson.

3. Optimized for mobile devices. There are apps for iOS and Android, and it claims that the Web interface is optimized to work well on all devices (which SciFinder’s is not).

A word on structure drawing programs.

1. ChemDraw. The gold standard – draws the best looking structures, is easiest to use, and has the most features, but it is expensive – costs $290 for academics. They claim to have student discounts. Works on PC or Mac, and there is now an iPad App for $9.99 (requires iOS 7). https://www.cambridgesoft.com/Ensemble_for_Chemistry/ChemDraw/

2. Accelrys Draw http://accelrys.com/products/informatics/cheminformatics/draw/ This is free (for students, faculty, and academic researchers), but only runs on PC. Directly outputs .mol files which can be read by ChemSpider

3. ChemSketch http://www.acdlabs.com/resources/freeware/chemsketch/ Also free (for educational and home use). I like the interface better than Accelrys Draw, but it is also Windows only. Can save file as a .mol file to open in ChemSpider or Accelrys Draw.

4. MarvinSketch http://www.chemaxon.com/products/marvin/marvinsketch/ The only free structure drawing program that runs on a Mac. You need to register though. Quality is not as high as other programs. Can choose to save output as a .mol or .cdx file.

CHE 245 – Goucher College

Machine readable representations of structures:

How is structural information eventually converted into the 0’s and 1’s of computer code?

CAS numbers – unique numbers that are assigned to each structure. Problem is that CAS number contains no structural information. You can’t learn ANYTHING about the structure from looking at a CAS number. Also, they are proprietary, so there are restrictions on their use in other databases. According to CAS, “A User or Organization may include, without a license and without paying a fee, up to 10,000 CAS Registry Numbers or CAS RNs in a catalog, website, or other product for which there is no charge. The following attribution should be referenced or appear with the use of each CAS RN: CAS Registry Number is a Registered Trademark of the American Chemical Society . CAS recommends the verification of the CAS RNs through CAS Client Services SM .” Special permission is needed for more than 10,000 records.
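One thing a CAS number does carry is its own error check: the last digit is a check digit computed from the digits before it (multiply each digit by its position counting from the right, add them up, and take the last digit of the sum). This says nothing about the structure, but it does catch typos. A minimal sketch, with a function name of my own choosing, tested on 14892-97-8, a CAS number used later in this handout:

```python
import re

def cas_check_digit_ok(cas_rn):
    """Return True if a CAS RN is well formed and its check digit matches."""
    # A CAS RN looks like 2-7 digits, a dash, 2 digits, a dash, 1 check digit.
    if not re.fullmatch(r"\d{2,7}-\d{2}-\d", cas_rn):
        return False
    digits = cas_rn.replace("-", "")
    body, check = digits[:-1], int(digits[-1])
    # Weight the body digits 1, 2, 3, ... counting from the right.
    total = sum(int(d) * w for w, d in enumerate(reversed(body), start=1))
    return total % 10 == check

print(cas_check_digit_ok("14892-97-8"))
```

Changing any single digit of a valid number makes the check fail, which is the whole point of the last digit.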

SMILES - Simplified molecular-input line-entry system. The modern equivalent of condensed structural formulas. You can search by SMILES and InChI in SciFinder (as Substance Identifier) and in Google Scholar.

The end user does not need to know how this works, but I think it is worth going into the basics in this class.

Simple molecules are written the same way as condensed structural formulas, but you don’t write in any hydrogens.

Ether: CCOCC

Double bonds are represented with an = sign, and triple bonds with a # sign.

Acetonitrile: CC#N

The fundamental problem is, how do you convert a ring into a line of text? The basic solution is to open up each ring into a straight chain, and use a number to indicate the presence of a ring.

Let’s look at some examples.

Piperidine: C1CCNCC1 The number 1’s indicate that those carbons are connected in ring #1.

Aromatic rings get lower-case letters.

Pyridine: c1ccncc1

Pyrrole: c1cc[nH]c1

Hydrogens on heteroatoms in aromatic rings need to be explicitly shown.

Bicyclic rings: Naphthalene

[Structure drawing not reproduced]

Break rings so as to get a totally linear molecule if possible. c1ccc2ccccc2c1

Caffeine:

[Structure drawings of caffeine not reproduced]

CN1C=NC2=C1C(=O)N(C(=O)N2C)C
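To see how the ring numbers turn a line of text back into rings, here is a minimal sketch of a SMILES reader. It is not a real cheminformatics parser: it assumes only the features used above (common atoms, bracket atoms like [nH], the bond symbols, parentheses for branches, and single-digit ring numbers), and all the names in it are my own. It lists the heavy atoms and the bonds between them, recording implicit bonds (including aromatic ones) as "-".

```python
import re

TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]|[-=#:]|[()]|\d")

def smiles_bonds(smiles):
    """Return (atoms, bonds) for a simple SMILES string.

    bonds is a list of (i, j, symbol) tuples using 0-based atom indices.
    """
    tokens = TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"unsupported SMILES syntax in {smiles!r}")
    atoms, bonds = [], []
    prev = None          # atom that the next atom will be bonded to
    branch_stack = []    # atoms to return to when a branch closes
    open_rings = {}      # ring digit -> (atom index, bond symbol or None)
    pending = None       # explicit bond symbol waiting for its second atom
    for tok in tokens:
        if tok == "(":
            branch_stack.append(prev)
        elif tok == ")":
            prev = branch_stack.pop()
        elif tok in "-=#:":
            pending = tok
        elif tok.isdigit():
            if tok in open_rings:      # second use of the digit closes the ring
                start, sym = open_rings.pop(tok)
                bonds.append((start, prev, pending or sym or "-"))
            else:                      # first use of the digit opens the ring
                open_rings[tok] = (prev, pending)
            pending = None
        else:                          # anything else is an atom
            atoms.append(tok)
            idx = len(atoms) - 1
            if prev is not None:
                bonds.append((prev, idx, pending or "-"))
            pending = None
            prev = idx
    if open_rings or branch_stack:
        raise ValueError("unclosed ring or branch")
    return atoms, bonds

for name, s in [("piperidine", "C1CCNCC1"),
                ("naphthalene", "c1ccc2ccccc2c1"),
                ("caffeine", "CN1C=NC2=C1C(=O)N(C(=O)N2C)C")]:
    atoms, bonds = smiles_bonds(s)
    # For a single connected molecule, rings = bonds - atoms + 1.
    print(name, len(atoms), "atoms,", len(bonds) - len(atoms) + 1, "rings")
```

Each ring number adds exactly one bond beyond the straight chain, which is why counting bonds and atoms recovers the number of rings.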

@ symbols are used to indicate stereochemistry (I won’t go there).

InChI – IUPAC International Chemical Identifier This is a code! If you have any interest in cryptography, this is cool!

I will develop rules, using diethyl ether as an example.

InChI string: InChI=1S/C4H10O/c1-3-5-4-2/h3-4H2,1-2H3

What?!?

Every string starts with InChI=1S. The 1 stands for version 1, S is for standard. Each section that follows is introduced by a slash (/).

/ Next comes the molecular formula (C, H, then alphabetical).

/c Next comes the connectivity of the non-hydrogen atoms. Each atom must be assigned a number (see below). For rings, the same number is indicated twice, at the beginning and end of the ring. Syntax: c=connectivity

/h Next comes information about the number of hydrogens on each atom. A small h starts this section. Start by listing all atoms with 1 hydrogen (in numerical order), followed by the letter H (no space), then atoms with 2 hydrogens followed by H2, then atoms with 3 hydrogens followed by H3. If consecutive atoms have the same number of hydrogens, put a dash between the first and last (3-4H2).

Decoding ether:

The connectivity of the atoms is atom 1 (carbon), atom 3, atom 5 (O), atom 4, atom 2. Atoms 3 and 4 have 2 hydrogens, atoms 1 and 2 have 3 hydrogens.
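The decoding just done by hand can be sketched in a few lines of code. This is only an illustration with names of my own: it assumes a standard InChI with an unbranched connectivity section (no parentheses) and a plain hydrogen section, like the diethyl ether string. Real InChI strings for branched molecules need more work.

```python
import re

def decode_simple_inchi(inchi):
    """Split a simple standard InChI into formula, bonds, and H counts."""
    prefix, formula, *layers = inchi.split("/")
    if prefix != "InChI=1S":
        raise ValueError("expected a standard version-1 InChI")
    by_letter = {layer[0]: layer[1:] for layer in layers}

    conn = by_letter.get("c", "")
    if "(" in conn or "," in conn:
        raise ValueError("branched connectivity is not handled in this sketch")
    chain = [int(n) for n in conn.split("-")] if conn else []
    bonds = list(zip(chain, chain[1:]))   # neighbors in the chain are bonded

    hydrogens, waiting = {}, []
    h_layer = by_letter.get("h", "")
    for item in (h_layer.split(",") if h_layer else []):
        m = re.fullmatch(r"(\d+)(?:-(\d+))?(?:H(\d*))?", item)
        if not m:
            raise ValueError(f"unsupported hydrogen item {item!r}")
        first, last, count = m.group(1), m.group(2), m.group(3)
        waiting.extend(range(int(first), int(last or first) + 1))
        if count is not None:             # an H ends the current group of atoms
            for atom in waiting:
                hydrogens[atom] = int(count or 1)
            waiting = []
    return formula, bonds, hydrogens

print(decode_simple_inchi("InChI=1S/C4H10O/c1-3-5-4-2/h3-4H2,1-2H3"))
```

The hydrogen counts it returns add up to 10, matching the H10 in the molecular formula.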

Simple rules for numbering: 1. All of the carbon atoms get the lowest numbers. So atoms 1-4 are carbon, 5 is oxygen.

2. Carbon numbers will increase in order of how many non-hydrogen atoms that carbon is bound to. So the terminal carbons are 1 and 2, the internal carbons are 3 and 4.

3. When there is a need to break a tie (between CH2’s for example), the one furthest from the heteroatom will get the lower number.

Try methyl tert-butyl ether, then caffeine.

Getting Information out of SciFinder even if you don’t have access to full text.

Search for the following CAS number: 14892-97-8

The second entry for references is a 2008 paper from the Jordan Journal of Chemistry. We do not have access to this journal. But you can get a good sense of the chemistry reported in this paper by clicking on the green reaction flask on the right side of the record. It will bring up 17 reactions. You can get enough details to at least try to set up the reaction. For any particular reaction, you can hover over a reagent, click on the >> symbol on the top right, and one of the options is “Synthesize This.” Note: not every reference is in the reactions database.

Chemistry 245 – Goucher College

Reaxys, Beilstein, and Gmelin

These are resources that we do not have access to at Goucher, so my coverage of them will be minimal, but I need to at least mention them for sake of completeness.

Reaxys Database (http://www.elsevier.com/online-tools/reaxys ) This is a subscription database published by Elsevier (Science Direct journals) which contains similar current information as SciFinder, although the number of articles with deep indexing in Reaxys is much smaller than SciFinder. For detailed information, see: Reaxys FAQs compiled by David Flaxbart, Chemistry Librarian at University of Texas at Austin http://www.lib.utexas.edu/chem/info/reaxys.html. We only need to subscribe to one of them. I have never used Reaxys.

Reaxys also contains the Patent Chemistry Database.

Historical: Reaxys database is the current home of the digitized version of Beilstein’s Handbuch der Organischen Chemie (covering literature from 1881-1980) and Gmelin’s Handbuch der Anorganischen Chemie (covering literature from 1817-1975).

About Beilstein and Gmelin These were both multi-volume print books that took many library shelves, similar to the original print version of Chemical Abstracts. Unlike CA, which organizes information by paper in the literature, Beilstein and Gmelin organize information by chemical substance (acyclic, cyclic, and heterocyclic compounds). There would be an entry for every chemical substance. Important information about each substance would be included in the handbook, so you would not have to go back and look at each original paper, but it would give the references for the original papers. I don’t think there are any organic chemists over the age of 40 who haven’t searched print Beilstein.

Beilstein covers organic compounds, and organometallic complexes containing Group I or Group II metals. Gmelin covers all other organometallic complexes. See: http://www.lib.utexas.edu/chem/info/beilstein.html and http://www.lib.utexas.edu/chem/info/gmelin.html

The original handbook (Hauptwerk) covered all known compounds as of 1910. The problem with this sort of handbook is that it is out of date as soon as it is published, because new substances are being created, and new papers are being published using existing substances. There were a total of 4 editions published, and each edition was updated through the publication of supplements. Compounds in supplements were incorporated into the next edition. Information from the Hauptwerk was not duplicated in the supplements. The 4th edition covered the literature through 1959, the Fifth supplement to the 4th edition covered the literature from 1960-1979, but it was never completed, and when the last print volume was published in 1998, it was almost 20 years out of date. Clearly this method could not keep up with the information age.


Both handbooks were published in German until the early 1980’s when they switched to English (Beilstein 5th supplement). Up until the late 1980s, a chemistry major at Duke was required to take German in order to read the older literature.

Chemistry 245 – Goucher College

Patents

(Most material for this handout is taken from Chapter 3 of the Currano and Roth textbook).

What is a patent? An exclusive right granted by a government to an inventor for a limited period of time, typically 20 years, to prevent others from making, using, selling, or importing a product or process without the owner’s permission.

An inventor must prove that the invention:
Is novel
Is non-obvious
Performs a useful (not necessarily practical) function.

An inventor must describe the invention in sufficient detail in order to enable a person having “ordinary skill in the art” to understand how the invention is made or used. For chemistry “ordinary skill in the art” implies an advanced degree.

A patent may be granted for any new, useful machine, composition of matter, product, process, or improvement thereof. You cannot patent abstract ideas, artistic works, business methods, scientific theories, discoveries of substances as they naturally occur in the world, and inventions that are detrimental to public order, good morals, or public health. The Supreme Court has recently decided that a company cannot patent a human gene sequence.

Secret ingredients and black boxes are not allowed.

Chemical patents cover new compounds, mixtures, pharmaceuticals, processes, and methods of making a compound, or improving the efficiency of a chemical synthesis.

Patents are granted by: A national patent office (US Patent and Trademark Office) or A regional patent office (European Patent Office).

Why should you care about patents? Most novel chemistry that is discovered in industrial labs (pharmaceutical, fine chemicals, commodity chemicals, petrochemicals, specialty chemicals) is disclosed in the form of patents, and not journal articles. What gets written up in papers are failures, and only some of them at that.

Searching for patent information The most usual reason that a search will lead you into the patent literature is because you are searching for a specific compound. The best search method is SciFinder, PubChem, or ChemSpider. All of them will take you into patents, and SciFinder will give you the most complete information.

Example: Do a SciFinder search for patents authored by Richard R. Schrock. You will come up with 22 of them (as of 2015).

What does all of this jargon mean? PCT Int. Appl. (2012), WO 2012167171 A2 20121206.; U.S. Pat. Appl. Publ. (2012), US 20120302710 A1 20121129.

What is published now is a Patent Application. The US Patent Application is filed with the US Patent and Trademark Office. The USPTO has been publishing applications since 2001 – previously only granted patents (containing a patent number) were published. Applications are published 18 months after the first filing date. US patent applications are examined by the USPTO and a patent is issued where appropriate. The US has now joined the rest of the world in granting patents to the “first to file” where the filing date is the date of record.

US is the country code (obvious), 2012 is year, 302710 is application number, A1 represents the application type (initial application – I think the 1 has something to do with its position in a patent family, but I’m not sure of that). Remainder is publication date. See the “Patent Searching” page of this guide for more information about codes: http://library.stanford.edu/guides/patents

PCT Int. Appl. refers to the process by which patent applications are filed internationally. PCT stands for Patent Cooperation Treaty. According to the Paris Convention for the Protection of Industrial Property, inventors who wish to patent their invention abroad can file a separate application in each country, or file a PCT application with the World Intellectual Property Organization (WIPO), an agency of the United Nations based in Geneva, Switzerland. There are 148 member nations in PCT (as of Feb. 2015). See http://www.wipo.int/export/sites/www/pct/en/list_states.pdf for a map. Applications filed under the PCT are not examined by the WIPO, and do not become patents. A patent application must still be filed in each country, where it is examined by that country’s patent office. Kind of like the Common App for college applications. WO is the 2-letter code for WIPO.
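The two citation strings above follow the same pattern, so they can be pulled apart mechanically. A minimal sketch, assuming the layout SciFinder used in these two examples (office code, year fused to a number, a letter-plus-digit type code, then the publication date); the field names are my own, and the number is kept as text so a leading zero is not lost.

```python
import re
from datetime import date

CITATION = re.compile(
    r"(?P<office>[A-Z]{2})\s+(?P<year>\d{4})(?P<serial>\d+)\s+"
    r"(?P<kind>[A-Z]\d?)\s+(?P<published>\d{8})"
)

def parse_publication(text):
    """Pick the office, year, number, type code, and date out of a citation."""
    m = CITATION.search(text)
    if m is None:
        raise ValueError(f"no patent publication found in {text!r}")
    p = m.group("published")            # YYYYMMDD
    return {
        "office": m.group("office"),    # US, WO, ...
        "year": int(m.group("year")),
        "serial": m.group("serial"),    # kept as text, leading zeros included
        "kind": m.group("kind"),        # A1, A2, ...
        "published": date(int(p[:4]), int(p[4:6]), int(p[6:])),
    }

print(parse_publication("U.S. Pat. Appl. Publ. (2012), US 20120302710 A1 20121129."))
print(parse_publication("PCT Int. Appl. (2012), WO 2012167171 A2 20121206."))
```

Both records come out with the same five fields, which makes it easy to see that the US and WO documents were published a week apart.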

Download Reference 2 and Reference 14. To get the patent, click on the title of the document, then click on “Link to Other Sources.”

PatentPak appears to be a new add-on that you can purchase for SciFinder that helps you find the information you need in a patent without searching the whole patent. PatentPak’s biggest benefit is being able to see within a patent where a particular chemical substance is located. With mega-patents (e.g. 44,000 or even millions of compounds in one patent), this can be a real time-saver.

Link to other sources will take you directly to the USPTO Web site, opening up the page for that patent. You can try to make some sense of what comes up in html format on this page. Good luck. My recommendation would be to click on Images, then the full pages button on the left side of the screen which will provide you with the entire patent in pdf format.

Navigating your way through a patent Look first at the 1992 issued patent. It is shorter and simpler. A patent has the following general sections:

Front Page
Drawings
Specifications
Claims

The Front Page has important information in a standard format that is designed to make sense to people who do not generally read the language that the patent is published in. The two digit codes (INID codes – Internationally Agreed Numbers for the Identification of Data) have meanings: 19 = country, 54 = title, 75 = inventors etc. Numbers are easier to recognize for people from countries that do not use an alphabet.

The 1992 patent does not contain any specific drawings, the 2012 application does (mostly x-ray crystal structures). But there are plenty of reaction schemes and chemical structures in the Specifications section.

The specifications section is VERY long (especially in the 2012 application it goes from page 12 to page 93). It contains all of the material you would expect in a full paper (albeit in modified language and organization), including complete experimental procedures and all characterization data (including all bond lengths and angles from crystal structures). Background, brief summary etc. It is a detailed description of the invention, including all “prior art”. What you would most likely need to get out of this section is an experimental procedure, and maybe a look at the figures. Don’t get bogged down with the text written in legalese (in this embodiment…). Patents are written as generally as possible to protect a large number of compounds in the same class, and many reactions used to make them.

Markush Structures

Structures that look like this are very common in patents. They are a generic representation of many possible individual structures. The R1 going to the center of the ring implies that there are many different substituents protected (R1 instead of something specific), and those substituents can be at any position of the ring. The fixed R2 implies that a substituent must be present at that position, but we are not specifying what it is. X and Y are heteroatoms. Z rather than R4 implies something even more generic. The ()n around one of the carbons in the ring indicates that the ring size is also variable. You do need to indicate what R, X, Y, Z, and n can be in the Specifications.
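To get a feel for how quickly one generic structure multiplies into many specific ones, here is a small counting sketch. Every list in it is hypothetical (a real patent defines its own choices in the Specifications), Z is left out to keep it short, and the count ignores symmetry, so treat the number as an upper bound for this made-up example.

```python
from itertools import product

# Hypothetical choices for each variable in a Markush structure.
markush = {
    "R1": ["F", "Cl", "CH3", "OCH3"],
    "R1_position": [1, 2, 3],      # ring positions open to R1
    "R2": ["CH3", "C2H5", "Ph"],
    "X": ["O", "S"],
    "Y": ["NH", "O"],
    "n": [1, 2, 3],                # variable ring size
}

# One choice per variable gives one specific structure covered by the claim.
members = list(product(*markush.values()))
print(len(members))   # 4 * 3 * 3 * 2 * 2 * 3 = 432
```

With only a handful of options per variable the total is already in the hundreds; longer lists get into the thousands and beyond very quickly.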

Named after Eugene Markush who first tried this idea in 1924, and the Patent Office went for it, and it held up in court.

The Claims section is a brief description of exactly what is claimed in the patent. It is short and succinct, although still not particularly easy to read.

People involved in the patent process:

Patent Agent – files patents with USPTO (or other patent agency), but not licensed to practice law.

Patent Attorney – files patents and is licensed to practice law.

Patent Examiner – works for USPTO and examines patents (good job)

Other Patent Databases

Espacenet ( http://worldwide.espacenet.com/ ) International patent database produced by the European Patent Office. Contains 70 million patent documents, and searchable full text of all EP and PCT applications. Has translation capabilities.

Google Patents (yes, Google is into this too…) ( www.google.com/patents ) Full text US patent documents from 1790 to the present that is fully indexed and searchable. USPTO just has scanned images of old stuff.

PatentScope ( https://patentscope.wipo.int/search/en/search.jsf ) The database maintained by WIPO. Contains all PCT applications from 1978 forward as well as national patent documents.

SureChEMBL https://www.surechembl.org/search/ Also has structure searching for patents

The Lens : open public resource for innovation cartography http://www.lens.org/lens/ Includes blast sequence searching. Free.

Of the patent databases, Espacenet and The Lens are much higher quality for searching patents than PubChem or ChemSpider. After SciFinder, they are the sources I would search next.

CHE 245 – Goucher College

Reading papers for Current Events.

Read the abstract first!

The Introduction will contain information about the background to this project. The last paragraph of the introduction is actually the most important! This is where the exact experiment(s) reported in this paper are summarized, and sometimes the results are previewed.

Skip the Experimental completely (unless you are reading the paper for your own research project, in which case it is probably the most important part).

Start by reading the headings of the Results and Discussion. They are a roadmap to guide you through the section. Then try to at least match up figures to section headings. They will not always be on the same page! How carefully you read through the Results and Discussion depends on whether you are reading for the general gist of the paper, or if you are trying to critically analyze the results reported in the paper.

Questions to ask when reading a paper:

Basic Level:
1. What fundamental principle or novel compound have the authors demonstrated in this paper?
2. Are there any words that you don’t know the meaning of? Start a list!
3. Do you need to know the meanings of these words to understand the paper, or can you “skip them” and move on?
4. What techniques did the authors use in this paper? Are there any techniques that you have never heard of?
5. Look at each figure. What are they trying to show in each figure? Are they reaction schemes with structures? Photographs? Spectra?
6. What are the conclusions of the paper?

Digging deeper:
1. What previous work have these authors published in this area?
2. Are there any references leading to review articles?
3. Are there a lot of different people working in this area?
4. My undergraduate research adviser said that in order to get a paper into JACS, you have to be either the first or the best. They are either reporting something fundamentally new, or they are reporting a process for doing something that is better than anything that came before. Which one applies to this paper?
5. Does the data presented convince you that the authors came to the correct conclusion?

CHE 245 – Goucher College

The Publication Process

How is a paper prepared? Read the Instructions for Authors for your journal. Papers need to be prepared in a template, and there are many rules relating to what constitutes acceptable science. For Organic Letters, the Instructions for Authors is 32 pages long! http://pubs.acs.org/paragonplus/submission/orlef7/orlef7_authguide.pdf For example, look at the requirements for characterization of new compounds beginning on page 17.

Look at the following guides: http://pubs.acs.org/page/4authors/submission/howtosubmit.html http://pubs.acs.org/paragonplus/submission/acs_step-by-step_guide_to_manuscript_submission.pdf for more information about how to prepare a manuscript and the submission process.

How is a paper submitted and reviewed?

Papers are submitted electronically to the Editor-in-chief of the journal. He/She is usually a pretty important chemist. The Editor-in-chief of JACS is Peter Stang, an organic chemist at the University of Utah.

The Editor-in-chief may be an interdisciplinary guy, but he still doesn’t know anything about many areas of chemistry that people submit papers in. The Editor-in-chief will then assign the paper to an associate editor. There are currently 27 Associate Editors for JACS. Their expertise represents all areas of chemistry. Some are as young as 35, some are over 70. There is one in China, one in Korea, one in Japan, one in France, one in Germany, and one in Saudi Arabia (although he was a professor in the US for many years). Organic Letters has 11 Associate Editors, and 8 of the 11 are outside the US. Authors may request an Associate Editor, but it is usually obvious which Associate Editor should handle a particular paper.

The Associate Editor is ultimately responsible for making the decision of whether or not to accept the paper. He/She will enlist the advice of reviewers who are experts in the field of the paper. Authors are asked to suggest reviewers for the manuscript, although editors also have some idea who to ask. Papers initially go out to 3 reviewers, reviewers get 2-3 weeks to review the paper. Options for reviewers are:

Publish as-is
Publish after minor revision (usually asking for an additional experiment)
Publish after major revisions
Do not publish in this journal


A decision is made after 2 reviews come back if they agree. Otherwise they wait for the third, and sometimes a 4th review.

Financial Model – Open access vs. Traditional Subscriptions

Publishing papers costs money, even for nonprofit publishers like the ACS. Someone has to pay those costs – either the authors or the readers. The move from traditional subscription to open access is happening in real time, and as scientists in 2015, we need to look at this issue a little deeper.

Traditional Subscription – anyone (or an institution) who subscribes to a journal pays an annual fee to access the content. Non-subscribers can’t read the article. The publisher holds the copyright for the article, and author’s ability to distribute personal copies of the article or post the article on their own Web sites is limited. Current policy for ACS journals is that authors may not post a published paper on their own Web site, but they get 50 free electronic reprints for the first year of publication. They receive a unique link that they can post on their Web site or distribute to anyone who requests it. After 12 months, there is unlimited access to papers through this link.

Individual Subscription: $99 per journal per year for electronic access. Print appears to no longer be an option at all.

Institutional (Library) Subscription: Goucher now subscribes to the ACS Core Plus Pack for Education. We are paying $21,510 for unlimited access to the 15 core journals, plus 150 downloads per year for other ACS journals. Before they introduced this core package in 2013, we were paying $32,000 for a subscription to all journals in 2012. As a point of reference, the cost of our Web subscription was $5000 in 2004 (but we still ordered print journals then, so we were paying about $19,000 total).

Open Access – Author (or their funding agency) pays a fee for each paper to be published. Version of record is available for free to anyone in the world on journal Web site immediately upon publication.

Rationale: 1. Compliance with policies of funding agencies. The belief in the US (and other countries) today is that most research is funded with taxpayer dollars, and the public (or at least those who can read and understand a scientific journal article) has the right to access the results of that research.

As of 2015, the NIH and the DOE both have policies that the full text of articles must be deposited (in PubMed Central or PAGES) within 12 months of the publication date. If you publish open access in an ACS journal, the ACS will deposit your paper in PubMed Central or PAGES. If you do not publish open access, you are responsible for doing so. The NIH and DOE allow you to use funds from your grant to pay for open access charges.

2. There is a lot of good science being done at institutions around the world that do not have funds to subscribe to all of the major journals. (We don’t have RSC, Wiley, Springer, or Thieme). Some authors want as many people as possible to be exposed to their research, and this is how you do it.

Debate: With traditional publication, institutions (universities, companies) pay for the cost of publication through the library. With Open Access, individual researchers, or their funding agencies pay the cost. Ideally, this is supposed to reduce library costs, and institutions are encouraged to support open access fees. It remains to be seen if this will happen. Problem is that the costs are prohibitive for people who don’t have large external research grants, which is becoming more and more common as external funding gets tighter and tighter.

Cost of Open Access Publication:

ACS journals (as of 2014) option is called ACS Author Choice:

ACS Member: $2000/article for immediate availability; $1000 for availability after 12 months. Reduced to $1500 and $750 if you are at a university that has a subscription to all ACS journals (not what Goucher currently has).

If ACS went 100% open access and eliminated subscriptions, and kept publishing charges at $2000, and the money that we currently put into journal subscriptions were put into an Open Access publishing fund, Goucher would end up way ahead, because the break even point would be if we published 10 articles/year and we don’t even come close as a department.
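The break-even estimate is just a division, using the two figures quoted in this handout (the $21,510 subscription and the $2000 fee). The variable names are mine.

```python
subscription = 21_510      # Goucher's ACS Core Plus Pack price, dollars per year
fee_per_article = 2_000    # ACS member fee for immediate open access

# Number of articles per year at which author fees equal the subscription.
break_even = subscription / fee_per_article
print(round(break_even, 1))
```

The result falls between 10 and 11 articles per year: ten open access papers would cost $20,000, slightly less than the current subscription.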

Look at this blog post: http://blog.chembark.com/2013/11/05/acs-expanding-open-access/

American Society for Biochemistry and Molecular Biology (publisher of Journal of Biological Chemistry ): $1500/article for members (immediate availability). Author can choose traditional or open access.

Nucleic Acids Research (published by Oxford University Press): All papers are open access – there is not an option to publish for free. Cost structure is designed so that it can be shared by institution and author. Author charges are $1385/article if the author’s institution is an NAR member, $2770 if not. An institutional membership is $4793 per year. So if authors at an institution intend to publish 4 or more articles per year in NAR , it makes financial sense for the institution to have a membership.
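The "4 or more articles" figure can be checked by comparing the total yearly cost with and without a membership. A minimal sketch using only the prices quoted above; it assumes we care about the combined cost to institution and authors, and the function name is my own.

```python
def nar_cost(articles, member):
    """Total yearly cost in dollars of publishing in NAR."""
    membership, member_fee, nonmember_fee = 4793, 1385, 2770
    if member:
        return membership + articles * member_fee
    return articles * nonmember_fee

for n in range(1, 6):
    print(n, nar_cost(n, member=True), nar_cost(n, member=False))
```

At 3 articles the membership route still costs more ($8948 vs. $8310); at 4 articles it becomes cheaper ($10,333 vs. $11,080).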

More on Open Access:

ACS Central Science - http://pubs.acs.org/journal/acscii - does not charge authors or libraries. ACS’ Editor Choice -http://pubs.acs.org/editorschoice/ - is not charging authors or libraries either. One article per day is being made openly accessible.


The ACS Advanced Search page - http://pubs.acs.org/search/advanced - the Access Content Type section allows you to limit searches to openly accessible articles.

PubMedCentral can be searched directly at http://www.ncbi.nlm.nih.gov/pmc/

CHORUS can be used to find OA articles - http://www.chorusaccess.org/

Because supporting materials from many journals are openly accessible, between the abstract, list of references, and the supporting materials readers can get a good sense of what an article is discussing even without a subscription to that journal.

CHE 245 – Goucher College

Structure and Sequence Databases

Cambridge Structural Database

What is it? A database that contains detailed information about the structures of small molecules that are obtained by X-ray crystallography. Its most important feature is a 3D Java-based structure viewer (jmol). It is hosted at Cambridge University in England.

X-ray crystallography is used to determine the connectivity of molecules, and also to determine all of the bond lengths and angles. For example, by knowing exact bond lengths, you can get experimental evidence of whether a bond is single or double.

How to access: http://www.ccdc.cam.ac.uk/pages/Home.aspx

Anyone can access structures if they know exactly what they are looking for. You need to know the doi (digital object identifier) for the journal article that contains the structure you want to examine, but that is very easy to get. You need to be a subscriber to the database in order to be able to search for structures using the CSD’s search software.

Example: Enter the following doi into the Get Structures link: 10.1021/ic001123n

There are 4 structures in this paper (I solved all of them). Choose the structure you want to look at from the menu on the left (under Results).

You can play with the structure in the jmol viewer on the CSD site or you can download it (as a mol2 file) and open it in any program that opens mol files such as MarvinView on the Mac. To download, choose Open, then CSD entry in external viewer. If you are using the jmol viewer, you can view a single molecule, or the entire unit cell. To view the entire unit cell (which shows how different molecules pack together), choose Unit Cell from the Packing menu below the structure. You can also choose 3X3 (to view 9 unit cells), but there are so many atoms that I find it to be confusing. The blue buttons allow you to choose whether or not to view hydrogens (they are often uninteresting, and get in the way), to stop the spinning, or view many other things from the Menu button, including how the structure is displayed.

If you open the structure in MarvinView, go to View-Display-Ball and Stick to see it as a ball-and-stick model. If you go to View-Advanced-Bond Lengths, you can see the bond lengths. This paper is about Mo dinitrogen complexes, and the most important detail of the structure is the N-N bond length because it demonstrates how reduced the N2 is.
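The same bond length can be pulled out of a downloaded mol2 file with a few lines of code. This is a rough sketch, assuming the usual Tripos mol2 layout in which each line of the @<TRIPOS>ATOM section holds an atom id, an atom name, x, y, z, and an atom type. The file contents below are a made-up three-atom fragment, not one of the structures from the paper. For comparison, the N-N distance in free N2 is about 1.10 Å, and a longer distance means a more reduced N2.

```python
import math

# Made-up mol2 text for a linear Mo-N-N fragment (coordinates in angstroms).
MOL2_TEXT = """\
@<TRIPOS>MOLECULE
toy_fragment
 3 2
SMALL
NO_CHARGES

@<TRIPOS>ATOM
      1 Mo1    0.0000  0.0000  0.0000 Mo
      2 N1     2.0000  0.0000  0.0000 N.1
      3 N2     3.2000  0.0000  0.0000 N.1
@<TRIPOS>BOND
     1     1     2    1
     2     2     3    2
"""

def read_mol2_atoms(text):
    """Return {atom_name: (x, y, z)} from the @<TRIPOS>ATOM section."""
    atoms = {}
    in_atoms = False
    for line in text.splitlines():
        if line.startswith("@<TRIPOS>"):
            # Only lines after the ATOM tag (and before the next tag) are atoms.
            in_atoms = line.strip() == "@<TRIPOS>ATOM"
            continue
        fields = line.split()
        if in_atoms and len(fields) >= 6:
            # Fields: atom_id, atom_name, x, y, z, atom_type, ...
            atoms[fields[1]] = tuple(float(v) for v in fields[2:5])
    return atoms

atoms = read_mol2_atoms(MOL2_TEXT)
nn_length = math.dist(atoms["N1"], atoms["N2"])
print(f"N-N distance: {nn_length:.3f} A")
```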

UniProt

What is it? UniProt is a comprehensive, high-quality and freely accessible database of protein sequence and functional information, many entries being derived from genome sequencing projects. It contains a large amount of information about the biological function of proteins derived from the research literature.

How to access it: www.uniprot.org

How do we use it? UniProt KB (Knowledgebase) consists of 2 databases:

SwissProt (currently contains 548,208 entries): these proteins are manually annotated and reviewed (from a wide variety of organisms). They have been characterized (where possible) and literature references are given.

TrEMBL (contains over 46 million entries): these entries are generated by computer based on finding coding sequences in genes. They are not reviewed. Many of these proteins have never been isolated and characterized.

There is a separate Proteomes database. (A proteome is the complete set of proteins in an organism; working it out is a much more difficult problem than genomics.) You can search the Proteomes database by organism and get, for example, the 68,511 proteins in the Human Proteome (as of Feb. 2015). The results are organized by the chromosome on which the gene encoding each protein appears. You can, for example, get a list of all 5324 proteins encoded by genes on Chromosome 1. If you look at the first 250, some will be well known (Caspase 9), while others will simply be “Uncharacterized protein C1orf186.”

Let’s search for a specific protein – Human DNA Ligase.

It brings up over 20,000 entries. The one I care about is DNA Ligase 1. Every protein gets an Accession Number. This one is P18858. It is like a CAS number for proteins. Click on the accession number to bring up the record for this protein. Look at all of the information that is available!

Function

Names and Taxonomy

Subcellular location

Pathology and Biotech

Post-translational modifications (phosphorylation of serine, threonine, and tyrosine residues)

Expression

Interaction

Structure (link to the X-ray structure in the Protein Data Bank; 1x9n is the ID number for the X-ray crystal structure). Shows where α-helices and β-strands are, based on the crystal structure.

Family & Domains

Amino acid sequences. Includes natural variants (where amino acids are varied)

Cross-references

Publications

Entry information/Miscellaneous/Similar proteins


Under Function, you can find information about the specific amino acid at the active site, and about parts of the sequence that are known to interact with DNA, ATP or Mg2+.

Under Family and Domains, you can find a longer 10 amino acid sequence (449-458) that is known to interact with DNA. This sequence is the target of inhibitors under investigation in my lab. Clicking on it brings up the sequence (RLRLGLAEQS), but also opens it up in a BLAST window.
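Residue ranges such as 449-458 are numbered from 1 and include both ends, so pulling a region out of a downloaded sequence takes a little care. The sketch below parses a FASTA record and slices a region by UniProt-style numbering. The URL pattern in fasta_url is an assumption about the UniProt web service at the time of writing and may change. The FASTA record is a made-up 30-residue example, not the real Ligase 1 sequence.

```python
def fasta_url(accession):
    """URL of the FASTA record for a UniProtKB accession (assumed endpoint)."""
    return f"https://rest.uniprot.org/uniprotkb/{accession}.fasta"

def parse_fasta(text):
    """Return (header, sequence) for a single-record FASTA string."""
    lines = [ln.strip() for ln in text.strip().splitlines()]
    header = lines[0].lstrip(">")
    sequence = "".join(lines[1:])
    return header, sequence

def region(sequence, start, end):
    """Residues start..end, numbered from 1 and inclusive."""
    return sequence[start - 1:end]

# Made-up 30-residue record used only to exercise the functions.
TOY_FASTA = """>toy|X00000|EXAMPLE hypothetical protein
MKTAYIAKQRQISFVKSHFS
RQLEERLGLI
"""

header, seq = parse_fasta(TOY_FASTA)
print(fasta_url("P18858"))
print(region(seq, 5, 14))
```

With the real sequence for P18858 in hand, region(seq, 449, 458) would be the call that corresponds to the range quoted above.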

What is BLAST? It stands for Basic Local Alignment Search Tool. It is a method for finding amino acid sequences in other proteins that match a given amino acid sequence. A 10 amino acid sequence is a good length to do a BLAST on. I recommend checking the “Run Blast in a separate window” box, because it takes a little while, and you can keep working in your current tab. The complete BLAST took 73 seconds, and came up with 59 hits (out of its database of 47 million sequences containing 15 billion letters).
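To get a feel for what a sequence search does, here is a toy version. It is not the BLAST algorithm: it simply slides the query along each sequence and counts mismatches, with no gaps and no scoring matrix. The "database" is three made-up sequences.

```python
def find_matches(query, sequence, max_mismatches=0):
    """Slide the query along the sequence; return (start, end, mismatches).

    Positions are 1-based and inclusive. Gaps are not considered.
    """
    hits = []
    k = len(query)
    for i in range(len(sequence) - k + 1):
        window = sequence[i:i + k]
        mismatches = sum(1 for a, b in zip(query, window) if a != b)
        if mismatches <= max_mismatches:
            hits.append((i + 1, i + k, mismatches))
    return hits

QUERY = "RLRLGLAEQS"  # the DNA-binding sequence from the Ligase 1 record

# Made-up sequences for illustration; they are not real database entries.
TOY_DB = {
    "toy_protein_A": "MSTAARLRLGLAEQSVKK",
    "toy_protein_B": "MKRLRLGIAEQSPLT",
    "toy_protein_C": "MAAAGGGSSSTTTPPPLLL",
}

results = {name: find_matches(QUERY, seq, max_mismatches=1)
           for name, seq in TOY_DB.items()}
for name, hits in results.items():
    print(name, hits)
```

Real BLAST is much faster and more forgiving than this, which is how it can search billions of letters in about a minute.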

Alignment shows where this sequence is found in other proteins. For example, in the first entry (Macaca fascicularis), this sequence occurs between amino acids 142 and 151.

You can also Align 2 proteins against each other. For example, Ligase 1 (P18858), and Ligase 3 (P49916). How different are they?

There are 198 identical amino acids (indicated with a *), and 295 similar positions (indicated with . or :). Note that this is not simply lining up amino acids one by one: the alignment also identifies regions in the sequence of one protein that have no parallel in the other protein.
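The counts reported for an alignment come from walking down its columns. The sketch below counts identical, mismatched, and gapped columns for two rows that have already been aligned, with "-" marking a gap. It does not compute the alignment itself, and it does not score "similar" positions, which would need a table of related amino acids. The two rows are made up.

```python
def alignment_summary(row1, row2):
    """Count identical, mismatched, and gapped columns of a pairwise alignment.

    Both rows must have the same length; '-' marks a gap.
    """
    if len(row1) != len(row2):
        raise ValueError("aligned rows must have equal length")
    identical = mismatched = gapped = 0
    for a, b in zip(row1, row2):
        if a == "-" or b == "-":
            gapped += 1
        elif a == b:
            identical += 1
        else:
            mismatched += 1
    return {"identical": identical, "mismatched": mismatched, "gapped": gapped}

# Made-up aligned fragments, not taken from the Ligase 1 / Ligase 3 alignment.
ROW1 = "MKT--AYIAKQRLRLG"
ROW2 = "MRTGGAYLAK---RLG"

summary = alignment_summary(ROW1, ROW2)
print(summary)
```

The gapped columns are the regions of one protein that have no parallel in the other.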

RCSB Protein Data Bank

What is it? An information portal to 108,263 Biological Macromolecular Structures. Hosted by the Research Collaboratory for Structural Bioinformatics (RCSB). Contains 3-dimensional structures from NMR and X-ray studies. As of now, there are over 35,000 distinct protein sequences, and about 20% of them are proteins bound to DNA. This is possibly the most important database in human medicine, because the traditional method of drug discovery is to find small (organic) molecules that can inhibit an enzyme or bind to a receptor. In order to know what size/shape molecule to design, and what functional groups to put on it, one needs to know the 3-dimensional structure of the protein of interest, especially the size, shape, and functionality present at binding sites.

How to access: www.rcsb.org/pdb/home/home.do

There is a text-based search box at the top of the screen that can handle different types of search strings (protein name, author, UniProt accession number).

For example: search for Triose Phosphate isomerase (one of the enzymes involved in glycolysis).

You will get 172 hits. The results can be organized by organism (in order from the greatest to the least number of deposited structures), experimental method (X-ray or NMR), resolution for X-ray structures (lower is better), and release date, among others.

You may not recognize the top two organisms, but they are both parasites. The first one causes African sleeping sickness, and the second causes Malaria. This is an essential enzyme in metabolism, and if we can find structural differences between the parasitic forms and the human forms, we can try to develop antiparasitic drugs. The structures appear below (organized by release date with the newest on top). Some of them are natural forms of the enzyme, some of them are engineered forms – X-ray can show how changing the amino acid sequence can change the 3-dimensional structure.

Choose Homo sapiens, and then the oldest structure (1WYI). Under the code are the 2 most relevant options: download the pdb file (which you can then open in MarvinSpace), or click on the 3 color blocks to view the structure in jmol. You can use the mouse to rotate the structure. If you mouse over a particular part of the structure, it will bring up information about which amino acid you are pointing to. The right button brings up the standard jmol menu with many other options. For example, you can zoom in/out, or go to Style-Scheme-Ball and Stick to see all atoms. Choosing Cartoon will take you back to the default view. If you just want to see the sulfur atoms, choose Select-Element-Sulfur. If you want to see just the side chains, choose Select-Protein-Side Chains. The same works for acidic residues (Asp, Glu) or basic residues (Lys, Arg, His).
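The selections in the viewer menu correspond to simple filters on the atom records in the pdb file. The sketch below is my own illustration, not how jmol does it. It assumes the standard fixed-column layout of ATOM records (residue name in columns 18-20, element symbol in columns 77-78). To avoid column-counting mistakes, the toy input is generated by a helper function, and all atoms and coordinates are made up.

```python
def atom_line(serial, name, res_name, chain, res_seq, x, y, z, element):
    """Format one fixed-width PDB ATOM record (helper for the toy input)."""
    occupancy, b_factor = 1.0, 0.0
    return (f"ATOM  {serial:>5} {name:<4} {res_name:>3} {chain}{res_seq:>4}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}{occupancy:6.2f}{b_factor:6.2f}"
            f"          {element:>2}")

def parse_atoms(pdb_text):
    """Read ATOM/HETATM records into dictionaries using the fixed columns."""
    atoms = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            atoms.append({
                "name": line[12:16].strip(),
                "res_name": line[17:20].strip(),
                "chain": line[21],
                "res_seq": int(line[22:26]),
                "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
                "element": line[76:78].strip(),
            })
    return atoms

ACIDIC = {"ASP", "GLU"}
BASIC = {"LYS", "ARG", "HIS"}

# Made-up atoms, one per line, just to exercise the filters.
toy_pdb = "\n".join([
    atom_line(1, " N", "MET", "A", 1, 11.0, 12.0, 13.0, "N"),
    atom_line(2, " SD", "MET", "A", 1, 12.5, 12.0, 13.0, "S"),
    atom_line(3, " CA", "ASP", "A", 2, 14.0, 12.0, 13.0, "C"),
    atom_line(4, " SG", "CYS", "A", 3, 15.5, 12.0, 13.0, "S"),
    atom_line(5, " NZ", "LYS", "A", 4, 17.0, 12.0, 13.0, "N"),
])

atoms = parse_atoms(toy_pdb)
sulfur = [(a["res_name"], a["res_seq"], a["name"]) for a in atoms if a["element"] == "S"]
acidic = sorted({a["res_seq"] for a in atoms if a["res_name"] in ACIDIC})
basic = sorted({a["res_seq"] for a in atoms if a["res_name"] in BASIC})
print(sulfur, acidic, basic)
```

To try this on a real file, read the downloaded pdb file into a string and pass it to parse_atoms.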

CHE 245 – Goucher College

Databases of Spectral and Other Data

Sigma-Aldrich NMR library

Sigma-Aldrich sells a library of 1H and 13C NMR spectra of many of the compounds that they sell. 11,800 compounds are included. I have the CD-ROM, and a copy of it is installed on the NMR computer.

Sigma-Aldrich Web Site

NMR spectra of many compounds are freely available to anyone on the Web.

1. Search for a compound (e.g. 4,5-diaminopyrimidine)

2. Click on a catalog number (D24501)

3. Scroll down to the Documents section near the bottom of the entry on the right side.

4. If NMR spectra are available, a link to FT-NMR will be there. The carbon spectrum is on top, and the proton spectrum is on the bottom.

Spectral Database for Organic Compounds (SDBS)

Web site hosted by the Japanese National Institute of Advanced Industrial Science and Technology (AIST).

http://sdbs.db.aist.go.jp/sdbs/cgi-bin/direct_frame_top.cgi

Contains 6 kinds of spectra: Mass, 1H NMR, 13C NMR, FT-IR, Raman, ESR.

You can search by the usual search terms (Name, Formula, CAS number). However, another useful way of searching is by NMR spectral peaks. You can specify a peak position and a tolerance. This will be most useful if there is an NMR peak in a somewhat unusual region. For example, search for a peak at 9.61 ppm with a tolerance of 0.05 ppm. This will return all compounds in the database with an NMR peak between 9.56 and 9.66 ppm (117 compounds). Adding text to another search box will refine the search further. For example, adding “acid” under name will bring up just carboxylic acids (or other molecules that have the word acid in one of their names).

You can search by multiple 13C peaks. You can also search by ions in the mass spectrum, but this is less useful because of fragmentation.
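The logic of a peak search is easy to sketch: a compound matches if, for every peak you ask for, it has at least one peak within the tolerance. The code below is an illustration of that idea only. It is not how SDBS is implemented, and the compounds and chemical shifts are made up.

```python
def has_peak(peaks, target, tolerance):
    """True if any peak lies within target +/- tolerance (all values in ppm)."""
    return any(abs(p - target) <= tolerance for p in peaks)

def search(library, targets, tolerance):
    """Names of compounds that show a peak near every target shift."""
    return [name for name, peaks in library.items()
            if all(has_peak(peaks, t, tolerance) for t in targets)]

# Made-up compounds and shifts (ppm); not SDBS data.
TOY_LIBRARY = {
    "compound A": [9.63, 7.40, 2.10],
    "compound B": [9.70, 3.30],
    "compound C": [9.58, 1.25],
    "compound D": [7.26, 2.10],
}

one_peak = search(TOY_LIBRARY, [9.61], 0.05)
two_peaks = search(TOY_LIBRARY, [9.61, 2.10], 0.05)
print(one_peak)
print(two_peaks)
```

Adding a second peak narrows the results, which is why searching by several peaks at once is so effective.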

Once you find the compound that you are looking for, click on the “Y” under the spectrum type to bring up that spectrum. A menu with other types of available spectra is present on the left side of the screen. It is also useful to enter ranges for the number of each heteroatom (including zero to zero) in the search to focus the results.

NIST Chemistry WebBook

http://webbook.nist.gov/chemistry/

Hosted by National Institute of Standards and Technology (NIST).

This database is of greatest interest to physical chemists. The following types of data are available:

a. General Information

b. Gas Phase Thermochemistry Data

c. Condensed Phase Thermochemistry Data

d. Phase Change Data

e. Reaction Thermochemistry Data

f. Gas Phase Ion Energetics Data

g. Gas Phase IR Spectra

h. Mass Spectra

i. UV/Vis Spectra

j. Vibrational and Electronic Spectra

k. Constants of Diatomic Molecules

l. Henry's Law Data

m. Gas Chromatographic Retention Data

n. Thermophysical Properties of Fluid Systems

o. References to the Literature

For example, search for Anisole. To find the links to the data, look under the heading “Other data available”. 9 types of data are available for this compound. If you click on that type of data, it will bring it up (or a plot of the spectrum in the case of MS or IR spectrum).

The NIST Data Gateway is also useful for identifying what other resources are available for free at NIST - http://srdata.nist.gov/gateway/

Chemistry 245 – Goucher College

Assignment #1 – Spring 2015

1. List 2 areas of interest to you within the discipline of chemistry. You may choose from one of the traditional subdisciplines of chemistry (Analytical, Biochem, Inorganic, Organic, Physical), or from an interdisciplinary topic such as synthesis, medicinal chemistry, materials science, polymers, bioorganic, nanotechnology, energy, computational chemistry etc. It will really help me to know what people are interested in.

2. Go to the ACS journals Web site (pubs.acs.org). Choose 8 journals (there are currently 54 ACS journals), and provide the following information for each:

1. Scope of the journal – what kind of chemistry is published in that journal (in your own words)

2. What kinds of articles are published? (Communications, full papers, notes, review articles etc.)

3. How frequently is the journal published?

4. Total number of articles published in 2013

5. Total number of citations

6. Journal Impact Factor

You can find most of this information from the Journal’s home page, which can be accessed from the main ACS publications page. For some of this information, you may need to click on the “About the journal” tab that appears on the right below the journal title near the top of the page.

Chemistry 245 – Goucher College

Exercise – Searching for information on SciFinder.

You may begin working on this in class, and complete it for homework.

1. The Arthur C. Cope Award is awarded annually by the American Chemical Society “To recognize outstanding achievement in the field of organic chemistry, the significance of which has become apparent within the five years preceding the year in which the award will be considered.” The 2014 recipient was Stuart Schreiber, a chemical biologist at Harvard, the Broad Institute, and the Howard Hughes Medical Institute.

Use SciFinder to locate a list of the papers that Schreiber published in journals in 2014. Compile a list of all of the journals that Schreiber published papers in during 2014.

2. The following topics are considered hot topics in chemistry:

Cryogenic electron microscopy

Single-Walled Carbon Nanotubes

Graphene

Perovskite solar cells

Choose one of these topics. Use SciFinder to locate a recent (past year or two) review article that would be a good place to start learning more about the topic. Use SciFinder to find the name of a chemist who is a leading researcher in your chosen field.

3. Choose one of the following new drugs that were approved in 2014:

olaparib – a PARP inhibitor for the treatment of ovarian cancer

peramivir – a neuraminidase inhibitor for the treatment of influenza

ceritinib – an anaplastic lymphoma kinase inhibitor for the treatment of lung cancer

vorapaxar – a thrombin receptor antagonist given to patients at high risk for heart attack or stroke

Do a topic search in SciFinder, and answer the following questions:

a. How many total references are there?

b. What is the year of the oldest reference?

c. Provide the citation information for the reference with the greatest number of citations.

d. Name two investigators who have authored multiple papers about this drug.

e. Provide a reference for a paper in which you can find the preparation of this compound.

f. Provide a reference for a paper in which you can find the NMR data for this compound.

g. What is the calculated log P for this compound? (You will have to look under the Lipinski tab.)

4. Carry out a structure search on SciFinder to find a synthesis of the following molecule:

CHE 245 – Goucher College

Assignment – PubMed and PubChem

A. Go to the PubChem blog http://pubchemblog.ncbi.nlm.nih.gov/category/pubchem-explained/ and read the entries entitled “Ten Years of Service,” “Why Contribute your data to PubChem,” and “What is the difference between a substance and a compound in PubChem.” In this last article, follow the link to http://pubchem.ncbi.nlm.nih.gov/sources/ to explore how PubChem gets its data.

B. Answer the following questions:

1. Are there more substances or compounds in PubChem?

2. If you want to find out all of the information about a particular molecule, should you be looking for that molecule in the substance database or the compound database?

3. If Dr. Schultz and I independently published papers about the same compound, and submitted the data to PubChem, would there be two substance records?

4. Which source has deposited the greatest number of substances in PubChem?

5. What data source category deposits the greatest number of substances in PubChem? (You don’t need to count them up; just estimate.)

6. What is the only family of journals that deposits substance information in PubChem?

C. Use PubMed to find a paper describing an aromatase inhibitor that is available in full text on PMC. Your paper must include the chemical structure of the inhibitor. Download the pdf and e-mail it to me. Answer the following question:

What disease do aromatase inhibitors treat? Bonus: Try to find out the name of a researcher at the University of Maryland School of Medicine who discovered a popular aromatase inhibitor.

CHE 245 – Goucher College

Assignment – Structure drawing programs, SMILES, and InChI.

1. Draw a structure for the DNA base adenine like we did in class for caffeine. First the complete structure, then a structure in which the rings are broken for SMILES. Using your structure, come up with a SMILES string for adenine.

2. Come up with an InChI string for piperidine

CHE 245 - Goucher College

“Current Events”

We have spent the last few weeks studying how to search various databases for specific information. The other way that practicing chemists use the literature is to keep up on what is current in their field of interest.

We will spend a little bit of time engaging this type of learning. We will start by looking at the Table of Contents of a recent issue of JACS. Then we will choose a few articles (based on your interest), and learn how to read them so that you can figure out what the paper is about, and what the most important results are.

Assignment:

There are 16 Communications and 25 Articles in the March 18, 2015 issue of JACS. I am listing the starting page number for each one. Classify each Communication and Article into its appropriate area of chemistry. You may choose one of the traditional areas of chemistry (Analytical, Biochemistry, Inorganic, Organic, Physical, or Materials chemistry) or you may use interdisciplinary classifications like Biophysical, Bioinorganic, Physical Organic, Medicinal, Organometallic etc. Look at the abstracts if you find them helpful. If you have no idea, it is OK to leave a few of them blank. Then circle two papers that you would be interested in reading.

Communications

Starting Page    Classification        Starting Page    Classification
3446                                   3478
3450                                   3482
3454                                   3486
3458                                   3490
3462                                   3494
3466                                   3498
3470                                   3502
3474                                   3506

Articles

Starting Page    Classification        Starting Page    Classification
3510                                   3656
3520                                   3663
3525                                   3670
3533                                   3678
3540                                   3686
3547                                   3693
3558                                   3705
3565
3574
3585
3592
3600
3610
3616
3622
3631
3638
3649

Final Exam

I do not wish to make my final exam freely available to anyone who reads this paper. If you are an instructor of a course on chemical information literacy, and you wish to have a copy of the final exam, please send me an e-mail directly.

George Greco ([email protected])
