BI0409 Chemoinformatics Laboratory
BI0409 CHEMOINFORMATICS LAB MANUAL
Offered to III YEAR B.TECH BIOINFORMATICS
DEPARTMENT OF BIOINFORMATICS
SCHOOL OF BIOENGINEERING
SRM UNIVERSITY, KATTANKULATHUR

Sub. Code/Sub. Title: BI0409 / Chemoinformatics Laboratory
Sem/Year: VII/IV

Ex no: 1
Date:

Knowledge about Chemical Databases

Aim:
To study and analyze the information available from various chemical databases on the World Wide Web.

Description:
Chemical data are highly complex and interrelated. Vast amounts of chemical information need to be stored, organized, and indexed so that the information can be retrieved and used. This exercise covers five major chemical databases: PubChem, ZINC, ChemBank, PDBeChem, and ChemExper.

Procedure:
1. Open a web browser and type the web address of the required database.
2. Explore the database and analyze the information available in it.
3. Use the tools provided by the database.
4. Save the output into a separate folder.

1. PubChem
URL: http://pubchem.ncbi.nlm.nih.gov/

PubChem provides information on the biological activities of small molecules. It is a component of NIH's Molecular Libraries Roadmap Initiative. PubChem includes substance information, compound structures, and BioActivity data in three primary databases: Pcsubstance, Pccompound, and PCBioAssay, respectively.
• Pcsubstance contains more than 69 million records. The current count of substance records can be checked on the PubChem site.
• Pccompound contains more than 27 million unique structures. The current count of compound records can be checked on the PubChem site.
• PCBioAssay contains more than 434,000 BioAssays. Each BioAssay contains a varying number of data points. The current count of BioAssay records can be checked on the PubChem site.
The Substance/Compound database, where possible, provides links to BioAssay descriptions, literature, references, and assay data points. The BioAssay database also includes links back to the Substance/Compound database. PubChem is integrated with Entrez, NCBI's primary search engine, and also provides compound neighboring, sub/superstructure, similarity structure, BioActivity data, and other searching features. PubChem contains substance and BioAssay information from a multitude of depositors. The current PubChem data source status can be checked on the PubChem site.

PubChem Substance Database
The PubChem Substance database contains chemical structures, synonyms, registration IDs, descriptions, related URLs, database cross-reference links to PubMed, protein 3D structures, and biological screening results. If the contents of a chemical sample are known, the description includes links to PubChem Compound.

PubChem Compound Database
The PubChem Compound database contains validated chemical depiction information that is provided to describe substances in PubChem Substance. Users can perform a term/keyword search in the same manner as for the Substance database. In addition, the PubChem Compound database provides a chemical property search.

Examples:
• Molecular weight search: compounds with a molecular weight between 100 and 200. Enter 100:200[mw] or 100:200[molecular weight] in the Search textbox and press the Go button.
• XLogP search: compounds with an XLogP between 2.3 and 2.4. Enter 2.3:2.4[xlogp] in the Search textbox and press the Go button.
• Heavy atom count search: compounds containing 8 heavy atoms. Enter 8[heavyatomcount] in the Search textbox and press the Go button.
• Chemical property range search: substances that do not violate the "Lipinski Rule of 5".
In the Chemical Property Search section:
- For the Molecular Weight (MW) range, type 0 and 500 in the from and to text boxes, respectively.
- For the Hydrogen Bond Donor Count (HBD) range, type 0 and 5 in the from and to text boxes, respectively.
- For the Hydrogen Bond Acceptor Count (HBA) range, type 0 and 10 in the from and to text boxes, respectively.
- For the XLogP range, type -5 and 5 in the from and to text boxes, respectively.
- Press the Go button in the top Search bar.

BioActivity Analysis links for reported "Chemical Probes", "Active" compounds, and "Tested" compounds are shown with icons.

The home page of PubChem can be seen as follows:

Output:
[Figure: PubChem home page]

Query Examples:
• Molecule synonym search
Which substances have "methotrexate" as a part of their molecule name?
Enter methotrexate in the Search textbox on the PubChem homepage or the Entrez search page and press the Go button. This returns all substances that have methotrexate as a synonym or as any other keyword. Alternatively, enter methotrexate[synonym] in the Search textbox and press the Go button.
Note: the term in brackets "[]", such as "[synonym]", is an index field name or alias. For more information about index searches, see PubChem Indexes and Index Search.
Which substances have "3'-Azido-3'-deoxythymidine" as their molecule name?
Enter "3'-Azido-3'-deoxythymidine" (including the quotes) in the Search textbox and press the Go button.

2. ZINC
URL: http://zinc.docking.org/

ZINC is a free database of commercially available compounds for virtual screening. Virtual screening (VS) is a computational technique used in drug discovery research. It involves the rapid in-silico assessment of large libraries of chemical structures in order to identify those structures that are most likely to bind to a drug target, typically a protein receptor or enzyme.
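Compound libraries are often reduced with a property filter before virtual screening. As a minimal sketch, the range limits from the PubChem Chemical Property Search example (MW 0 to 500, HBD 0 to 5, HBA 0 to 10, XLogP -5 to 5) can be applied locally to a list of compound records. The record layout, the function names, and the two example records are hypothetical and only illustrate the idea; treating the bounds as inclusive is an assumption. The small `range_term` helper only formats the range syntax shown in the PubChem examples (such as 100:200[mw]); it does not contact PubChem.

```python
# Range limits copied from the manual's Chemical Property Search example.
# Treating both bounds as inclusive is an assumption.
LIPINSKI_RANGES = {
    "mw": (0, 500),     # molecular weight
    "hbd": (0, 5),      # hydrogen bond donor count
    "hba": (0, 10),     # hydrogen bond acceptor count
    "xlogp": (-5, 5),   # XLogP
}


def within_ranges(record, ranges=LIPINSKI_RANGES):
    """Return True if every listed property of the record lies inside its range."""
    return all(low <= record[key] <= high for key, (low, high) in ranges.items())


def range_term(low, high, field):
    """Format a range search term in the style shown in the manual, e.g. 100:200[mw]."""
    return f"{low}:{high}[{field}]"


# Hypothetical records with made-up property values, for illustration only.
compounds = [
    {"name": "example_1", "mw": 180.2, "hbd": 1, "hba": 4, "xlogp": 1.2},
    {"name": "example_2", "mw": 734.0, "hbd": 5, "hba": 14, "xlogp": 2.7},
]

passing = [c["name"] for c in compounds if within_ranges(c)]
print(passing)
print(range_term(100, 200, "mw"))
```

Here example_2 is rejected because its molecular weight and acceptor count fall outside the ranges, so only example_1 is kept.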
ZINC is provided by the Shoichet Laboratory in the Department of Pharmaceutical Chemistry at the University of California, San Francisco (UCSF).

The home page of ZINC can be viewed as follows:
[Figure: ZINC home page]

3. ChemBank
URL: http://chembank.broadinstitute.org/

ChemBank is a public, web-based informatics environment created by the Broad Institute's Chemical Biology Program and funded in large part by the National Cancer Institute's Initiative for Chemical Genetics (ICG). Currently, ChemBank stores information on hundreds of thousands of small molecules and hundreds of biomedically relevant assays that have been performed at the ICG in collaborations involving biomedical researchers worldwide.

With ChemBank we can:
1. Find Small Molecules
• by substructure: Search the compound collection by substructure (specified via a SMILES or SMARTS string, or drawn with the JME Molecular Editor).
• by similarity: Search the compound collection by similarity to a structure (specified via a SMILES string, or drawn with the JME Molecular Editor).
• using descriptors: Filter the compound collection using calculated molecular descriptor values.
• by assay: Find compounds scoring as "hits" in biological assays.
• by function: Find compounds with known biochemical interactions, therapeutic uses, or molecular functions.
• by chemist: Find compounds made by a particular chemist, or sold by a particular vendor.
• by molecule name: Find compounds with a particular name, or containing a part of a name.
• by user list

The home page of ChemBank can be viewed as follows:
[Figure: ChemBank home page]

4. PDBeChem
URL: http://www.ebi.ac.uk/msd-srv/msdchem/cgi-bin/cgi.pl

Introduction
The Chemical Component Dictionary service provides web access to the "Chemical Component Dictionary" of the wwPDB as it is loaded in the PDBe database at EBI.

How to search with PDBeChem
There is a wide range of possibilities for searching and exploring the dictionary.
• Code: The PDB 3-letter code for the ligand (e.g. ATP). You may also select the "like" operator with a wildcard expression ('*' means any characters and '.' means one character). For example, *TP will match most triphosphate ligands.
• Molecule name: An expression or word that is part of any known name of the molecule (standard name, common name, systematic name). The special character '*' matches a sequence of any characters and '.' matches any single character. Example: 'amino'
• Formula: An expression that sets range constraints on the number of atoms of each element. The value has the form [<E><n>-<m> ]* where <E> is an element, <n> is the minimum number, and <m> is the maximum number of times that the element may appear in the formula. The order in which the elements are given is not important. For example, to find ligands that have between 10 and 15 carbons, 3 nitrogens, and one oxygen, enter 'C10-15 N3 O1'. Other example:
- 'CL3 N0' finds molecules with exactly 3 chlorines and no nitrogens.
• Non-stereo SMILES: For structure-based searches. Clicking the edit button opens a form in which you can specify a molecule or a molecule segment using one of three options:
- Draw the molecule using the JME Molecular Editor.
- Upload a standard chemical file (Mol2, SDF, PDB, etc.) into the JME editor. Any file type and format accepted by the CACTVS system may be used.
- Give the standard code (e.g. ATP) of a ligand that already exists in the database so that it is loaded into the JME editor. After you load a ligand you may also modify it. For example, if you are looking for ligands similar to ATP, you may load ATP into the JME editor and then remove some atoms and bonds, keeping just the substructure you are interested in.
As soon as a molecule or molecule segment is specified, you may use it to search the dictionary using one of the following operators:
- contains: Find all the ligands whose graph contains the specified molecule as a subgraph. Please be patient, since this operation may take a few (2-4) minutes in the worst case.
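
To make the Code wildcard and Formula range syntax described above more concrete, the following is a minimal local sketch of how such expressions could be interpreted. It is not PDBeChem's implementation. The function names are hypothetical, and several behaviours are assumptions: ranges are inclusive, element symbols are case-insensitive (the manual writes 'CL3'), elements not named in the expression are unconstrained, and '*' may also match zero characters.

```python
import re

# One constraint token: element symbol, minimum count, optional "-maximum".
TOKEN = re.compile(r"^([A-Za-z]{1,2})(\d+)(?:-(\d+))?$")


def parse_constraints(expression):
    """Parse an expression such as 'C10-15 N3 O1' into {element: (min, max)}."""
    constraints = {}
    for token in expression.split():
        match = TOKEN.match(token)
        if match is None:
            raise ValueError(f"cannot parse constraint {token!r}")
        element = match.group(1).capitalize()  # 'CL' becomes 'Cl'
        low = int(match.group(2))
        # A single number such as 'N3' means exactly that many atoms.
        high = int(match.group(3)) if match.group(3) else low
        if high < low:
            raise ValueError(f"maximum below minimum in {token!r}")
        constraints[element] = (low, high)
    return constraints


def matches(atom_counts, constraints):
    """True if every constrained element count lies in its inclusive range.

    Elements missing from atom_counts are counted as zero.
    """
    return all(
        low <= atom_counts.get(element, 0) <= high
        for element, (low, high) in constraints.items()
    )


def code_pattern(wildcard):
    """Translate a 'like' expression ('*' any characters, '.' one character) to a regex."""
    parts = []
    for ch in wildcard:
        if ch == "*":
            parts.append(".*")
        elif ch == ".":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE)


# Hypothetical atom counts, for illustration only.
ligand = {"C": 12, "H": 20, "N": 3, "O": 1}
print(matches(ligand, parse_constraints("C10-15 N3 O1")))
print(bool(code_pattern("*TP").fullmatch("ATP")))
```

With these assumptions, the hypothetical ligand satisfies 'C10-15 N3 O1', and the pattern *TP matches the code ATP but not ADP.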