Computing and Chemistry
The Chemical Markup Language
YUH-MEI LIAO AND HAMID GHANADAN
CML is powerful, but its impact will depend on how many people use it.

Imagine that you are searching the Internet for publications that include the compound 1-bromobutane. You go to your favorite search engine and type “1-bromobutane”. Hopefully, your results will contain webpages with that exact phrase. Your results, however, will not include any documents that only use 1-bromobutane’s other chemical references, such as n-butyl bromide or CH3(CH2)3Br. Your results are disappointing because your computer has no way of discerning chemical information. To your Internet browser, the letters “Br” in the molecular structure CH3(CH2)3Br are no different from the same letters in the word “Bravo”. This is a serious problem in hypertext markup language (HTML), the current standard language for Internet communication (1).

HTML is a language that describes how information should be displayed within a browser on your screen. To accomplish this task, HTML uses a set of commands called tags, which effectively “mark up” the information for formatting and display. For example, the […] markup language that allows the creation of unique tags for data identification and transmission. HTML, on the other hand, merely formats information on the screen (1). With CML, Web documents can be enriched with molecular information about atoms, bonds, computed properties, spec[…] disposal, you can manipulate the spectrum to optimize the information density, view it at higher resolutions, print it, or even add it to your personal database. CML accomplishes this by using tags to describe and preserve the integrity of chemical information, making it a loss-less form of chemical information transfer (2). The same concept can be applied to chemical structures, annotated chemical reactions, and other properties.

Tools of the CML trade

Peter Murray-Rust, Henry Rzepa, and Christopher Leach first presented CML at the 1995 ACS meeting in Chicago (4). Real progress came when CML adopted XML as its base in 1997. Since then, more extensive development tools have been created. In 1999, for example, the first formal set of CML specifications was published (5). Software programs to interpret CML are now available, and more are on the way (2).

Just as HTML is a language that needs interpretation, CML also needs the correct support. Most computers today come equipped with all the tools you need to use HTML: browsers that can display a page (viewers), programs that allow you to convert file […]
©2002 AMERICAN CHEMICAL SOCIETY OCTOBER 2002 TODAY’S CHEMIST AT WORK 17

related software programs. Converters are usually not stand-alone applications, but if your favorite software has a CML converter, then creating a CML document is as easy as clicking a “Save as CML” button (2). Most often, the software manufacturer decides whether to develop CML converters, and this decision is based on demand. At the moment, manufacturers of analytical instruments do not offer CML converters. However, file converters are available for the following formats: MDL Molfile, SYBYL MOL2, SMILES, PDB (mainly molecular information), CIF (small molecules), mmCIF (macromolecules), MIF (a CIF-like format), JME, XYZ, MOPAC, Gaussian, GAMESS, CASTEP, VAMP, and all JCAMP formats (2).

Lastly, you may want a CML editor to create and manipulate complex CML documents. Although CML editors are not necessary, powerful editors minimize the need for programming, allowing you to concentrate on the content of the CML document, not the code. Current CML editors include JME, JChemPaint, and XACE (2). With these tools (viewers, converters, and editors), your computer will be better equipped to view and create CML documents.

Applications of CML

Although CML is gaining support within the chemical community, it is difficult to assess its popularity. Because CML can be used to communicate many different aspects of chemical information, applications such as publishing, terminology, regulatory processes, and molecular databases may benefit from it (2). According to Murray-Rust and Rzepa, the University of California–San Diego has adopted CML as the chemical technology for its new information and computing grid portals and for the Protein Data Bank (5). For a list of CML examples, visit the CML website (www.xml-cml.org).

Once the tools for creating and viewing CML documents become more mainstream, CML may be poised to revolutionize the nature of chemical communication.
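Converters such as the XYZ one listed above take an existing file format and re-express it as tagged chemical data. As a rough illustration of that step only, here is a minimal Python sketch that turns XYZ text into a CML-style molecule element. The function name xyz_to_cml, its molecule_id parameter, and the water coordinates are hypothetical; the element and attribute names (molecule, atomArray, atom, elementType, x3, y3, z3) are assumed to follow later CML schema conventions, no CML namespace is added, and the output is not validated against any CML schema. Real converters handle far more (bonds, charges, units).

```python
import xml.etree.ElementTree as ET


def xyz_to_cml(xyz_text, molecule_id="m1"):
    """Convert XYZ-format text into a CML-style <molecule> string.

    Hypothetical helper: element/attribute names mimic CML conventions,
    but no namespace is added and the result is not schema-validated.
    """
    lines = xyz_text.strip("\n").splitlines()
    n_atoms = int(lines[0].split()[0])                 # line 1: number of atoms
    title = lines[1].strip() if len(lines) > 1 else ""  # line 2: free-text comment
    atom_lines = lines[2:2 + n_atoms]                   # then one "symbol x y z" line per atom
    if len(atom_lines) != n_atoms:
        raise ValueError(f"expected {n_atoms} atom lines, found {len(atom_lines)}")

    molecule = ET.Element("molecule", id=molecule_id)
    if title:
        molecule.set("title", title)
    atom_array = ET.SubElement(molecule, "atomArray")
    for i, line in enumerate(atom_lines, start=1):
        symbol, x, y, z = line.split()[:4]
        for value in (x, y, z):
            float(value)  # reject non-numeric coordinates early (raises ValueError)
        # Coordinates are kept as the original strings so no precision is altered.
        ET.SubElement(atom_array, "atom", id=f"a{i}", elementType=symbol,
                      x3=x, y3=y, z3=z)
    return ET.tostring(molecule, encoding="unicode")


# Illustrative input only; the coordinates are approximate, not measured data.
WATER_XYZ = """3
water (illustrative coordinates)
O  0.000  0.000  0.117
H  0.000  0.757 -0.467
H  0.000 -0.757 -0.467
"""

if __name__ == "__main__":
    print(xyz_to_cml(WATER_XYZ))
```

Keeping each atom as its own tagged element, with the element symbol in an attribute rather than in free text, is what lets software later tell bromine from the letters “Br” in an ordinary word.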
Imagine conducting your Internet search for 1-bromobutane again in a CML world. Your results would include any document with any instance of this compound, regardless of how it was named or portrayed. Instead of printing each document you find, you could index and store the information in your personal molecular database. You could use any attribute of the information in your research. The data that you collect from your instruments could be automatically stored as CML and mathematically compared with the data from your search. The same search robot could find all of the suppliers that sell the compounds you need to carry out your experiments and place an order with the vendor who offers the lowest price or the highest purity. You could easily publish loss-less results and know that other scientists would be able to find them. In short, your computer could become a complete digital laboratory notebook, which you could seamlessly share with the world (2).

How far the CML dream is from reality depends mainly on how well CML is embraced by chemists, publishers of chemical information, and manufacturers of chemistry-related software and instrumentation. After all, how useful would the Internet be if there were only 40 published webpages and you were limited to searching them with a text-only Web browser?
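The name-independent search imagined above can be sketched in Python, assuming every document carries a CML-style molecule element listing its atoms and bonds. Everything here is a hypothetical illustration rather than part of CML: the toy documents, the helper names (make_cml, structure_key, naive_text_search, find_compound), and especially the fingerprint, which labels each bond by the element and heavy-atom degree at each end. That fingerprint separates 1-bromobutane from its isomer 2-bromobutane in this example, but it is not a canonical structure identifier and could give false matches in general. Hydrogens are omitted.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def make_cml(title, elements, bond_pairs):
    """Build a minimal CML-style <molecule> string (hydrogens omitted, single bonds only)."""
    atoms = "".join(f'<atom id="a{i}" elementType="{el}"/>'
                    for i, el in enumerate(elements, start=1))
    bonds = "".join(f'<bond atomRefs2="a{i} a{j}" order="1"/>' for i, j in bond_pairs)
    return (f'<molecule title="{title}"><atomArray>{atoms}</atomArray>'
            f'<bondArray>{bonds}</bondArray></molecule>')


def structure_key(cml_text):
    """Hypothetical fingerprint: sorted bonds labelled by (element, heavy-atom degree) at each end.

    Ignores the compound's name and the atom numbering. NOT a canonical identifier:
    distinct structures can share a key, so a real system would need a proper graph comparison.
    """
    root = ET.fromstring(cml_text)
    element = {a.get("id"): a.get("elementType") for a in root.iter("atom")}
    bonds = [(*b.get("atomRefs2").split(), b.get("order", "1")) for b in root.iter("bond")]
    degree = Counter()
    for a1, a2, _ in bonds:
        degree[a1] += 1
        degree[a2] += 1
    labelled = [(tuple(sorted([(element[a1], degree[a1]), (element[a2], degree[a2])])), order)
                for a1, a2, order in bonds]
    return tuple(sorted(labelled))


# Three toy "documents": two name the same compound differently, one is an isomer.
DOCS = {
    "doc1": make_cml("1-bromobutane", ["Br", "C", "C", "C", "C"], [(1, 2), (2, 3), (3, 4), (4, 5)]),
    "doc2": make_cml("n-butyl bromide", ["C", "C", "C", "C", "Br"], [(1, 2), (2, 3), (3, 4), (4, 5)]),
    "doc3": make_cml("2-bromobutane", ["C", "C", "C", "C", "Br"], [(1, 2), (2, 3), (3, 4), (2, 5)]),
}


def naive_text_search(term, documents):
    """Plain substring search, like a text-only search engine."""
    return sorted(name for name, text in documents.items() if term in text)


def find_compound(query_cml, documents):
    """Return documents whose molecule has the same structure key as the query."""
    key = structure_key(query_cml)
    return sorted(name for name, cml in documents.items() if structure_key(cml) == key)


if __name__ == "__main__":
    print(naive_text_search("1-bromobutane", DOCS))  # ['doc1']
    print(find_compound(DOCS["doc1"], DOCS))         # ['doc1', 'doc2']
```

The text search misses the document that calls the compound n-butyl bromide, while the structure-based search finds it and still rejects the 2-bromobutane isomer.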
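The article does not say how instrument data would be “mathematically compared” with retrieved data. One common choice, assumed here, is cosine similarity between two spectra sampled on the same grid of x-axis values. The function names and the five-point intensities below are hypothetical toy values, not real spectra.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two intensity vectors sampled on the same x-axis grid."""
    if len(a) != len(b):
        raise ValueError("spectra must share the same sampling grid")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero spectrum carries no shape to compare
    return dot / (norm_a * norm_b)


def best_match(measured, library):
    """Return (name, score) of the library spectrum most similar to `measured`."""
    scores = {name: cosine_similarity(measured, ref) for name, ref in library.items()}
    name = max(scores, key=scores.get)
    return name, scores[name]


# Toy intensities on a shared five-point grid (made-up values, for illustration only).
MEASURED = [0.0, 0.2, 1.0, 0.3, 0.0]
LIBRARY = {
    "compound A": [0.0, 0.1, 0.9, 0.4, 0.0],
    "compound B": [1.0, 0.0, 0.0, 0.0, 0.5],
}

if __name__ == "__main__":
    print(best_match(MEASURED, LIBRARY))
```

Cosine similarity ignores overall intensity scaling, which makes it a reasonable first sketch; real comparisons would also need the spectra resampled onto a common axis before scoring.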
References

(1) Murray-Rust, P.; Rzepa, H. S.; Wright, M. New J. Chem. 2001, 25, 618–634.
(2) Murray-Rust, P.; Rzepa, H. S. CML Frequently Asked Questions. Chemical Markup Language, 2001; www.xml-cml.org/faq/index.html.
(3) Murray-Rust, P.; Rzepa, H. S.; Wright, M.; Zara, S. Chem. Commun. 2000, 16, 1471–1472.
(4) Murray-Rust, P.; Rzepa, H. S.; Leach, C. CML–Chemical Markup Language, 2000; http://origin.ch.ic.ac.uk/cml.
(5) Murray-Rust, P.; Rzepa, H. S. Chemical Markup Language. A Position Paper. Chemical Markup Language, 2001; www.xml-cml.org/information/position.html.

This article is reprinted from Analytical Chemistry, July 1, 2002, p 389A–390A.

Yuh-Mei Liao and Hamid Ghanadan are multimedia producers for gh multimedia, Inc., specializing in scientific communication. Send your comments or questions regarding this article to [email protected] or to the Editorial Office address on page 6.

18 TODAY’S CHEMIST AT WORK OCTOBER 2002 www.tcawonline.org