Computing and Chemistry

The Chemical Markup Language

YUH-MEI LIAO AND HAMID GHANADAN

©2002 AMERICAN CHEMICAL SOCIETY OCTOBER 2002 TODAY’S CHEMIST AT WORK 17

CML is powerful, but its impact will depend on how many people use it.

Imagine that you are searching the Internet for publications that include the compound 1-bromobutane. You go to your favorite search engine and type “1-bromobutane”. Hopefully, your results will contain webpages with that exact phrase. Your results, however, will not include any documents that only use 1-bromobutane’s other chemical references, such as n-butyl bromide or CH3(CH2)3Br. Your results are disappointing because your computer has no way of discerning chemical information. To your Internet browser, the letters “Br” in the molecular structure CH3(CH2)3Br are no different than the same letters in the word “Bravo”. This is a serious problem in hypertext markup language (HTML), the current standard language for Internet communication (1).

HTML is a language that describes how information should be displayed within a browser on your screen. To accomplish this task, HTML uses a set of commands called tags, which effectively “mark up” the information for formatting and display. For example, the <b>Br</b> tags tell the browser that the letters Br should be displayed in bold. They do not tell the browser that Br is the chemical symbol for bromine. A better way to publish this information would be to use a language that can discern the difference between “CH3(CH2)3Br” and “Bravo”: a language specific to chemistry. Instead of using display tags such as <b>, the language would use tags, such as <molecule>, that describe the information itself (1). Chemical markup language (CML) is designed to do just that.

The Nature of CML

CML, an offshoot of the extensible markup language (XML), describes the chemical nature of information. According to the online dictionary Webopedia (www.webopedia.com), XML is a universally accepted markup language that allows the creation of unique tags for data identification and transmission. HTML, on the other hand, merely formats information on the screen (1). With CML, Web documents can be enriched with molecular information about atoms, bonds, computed properties, spectral data, units, or structures (2). The data within the CML document can be searched, archived, and viewed without losing information (1).

To better understand the “loss-less” nature of CML, imagine that your search turns up an HTML webpage that contains information about 1-bromobutane. Included in the page is the compound’s GC/MS spectrum. Since HTML has no means of storing the raw spectral data, you are provided a low-resolution image of the spectrum (1). Because you have no real access to the spectral information (only what your eye can discern), the spectrum in the HTML document is classified as “lossy”. On the other hand, CML is capable of holding the raw data and producing the same spectrum on the basis of the numbers (3). With the real data at your disposal, you can manipulate the spectrum to optimize the information density, view it at higher resolutions, print it, or even add it to your personal database. CML accomplishes this by using tags to describe and preserve the integrity of chemical information, making it a loss-less form of chemical information transfer (2). The same concept can be applied to chemical structures, annotated chemical reactions, and other properties.

Tools of the CML trade

Peter Murray-Rust, Henry Rzepa, and Christopher Leach first presented CML at the 1995 ACS meeting in Chicago (4). Real progress came when CML adopted XML as its base in 1997. Since then, more extensive development tools have been created. In 1999, for example, the first formal set of CML specifications was published (5). Software programs to interpret CML are now available, and more are on the way (2).

Just as HTML is a language that needs interpretation, CML also needs the correct support. Most computers today come equipped with all the tools you need to use HTML: browsers that can display a page (viewers), programs that allow you to convert file types for publishing on the Web (converters), and applications that let you create and modify your own webpages (editors) (2). CML requires the same three types of software.

Viewers, which are available as plug-ins for Web browsers, are necessary to correctly display the CML information. The JUMBO 3.0 viewer was originally designed to demonstrate the power and usefulness of CML and is available as a free download. Other current plug-ins are JMVS, JME, JchemPaint, and CrocZilla (2). Each offers slightly different features. For example, the most recent version of JME is the only one at the moment to operate seamlessly as an applet within Web browsers.

Converters are also essential for publishing CML documents. They allow you to easily output a CML file from your chemistry-related software programs. Converters are usually not stand-alone applications; but if your favorite software has a CML converter, then creating a CML document is as easy as clicking a “Save as CML” button (2). Most often, the software manufacturer decides whether to develop CML converters, and this decision is based on demand. At the moment, manufacturers of analytical instruments do not offer CML converters. However, file converters are available for the following formats: MDL Molfile, SYBYL MOL2, SMILES, PDB (mainly molecular information), CIF (small molecules), mmCIF (macromolecules), MIF (a CIF-like format), JME, XYZ, MOPAC, Gaussian, GAMESS, CASTEP, VAMP, and all JCAMP formats (2).

Lastly, you may want a CML editor to create and manipulate complex CML documents. Although CML editors are not necessary, powerful editors minimize the need for programming, allowing you to concentrate on the content of the CML document, not the code. Current CML editors include JME, JchemPaint, and XACE (2). With these tools (viewers, converters, and editors), your computer will be better equipped to view and create CML documents.

Applications of CML

Although CML is gaining support within the chemical community, it is difficult to assess its popularity. Because CML can be used to communicate many different aspects of chemical information, applications such as publishing, terminology, regulatory processes, and molecular databases may benefit from it (2). According to Murray-Rust and Rzepa, the University of California–San Diego has adopted CML as the chemical technology for its new information and computing grid portals and for the Protein Data Bank (5). For a list of CML examples, visit the CML website (www.xml-cml.org).

Once the tools for creating and viewing CML documents become more mainstream, CML may be poised to revolutionize the nature of chemical communication.
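The converter idea described under Tools of the CML trade can be illustrated with a rough Python sketch. It reads a molecule in the simple XYZ text format and writes a CML-style XML fragment. Everything in it is an assumption made for illustration: the function name xyz_to_cml is hypothetical, the element and attribute names (molecule, atomArray, atom, elementType, x3, y3, z3) are modeled on CML conventions and should be checked against the CML specification, and the hydrogen bromide coordinates are illustrative values, not measured data.

```python
import xml.etree.ElementTree as ET


def xyz_to_cml(xyz_text, molecule_id="m1"):
    """Convert XYZ-format text into a CML-style XML string (illustrative only)."""
    lines = xyz_text.strip().splitlines()
    atom_count = int(lines[0])   # line 1 of an XYZ file: number of atoms
    title = lines[1].strip()     # line 2: free-text comment

    # Tag and attribute names are assumptions modeled on CML conventions.
    molecule = ET.Element("molecule", {"id": molecule_id, "title": title})
    atom_array = ET.SubElement(molecule, "atomArray")
    for index, line in enumerate(lines[2:2 + atom_count], start=1):
        symbol, x, y, z = line.split()
        # Coordinates are kept as the original text so no digits are lost.
        ET.SubElement(
            atom_array,
            "atom",
            {"id": f"a{index}", "elementType": symbol, "x3": x, "y3": y, "z3": z},
        )
    return ET.tostring(molecule, encoding="unicode")


# Illustrative input: two atoms with made-up coordinates.
XYZ_EXAMPLE = """2
hydrogen bromide (illustrative coordinates)
H 0.000 0.000 0.000
Br 0.000 0.000 1.414
"""

cml_text = xyz_to_cml(XYZ_EXAMPLE)
atoms = ET.fromstring(cml_text).findall("./atomArray/atom")
print(cml_text)
print([atom.get("elementType") for atom in atoms])
```

Because the coordinates are stored as text and not rendered to an image, a program that reads the fragment gets back the same values that were written, which is the loss-less property the article describes.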
Imagine conducting your Internet search for 1-bromobutane again in a CML world. Your results would include any document with any instance of this compound, regardless of how it was named or portrayed. Instead of printing this document, you could index and store the information in your personal molecular database. You could use any attribute of the information in your research. The data that you collect from your instruments could be automatically stored as CML and mathematically compared with the data from your search. The same search robot could find all of the suppliers that sell the compounds you need to carry out your experiments and place an order with the vendor who offers the lowest price or the highest purity. You could easily publish loss-less results and know that other scientists would be able to find them. In short, your computer could become a complete digital laboratory notebook, which you could seamlessly share with the world (2).

Just how far off the CML dream is from reality depends mainly on how well it is embraced by chemists, publishers of chemical information, and manufacturers of chemistry-related software and instrumentation. After all, how useful would the Internet be if there were only 40 published webpages, and you were limited to searching them using a text-only Web browser?
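The difference between searching text and searching chemical content can be shown with a small Python sketch. It builds three CML-style documents and compares a plain text search for “1-bromobutane” with a search on the molecular formula computed from the tagged atoms. The helper names (make_document, hill_formula), the tag and attribute names, and the sample documents are all hypothetical and chosen only for illustration. Matching on formula is a crude stand-in for real structure matching: isomers such as 2-bromobutane share the same formula, so a practical search would compare connectivity as well.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def make_document(title, element_counts):
    """Build a small CML-style document string (hypothetical helper)."""
    molecule = ET.Element("molecule", {"title": title})
    atom_array = ET.SubElement(molecule, "atomArray")
    index = 1
    for symbol, count in element_counts.items():
        for _ in range(count):
            ET.SubElement(atom_array, "atom",
                          {"id": f"a{index}", "elementType": symbol})
            index += 1
    return ET.tostring(molecule, encoding="unicode")


def hill_formula(document):
    """Return the formula in Hill order (C, then H, then the rest A-Z)."""
    root = ET.fromstring(document)
    counts = Counter(atom.get("elementType") for atom in root.iter("atom"))
    if "C" in counts:
        rest = sorted(s for s in counts if s not in ("C", "H"))
        order = ["C"] + (["H"] if "H" in counts else []) + rest
    else:
        order = sorted(counts)
    return "".join(f"{s}{counts[s] if counts[s] > 1 else ''}" for s in order)


# Two documents describe the same compound under different names.
documents = [
    make_document("1-bromobutane", {"C": 4, "H": 9, "Br": 1}),
    make_document("n-butyl bromide", {"C": 4, "H": 9, "Br": 1}),
    make_document("ethanol", {"C": 2, "H": 6, "O": 1}),
]

# A text search only finds the document that uses the exact phrase.
text_hits = [doc for doc in documents if "1-bromobutane" in doc]

# A search on the tagged atoms finds both names for the compound.
formula_hits = [doc for doc in documents if hill_formula(doc) == "C4H9Br"]

print(len(text_hits), len(formula_hits))
```

The text search misses the document titled “n-butyl bromide”, while the formula search finds it because the atoms are described by tags and not only displayed as characters.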

References

(1) Murray-Rust, P.; Rzepa, H. S.; Wright, M. New J. Chem. 2001, 25, 618–634.
(2) Murray-Rust, P.; Rzepa, H. S. CML Frequently Asked Questions. Chemical Markup Language, 2001; www.xml-cml.org/faq/index.html.
(3) Murray-Rust, P.; Rzepa, H. S.; Wright, M.; Zara, S. Chem. Commun. 2000, 16, 1471–1472.
(4) Murray-Rust, P.; Rzepa, H. S.; Leach, C. CML–Chemical Markup Language, 2000; http://origin.ch.ic.ac.uk/cml.
(5) Murray-Rust, P.; Rzepa, H. S. Chemical Markup Language. A Position Paper. Chemical Markup Language, 2001; www.xml-cml.org/information/position..

This article is reprinted from Analytical Chemistry, July 1, 2002, p 389A–390A.

Yuh-Mei Liao and Hamid Ghanadan are multimedia producers for gh multimedia, Inc., specializing in scientific communication. Send your comments or questions regarding this article to the Editorial Office address on page 6.

18 TODAY’S CHEMIST AT WORK OCTOBER 2002 www.tcawonline.org
