![Open Chemistry: Restful Web Apis, JSON, Nwchem and the Modern Web Application Marcus D](https://data.docslib.org/img/3a60ab92a6e30910dab9bd827208bcff-1.webp)
Hanwell et al. J Cheminform (2017) 9:55 DOI 10.1186/s13321-017-0241-z RESEARCH ARTICLE Open Access Open chemistry: RESTful web APIs, JSON, NWChem and the modern web application Marcus D. Hanwell1*† , Wibe A. de Jong2† and Christopher J. Harris1 Abstract An end-to-end platform for chemical science research has been developed that integrates data from computational and experimental approaches through a modern web-based interface. The platform ofers an interactive visualization and analytics environment that functions well on mobile, laptop and desktop devices. It ofers pragmatic solutions to ensure that large and complex data sets are more accessible. Existing desktop applications/frameworks were extended to integrate with high-performance computing resources, and ofer command-line tools to automate interaction—connecting distributed teams to this software platform on their own terms. The platform was developed openly, and all source code hosted on the GitHub platform with automated deployment possible using Ansible cou- pled with standard Ubuntu-based machine images deployed to cloud machines. The platform is designed to enable teams to reap the benefts of the connected web—going beyond what conventional search and analytics platforms ofer in this area. It also has the goal of ofering federated instances, that can be customized to the sites/research per- formed. Data gets stored using JSON, extending upon previous approaches using XML, building structures that sup- port computational chemistry calculations. These structures were developed to make it easy to process data across diferent languages, and send data to a JavaScript-based web client. Keywords: Chemistry, Web, Data, Semantic, NWChem, JSON Background the Clean Energy Project [8] (over 2.3 million calculated Te in-silico determination of chemical and materi- structures). als properties is a vital capability that drives innovation Driven by the U.S. Materials Genome Initiative the across many market sectors. Its importance is refected in development of new and novel materials has become a the number of codes that can perform simulations over multidisciplinary research endeavor where complex sim- a broad range of levels of theory and length scales [1–5], ulation and experimental data get integrated, and analyt- and the enormous investments in experimental facilities ics such as machine learning techniques are utilized to that can also produce large data that is often difcult or aid in scientifc discovery. Tis same multidisciplinary impossible to reproduce. However, it is all too common approach is becoming essential in chemical and bio- for experimental and computational studies to take place logical research and development, in the design of new independently, with large scale studies often involving chemicals, biomolecules and drugs, or new energy ef- heroic eforts developing one-of software projects dedi- cient chemical production processes. cated to the specifc resource, such as the Protein Data Tere is a strong need to develop a collaborative scien- Bank [6] (over 100,000 experimental structures), Mate- tifc research software platform that enables researchers rials Project [7] (over 33,000 simulated materials), and to defne concepts and hypotheses, add them, and ana- lyze integrated sets of experimental and computational data to ofer efective knowledge discovery more univer- sally. Tis must go well beyond a web portal to create an *Correspondence: [email protected] interactive platform integrating simulation, experimental †Marcus D. Hanwell and Wibe A. de Jong contributed equally to this work data, and analytics, while leveraging semantic web tech- 1 Kitware, Inc., 28 Corporate Drive, Clifton Park, NY 12065, USA Full list of author information is available at the end of the article nologies to support the federated storage of data across © The Author(s) 2017. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/ publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. Hanwell et al. J Cheminform (2017) 9:55 Page 2 of 10 geographically dispersed sites. Te use of language inde- with translation facilities using existing open source tools pendent programming interfaces that are consumed to existing fle formats, along with integrated visualiza- by web, desktop, and command-line clients will enable tion/analysis capabilities in the web browser. researchers to make much better use of their data. Te Te project can be used directly, extended for related platform described aims to provide an open source ser- use cases, or built upon in future work to ofer a more vice that can satisfy this basic need, and also ofer sim- comprehensive semantically enriched platform for chem- ple methods of extension to accommodate new areas of ical data. Most existing platforms use approaches that research. mix data with web page generation on the server, rather Te web has evolved signifcantly in the last two decades, than embracing modern client-server approaches using and new developments should be explored to assess what the latest web standards and frameworks. Te source can be achieved in a platform that seeks to use the latest code is often not available, and the platforms are curated open source tools, technologies, standards, and approaches centrally—such as the PDB, Materials Project and Clean to deliver an end-to-end platform for chemical/materi- Energy Project mentioned earlier. als research. Te basic approach employed was to develop a server component written in the Python language that Methods exposes RESTful (representational state transfer) endpoints An open source prototype web platform was developed to interact with the data on the server. Te Python code can to demonstrate key capabilities in addressing the needs use existing core functionality, Python modules, and wrapped outlined in the introduction, and summarized in Fig. 1. libraries developed in other programming languages. Te application has a number of components devel- Ideally no HTML, images, etc would be generated oped in several languages following modern develop- on the server, the server acts as a data server primarily ment methodologies. It was intentionally developed through RESTful endpoints that accept/return data, along using some of the latest technology innovations, which with user authentication (required by some endpoints). It means that it requires a modern web browser that sup- can trigger calculations, perform analyses, and batch jobs ports WebGL [9] in order to render 3D geometry, and it in order to make the data discoverable. Te server can makes extensive use of HTML5 [10]. It is clear that not then be consumed by a rich HTML5 web client, more tra- all devices/web browsers have full support for these tech- ditional desktop clients, and from command-line clients nologies at this time, but that this support is already sub- or other servers to perform automated workfows. stantial and will grow in the coming years. In this work a rich HTML5 web interface was developed Te server-side components were written in Python [11], that made use of the server’s API (application program- with wrapped C++ code [12] providing access to a num- ming interface) briefy discussed above. It consumes data ber of existing chemistry/materials libraries such as Avoga- from the server, uploading/editing data, and maintain- dro [13] and Open Babel [14]. Te basis for the server-side ing local state in a one-page application reusing a popular project was the Girder project [15], which reuses a number open source HTML5 framework. Te application inte- of Python modules such as CherryPy [16] to provide a mini- grated other open source frameworks for client side chart- malist Python web framework, Swagger [17] to document the ing, and 3D rendering/visualization of molecular structure. programming interfaces, MongoDB [18] to store data/meta- Te web application was developed using open source data, and Virtuoso [19] to store triples. Te chemistry spe- tools, a number of frameworks, and was “built” as a static cifc functionality was developed as additional programming bundle of HTML5 web assets that are downloaded by the interfaces using the Girder project’s plugin mechanism to add web client. It dynamically constructs the page in response additional endpoints extending upon existing functionality. to user interaction, server data, and other client events Te client-side components were developed in HTML5, to provide a rich, interactive experience. Tis means using open source web frameworks/technologies such as that many interactions take place entirely on the client, AngularJS 1.6 [20] and Material Design [21] to provide a requiring no interaction with or access to the server. Tis single page web application. 3DMol.js [22] was used to ren- ofers interactive data visualization and analysis, even on der molecular geometry in 3D, D3 [23] to render charts, relatively low bandwidth links, once initial data for a mol- and responsive design elements to accommodate devices of ecule or calculation has been downloaded. various sizes/aspect rations. Te capabilities developed have
Details
-
File Typepdf
-
Upload Time-
-
Content LanguagesEnglish
-
Upload UserAnonymous/Not logged-in
-
File Pages10 Page
-
File Size-