Umansysprop V1.0: an Online and Open-Source Facility for Molecular Property Prediction and Atmospheric Aerosol Calculations

UManSysProp V1.0: An Online and Open-Source Facility for Molecular Property Prediction and Atmospheric Aerosol Calculations Topping, David and Barley, Mark and Bane, Michael and Higham, Nicholas J. and Aumont, Berbard and Dingle, Nicholas and McFiggans, Gordon 2016 MIMS EPrint: 2016.13 Manchester Institute for Mathematical Sciences School of Mathematics The University of Manchester Reports available from: http://eprints.maths.manchester.ac.uk/ And by contacting: The MIMS Secretary School of Mathematics The University of Manchester Manchester, M13 9PL, UK ISSN 1749-9097 Geosci. Model Dev., 9, 899–914, 2016 www.geosci-model-dev.net/9/899/2016/ doi:10.5194/gmd-9-899-2016 © Author(s) 2016. CC Attribution 3.0 License. UManSysProp v1.0: an online and open-source facility for molecular property prediction and atmospheric aerosol calculations David Topping1,2, Mark Barley2, Michael K. Bane3, Nicholas Higham4, Bernard Aumont5, Nicholas Dingle6, and Gordon McFiggans2 1National Centre for Atmospheric Science, Manchester, M13 9PL, UK 2Centre for Atmospheric Science, University of Manchester, Manchester, M13 9PL, UK 3High End Compute, Manchester, M13 9PL, UK 4School of Mathematics, University of Manchester, Manchester, M13 9PL, UK 5LISA, UMR CNRS 7583, Universite Paris Est Creteil et Universite Paris Diderot, Creteil, France 6Numerical Algorithms Group (NAG), Ltd Peter House Oxford Street, Manchester, M1 5AN, UK Correspondence to: David Topping ([email protected]) Received: 1 September 2015 – Published in Geosci. Model Dev. Discuss.: 3 November 2015 Revised: 3 February 2016 – Accepted: 16 February 2016 – Published: 1 March 2016 Abstract. In this paper we describe the development and opment of a user community. We have released this via a application of a new web-based facility, UManSysProp Github repository (doi:10.5281/zenodo.45143). In this paper (http://umansysprop.seaes.manchester.ac.uk), for automating we demonstrate its use with specific examples that can be predictions of molecular and atmospheric aerosol proper- simulated using the web-browser interface. ties. Current facilities include pure component vapour pressures, critical properties, and sub-cooled densities of organic molecules; activity coefficient predictions for mixed inorganic–organic liquid systems; hygroscopic growth fac- 1 Introduction tors and CCN (cloud condensation nuclei) activation poten- The many thousands of individual aerosol components en- tial of mixed inorganic–organic aerosol particles; and ab- sure that explicit manual calculation of properties that in- sorptive partitioning calculations with/without a treatment fluence their environmental impacts is laborious and time- of non-ideality. The aim of this new facility is to provide a consuming. The emergence of explicit automatic mechanism single point of reference for all properties relevant to atmo- generation techniques (Aumont et al., 2005; Jenkin et al., spheric aerosol that have been checked for applicability to 2012), including up to many millions of individual gas phase atmospheric compounds where possible. The group contri- products as aerosol precursors, renders manual calculations bution approach allows users to upload molecular informa- impossible and automation is necessary. For example, both tion in the form of SMILES (Simplified Molecular Input Line inorganic and organic material can transfer between the gas Entry System) strings and UManSysProp will automatically and particle phase. Inorganic electrolytes are restricted to a extract the relevant information for calculations. Built using few well-understood compounds. However, organic material open-source chemical informatics, and hosted at the Univer- can comprise many thousands of compounds with potentially sity of Manchester, the facilities are provided via a browser a vast range of properties (Hallquist et al., 2009). Predict- and device-friendly web interface, or can be accessed using ing the evolution of aerosol requires calculating the distri- the user’s own code via a JSON API (application program bution of all components between the gas and aerosol phase interface). We also provide the source code for all predic- according to equilibrium partitioning or disequilibrium mass tive techniques provided on the site, covered by the GNU transfer. Either treatment requires knowledge of all compo- GPL (General Public License) license to encourage devel- nent vapour pressures and other thermodynamic properties. Published by Copernicus Publications on behalf of the European Geosciences Union. 900 D. Topping et al.: UManSysProp In the moist atmosphere, the most abundant material that can sues. Current facilities include pure component vapour pres- readily interact with aerosol particles is water vapour. The sures, critical properties, and sub-cooled densities of or- formation of atmospheric liquid water has a profound influ- ganic molecules; activity coefficient predictions for mixed ence on the aerosol life cycle and climate. Predicting the hy- inorganic–organic liquid systems; hygroscopic growth fac- groscopic response of complex inorganic–organic mixtures tors and CCN (cloud condensation nuclei) activation poten- requires treatment of solution non-ideality, for example. tial of mixed inorganic–organic aerosol particles with asso- It can be difficult to establish what factors are responsible ciated K(kappa)–Köhler values (Kreidenweis et al., 2005); for the outcome of a model prediction. This is particularly and absorptive partitioning calculations with/without a treat- true when the number of components might be high in, for ment of non-ideality. UManSysProp automatically extracts example, SOA (secondary organic aerosol) mass partitioning the relevant information for calculations. Built using open- simulations. It then becomes difficult for others in the source chemical informatics, described in Sect. 2, the fa- community to assess the results presented. This might be cilities are provided via a browser and device-friendly web complicated by the need to include pure component vapour interface. In Sect. 3, examples of each prediction are given pressures or activity coefficient predictions for a wide along with reference to our existing publications that use range of highly multifunctional compounds. For example, these tools. Providing a wide range of comparisons between predictions of aerosol hygroscopicity have either been based predictions and measurements of each property is outside on simplified Kohler theory at one extreme (Kreidenweis of the scope of this paper given all of the potential sub- et al., 2005) or thermodynamic equilibrium models at the tleties associated with measurement data (e.g. Topping and other (Topping et al., 2005). It is not clear to what extent McFiggans, 2012). Nonetheless, by providing a minimum replication of results is ever achieved for a range of aerosol set of examples for each case, the ability to perform such simulations. Whilst this might also be an issue with results comparisons and act as the community’s point of reference from instrumentation, the development of community driven is demonstrated. Relevant inputs to replicate these exam- software at least enables modellers to tackle this problem ples are given in the text, with larger files to upload pro- directly. There are a number of property prediction facilities vided in Appendix A. If you want to access UMansSysProp that are available online. For example, the US EPA (Environ- without using a web-browser, we also provide a program- ment Protection Agency) host predictive models and tools mer friendly JSON API (application program interface) that for assessing chemicals under the Toxic Substances Control enables you to call our suite of tools from your own code. Act (TSCA) (http://www.epa.gov/tsca-screening-tools). This is described in detail on our ReadTheDocs.org web page From this site one can access the simulation program (https://umansysprop.readthedocs.org/) with an example pro- EPi Suite (http://www.epa.gov/tsca-screening-tools/ vided in Appendix A. We also provide the source code for download-epi-suitetm-estimation-program-interface-v411). all predictive techniques provided on the site, covered by This provides a number of facilities including estimates of the GNU GPL (General Public License) license to encour- physical/chemical properties (melting point, water solubility, age development of a user community. We have released etc.) and environmental fate properties (breakdown in water this via a Github repository https://github.com/loftytopping/ or air, etc.). The Dortmund Databank (DDB) provides a UManSysProp_public.git, which has an associated DOI for wide range of database and software products related to the exact model version given in this paper as provided by fundamental properties of molecules and mixtures. With the Zenodo service (doi:10.5281/zenodo.45143). varying proprietary and free educational access, their program package ARTIST – Thermophysical Properties from Molecular Structure was developed for the estimation 2 Chemo-informatics base of UManSysProp of pure component properties from molecular structure. In the UK the National Chemical Database Service (CDS) The discipline of chemo-informatics typically concerns the provides free access to web-based services including use of both software and computational hardware techniques ACD (Advanced Chemistry Development Inc.)/Labs Inc applied to a range of problems in chemistry. The emergence Physchem and NMR (nuclear magnetic resonance) predic- of the open-source movement has

Umansysprop V1.0: an Online and Open-Source Facility for Molecular Property Prediction and Atmospheric Aerosol Calculations

The Alexandria Library, a Quantum-Chemical Database of Molecular Properties for Force ﬁeld Development 9 2017 Received: October 1 1 1 Mohammad M

Flexible Heuristic Algorithm for Automatic Molecule Fragmentation: Application to the UNIFAC Group Contribution Model Simon Müller*

Chemical Database Projects Delivered by RSC Escience

Predicting Outcomes of Catalytic Reactions Using Machine Learning

Bringing Open Source to Drug Discovery

PSC-Db: a Structured and Searchable 3D-Database for Plant Secondary Compounds

Chemical Space, Diversity, and Complexity[Version 1; Peer Review: 2

Daylight Theory Manual Daylight Theory Manual Table of Contents Daylight Theory Manual

A Database of Medicinal Materials and Chemical Compounds in Northeast

Recent Advances in Multidimensional QSAR (4D-6D): a Critical Review

Chembiofinder V14 User Guide

View and Approval by a Chemoinformatician