
SoftwareX 5 (2016) 1–5 www.elsevier.com/locate/softx

Open cyberGIS software for geospatial research and education in the big data era

Shaowen Wang a,b,c,d,e,f,∗, Yan Liu a,b,c,f, Anand Padmanabhan a,b,c,f

a CyberGIS Center for Advanced Digital and Spatial Studies, University of Illinois at Urbana–Champaign, Urbana, IL 61801, USA
b and Geospatial Information Laboratory, University of Illinois at Urbana–Champaign, Urbana, IL 61801, USA
c Department of Geography and Geographic Information Science, University of Illinois at Urbana–Champaign, Urbana, IL 61801, USA
d Department of Urban and Regional Planning, University of Illinois at Urbana–Champaign, Urbana, IL 61801, USA
e Graduate School of Library and Information Science, University of Illinois at Urbana–Champaign, Urbana, IL 61801, USA
f National Center for Supercomputing Applications, University of Illinois at Urbana–Champaign, Urbana, IL 61801, USA

Received 16 March 2015; received in revised form 3 July 2015; accepted 28 October 2015

Abstract

CyberGIS represents an interdisciplinary field combining advanced cyberinfrastructure, geographic information science and systems (GIS), spatial analysis and modeling, and a number of geospatial domains to improve research productivity and enable scientific breakthroughs. It has emerged as a new-generation GIS that enables unprecedented advances in data-driven knowledge discovery, visualization and visual analytics, and collaborative problem solving and decision-making. This paper describes three open software strategies – open access, source, and integration – to serve various research and education purposes of diverse geospatial communities. These strategies have been implemented in a leading-edge cyberGIS software environment through three corresponding software modalities: CyberGIS Gateway, Toolkit, and Middleware, and have achieved broad and significant impacts.

© 2015 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

Keywords: CyberGIS; Cyberinfrastructure; Geospatial big data

Code metadata

Current code version: v0.6
Permanent link to code/repository used of this code version: https://github.com/ElsevierSoftwareX/SOFTX-D-15-00005
Legal Code License: NCSA open source license
Code versioning system used: git
Software code languages, tools, and services used: C, C++, Python, Bash; MPI, OpenMP, CUDA
Compilation requirements, operating environments & dependencies: Compilers: GNU/Intel/Cray; OS: Linux (RedHat, Debian, Ubuntu, CentOS, SUSE); Dependencies: GDAL, GEOS, PROJ4, SPRNG, PySAL, OpenGeoDa, etc.
If available Link to developer documentation/manual: https://github.com/cybergis/cybergis-toolkit ; http://cybergis.cigi.uiuc.edu/cyberGISwiki/doku.php/ct
Support email for questions: CyberGIS Helpdesk ([email protected])

∗ Corresponding author at: CyberGIS Center for Advanced Digital and Spatial Studies, University of Illinois at Urbana–Champaign, Urbana, IL 61801, USA. E-mail address: [email protected] (S. Wang).

http://dx.doi.org/10.1016/j.softx.2015.10.003
2352-7110/© 2015 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

1. Motivation and significance

Geospatial data and related analytics have become ubiquitous as continued growth in geographic information science and technology enables scientific investigations and decision-making support in a plethora of science and engineering fields including, for example, ecology, environmental science and engineering, public health, geosciences, and social sciences [1,2]. Extensive computational capabilities are needed to manage and analyze massive quantities of complex and heterogeneous geospatial data collected across multiple scales and used for diverse applications by many geospatial communities [3]. However, conventional GIS approaches and associated software tools are primarily developed using sequential computing and cannot adequately resolve this increasing data intensity, complexity, and diversity of applications [4]. CyberGIS – defined as GIS based on advanced cyberinfrastructure (CI) – collectively harnessing heterogeneous CI resources (e.g., cloud, high-end, and high-throughput) has emerged as a new-generation GIS for resolving geospatial big data challenges [5,6].

The rapid development of cyberGIS as an interdisciplinary field has been pushed by advanced digital technologies and pulled by a large number of scientific innovation and discovery challenges and opportunities that exist in numerous geospatial communities. CyberGIS has evolved as a complex ecosystem of hardware, infrastructure, software and services, and applications [7]. Open software is critical to effectively resolve the complexity of the ecosystem and support the diversity of geospatial communities. Our open cyberGIS software approach has three key strategies: open access, source, and integration, enabled by three corresponding modalities: CyberGIS Gateway, Toolkit, and Middleware [8,5,9]. CyberGIS Gateway (referred to as Gateway hereafter) provides an online problem-solving environment for geospatial communities to access cyberGIS software and data capabilities based on CI. CyberGIS Toolkit maintains a suite of community-selected open source spatial analysis and modeling software that is scalable on high performance computing resources. GISolve Middleware bridges Gateway and Toolkit to manage the complexity of CI access. These three modalities form an open software architecture (see Figure 1 in [8] and Figure 1 in [10]) to address open access, open API, open source, and CI-based integration and computation for cyberGIS software. This cyberGIS approach has already had significant impact in a number of domains (e.g., biosciences [11], coupled human–natural systems [12], econometrics [13], and public health [14,15]).

2. CyberGIS Gateway: open access

Gateway is the leading online geospatial problem-solving environment providing cyberGIS capabilities to serve various research and education purposes [8,5]. As a pioneer of science gateways [16], Gateway is built on the TeraGrid GIScience Gateway approach to bridging advanced CI and GIS capabilities through friendly user interfaces based on rich-client web technologies [17]. Gateway capabilities are made available to users at two levels: service and application. As an open access platform, Gateway represents a software-as-a-service approach that significantly reduces the complexity of accessing advanced CI and managing cyberGIS software. In general, advancing scientific software requires both software engineering and domain-specific scientific knowledge. In particular, cyberGIS software exhibits additional dimensions of complexity due to the integration with high-performance parallel and distributed computing resources and services, and diverse geospatial user communities. Each Gateway service is currently implemented as a RESTful web service (https://en.wikipedia.org/wiki/Representational_state_transfer).

The service-oriented approach alone is not sufficient for broad open access to cyberGIS capabilities. Using cyberGIS functions often needs highly interactive user interfaces because geospatial data and analytics require frequent user involvement in such tasks as data and study area selection, map projection, feature extraction, and map visualization. Therefore, Gateway is designed to provide a rich set of interactive user interface components for cyberGIS data and analytics by exploiting advances in web technologies such as HTML5 and geospatial visualization software. A Gateway application is a standalone web application within the Gateway online framework to interact with backend CI and services for a suite of geospatial data and analytical functions. Detailed discussion about Gateway application development can be found in [8,10]. This video: https://www.youtube.com/watch?v=hrJ_cZkG-Xs&t=12 provides an illustrative example of a geoscience application while demonstrating how Gateway can interoperate with online data services.

CyberGIS Gateway has been advanced as an open access environment for a large number of users to perform compute- and data-intensive, and collaborative geospatial problem solving enabled by advanced CI. The development of Gateway software focuses on reusable cyberGIS user interface components and Gateway portal management. Reusable user interface components such as map panel, visualization and symbology, data layer ordering, and map-making functions are built in Gateway as a JavaScript library for application development. A coding framework is established for scalable integration of individual application codes and portal management.

3. CyberGIS Toolkit: open source

CyberGIS Toolkit integrates a set of loosely coupled scalable geospatial software components for the following purposes [10]:

• Sustain the CyberGIS Toolkit as a reliable community software toolbox for scalable cyberGIS analytics through rigorous software building, testing, packaging, and deployment based on open source software practice;
• Capture spatial characteristics of software elements to achieve optimal computational performance, scalability, and portability in various CI environments; and
• Engage computational and data scientists to advance scalable geospatial computing.

Table 1
Software components in CyberGIS Toolkit.

Name            Description                         Scalable computing  Deployment          Scalability (cores)
PABM            Scalable agent-based modeling       MPI + MPI IO        XSEDE               16,384
Parallel PySAL  Scalable PySAL functions            Multi-core          XSEDE               32+
PGAP            Parallel Genetic Algorithm Library  MPI                 XSEDE, Blue Waters  262,144
pRasterBlaster  Map reprojection                    MPI + MPI IO        XSEDE               1024+
SPREG           Spatial regression                  High throughput     Phantom Cloud       On demand
TauDEM          Hydrological information analysis   MPI + MPI IO        XSEDE               1024
Viewshed        Visibility analysis                 GPU                 ROGER               Single GPU
WRF             Multi-scale weather modeling        MPI                 XSEDE               4096

Note: XSEDE (the Extreme Science and Engineering Discovery Environment, http://xsede.org), Blue Waters (http://bluewaters.ncsa.illinois.edu), and ROGER (https://wiki.ncsa.illinois.edu/display/ROGER) are CI programs and facilities supported in part by the US National Science Foundation. Phantom Cloud is an on-demand and scalable cloud resource hosted at the Argonne National Laboratory.
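The Scalability (cores) column in Table 1 reports core counts reached by each component. One common way to summarize a scalability test is through strong-scaling speedup and parallel efficiency relative to a baseline run. The following is a minimal sketch under stated assumptions: the function name `strong_scaling` is hypothetical, and the timings are placeholder values for illustration only, not measurements of any Toolkit component.

```python
from typing import Dict, List, Tuple


def strong_scaling(timings: Dict[int, float]) -> List[Tuple[int, float, float]]:
    """Return (cores, speedup, efficiency) rows relative to the smallest core count.

    speedup(p)    = T(p0) / T(p)
    efficiency(p) = speedup(p) * p0 / p
    where p0 is the smallest core count present in `timings`.
    """
    if not timings:
        raise ValueError("timings must not be empty")
    base_cores = min(timings)
    base_time = timings[base_cores]
    rows = []
    for cores in sorted(timings):
        speedup = base_time / timings[cores]
        efficiency = speedup * base_cores / cores
        rows.append((cores, speedup, efficiency))
    return rows


# Hypothetical wall-clock times in seconds (placeholders, not real measurements).
example_timings = {16: 800.0, 32: 400.0, 64: 250.0}

for cores, speedup, efficiency in strong_scaling(example_timings):
    print(f"{cores:>6} cores  speedup {speedup:5.2f}  efficiency {efficiency:5.2f}")
```

An efficiency that drops well below 1 as cores increase suggests a bottleneck (computing, memory, IO, or network) worth profiling before scaling further.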

Table 1 lists a suite of representative software components that have been integrated in the current release, or that have open access in Gateway and are planned for release in CyberGIS Toolkit. These components are based on research codes developed in various cyberGIS-related community projects. Each component follows the CyberGIS Toolkit integration process based on community needs and code readiness levels such as code quality and scalability, and whether associated work has been published. All of the CyberGIS Toolkit components are open source.

CyberGIS Toolkit has a continuous integration process established to streamline the integration of an identified geospatial code through rigorous open source software engineering. If a code needs to be refined for evaluation from the open source community, code developers are provided with cloud-based development virtual machines customized for software library support for this particular code. Developers adopt appropriate desktop-level software testing tools (e.g., Python Notebook/iPython) for component-level testing. Once the code is ready for integration, two levels of testing are applied: portability and scalability. The portability test is conducted with different combinations of operating systems, architecture, and software library versions through CI resources for software building and testing (e.g. https://www.batlab.org/). The scalability test requires high performance computing expertise and computational performance profiling to identify potential performance bottlenecks on computing, memory, input/output (IO), and network. This test is critical to enhance the scalability of a code to both high performance computing resources and problem sizes. Oftentimes, such tests and computational intensity evaluation lead to improved scalable algorithms and novel computational techniques [18,19]. As the scalability of a CyberGIS Toolkit component is improved through this process, a major benefit for users is that they are able to solve geospatial problems through resolving big data, which would not be feasible using conventional GIS approaches. This is because for most of the software components in Toolkit, computation represents a major bottleneck when data volume is significant.

In addition to the software components (Table 1), CyberGIS Toolkit also manages software dependencies collectively. Current components rely on the following three types of open source software libraries/APIs:

• Language and system: Python modules (e.g., NumPy and SciPy), Lustre file system tools (for MPI IO), MPI, OpenMP, CUDA, and SPRNG (parallel random number generator);
• Geospatial software: GEOS, Proj4, GDAL, PySAL, Shapely, Geoserver, and PostGIS; and
• Performance profiling: PAPI, IPM, and Darshan.

CyberGIS Toolkit can be downloaded and deployed on computational resources configured with parallel computing capabilities. Toolkit deployment includes both programming libraries and applications that can be directly used by end users.

4. GISolve middleware: open integration

The open integration strategy is designed to provide interoperable access to cutting-edge CI resources and establish spatially intelligent programming capabilities to help application developers directly benefit from accessing advanced CI capabilities. To achieve this, geospatial software experts, who are trained to program using geospatial tools and services but may not be well equipped to directly work with advanced CI and geospatial big data, need to be enabled to bridge the technical chasm; hence, the open integration strategy is focused on this gap. Specifically, the open integration strategy manages the complexity of CI access, while providing them with spatially aware APIs.

The GISolve middleware fulfills this role, enabling open integration of advanced computing and information infrastructure with geographic information system capabilities for computationally intensive and collaborative geospatial problem solving via a suite of open service APIs (available at: http://sandbox.cigi.illinois.edu/home/doc/gosapi/GISolveOpenServiceAPI.html). In particular, the GISolve Open Service API defines a set of REST Web service interfaces for authentication, application integration, and CI-based geospatial computation; plays an important role in the integration of applications and the management of the compute and data requirements; and is a key enabler for the open access provided by the CyberGIS Gateway. Furthermore, GISolve middleware is spatially aware and models the computational intensity of spatial analysis and modeling to represent computational requirements based on CI [4].

Key GISolve web service interfaces for enabling open integration include: (1) application integration API provides a mechanism to integrate application software into the cyberGIS software environment and allows application software developers and contributors to define customized user/service interfaces, deploy their applications on CI resources and publish these services into the cyberGIS software environment; (2) security API provides a token-based authentication and authorization framework for integrating services, web applications (e.g. CyberGIS Gateway), and CI resources; and (3) computation API manages the complexity of computation, data and visualization resources to provide on-demand and flexible mechanisms to access CI resources needed to support cyberGIS analytics.
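To make the interplay between the security API and the computation API more concrete, the sketch below shows how a client might first request a token and then attach it to a job submission. Everything specific here is an assumption for illustration: the base URL, the `/token` and `/jobs` paths, the JSON field names, and the `Bearer` authorization scheme are hypothetical and are not taken from the GISolve Open Service API documentation. The requests are only constructed, not sent.

```python
import json
import urllib.request

# Hypothetical base URL and paths; the real GISolve Open Service API may differ.
BASE_URL = "https://gisolve.example.org/api"


def build_token_request(username: str, password: str) -> urllib.request.Request:
    """Security API step (assumed shape): ask the service for an access token."""
    body = json.dumps({"username": username, "password": password}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/token",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_job_request(token: str, app_name: str, params: dict) -> urllib.request.Request:
    """Computation API step (assumed shape): submit an analysis job using the token."""
    body = json.dumps({"application": app_name, "parameters": params}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/jobs",
        data=body,
        headers={
            "Content-Type": "application/json",
            # Assumed scheme: the token travels in an Authorization header.
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


# Build (but do not send) the two requests with placeholder values.
token_req = build_token_request("alice", "secret")
job_req = build_job_request("TOKEN123", "viewshed", {"dem": "dem.tif"})
print(token_req.get_method(), token_req.full_url)
print(job_req.get_method(), job_req.full_url)
```

Separating token acquisition from job submission mirrors the division of responsibilities described above: the same token could, in principle, be reused across services, web applications, and CI resources.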

5. Impact

The three modalities of the open cyberGIS software approach provide a comprehensive open software solution to advancing cyberGIS and related domain sciences, high-performance spatial analysis and modeling algorithm development, and scalable computational methods. For example, CyberGIS Gateway has been used across the globe for various cutting-edge research and education purposes (Fig. 1). CyberGIS Gateway can also be widely accessed by the general public to gain understanding about advanced CI-enabled scientific problem solving through customizable and friendly user interfaces. The holistic cyberGIS software approach has helped improve the software capabilities of the USGS National Map program, which also has significant and broad societal impacts. As a software contributor, USGS published scalable map reprojection software as open source software after the CI-based pRasterBlaster integration significantly improved the scalability of this software on thousands of processors. On the other hand, as a cyberGIS community user, they found and are using TauDEM, another Toolkit component, to accelerate the National Hydrography Dataset (NHD) research. This balanced and holistic open software approach has the potential to be applicable to other inter- and multi-disciplinary communities facing geospatial big data challenges. As most of the Toolkit components have gone through software engineering and scalability testing on advanced CI resources, Toolkit components themselves and the experience of developing scalable software on CI have the potential to be directly adopted or adapted for industrial use.

Fig. 1. Spatial distribution of CyberGIS Gateway users across the globe.

6. Conclusions

Over the past several years, cyberGIS has grown rapidly in an organic and distributed fashion into a complex ecosystem of online interfaces, software tools and services, which enable a number of spatial analysis, modeling and simulation applications by employing the capabilities of advanced CI. This growth has presented and crystallized unique requirements that are not fully met by the traditional open source software model. In this paper, we have laid out our open software approach to innovating and sustaining an open cyberGIS software ecosystem. Specifically, to address the requirements posed by various geospatial communities with varied levels of computational expertise and technological skills, we established an open-access, open-source and open-integration software approach with three distinct but interrelated modalities. Within the NSF CyberGIS software project these three modalities are represented respectively by (1) CyberGIS Gateway, focused on broad communities without sophisticated technical knowledge; (2) CyberGIS Toolkit, focused on cyberGIS experts with deep technical knowledge of both GIS and CI; and (3) GISolve Middleware, which bridges the gap between Gateway and Toolkit and focuses on providing GIS developers with friendly service interfaces for supporting CI-enabled application integration and execution. Together, they form a cutting-edge geospatial big data and compute platform for cyberGIS communities.

From a CI software perspective, the Gateway modality is similar to successful software platform approaches developed in other science domains, such as the NanoHub (http://nanoHUB.org, Madhavan et al. [20]) in nanotechnology and the generalized HubZero approach (http://hubzero.org, McLennan and Kennell [21]). A key difference from such similar approaches is that cyberGIS software focuses on provisioning geospatial capabilities and exploiting geospatial characteristics in big data and compute. We believe this expanded approach to openness employed by the cyberGIS software ecosystem represents a powerful enhancement to open source that has undoubtedly benefited the cyberGIS community and likely holds potential for adoption in other science communities.

Acknowledgments

This paper and associated materials are based in part upon work supported by the National Science Foundation (NSF) under grant numbers: 0846655, 1047916, 1354329, and 1443080. This work used the NSF Extreme Science and Engineering Discovery Environment (XSEDE). NSF supports XSEDE under grant number 1053575. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of NSF.

References

[1] Wang S, Armstrong MP. A quadtree approach to domain decomposition for spatial interpolation in environments. Parallel Comput 2003;29(10):1481–504.
[2] Wright DJ, Wang S. The emergence of spatial cyberinfrastructure. Proc Natl Acad Sci 2011;108(14):5488–91.
[3] Wang S, Hu H, Lin T, Liu Y, Padmanabhan A, Soltani K. CyberGIS for data-intensive knowledge discovery. ACM SIGSPATIAL Newslett 2014;6(2):26–33.
[4] Wang S, Armstrong MP. A theoretical approach to the use of cyberinfrastructure in geographical analysis. Int J Geogr Inf Sci 2009;23(2):169–93.
[5] Wang S. A CyberGIS framework for the synthesis of cyberinfrastructure, GIS, and spatial analysis. Ann Assoc Amer Geograph 2010;100(3):535–57.
[6] Wang S, Wilkins-Diehr NR, Nyerges TL. CyberGIS: toward synergistic advancement of cyberinfrastructure and GIScience: A workshop summary. J Spat Inform Sci 2012;4:125–48.
[7] Wang S. CyberGIS: Blueprint for integrated and scalable geospatial software ecosystems. Int J Geogr Inf Sci 2013;27(11):2119–21.
[8] Liu YY, Padmanabhan A, Wang S. CyberGIS gateway for enabling data-rich geospatial research and education. Concurr Comput.: Pract Exper 2015;27(2):395–407.
[9] Wang S, Armstrong MP, Ni J, Liu Y. GISolve: A grid-based problem solving environment for computationally intensive geographic information analysis. In: Proceedings of the 14th international symposium on high performance distributed computing (HPDC-14), challenges of large applications in distributed environments (CLADE) workshop. IEEE Press; 2005. p. 3–12.
[10] Wang S, Anselin L, Bhaduri B, Crosby C, Goodchild MF, Liu Y, Nyerges TL. CyberGIS software: A synthetic review and integration roadmap. Int J Geogr Inf Sci 2013;27(11):2122–45.
[11] Wang S, Zhu X-G. Coupling cyberinfrastructure and geographic information systems to empower ecological and environmental research. BioScience 2008;58(2):94–5.
[12] Tang W, Wang S, Bennett DA, Liu Y. Agent-based modeling within a cyberinfrastructure environment: a service-oriented computing approach. Int J Geogr Inf Sci 2011;25(9):1323–46.
[13] Anselin L, Rey SJ. Spatial econometrics in an age of CyberGIScience. Int J Geogr Inf Sci 2012;26(12):2211–26.
[14] Padmanabhan A, Wang S, Cao G, Hwang M, Zhang Z, Gao Y, Soltani K, Liu YY. FluMapper: a cyberGIS application for interactive analysis of massive location-based social media. Concurr Comput.: Pract Exper 2014;26(13):2253–65.
[15] Shi X, Wang S. Computational and data sciences for Health-GIS. Ann. GIS 2015;21(2):111–8.
[16] Lawrence KA, Wilkins-Diehr N, Wernert JA, Pierce M, Zentner M, Marru S. Who cares about science gateways? A large-scale survey of community use and needs. In: Proceedings of the 9th gateway computing environments workshop (GCE’14). Piscataway (NJ, USA): IEEE Press; 2014. p. 1–4. http://dx.doi.org/10.1109/GCE.2014.11.
[17] Wang S, Liu Y. TeraGrid GIScience Gateway: Bridging cyberinfrastructure and GIScience. Int J Geogr Inf Sci 2009;23(5):631–56.
[18] Finn MP, Liu Y, Mattli MD, Guan Q, Yamamoto KH, Shook E, Behzad B. pRasterBlaster: High-performance small-scale raster map projection transformation using the extreme science and engineering discovery Environment. In: The XXII international society for photogrammetry & remote sensing congress, Melbourne, Australia, August 25–September 1, 2012.
[19] Fan Y, Liu YY, Wang S, Tarboton D, Yildirim A, Wilkins-Diehr N. Accelerating TauDEM as a scalable hydrological terrain analysis service on XSEDE. In: Proceedings of the 2014 annual conference on extreme science and engineering discovery environment (XSEDE’14). Atlanta (GA): ACM Press; 2014. July 13–18, p. 5:1–5:2.
[20] Madhavan K, Zentner L, Farnsworth V, Shivarajapura S, Zentner M, Denny N, Klimeck G. nanoHUB.org: Cloud-based services for nanoscale modeling, simulation, and education. Nanotechnol Rev 2013;2(1):107–17. http://dx.doi.org/10.1515/ntrev-2012-0043.
[21] McLennan M, Kennell R. HUBzero: A platform for dissemination and collaboration in computational science and engineering. Comput Sci Eng 2010;12(2):48–52.