Models of everywhere revisited: a technological perspective BLAIR, Gordon, BEVEN, Keith, LAMB, Rob, BASSETT, Richard, CAUWENBERGHS, Kris, HANKIN, Barry, DEAN, Graham, HUNTER, Neil, EDWARD, Liz, NUNDLLOL, Vatsala, SAMREEN, Faiza , SIMM, Will and TOWE, Ross Available from Sheffield Hallam University Research Archive (SHURA) at: http://shura.shu.ac.uk/26223/

This document is the author deposited version. You are advised to consult the publisher's version if you wish to cite from it. Published version BLAIR, Gordon, BEVEN, Keith, LAMB, Rob, BASSETT, Richard, CAUWENBERGHS, Kris, HANKIN, Barry, DEAN, Graham, HUNTER, Neil, EDWARD, Liz, NUNDLLOL, Vatsala, SAMREEN, Faiza, SIMM, Will and TOWE, Ross (2019). Models of everywhere revisited: a technological perspective. Environmental Modelling and Software, 122, p. 104521.

Copyright and re-use policy See http://shura.shu.ac.uk/information.html

Sheffield Hallam University Research Archive http://shura.shu.ac.uk Environmental Modelling and Software 122 (2019) 104521

Contents lists available at ScienceDirect

Environmental Modelling and Software

journal homepage: http://www.elsevier.com/locate/envsoft

Models of everywhere revisited: A technological perspective

Gordon S. Blair a,*, Keith Beven b, Rob Lamb c,b, Richard Bassett a, Kris Cauwenberghs d, Barry Hankin e,b, Graham Dean a, Neil Hunter e, Liz Edwards a, Vatsala Nundloll a, Faiza Samreen a, Will Simm a, Ross Towe a a Ensemble Team, , UK b Lancaster Environment Centre, Lancaster, UK c JBA Trust, UK d Flanders Environment Agency (VMM), Belgium e JBA Consulting, UK

ARTICLE INFO The concept ‘models of everywhere’ was first introduced in the mid 2000s as a means of reasoning about the Keywords: environmental science of a place, changing the nature of the underlying modelling process, from one in which Environmental modelling general model structures are used to one in which modelling becomes a learning process about specificplaces, in Models of everywhere particular capturing the idiosyncrasies of that place. At one level, this is a straightforward concept, but at another Grid computing it is a rich multi-dimensional conceptual framework involving the following key dimensions: models of every­ Cloud computing where, models of everything and models at all times, being constantly re-evaluated against the most current Data science evidence. This is a compelling approach with the potential to deal with epistemic uncertainties and non- linearities. However, the approach has, as yet, not been fully utilised or explored. This paper examines the concept of models of everywhere in the light of recent advances in technology. The paper argues that, when first proposed, technology was a limiting factor but now, with advances in areas such as Internet of Things, cloud computing and data analytics, many of the barriers have been alleviated. Consequently, it is timely to look again at the concept of models of everywhere in practical conditions as part of a trans-disciplinary effort to tackle the remaining research questions. The paper concludes by identifying the key elements of a research agenda that should underpin such experimentation and deployment.

1. Introduction potential generality of the approach.) This is a compelling argument, and a reaction against the view that The concept of ‘models of everywhere’ was introduced by Beven in there can be generic environmental models capable of representing 2007 (Beven, 2007), and revised in a follow up paper (Beven and Alcock, processes and behaviours across multiple places and indeed across 2012). The concept is fundamentally about having a stronger association multiple scales. Such general models are “expensive to develop, difficult between a given environmental model and the place that it represents. In to maintain and to apply because of their data demands and need for the 2012 paper, they argue that it is “useful, and even necessary, to think parameter estimation or calibration” (Beven, 2007). They also have in terms of models of everywhere … [and this] … will change the nature problems in dealing with local epistemic uncertainties and of the modelling process, from one in which general model structures are non-stationarities, for example caused by change in local characteristics used in particular catchment applications to one in which modelling and climate drivers (e.g. Prudhomme et al., 2010; Beven, 2009, 2016). becomes a learning process about places”. The ‘necessity’ stems from the Some examples of models of everywhere have been deployed but for need to constrain uncertainty in the modelling process in order to sup­ relatively constrained applications and at specific scales. However, the port policy setting and decision-making, particularly around water concept has not been developed to the extent that the authors envisaged, management (e.g. flooding and water quality), although the principles where everywhere is represented across all scales in a coherent way. can also potentially apply to other areas of environmental modelling and This paper re-examines the concept of models of everywhere from a management. (In the rest of the paper, we will tend to use examples and technological perspective arguing that, at the time, the underlying illustrations from and floodmodelling although we stress this technology was not sufficiently advanced to support the concept. Now,

* Corresponding author. E-mail address: [email protected] (G.S. Blair). https://doi.org/10.1016/j.envsoft.2019.104521 Received 27 April 2018; Received in revised form 6 August 2019; Accepted 24 September 2019 Available online 27 September 2019 1364-8152/© 2019 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/). G.S. Blair et al. Environmental Modelling and Software 122 (2019) 104521 however, this has changed, with significantdevelopments in areas such scenarios. However, the outputs from such models will be associated as data acquisition techniques, data storage and processing technolo­ with significant uncertainty, even after calibration on historical data. gies, and data analytics capabilities, alongside a move towards a more Thus, this is a prime example where the concept of models of every­ open science supported by these developments. where and the local constraint of uncertainty using local information Note that in developing ideas of models of everywhere we mean would be useful, especially when assessing the uncertainty in potential something quite different to the "hyperresolution" models that are outcomes might make a difference to the decision that might be made. A starting to be used in Earth Systems Science (e.g. Wood et al., 2011; re-evaluation of the concept of models of everywhere is therefore very Beven and Cloke, 2012; Bierkens et al., 2015; Gilbert and Maxwell, timely. 2017). With resolutions of the order of 1 km, the latter do not (as yet) The paper is structured as follows. Section 2 examines the concept of provide simulations and visualisations at scales that local stakeholders models of everywhere in more depth, highlighting the different di­ can relate to directly (see the discussion of Beven et al., 2015). This is a mensions behind this initial vision, and culminating in a set of techno­ critical aspect of how the models of everywhere concept has the po­ logical requirements to support models of everywhere. Section 3 looks at tential to change the way that modelling is done. Both approaches do, the technological landscape, as it existed in the period 2007–2012, however, focus attention on a requirement for scale dependent param­ systematically reviewing the different technological requirements and eterisations that has proven difficult to resolve (e.g. McDonnell and concluding with an overall assessment of technology readiness at that Beven, 2014). time. Section 4 repeats this analysis, but looking at the state-of-the-art The overall aim of the paper is to determine the current feasibility of now. The paper then presents ongoing research in this area, including models of everywhere, particularly in the area of hydrological modelling, the identification of a research roadmap for the implementation of the given the state-of-the-art in underlying technology. This breaks down into concept of models of everywhere, (Section 5). Section 6 documents the following objectives: related work, including existing deployments of the concept of models of everywhere. Finally, Section 7 concludes the paper with some final re­ 1. To carry out a detailed examination of the concept of models of flections on models of everywhere from a technological perspective. everywhere to determine key underlying technological requirements; 2. Models of everywhere unpicked 2. To compare the state-of-the-art in technology in the period 2007–2012 and the present day to evaluate whether the time is now While models of everywhere at one level is quite a straightforward right for a widespread deployment of models of everywhere; concept representing as association of models with particular places, at 3. To provide a research roadmap to support such deployment in terms another level, it is a rich multi-dimensional conceptual framework. In of outstanding research questions and challenges. particular, we discuss three (mutually supportive) dimensions with the goal of highlighting the technological requirements to support the Note that there are other issues related to models of everywhere that overall vision: also should be addressed, most notably human and societal issues. Such issues include the need to move towards open science and open data, Models of everywhere; and the role of communities in improving models in representing local Models of everything; places. These are alluded to in the paper but a full treatment of this Models at all times. important dimension is beyond the scope of the paper. We elect instead to focus on technological readiness. These are discussed in turn below. The work is being carried out in the context of a significant re- evaluation of approaches to flood modelling and associated risk man­ 2.1. Models of everywhere agement. For example, the UK Government’s National Flood Resilience 1 Review, published in September 2016 included important recommen­ 2.1.1. Key characteristics dations around improvements to long-term modelling capabilities. The The starting point of models of everywhere is to move from generic review also encouraged the use of natural flood mitigation methods or models that can then be customised to particular locations, for example “working with natural processes”. This concept involves the use of many through appropriate parameterisation, to models that are specific to distributed in-channel and off-channel storage features, coupled with particular places. As such, they can be tailored to represent the behav­ changes of land use to try to retain more flood runoff in catchment iour at a specificplace without the need to represent any other place. In headwaters, or at least slow down its arrival to areas at risk of flooding particular, observations and inputs from local stakeholders can be used (see Dadson et al., 2017). There are many current projects in the UK that to constrain the uncertainty that is associated with environmental are implementing natural flood management measures. Very few, modelling (e.g. Beven, 2009). however, have been associated with detailed monitoring of changes to Note that models of everywhere is often interpreted as models rep­ the flow, or the operation of individual mitigation measures. Addition­ resenting specific localised areas but the concept does not imply any ally, there are issues about whether the strategy will be effective under particular scale; rather models of everywhere can represent local, extreme floodevents, which in the UK are often preceded by a period of regional, national and global scale with these models often co-existing. prior catchment wetting (see Metcalfe et al., 2017; Hankin et al., 2017). While there is an expectation that model parameterisations should be In fact, by slowing the flowin some parts of the catchment, it is possible resolution dependent (e.g. McDonnell and Beven, 2014; Beven, 2019), in that the peak flow might increase elsewhere. This is called the syn­ the absence of any adequate scaling theory for many environmental chronicity problem, the impact of which will vary from event to event processes, particular models may need to be tailored for the scale at (because of the different patterns of rainfall intensities) and with which they operate in terms of both process representations and effec­ changes in the scale of catchment being considered. The distributed tive parameter values. The approach is illustrated in Fig. 1: a generic nature of this problem, and the potential for such mitigation effects to model with a specificset of parameter values cannot represent flooding have impacts on other environmental factors, requires an integrated at all areas of the catchment and hence fivelocalised models need to be catchment modelling approach to evaluate possible implementation developed. A core motivation of models of everywhere is to constrain uncertainty, exploiting as much knowledge as is available about a particular place. 1 https://www.gov.uk/government/publications/national-flood-resilienc This is developed further when we look at models of everything and e-review. models at all times in Sections 2.2 and 2.3 below.

2 G.S. Blair et al. Environmental Modelling and Software 122 (2019) 104521

Fig. 1. Models of everywhere: showing a range of models across a catchment (the Conwy) representing different places, in this case at the same scale (see also Fig. 2/ 3 which build on this).

2.1.2. Technological requirements the flooding from “Surface Water”4 (ubiquitous in coverage, represent­ The main requirement of models of everywhere is very large-scale ing potential flooding from overland flows and ponding, rather than computational capacity. For example, if this approach were to be water overflowing from rivers or the sea), the “Risk of Flooding from adopted for future flood prediction, there would be a need for the Reservoirs” (predicting places at risk in the event of dams and im­ deployment of very large numbers of models all over the country at poundments being breached), and groundwater flooding. different scales. For illustration, consider models applied to support More detailed models also exist in many places to support activities Flood and Coastal Risk Management (FCRM) in England. such as the economic appraisal of proposed flooddefence schemes, flood For any given community (city, town, village or street), there may be risk assessments for proposed floodplain developments, or the detailed an array of models developed, paid for and used by different organisa­ design, construction and maintenance of drainage systems. These tions, for different purposes, and using different data resources (or, very models usually capture further information about infrastructure (e.g. often, common data sets, exploited in different ways). Most localities are bridges, culverts, weirs, sluice gates) and river channel surveys. included within national scale models, sometimes referred to as “stra­ Organisations commissioning and owning such models may include tegic”, with the outputs of the National Flood Risk Assessment (NaFRA2 the Environment Agency, which leads on FCRM in England for local published as “Risk of Flooding from Rivers and Sea”3) being typically the government, water companies supplying drainage services, private de­ most generic. Separate models have been applied to provide mapping of velopers or other landowners. It is usual for all of the above models to evolve over time, incorporating both new data and technical

2 https://www.gov.uk/government/publications/flooding-in-england-na tional-assessment-of-flood-risk. 4 https://www.gov.uk/government/publications/flood-maps-for-surface 3 https://data.gov.uk/dataset/risk-of-flooding-from-rivers-and-sea1. -water-how-they-were-produced.

3 G.S. Blair et al. Environmental Modelling and Software 122 (2019) 104521

Fig. 2. Models of everything: showing the rich diversity of data and models that may exist for a given place and how they may inform each other. improvements (e.g. better numerical solution schemes). It is also com­ over the underlying distributed systems architecture to support such mon for multiple instances of each model to be executed, i.e. many in­ massive deployment, e.g. centralised, distributed or decentralized (or dividual “runs” or simulations, to support scenario or uncertainty indeed combinations of different approaches). analysis. The Environment Agency has over 1500 such detailed local The approach also asks fundamental questions over the relationship models, and reported in its 2010–2015 modelling strategy5 an invest­ and consistency of models at different scales and how to support ment of approximately £17 million per year in modelling and mapping reasoning across scales in terms of supporting a deeper understanding of and an additional £15 million in gathering and processing data to sup­ the science and all its complexities and inter-dependencies. port FCRM. This has significant resource requirements in terms of the number of processors or virtual machines to run these models and data storage, as 2.2. Models of everything well as the human costs in terms of developing and tailoring the models for given places, and analysing and understanding the outputs. For 2.2.1. Key characteristics example, production6 of the “Risk of Flooding from Surface Water” maps The second dimension is concerned with exploiting information cited earlier involved more than 70,000 individual simulations of flood about a place, in particular, coupling a model of that place with as much inundation on a mosaic of approximately 7100 36 km2 tiles covering all local data as can be collected, thus embracing the heterogeneity of of England, run on a 2m � 2m resolution digital height map that available data sources. The availability of such data is increasing included over 91,000 manually-determined corrections. This process significantly and now includes (Blair et al., 2019): needed around two months for data preparation and one month of computer processing time, fully utilising a grid of over 100 GPU- Remote sensing data collected by satellites or aircraft-borne in­ accelerated PCs. struments (including drones); Where possible, technological and operational support would need Other monitoring technologies that consist of a range of sensor to be provided for such development. This also asks important questions technologies typically in close proximity with the observed phe­ nomena, including the use of Internet of Things (IoT) technologies to provide real-time streaming and multi-faceted data about the natural 5 https://www.gov.uk/government/uploads/system/uploads/attachment environment; _data/file/292949/geho0310bsbt-e-e.pdf. Historical records held in a variety of locations and scales; The 6 https://www.whatdotheyknow.com/request/208078/response/ increasing amount of data available from national/local government 520170/attach/3/National%20modelling%20and%20mapping%20method% and other open data portals often increasingly offering APIs, e.g. 20statement%20May%202013.pdf. data.gov.uk;

4 G.S. Blair et al. Environmental Modelling and Software 122 (2019) 104521

Data mining provides additional information from the web or social ideally this open philosophy would also extend to models, with media; models available as open source. Data collected from citizen science, with the potential to direct cit­ 4. How to make sense of the heterogeneous data elements? It is one thing to izen science to areas of data scarcity. have access to this rich underlying data, but it is another thing to be able to make sense of this data and therefore data analysis techniques Together, this adds up to the potential for having environmental data are also required to build this higher-level of understanding from the at an unprecedented scale (Blair et al., 2019). For example, if focusing underlying data (cf. environmental data science (Blair et al., 2019)). on floodprediction, it is possible to use a variety of data sources such as This will inevitably imply the construction of data models using a historical Parish records and flood marks, satellite imagery, local sen­ rich array of statistical and machine learning techniques. sors, photographs from social media and citizen science to help steer 5. How to combine process models with data models? Once the data models process-based hydrological models. Indeed many researchers are advo­ are constructed, there is a need to couple the data model or models cating such approaches in hydrology, e.g. (Di Baldassarre et al., 2009; with process models to build a complete understanding of a given Smith et al., 2009, 2015), and more generally in disaster risk reduction place. Techniques are therefore required to support model coupling (McCallum et al., 2016). between process and data models. In hydrology, it has been shown The concept also naturally extends to other aspects of environmental that even hydrological observations may not always be informative science, for example collecting and analysing data around water quality in model calibration and validation (e.g. Beven and Smith, 2015). issues, biodiversity, or soils and indeed the inter-dependencies between them. There are also important human and societal issues around privacy The additional dimension of models of everything is shown in Fig. 2. and security but, as mentioned in the introduction, this area is not The concept of models of everything has the potential to further considered in this paper (but is an important area of future investiga­ reduce the uncertainty around predictions for a variety of different tion). Note that the execution of additional data models, and the need to variables of environmental interest in a coherent way. This is particu­ allow for observational uncertainties also increases the computational larly important where the relevant processes are intrinsically coupled, requirements of the system. for example the water flows that drive the transport of nutrients from farmland and households into rivers and lakes. This constraint of un­ certainty is important because many of the sources of uncertainty are 2.3. Models at all times epistemic in nature. Epistemic uncertainties are those are those that arise from lack of knowledge, in contrast to aleatory uncertainties that 2.3.1. Key characteristics represent random variability that derives from ‘irreducible natural The final dimension is that models should be active at all times, variability’ (see Beven, 2009, 2016; Rougier et al., 2013; Beven and constantly re-evaluating what is known about particular places and Hall, 2014; Di Baldassarre and Beven, 2016). By definition, it is not adapting accordingly. This does not necessarily mean that models are possible to deal with epistemic uncertainties in process models without always executing as this would consume significant computational re­ breakthroughs or deepening of knowledge about a given place and its sources without any real gain. Rather, models should execute periodi­ states and behaviours. Beven et al. (2015) also talk about the role of cally and frequently, for example when new data becomes available, to models of everything in overcoming what they refer to as hyper­ understand the “idiosyncrasies of particular places” (Beven, 2007) and resolution ignorance in modelling, that is evaluating the hyperresolution how this might change over time. At other times, the model would be in a information produced by simulation to overcome the local lack of data quiescent state, but otherwise ready to re-execute at any time. This and unknowns in scientific understanding (e.g. the understanding of contrasts significantly with existing practice when distinct model runs subsurface structures in hydrology). are carried out infrequently under the auspices of a scientific experi­ ment, with perhaps more runs carried out an at a pre-deployment phase 2.2.2. Technological requirements to understand the sensitivities and uncertainties of a given (generic) Models of everything adds significantlyto the requirements imposed hydrological model. This iterative approach is nicely aligned with the on the computational infrastructure around five key areas: work of Box (1980) on the iterative relationship between practice and theory (Box’s loop), recently extended by Blei (2014) in the context of 1. How to store the ‘big data’? With models of everywhere, there is a need latent variable models applied to complex data-sets. This may also to capture and store significant quantities of data about a given introduce more consistency between models used for short-term fore­ place, and then repeat this across all places. This therefore very casting, often applied in an adaptive framework where on-line data can quickly becomes a ‘big data’ problem. In many ways, though, this is be used for updating, and simulation models that are rarely updated. more demanding than many areas of big data given the high level of This leads to a new perspective of “modelling as a learning process”, as heterogeneity in the data-sets with some of the data being structured discussed in depth in the 2007 paper (Beven, 2007). The 2012 paper and other elements being unstructured, and inevitably captured in a (Beven and Alcock, 2012) develops this further talking about models as wide variety of formats (Blair et al., 2019). hypotheses to be tested against current and historical observations with 2. How to represent and manage the collected data? Given the variety and some models being rejected in favour of others and indeed this changing heterogeneity discussed above, there is a need to represent, evaluate over time, so the current chosen model structure and associated as­ and manage the overall collection of data, and this must include sumptions best reflect the full idiosyncrasies of a given place as repre­ support for interoperability, data discovery and also the association sented by numerous additional data observations. This in turn leads to of appropriate meta-data and ontologies, including provenance an adaptive approach to modelling. information. The finalaspect to consider is what can be adapted about the model. 3. How to ensure open access to data? The concept of models of every­ There are various possibilities here, increasing in level of sophistication thing implies a move towards open data, where data is openly and ambition: available for use and stored in a way that allows such open access (also important to support a more collaborative and cross- 1. The outcomes from a model can be adapted for the purposes of real- disciplinary science as required to interpret this data). As time forecasting when data can be made available for assimilation, mentioned in the introduction, while this is technically straightfor­ and post-event analysis can then be used to inform local improve­ ward to achieve, this requirement is more concerned with cultural ments to the model, including adaptation of parameter values to best issues, for example around the perceived value of data. Note that represent behaviour at the current time;

5 G.S. Blair et al. Environmental Modelling and Software 122 (2019) 104521

Fig. 3. Models at all times: showing the meta-level reasoning framework associated with models as a learning process, extracting meaning from diverse data about a place, applying learning techniques to extract meaning from this data and making appropriate adaptations around model selection and parameterisation.

2. A number of models can co-exist in an ensemble approach, with everywhere, but adds a whole new level of complexity in terms of the model selection applied to identify the best models for that given underlying technology requirements. In particular, the approach am­ place/time; plifies the underlying resource requirements and also the underlying 3. The internal structure and behaviour of a given model can be distributed systems architecture as discussed in Section 2.1. For example, adapted, for example, by changing fine-grained elements of the un­ the approach requires the frequent execution of potentially ensemble derlying hydrology to best reflect the current place/time; models at large numbers of places and different scales. 4. The representation of residual uncertainty can be adapted as more The approach also introduces significant additional (mutually sup­ information is obtained locally. portive) requirements:

Clearly, these approaches can also be combined in different ways. 1. How to support adaptive reasoning? As discussed above, ‘models at all Indeed, a combination of all four offers a new and radical approach to times’ is fundamentally a learning process and implies that models models of everywhere. are constantly adapted in response to new knowledge extracted from The concept of models at all times is illustrated in Fig. 3. available data-sets. There is therefore the need to support this The key motivation of models at all times is to offer a modelling adaptive reasoning and ideally this should involve a strong element framework that supports explicit reasoning about uncertainty, with the of automation as provided, for example, by autonomic computing explicit goal of reducing uncertainty for a given place. More specifically, (supporting self-adaptive systems) (Kephart and Chess, 2003; the approach also has the potential to deal with epistemic uncertainties, McKinley et al., 2004). as argued in (Beven and Alcock, 2012). In this paper, following Beven 2. How to incorporate reasoning about uncertainty? Building on the above, (2006), the authors argue for an approach based on limits of acceptability, it is important that adaptation decisions incorporate reasoning about whereby models that perform well according to such limits are accept­ uncertainty, and this implies making uncertainty explicit in the able (and perhaps reinforced), while others are rejected, with this driven modelling process, and also incorporating approaches to deal with by the collected set of observational data (cf. models of everything). epistemic uncertainties and non-linearities as inevitably encountered Finally, the approach has significant potential to deal with in such complex systems. non-linearities and fundamental changes over time, for example related 3. How to support adaptation? A truly adaptive system requires ready to climate change, with its emphasis on ongoing adaptation to the cur­ access to a range of elements that can be changed. Supporting more rent context. coarse-grained adaptation is relatively straightforward, and imple­ mented in terms of selecting from different models in model en­ 2.3.2. Technological requirements sembles, or changing model parameters. Supporting fine-grained Models at all times is crucial to the overall vision of models of strategies is however more challenging as this requires intimate

6 G.S. Blair et al. Environmental Modelling and Software 122 (2019) 104521

Table 1 3. The ability to support modelling as a learning process, including Technological requirements for models of everywhere, everything and at all reasoning about uncertainties (R9, R10, R11); times. 4. Practical issues around deployment at scale, including availability of Requirement Primary motivation(s) open data and approaches to support large-scale deployment (R3,

R1: Massive resource requirements in terms of Models of everywhere, R12). underlying computation and storage (cf. big data) everything and at all times R2: Appropriate underlying distributed systems Models of everywhere, This clustering will be used in the assessment of the changing tech­ architecture for models of everywhere everything and at all times nological landscape as discussed in Sections 3 and 4 below. R3: Support for reasoning across scales Models of everywhere R4: Appropriate data representation architecture, Models of everything recognising the heterogeneity of underlying data 3. Technological landscape (2007–2012) (structured and unstructured) and its complexities R5: Support for data discovery and navigation Models of everything 3.1. Overview of the landscape R6: Supporting open access to data Models of everything R7: Providing rich data analytics methods to make Models of everything sense of this data In the period 2007 to 2012, the landscape was dominated by grid R8: Supporting the integration of process and data Models of everything, at all computing. The concept of grid computing was firstintroduced in in the models times 1990s and became prominent with the publication of the seminal paper R9: Support for adaptive reasoning Models at all times by Foster and Kesselman (1998), introducing the grid as a “blueprint for R10: Support for reasoning about uncertainty, Models at all times a new computing infrastructure”. The term was introduced as a meta­ including epistemic uncertainties and non-linearities R11: Ability to support coarse-grain and fine-gran Models at all times phor for the electricity grid, with the goal of making computational adaptation of environmental models power as accessible and ubiquitous as electricity. Software platforms R12: Supporting deployment of models of everywhere Models of everywhere, were developed to support the deployment of applications and services at scale everything and at all times in the grid, most notably the Globus toolkit, with various versions released starting in 1997 with the last major release (Globus toolkit 7 access to the structure and behaviour of individual models in terms version 5) in 2009. Around this time, the grid was being superseded by of, for example, alternative hydrological equations at the heart of the cloud computing (for example, the first version of Amazon Web Ser­ 8 model. Most existing models will not provide such access, i.e. black vices was introduced in 2006 with rapid growth since); this growth in box implementations. To fully realise the vision however, we need to cloud computing is discussed further in Section 4.1 below. go further than this and provide more white-box access to internal In parallel, researchers were becoming interested in the use of such software architectures of environmental models, as provided by, for computational power to support a range of application domains example, reflective architectures (Maes, 1987; Kon et al., 2002). including, for example, eCommerce. Most notably, in the context of this paper, there was also great interest in eScience, that was, the use of ‘Models at all times’ also places additional emphasis on the need to technological infrastructure including the grid to support a new kind of integrate process and data models (discussed in Section 2.2) to support computationally intensive and data-rich science (Hey et al., 2009). For adaptive reasoning. example, in the UK, the national eScience programme ran from 2001 to There is also an over-arching requirement emanating from this 2010, supporting a range of infrastructure projects and application analysis, and that is the ability to support deployment at scale, and this projects in areas as diverse as bioinformatics, neuroinformatics and implies the ready deployment of individual models (of everywhere) and medical informatics. In the environmental area, the most prominent 9 also of large numbers of models at different scales. There are many di­ project was climateprediction.net. Similar, large-scale initiatives were mensions to this scalability involving, for example, making it easier to launched in other countries, for example in the States the National deploy models in underlying computational infrastructure, whether Science Foundation (NSF) funded a series of cyberinfrastructure initia­ 10 provided by HPC or cloud facilities, offering software frameworks that tives starting around 2003, including the Open Science Grid developed can support the deployment of models or ensembles of models ready to by the Open Science Grid Consortium (OSGC). be tailored for the idiosyncrasies of places, and automating the subse­ quent adaptation/learning process (hence the importance of self- 3.2. Addressing the requirements adaptive approaches). 3.2.1. Underlying technological infrastructure The emergence of the grid and also the eScience community that 2.4. Overall analysis coalesced around the grid provided important expertise, experience and also facilities to support the development of models of everywhere. ‘Models of everywhere’ is an important and potentially crucial However, in practice (and this is clear in retrospect), the grid did not approach to environmental modelling, particularly in terms of managing meet the full set of requirements to support the broader vision of models uncertainty. A full implementation of the concept however imposes very of everywhere. significant requirements in terms of the technological infrastructure Although the vision of the grid was to provide plentiful resources on alongside other fundamentals, most notably cultural elements around a demand, the reality was somewhat different at that time. The avail­ move to open science (incorporating more open approaches to data and ability of resources varied greatly and depended on access to one of the modelling). The approach is best understood as a combination of models experimental grid facilities that were introduced in different global of everywhere, everything and at all times, with this trichotomy used to centres. The overall distributed systems architecture was therefore one analyse the overall requirements in more depth in the discussions above. of centres at given fixed locations offering (by definition) relatively The resultant requirements are shown in Table 1 below. centralised services with partial access and limited control of these These requirements can usefully be clustered as follows: services. A number of researchers explored more decentralized

1. The capacity and level of sophistication of the underlying techno­ logical infrastructure in terms of both computation and data (R1, R2, 7 http://toolkit.globus.org/toolkit/. R4, R5); 8 https://aws.amazon.com/. 2. The availability of rich data analytics capability to make sense of 9 http://www.climateprediction.net/. complex and highly heterogeneous data-sets (R3, R6, R7, R8); 10 https://www.opensciencegrid.org/.

7 G.S. Blair et al. Environmental Modelling and Software 122 (2019) 104521

sophistication to meet the technology infrastructure requirements as identified in Table 1. The main middleware technology used at the time was the Globus Toolkit, with the overall architecture of the Toolkit (v5) shown in Fig. 4. This was a large and complex architecture with many dimensions but, as can be seen, the emphasis is on supporting resource sharing, and at a fairly low level of abstraction. As stated in the seminal paper on the “anatomy of the grid”, Foster et al. (2001) argue that the grid was fundamentally about coordinated resource sharing and problem solving in dynamic, multi-institutional virtual organisations”. They go on to argue that this implies “direct access to computers, software, data, and other resources” …. and this sharing should be “highly controlled”. Hence, the emphasis was very much on meta-level concerns such as standardised APIs to ensure interoperability, service discovery, access control and resource management. There was also more emphasis in practice on computational resources rather than data management, for example GRAM (Grid Resource Allocation and Management offered an architecture to submit and monitor (batch) jobs in the Grid (see Fig. 5). Fig. 4. The architecture of the Globus Toolkit Version 5 (http://toolkit.globus. This is quite different though from the execution style required for org/toolkit/about.html). models of everywhere.

Fig. 5. The architecture of GRAM (http://toolkit.globus.org/toolkit/docs/4.0/execution/wsgram/WS_GRAM_Approach.html). architectures, for example climateprediction.net (mentioned above) and As mentioned above, the data side was quite primitive with an SETI@home11 utilising BOINC12 – a more peer-to-peer volunteer emphasis on low-level facilities for access to data remotely (GridFTP13) computing platform, but such initiatives were not mainstream and not and to assist in replication of data. While the grid was used successfully integrated into other grid initiatives. for a number eScience experiments involving elements of ‘big data’, the It is also important to emphasise that this was a research programme level of sophistication of data management was insufficientfor the rich and hence the underlying platforms were not stable, with frequent and heterogeneous data required for models of everywhere (and asso­ changes over time in terms of services and facilities on offer. As will be ciated needs in terms of discovery and navigation), and indeed for seen, this contrasts significantly with what is available now in terms of environmental data more generally. As will be seen below, this is one both capacity and stability of services (see Section 4.2). area that has advanced significantly in the last few years. More fundamentally, the services on offer did not have the level of There was also a general lack of experience of using grid computing

11 https://setiathome.berkeley.edu/. 12 https://boinc.berkeley.edu/. 13 http://toolkit.globus.org/toolkit/docs/latest-stable/gridftp/.

8 G.S. Blair et al. Environmental Modelling and Software 122 (2019) 104521 for the earth and environmental sciences. Other scientific communities the numerical schemes that are used to solve them, the discretisations were more advanced in terms of their use of grid computing and involved in applying those schemes to real data sets, and in the very embracing eScience. This led to a lack of services specificto this field,e. many “edge cases” for which special solutions are required. Benchmark g. to support environmental modelling in the grid, although parallel comparisons16 have shown how important these differences can be in developments such as the OpenMI (Open Modelling Interface) stan­ controlling the results of floodsimulations in various situations. There is dard14 as adopted by the Open Geospatial Consortium, provided also a strong body of research on training models based on historical important building blocks to support model deployment and (most data, and current observations can be used to steer future states of the crucially) interoperability across models. model (data assimilation) (Lahoz et al., 2010; Park and Xu, 2017). In summary, grid computing was important in terms of establishing a More generally, in the time period under consideration, there was a community working together on distributed architectures and infra­ deep concern that process models alone are not sufficient, and that structure and, in particular, for building a strong dialogue with the fundamental issues remain, for example, reasoning about uncertainty science community in terms of a new open, computational and data-rich and dealing with epistemic uncertainties and non-linearities in complex style of science. However, there are a number of limitations that systems. Indeed, this is the prime driver for models of everything. This impacted on the feasibility of models of everywhere, most notably the reflectsa sense that it is necessary to integrate the process model view of difficulties of access to computational and data resources, the lack of science with one that recognises the importance of data and associated sophistication of the distributed infrastructure particularly in terms of data analytic techniques (effectively data models). This is a significant data management, and the primitive nature of many of the services on cross-disciplinary challenge requiring input from environmental, com­ offer (also revisited below under ‘deployment at scale’). puter and mathematical scientists. At that time, this dialogue was not happening (discussed further in Section 4.1). Scientists also tended to 3.2.2. Data analytics focus on specificexperiments and studies to understand phenomena at a The ability to represent and access very-large scale and highly het­ given scale, so reasoning across scales was in its infancy. erogeneous data is important. Equally, it is crucial to have a range of Overall, even by 2012, there were major barriers around data ana­ techniques to make sense of this data. As can be seen above, this re­ lytics that made it very hard to support the realisation of models of quires: a move towards open data as a prerequisite for open science; the everywhere. availability of a rich set of techniques to analyse data; the ability to extend this reasoning across scales; and an integration of process 3.2.3. Modelling as a learning process modelling with data models produced to analyse the complex data (over As discussed above, the perspective of models as a learning process is and above the baseline requirement for storing, accessing and managing the most important but also most demanding aspect of implementing large and complex data-sets). Open data was in its infancy in the period models of everywhere requiring a new, adaptive approach to learning. 2007–2012 with data often regarded as core intellectual property with From our analysis, this breaks down into support for adaptive reasoning, many institutions seeking ways to commercialise their rich data-sets. explicitly representing consideration of uncertainty in this reasoning, There was however, a growing recognition with the complexity of and also being able to carry out both coarse-grained and (importantly) modern science, that a new, more open approach to data was necessary. fine-grained adaptations. For example, the Royal Society published “Science as an Open Enter­ In the field of computer science, in the time period under consider­ prise” in 201215 with a core recommendation: ation, a deep understanding of adaptive computing developed. For example, IBM launched a new initiative examining autonomic systems “Scientists should communicate the data they collect and the models in 2001, that is systems that can self-manage (mirroring the autonomic they create, to allow free and open access, and in ways that are functioning of nervous system in the human body), in terms of a range of intelligible, assessable and usable for other specialists in the same or self-* properties (e.g. self-awareness, self-configuration,self-healing and linked fields wherever they are in the world” self-optimisation) (Kephart and Chess, 2003). More generally, there was This built on the emergence of Science 2.0 (Waldrop, 2008), seeking a large literature around software architectures to support an open approach to science based on emerging Web standards self-adaptation (including reflective architectures), the use of control (particularly Web 2.0 technologies offering user generated content and a loops in decision making, and the use of more advanced machine move towards a more social web). In practice, however, at that time learning techniques to support higher levels of autonomic there were many cultural and technological barriers to a world where self-management, for example dealing with unknowns (Oreizy et al., data-sets were available for open access in common repositories. 1999; McKinley et al., 2004). However, there has been little or no In terms of making sense of data, the environmental sciences, consideration about how such techniques can be used in terms of including hydrology, make extensive use of process models to under­ adaptive environmental modelling. Existing environmental models are stand fundamental processes of nature and then use these models to also often written in older programming languages, most notably make future predictions. A wide range of process models have been Fortran, and tend to be monolithic, black box implementations, hence developed, for example in hydrology, where there have been recent do not lend themselves to the implementation of fine-grainedadaptation attempts to incorporate multiple process components into a common strategies. framework (e.g. Fenicia et al., 2011; Clark et al., 2015). In applications There is also the important requirement to represent and reason for flood risk assessment, there have been many codes routinely used about uncertainty explicitly as part of the adaptation process. At that both by industry and researchers to model the flowof water through the time, there was growing recognition of the need to represent uncertainty landscape, including interactions with physical infrastructure systems. in modelling and reasoning about uncertainty across scientific experi­ These codes can be categorised reasonably precisely in terms of the ments. For example, the UncertWeb project introduced techniques to approximations made to a set of physical governing equations (in the capture uncertainties as meta-data in web-based environments, with case of flood models this means simplifications of the fundamental details of the resultant UncertWeb framework published in early 2013 Navier-Stokes equations for fluid dynamics). Even so, there remain (Bastin et al., 2013). Researchers had also developed a number of differences in the interpretation of the prototypical physical equations, frameworks to reason about uncertainty in scientific experiments, including seminal work by Beven and Binley (1992, 2014) and others

14 http://www.openmi.org/. 15 https://royalsociety.org/~/media/Royal_Society_Content/policy/projects/ 16 https://www.gov.uk/government/publications/benchmarking-the-latest sape/2012-06-20-SAOE.pdf. -generation-of-2d-hydraulic-flood-modelling-packages.

9 G.S. Blair et al. Environmental Modelling and Software 122 (2019) 104521

Table 2 models of everywhere, mapping individual active spatial objects to Assessment of technological readiness (2007–2012). places and also explicitly representing the relationship between places Requirements Technological Most significant barriers (mainly in terms of fluxes). This is an attempt to seek a higher level of cluster readiness abstraction to support the deployment of models of everywhere. At the Technological ** Insufficientlevel of resources offered by time of writing, object-oriented computing and indeed distributed ob­ infrastructure the grid; lack of stability of grid jects were an important area of research reflected in the importance of platforms; lack of sophistication of technologies such as CORBA (Common Object Request Broker Archi­ services offered; lack of support for tecture).17 This approach is now largely superseded by alternative pro­ complex and highly heterogeneous gramming models, reflecting (most principally) difficulties in realising data. Data analytics * Lack of progress towards open data; distributed objects in Internet-scale developments. immaturity and lack of cross- disciplinary dialogue on data analytics; 3.3. Overall assessment and technological readiness lack of sophistication in dealing with uncertainty in process models; lack of research on process and data model It is clear from the assessment above that, even by the end of this integration; lack of research on period (2012), there were major technological barriers in terms of the reasoning across scales. deployment of models of everywhere. Our overall assessment is sum­ Modelling as a ** Lack of cross-disciplinary research marised in Table 2, which shows an overall rating against each of the learning process looking at adaptive techniques in environmental modelling; little support requirements together with the identification of the most important for fine-grained adaptation due to barriers. existing model structures; major issues As can be seen from Table 2, the overall readiness level is generally around representing and reasoning low to medium, with important barriers remaining across all categories. about uncertainties; lack of support for It is interesting to note that quite a number of the barriers are due to a epistemic uncertainties and dealing with non-linearities. silo-ed approach to research and can be addressed by more cross- Deployment at scale * Low level of abstraction in grid disciplinary collaboration in this area. Overall, we would argue that, environments; lack of programming in the period 2007–2012, the vision of models of everywhere was right models or frameworks to support but the technology was not ready. We continue our discussion by deployment at scale. considering how things have advanced to date, noting important de­ Key (readiness level): **** ¼ very high (no significant barriers); *** ¼ high velopments that make an implementation of the concept more realistic. (some significantbarriers); ** ¼ medium (a number of important barriers); * ¼ low (major barriers remain). 4. Technological landscape (current)

(see Renard et al., 2010; Vrugt and Sadegh, 2013; Nearing et al., 2016). 4.1. Overview of the landscape As discussed in the 2012 models of everywhere paper, Beven and Alcock (2012) were just starting to think about reasoning about uncertainties in The technological landscape has changed enormously since 2012 model selection or rejection (as a key part of models as a learning and indeed this is one of the key drivers to revisit the concept of models process). of everywhere in terms of technological readiness. In particular, there In summary, most of the building blocks were there by 2012, but the have been three mutually supportive areas of significant innovation, work was fragmented and split across many communities, and key issues namely cloud computing, data science and IoT. We look at each in turn remained over how to support more advanced reasoning of un­ below. certainties, including dealing with epistemic uncertainties. The concept of cloud computing first came to prominence in the last decade. For example, Amazon introduced Amazon Web Services, as an 3.2.4. Deployment at scale early cloud offering, in 2006. It has really been in the last five years Finally, and importantly, there is the key question of whether there though that the area has exploded in terms of scale and sophistication of was sufficient advancement at that time to support deployment of the the underlying services on offer. The cloud is defined as “a set of kind of scale that makes models of everywhere a reality. As discussed Internet-based application, storage and computing services sufficientto above, there are several key dimensions to support such large-scale support most users’ needs, thus enabling them to largely or totally deployment, including how easy it is to deploy individual models, dispense with local data storage and application software” (Coulouris what support there is to then repeat this across many places (at different et al., 2011). Cloud computing further promotes the view of everything as scales) and also whether the learning (and hence tailoring process) can a service, from low-level services such as data storage or virtualised be automated. The latter issue is intrinsically inked to the support for machines, through intermediary middleware services supporting par­ self-adaptive modelling and hence we focus more on the firsttwo issues. allel/distributed computing or database facilities, through to a plethora One of the key problems with deploying in the grid environments, or of applications (referred to as Infrastructure as a Service (IaaS), Platform indeed to other HPC facilities, is the low level of abstraction offered by as a Service (PaaS) and Software as a Service (SaaS)). Cloud computing software platforms. This was discussed in the consideration of the may be offered by companies and made available to others as services, i. Globus Toolkit above. Given this, the development and deployment of e. public clouds such as those offered by Amazon, Google, IBM, Microsoft even an individual model is a tedious, expensive and error prone process and Yahoo, or private clouds that can be established within an organi­ and this in itself is a barrier (Simm et al., 2018) to the deployment of sation or associated community (e.g. using open source software such as models of everywhere. This is a barrier to more general deployment OpenStack or CloudStack). Hybrid solutions are also possible where an across a range of places where the individual models need to be specific organisation may have their own private cloud but extended with extra to this place, both initially and also with the model or models refined capacity from public clouds. There is also a move in cloud computing over time to reflect the particular idiosyncrasies of this place. This im­ from owning resources to a more elastic use, where resources can be plies some form of software framework coupled with models as a requested (and paid for in the case of public clouds) only when required. learning process and, at that time, this was significantly beyond the The growth of cloud computing over the last fiveyears in particular state-of-the-art for model development. It is interesting to note that the initial models of everywhere paper (Beven, 2007) discusses an object-oriented approach to programming 17 http://www.omg.org/spec/CORBA/.

10 G.S. Blair et al. Environmental Modelling and Software 122 (2019) 104521

Fig. 6. Expected growth of IoT (https://www.i-scoop.eu/internet-of-things-guide/). has been phenomenal. For example, a report by Cisco indicates that in area that is relatively under-developed (it is though one of the major 2015 total data storage capacity in data centres is 382 EB, with this themes of the Data Science Institute19 at Lancaster University, UK). projected to grow to 1.8 ZB by 202018. There has also a corresponding Finally, there have been significantdevelopments in the area of IoT, growth in processing capabilities, and innovation around cloud services, with the Internet evolving from being an Internet of computers to one most notably for the purposes of this paper in the area PaaS, with a wide that is an Internet of ‘Things’, with the Things being everyday objects range of new services introduced to storage and process massive data- with embedded intelligence (Atzori et al, 2010). Experts predict that IoT sets, e.g. BigTable, Cassandra and HBase in terms of ‘big’ data storage will embrace over 50 billion devices by 2020 (see Fig. 6). and MapReduce and Apache Spark in terms of distributed computation As with data science, the main growth areas are expected to be (we return to this innovation in Section 4.2 below). around smart cities, logistics and transport and health and wellbeing. The developments in cloud computing have also stimulated interest There is also significant potential for IoT deployments in the natural in ‘big data’ or more generally data science, that is the science of ana­ environment; for example, Nundloll et al. (2019) describe an experiment lysing and making sense of very large and/or highly complex data-sets. in deploying an environmental IoT in a catchment in Wales. This is a fundamentally cross-disciplinary area of study involving, for It is clear however that this is an area in its infancy. The real sig­ example mathematical sciences, computational sciences and areas of nificance of IoT technology in this area is when data can be combined application. To support this, a number of such cross-disciplinary in­ with other sources including remote sensing, earth monitoring tech­ stitutes have been set up worldwide, including the Alan Turing Institute nologies, historical records and other data mined from the web in sup­ in the UK, and Data Science Institutes at Berkeley and Columbia in the port of models of everything (as discussed in Section 2.2). United States, and Imperial College London, UCL, Warwick and Lan­ These technologies are mutually supportive in that cloud computing caster also in the UK, amongst many others. With the huge investments provides the underlying very large-scale and elastic pool of resources in data science, there is a growing body of literature on techniques to and associated services to store, process and present very large data-sets. extract meaning for large and complex data-sets, including techniques Data science provides a range of methods to make sense of complex data that embrace unstructured data. More importantly, there is a dialogue and extract meaning from this data, and IoT technology provides access across disciplines to understand how different techniques can work to real-time observations on a very large scale. This symbiotic rela­ together to resolve major challenges around big data. tionship is illustrated in Fig. 7. A lot of the research in data science is targeted at underlying algo­ rithms and their scalability and efficiency.There is also an emphasis on more applied research, most notably in the areas of eCommerce and 4.2. Addressing the requirements marketing, smart cities, logistics and transport, and also health and wellbeing (Blair et al., 2019). There is also huge potential in data science 4.2.1. Underlying technological infrastructure for the natural environment although, perhaps surprisingly, this is an The underlying technological infrastructure has changed signifi­ cantly in terms of both the availability of large-scale computational re­ sources and also the stability of the associated platforms. There has also

18 https://www.cisco.com/c/dam/en/us/solutions/collateral/service-pro vider/global-cloud-index-gci/white-paper-c11-738085.pdf 19 http://www.lancaster.ac.uk/dsi/.

11 G.S. Blair et al. Environmental Modelling and Software 122 (2019) 104521

Services to support scientific workflow in the cloud, e.g. Taverna38 and Kepler.39

There is also strong interest in achieving integration between cloud computing and IoT technology, although this work is at early stages of development. Most significantly, there is a rapidly growing body of research around edge computing (sometimes also referred to as fog computing) to provide intermediary storage and processing capabilities closer to end devices (Lopez et al., 2015). For example, edge devices can be used to carry out initial analyses of real-time streaming data from IoT devices, with only aggregate or significant data then sent to the cloud environment. Edge computing can also support the integration of mobile devices (Ahmed and Ahmed, 2016). The technological landscape has therefore changed dramatically with many of the technologies now in place to support models of Fig. 7. The symbiotic relationship between cloud computing, data science everywhere. A number of significant barriers though still remain, most and IoT. notably the lack of standardisation in cloud computing, with different providers offering quite distinct programming paradigms and APIs. This been significant innovation in this area with an explosion of new ser­ leads to problems of vendor lock-in and also difficulties in managing vices now available. computations that span multiple providers (including hybrid cloud en­ The developments in cloud computing, as documented above, are vironments embracing public and private providers). There are also particularly significant in this regard. Whereas grid computing was a difficultiesin programming and managing the underlying technological rather niche and immature technology, cloud computing provides access infrastructure especially when combining cloud computing with IoT to an abundance of underlying resources (in terms of both computation technology, the result being a rather sophisticated but highly complex and storage) and also an ever increasing set of associated services. The system in itself (more accurately described as a system of systems services most relevant for models of everywhere include: (Jamshidi, 2011)). We return to this point below (under deployment at scale). A rich underlying set of programming constructs to support distrib­ uted programming, including service-oriented architecture, con­ 4.2.2. Data analytics tainers and microservices/serverless computing, e.g. Docker20 and There have been similar advances in terms of data analytics. There is Rocket21 (for containers) and OpenWhisk22 and AWS Lambda23 (for now much more awareness of the need to move to open science, microservices/serverless approaches); including the need for open data policies. Governments and research Services to support the subsequent deployment and execution of funding bodies are also moving towards the need for open data, and complex distributed executions, e.g. Kubernetes24 and ZooKeeper25; there is a similar move towards more open, reproducible or repeatable A range of underlying storage architectures that cater for very large science.40 It is fair to say though that important barriers remain and scale and highly heterogeneous data-sets, including unstructured these tend to be cultural rather than technological.41 data, e.g. Cassandra26 HBase27 and MongoDB28; In terms of making sense of data, the emergence of data science as a Parallel and distributed programming paradigms to process and discipline is strongly encouraging albeit with a need to attract more data manipulate such data-sets, including historical and streaming data- scientists to work on environmental challenges and problems (Blair sets, e.g. the Hadoop framework29 MapReduce (Dean and Ghema­ et al., 2019). The most important development has been the wat, 2008), Spark30 and Pig31; cross-disciplinary dialogue that is now happening within the data sci­ Techniques to semantically enrich and subsequently navigate very ence community involving statisticians, computer scientists and domain large scale and highly heterogeneous data-sets, e.g. building on experts (amongst others). This is very significant and is leading to technologies such as OWL, SPARQL and RDF32 and also graph da­ breakthroughs in terms of efficient algorithms and their application in tabases such as GraphDB33 AllegroGraph34 or Neo4j35; important societal problems. In the Data Science Institute at Lancaster, Software frameworks and associated libraries to support data ana­ for example, we are interested in how contemporary techniques such as lytics, e.g. Mahout36 or RStudio37; extreme value theory, changepoint analysis, time-series analyses and statistical/machine learning can be applied to complex environmental data. We are also particularly interested in how resultant data models can co-exist and inform process models, combining stochastic and 20 https://www.docker.com/. deterministic understanding of complex environmental phenomena. 21 https://coreos.com/blog/rocket.html. 22 https://openwhisk.incubator.apache.org/. While there is increasing awareness of the potential of such approaches, 23 https://aws.amazon.com/lambda/. this is a relatively immature area; solutions tend to be ad hoc and a more 24 https://kubernetes.io/. principled understanding of how such techniques can work together has 25 https://zookeeper.apache.org/. yet to emerge. There is also a similar narrative around reasoning across 26 http://cassandra.apache.org/. scales; while there is more experience of this in the earth and environ­ 27 https://hbase.apache.org/. mental sciences, the solutions are also quite ad hoc and often not shared 28 https://www.mongodb.com/. across different areas of study. 29 http://hadoop.apache.org/. 30 https://spark.apache.org/. 31 https://pig.apache.org/. 32 https://www.w3.org/standards/semanticweb/. 38 http://www.taverna.org.uk/. 33 http://graphdb.ontotext.com/. 39 https://kepler-project.org/. 34 https://allegrograph.com/. 40 http://royalsociety.org/uploadedFiles/Royal_Society_Content/policy/proje 35 https://neo4j.com/. cts/sape/2012-06-20-SAOE.pdf. 36 http://mahout.apache.org/. 41 http://crc.nottingham.ac.uk/projects/rcs/OpenScience_Report-Sarah_Curri 37 https://www.rstudio.com/. er.pdf.

12 G.S. Blair et al. Environmental Modelling and Software 122 (2019) 104521

In conclusion, there have been significantdevelopments since 2012, used in a number of local flood risk assessments and water resource particularly in terms of the required cross-disciplinary dialogue around management projects (e.g. Lane et al., 2011; Landstrom€ et al., 2011; data science for the natural environment. Nevertheless, this work is still Evers et al., 2012; Maskrey et al., 2016; Ferr�e, 2017; Basco-Carrera et al., at a relatively early stage of maturity with important (and fairly unique) 2017; see also Voinov et al., 2016). An important component of this challenges of this area still to be addressed (Blair et al., 2019). learning process is the potential to visualise model outcomes at scales that allows consideration of local detail by local stakeholders so that 4.2.3. Modelling as a learning process different scenarios (and their uncertainties) can be explored in collab­ As discussed above, many of the building blocks for models as a orative ways (Hankin et al., 2017, see below). learning process were already in place by 2012, albeit fragmented across different communities. The state-of-the-art now is quite similar and 4.2.4. Deployment at scale there remains a need for stronger cross-disciplinary dialogue between There have been several important developments in terms of researchers working on environmental modelling, data science and deploying at scale, with containers in particular making it far easier to adaptive/autonomic computing. The most significant changes in this deploy and subsequently manage executing models in the cloud in a time have been: i) advances in areas such as statistical and machine platform-independent manner. The availability of cloud-based workflow learning that directly supports meta-reasoning about model selection engines is also significant, although there are questions over whether and rejection, and ii) the computational capacity offered by the cloud, workflow offers the right abstraction for all elements of environmental which supports both the execution of complex environmental models in modelling (Blair et al., 2019). the cloud, and the execution of associated reasoning algorithms. More generally, there is still a problem-implementation gap (France There has been little progress on the crucial area of uncertainty – in and Rumpe, 2007) between what scientists would like to do in the cloud, terms of representing uncertainty explicitly in computations, and also and the level of support offered by existing technologies and services, reasoning about uncertainty as part of the decision making process. with a prior knowledge of the underlying technical details required. This More generally, one of the most significantdevelopments over this time makes it very time consuming and also error prone to execute envi­ period in the environmental sciences has been the recognition of the ronmental models or ensemble models in the cloud, and also requires unavoidable uncertainties associated with predictive models, whether access to computing expertise, which may be a scarce resource in many used for simulation or forecasting purposes (e.g. Beven, 2009). As noted environmental research labs. In the context of models of everywhere, the earlier, a primary driver for the models of everywhere concepts was the models may themselves be quite complex, involving different ensembles potential for using local information to constrain local uncertainties in of process models or the integration of process and data models for predicting local variables. This is not just a problem of assessing the example. This makes the cost quite prohibitive, especially when this statistics of model residuals (though many studies have approached the would entail the deployment of many instances of these models at many problem in this way). This is because many sources of uncertainty are different places and scales. the result of lack of knowledge about processes, variables or forcings Software frameworks offer a promising technology to support the (particularly into the future) that are not necessarily easily represented more rapid deployment of recurrent software architectures (Johnson, in simple statistical forms. In particular, input uncertainties will be 1997). Software frameworks are tailored towards particular domains of processed through the nonlinear dynamics of a model to produce com­ application, abstracting over the lower level details and capturing the plex nonstationary residual structures, that will then interact with un­ commonalities within that domain, while allowing some degree of certainties in the observational data used in model evaluation, which specialisation. They are heavily used in cloud computing, for example might also have associated epistemic uncertainties (e.g. in hydrology, MapReduce abstracts over the complexities of managing a large and arising from the rating curves used in the estimation of river flows,(see complex underlying cloud infrastructure and supports the execution of Westerberg et al., 2011; Westerberg and McMillan, 2015; Coxon et al., distributed algorithms in the cloud, allowing the user to plug-in and 2015)). These issues underlay the development of the Generalised specialise the computation through providing application specific map Likelihood Uncertainty Estimation (GLUE) methodology (e.g. Beven and and reduce operations (Dean and Ghemawat, 2008). At present though Binley, 1992, 2014; Beven, 2006, 2016), which includes some statistical such frameworks tend to be relatively generic, e.g. dealing with methods as special cases. distributed computation, and are not specific enough to support some­ As new data become available, it should be possible to learn more thing as domain dependent as environmental models. about the characteristics of the uncertainties associated with different In terms of programming models, distributed objects have now been predictands, at least where the new data are informative (that this may replaced by alternative paradigms supported in the cloud, around not always be the case has been shown by Beven et al., 2011 and Beven service-oriented architecture enhanced by concepts such as deployment and Smith, 2015). In doing so, it will be possible to combine prior in­ in containers and optionally support for microservices. This approach formation with the new information to update the estimates. This leads overcomes the problems associated with distributed object technology, naturally to a form of Bayesian reasoning, where uncertainties can be being much better suited to large-scale Internet wide deployment. Some represented as probabilities, but much more research is needed in research is required though in terms of how to map models of every­ environmental models to understand how best to definethe likelihoods where on to such programming concepts. used in the Bayesian methodology. Simple statistical likelihood func­ tions used with multiplicative Bayesian updating appear to lead to 4.3. Overall assessment and technological readiness overconditioning of model parameters because they do not take any account of the epistemic nature of sources of uncertainty (e.g. Beven It is apparent from the discussion above that there have been sig­ et al., 2011; Beven, 2016, 2019). There are also issues of whether even nificant advances in the underlying technology to support the vision of the best models might be fit-for-purpose for the type of decisions that models of everywhere. Equally, a number of barriers remain. Our overall they might be used for (see the discussion of Beven and Lane, 2019). assessment is summarised in Table 3, repeating the style of analysis A critical aspect of the models of everywhere concept is the potential carried out for the period 2007–2012 (in Table 2). for using local knowledge within this learning process to improve the From this analysis, we can see that there have been significantshifts representations of places. This is where information from local stake­ in readiness around the underlying technological infrastructure and in holders and the Internet of Things might be used in local model evalu­ data analytics and also (partially) around supporting deployment at ations to reject potential model structures and constrain uncertainties in scale. Support for modelling as a learning process has not changed much parameterisations and outcomes. This can be considered as an extension although the developments in cloud computing and data analytics does of the collaborative and participatory learning that has already been offer the potential (as yet unrealised) of significantadvances in this area.

13 G.S. Blair et al. Environmental Modelling and Software 122 (2019) 104521

Table 3 technology). The initial deployment is targeting a specificplace with the Assessment of technological readiness now. intention of having a modelling framework that is able to capture and Requirements Technological Most significant barriers indeed learn the idiosyncrasies of that place. The overarching goal of cluster readiness this work is to identify software architectural principles for imple­ Technological *** Lack of standardisation in cloud menting models of everywhere in the cloud with a view to supporting infrastructure computing; difficulties in managing more widespread deployment of models of everywhere at different complex underlying distribute systems places and at different scales (discussed further below). We are also infrastructure (or systems of systems). strongly interested in supporting decision making at different scales, for Data analytics *** Cultural impediments to open data; example over the potential effectiveness of different natural flood need to address particular data science challenges related to the environment, management strategies and also over how to use constrained national or including around process and data regional budgets most effectively. model integration and on reasoning The high-level systems architecture is as shown in Fig. 8 below. across scales. This recognises the existence of multiple sources of data and the Modelling as a ** Lack of cross-disciplinary research learning process looking at adaptive techniques in importance of integrating this data and, in turn, looking at model inte­ environmental modelling; little support gration on top of this, which includes both data and process models for fine-grained adaptation due to coupled together. The top layer then supports interrogation and existing model structures; major issues querying of the information about that particular place. This maps on to around representing and reasoning a more detailed cloud-based systems architecture exploiting the range of about uncertainties; lack of support for epistemic uncertainties and dealing services supported by the cloud in each of these areas. This is shown in with non-linearities. Fig. 9. Deployment at scale *** Problem-implementation gap and the Through this work, we intend to overcome the remaining techno­ need to raise the level of abstraction in logical barriers around models of everywhere and therefore open the terms of supporting execution in the cloud; lack of experience of using cloud door to the desired, more widespread deployment of the concept in programming paradigms in this area. hydrology and beyond. Further details of this research can be found in Towe et al. (2019) Key (readiness level): **** ¼ very high (no significant barriers); *** ¼ high with Edwards et al. (2017) also discussing human and societal di­ (some significantbarriers); ** ¼ medium (a number of important barriers); * ¼ low (major barriers remain). mensions of the research around decision making. Our overall research roadmap is summarised in Table 4, which shows the key research questions and associated areas of investigation. The need for cross-disciplinary dialogue is a common theme across all Having deployed the concept of models of everywhere at a given these areas and is crucial in terms of addressing the remaining barriers. place, we then hope to consider how to generalise the approach to model Overall, we conclude that, in terms of technological readiness, the other environmental facets at that place, to model other places time is right to carry out large-scale experiments of the concept of (including places at other scales), and to support coherent reasoning models of everywhere. The next section explores ongoing research in across scales. This also involves key questions over discretisation, this area. especially given the fact that data may exist at different scales for a given place. In the longer term, we will also be interested in how the concept 5. Initial experiments and research roadmap can be applied to other areas of environmental science, including biodiversity and soil management, and also how models of everywhere Ongoing research at Lancaster is looking at an experimental can help us in understanding the inter-dependencies across such areas (a deployment of the concept of modes of everywhere in the area of hy­ key motivation of models of everything as discussed above). This quickly drology, supported by recent developments in cloud computing, data becomes a large research agenda that goes beyond the scope of our science and new sources of data (including but not limited to IoT research study, and we hope to stimulate other research to address these

Fig. 8. High-level systems architecture.

14 G.S. Blair et al. Environmental Modelling and Software 122 (2019) 104521

Fig. 9. More detailed cloud-based systems architecture.

key issues. Table 4 Overall research roadmaps. 6. Related work Research questions Potential solutions

How to effectively and efficiently map Identifying appropriate software There are, of course, already models of everywhere (and to some the concept of models of everywhere architectures for models of everywhere sense everything) in the sense of global earth system science models that on to contemporary cloud and examining the mapping on such have been developed from global atmosphere and ocean dynamic cir­ programming paradigms and architectures to service-oriented culation models. Examples are the Japanese Earth Simulator (Habata associated services? architecture, containers and microservices. et al., 2003); EC-Earth (Hazeleger et al., 2010) and the Community Earth How to deploy models of everywhere at Investigating the role of specialised System Model (Hurrell et al., 2013). While these are still limited to grid scale, with a view to supporting the software frameworks for models of resolutions of kilometres for global applications, these systems rapid deployment of new instances? everywhere, coupled with the use of commonly include the possibility of nesting finer grid domains, with techniques from the model-driven boundary conditions provided by global simulations. The philosophy of engineering community, especially around domain specific languages. such approaches, however, has been quite different from that presented How to achieve data integration given Investigating underlying cloud storage here. The model structure and parameterisations are generally fixed,so highly heterogeneous sources of data technologies such as Cassandra or HBase, that application everywhere has been a matter of finding appropriate (including unstructured and more and associated technologies for semantic effective parameter values for different grid locations using whatever structured data)? integration (especially ontologies and linked data). data might be available. How to make sense of this complex data? Explore a range of appropriate data There have also been some attempts to produce distributed model­ science methods in isolation and in ling systems that could be applied widely at finer grid scales, allowing combination. for a more flexiblechoice of structures. In hydrology, for example, there How to achieve integration between process Seek underlying principles related to was the inter-agency Object Modelling System (OMS) of Leavesley et al. models and data models? process and data model integration; investigate how this can support a (2002) that developed into a more general modelling system (Lloyd reduction of uncertainty and also how it et al., 2011; David et al., 2013). More recently, Clark et al. (2015) have can deal with epistemic uncertainty and proposed the Structure for Unifying Multiple Modeling Alternatives non-linearities. (SUMMA) framework. In both cases, several different model represen­ How to realise the concept of modelling as Seek to bring together expertise in a learning process? adaptive/autonomic computing and tations were provided for the user to choose from in producing a models environmental modelling; seek ways to structure for a particular catchment area. Within these systems, the annotate computations with uncertainty expertise of users can be elicited to defineappropriate model structures, and use this in the reasoning/adaptation although identificationof appropriate model parameters and hypothesis process; seek approaches to support both testing of competing model structures are still major issues (e.g. Weiler coarse and fine-grained adaptation; seek approaches to accept, refine or reject and Beven, 2015). The type of approaches presented here could be used models, including consideration of the with such systems. limits of acceptability approach. In flood risk management, there are inherent motivations to view

15 G.S. Blair et al. Environmental Modelling and Software 122 (2019) 104521

Fig. 10. (a) Co-production and local improvements of a flood risk model by Flanders Environment Agency (Vlaamse Milieumaatschappij, VMM), illustrating place- based dialogue about local model errors and data; (b) Difference viewer allowing cooperating parties to assess model updates, supported by contextual information. modelling as a process of learning about places, stemming from two on the incorporation of knowledge about specific localities from multi­ distinctive features of the problem. Firstly, the likelihood and impacts of ple stakeholders. One such system42 has been developed for the Flanders flooding, although driven ultimately by weather and climate, are Environment Agency (Vlaamse Milieumaatschappij, VMM) for mapping strongly influencedby local features of landscapes, land use, and human areas at risk of floodingfrom surface runoff. Here, a web-based interface activities. In some cases, even very small topographic features or creates a shared collaboration space enabling local partners, such as infrastructure assets can have a significant control on flood risk, for town councils or local water- and sewer managers, to engage in a dia­ example by directing the flow of flood waters towards or away from logue about model improvements (Fig. 10). buildings. This information is not always captured well (if at all) in Important features that have been incorporated within the modelling generic model structures and data sets. Secondly, the assessment of flood through this process include areas where flood water can be held back risk involves gathering information about extreme events, which tends by embankments, or drained by pumps and control structures (e.g. gates, to place an emphasis on historical knowledge, often reliant upon sluices, weirs, culverts) that are known to local staff and may not be detailed knowledge of the locality for interpretation, and on the represented adequately without that detailed local knowledge, such as updating of risk assessments as new observations become available. For these reasons, some floodrisk management applications already implement frameworks for iterative co-production of modelling, based 42 https://www.youtube.com/watch?v¼lEWtELDWTsU&feature¼youtu. be&t¼74.

16 G.S. Blair et al. Environmental Modelling and Software 122 (2019) 104521

Fig. 11. Engagement, capture of local data and knowledge on the iTable.

Fig. 12. FiGIS and model outputs on the iTable following one of the workshops. the flood retention storage and flow control structures represented in where model-based flood risk assessments are published and can be Fig. 10. challenged by individuals, on the basis of detailed local knowledge, via a The co-production website is shared with professional partners and web-based facility. within its first three months of operation enabled more than 9000 Stakeholder interaction with the outputs of models of everywhere detailed improvements to data and modelling to be implemented can also made more direct and local, to demonstrate and explain as­ together with nearly 300 re-simulations for 103 sub-models involving sumptions, and to alter inputs or outputs as a way of iterating to a co- 150 organisations, and resulting in positive evaluations of model im­ produced model of a place. In this way local models will be better provements at 500 locations across the whole of Flanders. constrained by the local stakeholder information, and more trusted if The VMM example discussed here explicitly exposes modelling as done well. As an example of this way of working, a series of workshops, part of a process of learning about place through knowledge sharing, sponsored by Natural England, were used to engage local farming supported by digital technology. A more constrained example is the communities on the potential benefits of ‘working with natural pro­ floodhazard mapping of the FEMA National Flood Insurance Program43 cesses’ (WWNP) to mitigate flooding, often called Natural Flood Man­ agement (see Hankin et al., 2017 for modelling concepts). Two engagement devices were used. An Augmented Reality Sandbox 43 https://www.fema.gov/national-flood-insurance-program-flood-hazard was used to provide real-time feedback on the response of flowpathways -mapping. to a user sculpting channels in sand. Virtual inputs to the sandbox are

17 G.S. Blair et al. Environmental Modelling and Software 122 (2019) 104521 controlled by waving a hand over the sandbox. Water flowpathways and In conclusion, the concept of models of everywhere is even more storages are then shown by projection of blue onto the sand. This was important than when it was first proposed given the environmental used as a precursor to the demonstration of more quantitative modelling challenges we face, and this paper has demonstrated that the time is results, with visualisations being projected onto a large interactive right for more large-scale experimentation with the concept. Further iTable and shown in Fig. 11. research though is clearly needed to deliver against this vision and this Engagement using the iTable followed 4 key steps: research has to be fundamentally trans-disciplinary in nature bringing together environmental scientists, data scientists and computer scien­ 1) The model, and modelling assumptions, is explained following a tists to reach a common understanding of representing complex envi­ general discussion of NFM. A baseline run of the model under flood ronmental data, making sense of the resultant highly heterogeneous conditions is discussed for acceptability in terms of local knowledge data, integrating knowledge from process and data models, and rolling of patterns of flooding. out the concept of scale. Equally importantly, there is a need to work 2) A GIS package is used to show different layers for the local catchment closely with social scientists to understand the human and societal issues and bring in layers of potential opportunities that might be based on related to models of everywhere, including the necessary cultural shift to national strategic layers.44 These are discussed with the participants open data, treatments of security and privacy and the role of commu­ and options and be switched on and off according to where the nities in ensuring models represent the peculiarity of places. catchment partners identify where they would be happy to try different sorts of NFM. Acknowledgements 3) The measures are then plugged into the model and the model is run to predict the outcome of the changed configuration. This work is supported by EPSRC Grants: EP/P002285/1 and EP/ 4) The distributed changes to the hydrological responses are explored N027736/1 (Blair’s Senior Fellowship on the Role of Digital Technology with the partners to understand model behaviour and effectiveness. in Understanding, Mitigating and Adapting to Environmental Change, Fig. 12 shows the outputs following one of the workshops. and Models in the Cloud: Generative Software Frameworks to Support the Execution of Environmental Models in the Cloud, respectively). We Interestingly, in some discussions of significantmeasures in front of would also like to thank our colleagues in the Data Science Institute and their peers, landowners came up with interventions that were very sig­ the Centre for Ecology and Hydrology (CEH) for input and for providing nificant,for example sacrificingsome summer irrigation storage to act as such an inspiring environment for cross-disciplinary dialogue around floodstorage areas in the winter season. This process of standing around data science for the natural environment. Finally, we thank the terrific the tables and discussing the catchment with peers appeared to make inputs from a wide range of researchers and practitioners in partner people more forthcoming, and the process more effective. The process organisations, including from JBA Trust, JBA Consulting, the Environ­ does, however, require a different approach to modelling since some of ment Agency, United Utilities, the European Centre For Medium Range the feedback from local stakeholders might not be positive. It is there­ Weather Forecasts, the Oasis Loss Modelling Framework, and the fore important that the modellers involved should not be too protective Environmental Change Institute (Oxford University). about their model, but should recognise and explain the assumptions and uncertainties inherent in the modelling process and be prepared to References incorporate new knowledge as far as possible. Finding ways of conveying (and if necessary recalculating) prediction uncertainties Ahmed, A., Ahmed, E., 2016. A survey on mobile edge computing. In: Proc. 10th IEEE International Conference on Intelligent Systems and Control (ISCO’16). https://doi. within this context is the subject of on-going work. org/10.1109/ISCO.2016.7727082. Atzori, L., Iera, A., Morabito, G., 2010. The Internet of things: a survey. Comput. 7. Conclusions Network. 54 (15), 2787–2805. https://doi.org/10.1016/j.comnet.2010.05.010. Basco-Carrera, L., Warren, A., van Beek, E., Jonoski, A., Giardino, A., 2017. Collaborative modelling or participatory modelling? A framework for water resources This paper has carried out a systematic analysis of the technological management. Environ. Model. Softw 91, 95–110. readiness for the concept of models of everywhere. In particular, the Bastin, L., Cornford, D., Jones, R., Heuvelink, G.B.M., Pebesma, E., Stasch, C., Nativi, S., Mazzetti, P., Williams, M., 2013. Managing uncertainty in integrated environmental paper has examined the various dimensions associated with models of modelling: the UncertWeb framework. Environ. Model. Softw 39, 116–134. https:// everywhere and determined a set of technological requirements that doi.org/10.1016/j.envsoft.2012.02.008. must be met for the successful large-scale deployment of the concept. Beven, K.J., 2006. A manifesto for the equifinality . J. Hydrol. (Wellingt. North) – This set of requirements was then used to compare technological read­ 320, 18 36. Beven, K.J., 2007. Working towards integrated environmental models of everywhere: iness when models of everywhere was first proposed against the readi­ uncertainty, data, and modelling as a learning process. Hydrol. Earth Syst. Sci. 11 ness levels now, showing that the time is right for widespread (1), 460–467. experimentation and deployment of the concept. Although many of the Beven, K.J., 2009. Environmental Modelling: an Uncertain Future? Routledge, London. Hb: 978-0-415-46302-7; Pb: 978-0-415-45759-0). http://www.uncertain-future.org.uk. technological barriers have been removed, key research issues remain Beven, K.J., 2016. EGU Leonardo Lecture: facets of Hydrology - epistemic error, non- and the paper has highlighted a set of open research questions that must stationarity, likelihood, hypothesis testing, and communication. Hydrol. Sci. J. 61 – be addressed before progress can be made. Importantly, this research (9), 1652 1665. https://doi.org/10.1080/02626667.2015.1031761. Beven, K.J., 2019. Towards a Methodology for Testing Models as Hypotheses in the agenda represents a shift in environmental modelling from an approach Inexact Sciences. Proceedings Royal Society A (in press). centred on process understanding (through deterministic models) to one Beven, K.J., Alcock, R., 2012. Modelling everything everywhere: a new approach to that embraces a more data-centric perspective, whereby the two ap­ decision making for water management under uncertainty. Freshw. Biol. 56, 124–132. https://doi.org/10.1111/j.1365-2427.2011.02592.x. proaches can work in tandem to achieve a deeper understanding of Beven, K.J., Binley, A.M., 1992. The future of distributed models: model calibration and specific places and hence to support more nuanced decision making uncertainty prediction. Hydrol. Process. 6, 279–298. about a given place. Beven, K.J., Binley, A.M., 2014. GLUE, 20 years on. Hydrol. Process. 28 (24), 5897–5918. https://doi.org/10.1002/hyp.10082. There are also key limitations of the models of everywhere approach Beven, K.J., Cloke, H.L., 2012. Comment on “Hyperresolution global land surface including the computational requirements if rolled out on a large scale modeling: Meeting a grand challenge for monitoring Earth’s terrestrial water” by and also the advances needed in environmental science to develop scale Eric F. Wood et al. Water Resour. Res. 48 (1). dependent parameterisations and embrace an underlying science of Beven, K.J., Cloke, H., Pappenberger, F., Lamb, R., Hunter, N., 2015. Hyperresolution information and hyperresolution ignorance in modelling the hydrology of the land everything, everywhere and at all times. surface. Sci. China Earth Sci. 58 (1), 25–35. Beven, K.J., Hall, J.W. (Eds.), 2014. Applied Uncertainty Estimation in Flood Risk Management. Imperial College Press, London.

44 http://naturalprocesses.jbahosting.com/.

18 G.S. Blair et al. Environmental Modelling and Software 122 (2019) 104521

Beven, K.J., Lane, S., 2019. Invalidation of models and fitness-for-purpose:a rejectionist Hurrell, J.W., Holland, M.M., Gent, P.R., Ghan, S., Kay, J.E., Kushner, P.J., Lamarque, J. approach. In: Beisbart, C., Saam, N.J. (Eds.), Computer Simulation Validation - F., Large, W.G., Lawrence, D., Lindsay, K., Lipscomb, W.H., 2013. The community Fundamental Concepts, Methodological Frameworks, and Philosophical earth system model: a framework for collaborative research. Bull. Am. Meteorol. Soc. Perspectives. Springer, Cham, pp. 145–171 (Chapter 6) in. 94 (9), 1339–1360. Beven, K.J., Smith, P.J., 2015. Concepts of information content and likelihood in Jamshidi, M. (Ed.), 2011. System of Systems Engineering: Innovations for the Twenty- parameter calibration for hydrological simulation models. ASCE J. Hydrol. Eng. First Century, vol. 58. John Wiley & Sons. https://doi.org/10.1061/(ASCE)HE.1943-5584.0000991. Johnson, R.E., 1997. Frameworks ¼ componentsþpatterns. Commun. ACM 40 (10), Beven, K., Smith, P.J., Wood, A., 2011. On the colour and spin of epistemic error (and 39–42. https://doi.org/10.1145/262793.262799. what we might do about it). Hydrol. Earth Syst. Sci. 15, 3123–3133. https://doi.org/ Kephart, J.O., Chess, D.M., 2003. The vision of autonomic computing. Computer 36 (1), 10.5194/hess-15-3123-2011. 41–50. https://doi.org/10.1109/MC.2003.1160055. Bierkens, M.F., Bell, V.A., Burek, P., Chaney, N., Condon, L.E., David, C.H., de Roo, A., Kon, F., Costa, F., Blair, G.S., Campbell, R.H., 2002. The case for reflective middleware. Doll,€ P., Drost, N., Famiglietti, J.S., Florke,€ M., 2015. Hyper-resolution global Commun. ACM 45 (6), 33–38. hydrological modelling: what is next? “Everywhere and locally relevant”. Hydrol. Landstrom,€ C., Whatmore, S.J., Lane, S.N., Odoni, N.A., Ward, N., Bradley, S., 2011. Process. 29 (2), 310–320. Coproducing flood risk knowledge: redistributing expertise in critical ‘participatory Blair, G.S., Henrys, P., Leeson, A., Watkins, J., Eastoe, E., Jarvis, S., Young, P.J., 2019. modelling’. Environ. Plan. 43 (7), 1617–1633. Data Science of the Natural Environment: a Research Roadmap (In preparation). Lane, S.N., Odoni, N., Landstrom,€ C., Whatmore, S.J., Ward, N., Bradley, S., 2011. Doing Blei, D.M., 2014. Build, compute, critique, repeat: data analysis with latent variable floodrisk science differently: an experiment in radical scientificmethod. Trans. Inst. models. Ann. Rev. Stat. Appl. 1, 203–232. Br. Geogr. 36 (1), 15–36. Box, G., 1980. Sampling and Bayes’ inference in scientificmodeling and robustness. J. R. Lahoz, W., Khattatov, B., Menard, R. (Eds.), 2010. Data Assimilation: Making Sense of Stat. Soc. Ser. A 143 (4), 383–430. Observations. Springer Science & Business Media. Clark, M.P., Nijssen, B., Lundquist, J.D., Kavetski, D., Rupp, D.E., Woods, R.A., Freer, J. Leavesley, G.H., Markstrom, S.L., Restrepo, P.J., Viger, R.J., 2002. A modular approach E., Gutmann, E.D., Wood, A.W., Gochis, D.J., Rasmussen, R.M., 2015. A unified for addressing model design, scale, and parameters. Hydrol. Process. 16, 173–187. approach for process-based hydrologic modeling: 2. Model implementation and case Lopez, P.G., Montresor, A., Epema, D., Datta, A., Higashino, T., Iamnitchi, A., studies. Water Resour. Res. 51 (4), 2515–2542. Barcellos, M., Felber, P., Riviere, E., 2015. Edge-centric computing: vision and Coulouris, G., Dollimore, J., Kindberg, T., 2011. Distributed Systems: Concepts and challenges. SIGCOMM Comput. Commun. Rev. 45 (5), 37–42. https://doi.org/1 Design, fifth ed. Pearson. 0.1145/2831347.2831354. Coxon, G., Freer, J., Westerberg, I.K., Wagener, T., Woods, R., Smith, P.J., 2015. A novel Lloyd, W., David, O., Ascough, J.C., Rojas, K.W., Carlson, J.R., Leavesley, G.H., framework for discharge uncertainty quantification applied to 500 UK gauging Krause, P., Green, T.R., Ahuja, L.R., 2011. Environmental modeling framework stations. Water Resour. Res. 51 (7), 5531–5546. invasiveness: analysis and implications. Environ. Model. Softw 26 (10), 1240–1250. Dadson, S.J., Hall, J.W., Murgatroyd, A., Acreman, M., Bates, P., Beven, K.J., McCallum, I., Liu, W., See, L., et al., 2016. Technologies to support community flood Heathwaite, L., Holden, J., Holman, I.P., Lane, S.N., O’Connell, P.E., Penning- disaster risk reduction. Int. J. Disaster Risk Sci. 7, 198. https://doi.org/10.100 Rowsell, E., Reynard, N., Sear, D., Thorne, C., Wilby, R., 2017. A restatement of the 7/s13753-016-0086-5. natural science evidence concerning catchment-based “natural” flood management McDonnell, J.J., Beven, K.J., 2014. Debates—the future of hydrological sciences: a in the United Kingdom. Proc. R. Soc. A 473, 20160706. https://doi.org/10. (common) path forward? A call to action aimed at understanding velocities, 1098/rspa.2016.0706. celerities, and residence time distributions of the headwater hydrograph. Water David, O., Ascough, J.C., Lloyd, W., Green, T.R., Rojas, K.W., Leavesley, G.H., Ahuja, L. Resour. Res. 50 https://doi.org/10.1002/2013WR015141. R., 2013. A software engineering perspective on environmental modeling framework McKinley, P.K., Sadjadi, S.M., Kasten, E.P., Cheng, B.H.C., 2004. Composing adaptive design: the object modeling system. Environ. Model. Softw 39, 201–213. software. Computer 37 (7), 56–64. https://doi.org/10.1109/MC.2004.48. Dean, J., Ghemawat, S., 2008. MapReduce: simplified data processing on large clusters. Maskrey, S.A., Mount, N.J., Thorne, C.R., Dryden, I., 2016. Participatory modelling for Commun. ACM 51 (1), 107–113. https://doi.org/10.1145/1327452.1327492. stakeholder involvement in the development of flood risk management intervention Di Baldassarre, G., Schumann, G., Bates, P.D., 2009. A technique for the calibration of options. Environ. Model. Softw 82, 275–294. hydraulic models using uncertain satellite observations of flood extent. J. Hydrol. Maes, P., 1987. Concepts and experiments in computational reflection. In: 367 (3), 276–282. Meyrowitz, Norman (Ed.), Conference Proceedings on Object-Oriented Programming Di Baldassarre, G., Brandimarte, L., Beven, K.J., 2016. The seventh facet of uncertainty: Systems, Languages and Applications (OOPSLA ’87). ACM, New York, NY, USA, wrong assumptions, unknowns and surprises in the dynamics of human-water pp. 147–155. systems. Hydrol. Sci. J. 61 (9), 1748–1758. https://doi.org/10.1080/ Metcalfe, P., Beven, K.J., Hankin, B., Lamb, R., 2017. A modelling framework for 02626667.2015.1091460. evaluation of the hydrological impacts of nature-based approaches to flood risk Edwards, L., Mullagh, L., Towe, R., Nundloll, V., Dean, C., Dean, G., Simm, W., management, with application to in-channel interventions across a 29-km2 scale Samreen, F., Bassett, R., Blair, G.S., 2017. Data-driven decisions for flood risk catchment in the United Kingdom. Hydrol. Process. https://doi.org/10.1002/ management. Proc. Data Policy 2017. http://doi.org/10.5281/zenodo.884180. htt hyp.11140. p://dataforpolicy.org/papers/. Nearing, G.S., Tian, Y., Gupta, H.V., Clark, M.P., Harrison, K.W., Weijs, S.V., 2016. � Evers, M., Jonoski, A., Maksimovi�c, C., Lange, L., Ochoa Rodriguez, S., Teklesadik, A., A philosophical basis for hydrological uncertainty. Hydrol. Sci. J. 61 (9), 1666–1678. Cortes Arevalo, J., Almoradie, A., Eduardo Simoes,~ N., Wang, L., Makropoulos, C., Nundloll, V., Porter, B., Blair, G.S., Emmett, B., Cosby, J., Jones, D., Chadwick, D., 2012. Collaborative modelling for active involvement of stakeholders in urban flood Winterbourn, B., Beattie, P., Shaw, R., Shelley, W., Ullah, I., Brown, M., Dean, G., risk management. Nat. Hazards Earth Syst. Sci. 12 (9), 2821–2842. 2019. The design and deployment of an end-to-end IoT infrastructure for the natural Fenicia, F., Kavetski, D., Savenije, H.H., 2011. Elements of a flexible approach for environment. Future Internet 11 (6), 129. https://doi.org/10.3390/fi11060129. conceptual hydrological modeling: 1. Motivation and theoretical development. Oreizy, P., Gorlick, M.M., Taylor, R.N., Heimbigner, D., Johnson, H., Medvidovic, N., Water Resour. Res. 47, 11. Quilici, A., Rosenblum, D.S., Wolf, A.L., 1999. An architecture-based approach to Ferre,� T.P., 2017. Revisiting the relationship between data, models, and decision-making. self-adaptive software. IEEE Intell. Syst. 14 (3), 54–62. https://doi.org/10.1109/52 Gr. Water 55 (5), 604–614. 54.769885. Foster, I., Kesselman, C. (Eds.), 1998. The Grid: Blueprint for a New Computing Park, S.K., Xu, L. (Eds.), 2017. Data Assimilation for Atmospheric, Oceanic and Infrastructure. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA. Hydrologic Applications, vol. 3. Springer Science & Business Media. Foster, I., Kesselman, C., Tuecke, S., 2001. The Anatomy of the Grid: enabling scalable Prudhomme, C., Wilby, R.L., Crooks, S., Kay, A.L., Reynard, N.S., 2010. Scenario-neutral virtual organizations. Int. J. Supercomput. Appl. 15, 3. approach to climate change impact studies: application to flood risk. J. Hydrol. 390 France, R., Rumpe, B., 2007. Model-driven development of complex software: a research (3), 198–209. roadmap. In: 2007 Future Of Software Engineering (FOSE ’07). IEEE Computer Society, Renard, B., Kavetski, D., Kuczera, G., Thyer, M., Franks, S.W., 2010. Understanding Washington, DC, USA, pp. 37–54. https://doi.org/10.1109/FOSE.2007.14. predictive uncertainty in hydrologic modeling: the challenge of identifying input and Gilbert, J.M., Maxwell, R.M., 2017. Examining regional groundwater-surface water structural errors. Water Resour. Res. 46, 5. dynamics using an integrated hydrologic model of the San Joaquin River basin. Rougier, J., Sparks, S., Hill, L., 2013. Risk and Uncertainty Assessment for Natural Hydrol. Earth Syst. Sci. 21 (2), 923. Hazards. Cambridge University Press, Cambridge, UK. Habata, S., Yokokawa, M., Kitawaki, S., 2003. The earth simulator system. NEC Res. Dev. Simm, W.A., Samreen, F., Bassett, R., Ferrario, M.A.F.C., Blair, G.S., Young, P.J., 44 (1), 21–26. Whittle, J.N., 2018. SE in ES: opportunities for software engineering and cloud Hankin, B., Metcalfe, P., Johnson, D., Chappell, N.A., Page, T., Craigen, I., Lamb, R., computing in environmental science. In: ICSE-SEIS ’18. ACM 10, New York, ISBN Beven, K.J., 2017. Strategies for testing the impact of natural floodrisk management 9781450356619. measures. In: Flood Risk Management. InTech Publishers. https://doi.org/10.5772/i Smith, L., Liang, Q., James, P., Lin, W., 2015. Assessing the utility of social media as a ntechopen.68677. data source for flood risk management using a real-time modelling framework. Hazeleger, W., Severijns, C., Semmler, T., S¸ tefanescu,� S., Yang, S., Wang, X., Wyser, K., J. Flood Risk Manag. 10, 370–380. https://doi.org/10.1111/jfr3.12154. Dutra, E., Baldasano, J.M., Bintanja, R., Bougeault, P., 2010. EC-Earth: a seamless Smith, P.J., Hughes, D., Beven, K.J., Cross, P., Tych, W., Coulson, G., Blair, G., 2009. earth-system prediction approach in action. Bull. Am. Meteorol. Soc. 91 (10), Towards the provision of site specificflood warnings using wireless sensor networks. 1357–1363. Meteorol. Appl. 16, 57–64. Hey, T., Tansley, S., Troll, K. (Eds.), 2009. The Fourth Paradigm. Data-Intensive Towe, R., Dean, G., Edwards, L., Nundloll, V., Blair, G.S., Lamb, R., Hankin, B., Scientific Discovery, Microsoft Research. Manson, So, 2019. Rethinking Data-Driven Decision Support in Flood Risk Management for a Big Data Age (submitted for publication).

19 G.S. Blair et al. Environmental Modelling and Software 122 (2019) 104521

Voinov, A., Kolagani, N., McCall, M.K., Glynn, P.D., Kragt, M.E., Ostermann, F.O., Westerberg, I.K., McMillan, H.K., 2015. Uncertainty in hydrological signatures. Hydrol. Pierce, S.A., Ramu, P., 2016. Modelling with stakeholders–next generation. Environ. Earth Syst. Sci. 19 (9), 3951–3968. Model. Softw 77, 196–220. Westerberg, I., Guerrero, J.-L., Seibert, J., Beven, K.J., Halldin, S., 2011. Stage-discharge Vrugt, J.A., Sadegh, M., 2013. Toward diagnostic model calibration and evaluation: uncertainty derived with a non-stationary rating curve in the Choluteca River, approximate Bayesian computation. Water Resour. Res. 49 (7), 4335–4345. Honduras. Hydrol. Process. 25, 603–613. https://doi.org/10.1002/hyp.7848. Waldrop, M.M., 2008. Science 2.0 – is open access science the future? Sci. Am. 298. Wood, E.F., Roundy, J.K., Troy, T.J., Van Beek, L.P.H., Bierkens, M.F., Blyth, E., de Weiler, M., Beven, K.J., 2015. Should there be a community model? Water Resour. Res. Roo, A., Doll,€ P., Ek, M., Famiglietti, J., Gochis, D., 2011. Hyperresolution global 51 (9), 7777–7784. land surface modeling: meeting a grand challenge for monitoring Earth’s terrestrial water. Water Resour. Res. 47 (5).

20