EU-Brazil OpenBio EU-Brazil and e-Infrastructure for Biodiversity

EU-BrazilOpenBio Joint Action Plan

A vision to boost co-operation between Brazil and Europe, strengthening links with research and business communities

Final Version November 2013

European Commission Disclaimer

EUBrazilOpenBio - Open Data and Cloud Computing e-Infrastructure for Biodiversity (2011-2013) is a Small or medium-scale focused research project (STREP) funded by the European Commission under the Cooperation Programme, Framework Programme Seven (FP7) Objective FP7-ICT-2011- EU-Brazil Research and Development cooperation, and the National Council for Scientifc and Technological Development of Brazil (CNPq) of the Brazilian Ministry of Science and Technology (MCT) under the corresponding matching Brazilian Call for proposals MCT/CNPq 066/2010. This document contains information on core activities, fndings, and outcomes of EUBrazilOpenBio project, and in some instances, distinguished experts forming part of the project’s Strategic Advisory Board. Any references to content in both website content and documents should clearly indicate the authors, source, organization and date of publication. The document has been produced with the co-funding of the European Commission and the National Council for Scientifc and Technological Development of Brazil. The content of this publication is the sole responsibility of the EUBrazilOpenBio Consortium and its experts and cannot be considered to refect the views of the European Commission nor the National Council for Scientifc and Technological Development of Brazil. Table of Contents

Foreword...... 3

1. Where we stand today...... 5

1.1 Europe and Brazil Perspectives on ICT and Cloud Computing...... 5

2. Spotlight on EUBrazilOpenBio – an approach to e-infrastructure for biodiversity research ...... 8

2.1 EUBrazilOpenBio: Exploitation and Future Synergies...... 12

3. Facing up to the challenges ...... 16

3.1 e-Science in Horizon 2020 ...... 16

3.2 Challenges for e-infrastructure...... 18

3.3 Biodiversity: challenges and priorities...... 20

4. EUBrazilOpenBio Joint Action Plan...... 24

1 2 Foreword

Horizon 2020 is a time line. The overriding world regions will be able to take huge goal of the European Commission is to steps forward in making research data more make every researcher digital by this date reproducible, gaining transparency and through the development and usage of easing collaboration on existing research e-infrastructure across an increasingly wider data. range of disciplines while also meeting the EUBrazilOpenBio is a 2-year scoping analysis overall needs of society. aimed at understanding how to address 2020 is also a time line for international biodiversity challenges more effectively by biodiversity targets. The goal of the European creating a joint data infrastructure for Brazil Union is to “halt the loss of biodiversity and and Europe. It has demonstrated the benefits ecosystem services in the EU by 2020 and of Open Access in all its core aspects, restore them insofar as possible, stepping by focusing on the integration of open up the EU’s contribution to averting global data, software and services. It promotes biodiversity loss”. the concept of for scientific research by leveraging the achievements, Research and innovation are crucial to components and infrastructures developed addressing these challenges. The services in other projects, so that both regions can provided by the e-Infrastructures for research capitalise on earlier investments and bring are essential for progress in any type of to the table experiences on user-centric research. Such e-Infrastructures include approaches. It has also demonstrated the high-performance and high-throughput benefits of small-scale funding to enable computing, high-end storage, advanced this open integration, overcoming different networking, services like authentication approaches by focusing attention on the and authorisation, and services to support sharing of smarter approaches. The project research workflows. and its Board have conducted an in-depth No single country can tackle the big issues analysis of the biodiversity informatics and alone. No single institution or initiative cloud computing landscapes. This analysis can meet the growing demands around big also draws on studies by several partner, data, compute and storage to make research including the position paper presented by more cost effective and accelerate discovery. William A. Gray, Species 2000, as well as the Further, modern research is international views and visions of external experts. by nature. It requires integrated services The EUBrazilOpenBio Joint Action Plan and interoperable infrastructures across presents a summary of these findings, Europe and the world. To implement such highlighting priorities for the further an e-infrastructure commons, a high degree development of e-infrastructure in of collaboration and standardisation is Europe and beyond, and the enabling role required. e-infrastructure can play in implementing Open data and are essential to the 2020 and 2050 vision for biodiversity boost innovation and demonstrate the return and ecosystem services. It proposes of investments by national governments potential scenarios for future collaborative to society at large. enables work between Europe and Brazil, not only to the sharing of methods, software and other address complex biodiversity challenges but variants of virtual infrastructure. With also to build stronger links between business more investments to incentivise openness communities to contribute to sustainable and reward contributors, Europe and other development. 3 Our analysis of cloud computing needs in data. We expect that the many fruitful EU- Brazil and the present maturing market of Brazil discussions will continue in the near cloud offering in Europe has highlighted future to help realise the vision we are now the many different opportunities that exist shaping thanks to the pioneering approach for joint co-operation between Europe of EUBrazilOpenBio. and Brazil. It is therefore important to The Joint Action Plan is an initial step build on the successful exercise of the towards demonstrating that new and creative EUBrazilOpenBio project and ensure that approaches to scientific discovery will be the wider research community benefits possible by mastering the main technical from easy to access virtual computing and data challenges that lie ahead. Brazil infrastructures such as cloud computing to and Europe have much to contribute to the tackle the rising tide of large scientific data. creation of better services for researchers These opportunities range from new multi- at all the different levels. By embracing disciplinary approaches to biodiversity, the diversity of talent that exists in all including internationally recognised post- fields of research and informatics, future graduate qualifications and joint facilities, international co-operation can help make to small and medium-sized businesses collaborative research better, more open, creating value-add services around open and multidisciplinary.

Acknowledgements

Members of the EUBrazilOpenBio Strategic Jacco Konijn, University of Amsterdam and Advisory Board: Malcolm Atkinson, Director CReATIVE-B of Edinburgh Data-intensive research and Wouter Los, University of Amsterdam and UK e-Science Envoy; Fabrizio Gagliardi, LifeWatch Independent Consultant; Nelson Simões, Director General of Brazil’s National Research Peter Schalk, Acting Executive Secretary and Education Network, RNP Species 2000

William A. Gray, Species 2000 Wouter Addink, Naturalis

4 1. Where we stand today

1.1 Europe and Brazil Perspectives on ICT and Cloud Computing

There have been significant investments in disciplines underpinned by software that both Brazil and Europe to develop, deploy is built around two frameworks: Generic and foster the uptake of cloud and virtualised Worker and COMP superscalar (COMPSs). It infrastructures to make available a virtual has therefore contributed to the deployment environment for a distributed community, of easy-to-use platforms that can facilitate gaining capacity and enhancing collaborative access to a wide user base. work. Virtualisation (at storage, server, EGI is the world’s largest, most powerful and operating system, network and application most comprehensive distributed computing level) has developed significantly in recent e-infrastructure supporting world-class years, reducing IT expenses while boosting research conducted by over 22,000 scientists efficiency and agility. and researchers across 50 countries4. The EGI On the European side, e-infrastructures community is a federation of independent have evolved to encompass advances for national and community resource providers, cloud and grid middleware and applications, whose resources support specific research including efforts supporting open-standards communities and international collaborators based federation. EC-funded initiatives both within Europe and worldwide. EGI has include VENUS-C (2010-2012)1, StratusLab developed a cloud infrastructure platform (2010-2012)2 and EGI-InSPIRE (2010- as part of its strategic vision to support the 2014)3. In particular, the VENUS-C project wider scientific community. This transition to demonstrated that cloud is an effective the cloud is led by the Federated Cloud Task paradigm to provide computing power, not to integrate existing communities’ resources only to the research community but also to and make them interoperable based on the small companies for which HPC systems are use of open standards. This work is paving not economically affordable. The project the way towards a highly flexible European adopted a user-centric approach to evaluate model to deliver innovative cloud services the benefits of Platform-as-a-Service for the international research community. style cloud computing in diverse research This goal is also shared by the Helix Nebula5 1 VENUS-C, http://www.venus-c.eu. VENUS-C in initiative, which is setting up a European the Digital Agenda for Europe: Bringing open, cloud for science. Based on the experience user-centric cloud infrastructure to research gathered from “proof of concept” communities, https://ec.europa.eu/digital- agenda/en/news/bringing-open-user-centric- deployments, Helix Nebula’s architecture cloud-infrastructure-research-communities and group, led by a series of cloud-savvy SMEs, SMEs empowered by a pioneering approach to have defined a standards based federated cloud computing, http://ec.europa.eu/digital- cloud architecture to enable an open platform agenda/en/news/smes-empowered-pioneering- for science innovation. EGI.eu is contributing approach-cloud-computing. 2 StratusLab, http://stratuslab.eu/. 4 Figures retrieved September 2013, http:// 3 EGI, http://www.egi.eu/; EGI Federated www.egi.eu/infrastructure/operations/egi_in_ Cloud Task, https://wiki.egi.eu/wiki/ numbers/. Fedcloud-tf:FederatedCloudsTaskForce. 5 Helix Nebula, http://helix-nebula.eu/. 5 to the development of the architecture so on-going improvements to enable users to that the EGI publicly funded e-infrastructure store their files in the cloud and transform can be interfaced with Helix Nebula. Flagship the prototype into a service applications from more research disciplines pilot. RNP also coordinates the deployment that will stretch the functionality and impact of special ICT projects commissioned by of Helix Nebula have been identified for the Brazilian government, like the Brazilian deployment during 2013. Biodiversity Information System (SiBBr)7 and the Shared project (CDC). Early investments by the Brazilian Future strategic plans of RNP include the government include cloud computing pilots, deployment of a test bed for Future focusing on the federation of computing research, upgrading part of its backbone to resources, middleware development for 100Gbps and creating two new points of service management and PaaS for scientific international connectivity, at Rio de Janeiro applications, as well as upgrading the and Fortaleza (currently the international backbone of its National Research and links are connected from Sao Paulo). Education Network, RNP1. Examples of early initiatives include the SINAPAD network of Finally, RNP plays a leading role in Latin HPC centres2 and the OurGrid Community3 America, committing considerable financial complemented by RNP initiatives, such as resources and expertise to RedCLARA. the Just in Time (JiT) Cloud project4 and the Internationally, it is contributing to the Working Group MyScientificCloud (GT mc2) pioneering work of the NREN Global CEO within RNP. These initiatives have not only Forum, which is focused on addressing facilitated the development of e-science global challenges such as a well-defined activities but also substantially contributed and inclusive network architecture to raising awareness of e-science in the interconnecting NRENs on a global scale; country, as highlighted by the Brazilian global federated ; Ministry of Science, Technology and global real-time communications exchange; Innovation5. and global service delivery8. Its involvement in the forum, which includes AARNet RNP is driving migration to cloud services, (Australia), CANARIE (Canada), CERNET investigating feasibility and the potential for (China), CUDI (Mexico), DFN (Germany), a community cloud approach (e.g. sharing Internet2 (USA), Janet (UK), NORDUnet resources and contracts), as well as the (European Nordics), REANNZ (New Zealand), implications of buying versus outsourcing RedCLARA (Latin America), RENATER (France), services. The Cloud for Science Working SURFnet (Netherlands), clearly illustrates its Group (CNC) within RNP has deployed a significance in the international arena. cloud storage service prototype based on the OpenStack Swift object storage6 with Initiatives like EELA9 (I, II and, GISELA)10, 1 Brazilian National Research and Education EUBrazilOpenBio11, and the CHAIN-REDs Network, RNP (Rede Nacional de Ensino e project with the participation of RNP and Pesquisa), http://www.rnp.br/. 2 SINAPAD, https://www.lncc.br/sinapad/. 7 SiBBr, http://www.sibbr.gov.br/. 3 OurGrid, http://www.ourgrid.org. 8 G. Cattaneo, M. Claps, S. Conway IDC, S.Musella, 4 Just In Time Cloud, http://www.lsd.ufcg.edu.br/ S.Parker, N. Ferguson, Trust-IT, Cloud Computing relatorios_tecnicos/TR-3.pdf. for e-Science and Public Authorities – Final Report 5 Rafael H. R. Moreira, director of ICT policy of the (SMART 0055), pp. 21-23, http://cordis.europa.eu/ Ministry of Science, Technology and Innovation in fp7/ict/e-infrastructure/docs/final-report-clouds- Brazil, cloudConf LATAM 2012: http://cloudconf. study.pdf. com.br/arquivos/CloudConf2012_Rafael_Moreira. 9 EELA, http://www.eu-eela.eu. pdf. 10 GISELA, http://www.gisela-grid.eu. 6 OpenStack SwiftStack, http://swiftstack.com/ 11 EUBrazilOpenBio, http://www.eubrazilopenbio. 6 -swift/. eu. RedClara12 have played an important part between Europe and Brazil over the last in fostering international co-operation decade.

12 http://www.chain-project.eu/.

7 2. Spotlight on EUBrazilOpenBio - an open access approach to e-infrastructure for biodiversity research

“EU-BrazilOpenBio has successfully supported As a hybrid data infrastructure, co-operation between Brazil and Europe on the EUBrazilOpenBio provides a delivery model key theme of biodiversity. Its focus has been for data management capability, in which on demonstrating the benefits of integrating computing, storage, data and software and sharing data, tools and services in an are available “as-a-Service”1. It builds on open way to facilitate biodiversity research the cloud paradigm offering ‘computing communities across borders rather than on as a utility’, introducing the elasticity of technological innovation. Part of its lasting resources and infinite capacity as key legacy will be the cultural and environmental features with the aim of making data and data bridges between the two regions that it has management services available on demand. successfully built.” The platform has been designed based on the evaluation of user requirements. The Malcolm Atkinson, e-Science Institute and underlying architecture comprises a number National e-Science Centre, UK of interacting services underpinned by a “At Biodiversity Informatics Horizons 2013, service-oriented infrastructure approach, EUBrazilOpenBio presented an impressive with a focus on re-use and interoperability. array of facilities.” Its aggregative nature enables the creation Dave Roberts, Natural History Museum, UK of a “system of systems”, drawing on a Rather than creating a new monolithic range of synergies with Brazilian and infrastructure, EUBrazilOpenBio has focused on European initiatives to leverage some of demonstrating the benefits of an infrastructure the most relevant resources, e.g. textual model that is capable of harnessing existing publications, data sets, maps, taxonomies, disparate resources to provide a coherent 1 On the underlying technology paradigm for research environment for scientists. The e-infrastructure adopted by EUBrazilOpenBio, EUBrazilOpenBio infrastructure is the result see L. Candela, D. Castelli, P. Pagano, ‘gCube: a of the collaborative aggregation of data, service-oriented application framework on the tools, services and computational resources grid’, ERCIM News, vol. 72 pp. 48 - 48. ERCIM, into a coherent and integrated research 2008, http://ercim-news.ercim.eu/en72/rd/ gcube-a-service-oriented-application-framework- environment for the benefit of the biodiversity on-the-grid. gCube is a key enabling technology community. EUBrazilOpenBio supports the of the iMarine hybrid data infrastructure, www.i- open access movement, promoting the marine.eu. See also, ‘the iMarine e-infrastructure: concept of openness for scientific research by integrating technologies to unlock knowledge leveraging the achievements, components and on marine living resources’, November 2012, infrastructures developed in other projects, http://www.isgtw.org/announcement/imarine-e- infrastructure-integrating-technologies-unlock- so that both regions can capitalise on earlier knowledge-marine-living-resour, and ‘Surfing the investments and bring to the table experiences marine data wave’, September 2013, http://www. on user-centric approaches. isgtw.org/feature/surfing-marine-data-wave. 8 tools, services, computing and storage A good example is the cloud-enabled capabilities. Integration spans Species2000/ Ecological Niche Modelling Service ITIS Catalogue of Life2, the Virtual Herbarium developed in EU-BrazilOpenBio and the of Flora and Fungi3, Global Biodiversity synergy with BioVeL, integrating the service Information Facility (GBIF)4; speciesLink5; in the BioVeL workflow and working with the List of Species of the Brazilian Flora6 EGI Federated Cloud Task and CloudCaps openModeller7, as well as initiatives like mini-project to optimise the use case on BioVeL8 and i4Life9 and technologies such Ecology. as COMP Superscalar (COMPSs)10 and gCube11 EUBrazilOpenBio also has the capability and cloud infrastructures like VENUS-C. to support the creation and operation of Standards adoption for the implementation virtual research environments (VREs), a web- of the interfaces makes it possible not only based collaborative platform connecting to interoperate with other infrastructures researchers irrespective of geographical but also to simplify the adoption of location. VREs offer transparent and EUBrazilOpenBio by other communities12. seamless access to a shared set of remote

2 Species2000/ITIS Catalogue of Life, http://www. resources, such as data, tools and computing catalogueoflife.org/. capabilities central to their work. By 3 Virtual Herbarium of Flora and Fungi, Virtual leveraging the flexible approach of VREs, Herbarium of Flora and Fungi, http://inct. EUBrazilOpenBio has contributed to an florabrasil.net/noticias/lista-de-especies-da- evolving ecosystem for user-configurable flora-do-brasil-nova-versao/. e-science environments that help to build 4 Global Biodiversity Information Facility (GBIF), http://www.gbif.org/. communities. This VRE ecosystem and the 5 SpeciesLink, http://splink.cria.org.br/. creation of Virtual Research Communities 6 List of Species of the Brazilian Flora, http:// (VRCs) will continue to be an important wiki.eubrazilopenbio.eu/index.php/List_of_ feature in Horizon 2020. Species_of_the_Brazilian_Flora. 7 openModeller, http://openmodeller.sourceforge. net/. for Biodiversity’, Proceedings of the 5th 8 BioVeL, http://www.biovel.eu/. International Workshop on Science Gateways 9 i4Life – Indexing for Life, http://www.i4life.eu/. (IWGS 2013), June 2013, http://ceur-ws.org/ 10 COMP Superscalar, Barcelona Supercomputing Vol-993/paper7.pdf. See also, ‘Execution Center, http://www.bsc.es/computer-sciences/ grid-computing/comp-superscalar. of scientific workflows on federated multi- 11 gCube Framework, http://www.gcube-system. cloud infrastructures’, proceedings of the org/. 1st International FedICI 2013 Workshop 12 On key achievements of on Federative and Interoperable Cloud EUBrazilOpenBio from a technical Infrastructures, Euro-Par 2013, http://www. perspective, eee for example, ‘EU-Brazil Open dps.uibk.ac.at/~gabor/FedICI13/program. Data and Cloud Computing e-Infrastructure html.

9 Use Cases

EUBrazilOpenBio enables Brazilian and and biodiversity specialists to utilise richer European research communities to tap into data with the potential for the exchange high-quality data sources and tools with of information in both directions (from easy access to an even greater knowledge the regional Brazilian List of Flora to the base, such as taxonomic intelligence and Catalogue of Life and vice-versa), thus ecological niche modelling. helping to fill and close gaps between the different taxonomies. This consolidation Integration of regional and global is likely to be a significant boon to the taxonomies: Access to reliable taxonomic conservation efforts of scientists who make information classifying the names of use of these catalogues in their research species is of paramount importance for and should help provide a fuller picture of environmental science, for monitoring biodiversity levels in Brazil changes in biodiversity and for effectively addressing challenges related to climate Ecological Niche Modelling: change, both locally and on a global scale. EUBrazilOpenBio’s co-operation with the However, the ability to cross-reference Brazilian Virtual Herbarium is also part of between regional and global taxonomic the drive towards a more complete and data sets is often masked by complex integrated view, identifying knowledge differences in the taxonomic classification gaps in the biogeography of flora and using used. EUBrazilOpenBio supports the use of herbarium data to generate ecological niche the Catalogue of Life Cross-Mapping Tool, models. Occurrence points are retrieved developed by scientists at Cardiff University, from speciesLink, a network that integrates to improve taxonomic information relating data from distributed biological collections, to Brazil’s plant species. The tool cross- currently serving almost 4 million maps the regional List of Species of Brazilian plant specimen records. The modelling Flora, containing around 10,000 species strategy involves generating algorithms organised under 40,000 plant taxa names in openModeller, helping biodiversity (including synonyms), against the global researchers to model ecological niches of Species2000/ITIS Catalogue of Life, indexing plant and animal species across the world. about 150,000 plant species. The Catalogue The occurrence points are also retrieved of Life Cross-mapping Tool helps manage from GBIF with over 380 million points1. differences between catalogues by detecting, Such an approach indicates the ecological analysing and reporting not only differences requirements for a species to survive and between two checklists of species, but also maintain viable populations over time by differences in their taxonomic treatment. It relating locations where a species is known then identifies potential resolutions to help to occur with environmental variables that researchers resolve these differences. are expected to affect its distribution. This The idea is to benefit from the ‘taxonomic makes it possible to predict the impact of intelligence’ of the tool and support a climate changes on biodiversity and prevent pilot study that analyses the regional plant the spread of invasive species, which is one catalogue of Brazil against the global index of the challenges identified in the European of plants within the Catalogue of Life. This biodiversity strategy. Such an approach also will be used to improve the linkage between helps in conservation planning, identify the terms in the lists and to enhance both the Catalogue of Life and the Brazilian List 1 http://dagendresen.wordpress. of Flora. In turn, this will allow taxonomic com/2012/10/15/gbif-occurrence-data-access- 10 and-download/. geographical and ecological aspects of working, different standards and different disease transmission, as well as in guiding funding mechanisms. These experiences biodiversity field surveys, among many other demonstrate the importance of taking on uses. board cultural, linguistic and scientific differences with regard to data challenges. Ecological niche modelling has recently These experiences also highlight the need become one of the most popular techniques to be cautious about applying models in macroecology and biogeography. One developed elsewhere “lock, stock and barrel” of the reasons for this trend is the broad to other disciplines or in other parts of the range of applications that arise when world. It may therefore be appropriate to assessing and projecting the ecological consider the best approach in each domain niche of a species in different environmental in light of current experiences and the long- scenarios and geographical regions. The idea term experience in bodies concerned with in EUBrazilOpenBio is to investigate and data sharing in individual disciplines or propose efficient ways of generating a large technologies, and generic long¬standing number of ecological niche models through international bodies, such as CODATA. a web service interface that applications like the Brazilian Virtual Herbarium can leverage, Further, the experiences of EUBrazilOpenBio integrating a new web-based modelling demonstrate the effectiveness of small- application in the EUBrazilOpenBio virtual scale funding to enable the integration of research environment. open data, software and services. This open integration process has helped overcome different approaches by focusing attention The value of small-scale on mutual benefits stemming from sharing funding for nurturing an smarter approaches. open access culture The experience of EUBrazilOpenBio clearly demonstrates the value of addressing In many countries, there are now human challenges when it comes to requirements to make publicly available international co-operation while placing all data that comes from research funded biodiversity as a grand global societal by the public purse. It is important to find challenge at its centre. While it has focused sustainable ways to make this possible. It on two specific demonstration use cases, the is equally important to find ways to provide infrastructure is designed to meet the needs value-add analytical and search services that of a wider range of biodiversity applications. make using data easier. EUBrazilOpenBio has therefore created a EUBrazilOpenBio is a prime mover within the new and diverse approach to addressing open access movement in terms of building biodiversity with high potential for social close ties Brazil and Europe that nurture an impact. Key challenges ahead for the project open access culture. It has gained specific are increasing the user base, sustaining the experiences in data exchange across borders, services along with the “glue” that holds building bridges across different ways of them together.

11 2.1 EUBrazilOpenBio: Exploitation and Future Synergies

“EUBrazilOpenBio has made the first steps Roadmap2. This open platform is designed towards the goal of supporting key challenges to enable resource centres, technology in the area of biodiversity and ecosystems, providers and user communities to play an including at the legal level where open access active role and build on the current portfolio is a vital prerequisite for interoperability. of use cases with the aim of boosting Such an approach promises well for future innovation through scientific excellence. The co-operation potential with initiatives such federation approach is important to support as LifeWatch in future funding streams like multi-national deployment that reflects the Horizon 2020. local environment while driving technical and policy alignment as needed. The Jacco Konijn, University of Amsterdam ultimate goal is to build on its current user and CReATIVE-B1 base, particularly through federated cloud EUBrazilOpenBio has dedicated efforts services targeting the ‘long tail of science’3. to building synergies across a range of The aim of the EGI Cloud Infrastructure biodiversity initiatives, as well as with EGI as Platform is to provide researchers and a pan-European cloud-based e-infrastructure research communities with control over that will also leverage federation with deployed applications, elastic resource the Helix Nebula eScience Cloud. Many of consumption based on actual needs, service these synergies will be valuable in defining performance scaled with elastic resource ways to extend the user community and consumption. This approach significantly identify potential future collaborations that reduces waiting time through immediate capitalise on its achievements. processing of workloads.

Ecological niche modelling is being EGI Cloud Infrastructure increasingly used by the biodiversity research Platform community, which can greatly benefit from cloud computing technologies, especially “The successful collaboration with in complex experiments involving multiple EUBrazilOpenBio/BioVeL has demonstrated species, algorithms and environmental how the BioVeL community could benefit scenarios. Over the last two years, BioVeL from the full potential of the cloud and help and EUBrazilOpenBio have joined forces make web services and workflows sustainable with the EGI Federated Cloud Task Force, for the long term for biodiversity research in with the aim of making openModeller ready Europe.” for cloud deployment and on optimising the software4. The successful outcomes of Nuno Ferreira, User Community Support, EGI.eu 2 Roadmap on Distributed Computing The EGI Federated Cloud Task is a core Infrastructure for e-Science and Beyond in Europe, activity that is part of a drive to create a May 2012, http://www.cloudscapeseries.eu/ Content/eLibrary.aspx?id=96&Page=2&Cat=0!19. platform based on the adoption of open 3 S. Newhouse, ‘Federated Clouds and standards in a way that is aligned with the the role of standards’, Cloudscape V, 27- vision and recommendations of the SIENA 28 February 2013, Brussels, http://www. cloudscapeseries.eu/Content/CloudscapeSeries. aspx?id=247&Page=3&Cat=0!12!6. 1 Jacco Konijn, University of Amsterdam, Position 4 D. Lezzi, ‘The EUBrazilOpenBio-BioVeL use case 12 Paper for the EUBrazilOpenBio Public Workshop. in EGI’, EGI Cloud Infrastructure Platform: State this collaboration means that the BioVeL ecological research)8, which coordinates and community can use the EGI Federated Cloud manages large amounts of monitoring data. to its fullest potential, demonstrating how There is increasing consensus that our cloud technologies could ultimately make planetary environmental systems are web services and workflows sustainable for strongly interconnected. Biodiversity and the long term for biodiversity research in ecosystem processes are not only affected Europe5. by - but also buffer - changes in the climate system, oceanic and atmospheric LifeWatch processes, as well as by the lithosphere and hydrosphere9. Priority areas for biodiversity “EUBrazilOpenBio has developed easy-to-use research will need to focus on ecosystem technologies for different purposes, while services, on the impact of biodiversity loss, fostering cross-Atlantic cooperation. This and adaptation/mitigation strategies, which is possibly the most important legacy of the have to be considered in the framework project. The project has paved the ground for of Earth as a single complex and coupled bringing together different communities with system. Model and scenario development technological and sociological processes. It is may assist in decision support for evaluating an example for scaling up further initiatives the effect of potential interventions. It is in support of biodiversity system research in worthwhile considering how the growing co-operation with related initiatives. Scaling portfolio of enabling technologies may up with new sustained cooperation models of promote targeted data production from distributed initiatives is crucial to enter a new diverse organisations to support large-scale research area. This will also promote better analysis and modelling. valorisation of new developments with public Spain is one of the countries participating in and private partners.” LifeWatch with the aim of providing services to Wouter Los, LifeWatch Project Coordinator applications, especially from the biodiversity domain. The government has mandated the 6 The ESFRI initiative, LifeWatch , aims to Spanish Research Council (CSIC) to technically build an infrastructure for biodiversity articulate LifeWatch, which includes the and ecosystem research. Its plans include participation of the Barcelona Supercomputing integrating smart sensor networks from Center and the 13M department at the different locations, metadata standards, Technical University of Valencia. One of the communication capabilities. This integration goals is to build an infrastructure in Seville and requires facilitating and standardising data prototype a virtual lab with integrated real- gathering to provide pre-processed and time sensors in field. LifeWatch stakeholders in interoperable data to virtual labs and sensor Spain have manifested interest in the results of data portals. Virtual Lab integration in the the use cases developed in EUBrazilOpenBio. ICT-core will leverage the involvement of The project therefore expects to transfer the 7 LifeWatch in EUDAT through LTER-Europe services developed and tailor them to the (European monitoring network for long-term purposes of LifeWatch. of the art and Demo, EGI Technical Forum, 16-20 September 2013, Madrid. EU Brazil Cloud Connect 5 http://www.egi.eu/news-and-media/newsfeed/ news_2013_0038.html. 6 LifeWatch, e-Science European Research Infrastructure for Biodiversity and Ecosystem 8 LTER-Europe, European Long-term Ecosystem Research, http://www.lifewatch.eu/. Research Network, http://www.lter-europe.net/ 7 EUDAT, European Data Infrastructure, http:// 9 W. Los, Position Paper for EUBrazilOpenBio www.eudat.eu/. Public Workshop, September 2013. 13 EU Brazil Cloud Connect1 (EUBrazilCC) is »» It will introduce in Brazil the aimed at creating a federated research Cloudscapeseries.eu, which is now a self- e-infrastructure based on a user-centric sustained event in Europe. approach. It will adapt existing applications Two of the three EUBrazilCC use cases relate to tackle new scenarios in areas of mutual closely to biodiversity, leveraging several interest to Brazil and Europe with high social key outcomes of EUBrazilOpenBio, that is, impact and innovation: neglected diseases, the Leishmania Virtual Lab and research climate change and heart simulation. It will on Biodiversity and Climate Change. The integrate frameworks and programming EUBrazilOpenBio cloud-based ecological models for scientific gateways and niche modelling service, to be released complex workflows that meet not only the as , will be used in these use requirements of three specific use cases cases to compute, test and project models but potentially also a much larger user for predicting the distribution of species, community. targeting vectors and plants as indicators of It will drive advances in several research climate change. areas, such as virtualised resource federation using clouds that promote sustainability, in programming frameworks in the cloud, CloudWATCH including big data analysis, as well as the demand for high capability computing As an EU cloud observatory, CloudWATCH 2 and data in the cloud. The project will (cloudwatchhub.eu) aims to accelerate the pursue interoperability and integration adoption of cloud computing for business, of existing systems, such as coordinating government and research by focusing on the technical interoperability with Helix Nebula, real issues, practical, independent guides incorporating relevant standards, and and success stories. It will support the extending links with EGI. development of common standards profiles to support user community exploitation of The targeted user community extends cloud computing in the three areas of focus beyond the project boundaries. EU Brazil which can be endorsed by standards bodies. Cloud Connect has already received interest The project will also create a long-term from: sustainable, digital platform to showcase »» The SINAPAD network and Brazilian National the results of European cloud computing Institutes of Science and Technology: INCT research and development projects and, of Innovation in Neglected Diseases, INCT more generally, offer key information for Virtual Herbarium of Flora and Fungi, INCT service providers and users in Europe for on Scientific Computing in Healthcare supply to meet demand. Applications (with the direct participation Through its partnership with EGI.eu, it will of LNCC), INCT for Climate Change. define an outreach plan to disseminate the »» The LifeWatch-ESFRI initiative and the outcomes of the EGI federated use cases European Network for Earth System and support strategies aimed at ensuring Modelling (ENES). the long-term sustainability of all value-add cloud resources coming from public funding. • It will also build close ties with CloudwatchHUB.eu, with the aim of It will also monitor educating communities on standards to models such as OpenStack in Europe3. It ensure interoperability and broader choice. 2 www.cloudwatchhub.eu. 1 EU Brazil Cloud Connect – EU Brazil 3 For examples of OpenStack private cloud Cloud Computing for Science, www. deployments for research, see S. Maffioletti, 14 eubrazilcloudconnect.eu. Elasticluster: Provisioning computing clusters will assess pioneering national strategies looked into common biodiversity data driving federated approaches to research services, the potential for common layers, infrastructure leveraging secure cloud infrastructures, tools, reference models and deployments as enabling platforms for the identity, assessing also technical challenges, digital researcher 20204. such as real-time monitoring, genomic pipelines, and access to taxonomic data, as well as modelling, including earth and International potential for satellite data and modelling niches. As a integration and synergies result, EUBrazilOpenBio has been invited to contribute knowledge of mutual interest. The COOPEUS project5, which is co-funded by the EC under the Research Infrastructures action of the 7th Framework Programme and National Science Foundation (NSF, U.S.), brings together scientists and users from major environmental related research infrastructures in Europe and the US. The aim is to interlink these activities to create new synergies driving the creation of a global integration of existing infrastructures.

With regard to biodiversity challenges, COOPEUS, EUBrazilOpenBio, ENVRI6, ViBRANT7, EUBON8, the Global Biodiversity Information Facility (GBIF), LTER, EGI, EUDAT and LifeWatch started to explore international collaborative efforts at the EGI Technical Forum in September 2013. The aim was to understand how to share progress and define the drivers for future co-operation. In particular, the discussions in the cloud with Python and E. Fernandez, Federating clouds in OpenStack, EGI Technical Forum, 16-20 September 2013, Madrid, https:// indico.egi.eu/indico/materialDisplay. 4 See, for example, P. Kunszt, S. Maffioletti, D. Flanders, M. Eurich, T. Bohnert, A. Edmonds, H. Stockinger, S. Haug, A. Jamakovic-Kapic, P. Flury, S. Leinen, E. Schiller, Towards a Swiss National Research Infrastructure, the 1st International FedICI’2013 workshop: Federative and interoperable cloud infrastructures, 26 August 2013, Aachen, http://www.dps.uibk.ac.at/~gabor/ FedICI13/program.html. 5 COOPEUS, http://www.coopeus.eu/. 6 ENVRI, supporting common operations of ESFRI environmental research infrastructures, http:// envri.eu/. 7 ViBRANT, Virtual Biodiversity Research and Access Network for Taxonomy, http://vbrant.eu/. 8 EU BON, Building the European Biodiversity Observation Network, http://www.eubon.eu/. 15 3. Facing up to the challenges

3.1 e-Science in Horizon 2020

The overriding goal of the European themselves become the infrastructure – a Commission is to make every researcher valuable asset, on which science, technology, digital by 2020 through the development the economy and society can advance”3 and usage of e-infrastructure across an The GRDI2020 Roadmap is a vision to ensure increasingly wider range of disciplines while that global research data infrastructures also meeting the overall needs of society1. harness the growing amounts of scientific In the past, competition in science has been data and knowledge and optimise data beneficial in stimulating progress. However, movement across disciplines4. The aim is to current economic constraints call for a enable multi- and inter-disciplinary research renewed focus on research investments that by defining better data models and query give short-term returns on investment while language, developing advanced data tools minimising overlap with regard to large-scale and infrastructure services, supporting funding. Cost and energy efficiencies need to open linked data spaces, supporting move much higher up the list of priorities, for interoperability between scientific data and example, by leveraging the efficiencies that literature, and creating new professions such virtualisation and cloud can help deliver2. as data scientists. It is also important to improve co-ordination In addition to ensuring sustainability, of research e-infrastructure at the level of the following pillars are all central to national research centres, promoting the e-infrastructure in Horizon 2020: role of centres of expertise and optimising governance. Taking on the grand global »» Scientific data: wider range of disciplines challenges will require better co-ordination leveraging data; problems of scale and among stakeholder communities, identifying complexity; importance of sharing and synergies across different research federating data. communities. »» Computing facilities: software and The 2030 vision of the High Level Group on instruments, including virtual research Scientific Data is a “scientific e-infrastructure environments and virtual research that supports access, use and re-use, and trust communities. of data. In a sense, the physical and technical »» Networking: global reach and connectivity infrastructure becomes invisible and the data “at the speed of light”. 1 eScienceTalk Briefing, Horizon 2020, April 2013, 3 Riding the Wave – How Europe can gain from http://www.e-sciencetalk.org/briefings/EST- the rising tide of scientific data, Final Report Briefing-26-HorizonWeb.pdf. of the High Level Group on Scientific Data, 2 On the role of cloud computing in contributing A submission to the European Commission, to GHG emission reductions, see P. Thomond, The October 2010, http://cordis.europa.eu/fp7/ict/e- Enabling Technologies of a low-carbon economy: infrastructure/docs/hlg-sdi-report.pdf. A focus on cloud computing, February 2013, 4 C. Thanos, CNR, H. Hanahoe, Trust-IT, Global commissioned by the Global e-Sustainability Research Data Infrastructures – The GRDI2020 Initiative and , http://gesi.org/ Vision, March 2012, http://www.grdi2020. assets/js/lib/tinymce/jscripts/tiny_mce/plugins/ eu/Pages/SelectedDocument.aspx?id_ ajaxfilemanager/uploaded/Cloud%20Study%20 documento=6bdc07fb-b21d-4b90-81d4- 16 -%20FINAL%20report_1.pdf. d909fdb96b87. The requirement that publicly funded Research Data Alliance has been set up push research results be made available in open for the convergence of standards on how access also means that we need to be more data is stored and categorised8. vigilant in demonstrating the value of In May 2013, Neelie Kroes, Vice President of funding research. The European Commission the European Commission, highlighted the is driving open science as a priority in future economic and social benefits of big data - project proposals under Horizon 2020. The the fuel driving the knowledge economy9. In open science approach is also in accordance the public sector, better data allows services with the Nagoya Protocol5, adopted at the that are more efficient, transparent and 10th meeting of the Parties to the Convention personalised. Open access to results and on Biological Diversity (October 2010). associated data allow new ways to share, The open sharing of data will help compare, and discover: permitting whole researchers to focus their efforts more new fields of research. For scientists and effectively when working to tackle some of citizens, data is the key to more information the biggest challenges of the 21st century. and empowerment, and to new services The successful development and deployment and applications. At the policy level, the EU of the OpenAIRE framework6 allows revised Directive is aimed at making it easier researchers to deposit their publications to use and re-use public data, with lower online where they can be freely accessed by charges and without complicated conditions other researchers, demonstrating that open for re-use. access marks an important step towards an open data culture. OpenAIRE, LIBER and COAR strongly support the development of an open, interoperable e-infrastructure for Action addressing the area of Research Data scientific data through the engagement of e-infrastructure – Responses from LIBER, COAR the relevant actors, including libraries and and OpenAIRE, http://www.openaire.eu/en/ component/content/article/9/453. repositories supporting researchers in their 8 Research Data Alliance, https://rd-alliance.org/ 7 scientific endeavour . The international node and RDA Europe, https://europe.rd-alliance. 5 Nagoya convention http://www.cbd.int/abs/. org/Pages/Home.aspx. 6 OpenAIRE, Open Access Infrastructure for 9 The Economic and social benefits of big data, Research in Europe, http://www.openaire.eu/. SPEECH/13/450, 23/05/2013, http://europa.eu/ 7 DG Connect’s Proposal for a Framework for rapid/press-release_SPEECH-13-450_en.htm.

17 3.2 Challenges for e-infrastructure

In order to ensure the future sustainable in the scientific research process2. These two development of e-infrastructure and drive pillars of modern scientific communication international co-operation on grand global support researchers in storing, curating, challenges, it is important to address the sharing and discovering the data and following challenges. publications they produce. However, lack of integration hampers the very objective of modern scientific communication, which is Lack of service integration publishing, interlinking and discovery of all and interoperability of the outcomes of the research process, from research e-Infrastructures the experimental and observational datasets to the final paper. In order to bridge this The e-Infrastructure Reflection Group (e-IRG) gap, the paper proposes the development sees the integration of services and the of scientific communication infrastructures. interoperability of research e-Infrastructures Such infrastructures should be capable as crucial for paving the way towards a of facilitating interoperability between general-purpose European e-Infrastructure. data centres and research digital libraries Its White Paper 2013 identifies a number and of providing services that simplify of the current barriers to realising this the implementation of the large variety of vision1. According to e-IRG, the challenges modern scientific communication patterns. are mainly organisational and structural rather than technical. Diverse ways of Business and technical representing and funding key actors exist with insufficient cohesion between them. challenges Better co-ordination is therefore needed among the different e-infrastructure pillars, The EC Tender Study, Cloud computing for components and services, e.g. networking, e-Science and Public Authorities, identifies high throughput and high-performance challenges at the business and technical computing, data infrastructures, software/ levels with regard to the future evolution of middleware, including authentication and e-infrastructure. authorisation infrastructures and virtual The need to develop new business and research environments that are to be used funding models for e-infrastructure, by international research communities. where users start paying for the services Additional challenges include legal they consume and providers compete for issues and hurdles to accessing and using innovation money by generating revenue e-infrastructures. from their customer base. New funding ‘A vision towards scientific communication models should be easy to manage, because infrastructure’ highlights the importance otherwise the cost of fee for service of integrating research digital libraries and mechanisms outweighs the benefits. It is scientific data centres in order to ensure also important to move beyond the CAPex access to the results of complementary phases funding model in research funding in that

1 e-IRG White Paper 2013, http:// 2 D. Castelli, P. Manghi, C.Thanos, ‘A vision www.e-irg.eu/images/stories/dissemination/e- towards scientific communication infrastructure irg_white_paper_2013_short_version.pdf; Long – On bridging the realms of digital libraries and version, http://www.e-irg.eu/images/stories/ scientific data centers’, Springer, July 2013, dissemination/e-irg_white_paper_2013_-_final_ vol. 13, pp. 155-169 http://link.springer.com/ 18 version.pdf. article/10.1007%2Fs00799-013-0106-7. cloud computing does not fit well with the service delivery should respect national typical science grant budgets. There are two circumstances. The wider Latin American main challenges to be overcome: fostering region is also encountering difficulties in a change in funding policies and raising embracing cloud computing, where plans for awareness of the many opportunities of migrating to the cloud are failing to take off using cloud computing for research. Finally, due to the lack of adequate resource (both SMART 0055 proposes the creation of a data financial and human). These issues have and e-research marketplace of scientific and wider, international implications and call for social science applications to help support specific actions to assess, support and co- wider uptake. One possible approach could ordinate the adoption of new technologies. be the creation of application store-like services and possibly charging a small fee, payable to the data/software provider with a high flux of transactions, plus a consultation fee. From an EU and Brazilian collaboration perspective linking the market place and services across borders could ultimately boost opportunities for micro, small and medium-sized businesses in both areas.

The need for interoperable connectivity and cost-effective services. Many NRENs in Europe and across the globe are facing financial constraints at a time when their user communities are demanding new services. Despite several pioneering activities3, most NRENs do not have concrete plans to deploy cloud-based plans4, with the risk of fragmentation of services and unclear business models. The DANTE Strategy 2012-2015: More for all5 aims to ensure that all participating NRENs play their part in ensuring long-term sustainability and in defining their own strategic plan to better serve their communities in an increasingly global research environment. However, the DANTE strategy stops short of proposing a “one-size-fits-all” approach as new

3 See, for example, Advanced North Atlantic 100G Pilot https://www.surf.nl/en/ about-surf/subsidiaries/surfnet and the Enlighten Your Research Global initiative http://www. hpcwire.com/hpcwire/2013-06-20/five_leading_ national_research_and_education_networks_ launch_competition.html. 4 ASPIRE – The Future Roles of NRENs, September 2012, http://www.terena.org/activities/aspire/ docs/ASPIRE-future-of-nrens.pdf. 5 http://www.dante.net/Media_Centre/ Publications/Pages/DANTE_Strategy_2012-15. aspx. 19 3.3 Biodiversity: challenges and priorities

Modern society depends on data-driven developed at regional, national and multi- models to understand complex systems national scales, as a “system of systems”. and to coordinate most areas of life and The planned ESFRI LifeWatch3 research of planning, from climate to medicine and infrastructure, EBONE4 and EU BON5 will from economics to transport. Biodiversity ultimately form the European contribution. represents one of the major aspects that In order to address these issues, the white remains hard to integrate into these paper, A decadal view of biodiversity systems, even though the human race has informatics: challenges and priorities been observing and measuring biodiversity (April 2013)6 makes 12 recommendations and its patterns for centuries, and even focus on the following strategic areas for though we recognise the importance of such biodiversity informatics. Strategic area understanding not only for conservation one: reducing duplication, improving but for all aspects of land planning, primary collaboration, facilitating the creation of industry and health. new knowledge by synthesis using the data Coordination of programmes. One of the and tools generated. Strategic area two: most important lessons learnt from the EU- enhancing usability and better deployment BrazilOpenBio project is the very different of existing technologies. Strategic area approaches adopted by biodiversity three: developing new structures to facilitate initiatives in Brazil and Europe despite the the integration of foundational technologies many common challenges. One of the main by exploiting technologies in new ways. differences is the competitive environment Strategic are four: New technologies, such as in which initiatives have been funded at observatories employing novel sensors are EU level. A vast array of initiatives exist. delivering data in unprecedented volumes, Many of these projects directly address the especially molecular data, as the Genomics challenges of deploying e-infrastructures for Observatories Network has emphasised. biodiversity science. While they have similar This will require the development of new characteristics, they differ substantially technologies, or adaptations of technologies in their architectures and technology from related fields, new information systems, approaches. These different approaches and platforms offering overviews of detectors reveal the lack of a common understanding and experimental set-ups for biodiversity on how best to deploy e-infrastructures research to facilitate exploitation of the for biodiversity and ecosystem research. opportunities presented. As a result, there are overlaps, dead-ends, The White Paper also recommends that and often, complete lack of mainstream future proposals leverage completed industrial involvement. and existing funded projects to gain the Integration of existing tools and technologies maximum benefit for the future biodiversity is important within the broader framework infrastructure. Proposals should explain of GEOSS1, where GEO BON2 aims to deliver a decentralised and distributed informatics 3 http://www.lifewatch.eu/web/guest/home infrastructure. The system will be based on 4 http://www.wageningenur.nl/en/Expertise- a service oriented architecture approach, Services/Research-Institutes/alterra/Projects/ EBONE-2.htm. building mainly on contributing systems 5 http://www.eubon.eu/. 6 A. Hardisty, D. Roberts, A Decadal View of 1 http://www.earthobservations.org/geoss.shtml. biodiversity informatics: challenges and priorities, 2 http://www.earthobservations.org/geobon. April 2013, BioMed Central, http://www. 20 shtml. biomedcentral.com/1472-6785/13/16. how they have taken earlier and current names’ that need at a certain moment project results into account and demonstrate to be removed from circulation. The that they are building on them rather than EUBrazilOpenBio project has provided a more offering incompatible alternatives. Letters versatile cross-mapper web service working of Support from. other projects should be in combination with a partly automated used to demonstrate that community-wide piping tool funneling data back to the global discussion and acceptance of proposals has species database sources. Species 2000 will taken place prior to submission for funding, include the cross-mapping tools and piping and thus contributing to the community’s mechanism in the regular CoL services. decadal vision. Further developments in biodiversity The White Paper summarises the opinion of informatics should address key challenges a large part of the biodiversity informatics for the future with respect to the Catalogue community with respect to their views on of Life. future developments in terms of systems, Proliferation of digital sources for information needs and global sustainability. biodiversity data on the web. The growing Enormous progress has been made in number of (thematic) aggregator sites establishing a framework for sharing harvesting from these has resulted in a collection and observation data, such as complex environment for users to locate the Global Biodiversity Information Facility properly validated up-to-date taxonomic (GBIF)7 and its Global Biodiversity Informatics information. The community should consider Outlook (GBIO)8, at the species level by the a more collaborative approach to integrating Encyclopedia of Life (EoL)9 and with respect services and reduce the number of portals to genetic information by the Consortium for and sites. the Barcode of Life (CBOL)10. An integrated systems approach, which moves significantly Increasing number of unauthoritative beyond taxonomy and species observations, sources. The growing number of web sources is needed to truly understand biodiversity. with old names, misspellings and misapplied names hampers interoperability and the The Catalogue of Life (CoL) is a global reliable recombination of information effort that is instrumental in connecting sources. It further confuses the taxonomic the thousands of resources and making realm. There is a pressing need for a “cleaning them interoperable in a meaningful way. up” process. The biodiversity community This authoritative synonymic index is a true should come together to address this core-product of the taxonomic community, challenge in ways such as those facilitated unifying expert views and opinions in a by recent CoL initiatives. single database resource. European funded initiatives like 4D4Life and i4Life have laid Vulnerable funding basis and individual important foundations, facilitating a virtual specialists working on a shoestring budget. taxonomic community and underpinning an An integrated approach to a shared ICT iterative cycle of matching the CoL against infrastructure to store and treat data, and other sources with taxonomic components a more robust funding model for data that will result in an enriched CoL (more processing would help address these issues. names) plus a system with ‘unplacable Further automation of data processing in 7 http://www.gbif.org/. taxonomy to ensure reliable, up-to-date 8 Global Biodiversity Informatics Outlook: biodiversity information services. Priority Delivering Biodiversity Knowledge in the areas to further enhance the usefulness Information Age, September 2013, http://www. gbif.org/orc/?doc_id=5353. of the CoL for the biodiversity community 9 http://eol.org/. include filling the gaps in the CoL, such as 10 http://www.barcodeoflife.org/. 21 continuing efforts on fossil taxa. There is resources and offer the kind of high-level also a pressing need for stable identifiers. syntheses and indicators required to deliver Overcoming IPR issues to ensure the full a rich understanding of biodiversity and openness of the CoL. The community as a its interrelationships with ecosystems and whole should contemplate a full open data society. model to enhance usage and stimulate feedback. Moving towards Creative Common licenses underpinning data providing and Biodiversity Challenges and use is therefore recommended. the Aichi Targets

Sustainability of biodiversity information Nearly all the Aichi Targets2 depend services. Larger taxonomic facilities might on achieving true interoperability and consider adopting data services generated accessibility between biodiversity data by subsidized projects or grants and sources in order to further science and integrating them in their core business to deploy the knowledge for policy and ensure continuity. It is important to offer management3. The science community such services as part of the GBIF mission on needs proper maintainable ICT tools to sharing data. support this interoperability. A large ICT- Further integration of basic taxonomic tool-basis is available as a result of a great services. Integrating these services into number of biodiversity data sharing projects (automated) workflows and research in the EU. Now it is important to further environments of other disciplines has refine and deploy those tools for achieving several financial benefits, such as a more synergy between the information resources cost-effective use of resources, reducing and systems. duplication of effort. Future developments in biodiversity At the international level, the Global informatics must focus on completing a Biodiversity Informatics Outlook (GBIO)1 validated taxonomic index of all known offers a framework for coordinating and living organisms. Such a complete synonymic planning our informatics activity to address index is crucial to achieve interoperability these critical needs. It identifies four major between the various multidisciplinary areas requiring global investment and data sources in biodiversity research from collaboration. First and foremost, changes taxonomy to ecology, from molecular in culture and infrastructure are required data to physiology, from conservation to to ensure that all data can be managed in economic value of species. The importance stable, carefully-curated repositories that 2 http://www.cbd.int/sp/targets/. Aichi Targets, intelligently support use and analysis. http://www.cbd.int/sp/targets/. The EU target Secondly, all past and future observations for 2020 is: “to halt the loss of biodiversity and measurements of biodiversity need and ecosystem services in the EU by 2020 and to be delivered in digital forms that can restore them insofar as possible, and step up the EU’s contribution to averting global biodiversity be managed in this infrastructure. Thirdly, loss”. The EU vision for 2050 is: “Biodiversity and a range of comprehensive discovery and ecosystem services - the world’s natural capital access tools are required to deliver these – are preserved, valued and, insofar as possible, data in the form that researchers need. restored for their intrinsic value and so that they Lastly, investment is required to build and can continue to support economic prosperity and human well-being as well as avert catastrophic optimise models that exploit all available changes linked to biodiversity loss”, http:// 1 Global Biodiversity Informatics Outlook: ec.europa.eu/environment/nature/biodiversity/ Delivering Biodiversity Knowledge in the policy. Information Age, September 2013, http://www. 3 Position Paper presented by William A. Gray, 22 gbif.org/orc/?doc_id=5353. Species2000. of a unified global register of species based by using an e-science approach where on the opinion in the taxonomic community the services are provided through portals cannot be stressed enough. Automation that hide the underpinning details of the for generating and continuously updating software and data. It is important in these such a central taxonomic register from an types of environment that new services can array of quality sources is a condition for be created that extend the use of the data. success. A sturdy cross mapping mechanism For the Catalogue of Life this would mean for continuous checking names in the providing the data in an open manner with the central register against other sources, and basic services provided but extending these the automatic funnelling of unlisted names services with tools that allow the catalogue via piping tools towards the authoritative to be capturing additional information that resources in the taxonomic community for increases its coverage of the world species processing is a necessity to clean up the and their alternative names. millions of misspelled and incorrect names Global initiatives such as the Catalogue of in our resources. life are of great importance but prove hard to In biodiversity there is a large range of data fund and sustain. It is important that global and software services that are required to initiatives are built upon, but also form conduct any research successfully. Also part of, regional programs and activities. the data and the services are growing and This promotes a better local embedding, evolving all the time and it is impossible to shared responsibility and ownership, and define rigid standards for these. It is therefore a more flexible funding strategy. A global important that a federated approach is used versus regional hub structure will serve this which enables data prepared in diverse purpose well. ways and services created under different It is essential to work on deployment of systems to interoperate seamlessly. This technology that is feasible, affordable and can be achieved by storing services and maintainable. Some tools may be open- data in a heterogeneous environment where source other can be commercial. We must new services and data can be added to the recognise an infrastructure of data, and an environment in a way that is simple and infrastructure of tools. allows them to evolve. This can be achieved

23 4. EUBrazilOpenBio Joint Action Plan

Together with its Strategic Advisory Board, enterprises. EUBrazilOpenBio has analysed the current EUBrazilOpenBio and its Strategic Advisory biodiversity informatics, cloud computing Board propose potential future scenarios to and policy landscape to identify possible address the global challenges through joint future synergies that can help tackle key collaboration. These scenarios highlight challenges within the context of Horizon 2020 co-operation opportunities not only for and beyond. The analysis spans biodiversity multi-disciplinary research but also for strategies, global community efforts on building stronger ties between the business biodiversity such as the Global Biodiversity communities, and supporting sustainable Informatics Outlook (GBIO-GBIF), the Brazil- development through the green economy. EU high-level policy dialogue, including the Most of the proposed scenarios require current Joint Action Plan 2012-20141, and relatively small-scale investments. the VI Brazil-EU Summit (January 2013)2, the EU Biodiversity Strategy 20203 as well as the Central to the proposed future collaborative outcome document of Rio +20 (June 2012). scenarios is the ability to harness existing technologies with a view to contributing The emphasis on sustainable development to a “system of systems”, by adopting a through the implementation of the green service-oriented approach to addressing economy is one of the primary objectives the challenges posed by biodiversity loss. of the Rio +20 outcome document, The Future collaborative work on biodiversity Future we Want. The January 2013 VI Brazil- and climate change should therefore take EU Summit reiterates the importance of on board the recommendations of the sustainable development through continued White Paper, A decadal view of biodiversity co-operation between Brazil and Europe, informatics: challenges and priorities, as highlighting the role of ICT, research and well as recommendations formulated by development in tackling grand global other relevant reports cited in the paper4. challenges. The Summit Statement also stresses the need to strengthen contacts As more and more scientific disciplines between business communities, promoting become data hungry, and a rising number of both investments and innovation. In social scientists (economists, e-humanities particular, it calls for actions increasing researchers) follow suit, governments and support to micro, small and medium international bodies are starting to commit to developing frameworks that allow 1 V European Union-Brazil Summit Statement and Joint Action Plan, pp. 5-28, http://www.consilium. convenient, secure and intelligent access and europa.eu/uedocs/cms_Data/docs/pressdata/EN/ exchange of data. data is foraff/124878.pdf. presenting new opportunities to create new 2 VI Brazil-EU Summit Joint Statement, 24 4 A. Hardisty, D. Roberts, op cit. See also The January 2013, Brasilia, http://eeas.europa.eu/ LERU Biodiversity Working Group: Challenges delegations/brazil/documents/eu_brazil/ for biodiversity research in Europe, Procedia declaration_summit_january2013_en.pdf. Social and Behavioral Sciences 2011, 13:83-100, 3 European Parliament Resolution http://ac.els-cdn.com/S1877042811001777/1- (2011/2307(INI)), April 2012 and Our life s2.0-S1877042811001777-main. insurance, our natural capital: an EU Biodiversity pdf?_tid=6dd1c724-08dd-11e3-b46f- Strategy to 2020, (COM(2011)0244), http:// 00000aab0f6b&acdnat=1376923467_ ec.europa.eu/environment/nature/biodiversity/ f3b7667bf49902d2507f02ad7d8a5732. comm2006/pdf/2020/1_EN_ACT_part1_ 24 v7%5B1%5D.pdf value-add services, including the creation of towards realising the Commission’s wish new start-ups to push services to market5. that consortia formation becomes more For example, current case studies in the collaborative and less competitive7. There health sector are demonstrating cost savings is now consensus on a common goal, through shared analytics and services6. The which is to deliver predictive modelling for future collaborative scenarios also centre on biodiversity. The community also recognises the priority of nurturing an open data culture, that this is a multi-decadal ambition. The leveraging new opportunities at every level first steps will be to conduct a scoping study, of society. equivalent to the design studies with which the astronomers start a new telescope. Research data Goals for the short term include: e-infrastructure supporting »» Clarity of vision with greater focus on end- multidisciplinary science, goals in the bigger picture. analytics and international »» Good, simple tools with syntactic cooperation operability. »» Community identity.

Biodiversity is inherently connected to fields »» Links, both networking and links between such as ecology, genetics, epidemiology, people, links between machines holding climate science, economics, social data resources and links with other anthropology and theoretical modelling. communities, especially ecology and Research approaches therefore need to policy. become increasingly multi-disciplinary and trans-boundary. An immediate challenges is to question the need to digitise all sources of information The biodiversity informatics community and to start asking specific questions about has recently made significant step what the end-goals of an activity that 5 Brazil’s open government initiative is dados. requires considerable effort. Answering gov.br. Sixteen EU member states have currently this question will dictate what further data implemented an open government data is needed and can also the dictate the data initiative: Austria, Belgium, Denmark, Estonia, quality required. Finland, France, Germany, Greece, Ireland, Italy, Netherlands, Portugal, Slovak Republic, Spain, The 3 key words are: integration; co- Sweden, and the UK. Norway and Switzerland are operation; promotion. the two associate countries with an implemented initiative. EUBrazilOpenBio and the synergies it has 6 See, for example, use cases incubated by Open established at various levels propose the Data Institute (ODI, UK), http://www.theodi.org/, following action lines for future co-operation such as OpenCorporates. With a small staff of just two persons, the company managed to collate Usability and better deployment. A user- data from 51 million companies with the aim friendly, user-centric approach is crucial of creating an open database for the corporate for successful uptake by the wider research world (similar to OpenStreetMap) connecting community, from biologists to statisticians and adding clarity to corporate data. See also the science|business webinar on Big Data with Neelie and ecologists, who have much to contribute Kroes, VP of the European Commission, 23 May to cross-disciplinary science but who 2013, http://www.sciencebusiness.net/events/ typically lack advanced IT skills. The same SmarterData/ and the NIST workshop on Cloud is true for biodiversity researchers and Computing and Big Data, January 2013 with examples of new developments in healthcare 7 Biodiversity Informatics Horizons 2013, data, http://www.nist.gov/itl/cloud/nist-joint- 3-6 September 2013, Rome, http://conference. cloud-and-big-data-workshop-webcast.cfm. lifewatch.unisalento.it/index.php/EBIC/BIH2013. 25 professionals working at natural history such as adopting an approach similar to museums and botanical gardens, who have the VENUS-C Open Call with free trial much to gain from the current achievements, usage and technical support. Such an for example, EUBrazilOpenBio. Sufficient approach was also acknowledged during resources dedicated to marketing service(s), the EUBrazilOpenBio Public Workshop particularly outside the project community in September 2013. This workshop also and ideally in significant numbers. highlighted the need to enlarge the community vision and ambitions by Ireland, Italy, Netherlands, Portugal, Slovak addressing challenges at much larger scale Republic, Spain, Sweden, and the UK. Norway with increased funding to match the level and Switzerland are the two associate of difficulty involved. countries with an implemented initiative. »» Providing opportunities for technology New knowledge creation and integration of transfer, which is high on the Brazil and foundational technologies. Use cases should European ICT agendas. Planning should focus on the creation of new knowledge by include the involvement of specialised synthesis activities using the data and tools commercial organisations to facilitate and thus generated, including open data and data strengthen enterprise clusters within the encoding, and developing new structures cloud ecosystem and encourage a risk- using existing technologies, such as building taking culture. A shift of focus towards a repository for classifications and a single professional commercial assistance portal for currently accepted names. that prepares research results for Overall, the e-infrastructure landscape needs commercialisation would be an important to be part of a wider ecosystem of cloud- part of the technology transfer process to services that help create economies of scale. help avoid the “valley of death”. This will For example, NREN jointly procured services require changes in the ways academic could enable better links between research technology transfer offices typically and other public institutions to open up new operate. Use cases could also demonstrate collaboration opportunities. the potential for new uses of existing Short- to mid-term opportunities identified solutions, for example, cross-mapping with a view to supporting these future tools to address social challenges like developments include: criminal records; services that can be scaled out to public authorities (e.g. »» Fostering the development and provision climate/weather monitoring services) and of analytics . educational services adapted for smart »» Supporting an international, inter- devices. university consortium for socio-economic A parallel development could be the creation research to help define impact assessment of an EU-Brazil joint Biodiversity and Business models and related studies of mutual Platform along the lines of the European interest to Brazil and Europe. model1. The platform could promote both »» Encouraging the involvement of savvy SMEs commercial offers and relevant exploitable to support the development of innovative assets from funded initiatives to ensure data and science-tailored services around long-term sustainability, including their an open and extensible open science commercialisation and internationalisation. cloud stack, including increased access to the cloud through smart devices [61]. 1 http://ec.europa.eu/environment/biodiversity/ business/index_en.html. EP Resolution, The procurement of cloud computing April 2012, Action 47, http://ec.europa.eu/ for processing and storage would play environment/nature/biodiversity/comm2006/ 26 a key role in encouraging user uptake, pdf/EP_resolution_april2012.pdf. Over time, the platform could incorporate it is important to foster a “mind set” shift products and services in areas such as smart that focuses on real-world examples, with cities, sustainable urban development and users communicating their progress through sustainable tourism. Such services relate a blog or user stories and the involvement directly to the implementation of the green of cloud champions sharing best practices economy. Commercial services should be and practical advice. Such an approach based on clearly defined business models will enable the shift from definitions and and ensure corporate social responsibility. functionalities towards tangible returns on investment. Supporting Training and General educational programmes should eSkills Development focus on the benefits of cloud computing and on real issues and concerns of business leaders, governments and public authorities, The experiences of EUBrazilOpenBio have such as legal issues, contracts, data shown that training programmes are crucial protection, trust and security. Educational to increase the adoption of cloud computing services also need to communicate the and ICT tools and to ensure that underlying relevance and maturity of standards for systems are efficient and available in the interoperability. future. Research e-infrastructures are an important vehicle for achieving results that The Strategic Advisory Board within are vital to creating better knowledge and EUBrazilOpenBio highlights the following more informed policy making. Without the considerations: availability of high-quality hands-on and »» Overcoming skill shortage in specific areas, remote training facilities, without which such as digital research. it will be hard to foster the uptake of tools and services that have the proven potential »» Supporting the recruitment and training of to improve biodiversity research. Training is operational system professionals to ensure needed not only at the level of the resources that underlying systems run as efficiently but also at the level of the tools, as shown as possible, now and in the long term. by the EUBrazilOpenBio ecological niche »» Drawing on lessons learnt from current modelling use case. Future directions need training programmes, proposals should to ensure that training on the use case also consider the availability, structure of includes training on niche models as the existing training facilities and allocate tools are not useful without the underlying sufficient budget for promotional scientific knowledge, which is also a project campaigns, including tactics to overcome recommendation to GBIF. sociological barriers, timeliness and Investments need to respond to these tangible learning outcomes. requirements through dedicated educational »» Addressing data sharing issues. Despite and training programmes at all levels but increasing recognition of Intellectual individually designed to meet a specific Property Rights (IPRs) from different data purpose and with clear outputs that can be providers, there is still a need to focus quantified and certified. on making, at least partially, data public In the field of biodiversity, one possible and sharable to foster better interaction. approach could be to ensure budget Commercial usage is not an issue since allocation for co-located training sessions at fostering industry on adopting biodiversity community events (e.g. training sessions at preservation is part of their aims. events for botanists). In the scientific domain, 27 Supporting large-scale, Supporting a market for long-term information open science and open processes data services, including opportunities for public- The EU-Brazil Joint Action Plan 2012-2014 private partnerships: short- highlights the benefits of establishing joint facilities in the field of biodiversity and term-long-term sustainable tourism. Such facilities could leverage opportunities to explore very large- This scenario identifies ways in which the scale, very long-term information gathering involvement of businesses and public- processes in nature. EU-Brazil collaborative private partnerships could support open projects in the field of biodiversity could science and in line with one combine with others to stimulate an of the recommendations of the 2013 EU- international PhD training college, bringing Brazil Summit is to strengthen ties between together the best students from different the business communities. disciplines, studying under a common One possible way of building closer ties programme with common goals, meeting between the Brazilian and European business virtually and annually. communities would be to create conditions The Strategic Advisory Board within favourable to generating a data market. For EUBrazilOpenBio proposes the following example, new environmental monitoring short, mid- and long-term timelines with a and healthcare services, which could be view to developing large-scale, long-term cloud-based and provided by new start- information processes: ups leveraging the benefits of the cloud. Further, there are increasing opportunities for »» Short term: enable a team of cross- businesses to contribute to the development of discipline experts to plan the criteria, new services, such as for the healthcare sector, curriculum and topics of joint facilities, the smart cities and environmental sciences. scope and a roadmap for their development, including investments needed. Another opportunity lies in supporting EU-Brazil coordination activities coming »» Mid-term: support the planning and together in a single “EU-Brazil Co-operation delivery of an international European Hub”, aggregating and promoting products, Summer/Brazil Winter School, its services and solutions in the private sector objectives, student selection criteria, and through public-private partnerships. It programme, evaluation criteria, promotion could offer a directory of EU and Brazilian and dissemination of outcomes. businesses interested in expanding in »» Long-term: establish an international the respective regions, career and job PhD studentship supported by reciprocal opportunities. It could also highlight eSkills recognition in EU and Brazil. The shortages that can be addressed through programme committee driving the specialised training schemes. studentship should ensure that the Possible revenue streams resulting from a syllabus design addresses challenges not virtual market for data might include: only linked with biodiversity but also data- intensive science and cloud computing »» Free of charge usage on a cloud based on a multi-disciplinary approach infrastructure for research that is publicly and that the programme is of sufficiently funded. high standard to serve as a model for »» A nominal fee for sporadic usage by other similar initiatives elsewhere. 28 research communities. »» Low entry-fee for a start-up using the data It is important to share experiences and generated by a research group; charges new knowledge emerging from pioneering to large pharmaceutical and other large NRENs so that best practices can be enterprises that makes intense use of the adapted to specific circumstances. For data for research and development. example, supporting data-intensive science disciplines, overcoming cross-border Pathways towards these new approaches interoperable challenges, and launching could include a small-scale study providing a public-private partnerships with the data-roadmap that defines what is the most commercial sector (equipment vendors and valuable data and why, e.g. “core reference carriers). data” that is useful many times. It should propose areas in which value creation Joint coordination activities are not only around data can help unleash the innovation important to increase collaboration between potential and propose ways in which funding the EU and Brazil but also to ensure that might be channelled towards mentoring pioneering developments benefit the data-driven start-ups, as well as for training wider, global open science community. The researchers and journalists in data literacy. role of the Brazilian NREN and EU NRENs in addressing major global challenges and providing user-centric services presents new Coordination activities opportunities to build closer links between supporting advances at the R&E sector and other public institutions the network layer and (e.g. libraries, museums and hospitals). opportunities for public- Future joint co-ordination activities through small-scale funding schemes could play private partnerships a role in promoting leadership and best practices around service delivery, standards Researchers rely on a highly pervasive and interoperability. It could help identify affordable network, where cloud services are potential public-private partnerships for increasingly helping to create a key enabling the integration of new services, such as for platform. Without being able to connect healthcare centres, libraries and museums. researchers around the world “at the speed Other outputs could include the evaluation of light” we cannot take on the grand global of business models and portfolio of services, challenges of the 21st century. identifying best practices and new services Leading NRENs are currently involved in that could be replicated elsewhere or adapted advancing the network for tomorrow’s to suit local or national circumstances. research community, addressing One possible approach for a future support interoperability challenges and defining and coordination action could be to build new business and governance models to on current analyses on the adoption of gain better efficiencies. However, the pace new technologies and business modelling, of technological advances and provision of helping to identify areas for the involvement cloud-based services by NRENs in Europe of both public and private organisations. is diverse and fragmented. While there are Closer co-operation with national resource examples of well-defined strategies and providers (e.g. National Grid Initiatives or deployed services, there is a serious risk NGIs for short) should also be encouraged to of fragmentation of services and unclear help drive federated research infrastructures business models, among other difficulties and ensure the sustained use of digital identified. Similar issues are being faced resources. across Latin America. Ideally it would provide a roadmap for 29 short-term actions to create an ecosystem Such an approach should also focus on of services from NRENs, including joint success stories and insights for policy procurement leading to economies of scale makers and the wider public along the lines and preferential rates for R&E sector, such of the eScienceTalk project and its work with as contracts with telecom providers. Actions iSGTW. This would also play a key role in should focus on ensuring that services helping to change mind-sets as one of the are “innovation enablers”, combining in- main barriers to driving change, identifying house solutions to meet specific needs and recruiting champions of cloud and while procuring existing tools and services digital tools, new business models and new that researchers find useful to create an organisational structures suited to today’s ecosystem that effectively supports the service-oriented approach. digital researcher of 2020.

30 Authors

Members of the EUBrazilOpenBio Strategic Advisory Board Malcolm Atkinson, Director of Edinburgh Data-intensive research and UK e-Science Envoy Fabrizio Gagliardi, Independent Consultant Nelson Simões, Director General of Brazil’s National Research and Education Network, RNP

EUBrazilOpenBio Consortium Rosa M. Badia, Barcelona Supercomputing Center; Vanderlei Canhos, CRIA; Donatella Castelli, National Research Council of Italy; Leandro Ciufo, RNP; Alex Grey, Cardif University and Species2000; Alex Hardisty, Cardif University; Silvana Muscella, Stephanie Parker and Sara Pittonet, Trust-IT Services EUBrazilOpenBio EUBrazilOpenBio (288754) is a Small or medium-scale focused rese- arch project (STREP) funded by the European Commission under the European Commission Cooperation Programme, Framework Programme Seven (FP7)

Esse projeto é resultante do Edital MCT/CNPq Nº 066/2010 - Progra- ma de Cooperação Brasil – União Europeia na Área de Tecnologias da Informação e Comunicação - TIC