A New Initiative for Tiling, Stitching and Processing Geospatial Big Data in Distributed Computing Environments

Total Page:16

File Type:pdf, Size:1020Kb

A New Initiative for Tiling, Stitching and Processing Geospatial Big Data in Distributed Computing Environments ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume III-4, 2016 XXIII ISPRS Congress, 12–19 July 2016, Prague, Czech Republic A NEW INITIATIVE FOR TILING, STITCHING AND PROCESSING GEOSPATIAL BIG DATA IN DISTRIBUTED COMPUTING ENVIRONMENTS A. Olasz a*, B. Nguyen Thai b, D. Kristóf a a Department of Geoinformation, Institute of Geodesy, Cartography and Remote Sensing (FÖMI),5. Bosnyák sqr. Budapest, 1149 (olasz.angela, kristof.daniel)@fomi.hu b Department of Cartography and Geoinformatics, Eötvös Loránd University (ELTE), 1/A Pázmány Péter sétány, Budapest, 1117 Hungary, [email protected] Commission ICWG IV/II KEYWORDS: Distributed computing, GIS processing, raster data tiling, data assimilation, remote sensing data analysis, geospatial big data, spatial big data ABSTRACT Within recent years, several new approaches and solutions for Big Data processing have been developed. The Geospatial world is still facing the lack of well-established distributed processing solutions tailored to the amount and heterogeneity of geodata, especially when fast data processing is a must. The goal of such systems is to improve processing time by distributing data transparently across processing (and/or storage) nodes. These types of methodology are based on the concept of divide and conquer. Nevertheless, in the context of geospatial processing, most of the distributed computing frameworks have important limitations regarding both data distribution and data partitioning methods. Moreover, flexibility and expendability for handling various data types (often in binary formats) are also strongly required. This paper presents a concept for tiling, stitching and processing of big geospatial data. The system is based on the IQLib concept (https://github.com/posseidon/IQLib/) developed in the frame of the IQmulus EU FP7 research and development project (http://www.iqmulus.eu). The data distribution framework has no limitations on programming language environment and can execute scripts (and workflows) written in different development frameworks (e.g. Python, R or C#). It is capable of processing raster, vector and point cloud data. The above-mentioned prototype is presented through a case study dealing with country-wide processing of raster imagery. Further investigations on algorithmic and implementation details are in focus for the near future. 1 INTRODUCTION principal capability in a way to transform information to knowledge. Our goal is to find a solution for Geoprocessing of big The progress and innovation is no longer hindered by the geospatial data in a distributed ecosystem providing an ability to collect data. The most important issue is how we environment to run algorithms, services, processing modules exploit these geospatial big data (Lee and Kang, 2015). We without any limitations on implementation programming consider that, we are facing the paradigm shift from data- language as well as data partitioning strategies and driven research to knowledge-driven scientific method in Big distribution among computational nodes. As a first step we Data which was considered as a challenge by R. Kitchin in would like to focus on (i) data decomposition and (ii) 2014. In our previous work (Nguyen Thai and Olasz, 2015) distributed processing. The challenges associated with each the Big Data concept and the well-known four dimensions: focus area, related methodology and first results are analyzed Volume, Velocity, Variety (originally Laney came up with and discussed in the paper. this three dimensions in 2001) and Veracity have been discussed taking into account of geospatial considerations 2 PROBLEM DESCRIPTION and characteristics. Additional dimensions have been continuously appearing to describe better the concept of the workflow such as: Value, Validity, Visibility, etc. (Li et al, 2.1 Geospatial Big Data 2015). Moreover, we have concluded a fifth “V” to Geospatial Big Reliable analysis of the geospatial data is extremely Data as Visualization, because it has a great importance to important base for being able to support better decision communicate the desired information in Geographical making with location-aware data even in our changing Information Science from the very beginning. From geo- World. The challenges for handling geospatial big data location information transform into knowledge going through include capture, storage, search, sharing, transfer, analysis, the pipeline of Data Life Cycle in every phase of conversion and visualization (Jewell et al., 2014). Furthermore, with (collect, aggregate, analyse, return as knowledge, share) newly adapted data management requirements and initiatives, Visualization is fundamental (Nguyen Thai Binh and Olasz, even more open data will be available on the web which need 2015). Beyond, we consider that in Big Data environment to be handled, the latent information be shared and extracted Visualization plays an important role in (1) geospatial data knowledge applied in the level of decision making as well processing (due to volume and variety) that help analysts to (Wu and Beng, 2014.). Big data, open data and open identify trends, relations, correlations and patterns in an government has joint interest in location and in many efficient way; (2) facilitate broadcasting geo-information to challenges considered in geospatial aspect will soon benefit citizens (and decision makers) employing interactive and eye from it (Jewell et al., 2014). We consider GI analysis as a tracking approaches. According to our conclusion Visualization is a definite geospatial component of Big Data. This contribution has been peer-reviewed. The double-blind peer-review was conducted on the basis of the full paper. doi:10.5194/isprsannals-III-4-111-2016 111 ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume III-4, 2016 XXIII ISPRS Congress, 12–19 July 2016, Prague, Czech Republic theory geospatial big data is defined as volume, variety and 2.2 Defining Geospatial Big Data update frequency rate that exceed the capability of spatial computing technology (Lee and Kang, 2015, Li et al., 2015, Spatial data (also known as geospatial data, geo-information, Kambatla et al., 2014). In Table 1. we have collected the geodata, etc) have many definitions depending from the main characteristics of geospatial big data for each type of background of the author. All of them emphasize the formats such as: representation formats, GIS operations, geographic location of the phenomena to be described as volume, velocity, variety and visualization aspects. To have a basic criteria. The nature of the digital representation of the better understanding on what are the main attributes of continuous space can be grouped in 4 or 5 type. Traditionally geospatial data because it is hard to delineate the margin we consider two type of geospatial data vector and raster starting to “exceed the capability of spatial computing (Elek, 2006) owing to the development of information technology”. To estimate the size of the processable amount technology nowadays we can have higher abstraction type of of data are use-case specific, there are some good examples data such as point clouds, graph networks. An additional (Evans et al., 2014) where the authors tried to identify the particular kind of location-aware data is also examined by Geospatial Data and Geospatial Big Data differences. analysts; social media-like data which requires a particular approach to collect and process as well. Along with Big Data Data type Formats GIS operations Volume Velocity Variety Visualization thanks to OGC standards available amount of vector data real time and web-gis platforms non point, line, polygon Overlapping vector (for instance nation-wide monitoring, and Vector researchers are able to use (multi) geoprocessing cadastre, or land cover,roads, rapid response is GIS data for many different waterways, utility network, etc) emerging purposes 3D modeling, urban 3D view (perspective) modeling,simulation,flight available amount of point cloud time sensitive 3D together with thematic from above camera view, data (or TIN) for the creation of data requires rapid content with reduced 3D point cloud,TIN (or visibility operations, semi- DSM, DTM modelling, feature processing (disaster information are essential representation Triangular mesh) automatic point cloud extraction and simulation management and to spreading information feature detection, requires huge computational simulation) for different level of end- classification, terrestrial capacity users laser scanning, BIM consider in a GIS real time spatio available (free) series terrestrial, processing to deliver results of earth temporal earth aerial and satellite (multispectral several types observation monitoring observation data Local, Focal, Zonal (Global) and hyperspectral) imagery of previously and processing novel processing is need Raster grid Map Algebra processing, (airplane, UAV), earth observation mentioned solution in visualization more than ever image analysis data requires huge computational data need to also needed to transform (independently capacity using raster image be combined information human from the extent of processing methods to extract readable the processing) the relevant trillions of edges, nodes for graph real time information in network analysis and processing available from location monitoring of routing visualization graph (nodes, routing, network analysis, based networks (also from social moving objects, Network techniques are
Recommended publications
  • Akka Java Documentation Release 2.2.5
    Akka Java Documentation Release 2.2.5 Typesafe Inc February 19, 2015 CONTENTS 1 Introduction 1 1.1 What is Akka?............................................1 1.2 Why Akka?..............................................3 1.3 Getting Started............................................3 1.4 The Obligatory Hello World.....................................7 1.5 Use-case and Deployment Scenarios.................................8 1.6 Examples of use-cases for Akka...................................9 2 General 10 2.1 Terminology, Concepts........................................ 10 2.2 Actor Systems............................................ 12 2.3 What is an Actor?.......................................... 14 2.4 Supervision and Monitoring..................................... 16 2.5 Actor References, Paths and Addresses............................... 19 2.6 Location Transparency........................................ 25 2.7 Akka and the Java Memory Model.................................. 26 2.8 Message Delivery Guarantees.................................... 28 2.9 Configuration............................................. 33 3 Actors 65 3.1 Actors................................................ 65 3.2 Typed Actors............................................. 84 3.3 Fault Tolerance............................................ 88 3.4 Dispatchers.............................................. 103 3.5 Mailboxes.............................................. 106 3.6 Routing................................................ 111 3.7 Building Finite State Machine
    [Show full text]
  • Up up and Out: Scaling Software with Akka
    UP UP AND OUT: SCALING SOFTWARE WITH AKKA Jonas Bonér CTO Typesafe @jboner Scaling software with Jonas Bonér CTO Typesafe @jboner ScalingScaling softwaresoftware with with ScalingScaling softwaresoftware with with Akka (Áhkká) The name comes from the goddess in the Sami (native swedes) mythology that represented all the wisdom and beauty in the world. It is also the name of a beautiful mountain in Laponia in the north part of Sweden ScalingScaling softwaresoftware with with Manage System Overload Scale UP & Scale OUT How can we achieve this? How can we achieve this? Let’s use Actors How can we achieve this? What is an Actor? What is an Actor? What is an Actor? • Akka's unit of code organization is called an Actor What is an Actor? • Akka's unit of code organization is called an Actor • Like Java EE servlets and session beans, Actors is a model for organizing your code that keeps many “policy decisions” separate from the business logic What is an Actor? • Akka's unit of code organization is called an Actor • Like Java EE servlets and session beans, Actors is a model for organizing your code that keeps many “policy decisions” separate from the business logic • Actors may be new to many in the Java community, but they are a tried-and-true concept (Hewitt 1973) used for many years in telecom systems with 9 nines uptime Program at a Higher Level Program at a Higher Level Program at a Higher Level • Never think in terms of shared state, state visibility, threads, locks, concurrent collections, thread notifications etc.
    [Show full text]
  • URP 6276) Web
    URP 6276 Internet Geographic Information Systems Fall 2014 Instructor: Zhong-Ren Peng, Ph.D. (Architecture building, Room 462, phone 352-291-1491, e-mail: [email protected]) Teaching Assistant: Xinyu Fu ([email protected]) Prerequisite: URP 6270 or by consent with Instructor Required Textbook: Zhong-Ren Peng and Ming-Hsiang Tsou, 2003, Internet GIS: Distributed Geographic Information Services for the Internet and Wireless Networks, New York: John Wiley & Sons. Course Overview This course focuses on the advanced topic in geographic information systems (GIS): Internet GIS. Internet GIS is a network-centric GIS technology that uses the Internet and the World Wide Web as a primary means of providing access to the functionality (i.e., analysis tools, mapping capability) of GIS and to the spatial data and other data needed for various GIS applications. It provides users capability to work interactively with maps and conduct spatial analysis on the Web. It has many applications in urban planning, e-government and e-business. There are two goals of the course. The first one is to offer you theoretic background so that you will have a better understanding of the concepts of Internet GIS, such as how Internet GIS works, the different architecture models, and the state of art of the Internet GIS technology. The second goal is to offer you hands-on experience to use Internet GIS programs so that you can gain necessary skills to work with Internet GIS programs. The course will start with an introduction to networking basics and the evolution of software architecture, specifically the client/server model.
    [Show full text]
  • A Software Framework for the Actor Model Focusing on the Optimization of Message Passing
    AICT 2018 : The Fourteenth Advanced International Conference on Telecommunications Actor4j: A Software Framework for the Actor Model Focusing on the Optimization of Message Passing David Alessandro Bauer, Juho Mäkiö Department of Informatics and Electronics University of Applied Sciences Emden/Leer Emden, Germany Email: [email protected], [email protected] Abstract—Common actor implementations often use also the Scale Cube by Abbott [6], which describes the three standardized thread pools without special optimization for the dimensions of scalability. message passing. For that, a high-performance solution was To ensure high parallelization, its one benefit to use worked out. The actor-oriented software framework Akka uses multi-core systems. The computer world of the last few internally a ForkJoinPool that is intended for a MapReduce years has been characterized by a change ("The Free Lunch approach. However, the MapReduce approach is not relevant for message passing, as it may lead to significant performance Is Over" [7]) from constantly increasing computing power losses. One solution is to develop a thread pool that focuses on to multi-core systems due to technical limitations. In the message passing. In the implementation of the Actor4j particular, technical progress always lags behind practical framework, the message queue of the actors is placed in requirements (Wirth's law [8]). Up to now, Moore's law was threads to enable an efficient message exchange. The actors are “that the number of transistors available to semiconductor operated directly from this queue (injecting the message), manufacturers would double approximately every 18 to 24 without further ado.
    [Show full text]
  • Geospatial Web Application Development to Access Irrigation Asset Data: Veterans Affairs Palo Alto Health Care System
    Geospatial Web Application Development to Access Irrigation Asset Data: Veterans Affairs Palo Alto Health Care System by Megan Theresa Gosch A Thesis Presented to the Faculty of the USC Graduate School University of Southern California In Partial Fulfillment of the Requirements for the Degree Master of Science (Geographic Information Science and Technology) May 2016 Copyright ® 2016 by Megan Theresa Gosch DEDICATION To my mom, Carolyn Ann Eriksson, for everything. Gone too soon but forever in our hearts. ii ACKNOWLEDGMENTS I would like to thank my family for their continued support throughout my graduate coursework and thesis writing. Knowing the time you have allowed me to pursue my dream of finishing my master’s degree resulted in time not spent with you all. I hope for my three boys that you can now see that going to school when you are passionate about something is worth all the hard work. Thank you to all my teachers at USC, but especially Dr. Darren Ruddell my thesis chair, and my committee members Dr. Jennifer Swift and Dr. Robert Vos. You made this process a whole lot easier with your continued support and guidance. To Steve McGrath, Supervisory Engineer of the Veterans Affairs Palo Alto Health Care System, who supported this project throughout the entire process. Thank you also to Joanna Fong for her continued assistance and feedback. Last, but not least, to my husband Dave. Your love and encouragement are a continual inspiration to me. iii Table of Contents Dedication ......................................................................................................................................
    [Show full text]
  • Assessment of Multiple Ingest Strategies for Accumulo Key-Value Store
    Assessment of Multiple Ingest Strategies for Accumulo Key-Value Store by Hai Pham A thesis submitted to the Graduate Faculty of Auburn University in partial fulfillment of the requirements for the Degree of Master of Science Auburn, Alabama May 7, 2016 Keywords: Accumulo, noSQL, ingest Copyright 2016 by Hai Pham Approved by Weikuan Yu, Co-Chair, Associate Professor of Computer Science, Florida State University Saad Biaz, Co-Chair, Professor of Computer Science and Software Engineering, Auburn University Sanjeev Baskiyar, Associate Professor of Computer Science and Software Engineering, Auburn University Abstract In recent years, the emergence of heterogeneous data, especially of the unstructured type, has been extremely rapid. The data growth happens concurrently in 3 dimensions: volume (size), velocity (growth rate) and variety (many types). This emerging trend has opened a new broad area of research, widely accepted as Big Data, which focuses on how to acquire, organize and manage huge amount of data effectively and efficiently. When coping with such Big Data, the traditional approach using RDBMS has been inefficient; because of this problem, a more efficient system named noSQL had to be created. This thesis will give an overview knowledge on the aforementioned noSQL systems and will then delve into a more specific instance of them which is Accumulo key-value store. Furthermore, since Accumulo is not designed with an ingest interface for users, this thesis focuses on investigating various methods for ingesting data, improving the performance and dealing with numerous parameters affecting this process. ii Acknowledgments First and foremost, I would like to express my profound gratitude to Professor Yu who with great kindness and patience has guided me through not only every aspect of computer science research but also many great directions towards my personal issues.
    [Show full text]
  • Geotrellis Documentation Release 1.0.0
    GeoTrellis Documentation Release 1.0.0 Azavea Aug 10, 2018 Home 1 Why GeoTrellis? 3 2 Contact and Support 5 3 Hello Raster! 7 3.1 Changelog................................................8 3.2 Contributing............................................... 27 3.3 Setup................................................... 30 3.4 Quick Start................................................ 31 3.5 Kernel Density.............................................. 33 3.6 Reading GeoTiffs............................................. 40 3.7 Extract-Transform-Load (ETL)..................................... 46 3.8 Core Concepts.............................................. 50 3.9 Using Rasters............................................... 77 3.10 Using Vectors............................................... 94 3.11 Spark and GeoTrellis........................................... 114 3.12 The ETL Tool.............................................. 126 3.13 Extending GeoTrellis Types....................................... 137 3.14 GeoTrellis Module Hierarchy...................................... 142 3.15 Tile Layer Backends........................................... 146 3.16 Vector Data Backends.......................................... 152 3.17 Frequently Asked Questions....................................... 154 3.18 Example Archive............................................. 156 3.19 Architecture Decision Records...................................... 159 3.20 Proj4 Implementation.......................................... 174 3.21 High Performance Scala........................................
    [Show full text]
  • Reactive Programming with Scala, Lagom, Spark, Akka and Play
    Issue October 2016 | presented by www.jaxenter.com #53 The digital magazine for enterprise developers Reactive Programming with Scala, Lagom, Spark, Akka and Play Interview with Scala creator Martin Odersky The state of Scala The Lagom Framework Lagom gives the developer a clear path DevOpsCon 2016: Our mission statement This is how we interpret modern DevOps ©istockphoto.com/moorsky Editorial Reactive programming is gaining momentum “We believe that a coherent approach to systems architec- If the definition “stream of events” does not satisfy your ture is needed, and we believe that all necessary aspects are thirst for knowledge, get ready to find out what reactive pro- already recognized individually: we want systems that are Re- gramming means to our experts in Scala, Lagom, Spark, Akka sponsive, Resilient, Elastic and Message Driven. We call these and Play. Plus, we talked to Scala creator Martin Odersky Reactive Systems.” – The Reactive Manifesto about the impending Scala 2.12, the current state of this pro- Why should anyone adopt reactive programming? Because gramming language and the technical innovations that await it allows you to make code more concise and focus on im- us. portant aspects such as the interdependence of events which Thirsty for more? Open the magazine and see what we have describe the business logic. Reactive programming means dif- prepared for you. ferent things to different people and we are not trying to rein- vent the wheel or define this concept. Instead we are allowing Gabriela Motroc, Editor our authors to prove how Scala, Lagom, Spark, Akka and Play co-exist and work together to create a reactive universe.
    [Show full text]
  • Interactive Topographic Web Mapping Using Scalable Vector Graphics
    University of Nebraska at Omaha DigitalCommons@UNO Student Work 12-1-2003 Interactive topographic web mapping using scalable vector graphics Peter Pavlicko University of Nebraska at Omaha Follow this and additional works at: https://digitalcommons.unomaha.edu/studentwork Recommended Citation Pavlicko, Peter, "Interactive topographic web mapping using scalable vector graphics" (2003). Student Work. 589. https://digitalcommons.unomaha.edu/studentwork/589 This Thesis is brought to you for free and open access by DigitalCommons@UNO. It has been accepted for inclusion in Student Work by an authorized administrator of DigitalCommons@UNO. For more information, please contact [email protected]. INTERACTIVE TOPOGRAPHIC WEB MAPPING USING SCALABLE VECTOR GRAPHICS A Thesis Presented to the Department of Geography-Geology and the Faculty of the Graduate College University of Nebraska in Partial Fulfillment of the Requirements for the Degree Master of Arts University of Nebraska at Omaha by Peter Pavlicko December, 2003 UMI Number: EP73227 All rights reserved INFORMATION TO ALL USERS The quality of this reproduction is dependent upon the quality of the copy submitted. In the unlikely event that the author did not send a complete manuscript and there are missing pages, these will be noted. Also, if material had to be removed, a note will indicate the deletion. Dissertation WWisMng UMI EP73227 Published by ProQuest LLC (2015). Copyright in the Dissertation held by the Author. Microform Edition © ProQuest LLC. All rights reserved. This work is protected against unauthorized copying under Title 17, United States Code ProQuest LLC. 789 East Eisenhower Parkway P.O. Box 1346 Ann Arbor, Ml 48106-1346 THESIS ACCEPTANCE Acceptance for the faculty of the Graduate College, University of Nebraska, in Partial fulfillment of the requirements for the degree Master of Arts University of Nebraska Omaha Committee ----------- Uf.A [JL___ Chairperson.
    [Show full text]
  • Evolution of Grid-GIS Systems
    International Journal of Computer Science and Telecommunications [Volume 3, Issue 3, March 2012] 36 Evolution of Grid-GIS Systems ISSN 2047-3338 Hari Singh 1 and Seema Bawa 2 1N.C. College of Engineering, Panipat, Haryana, India 2Thapar University, Patiala, Punjab, India [email protected], [email protected] Abstract — Grid is the collection of heterogeneous resources, [14]. A philosophical and technical comparison between providing good opportunities for the users of virtual Legion and Globus is done [15]. Performance of Alchemi and organizations forming a Grid for coordinated resource sharing Globus is compared [10]. Similarities and differences between and problem sharing. Much Grid middleware software exists for Grid computing and such distributed computing systems as implementing a Grid framework. Apart from many application P2P, areas of Grid Computing, Geographic Information System has CORBA, Cluster Com puting, and DCE are explained [2]. been an emerging application area for Grid Technology in recent time. In this paper, the developments in the field of Grid-GIS are With the help of Grid middleware sharing of all kinds of discussed after a brief overview of Grid Computing and GIS resources, including data resources, computing resources, separately. Some observations are also discussed in the storage resources, information resources, knowledge concluding section of the paper. resources, etc. has become possible. This paper describes the evolution of Grid-GIS systems. Index Terms — Grid Computing, GIS, Web-GIS and Grid-GIS Section II describes the Grid Computing Technology and the underlying architecture. Section III describes the Geographic Information System (GIS), accessing data from GIS databases, I. INTRODUCTION problems of GIS databases such as heterogeneity and interoperability, and spatial data handling language, N earlier days when mainframe computers were used for Geographic Markup Language (GML).
    [Show full text]
  • Lightweight Affinity and Object Capabilities in Scala
    http://www.diva-portal.org Postprint This is the accepted version of a paper presented at ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA). Citation for the original published paper: Haller, P., Loiko, A. (2016) LaCasa: Lightweight Affinity and Object Capabilities in Scala. In: Proceedings of the 2016 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications (pp. 272-291). Association for Computing Machinery (ACM) https://doi.org/10.1145/3022671.2984042 N.B. When citing this work, cite the original published paper. © Author | ACM 2016. This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in Proceedings of the 2016 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications, http://dx.doi.org/10.1145/3022671.2984042. Permanent link to this version: http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-197902 LACASA: Lightweight Affinity and Object Capabilities in Scala Philipp Haller Alex Loiko KTH Royal Institute of Technology, Sweden Google, Sweden ∗ [email protected] [email protected] Abstract difficulty of reasoning about program behavior and software Aliasing is a known source of challenges in the context of architecture [3], and it can introduce data races in concur- imperative object-oriented languages, which have led to im- rent programs. These observations have informed the devel- portant advances in type systems for aliasing control. How- opment of a number of type disciplines aimed at providing ever, their large-scale adoption has turned out to be a surpris- static aliasing properties, such as linear types [33, 51, 64], ingly difficult challenge.
    [Show full text]
  • Comparing Languages for Engineering Server Software: Erlang, Go, and Scala with Akka
    Comparing Languages for Engineering Server Software: Erlang, Go, and Scala with Akka Ivan Valkov, Natalia Chechina, and Phil Trinder School of Computing Science, University of Glasgow G12 8RZ, United Kingdom [email protected], {Natalia.Chechina, Phil.Trinder}@glasgow.ac.uk Abstract functional or object-oriented, with high-level coordination models, Servers are a key element of current IT infrastructures, and must e.g. actors as in Erlang [2] or a process algebra as in Go [6]. Indeed, often deal with large numbers of concurrent requests. The program- the success of some server-based companies is even attributed to ming language used to construct the server has an important role their use of specific languages. As examples WhatsApp’s success in engineering efficient server software, and must support massive is attributed to their use of Erlang [27]; the video streaming leader concurrency on multicore machines with low communication and Twitch uses Go to serve millions of users a day [13]. Research synchronisation overheads. contributions of this paper include the following: This paper investigates 12 highly concurrent programming lan- • A survey of programming language characteristics relevant for guages suitable for engineering servers, and analyses three repre- servers (Section 2.1). sentative languages in detail: Erlang, Go, and Scala with Akka. • The design and implementation of three benchmarks to analyse We have designed three server benchmarks that analyse key per- the multicore server capabilities of Erlang, Go, and Scala with formance characteristics of the languages. The benchmark results Akka (Section 3). suggest that where minimising message latency is crucial, Go and Erlang are best; that Scala with Akka is capable of supporting the • An evaluation of Erlang, Go, and Scala with Akka for key largest number of dormant processes; that for servers that frequently server capabilities, i.e.
    [Show full text]