Scalable and Reactive Data Management for Mobile Internet-Of-Things Applications with Actor-Oriented Databases

Total Page:16

File Type:pdf, Size:1020Kb

Scalable and Reactive Data Management for Mobile Internet-Of-Things Applications with Actor-Oriented Databases UNIVERSITY OF COPENHAGEN PhD in Computer Science Scalable and Reactive Data Management for Mobile Internet-of-Things Applications with Actor-Oriented Databases Yiwen Wang Supervised by Marcos Antonio Vaz Salles April 2021 Yiwen Wang Scalable and Reactive Data Management for Mobile Internet-of-Things Applications with Actor-Oriented Databases PhD in Computer Science, April 2021 Supervisors: Marcos Antonio Vaz Salles University of Copenhagen Faculty of Science PhD Degree in Computer Science Sigurdsgade 41 2200 Copenhagen 四载春秋,畏作黄粱,虽非寒窗,亦是苦读。知己无柳絮高才,幸有大贤焉 而为其徒,博学慎思,人十能之而己百之,笃行亦则足恃矣。今停笔止言之 际,回首旦暮,赠以诗酒共年华,赚得知交满天下。愿今后去往之地,皆为 热土,期明朝漫漫修远之路,皆伴长风济沧海。 iii Acknowledgements My PhD work conducted in the context of the Future Cropping partnership (Fu- ture Cropping partnership website 2018), supported by Innovation Fund Den- mark. Experimental evaluation partially supported by the AWS Cloud Credits for Research program. In addition, this work was partly supported by the International Network Programme project "Modeling and Developing Actor Database Applications", funded by the Danish Agency for Science and Higher Education (number 7059-000528) and by FAPESP CEPID CCES 13/08293-7. Additional funding provided by FAPESP project 17/02325-5 and by CNPq- Brazil, Department of Computer Science, University of Copenhagen and Pro- gramming technology foundations for Accountability, Privacy-by-design & Robustness in Context-aware Systems (case number: 9131-00077B). Throughout the working of this nearly four years PhD pursuing adventure, I have received a great deal of support and assistance from many aspects. First and foremost, I would like to express my special appreciation to my supervisor, Professor Marcos Antonio Vaz Salles, whose expertise was invaluable in guiding me in the research. I would like to thank you for your insightful feedback pushed me to sharpen my thinking as a researcher and brought my work to a higher level. During the special time when the world been attacked by Covid-19, everything has become more complicated and challenging. Thank you for providing me supports in all aspects and even arrange time to help me every week during your parental leave and vacations. I have deep gratitude for that in my heart. You are the best supervisor in the world. Meanwhile, I would like to thank Professor Yongluan Zhou, Professor Claudia Bauzer Medeiros, Professor Julio Cesar Dos Reis, Vivek Shah and Kasper Myrtue Borggrens, for their extraordinary cooperation, inspirations, insightful comments, and suggestions during my PhD study. Special thanks to Professor Christian S. Jensen for hosting me at Aalborg University for my environmental v exchange. I would like to acknowledge my colleagues in Sigurdsgade 41 for our discussion, brainstorming, and wonderful joy time. I would also like to thank all my friends. I would like to offer my special thanks to Ziming Luo, who has provided company for me from high school to university, and also in my current PhD study. Thank you for providing happy distractions to rest my mind outside of my research, cooking delicious food, and taking care of my pet fish, Panda. I would particularly like to thank my beloved husband. I cannot thank you enough for encouraging me throughout my PhD experience. I really appreciate your wise counsel and sympathetic ear. You are the victim of my crying, screaming, disordering, and weird behaviors in my stressed time. Meanwhile, I would like to thank my lovely family-in-law for embracing me with the family atmosphere when I am thousands of kilometers away from my hometown. Finally, I could not have completed my PhD study without the support of my parents, my sister, and my deceased grandparents. Thank you for your unwavering support and belief in me. I know you are always there for me. Also, I think this dissertation is a perfect prenatal education material for my unborn niece. vi Abstract Development and utilization of Internet-of-Things (IoT) applications are in- creasing at an unprecedented speed in many domains, such as supply chain and logistics tracking, smart agriculture, e-health, and intelligent transporta- tion systems, to name a few. Among these applications, mobile IoT applications have received special attention by virtue of having an ever more significant impact on people’s day-to-day and society. Mobile IoT applications employ widely used portable and movable IoT entities that collect data, share informa- tion, and interact with each other to achieve common goals. While a host of data management solutions may be employed in the architecture of IoT appli- cations, ranging from edge data processing to historical analytical database systems, a particularly interesting component is the IoT data platform, which is a cloud-resident infrastructure for online management of IoT data. The growing connection coverage and increasing requirements in mobile IoT applications bring about special problems and challenges to be explored in IoT data platforms. Firstly, scalability is a necessary requirement for an IoT data platform because of the explosive growth of the IoT. Secondly, tight low-latency reactivity in an IoT data platform is required while managing a massive amount of highly concurrently generated data. Thirdly, support for heterogeneous data types is demanded in an IoT data platform due to the variety of IoT devices. Fourthly, elasticity is also required in an IoT data platform to handle dynamically changing IoT workloads. Last but not least, guidance on ensuring the dynamic and flexible development of IoT data platforms is crucial for application developers. Existing approaches have explored how to support different properties required in IoT data platforms, but they fall short of fulfilling our goal of providing programmable, scalable, and reactive data management for mobile IoT ap- plications. Recently, Actor-Oriented Databases (AODBs) have been proposed vii as a new cloud-based data management architecture for distributed, scalable, stateful, and interactive applications. We find that AODBs are naturally suit- able for satisfying the requirements and solving the challenges of building IoT data platforms. Therefore, we inspect the distinct requirements of building IoT data platforms through two case studies, and we investigate how to ef- fectively model and build IoT data platforms with AODBs. Then, we focus on easing the complexity of building mobile IoT data platforms with AODBs while providing scalability and reactivity. We propose a novel class of data management systems called Moving Actor-Oriented Databases (M-AODBs), which integrate the new abstraction of moving actors for reactive moving objects with two data concurrency semantics for different application use scenarios. A first implementation, Dolphin, is presented to validate the design of M-AODBs. Afterward, to handle the typically skewed spatial distributions in mobile IoT applications, we apply varied classic spatial partitioning techniques in Dolphin to evaluate, analyze, and manage bottlenecks due to data skew. Our experimental study illustrates the impact of spatial partitioning in Dolphin and unveils several promising directions for future work. viii Resumé Udvikling og anvendelse af Internet-of-Things (IoT) applikationer sker med en hidtil uset hastighed på flere anvendelsesområder, såsom forsyningskæde og logistiksporing, smart landbrug, e-sundhed og intelligente transportsystemer. Blandt disse applikationer har mobile IoT-applikationer fået særlig opmærk- somhed i kraft af at have en stadig større indflydelse på samfundet og folks dagligdag. Mobile IoT-applikationer anvender meget bærbare og bevægelige IoT-enheder, der indsamler data, deler information og interagerer med hinan- den for at nå et større fælles mål. Mens en række datastyringsløsninger kan anvendes i arkitekturen af IoT-applikationer, der spænder fra kantdatabehan- dling til historiske analytiske databaser, så er en særlig interessant komponent IoT-dataplatformen, som er en cloud-resident infrastruktur til online styring af IoT-data. Den voksende forbindelsesdækning og stigende krav i mobile IoT applika- tioner medfører specielle problemer og udfordringer, der skal udforskes i IoT-dataplatforme. For det første er skalerbarhed et nødvendigt krav til en IoT-dataplatform på grund af den eksplosive vækst i IoT. For det andet kræves en lav-latency-reaktivitet i en IoT-dataplatform, når der skal administreres en massiv mængde stærkt samtidig genererede data. For det tredje kræves det at heterogene datatyper understøttes i en IoT-dataplatform på grund af forskelligheden blandt IoT-enhederne. For det fjerde kræves elasticitet i en IoT- dataplatform for at kunne håndtere dynamisk skiftende IoT-arbejdsbelastninger. Sidst men ikke mindst, så er der behov for vejledning til at sikre en dynamisk og fleksibel udvikling af IoT-dataplatforme for applikationsudviklere. Eksisterende løsninger har udforsket, hvordan de forskellige egenskaber der kræves i IoT-dataplatforme kan understøttes, men opfylder ikke vores mål om at sikre programmerbar, skalerbar og reaktiv geodatastyring til mobile ix IoT-applikationer. For nylig er Actor-Oriented Databaser (AODBs) blevet fores- lået som en ny skybaseret datastyringsarkitektur for distribuerede, skalerbare, stateful og interaktive applikationer. Vi finder, at AODBs naturligvis er velegnet til at opfylde kravene og løse udfordringerne der er relateret til opbyggelsen af IoT-dataplatforme. Derfor undersøger vi de specifikke krav der er relateret til opbygningen af IoT-dataplatforme gennem to casestudier, samt hvordan
Recommended publications
  • Advantages of Schema Less Database
    Advantages Of Schema Less Database BoydIs Yanaton yacks counterbalanced very tortuously while or removed Blake remains after two-bit thriftier Durward and coralline. dighted Sometimes so anthropologically? streamless Septifragal Anselm malapertly.impone her pavise scrutinizingly, but suppressed Gabriello husband retributively or genuflect The schema of what does a useful because a particular style of your quiz! These databases eliminates the database of the biggest tech conferences worldwide switch from google classroom activity, updating and less than it easy to access an. SQL vs NoSQL Comparative Advantages and Disadvantages. So that schemas you need to databases reliable, advantages less language. PDF Comparison between relational and NOSQL databases. Document oriented databases have various advantages. Alongside increasing data structures, advantages of schema less database designer creates a bottleneck to use to a property graph storage nodes are quite high quality to large scalable databases. Amazon SimpleDB This is primarily a schema-less database that nature meant by handle smaller. Millions of database requires experts, advantages less to store chooses from the advantage of one of a very time and have a single source. What store a NoSQL Graph Database Ontotext Fundamentals. Schema Theory emphasizes the mental connections learners make between pieces of information and can if a shower powerful component of the learning process. NoSQL databases and the advantages and disadvantages of NoSQL. People use schemata to organize current live and defeat a consult for future understanding. Schemata and scripts ELLO. Why Use MongoDB Advantages & Use Cases Studio 3T. The most important benefit in the flexibility that facilitate database system provides. Because of commercial perspective, advantages of less database schema.
    [Show full text]
  • An Evaluation of Compilation-Based PL/PGSQL Execution Tanuj Nayak CMU-CS-21-101 February 2021
    An Evaluation of Compilation-Based PL/PGSQL Execution Tanuj Nayak CMU-CS-21-101 February 2021 Computer Science Department School of Computer Science Carnegie Mellon University Pittsburgh, PA 15213 Thesis Committee: Andy Pavlo (Chair) Todd C. Mowry Submitted in partial fulfillment of the requirements for the Fifth Year Master’s Program. Copyright © 2021 Tanuj Nayak Keywords: User Defined Functions, Compilation, Inlining Abstract User Defined Functions (UDFs) are an important analytical feature in modern Database Management Systems (DBMSs) due to their server-side execution proper- ties. These properties allow complex analytical queries to execute without serializing intermediate data over a network. However, query engines often incur significant overheads when executing UDFs due to them being non-declarative in contrast to SQL queries. This contrast causes a lot of context switching between UDF and SQL execution. As a given UDF invokes more SQL queries, these overheads become more noticeable. In this thesis, we investigate the extent to which compilation allow us to overcome such overheads. Compilation for executing SQL queries has become popu- lar in database research in the past decade, especially in the context of main memory DBMSs. It has been shown to deliver significant improvements to query execution performance. We compare the technique of compiling UDFs with query inlining, another recent UDF execution technique. To make this comparison, we implemented a UDF compilation framework in NoisePage, a main-memory compilation-based DBMS. In this framework we compile UDFs into a domain-specific language (DSL) function and evaluated it against query inlining. We find that this framework has greater support across UDF language features than inlining frameworks and allows for more efficient functions.
    [Show full text]
  • CMU-CS-21-106 May 2021
    On Building Robustness into Compilation-Based Main-Memory Database Query Engines Prashanth Menon CMU-CS-21-106 May 2021 School of Computer Science Computer Science Department Carnegie Mellon University Pittsburgh, PA 15213 Thesis Committee Andrew Pavlo (Co-Chair) Todd C. Mowry (Co-Chair) Jonathan Aldrich Thomas Neumann, Technische Universität München (TUM) Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy. Copyright © 2021 Prashanth Menon This research was sponsored by Google, Intel ISTC-CC, and the National Science Foundation under grant numbers CNS-1065112, IIS-1423210, CNS-1423172, IIS-1718582, and CCF-1822933. The views and conclusions contained in this document are those of the author and should not be interpreted as representing the official policies, either expressed or implied, of any sponsoring institution, the U.S. government or any other entity. Keywords: Code Generation, Query Compilation, Adaptive Query Processing, Vectorized Pro- cessing To my family, here and yet to arrive. Abstract Relational database management systems (DBMS) are the bedrock upon which modern data pro- cessing intensive applications are assembled. Critical to ensuring low-latency queries is the effi- ciency of the DBMSs query processor. Just-in-time (JIT) query compilation is a popular technique to improve analytical query processing performance. However, a compiled query cannot overcome poor choices made by the DBMSs optimizer. A lousy query plan results in lousy query code. Poor query plans often arise and for many reasons. Although there is a large body of work exploring how a query processor can adapt itself at runtime to compensate for inadequate plans, these techniques do not work in DBMSs that rely on compiling queries.
    [Show full text]
  • AWS Glue Studio User Guide AWS Glue Studio User Guide
    AWS Glue Studio User Guide AWS Glue Studio User Guide AWS Glue Studio: User Guide Copyright © Amazon Web Services, Inc. and/or its affiliates. All rights reserved. Amazon's trademarks and trade dress may not be used in connection with any product or service that is not Amazon's, in any manner that is likely to cause confusion among customers, or in any manner that disparages or discredits Amazon. All other trademarks not owned by Amazon are the property of their respective owners, who may or may not be affiliated with, connected to, or sponsored by Amazon. AWS Glue Studio User Guide Table of Contents What is AWS Glue Studio? ................................................................................................................... 1 Features of AWS Glue Studio ....................................................................................................... 2 Visual job editor ................................................................................................................ 2 Job script code editor ......................................................................................................... 2 Job performance dashboard ................................................................................................ 3 Support for dataset partitioning .......................................................................................... 3 When should I use AWS Glue Studio? ........................................................................................... 3 Accessing AWS Glue Studio ........................................................................................................
    [Show full text]
  • 732A54 / TDDE31 Big Data Analytics Topic: Dbmss for Big
    732A54 / TDDE31 Big Data Analytics Topic: DBMSs for Big Data Olaf Hartig [email protected] Relational Database Management Systems ● Well-defined formal foundations (relational data model) schema instance/ state Figure from “Fundamentals of Database Systems” by Elmasri and Navathe, Addison Wesley. 732A54 / TDDE31 Big Data Analytics Topic: Database Management Systems for Big Data Olaf Hartig 2 Relational Database Management Systems ● Well-defined formal foundations (relational data model) ● SQL – powerful declarative language – querying – data manipulation – database definition ● Support of transactions with ACID properties (Atomicity, Consistency preservation, Isolation, Durability) ● Established technology (developed since the 1970s) – many vendors – highly mature systems – experienced users and administrators 732A54 / TDDE31 Big Data Analytics Topic: Database Management Systems for Big Data Olaf Hartig 3 Business world has evolved ● Organizations and companies (whole industries) shift to the digital economy powered by the Internet ● Central aspect: new IT applications that allow companies to run their business and to interact with costumers – Web applications – Mobile applications – Connected devices (“Internet of Things”) Image source: https://pixabay.com/en/technology-information-digital-2082642/ 732A54 / TDDE31 Big Data Analytics Topic: Database Management Systems for Big Data Olaf Hartig 4 New Challenges for Database Systems ● Increasing numbers of concurrent users/clients – tens of thousands, perhaps millions – globally distributed
    [Show full text]