Dependability of container-based data-centric systems

Petar Kochovski, Vlado Stankovski∗∗
University of Ljubljana, E-mail: [email protected]
∗∗Corresponding author.

Summary Nowadays, many Cloud systems and applications have to be designed to cope with the four V's of the Big Data problem: volume, variety, veracity and velocity. In this context, an important property of such systems is dependability. Dependability is a measure of the availability, reliability, tractability, response time, durability and other aspects of a system. Dependability is well established as an important property in services science; however, data-centric systems may have some specific requirements due to which a variety of technologies are still maturing and evolving. The goal of the present work is to establish the main aspects of dependability in data-centric systems and to analyse to what extent various Cloud computing technologies can be used to achieve this property. Particular attention is given to the use of container-based technologies, which represent a novel, lightweight form of virtualisation. The practical part of the chapter presents examples of highly dependable data-centric systems, such as ENTICE and SWITCH, suitable for the Internet of Things (IoT) and the Big Data era. Keywords: dependability; Big Data; container-based systems; storage.

1 Introduction

The overall computing capacity at humanity's disposal has been increasing exponentially in the past decades. With the current speed of progress, supercomputers are projected to reach 1 exa Floating Point Operations per Second (EFLOPS) in 2018. The Top500.org Web site presents an interesting overview of various achievements in the area. However, there are some new developments which will result in even greater demands for computing resources in the very near future. While exascale computing is certainly the new buzzword in the High Performance Computing (HPC) domain, the trend of Big Data is expected to dramatically influence the design of the world's leading computing infrastructures, supercomputers and cloud federations. There are various areas where data is increasingly important to applications. These include many areas of research, environment, electronics, telecommunication, automotive, weather and climate, biodiversity, geophysics, aerospace, finance, chemistry, logistics, energy and so on. In all these areas, Web services are becoming the predominant delivery method of Information and Communication Technology (ICT) services. Applications in these areas need to be designed to process exabytes of data in many different ways. All this poses new requirements on the design of future computing infrastructures.


The rapid development and deployment of the Internet of Things (IoT) has led to the installation of billions of devices that can sense, actuate and even compute large amounts of data. The data coming from these devices challenges the existing, traditional data management approaches. The IoT produces a continuous stream of data that constantly needs to be stored in some data center storage. However, the infrastructure is not intended only for storing the data, but also for processing it and giving the necessary responses back to the devices. Big Data is not a new idea; it was first mentioned in an academic paper in 1999 [1]. Although much time has passed since then, the active use of Big Data started only a few years ago. As the amount of data produced by the IoT kept rising, organizations were forced to adopt technologies able to cope with IoT data. Therefore, it could be stated that the rise of the IoT forced the development and implementation of Big Data technologies.

Big Data has no clear definition, although it is usually first thought of in terms of size. It is characterised by its four main characteristics, also known as the 4V's: volume, variety, velocity and veracity. Volume relates to the size of data, which is growing exponentially in time. No one knows the exact amount of data that is being generated, but everyone is aware that it is an enormous quantity of information. Although most of the data in the past was structured, during the last decade the amount of data has grown and most of it is now unstructured. IBM has estimated that 2.3 trillion gigabytes of data are created every day and that 40 zettabytes of data will be created by 2020, which is an increase of 300 times from 2005 [2]. Variety refers to the different types of data, generated by industries such as financial services, health care, education, high performance computing and life science institutions, social networks, sensors, smart devices, etc. These data types differ from each other, which means that it is impossible to fit such varied data into a spreadsheet or into an application. Velocity measures the frequency at which the generated data needs to be processed. Depending on the requirements, the data may be processed in real time or only when it is needed. Veracity is required to guarantee the trustworthiness of the data, meaning that the data has the quality needed to enable the right action when it is needed.

From the above analysis it is obvious that Big Data plays a crucial role in today's society. Due to the different formats and sizes of unstructured data, traditional storage infrastructures cannot achieve the desired Quality of Service (QoS), which can lead to data unavailability, compliance issues and increased storage expenses. A solution that addresses these data-management problems is the data-centric architecture, on which data-centric systems (DCS) are based. The philosophy behind DCS is simple: as the data size grows, the cost of moving data becomes prohibitive. Therefore, data-centric systems offer the opportunity to move the computing to the data instead of vice versa. The key of this system design is to separate data from behavior. These systems are designed to organize the interactions between applications in terms of the stateful data, instead of the operations to be performed.
As the volume and velocity of unstructured data increase all the time, new data management challenges appear, such as providing service dependability for applications running on the Cloud. According to the authors of [3], dependability can be defined as the ability of a system to provide dependable services in terms of availability, responsiveness and reliability. Although dependability has many definitions, in this chapter we depict dependability in a containerised, component-based, data-centric system.

While many petascale computing applications are addressed by supercomputers, there is a vast potential for wider utilisation of computing infrastructures in the form of cloud federations. Cloud federations may relate to both computing and data aspects. This is an emerging area which has not been sufficiently explored in the past and may provide a variety of benefits to data-centric applications.

The remainder of this chapter is structured as follows. Section 2 describes the component-based methodology, the architectural approaches for data management and container interoperability. Section 3 points out the key concepts and relations in dependability, where the dependability attributes, means and threats are described. Section 4 describes the serving of Virtual Machine and container images to applications. Section 5 elaborates on QoS management in the software engineering process. Section 6 concludes the chapter.

2 Component-based Software Engineering

In order to build dependable systems and applications we must rely on a well-defined methodology that helps analyse the requirements and the trade-offs in relation to dependability. Therefore, it is necessary to relate the development of dependable data-centric systems and applications to modern software engineering practices in all their phases, such as component development, workflow management, testing, deployment, monitoring and maintenance.

With the constant growth of software complexity and size, traditional software development approaches have become ineffective in terms of productivity and cost. Component-Based Software Engineering (CBSE) has emerged to overcome these problems by using selected components and integrating them into one well-defined software architecture, where each component provides a functionality independent from the other components of the whole system. As a result, during the software engineering process the developer selects and combines appropriate components instead of designing and developing them from scratch; the components are found in various repositories of Open Source software. The components can be heterogeneous, written in different programming languages, and integrated into an architecture where they communicate with each other using well-defined interfaces.
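To illustrate the idea of heterogeneous components communicating through well-defined interfaces, the following minimal Python sketch shows two interchangeable components implementing the same interface; the names (StorageComponent, InMemoryStorage, LoggingStorage) are purely illustrative and not part of any system discussed in this chapter.

    from abc import ABC, abstractmethod

    class StorageComponent(ABC):
        """A well-defined interface; any implementation can be swapped in."""

        @abstractmethod
        def put(self, key: str, data: bytes) -> None: ...

        @abstractmethod
        def get(self, key: str) -> bytes: ...

    class InMemoryStorage(StorageComponent):
        """A trivial component, e.g. reused for testing."""
        def __init__(self):
            self._items = {}
        def put(self, key, data):
            self._items[key] = data
        def get(self, key):
            return self._items[key]

    class LoggingStorage(StorageComponent):
        """A component that wraps another one without changing the interface."""
        def __init__(self, inner: StorageComponent):
            self._inner = inner
        def put(self, key, data):
            print(f"storing {len(data)} bytes under {key!r}")
            self._inner.put(key, data)
        def get(self, key):
            return self._inner.get(key)

    # The application is assembled from components rather than written from scratch.
    store = LoggingStorage(InMemoryStorage())
    store.put("report", b"quarterly data")
    assert store.get("report") == b"quarterly data"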

2.1 Life-cycle

Every methodology addresses a specific life cycle of the software. Although the life cycles of different methodologies may be very different, they can all be described by a set of phases that are common to all of them. These phases represent major product life-cycle periods and are related to the state of the product. The existing CBSE development life cycle separates component development from system development [4]. Although the component development process is in many aspects similar to system development, there are some notable differences; for instance, components are intended for reuse in many different products, many of which have yet to be designed. The component development life cycle, shown in Figure .1, can be described in seven phases [4]:

1. Requirements phase: During this phase the requirement specifications for the system and the application development are decided. The availability of the components is also determined during this phase.

2. Analysis and Design phase: This phase starts with a system analysis and a design providing the overall architecture. Furthermore, this phase develops a detailed design of the system, during which it is decided which components will be used in the development. Because the design impacts the QoS, the QoS level that needs to be achieved is also decided during this phase.

3. Implementation phase: Within this phase the components are selected according to their functional properties. They are verified and tested both independently and together. Some component adaptations may be necessary to ensure compatibility.

4. Integration phase: Although integration is partially fulfilled during the implementation phase, the final integration of all components within the architecture takes place during this phase.

5. Test phase: Testing is necessary to ensure component quality within the given architecture, because a component may have been developed for another architecture with different requirements. The components are usually tested individually, after integration into an assembly, and after full integration into the system.

6. Release phase: This phase includes preparing the software for delivery and installation.

7. Maintenance phase: As for all software products, it is necessary to provide maintenance. The approach in CBSE is to provide maintenance by replacing old components with new ones or by adding new components to the system.

Cloud computing technologies can be used to increase ubiquitous access to services, anywhere and at any time. Therefore, the Cloud architecture needs to be designed to provide the desired QoS. The author of [5] explains that it is essential to design Cloud applications as web service components based on well-proven software processes, design methods and techniques, such as component-based software engineering techniques. One of the most common problems during Cloud application development is Service Level Agreement (SLA) support. Since SLAs vary between different service providers, that goal can only be reached using components designed with a flexible interface that can link to many different SLAs. CBSE supports the implementation of such design characteristics, so that components can be designed to support many SLAs.

Figure .1: Component-based Software Development life-cycle

2.2 Architectural approaches for data management

Containerization is the process of distributing and deploying applications in a portable and predictable way. Containers are very attractive to developers and system administrators alike because of the possibilities they offer, such as: abstraction of the host system away from the containerized application, easy scalability, simple dependency management and application versioning, lightweight execution environments, etc. [6] Compared to VM virtualization, containers provide a lighter execution environment, since they are isolated at the process level, sharing the host's kernel. This means that the container itself does not include a complete operating system, leading to very quick start-up times and smaller transfer times of container images. Therefore, an entire stack of many containers can run on top of a single instance of the host OS. Depending on how the data is placed relative to the containers, we can distinguish three different architectural approaches, described below.
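As a small illustration of the lightweight nature of containers, the following sketch uses the Docker SDK for Python to start a short-lived container and measure how long the start-up takes. It assumes a local Docker daemon and the docker Python package are available; the image name is only an example, and the measured times will of course vary between hosts.

    import time
    import docker  # Docker SDK for Python (pip install docker)

    client = docker.from_env()  # connects to the local Docker daemon

    start = time.time()
    # Run a short-lived container; it shares the host kernel,
    # so no guest operating system has to be booted.
    output = client.containers.run("alpine:3.18", ["echo", "container started"], remove=True)
    elapsed = time.time() - start

    print(output.decode().strip())                      # -> container started
    print(f"start-up and execution took {elapsed:.2f} s")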

2.2.1 Functionality and data in a container

The use of containers can make it possible to diminish the otherwise hard barrier between stateful and stateless services, and to configure this aspect as part of the software engineering process. Figure .2 shows a container that provides an embodiment of both the functionality and the data. If the service needs to be stopped and restarted somewhere else, it may be possible to use a checkpointing mechanism [7]. An example of a situation in which a service restart may be needed is the deterioration of network parameters, such as the latency between the client and the running service. This approach may also be useful when the service needs to be used as various events happen in the environment, for example, when a user wishes to have a videoconference, or to upload a file to Cloud storage. In these cases, the geographic location of the user may influence the Quality of Experience of the applications and thus, it may be necessary to start the container in a location near to where the user is located. A container with functionality and data is a container that packages the application code, all the libraries required for the target application and the dataset together.

In general terms, the use of container technologies has many advantages over other virtualization approaches. Containers offer better portability, because they can be easily moved to any system that runs the same host OS without additional code optimization. It is possible to run many containers on the same infrastructure, that is, they can easily scale horizontally. Performance is also improved, because there is only one OS. The main advantage of containers is that they can be created much faster than VMs. This provides a more agile environment for development and supports continuous integration and delivery.

However, containers are not perfect and have some disadvantages as well. With containerization there is no full isolation from the host OS. Since the containers share the same OS, a security threat could more easily reach the entire system. This threat can be mitigated by combining containers and VMs, where the containers are created within an OS running on a VM. With this approach a potential security breach could only affect the VM's OS, but not the physical host. Another containerization disadvantage is that the container and the base host must use the same OS.

Figure .2: A single container with functionality and data

2.2.2 Clean separation between functionality and data in containers

Another approach that may be useful in relation to containers is to completely separate the functionalities from the actual data needed for their operation. This can be achieved by developing software-defined systems and applications with a clear cut between the two. In such a case, when the functionality is needed along with the data, a minimum of two containers need to be started: the first container provides the functionality, while the second container provides the data infrastructure (e.g. a file system, an SQL database or a knowledge base, as in the case of the ENTICE project). Figure .3 graphically presents this situation.

While the time needed to set up and start the needed functionality for the given user may be slightly increased (as it requires mounting the data container on the container that provides the functionality and the use of a network for communication between the containers), the separation can be effectively used in situations where the privacy and security aspects are very important to the application and single tenancy is the architectural choice. However, this architecture may also be used to achieve high Quality of Service in multi-tenant applications, such as those based on the Internet of Things [8], where both the container holding the Web server and the container holding the SQL database can scale elastically.

Figure .3: Two linked containers provide separation of the functionality from the data
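A minimal sketch of this two-container arrangement, using the Docker SDK for Python, is given below. It assumes a local Docker daemon; the application image name "my-web-app" and the choice of PostgreSQL as the data container are hypothetical examples, not part of the architectures discussed above.

    import docker  # Docker SDK for Python

    client = docker.from_env()

    # A dedicated network provides the communication path between the two containers.
    client.networks.create("app-net", driver="bridge")

    # Container 1: the data infrastructure (here a PostgreSQL database, as an example).
    data = client.containers.run(
        "postgres:15",
        name="data-container",
        environment={"POSTGRES_PASSWORD": "example"},
        network="app-net",
        detach=True,
    )

    # Container 2: the functionality, which reaches the data container by its name
    # over the shared network (the image name "my-web-app" is hypothetical).
    app = client.containers.run(
        "my-web-app:latest",
        name="functionality-container",
        environment={"DB_HOST": "data-container"},
        network="app-net",
        detach=True,
    )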

2.2.3 Separate distributed data infrastructure and services

The approaches described above function well when there is no need to work with large data sets. However, long-running applications produce large application data sets, which must be stored and managed somewhere. This has led to arrangements of two parallel architectures: one for the management of data, with a variety of approaches (e.g. CEPH [9], Amazon's S3 [10], Storj.io [11], etc.), and another for the management of functionalities, which are packed into containers. Such clearly separated data management infrastructures and services can be established in a virtualised format as shown in Figure .4. An example of such a system is Cassandra [12], which can operate in a distributed and containerised format, trading performance for elegance of implementation. Thus, the Cloud application is developed to make use of two infrastructures: one for handling elasticity and other properties of the containerized application, and another for the application dataset, hosted on scalable distributed storage that is accessible from any container on the network. The dependability of such an architecture, in terms of availability and reliability, can be improved by fast reinstatement of database nodes when the application becomes unresponsive. According to [13], this can be done by saving the container's file system together with the state of the given container into disk files, copying them to another server, and restarting the container there without rebooting or recommencing the process from the beginning.

Figure .4: Geographically distributed containerised data infrastructure based on a specific technology such as Cassandra
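The sketch below illustrates, under stated assumptions, how any containerized functionality could talk to such a separate Cassandra-based data infrastructure using the cassandra-driver package; the contact-point addresses, keyspace and table names are illustrative placeholders, and a reachable Cassandra cluster is assumed.

    from cassandra.cluster import Cluster  # pip install cassandra-driver

    # Contact points are illustrative; any container on the network can connect
    # to the distributed data infrastructure in the same way.
    cluster = Cluster(["10.0.0.11", "10.0.0.12", "10.0.0.13"])
    session = cluster.connect()

    # Replication across three nodes keeps the data available if one node fails.
    session.execute(
        "CREATE KEYSPACE IF NOT EXISTS sensors "
        "WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3}"
    )
    session.execute(
        "CREATE TABLE IF NOT EXISTS sensors.readings "
        "(device_id text, ts timestamp, value double, PRIMARY KEY (device_id, ts))"
    )
    session.execute(
        "INSERT INTO sensors.readings (device_id, ts, value) "
        "VALUES (%s, toTimestamp(now()), %s)",
        ("device-42", 21.5),
    )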

As previously explained, in order to gain on the Quality of Service parameters (aka performability) when handling great amounts of data, it may be necessary to make use of data management systems, such as CEPH, which cannot be effectively virtualised (see Figure .5).

Figure .5: A separate non-virtualised data management infrastructure such as CEPH

2.3 Emerging container interoperability architectures

CNCF.io is an emerging container interoperability architecture promoted by a recently formed organisation, the Cloud Native Computing Foundation. The architecture, shown in Figure .6, is developed to address several areas of emerging needs for Cloud applications, including the IoT and Big Data application domains. There are three main concerns behind the CNCF architecture: container packaging and distribution, dynamic scheduling of containers, and micro-service implementation. The first part of the CNCF is based on technologies that build, store and distribute container-packaged software. The second part of the CNCF consists of technologies that dynamically orchestrate container images; examples of orchestration technologies are Kubernetes [14] and Mesos [15]. The third part of this emerging architecture consists of R&D microservices which are designed to ensure that the deployed container image is built in a way that it can be easily found, accessed, consumed and linked to other services. Thus, this architecture can be used to build distributed systems and to support the discovery and consumption of software, i.e. the overall life-cycle of Cloud applications.

Figure .6: A Cloud interoperability architecture of the Cloud Native Computing Foundation

As is evident, distributed storage facilities have not been sufficiently considered, nor effectively included, in the CNCF architecture. The architecture does, however, specify the need to separate the interfaces between the containers and the Software Defined Network and Software Defined Storage.
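As a brief illustration of the dynamic scheduling layer mentioned above, the following sketch uses the official kubernetes Python client to observe where the orchestrator has placed running containers and to request more replicas. It assumes an existing cluster, a valid local kubeconfig, and an already created Deployment named "web", all of which are hypothetical here.

    from kubernetes import client, config  # pip install kubernetes

    config.load_kube_config()  # credentials taken from the local kubeconfig file

    core = client.CoreV1Api()
    apps = client.AppsV1Api()

    # Observe where the orchestrator has dynamically scheduled the pods.
    for pod in core.list_namespaced_pod(namespace="default").items:
        print(pod.metadata.name, "->", pod.spec.node_name)

    # Ask for more replicas of a (hypothetical) Deployment named "web";
    # Kubernetes decides on which nodes the new container instances are placed.
    apps.patch_namespaced_deployment_scale(
        name="web",
        namespace="default",
        body={"spec": {"replicas": 5}},
    )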

3 Key concepts and relations in dependability

Based on the previously described use cases, and based on a review of the literature, it is possible to establish some key concepts and relations in dependability. System failures have a devastating effect, resulting in many unsatisfied users affected by the failure and in disrupted business continuity. The downtime results in loss of productivity, loss of revenue, poor financial performance and damage to reputation. Damage to reputation can directly affect confidence and credibility with customers, banks and business partners. Therefore, it is crucial to design a dependable data-centric system.

The original definition of dependability is the ability of a system to deliver service that can justifiably be trusted. An alternative definition, based on the definition of failure, states that dependability is the ability of a system to avoid failures that are more frequent or more severe, and outage durations that are longer, than is acceptable to the user [16]. Although many different characteristics could influence the dependability of a system, the three essential ones are: attributes, threats and means.

The dependability of a system changes according to the system requirements. Thus, the priority of the dependability attributes may change to a greater or lesser extent. However, availability is always required, whereas reliability, safety and security may or may not be required. Because dependability relies on many probabilistic attributes, it can never be interpreted in an absolute sense. As a result of the unavoidable presence of system faults, systems can never be fully available, reliable, safe or secure.

3.1 Dependability attributes

The dependability attributes are the measures that can be applied to a system to rate its dependability. Although there are many attributes according to which system dependability can be evaluated, the generally agreed attributes of dependability are: availability, reliability, safety and security. However, these four attributes are not always enough to build a dependable system that satisfies the dependability requirements. Thus, additional attributes are used, such as maintainability, performability, confidentiality, integrity, etc. Many dependability attributes depend on each other. The attributes can be quantifiable or subjective. The quantifiable attributes can be quantified by physical measurements (e.g. reliability). The subjective attributes, on the other hand, cannot be directly measured using metrics, but are subjective assessments that require judgemental information to be applied (e.g. safety).

Figure .7: Dependability attributes ontology

3.1.1 Availability

Availability of a system is the probability that the system is functioning correctly at some instant in time. When designing a system, not only the probability of failure is considered, but also the number of failures and the time needed to make the repairs. In such cases, the goal is to maximise the time that the system is in an operational state, performing its tasks satisfactorily. These goals are manifested by the availability of the system. However, this metric is understood in broader terms compared to the definition that commercial Cloud storage providers may use. For example, Amazon treats its storage as available even if there is no answer to requests, as only two specific HTTP error responses count as errors under which an SLA violation applies. This is understandable, since when no answer is given by the storage service, the cause might be a network issue between the requester and the storage, which is out of the control of the storage provider. On the other hand, some other failures resulting in the service not being useful could still occur on the provider's side, yet are not subject to SLA violation.

Many different planned and unplanned incidents can result in service unavailability. Planned outages include different hardware and software upgrades and maintenance processes, such as installation of new hardware, software upgrades, patches, backups, data restores and testing. These outages are well planned and expected, but they are still a reason for service unavailability. Unplanned outages include failures resulting from many different unplanned causes, such as failure of physical or virtual components, database failures or human mistakes. To improve availability, a risk analysis has to be performed. The risk analysis estimates the component failure rate and the average repair time, which are expressed through the mean time before failure, the mean time to failure and the mean time to repair (a simple availability estimate based on these quantities is sketched after the list below):

• Mean Time Before Failure (MTBF): the average time available for a system or component to perform its normal operations between failures; it is usually expressed in hours.

• Mean Time To Failure (MTTF): the average time that a system or a component will function correctly before it fails. This is a statistical value and is meant to be the mean over a long period of time and a large number of units. Technically, MTBF should be used only in reference to repairable items, while MTTF should be used for non-repairable items. However, MTBF is commonly used for both repairable and non-repairable items.

• Mean Time To Repair (MTTR): the average time required to repair a failed component. When calculating it, it is assumed that the fault responsible for the failure is correctly identified and that the required spares and personnel are available.
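A common way to combine these quantities into a steady-state availability figure is A = MTTF / (MTTF + MTTR). The minimal sketch below evaluates this relation with made-up numbers; it is an illustration only, not a measurement of any real system.

    # Steady-state availability estimated from the standard relation
    #   A = MTTF / (MTTF + MTTR)
    def availability(mttf_hours: float, mttr_hours: float) -> float:
        return mttf_hours / (mttf_hours + mttr_hours)

    # Illustrative numbers only: a component that fails on average every 2000 h
    # and takes 4 h to repair.
    a = availability(mttf_hours=2000.0, mttr_hours=4.0)
    print(f"availability = {a:.5f} ({a * 100:.3f} %)")          # ~99.8 %
    print(f"expected downtime per year = {(1 - a) * 365 * 24:.1f} h")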

When measuring availability, the ping command is not enough, since even if the storage service answers pings, the service's backend might be down. For example, Amazon's Simple Storage Service (S3) and its widely used APIs expose a variety of backend functionalities to the outside world. The technologies used to realise the backend services can vary, and this is transparent to the users. These technologies themselves can provide various degrees of Quality of Service. Pinging the S3 service will therefore provide no insight into the actual status of the backend services. Thus, service availability cannot be deduced solely by using the ping command. To test the backend, one obvious way is to run a functionality test, that is, to try to retrieve or upload some data. Such a test has an opportunity to uncover potential problems with the backend. Thorough tests, however, are difficult to design and must be prepared for a specific usage scenario where availability is of crucial importance. In this context various lightweight adaptive monitoring techniques can be used, such as those developed in the context of the CELAR [17] project. The monitoring technique for availability has to be lightweight so that it does not represent a significant additional load on the network; its data payload for testing should be small in size and short in duration. However, this approach cannot fully address cases where the backend only starts failing when there are more significant data management needs.
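A minimal sketch of such a lightweight functionality probe against an S3-compatible backend is shown below, using the boto3 library. It assumes valid credentials are configured in the environment; the bucket and object names are placeholders, and a production probe would of course need careful scheduling and alerting around it.

    import time
    import boto3  # AWS SDK for Python

    BUCKET = "my-availability-probe-bucket"   # placeholder bucket name
    KEY = "probe/heartbeat.txt"
    PAYLOAD = b"ping"                         # tiny payload keeps the probe lightweight

    s3 = boto3.client("s3")

    def probe() -> bool:
        """Round-trip functionality test: write a small object and read it back."""
        try:
            start = time.time()
            s3.put_object(Bucket=BUCKET, Key=KEY, Body=PAYLOAD)
            body = s3.get_object(Bucket=BUCKET, Key=KEY)["Body"].read()
            print(f"round trip took {time.time() - start:.3f} s")
            return body == PAYLOAD
        except Exception as exc:              # network error, HTTP error, timeout, ...
            print(f"probe failed: {exc}")
            return False

    print("backend available" if probe() else "backend unavailable")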

3.1.2 Reliability

The reliability attribute expresses the probability that a system will satisfactorily perform the task it was designed for, over a specific time and under well-specified environmental conditions. Reliability refers to the continuous delivery of service without the occurrence of faults; it stands for a failure-free interval of operation. High reliability is necessary when the system is expected to operate without interruption and in cases when maintenance cannot be performed. Unfortunately, all systems fail; the only questions are when, and how frequently.
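For the simplest textbook case of a constant failure rate λ = 1/MTTF, the probability of failure-free operation over a period t is R(t) = exp(-λt). The sketch below evaluates this standard relation with illustrative numbers; it is not a model of any specific system discussed here.

    import math

    def reliability(t_hours: float, mttf_hours: float) -> float:
        """R(t) = exp(-lambda * t), assuming a constant failure rate lambda = 1 / MTTF."""
        failure_rate = 1.0 / mttf_hours
        return math.exp(-failure_rate * t_hours)

    # Illustrative only: probability of failure-free operation over one week (168 h)
    # for a component with an MTTF of 2000 h.
    print(f"R(168 h) = {reliability(168.0, 2000.0):.3f}")   # ~0.92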

Monitoring and control capabilities have been constantly advancing, and by implementing them systems have gained the opportunity to reduce the risk of failures and improve their reliability. Higher reliability in a data-centric system can be reached by linking the system monitoring to the system servers, so that any defects can be detected quickly and acted upon securely to ensure operational continuity.

3.1.3 Safety

Safety is the ability of the system to operate without catastrophic failures that could affect the users and the environment. Safety and reliability are related attributes. Reliability and availability are necessary, but not sufficient, conditions for fulfilling system safety requirements. Thus, the safety attribute can be described as reliability with respect to catastrophic failures [18]. Data-centric systems have found their role in many critical industries, such as medical care, defence and nuclear power. Implementing the necessary safety standards in those industries is crucial; otherwise a system failure could lead to human injury, loss of life, property damage or environmental disaster. Thus, such data-centric systems can also be defined as safety-critical systems. However, not all data-centric systems operate in such environments; for systems used in e-commerce, for instance, safety is not a central requirement.

3.1.4 Security

System security is the ability of a system to protect itself from accidental or intentional external attacks that can change the system's behaviour and damage data. If an attack is not prevented and it damages the system, then reliability and safety can no longer be guaranteed. Security is not a single dependability attribute; it is a composition of confidentiality, integrity and availability.

The confidentiality attribute measures whether data such as users' personal information, documents, credit card numbers, etc. is protected from disclosure to unauthorized parties. Protecting such data is important when designing a dependable system. The integrity attribute controls whether the data has been modified by unauthorized parties. For instance, if during a bank transaction the system is not dependable enough and, as a result of low integrity, the transaction data is modified from $100 to $100,000, it could be very costly to the user. Integrity is a prerequisite for availability, reliability and safety. The implementation of low security standards during the design of the data system has a direct influence on the availability of the system: it can result in system unavailability due to the access-denial attacks that are common nowadays. Higher security can be achieved by using techniques with which the system prevents unauthorized disclosure of data, prevents unauthorized data removal and prevents unauthorized withholding of data.

Container virtualization can also incorporate applications within an encrypted workspace on one host operating system. Although many containers can share one host OS, the virtual container can be customized with policies that control the type of interactions that may happen among the applications within the virtual container. Therefore, all the interactions between applications within the container remain in the container. As a result, all of the data associated with these applications remains secure.

3.1.5 Maintainability

Maintainability is the ability of the system to undergo repairs and modifications while it is up and running. Once a system fault is detected, it is desirable to be able to apply the necessary improvements as soon as possible, without having to shut down the system. Patching and update mechanisms can also be implemented to improve the overall performance of the system during runtime; these mechanisms are often referred to as live upgrade or live update mechanisms. Using container virtualization in data-centric systems offers a high level of maintainability due to the nature of containerization technologies. With proper implementation of the architectural approaches for data management described above, developers can achieve high maintainability.

3.1.6 Performability

The performability attribute indicates the responsiveness of a system, i.e. its ability to execute a task within a predetermined time interval. Performability can be measured using latency or throughput. Throughput indicates the expected amount of data transferred per unit of time. Latency indicates the time before the first byte is retrieved from the storage. This property is one of the most investigated properties in various Cloud computing systems and applications; hence, various means of implementation may be possible. For example, the ongoing SWITCH project focuses on software engineering aspects of time-critical applications and pays great attention to performability. This project investigates the use of various protocols for communication among software components, and between the clients and the application, such as the User Datagram Protocol (UDP) or the Hypertext Transfer Protocol (HTTP). Obviously, the throughput parameter is hard to predict in the open Internet, as the situation can change dramatically on an hourly basis. In the open Internet there is no control over the network paths, nor over the current workload. Inside the Cloud provider, various Software Defined Networking (SDN) [19] techniques may be used to guarantee the Quality of Service. The SWITCH project has been investigating edge computing approaches in order to address this part of the performability problem. The latency of the various repositories is sometimes part of the Service Level Agreements concluded between the user and the Cloud provider. For example, the latency of Amazon's Standard S3 storage when retrieving data is measured in milliseconds, while Amazon's Glacier storage has a retrieval time of 3-5 hours on average. As a side note, Glacier is a type of storage aimed at storing backup data at lower cost, consequently having a long retrieval time before the first byte is retrieved. Given the above example, if retrieval time is significant but not measured, a high download time could give a wrong impression of the network path load.
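The following minimal sketch shows one simple way the two performability measures mentioned above, time-to-first-byte latency and throughput, can be estimated for an HTTP-accessible storage endpoint. The URL is a placeholder, and a real monitoring probe would repeat and aggregate such measurements rather than rely on a single download.

    import time
    import requests

    URL = "https://storage.example.org/dataset.bin"   # placeholder endpoint

    start = time.time()
    response = requests.get(URL, stream=True, timeout=30)

    chunks = response.iter_content(chunk_size=64 * 1024)
    first_chunk = next(chunks)                  # blocks until the first bytes arrive
    latency = time.time() - start               # approximate time to first byte

    total = len(first_chunk)
    for chunk in chunks:                        # drain the rest of the body
        total += len(chunk)
    elapsed = time.time() - start

    print(f"latency (time to first byte): {latency * 1000:.1f} ms")
    print(f"throughput: {total / elapsed / 1e6:.2f} MB/s over {total} bytes")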

3.2 Dependability means

Dependability means describe the ways to increase the dependability of a system. When developing a dependable system, it is necessary to utilise a set of four procedures: fault prevention, fault tolerance, fault forecasting and fault removal. The purpose of fault prevention and fault tolerance is to deliver a service which can be trusted, while the goal of fault forecasting and fault removal is to deliver confidence that the dependability and security specifications are adequate and that the system is ready to meet them.

3.2.1 Fault prevention

Fault prevention techniques are used to minimise the possibility of occurrence of system faults [20]. Minimising the occurrence of system faults is possible during the specification, implementation and fabrication stages of the design process by implementing quality control techniques. Monitoring tools, such as Nagios [21], can be used during those stages to trace system faults. These tools are composed of different components, which monitor the current status, collect metrics, send data to a monitoring server, analyse the reports and notify the administrator. Nagios monitors the current state of hosts, devices and services and, if suspicious behaviour is detected, notifies the administrator. Another monitoring approach, proposed by Taherizadeh et al. [22], monitors performance from the user's perspective. That monitoring approach can be used to evaluate the network quality delivered to an end-user, which then makes it possible to improve the overall acceptability of the service, as perceived subjectively by each user.

3.2.2 Fault tolerance

Fault tolerance techniques are used to prevent system faults from becoming system errors, which could lead to serious system failures. A fault-tolerant system is designed to keep performing even when some of its software or hardware components break down. This type of design allows a system to continue operating at reduced capacity rather than shutting down completely after a failure. Fault-tolerant systems are divided into two groups, depending on the system's recovery behaviour after errors: they can be either roll-forward or roll-back systems. Roll-back recovery uses a checkpointing mechanism to return the system to some earlier correct state, after which it continues functioning properly. The checkpointing mechanisms are designed to periodically record the system state. However, this approach becomes challenging when it is used with big Virtual Machine Images (VMIs) [23]. Roll-forward recovery, on the other hand, returns the system to the state at which the error was detected, corrects the error, and the system then continues working forward. The system can also improve its fault tolerance using adaptive techniques. These techniques give the system the ability to adapt to environmental changes. To ensure the system's fault tolerance, its state is monitored and it is reconfigured as needed.
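The sketch below is a generic, application-level illustration of roll-back recovery with periodic checkpoints; it is not the container- or VM-level checkpointing mechanism of [7] or [23], and the file name, record counts and injected fault are purely illustrative.

    import pickle

    CHECKPOINT_FILE = "service_state.ckpt"

    def checkpoint(state: dict) -> None:
        """Periodically record the current state (roll-back recovery relies on this)."""
        with open(CHECKPOINT_FILE, "wb") as f:
            pickle.dump(state, f)

    def restore() -> dict:
        """Return the system to the last recorded correct state."""
        with open(CHECKPOINT_FILE, "rb") as f:
            return pickle.load(f)

    state = {"processed_records": 0}
    try:
        for record in range(1, 1001):
            state["processed_records"] = record
            if record % 100 == 0:              # checkpoint every 100 records
                checkpoint(state)
            if record == 250:                  # injected fault for illustration
                raise RuntimeError("simulated component failure")
    except RuntimeError:
        state = restore()                      # roll back to the last checkpoint (record 200)
        print(f"rolled back, resuming after record {state['processed_records']}")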

3.2.3 Fault forecasting and removal

These techniques are necessary for determining and reducing the number of faults present in the system and the consequences of those faults. Fault forecasting evaluates the system behaviour: it estimates the number of faults that are present in the system, the possibility of their activation in the future and the consequences of their activation. Fault removal comprises techniques that reduce the number of faults present in the system, according to the results of the fault forecasting techniques. Fault removal includes three steps: verification, diagnosis and correction [24]. Verification assures that the system fully satisfies all the expected requirements. If it does not, the fault is diagnosed and the suitable corrections are performed. After the correction, the system is verified again to ensure that it adheres to the given verification conditions.

System maintenance is an important fault removal technique as well. Corrective maintenance is used for removing faults that have been reported as producing errors, in order to return the system to service, while preventive maintenance is used to discover faults in the system before they can cause an error during runtime.

3.3 Dependability threats

When developing a dependable system, one of the most important phases is assessing the dependability threats. Dependability threats, namely faults, errors and failures, describe the risks that can degrade the dependability of the system infrastructure. It may be expected that our understanding of the dependability threats will increase with the growing number of implemented data-centric systems and applications in the near future.

A fault may be understood as a defect that occurs in some hardware or software component of the system, e.g. a software bug or a hard drive fault. Such an event may definitely affect the dependability of the system. If the fault is properly detected by the monitoring system, it may be possible to mitigate the longer-term effects of the fault. For example, the self-adaptation component of SWITCH may re-establish the service at another Cloud provider. This is one of the main benefits of virtualisation, as various faults may be mitigated by re-establishing the required services elsewhere. An error is a system state, caused by a system fault, that produces incorrect results and unexpected system behaviour. A system failure is a state in which the system delivers a service that differs from the system specification for a specified period of time. In software-defined systems and applications, both errors and failures can be addressed through the use of authentication and authorisation mechanisms that can ascertain the high quality of the VMIs/Container Images (CIs) which are used. IBM has recently introduced a patent which provides encryption mechanisms for VMIs and their integral parts [25]. More modern approaches may include the use of blockchain [26] to provide strict checking of system errors and failures, and may be used to allow the use of well-tested and error-free software components.

System dependability threats depend on each other, i.e. system faults are the reason for errors to appear, and errors are the reason for failures to appear. For example, if a Cloud infrastructure suffers from a fault in some of its software components, this could at some moment produce an error, lead to an unexpected system failure and result in system unavailability.

4 ENTICE Case Study: Serving Virtual Machine and Container Images to Applications

Several recent projects, such as mOSAIC [27], have shown that it is possible to build Cloud application engineering platforms that can be ported to specific Cloud providers and may be used to overcome the vendor lock-in problem. In such platforms, once the Cloud application is fully engineered, it can be readily deployed across multiple Clouds. Such portable Platforms as a Service (PaaS) have a large potential for achieving cost savings, because they offer reuse of software components in various Cloud provider settings. Thus, component-based design offers great flexibility for resource management and service customization in such environments. The ENTICE platform can be viewed as a portable PaaS which offers the user efficient VMI/CI operations in the Cloud. Figure .8 presents the key research areas addressed by ENTICE. It can also be considered a repository-based system which encapsulates a variety of subsystems [28].

Figure .8: Packaging and distribution of software components

1. In the first step of using ENTICE the programmers search for software components that are expressed as functionalities.

2. Certain functionalities may be developed through the means of recipes, with tools such as CHEF [29] or PUPPET [30].

3. The next step in the process is to synthesise a VM or container image, which is an automated process.

4. Following this, the ENTICE services provide for optimisation based on the necessary functionality. This is done via the use of functionality tests.

5. The most sophisticated part of the distribution process is the multi-objective optimisation, based on the notion of the Pareto front, which resolves the problem of conflicting QoS requirements. Instead of a single solution that optimises all requirements, the multi-objective optimisation results in a set of trade-off solutions (a minimal sketch of such a Pareto-front computation is given after this list). This eases the selection of the solution that best fits the users' requirements. When no further automated optimisation is possible, the ENTICE environment provides the possibility of user-friendly manual optimisation of the images. Based on the QoS requirements of the Cloud application users, it is necessary to distribute the Virtual Machine and/or container images geographically across a federated Cloud repository.

6. The final stage addressed by ENTICE is the delivery and deployment of images at a designated Cloud provider where the functionality has to be provided. The delivery time can be improved by deploying the images where their data is stored, as the data-centric approach prescribes.
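As referenced in step 5, a minimal sketch of how a Pareto front of non-dominated trade-off solutions can be computed over two conflicting objectives is shown below. The image-size and delivery-time figures are made up, both objectives are treated as "smaller is better", and the actual ENTICE optimiser is considerably more elaborate [28].

    # Candidate image configurations: (size in MB, delivery time in s), both to be minimised.
    candidates = [(900, 12.0), (450, 20.0), (600, 14.0), (450, 25.0), (1200, 10.0)]

    def dominates(a, b):
        """a dominates b if it is at least as good in every objective and better in one."""
        return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

    pareto_front = [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates if other != c)
    ]
    print("trade-off solutions:", sorted(pareto_front))
    # -> [(450, 20.0), (600, 14.0), (900, 12.0), (1200, 10.0)]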

ENTICE helps optimise not only the functioning of the federated VMI/CI storage facility, but also the non-functional properties obtained by the application at runtime. An important part of this architecture is a Knowledge Base that helps gather, exchange and manage information about the necessary QoS related to the VMI/CI repositories. The most relevant components of the ENTICE environment that can be used by developers are presented as follows.

Development tools are the instruments used to develop a Cloud application. Unfortunately, these instruments do not provide enough information for the deployment stage. As a result, creating VMIs and CIs that contain specific services is a time-consuming process that requires good developer knowledge about the service's software dependencies, scalability, etc. The ENTICE environment offers the developer fast and easy image access and repository-wide software search and discovery.

The Knowledge Base component is a service that uses a triple database and mechanisms such as reasoners, rules, validations, etc. to provide information to all other components of the architecture. The reasoning mechanisms, strategic and dynamic, are implemented according to the domain ontology. The strategic reasoning evaluates the QoS functional properties, execution time, costs and storage to support automatic packaging of applications. The dynamic reasoning ensures that the image packaging and preparation have been done in accordance with the dynamic information about the federated Cloud instance. The ENTICE client-side application communicates with the Knowledge Base, which is the server-side application, via APIs. It thus offers the developer the opportunity to optimise and search VMIs/CIs stored in public repositories, to upload new images to the repositories, or to manage system-based settings. The developer is also offered the feature of generating images by using a script that includes the functional requirements.

The services of the ENTICE environment are elastic and geographically distributed. They optimise their operation based on information and knowledge obtained through monitoring services. An ENTICE service is used to reduce the image size of every VMI/CI that contains user-developed software components, without affecting the functionalities of the running applications. These operations give the user the opportunity to save on storage costs as soon as the software asset is deployed in the repositories. Many public CI repositories are currently available on the Internet; by federating such repositories, a new quality may be expected. By adding new constraints, dependencies and data derived from reasoning or user experience, new repositories can be registered in the Knowledge Base.

5 SWITCH Case Study: Managing QoS in the Software Engineering Process

The Quality of Service parameters are an important aspect of the software engineering process. They must be collected and recorded early in the software engineering process. This is due to the fact that current Cloud application engineering tools are poised to radically speed up the process on the one hand but, on the other, lack adequate means to influence the QoS that has to be obtained once the application is deployed and running at a specific Cloud provider. In SWITCH [31], particular care has been given to developing the SWITCH Interactive Development Environment (SIDE), which can be used to specify various QoS metrics related to individual components (e.g. compute units and similar). A semantic model such as enhanced TOSCA [32] can be used to store the necessary information. Figure .9 presents what happens at runtime when the QoS model is used to obtain the necessary QoS for a particular functionality.

Figure .9: Runtime Cloud application adaptation conforming to QoS requirements

1. The user (or a program) sends a request to use a particular functionality. For example, the user may request to launch a video conference, which is a JITSI [33] functionality packed into a container.

2. The SWITCH platform at runtime fetches the desired QoS model for the particular functionality.

3. Based on the desired QoS model for the application, the platform decides on which infrastructure it is best to deploy and use the application (a minimal sketch of such a QoS-driven selection is given after this list). It may choose from a variety of Cloud infrastructures to which the software components (container images) are delivered and deployed.

4. Following this, the user can use the desired functionality at the desired high QoS. The SWITCH platform at runtime may provide several services, including monitoring and alarm-triggering. These services may cause the platform to self-adapt based on possibly violated QoS metrics.
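As referenced in step 3, the sketch below illustrates the general idea of matching a QoS model against monitored infrastructures and picking a deployment target. The provider names, metrics and thresholds are made up for illustration; the decision logic inside the SWITCH platform itself is more elaborate.

    # Desired QoS model for the functionality (illustrative thresholds).
    qos_model = {"max_latency_ms": 150, "min_availability": 0.995}

    # Monitoring data per candidate Cloud infrastructure (made-up figures).
    providers = {
        "provider-a": {"latency_ms": 90,  "availability": 0.999},
        "provider-b": {"latency_ms": 210, "availability": 0.999},
        "provider-c": {"latency_ms": 120, "availability": 0.990},
    }

    def satisfies(metrics, qos):
        return (metrics["latency_ms"] <= qos["max_latency_ms"]
                and metrics["availability"] >= qos["min_availability"])

    eligible = {name: m for name, m in providers.items() if satisfies(m, qos_model)}
    # Among the eligible infrastructures, deploy where the measured latency is lowest.
    target = min(eligible, key=lambda name: eligible[name]["latency_ms"])
    print("deploying container image to", target)   # -> provider-a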

The key word for these types of scenarios is multi-instance, as the application can be deployed and used at a high QoS for each single usage instance. The design of the SWITCH project offers a glimpse of future directions in software engineering for IoT and Big Data applications that respect QoS in the overall engineering process. Both the SWITCH and ENTICE environments are designed to support single-usage-instance applications.

6 Conclusions

Cloud computing systems and applications already provide numerous benefits to their users. Some of these include a shorter software engineering life-cycle, independence from the Infrastructure as a Service providers, optimal usage of resources, simplified maintenance of the applications and other aspects [34]. All these benefits, however, cannot be obtained without addressing a certain new range of requirements. Dependability is one of the key properties of such systems and applications. In order to define the property, it was necessary to look at its relations with other needed properties, such as availability, reliability, safety and security, and some additional attributes, such as maintainability, performability, integrity and confidentiality. With the emergence of the Big Data era, Cloud architectures will need to be further transformed and adapted in order to support event-driven and single-functionality usage instance scenarios. In such scenarios, dependability will have to be established on the fly.

Emerging container-based interoperability standards, notably the CNCF.io, offer new architectural approaches, where data-centric computing (aka edge computing) is the key concept. In such architectures, container images can be seamlessly delivered world-wide via a system of highly optimised container repositories. ENTICE [35] is an example of an emerging architecture where the federated storage infrastructure services (for the management of container and Virtual Machine images) can be used to optimise the size of the images, their geographic distribution, and their delivery and deployment times. ENTICE is therefore designed to support data-centric applications with its ability to deliver and deploy the needed functionality geographically, according to the requirements of the software engineers. A technology like ENTICE may also help maintain other aspects of dependability, such as maintainability, with the possibility to deliver updated functionalities seamlessly and transparently to the end-users.

When considering the runtime environment, another technology which is expected to improve the overall software engineering life-cycle is SWITCH [31]. A major goal in the development of the SWITCH Interactive Development Environment (SIDE) [36] is to assure that the Quality of Service requirements are taken from the end-user early in the design and engineering process. Once these requirements are semantically annotated and stored, they can be used later on, during the use of the application, to decide on which Cloud provider to deploy the application. Taking care of the needed non-functional properties at an early stage of the software design process is a crucial aspect that has to be taken into account when designing dependable data-centric systems.

Cloud federations (including storage) are already happening [37]. From an application's viewpoint, there are several possible architectures that can be used to build data-centric dependable applications. Here, three main possibilities were described: the delivery and use of data along with the functionality (i.e. in one container), the delivery of functionalities in one container and data in another, and, as a third option, the use of an optimised distributed storage facility, such as CEPH [9]. Depending on the requirements, one or more of these architectures may apply. Sometimes, it may be necessary to use a multi-objective optimisation approach to make decisions about which of the possible application design patterns and system architectures is optimal. More efforts are needed to be able to incorporate suitable Quality of Service models in the overall process of managing the applications. Our future work will concentrate on these aspects.

7 Acknowledgments

The research and development reported in this chapter have received funding from the European Union's Horizon 2020 Research and Innovation Programme under grant agreements no. 643963 (SWITCH project: Software Workbench for Interactive, Time Critical and Highly self-adaptive cloud applications) and no. 644179 (ENTICE project: dEcentralized repositories for traNsparent and efficienT vIrtual maChine opErations).

References

[1] S. Bryson, D. Kenwright, M. Cox, D. Ellsworth, R. Haimes, Visually exploring gigabyte data sets in real time, Communications of the ACM 42 (8) (1999) 82–90.

[2] IBM Big Data, http://www.ibmbigdatahub.com/infographic/four-vs-big-data/, [Online; accessed 12-December-2016].

[3] S. Kounev, P. Reinecke, F. Brosig, J. T. Bradley, K. Joshi, V. Babka, A. Stefanek, S. Gilmore, Providing dependability and resilience in the cloud: Challenges and opportunities, in: Resilience Assessment and Evaluation of Computing Systems, Springer, 2012, pp. 65–81.

[4] J. H. Sharp, S. D. Ryan, Component-based software development: Life cycles and design science-based recommendations, Proc CONISAR, v2 (Washington DC).

[5] M. Ramachandran, Component-based development for cloud computing architectures, in: Cloud Computing for Enterprise Architectures, Springer, 2011, pp. 91–114.

[6] F. Paraiso, C. Stéphanie, A.-D. Yahya, P. Merle, Model-driven management of docker containers, in: 9th IEEE International Conference on Cloud Computing (CLOUD), 2016.

[7] V. Stankovski, J. Trnkoczy, S. Taherizadeh, M. Cigale, Implementing time-critical functionalities with a distributed adaptive container architecture, 2016.

[8] S. Taherizadeh, A. C. Jones, I. Taylor, Z. Zhao, P. Martin, V. Stankovski, Runtime network-level monitoring framework in the adaptation of distributed time-critical cloud applications, in: Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA), The Steering Committee of The World Congress in Computer Science, Computer Engineering and Applied Computing (WorldComp), 2016, p. 78.

[9] Ceph storage, http://docs.ceph.com/docs/master/, [Online; accessed 15-December-2016].

[10] Amazon S3, https://aws.amazon.com/documentation/s3/, [Online; accessed 19-December-2016].

[11] Storj.io, http://docs.storj.io/, [Online; accessed 19-December-2016].

[12] Apache Cassandra, http://cassandra.apache.org/doc/latest/, [Online; accessed 19-December-2016].

[13] V. Stankovski, S. Taherizadeh, I. Taylor, A. Jones, C. Mastroianni, B. Becker, H. Suhartanto, Towards an environment supporting resilience, high-availability, reproducibility and reliability for cloud applications, in: 2015 IEEE/ACM 8th International Conference on Utility and Cloud Computing (UCC), IEEE, 2015, pp. 383–386.

[14] Kubernetes, http://kubernetes.io/docs/, [Online; accessed 19-December-2016].

[15] Apache Mesos, http://mesos.apache.org/documentation/latest/, [Online; accessed 19-December-2016].

[16] A. Avizienis, J.-C. Laprie, B. Randell, C. Landwehr, Basic concepts and taxonomy of dependable and secure computing, IEEE Transactions on Dependable and Secure Computing 1 (1) (2004) 11–33.

[17] I. Giannakopoulos, N. Papailiou, C. Mantas, I. Konstantinou, D. Tsoumakos, N. Koziris, CELAR: automated application elasticity platform, in: Big Data (Big Data), 2014 IEEE International Conference on, IEEE, 2014, pp. 23–25.

[18] D. Powell, Delta-4: a generic architecture for dependable distributed computing, Vol. 1, Springer Science & Business Media, 2012.

[19] H. Kim, N. Feamster, Improving network management with software defined networking, IEEE Communications Magazine 51 (2) (2013) 114–119.

[20] D. K. Pradhan, Fault-tolerant computer system design, Prentice-Hall, Inc., 1996.

[21] Nagios, https://www.nagios.org/, [Online; accessed 18-December-2016].

[22] T. Salman, T. Ian, J. Andrew, Z. Zhiming, S. Vlado, A network edge monitoring approach for real-time data streaming applications, in: Economics of Grids, Cloud, Systems, and Services, 2016.

[23] Í. Goiri, F. Julia, J. Guitart, J. Torres, Checkpoint-based fault-tolerant infrastructure for virtualized service providers, in: 2010 IEEE Network Operations and Management Symposium - NOMS 2010, IEEE, 2010, pp. 455–462.

[24] E. Dubrova, Fundamentals of dependability, in: Fault-Tolerant Design, Springer, 2013, pp. 5–20.

[25] A. Kundu, A. Mohindra, Method for authenticated distribution of virtual machine images, US Patent App. 13/651,266 (Oct. 12, 2012).

[26] D. Shrier, W. Wu, A. Pentland, Blockchain & infrastructure (identity, data security).

[27] mOSAIC Project, http://developers.mosaic-cloud.eu/confluence/display/MOSAIC/mOSAIC/, [Online; accessed 15-December-2016].

[28] S. Gec, D. Kimovski, R. Prodan, V. Stankovski, Using constraint-based reasoning for multi-objective optimisation of the ENTICE environment, in: Semantics, Knowledge and Grids (SKG), 2016 12th International Conference on, IEEE, 2016, pp. 17–24.

[29] CHEF Automate, https://www.chef.io/chef/#chef--resources, [Online; accessed 17-December-2016].

[30] Puppet, https://docs.puppet.com/puppet/, [Online; accessed 17-December-2016].

[31] SWITCH project (Software Workbench for Interactive, Time Critical and Highly self-adaptive Cloud applications), http://www.switchproject.eu/about/, [Online; accessed 15-December-2016].

[32] Topology and Orchestration Specification for Cloud Applications, http://docs.oasis-open.org/tosca/TOSCA/v1.0/TOSCA-v1.0.html, [Online; accessed 19-December-2016].

[33] Jitsi, https://jitsi.org/Documentation/DeveloperDocumentation/, [Online; accessed 15-December-2016].

[34] J. W. Rittinghouse, J. F. Ransome, Cloud computing: implementation, management, and security, CRC press, 2016.

[35] ENTICE - decentralised repositories for transparent and efficient virtual machine operations, http://www.entice-project.eu/about/, [Online; accessed 18-December-2016].

[36] Z. Zhao, A. Taal, A. Jones, I. Taylor, V. Stankovski, I. G. Vega, F. J. Hidalgo, G. Suciu, A. Ulisses, P. Ferreira, et al., A software workbench for interactive, time critical and highly self-adaptive cloud applications (switch), in: Cluster, Cloud and Grid Computing (CCGrid), 2015 15th IEEE/ACM International Symposium on, IEEE, 2015, pp. 1181–1184.

[37] D. Kimovski, N. Saurabh, V. Stankovski, R. Prodan, Multi-objective middleware for distributed VMI repositories in federated cloud environment, Scalable Computing: Practice and Experience 17 (4) (2016) 299–312.