TECHNISCHE UNIVERSITÄT DRESDEN

FAKULTÄT INFORMATIK INSTITUT FÜR SYSTEMARCHITEKTUR PROFESSUR FÜR RECHNERNETZE PROF. DR. RER. NAT. HABIL DR. H. C. ALEXANDER SCHILL

Diploma Thesis

submitted for the academic degree of Diplom-Medieninformatiker

Trustworthy and Social Management of Resources as a Service in Personal Cloud Control Centers (original title: Vertrauenswürdige und soziale Verwaltung von Ressourcen als Dienstleistung in persönlichen Cloudkontrollzentren)

Stephan Zepezauer (born 30 May 1980 in Elsterwerda)

Supervisor: Dr.-Ing. Josef Spillner

Dresden, 27 March 2012

Declaration of Authorship

I hereby declare that I have written the diploma thesis that I submitted today to the examination board of the Fakultät Informatik on the topic

Trustworthy and Social Management of Resources as a Service in Personal Cloud Control Centers

entirely on my own, that I have used no sources or aids other than those stated, and that I have marked all quotations as such.

Dresden, 27 March 2012

Stephan Zepezauer


Contents

1 Introduction
  1.1 Motivation
  1.2 Scenario
  1.3 Outline
2 Basics
  2.1 Cloud Computing Definition
  2.2 Cloud Computing Reference Model
  2.3 Service Models of the Cloud
    2.3.1 Infrastructure as a Service
    2.3.2 Platform as a Service
    2.3.3 Software as a Service
  2.4 Fundamental Deployment Models of the Cloud
    2.4.1 Public Cloud
    2.4.2 Community Cloud
    2.4.3 Private Cloud
    2.4.4 Hybrid Cloud
  2.5 Bordering of the Cloud Domain and Emphasis of the Individual
    2.5.1 Toward the Integration of Regional Clouds for Businesses and Communities
    2.5.2 Services in Home-based Cloud Computing Environments
    2.5.3 A proper Distinction between Home Cloud and Personal Cloud
  2.6 Mechanisms for the Development of Trusted Clouds
    2.6.1 Privacy in the Cloud
    2.6.2 Security of Outsourced Data
    2.6.3 Trustworthy Cloud Computing
  2.7 Summary
3 Requirements Analysis
  3.1 Problem Context
  3.2 Requirements
  3.3 Summary
4 State of the Art
  4.1 Personal Cloud Products
  4.2 Research Projects
  4.3 Tools
  4.4 Summary and Discussion
5 System Design
  5.1 Global Process and Delimitation
  5.2 Resource Integration
    5.2.1 Functional Units
    5.2.2 Functional Unit Description
    5.2.3 Remote Device and Resource Discovery
    5.2.4 Local Device and Resource Discovery
    5.2.5 Resource Information Discovery
    5.2.6 Manual Resource Integration
  5.3 The Resource Management
    5.3.1 Resource Plan
    5.3.2 Resource Labeling
  5.4 Data Management
  5.5 Application Programming Interface
  5.6 Summary
6 Implementation
  6.1 Prototype Overview
    6.1.1 Package: ResourceManagement
    6.1.2 Package: PluginController
    6.1.3 Package: DataManagement
    6.1.4 Package: Logging
  6.2 Resource Integration
  6.3 Gathering of Resource Information
  6.4 Application Programming Interface
  6.5 Prototype Extension
  6.6 Summary
7 Evaluation
  7.1 Evaluation of Requirements
  7.2 Scenario-based Evaluation
8 Summary and Outlook
  8.1 Future Work
9 Abbreviation
Bibliography
List of Figures
List of Tables
A Service Discovery Protocols
B Scenario-based Evaluation

1 Introduction

Cloud computing emerged from the Service-Oriented Architecture (SOA) paradigm, which defines principles for the development of systems in which well-defined business functionalities are provided as services. According to this paradigm, all resources of the Cloud are made available as services and can be accessed through the Internet via a user-centric interface that acts as a point of access for the customer's needs and requirements.

The term Cloud originated in the early days of the Internet, when internal processes and the complexity of message propagation were hidden in a Cloud of interconnected computer networks. This image of a global network system was derived from network diagrams from the early 1960s, in which the network was drawn as a cloud through which data was carried across carrier backbones from one location to another [RR10]. Another idea of this time was to provide computing power and even specific applications through a utility business model, just as water and electricity utilities operate today. The idea of utility computing was first proposed by John McCarthy in 1961 and is realized by current cloud computing technology¹. But the IT technologies that existed at that time could not implement such a futuristic computing model, and so it took more than forty years until Cloud computing emerged in technology and public circles.

The new millennium brought innovations in the field of Internet technologies (SOA, Web 2.0, Web Services), distributed computing (clusters, grids) and hardware (virtualization, multi-core chips). As some of these innovations were in their infancy, they were first seen as hype, but later became popular in academia and major industry, followed by specifications and standardizations [VBB11]. The maturity of these technologies mainly influenced the advent of Cloud computing.

Cloud computing is characterized by virtualized resources, dynamically scaled services and the availability of resources as a web-delivered service to customers (e.g., companies, developer teams). The services provided by Cloud vendors mostly comprise infrastructure, platform and software (applications) as a service. This way, the applications that Cloud users deploy in the cloud are not tightly bound to the underlying IT infrastructure and are managed by the vendors. For example, a Software-as-a-Service (SaaS) vendor is responsible for all of the hardware and software management consumed by its customer base. Thus, the customers can obtain additionally required resources immediately and are liberated from the maintenance burden they would otherwise have for their own applications. From the customer's perspective, cloud computing reduces management cost, so that customers can focus on their business activities. As with other new technologies, this new way of IT infrastructure management brings new challenges and risks. When customers' data is stored and processed by a Cloud vendor, questions about privacy, trust and security of information arise immediately.

¹ http://computinginthecloud.wordpress.com/2008/09/25/utility-cloud-computingflashback-to-1961-prof-john-mccarthy/

Besides the aforementioned relationship between customers and Cloud vendors, also known as public cloud, in which the vendors provide services to a variety of customers distributed over the whole world and separated by different goals and management domains, there are other ways to organize the relationship between customers' and vendors' domains. A private cloud contains all Cloud services where customers and vendors belong to the same organizational unit, so that control over the data remains with the customers or their organization [BKNT11]. If companies that manage a private cloud outsource data into the public cloud for the purpose of storage or processing, this relationship is called hybrid cloud. Another case is the provision of resources for a community, called community cloud, which constitutes the context of this thesis. In such a community each member should have the possibility to provide their own resources and to use resources of other community members in a flexible and trustworthy environment. When data is stored and processed by remote vendors, so that data and applications are distributed over several locations, the latency of transferred data is a further challenge. Regional/local clouds address this challenge and remove dependencies on the public cloud providers.

Imagine a community of regionally collaborating musicians that makes up a social network, which allows them to use resources of other musicians according to the rules of the social network. Such a process facilitates the storage and sharing of different kinds of media, such as audio files, music sheets, and digital graphics, and even the computation of specific tasks. Even with this simple scenario, one can already notice the underlying concept of this thesis: the concept of trustworthy and social management of resources as a service. This thesis provides solutions for resource owners to automatically discover available personal resources and integrate them into the Cloud (community cloud, regional/local cloud), and assures that the resources can be synchronized in a trustworthy manner with heterogeneous computing devices.

1.1 Motivation

Today's cloud computing solutions are mainly open to the public (public cloud), confined to the premises of an enterprise (private cloud), or both in one (hybrid cloud). Such provisioning services can fulfill the basic requirements (mostly resource provision and sharing) of a community, such as the one mentioned at the beginning of the chapter. But the self-sufficient character of communities cannot be satisfied with these solutions, because communities attempt to keep their independence by providing their own resources with a minimum of outside influence. Naturally, this independence also relates to the individual members of communities, who want to control and manage their resources and data autonomously. With this in mind, communities or even smaller businesses should be supported by technical solutions that allow them to establish their own, independent community cloud (which includes a business cloud). Furthermore, the collaboration of different communities can positively influence the resource utilization of community members, because more participants provide resources. To keep outside influence to a minimum, the collaboration should be limited to a specific region (regional/local cloud), in which all participants are physically located.

By bringing in their own devices, community members can provide resources of different heterogeneous devices, such as storage or computational devices. This way, a member can share currently unused resources or integrate new devices with a specific resource capacity to serve the community. For such a purpose, technical solutions should include an automatic discovery of member devices and their capacity and workload. Thus, a community member can decide what amount of unused resources he or she wants to provide.

Smaller businesses require more reliable and trustworthy resource utilization than, for example, a self-organized community of regionally collaborating musicians, which can compensate for a service downtime through arrangements. For such businesses, technical solutions should provide a profile of partners based on the availability, accessibility and performance characteristics of their resources (referred to as quality of service). This information can help a business to choose a reliable partner for the utilization of resources. On the other hand, if the quality of service information is meaningful enough to offer reliable resource provision, the participants can trade in resources and make a business out of it.

With the evolution of computer systems in the last decades, communication within our society has changed. People using the Internet typically communicate via social networking applications or websites, such as social networks, online chats, forums and microblogging services. Social networks are among the most important applications for effective communication, not only for a community but also for a business, because they offer a wide range of functionality including chats, content management and file sharing. Examples are Facebook², LinkedIn³ and XING⁴, which are all centralized solutions. A trend can be seen towards decentralized social networks, such as the Diaspora⁵ project, in which each member has their own personal server and can communicate with a certain group of people. Linking a social network with a resource provisioning service is a strong motive for this thesis. If resources are provided or consumed based on identities that are authenticated by and related within a social network, the interactions of providers and consumers become more reliable.

Another trend is the management of personal data (e.g., media files, documents) through personal clouds, which emerged from the growing number of personal devices (e.g., mobile devices, different home devices), the growing amount of personal data, and the idea of reaching one's own data from anywhere at any time. Personal clouds facilitate the access to personal data through different devices, the synchronization of files and the sharing of files with an authenticated group of users. If the sharing of files is realized by linking the personal cloud with the social network, the work comes full circle. This leads to the overall motivation: the provision and consumption of resources, data and even applications that are located in personal clouds by utilizing the memberships and relations within a social network.

The work presented in this thesis helps to realize such an idea.

² http://www.facebook.com
³ www.linkedin.com
⁴ www.xing.com
⁵ http://diasporafoundation.org/

1.2 Scenario

To better understand the functionality that the system component will offer, three use cases are introduced next, each consisting of several composite user actions performed via Graphical User Interfaces (GUI). The applications that provide such a GUI communicate directly or indirectly with the developed and implemented system component.

The illustration in Figure 1.1 shows a possible constellation of distributed members of a social community. On the left side are two members who are in the same room. The members on the right side of the illustration are geographically separated from each other and from the members on the left side. Each member has multiple devices that can be utilized for resource, data and software provisioning. Furthermore, members can lease resources of other members, and share and synchronize data and software.

Use case 1: The user Bob, who is illustrated in Figure 1.1 as the blue colored person, integrates a new device into his personal cloud. The new device is visible in a list of personal devices, but marked as disabled. By selecting the newly arrived device, Bob can view the device properties, such as storage capacity or performance values, and activate them for provisioning via a checkbox. If some device properties are not accessible, Bob can specify the missing properties manually. Subsequently, Bob presses the button “Add device” to assign the free resources of the device to a provisioning process. The device is now enabled. Bob decides to provide 20 gigabytes of the device's storage capacity for a marketplace or for friends. For this, he uses a provisioning tool that is responsible for the initialization and specification of the resource allocation. Afterwards, Bob can view the specific properties of his provided virtual storage resources and what percentage of these resources is leased by others. Furthermore, the newly added device contains data. Bob navigates through his personal data and selects data files and folders for sharing within the personal cloud or with some friends, and for synchronization within the personal cloud. Finally, Bob confirms the selection. The shared files and folders are now accessible through Bob's other personal devices or by friends. If data files or folders are marked for synchronization and the device is not available, they are still accessible.
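The state changes in use case 1 can be made concrete with a minimal sketch. It is not taken from the prototype: the `Device` class, its fields and its method names are hypothetical and only mirror the steps described above (a new device is listed as disabled, missing properties are specified manually, "Add device" enables it, part of the storage is provided, and the leased share is reported as a percentage).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Device:
    """Hypothetical model of one entry in Bob's list of personal devices."""
    name: str
    storage_gb: Optional[float] = None  # None: property not accessible via discovery
    enabled: bool = False               # new devices are listed but disabled
    provided_gb: float = 0.0            # capacity offered to friends or a marketplace
    leased_gb: float = 0.0              # part of the offer currently leased by others

    def specify_storage(self, storage_gb: float) -> None:
        """Manually fill in a property that discovery could not read."""
        self.storage_gb = storage_gb

    def add(self) -> None:
        """The "Add device" step: enable the device once its capacity is known."""
        if self.storage_gb is None:
            raise ValueError("storage capacity must be specified first")
        self.enabled = True

    def provide(self, gb: float) -> None:
        """Offer part of the device's storage for provisioning."""
        if not self.enabled:
            raise RuntimeError("device is disabled")
        if gb < 0 or gb > self.storage_gb:
            raise ValueError("cannot provide more than the device holds")
        self.provided_gb = gb

    def lease(self, gb: float) -> None:
        """Record that a consumer leased part of the provided storage."""
        if gb < 0 or self.leased_gb + gb > self.provided_gb:
            raise ValueError("lease exceeds the provided capacity")
        self.leased_gb += gb

    def leased_percentage(self) -> float:
        """Share of the provided storage that is currently leased, in percent."""
        if self.provided_gb == 0:
            return 0.0
        return 100.0 * self.leased_gb / self.provided_gb


device = Device("external-disk")   # discovered, but capacity unknown and disabled
device.specify_storage(500.0)      # Bob enters the missing property manually
device.add()                       # "Add device"
device.provide(20.0)               # Bob provides 20 gigabytes
device.lease(5.0)                  # a friend leases 5 gigabytes of the offer
print(device.leased_percentage())  # 25.0
```

The sketch deliberately leaves out data sharing and synchronization, which the use case treats as a separate selection step.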

Use case 2: Bob is visited by his friend Alice, with whom he wants to exchange some data. Alice is illustrated in Figure 1.1 as the orange colored person. Both are also friends in a social networking community. Bob can see Alice's mobile phone and presses the button “Connect”. Alice then has to accept the request, and both are connected through their devices. Bob chooses the data transfer protocol with which the shared files are transferred to Alice's mobile phone and goes into the kitchen to make coffee. At the same time, Alice connects to the social network via her browser to get Bob's Internet address. She then opens Bob's website and authenticates herself. Thus, she can see all of the files and folders Bob wants to share with her. She navigates through the list, selects the files she is interested in, and submits her choice. A few moments later she can access the selected files from her mobile phone. Meanwhile, Bob has returned from the kitchen with two cups of hot coffee in his hands. He goes to his laptop and checks whether Alice has some files she wants to share with him. He briefly examines the list of files and folders that are accessible to him and chooses “Synchronize all files”. After the synchronization is completed, Alice shuts down her mobile phone and both enjoy their coffee while listening to Alice's music tracks, which are now accessible in Bob's personal cloud.

Use case 3: Bob has to disable a personal device that is available to friends within the cloud. To do this, Bob navigates through the list of his personal devices, selects the device that is to be disabled and confirms. Subsequently, Bob is shown a warning indicating that the device contains leased resources (e.g., virtual storage resources) and shared files that become inaccessible to remote consumers if he disables the device. Furthermore, Bob can navigate through the list of leased resources and shared files, which contains usage information, and can select those that he would like to synchronize with the cloud. Bob selects the resources leased by John (the gray colored person), because he knows that John depends on them. After the selection he submits and receives a warning indicating that the device still contains shared files. Because Bob has deliberately not selected some shared files for synchronization, he ignores the warning and submits the action. The device is now disabled and can be removed safely from the personal cloud. The synchronized resources and files, however, are still accessible.
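The decision logic in use case 3 can be sketched as follows. This is an illustration under assumptions, not the prototype's code: the `Item` record, the `plan_disable` function and the warning texts are hypothetical names chosen for this example. The sketch splits the items on a device into those that stay accessible because they were selected for synchronization and those that are lost, and it produces one warning per affected kind of item.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class Item:
    """Hypothetical record for something remote consumers use on a device."""
    name: str
    kind: str       # "leased_resource" or "shared_file"
    consumer: str   # who depends on the item


def plan_disable(items: List[Item],
                 sync_selection: Set[str]) -> Tuple[List[Item], List[Item], List[str]]:
    """Return (kept, lost, warnings) for disabling the device that holds `items`.

    Items whose names are in `sync_selection` are synchronized with the cloud
    and stay accessible; all others become inaccessible to their consumers.
    """
    kept = [item for item in items if item.name in sync_selection]
    lost = [item for item in items if item.name not in sync_selection]
    warnings = []
    if any(item.kind == "leased_resource" for item in lost):
        warnings.append("device still contains leased resources")
    if any(item.kind == "shared_file" for item in lost):
        warnings.append("device still contains shared files")
    return kept, lost, warnings


items = [
    Item("john-storage", "leased_resource", "John"),
    Item("demo-track", "shared_file", "Alice"),
]
# Bob selects only John's leased resources for synchronization.
kept, lost, warnings = plan_disable(items, {"john-storage"})
print([item.name for item in kept], warnings)
```

With this selection, John's leased storage is kept, the shared file is lost, and the only remaining warning concerns shared files, which matches the second warning Bob chooses to ignore.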

Figure 1.1: Scenario.

1.3 Outline

This diploma thesis first presents the basics (Chapter 2) of cloud computing service models and deployment models. That chapter further provides a functional overview of home and personal clouds and ends with a description of mechanisms to establish trustworthy cloud computing. Building on these findings, the requirements analysis (Chapter 3) describes the problem of providing personal resources to other applications. Based on this analysis, requirements for a component that delivers resources as a service are formulated. The related work (Chapter 4) presents state-of-the-art products and projects and analyzes the strategies, mechanisms and technologies they use. Chapter 5 presents the concept for the system design, including a detailed description of the resource integration, followed by the resource management and the description of the application programming interface that allows personal resources to be accessed and managed. As a proof of concept, the conceptual ideas are realized in a prototype presented in Chapter 6. That chapter describes selected parts of the prototype and forms the basis for the evaluation. The concept and the prototype are evaluated in Chapter 7. Chapter 8 summarizes this thesis and reflects on ideas for future work.

2 Basics

This chapter presents the most important fundamentals related to the task of this thesis. This information gives readers the knowledge necessary to understand the conceptual and technical ideas presented in the remainder of the thesis.

2.1 Cloud Computing Definition

The advent of Cloud computing was influenced by the advancement and maturity of different technologies, mentioned at the beginning of Chapter 1. The main ones are now described in brief. Hardware virtualization allows the sharing and utilization of a computer system's resources (e.g., computation and storage capacity). A single physical platform can thereby run multiple operating systems and software stacks. Grid computing provides the aggregation and sharing of resources that are distributed over a computer network in order to accelerate the computation and the management of specific data sets processed by applications, mostly scientific applications. Because of the diversity of the grid infrastructure, this technology had some problems in the past (for example, it was not possible to enforce quality of service (QoS), and the availability of resources could not be guaranteed), which could be solved by virtualization technology. TeraGrid¹ and EGEE² are examples of large production grids, while Berkeley Open Infrastructure for Network Computing (BOINC)³ is an open-source middleware for volunteer grid computing that was originally developed to support the SETI@home⁴ project. While grid computing provides a loosely coupled architecture of distributed computers, cluster computing combines independent computers into a single system to achieve higher availability, higher performance and greater reliability⁵. An example of a cluster system is MOSIX⁶. MOSIX provides high-performance computing (HPC) on x86-based clusters, multi-clusters and Clouds. It is implemented as a software layer that lets applications run on different computers and automatically allocates resources to run them on other nodes. Utility computing provides a business model in which computing resources are offered as services, similar to the public utility model (e.g., electricity and water). The customer can use resources at a particular time (just-in-time) and pays only for the capacity of the provisioned resources. Autonomic computing is used to decrease an administrator's involvement in the management of a system. This is achieved through the use of adaptation engines, which optimize the system based on monitoring data.

¹ https://www.xsede.org/wwwteragrid/archive/about.html
² http://egee1.eu-egee.org/
³ http://boinc.berkeley.edu/
⁴ http://setiathome.berkeley.edu/
⁵ http://www.codeproject.com/KB/IP/ClusterComputing.aspx
⁶ http://www.mosix.org/

Consequently, software technologies for data center automation that build on the concepts of autonomic computing can handle the management of running applications or the automation of Virtual Machine (VM) provisioning. An approach to automating business processes in order to increase productivity is service-oriented architecture (SOA) based on Web services (WS) [LG10]. Web services allow applications to share information by communicating over a network, mostly the Internet, and enable internal applications to communicate with services outside their own domain. Cloud applications can be built by composing different Web services that are accessible via standard protocols such as SOAP⁷ and REST⁸. Figure 2.1 shows the aforementioned technologies in relation to each other and to the cloud [VBB11].

Figure 2.1: Convergence of various advances leading to the advent of cloud computing [VBB11].

According to NIST⁹, which is an agency of the U.S. Department of Commerce promoting U.S. innovation and industrial competitiveness by advancing measurement science, standards, and technology, Cloud computing is defined as follows [MG]:

“Cloud computing is a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction. This cloud model promotes availability and is composed of five essential characteristics, three service models, and four deployment models”.

Cloud computing exhibits the following characteristics:

⁷ SOAP is a lightweight protocol intended for exchanging structured information in a decentralized, distributed environment. [SOA]
⁸ Representational State Transfer (REST) is an architectural style for distributed hypermedia systems. [Fie]
⁹ National Institute of Standards and Technology (NIST)

• On-demand self-service: Computing capabilities are provisioned automatically, without human interaction, as the customer needs them.
• Broad network access: Computing capabilities can be accessed by heterogeneous client platforms over the network and through standard mechanisms.
• Resource pooling: Computing resources (e.g., storage, processing, memory, network bandwidth, and virtual machines) are provided in a multi-tenant model that can serve multiple customers. According to customer need, different physical and virtual resources are allocated and released dynamically, generally without the customer having control over or knowledge of the exact location of the provided resources.
• Rapid elasticity: Computing capabilities are provided or released rapidly in order to scale out or scale in quickly.
• Measured service: According to the type of service, Cloud systems automatically control and optimize resources through a metering capability. Thus, resource usage can be monitored, controlled, and reported.

The service models and the deployment models are described in more detail in the next sections to give more insight into the Cloud model and the context of this thesis.
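Three of the characteristics listed above (resource pooling, rapid elasticity and measured service) can be illustrated together in a small sketch. It is a toy model under assumptions, not part of the NIST definition or of any vendor API: the `MeteredPool` class, its methods and the abstract "units" and "hours" are hypothetical.

```python
from collections import defaultdict


class MeteredPool:
    """Toy multi-tenant pool: shared capacity, elastic allocation, metered usage."""

    def __init__(self, capacity_units: int) -> None:
        self.capacity = capacity_units
        self.allocated = defaultdict(int)  # tenant -> units currently held
        self.usage = defaultdict(int)      # tenant -> accumulated unit-hours

    def free(self) -> int:
        """Units of the shared pool that no tenant currently holds."""
        return self.capacity - sum(self.allocated.values())

    def scale(self, tenant: str, units: int) -> None:
        """Set a tenant's allocation to `units`, scaling out or in on demand."""
        if units < 0:
            raise ValueError("allocation cannot be negative")
        delta = units - self.allocated[tenant]
        if delta > self.free():
            raise RuntimeError("pool exhausted")
        self.allocated[tenant] = units

    def tick(self, hours: int = 1) -> None:
        """Metering step: charge every tenant for the units it holds."""
        for tenant, units in self.allocated.items():
            self.usage[tenant] += units * hours


pool = MeteredPool(capacity_units=10)
pool.scale("tenant-a", 4)   # scale out
pool.scale("tenant-b", 6)
pool.tick()                 # one metered hour
pool.scale("tenant-a", 1)   # scale in, releasing three units to the pool
pool.tick(hours=2)
print(dict(pool.usage), pool.free())
```

The usage counters are what a provider would report and bill from; the tenants never see which physical machine backs a unit, which corresponds to the location independence mentioned under resource pooling.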

2.2 Cloud Computing Reference Model

The Cloud computing stack is layered according to Figure 2.2. At the bottom is the physical resources layer, which provides the IT infrastructure. This layer is the foundation of the Cloud and consists of different kinds of resources (e.g., clusters, data centers or compute servers). Above the physical layer is the core middleware, which is responsible for the management of the physical resources. This layer provides an appropriate runtime environment for applications and exploits the physical resources as efficiently as possible. Besides the previously mentioned hardware virtualization, there is another well-known virtualization technology, programming level virtualization, which is used to manage the execution of dedicated applications. Physical resources and core middleware can be regarded as a complete platform that hosts applications transparently. The user level middleware represents the connection to the platform and can therefore offer environments and tools for the development and deployment of applications. Finally, software vendors or enterprises located at the user level can use the Cloud to manage applications and to provide services to customers [BPV09].

2.3 Service Models of the Cloud

Depending on the layered cloud computing stack (see Section 2.2), into which all the key components are organized and classified, vendors offer services related to a specific subclass of services whose characteristics fulfill the requirements of a specific market sector [BPV09]. There are three main classes of cloud computing services: Infrastructure-as-a-Service (IaaS), Platform-as-a-Service (PaaS) and Software-as-a-Service (SaaS). The following sections take a closer look at these three service classes, which are summarized in Table 2.1.

Figure 2.2: Cloud computing reference model [BPV09].

| Category | Service Content | Product Type | Products |
|----------|-----------------|--------------|----------|
| IaaS | Provides virtualized hardware and storage. | Virtual Infrastructure Management, Storage Management | Amazon EC2 and S3, GoGrid |
| PaaS | Provides a platform for developing applications hosted in the Cloud. | Programming APIs and frameworks; Deployment system | Google AppEngine, Microsoft Azure, Manjrasoft Aneka |
| SaaS | Provides applications that are accessible anytime and from anywhere. | Web applications and services (Web 2.0) | SalesForce.com (CRM), Clarizen.com (Project Management), Google Documents, Google Mail (Automation) |

Table 2.1: Cloud computing services classification [BPV09].

2.3.1 Infrastructure as a Service

Infrastructure-as-a-Service (IaaS) aims at providing computer infrastructure as a service. It offers virtual or physical resources in terms of memory, CPU type and power, storage, and operating system [BPV09]. By outsourcing resources to an IaaS cloud provider, a customer or organization is able to reduce the cost of limited computer resources. There is no need to buy resources such as computation, storage or network equipment. Instead, resources can be rented and paid for based on usage. On the other hand, an IT administrator who has to install new servers to provide some kind of service is confronted with the maintenance of a large number of physical machines (depending on the number of clients and their needs), which is a time-consuming process with weak guarantees on deployment time and cost [ER11]. Virtualization technologies make it possible to manage the resources in a more flexible and generic way and are thus commonly supplied by the providers. Most of the many IaaS cloud providers share the following features [LMS+11]:

• on-demand provisioning of computational resources
• virtualization technologies to lease these resources
• interfaces to manage those resources
• pay-as-you-go cost model
• large data centers to provide a seemingly unlimited amount of resources

Customers can use the provided resources as a service that delivers a predefined, standardized infrastructure addressing the needs of the customer's software. Such software can include operating systems and applications that are controlled by the customers. The main advantages of this approach are that customers do not need to manage or control the underlying cloud infrastructure, that they can handle peaks and valleys in service demand, and that adding new features or capabilities becomes less complex [RR10]. Furthermore, customers can always use the latest technology with rapid service delivery and thus achieve a much faster time to market [REL09].
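The pay-as-you-go cost model from the feature list above can be illustrated with a short worked calculation. All prices are hypothetical and chosen only for the arithmetic; amounts are kept in integer cents to avoid floating-point rounding. The function names are assumptions of this sketch.

```python
def usage_cost_cents(hours: int, rate_cents_per_hour: int) -> int:
    """Cost of renting a resource for `hours` under a pay-as-you-go rate."""
    return hours * rate_cents_per_hour


def break_even_hours(purchase_cents: int, rate_cents_per_hour: int) -> int:
    """Smallest number of rented hours whose cost reaches the purchase price."""
    return -(-purchase_cents // rate_cents_per_hour)  # integer ceiling division


# Hypothetical figures: a server costs 1200.00 to buy, renting costs 0.10 per hour.
rate = 10            # cents per hour
purchase = 120_000   # cents

print(usage_cost_cents(300, rate))       # 300 hours of use cost 3000 cents (30.00)
print(break_even_hours(purchase, rate))  # renting is cheaper below 12000 hours
```

The comparison ignores maintenance, power and staff cost on the purchase side, which in practice shifts the break-even point further in favor of renting for lightly used resources.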

IaaS provides services for the application infrastructure, whereby the physical infrastructure consisting of servers, network devices, and storage disks is offered as provisioned services to customers [CAAS10]. Customers mostly rent virtual machines with a customized operating system and storage unit that are configurable and scalable, and that are deployed in the virtual infrastructure of the data centers of the IaaS providers. Thus, it is possible to provide on-demand rationing of resources and to assure specific resource provisioning when needed.

Popular IaaS solutions are Amazon Elastic Compute Cloud (EC2)¹⁰, which provides a computing infrastructure and hardware virtualization, and Amazon Simple Storage Service (S3)¹¹, which offers a storage service to host and access large data sets [BPV09].

EC2 provides VM instances with a software stack that can be managed like a normal physical server. Consequently, the remote VM can be started and stopped, software packages can be installed, virtual disks can be attached to the machine, and access permissions and firewall rules can be configured [VBB11]. Customers get the opportunity to monitor and control their applications as an instance. The service management includes the modification of workload capacity, concurrent computational tasks and the dynamic provision of additional services, which makes it possible to handle additional workload [REL09].

Another cloud provider that offers storage and infrastructure is Mosso¹². Mosso was launched in February 2008 and is a direct competitor of Amazon's EC2 service. The customers can use virtual servers that are capable of running Linux or Windows applications with the benefits of a scalable platform [RR10]. The idea is to combine cloud computing with a traditional managed/shared server environment in which the servers are arranged in clusters [REL09].

¹⁰ http://aws.amazon.com/de/ec2/
¹¹ http://aws.amazon.com/de/s3/
¹² http://www.cloudcomputingzone.com/2010/05/06/cloud-computing-and-mosso/

The Xen Cloud Platform (XCP)13 is an open-source IaaS platform that supports different operating systems (e.g., Linux, Windows) and provides network and storage support as well as management tools. The OpenStack14 project offers OpenStack Compute, open-source software designed to manage and control a scalable IaaS platform including access control through users and projects, and OpenStack Object Storage for service providers offering an IaaS storage platform.

Further infrastructure services are GoGrid and Flexiscale. Both have a data center architecture and offer automatic reconfiguration of the infrastructure. For more information see [VBB11].

2.3.2 Platform as a Service

Platform-as-a-Service (PaaS) aims at providing an application or development platform that enables customers to create and deploy their own web applications hosted by a service provider. The key advantage of this approach is that the developer does not need to care about the IT infrastructure (e.g., hardware, patches or backups) and can thus concentrate on the application development. Development and runtime environments of applications are hosted by the PaaS provider, who offers application frameworks and Application Programming Interfaces (APIs) [CVKB11]. A scalable middleware abstracts from the physical resources of a distributed system and manages the application execution and dynamic resource provisioning. PaaS offers services to manage the full life cycle of an application, including development, testing, deployment, hosting and management. At the same time, different tools are provided to create web-based user interfaces, to control concurrency management or scalability, to measure performance, or to compose applications via interfaces like SOAP or REST [RR10].

In addition to hosted PaaS offerings, there are also stand-alone Cloud application platforms that allow enterprises or organizations to use Cloud-based platforms. These platforms are characterized by increased portability and standards support, reduced lock-in, increased visibility, and broader support for enterprise technologies such as Java and .NET [App09].

An example of PaaS is Google App Engine, which offers an environment for developing and hosting Web applications. Applications can be developed for the Java or Python runtime environment in a scalable manner without managing the underlying infrastructure. They run in their own secure and reliable environment (sandbox), in which they cannot write to the filesystem and can only read files uploaded with the application code. There are two types of distributed data storage, which are distinguished by availability and consistency. Provided services include a high-performance in-memory key-value cache for temporary data, a URL fetch service to retrieve web resources, and a mail service [Goo].

The Windows Azure Platform is a commercial PaaS, which allows the deployment and hosting of applications including their data. Applications are developed with the Windows Azure Software Development Kit (SDK) and can run as a Web application or background process. Developers can create applications in different programming languages (C#, Java, PHP, Python, Ruby) involving the communication with other services of the platform. Azure provides a storage service with the following types of data: BLOBs to store collections of binary data, queues for reliable communication between applications, and tables for structured data. By using the service Windows Azure Connect, developers can deploy a virtual private network (VPN) to connect to a local network [Win].

13 http://xen.org/products/cloudxen.html
14 http://openstack.org

The following solutions are examples of open-source PaaS. AppScale15 provides an open-source hybrid cloud platform that implements popular APIs, such as Google App Engine and MapReduce (via Hadoop). Further solutions are Cast16, CloudFoundry17, and Nodester18.

Aneka is a commercialized platform to create distributed applications for the Cloud that can leverage resources from different sources and supports different application models. Aneka provides a service-oriented container framework to realize different programming models, and the framework is based on .NET technology. For more information about Aneka see [CVKB11].

2.3.3 Software as a Service

Software-as-a-Service (SaaS) is located at the top of the Cloud computing stack, which means that the necessary hardware, development platforms and provided applications are supplied and maintained by the service provider. Unlike PaaS, this type of service focuses primarily on the deployment of applications. [BPV09]

SaaS provides management of and access to applications remotely via the Internet. Configurations and patch updates are performed at a central location, mostly in an automated manner, without the customer having to download or install application enhancements. The architecture should be able to scale, because applications can be used by many customers (multi-tenant architecture). The advantages over traditional software deployment are data compatibility across the enterprise, facilitated enterprise-wide collaboration, and global accessibility. [RR10]

The central location and delivery of software as a Web application makes it possible to access applications from nearly every device and by different users. Thus, customers can create and edit documents with nothing more than a Web browser and save them on their own device or as a backup in the Cloud. At the same time, users can share their documents or work together on one document in distributed collaboration. In the following, some of these solutions are presented.

Software that facilitates online collaboration provides the same features as in-house solutions, but has the benefit that the service can be rented for a particular time. The provided services include document management, project management, content management, business email, and calendar. A further feature is the synchronization of calendars or email accounts with other services. Examples of such online collaboration applications are Microsoft’s Office 36519, HyperOffice20, and Wice21.

15 http://code.google.com/p/appscale/
16 http://cast-project.org/
17 http://www.cloudfoundry.org/
18 http://nodester.com/
19 http://www.microsoft.com/en-us/office365/
20 http://www.hyperoffice.com/
21 http://wice.de/

Salesforce.com22 is an example of a SaaS provider that offers business productivity applications (such as customer relationship management (CRM) software) [VBB11]. Users can access applications via the Internet without worrying about the technical details. Salesforce.com ensures security, confidentiality, and integrity of data by administrative, physical, and technical mechanisms. Nevertheless, the customers must safeguard “the security of their customer data in their use of the service” [Sal].

Another commercial SaaS provider is Clarizen.com23, which provides online project management software for team collaboration and project execution.

Google offers free services to provide applications for office automation (e.g., Google Mail, Google Documents, and Google Calendar) accessible through Web interfaces. These services allow user data to be stored on the users’ own computers and as a backup on Google’s servers.

2.4 Fundamental Deployment Models of the Cloud

The different service models of the Cloud (see Section 2.3) have shown how a computer system, which is composed of many heterogeneous units, beginning with the infrastructure components and ending with the applications, can be divided into single system domains to make it accessible, usable and flexible for many users. This section presents the fundamental deployment models of cloud computing, which can theoretically include all of the aforementioned services of the Cloud. As the community cloud is the most important deployment model in the context of this thesis, it is described in more detail.

2.4.1 Public Cloud

Basically, a public cloud makes resources available that are offered by a service provider to the general public (general public stands for individual users or corporations [KV10]). A customer can initiate the dynamic provision of resources by accessing Web applications/Web services (SOAP or RESTful services) offered publicly by a service provider [ER11]. The provider sets specific guidelines for the hosting of resources, and the utilization of resources is mostly billed through a “utility based pay-per-use consumption model” [EHL+11].

The public cloud is designed to communicate with multiple users, whereby a scalable service platform is able to share hardware infrastructure, software infrastructure, innovation and development, and maintenance and upgrades. Typically, a public cloud provider takes over security tasks (e.g., logging, monitoring, and implementation of controls) to guarantee the reliability of a service. The general public can leverage the remote hosting of resources to eliminate the complexity of infrastructure maintenance. From an economic point of view, this type of cloud computing allows a company to save financial investments and to concentrate on more important areas of its business. [KV10]

Besides the mentioned benefits, there are things which should be considered. A customer should be aware of the consequences of moving critical applications to the Cloud, because specific customer requirements, like customized configuration requirements and service-level agreements (SLAs) to achieve uptime requirements, must comply with the prerequisites of the provider. Security issues concerning sensitive data are the most important requirements a customer should care about. Furthermore, outsourcing IT infrastructure or sensitive data to the Cloud - which means that a third party is responsible for logging, monitoring, etc. - implies reduced control at the physical and logical layers of the Cloud. [KV10]

22 https://www.salesforce.com
23 http://www.clarizen.com

The most popular examples for public clouds include services such as Amazon Web Services, Google App Engine, Salesforce.com, and Azure.

2.4.2 Community Cloud

Generally, communities are groups with a common interest, such as music or software, but are independent of specific characteristics, such as age or height, of the individual participants. Depending on the social aspect of communities, the main issues are membership, influence, integration and fulfillment of needs, and shared emotional connection. The relation between the participants is distinguished by tolerance and trust in order to keep the community together. [BMA10]

Community clouds are characterized by “a shared infrastructure that is employed by and supported by multiple companies” or a group of individual participants. These solutions aim to support groups that have common requirements regarding the sharing of resources (e.g., joint compliance requirements, noncompetitive business goals, or specific levels of security). [KV10]

Figure 2.3 illustrates the relations between users within a community and the roles a user can take. A node represents a user who is able to consume (green) and provide (yellow) computing resources. Furthermore, a node can be a coordinator (red) for resource provisioning, with the goal of making resources accessible to other nodes as well as obtaining resources from multiple nodes. This approach differs from a public cloud in that it combines a provider and a consumer in a single node and distributes the resource provisioning, whereas the roles in a public cloud are strictly separated. [MB09]

Figure 2.3: Community Cloud [MB09].

The following properties describe the specific character of a Community Cloud. Where appropriate, they are compared with vendor Clouds (e.g., Google, Amazon, Microsoft). [MB09]

• Openness: While vendor Clouds provide proprietary computing techniques depending on code, standards and data, the Community Clouds are more open and remove dependence on vendors.
• Community: The community owns the infrastructure, whereby the social aspect is brought into the foreground and the economic scalability can enhance competition and innovation.
• Individual Autonomy: Nodes of the Community Cloud have their own self-interest with regard to consuming and providing resources. The contentedness of the users can be achieved by giving them more self-determination instead of centralized control.
• Graceful Failures: Failures in vendor Clouds can propagate through the entire system because of their centralized structure. In contrast, the Community Cloud consists of many different nodes with specific capabilities that can compensate for the failures of single nodes. So, with distributed nodes expected to provide resources for the community, robustness and resilience can be increased and system-wide failures can be avoided.
• Convenience and Control: The members of the Community Cloud are given a part of the control, because their involvement in the infrastructure creates a democratic turn-out. Consequently, convenience is not negatively influenced by control.
• Community Currency: A community currency supports the transfer of goods and services within a community without the involvement of a central authority (e.g., national government). The Community Cloud uses its own currency to allow the sharing of resources.
• Quality of Service: Guaranteeing a high degree of quality of service (QoS) will be a challenge because of the heterogeneous nature of the system and the different aspects of QoS that are required by the nodes. With the help of the Community currency it is possible to ensure the requested resource provision. When a node delivers a high quality of resource provision, the price for resource utilization would be higher. So, the price of a service should reflect its quality in terms of the Community currency, whereby the QoS could be better than in vendor Clouds from an economic point of view.
• Environmental Sustainability: The data centers of the vendor Clouds intensely provide resources to satisfy their customers’ needs, while the Community Cloud distributes resources of under-utilized user machines within a flexible architecture, which leads to a smaller carbon footprint than that of vendor Clouds.

By shifting the cloud model to a user-centric community cloud, the heterogeneous infrastructure owned by a user is provided as a service. On top of the infrastructure provision runs a distributed platform, which is composed of the resources provided by the different members (see Figure 2.4). Thus, the distributed platform can be seen as self-organized software that supports user-controlled applications. Finally, these applications run on the platform and control the propagation of resource information and the utilization of resources. Access to services and APIs is similar to that in conventional operating systems, but based on resource sharing through distributed privacy, security and authentication.

Figure 2.4: Services in the Community Cloud [BMA10].
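
The Community Currency and Quality of Service properties listed above can be illustrated with a minimal sketch. It is not taken from [MB09]: the names `Node`, `price_for` and `transfer`, the linear pricing rule and all numeric values are assumptions chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """A community member that can both consume and provide resources."""
    name: str
    balance: float   # holdings in the (hypothetical) community currency
    quality: float   # assumed QoS score in [0, 1] of the resources it provides


def price_for(provider: Node, units: int, base_price: float = 1.0) -> float:
    """Assumed rule: the price grows linearly with the provider's quality."""
    return units * base_price * (1.0 + provider.quality)


def transfer(consumer: Node, provider: Node, units: int) -> float:
    """Pay a provider directly, without a central authority in between."""
    cost = price_for(provider, units)
    if cost > consumer.balance:
        raise ValueError(f"{consumer.name} cannot afford {units} units")
    consumer.balance -= cost
    provider.balance += cost
    return cost


alice = Node("alice", balance=10.0, quality=0.5)
bob = Node("bob", balance=10.0, quality=1.0)

# Bob offers higher quality than Alice, so his resources cost more per unit.
paid = transfer(alice, bob, units=3)
```

Because every node holds both a balance and a quality score, the same object can act as consumer in one transfer and as provider in the next, which mirrors the combined roles of a community node.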

2.4.3 Private Cloud

Private clouds are solely used and administrated within the boundaries of an organization or company, including trusted third parties, to provide a cloud computing environment for their employees. The required IT infrastructure is similar to traditional IT solutions, which results in higher startup costs compared to public clouds. The benefits of private clouds are greater control over the data, the infrastructure, and the deployment of applications. Consequently, when best practices for securing a company are applied, security in a private cloud is regarded as tighter. [KV10]

A descendant of the private cloud is the virtual private cloud, which provides the access and control of isolated and secure resources of a public cloud through the use of network virtualization techniques, such as Virtual Private Network (VPN). Examples of vendors offering a virtual private cloud are Amazon Virtual Private Cloud24 and OpSource25. [EHL+11]

2.4.4 Hybrid Cloud

Hybrid cloud is a combination of two or more clouds (private, community, or public) with the aim to share resources, such as data or applications. Figure 2.5 shows how the private cloud relates to a public cloud. For example, if an enterprise is considering deploying noncritical software applications outside its IT infrastructure, it can transfer these applications to the public cloud. At the same time, the company keeps critical or sensitive applications in the private cloud [KV10]. The same applies to a community cloud, in which users share resources amongst each other. If additional resources are required to fulfill a specific task, then public clouds would be in a position to provide these resources.

An unattractive case for administrators is unsteady service-demand peaks, which require additional computing resources for a particular time. The establishment and maintenance of additional computing resources within a private cloud can be avoided by outsourcing data to a service provider (public cloud). Such a process requires that resources such as applications are designed to be moved to the remote cloud. The dynamic deployment of applications in the case just described is called cloud bursting [KV10].
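
The idea of cloud bursting can be sketched as a simple placement rule. This is a toy model under stated assumptions, not a description of any real product: the function name `place_tasks`, the abstract capacity units and the example workloads are hypothetical.

```python
def place_tasks(demands, private_capacity):
    """Place a task in the private cloud if it still fits there;
    otherwise burst it to the public cloud.

    demands: list of (task_name, required_units) in arrival order.
    Returns a dict mapping task_name -> "private" or "public".
    """
    placement = {}
    free = private_capacity
    for name, units in demands:
        if units <= free:
            placement[name] = "private"
            free -= units
        else:
            placement[name] = "public"   # burst: overflow goes to the provider
    return placement


normal_load = [("web", 4), ("db", 3)]
peak_load = normal_load + [("batch", 5), ("report", 2)]

normal = place_tasks(normal_load, private_capacity=10)
peak = place_tasks(peak_load, private_capacity=10)
```

Under normal load everything stays in the private cloud. During the peak, only the task that no longer fits (`batch`) is moved out, so no additional private capacity has to be established for a temporary demand.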

The benefits of hybrid clouds are the support of highly scalable hosting environments, the elimination of deployment and maintenance of additional IT infrastructure, and reduced costs.

Figure 2.5: Hybrid Cloud [KV10].

24 http://aws.amazon.com/de/vpc/
25 http://www.opsource.net/

Examples offering a hybrid cloud model are Amazon Virtual Private Cloud, OpenNebula26 and CohesiveFT VPN-Cubed27. These services make use of IPSec VPN tunneling capabilities to securely connect to the public cloud [EHL+11]. Further solutions are Eucalyptus28, OpenStack and Nimbus29.

2.5 Bordering of the Cloud Domain and Emphasis of the Individual

The public cloud provides services for the general public and facilitates dynamic resource provisioning, whereby a specific provider is responsible for data availability. This kind of cloud model is characterized by a strict separation between service provider and customer. In contrast, community clouds consist of different member nodes, whereby a single node can be a combination of provider and consumer of services. Furthermore, the community cloud unites members that have common interests and requirements for the sharing of resources. The private cloud provides services exclusively on the premises of companies. Thus, a company operates its own cloud infrastructure and has full control over data and services. Finally, the hybrid cloud combines goals of the public and private cloud, whereby a company can outsource data when needed.

The aforementioned deployment models of the Cloud (see also Section 2.4) can exhibit some disadvantages for businesses, depending on the laws of different countries and the time to transfer data. Although major public cloud providers like Amazon offer services from data centers that are located in specific regions (e.g., Amazon’s EC2 data centers for Europe are located in Ireland), the possibility to choose a regional/local cloud provider can bring some benefits for customers and businesses.

26 http://opennebula.org/
27 http://www.cohesiveft.com/vpncubed/
28 http://www.eucalyptus.com/
29 http://www.nimbusproject.org/

The features of a private cloud should also be available for the home consumer. Solutions offering such features with a minimum of technical configuration are called home clouds and serve an individual or a small group of people. Such solutions offer the sharing of data, applications and even computing power among heterogeneous home devices. If such a home cloud includes the backup of data by a public provider, then it can adopt a hybrid state. The same is true for personal clouds, which can also exclusively utilize resources and data from the public cloud.

In this section, the benefits of regional clouds for businesses and communities are described in more detail. Furthermore, deployment models (home cloud, personal cloud) are presented, which are designed to serve a single person or a small group of people.

2.5.1 Toward the Integration of Regional Clouds for Businesses and Communities

From a functional point of view, regional or local clouds provide the same deployment model as public clouds. They use web interfaces to negotiate agreements between the provider and customers, and the infrastructure is utilized through virtualization technology. The difference between both models is the physical distribution of the underlying infrastructure. While popular public cloud providers operate their infrastructure across different countries, or even across continents, regional clouds mainly operate on infrastructure that resides on the provider’s premises (which can also include local governments) or, to the greatest possible extent, within one country.

Popular cloud providers distribute their data centers over locations residing on different continents. Amazon’s Elastic Compute Cloud (Amazon EC2) provides the deployment of virtual machines (or instances) in specific regions (e.g., the data centers in Europe are located in Ireland30), each of which consists of several availability zones that are isolated from each other. Instances can be replicated to different availability zones to protect them from errors of a single zone. If an error occurs, the request can be directed to another instance. The Windows Azure Content Delivery Network (CDN) offers 24 physical nodes31 (posted: 2011-02-24) around the globe. Data can be copied to various nodes of the global network in a pay-as-you-go cost model to improve delivery of performance-sensitive content by maximizing bandwidth.
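
The redirection of requests between replicated instances can be sketched as follows. This is a simplified illustration and does not use the Amazon EC2 API: the function `route_request` and the instance and zone names are hypothetical.

```python
def route_request(replicas, healthy_zones):
    """Return the first replica whose availability zone is healthy.

    replicas: list of (instance_id, zone) in order of preference.
    healthy_zones: set of zone names currently considered error-free.
    """
    for instance_id, zone in replicas:
        if zone in healthy_zones:
            return instance_id
    raise RuntimeError("no replica available in a healthy zone")


# Hypothetical instance and zone names, not real EC2 identifiers.
replicas = [("instance-a", "zone-1"), ("instance-b", "zone-2")]

primary = route_request(replicas, healthy_zones={"zone-1", "zone-2"})
fallback = route_request(replicas, healthy_zones={"zone-2"})
```

As long as at least one zone holding a replica stays healthy, a request is still answered, which is the protection against errors of a single zone described above.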

In contrast, regional clouds do not provide the distribution of customer data across the globe or a continent, but restrict them to a tighter regional domain. The Chief Technology Officer (CTO) of VISI.com and ReliaCloud32, Jason Baker, said: “the fact that a company can know for sure where its data is located at all times - meaning, in which legal jurisdictions - is more than comforting”33. This statement can be interpreted as meaning that data centers of regional clouds are more reliable for customers - especially customers who are located in the same region - than, for example, the availability zones or regions offered by Amazon.

30 http://aws.amazon.com/de/ec2//185-0575415-0585204/
31 http://blogs.msdn.com/b/windowsazure/archive/2010/08/09/20-nodes-available-globally-for-the-windows-azure-cdn.aspx
32 http://www.reliacloud.com/
33 http://www.readwriteweb.com/cloud/2010/04/the-regional-cloud-a-case-study.

In case of a split architecture, in which the applications and data centers are distributed over a wider area, a challenge would be to reduce the latency of communication. Regional cloud providers “offer cloud services close to your physical data center” [Mur09], which should be a solution for the latency problem. The execution time of tasks is very important for businesses, because fast business processes reduce costs and increase the customers’ acceptance of the service. Additionally, the customers can benefit from local support and from knowing where their data is stored. The Chairman of NextCloud34 stated that his company is “creating the kind of low-latency, high-bandwidth environment” that really allows an enterprise to move into the cloud and argues that most businesses “want to have a consultative relationship” with cloud providers35.

Different countries have different privacy laws in terms of compliance and regulations for the protection of data. The advantage of utilizing services through regional cloud providers is that the customers are in full compliance with a country’s laws. A further example of a regional cloud provider is Scaleup Technologies36, which stores customer data in data centers in Berlin and thus guarantees the protection of data through the German Data Protection Act (Bundesdatenschutzgesetz, or BDSG) [Sub10].

The combination of a wide range of services offered and the effective utilization of information technology brings many advantages for a community using regional services. These services can help to access information by building a shared data management or to adopt new business models to save costs. Further benefits are the collaboration of departments of a community, effective decision making, and the utilization of existing IT capabilities to deploy such solutions. IBM provides technical solutions based on cloud computing for the scenario just described. IBM’s solution “Smarter Government” [Sma] presents a suite of cloud-based solutions to help regional and local governments administrate their data, share their capabilities and manage their business processes cost effectively. Another example of a collaborating community is IBM’s solution “Smarter Cities”37, which provides the coordination of city agencies and resources to deliver services to citizens [IBM11].

2.5.2 Services in Home-based Cloud Computing Environments

Nowadays, ordinary consumers can use a wide spectrum of end-devices to communicate with other people, manage their data or consume media, such as music or video. Besides personal computers used for the activities just described, mobile personal devices, such as smartphones or tablets, are becoming more powerful in terms of processing performance and storage capacity. The result - from a technological perspective - is an environment that provides several devices with which a user can execute, manage, or process similar tasks. The communication between these personal devices and infrastructure devices in a home environment leads us to the home network.

34 http://www.nextcloud.co/
35 http://www.clouditproonline.com/article/public-cloud/nextcloud-offers-regional-cloud-136063
36 http://www.scaleup.it/
37 ftp://public.dhe.ibm.com/software/solutions/soa/pdfs/IBM_Intelligent_Ops_Center_Solution_Brief.pdf

A home network allows the communication between different digital devices with the goal to share files or documents, to connect to the Internet simultaneously, or to use accessories, such as network printers or scanners. An easy way to share files in a home network is to integrate a local network-attached storage (NAS), which operates as a file server connected to the home network to provide data access to multiple users through access control, backup and data management [Spe10]. A challenge such a solution faces is the provision of files in an appropriate format for heterogeneous devices with different capacity properties. More important challenges are the synchronization of data stored on a specific device with other network devices, and the provision of device data if the device is temporarily not available in the home network. However, the utilization of data through heterogeneous devices places the burden of application management on the user side. With the idea in mind to additionally deliver applications, or even resources, from a centralized application provision unit, the aforementioned home network transforms into a home cloud.

Home-based cloud computing, also known as a home cloud, is defined as an environment in which “consumers can unify their home data storage and have an application platform from which to run a personalized set of cloud applications” [Wes11b]. The home cloud enables all devices of a home network to act together seamlessly to optimize the sharing and synchronization of data. Imagine the following situation: An authorized device that holds new data and was temporarily not available connects to the home cloud. The device would be detected by the Cloud and the new data set would be copied into the Cloud - which means that the data is distributed over resources of different devices - and not onto a single machine or server. Then, the Cloud monitors the status of the newly added device and can provide the data from alternative resources if the device is no longer available [Wes11a, Sto08]. Such a home-based virtual storage or home storage cloud ensures that the data is protected from loss. To view and share data among different network devices, applications (e.g., a backup application) are deployed centrally in the home cloud. Thus, it is not necessary to install, configure and maintain the applications on every device.
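
The situation just described can be sketched as a toy model. It is only an illustration under assumptions: the class `HomeCloud`, its methods, the replication factor of two and the device names are hypothetical and not taken from [Wes11a, Sto08].

```python
class HomeCloud:
    """Toy model: every data item is copied to several devices."""

    def __init__(self, replicas=2):
        self.replicas = replicas
        self.online = set()    # devices currently reachable
        self.copies = {}       # item name -> set of devices holding a copy

    def connect(self, device, new_items=()):
        """Register a device and replicate the new items it brings along."""
        self.online.add(device)
        for item in new_items:
            holders = self.copies.setdefault(item, set())
            holders.add(device)
            # Copy the item to other online devices until enough replicas exist.
            for other in sorted(self.online - holders):
                if len(holders) >= self.replicas:
                    break
                holders.add(other)

    def disconnect(self, device):
        """Mark a device as no longer available in the home network."""
        self.online.discard(device)

    def read(self, item):
        """Return an online device that can serve the item."""
        available = sorted(self.copies.get(item, set()) & self.online)
        if not available:
            raise LookupError(f"{item} is not available on any online device")
        return available[0]


cloud = HomeCloud(replicas=2)
cloud.connect("nas")
cloud.connect("laptop")
cloud.connect("phone", new_items=["holiday.jpg"])
cloud.disconnect("phone")              # the phone leaves the home network
source = cloud.read("holiday.jpg")     # still served from another device
```

The data set brought in by the phone survives the phone's departure because a second copy was placed on another device at connection time.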

Creating a home cloud does not necessarily mean that the data remains in the home network. The combination of home cloud and external public cloud simplifies data management [Wes11b, Roq08]. An application (SaaS) deployed in the home cloud is able to communicate with public cloud services - for example, for managing different social networking services. In such a case, a user can easily update his/her profiles with the same versions of files from different devices [WTGB10]. Other applications can be used to manipulate files or documents on different kinds of computing devices. Once the manipulated files are stored, they would be available to other network devices because of the synchronization of data in the home cloud. The manipulation of media files such as video leads to a high consumption of computational resources. Leveraging the computational resources of other devices makes computationally intensive software usable on a wide range of devices and provides a lower latency response. Nevertheless, if the computational consumption of home devices is too high, or is not wanted, and the higher and less predictable latencies of external services can be tolerated, the computational tasks can be delegated to public cloud services [KGS11].
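
The trade-off between local execution, other home devices and a public cloud service can be sketched as a small decision function. It is a minimal illustration, not the method of [KGS11]: the function `choose_executor`, the abstract compute units and the device names are assumptions.

```python
def choose_executor(task_load, local_free, home_free, tolerates_latency):
    """Pick where to run a computational task.

    task_load: required compute units (abstract).
    local_free: free units on the device that issued the task.
    home_free: dict mapping other home devices to their free units.
    tolerates_latency: whether higher, less predictable latency is acceptable.
    """
    if task_load <= local_free:
        return "local"
    # Prefer another home device: lower latency than an external service.
    candidates = [dev for dev, free in home_free.items() if free >= task_load]
    if candidates:
        return max(candidates, key=lambda dev: home_free[dev])
    if tolerates_latency:
        return "public-cloud"
    raise RuntimeError("task cannot be placed under the given constraints")


home = {"desktop": 8, "nas": 2}
small = choose_executor(1, local_free=2, home_free=home, tolerates_latency=False)
video = choose_executor(6, local_free=2, home_free=home, tolerates_latency=False)
render = choose_executor(20, local_free=2, home_free=home, tolerates_latency=True)
```

Here the small task stays on the issuing device, the video task is handed to the most capable home device, and only the task that exceeds all home resources is delegated to the public cloud.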

Further, the combination of a home cloud with public clouds brings potential issues with privacy and security of data. These concerns are discussed in detail in Section 2.6.

This section described the most important services that a home cloud can provide to its users, summarized in Table 2.2. The following section examines the characteristics of another deployment model of the Cloud, the personal cloud, and how it differs from the home cloud.

2.5.3 A proper Distinction between Home Cloud and Personal Cloud

While the concept of a home cloud (see Section 2.5.2) is mainly focused on the utilization of data, applications and resources in a home network consisting of heterogeneous devices, and on the consumption of external cloud services in case of additional resource provision, the concept of a personal cloud is designed more openly to provide one’s own resources anywhere and at any time.

In their paper “Policy Expressivity in the Anzere Personal Cloud” [RYJ+11], Oriana Riva, Qin Yin, Dejan Juric, Ercan Ucan, and Timothy Roscoe describe personal clouds as an ensemble of owned heterogeneous devices, which consist of physical and virtual resources. These devices (e.g., mobile phone, laptop, and personal computer) can dynamically provide and acquire new resources to manage a user’s personal data (music collection, photos, contacts, etc.) in a home network. A synchronization of data based on device capability properties should be supported, and the ensemble of devices can be extended by public cloud services if utilization limits of resources have been reached.

A distinction between personal clouds and home clouds becomes apparent when the mobility of users and the need to access their own data from anywhere at any time are considered. According to Tian et al. [TSH11], personal cloud computing is categorized into three types: online storage, online desktop38, and Web-based applications. The mentioned types are services offered by public cloud providers, which are consumed solely by mobile devices without any local resource sharing. This case demonstrates that the definition of a personal cloud can also mean that the resource provision is one-sided, coming only from external cloud providers.

Most of the commercial personal cloud solutions are storage-centered and provide access to personal servers³⁹ or devices (e.g., Akitio [AKI08], Iomega [Iom08], Tonido [Ton08]). Users can access and share their stored data from any Internet-connected computer. The replication and synchronization of personal data can be accomplished between personal servers or between servers and clients, whereby the server is a single machine that communicates with a remote server or with client software running on heterogeneous personal devices. Thus, even a backup of data residing on a personal server can be replicated to a remote personal server. With today’s technologies, such providers reject the integration of public cloud services for privacy, security, and cost reasons. But a trend toward a wider distribution of personal clouds and a broader utilization is evident [Gru10]. For smaller businesses it is important to access critical data whenever it is necessary. Thus, including public storage cloud providers would deliver greater assurance that their data is always accessible. Another important trend would be the utilization of personal storage devices to create a community backup pool, in which the members can decide how many storage resources they want to make available to other community members.

³⁸ An online desktop is an online version of the personalized setting of a user’s desktop, accessible through the Internet.
³⁹ A personal server is a local storage device that stores backup data.

A summary of the aforementioned properties of personal clouds and home clouds is presented in the following table:

Deployment Type | Local Resource Provision | External Resource Provision | External Resource Provision (solely) | Multiple Heterogeneous Devices | Remote Access | Sharing With Other People
Personal Cloud  | X                        | X                           | X                                    | X                              | X             | X
Home Cloud      | X                        | X                           |                                      | X                              | X             |

Table 2.2: Characteristics of personal and home clouds.

Based on this summary, one can recognize that personal clouds provide more options to access personal files and documents (sharing with others, sole outsourcing) and serve a resource sharing community, whereas home clouds mainly allow access to data for a specific person. Because every feature of the home cloud is also a feature of the personal cloud, and the personal cloud offers additional features, the home cloud is a subset of the personal cloud.

2.6 Mechanisms for the Development of Trusted Clouds

Customers outsource and transfer their data to the data centers of public cloud providers or to personal machines of members of the same community in order to take advantage of remote resource provision, share information securely, and save cost. Depending on the customers’ requirements and the security mechanisms that are offered by the cloud providers and vendors, the customers have more or less trust in the provided services. From the developer’s point of view, it is very important to know and implement the mechanisms for developing a trustworthy system, because such a system achieves the highest acceptance among customers. This section describes what kind of information must be protected and what mechanisms exist to manage the data and information of customers in a secure and trustworthy way.

2.6.1 Privacy in the Cloud

Privacy is difficult to define, because it is unique to each individual. There is no clear definition of privacy, because the thinking about privacy depends on time and context (e.g., moral, social, or political background). Definitions range from the right to be left alone to the ability to control the release of information about oneself. In the context of Cloud computing, privacy is defined as “the ability to protect information about oneself, and to also have some level of control over any information that has already been released” [AC11]. This definition applies more to a commercial, consumer background. An organization involved in the Cloud provision would require the application of laws, policies, standards, and processes to handle Personally Identifiable Information (PII) [PB11].

A classification of information a user should consider as private is given in [AC11]:

• Personally Identifiable Information (PII): This type of information includes all information that can be used to identify a person. Single items of information, such as a credit card number or social insurance number, are uniquely associated with a person. Other information, such as the name and birthday of a person, can be combined to identify a person.
• Sensitive Information: Sensitive information, such as wage, age, sex, and religion, can be used to infer a group of people, but does not explicitly point to a single person. An exception is the following case: if an employee of a company is the only one with a specific religion, it would be possible to identify the employee based on sensitive information. In such a case the sensitive information becomes PII.
• Usage Information: This class contains all information that can be used to track the history of any activity of a person. A person cannot be identified by a small amount of collected information. If large amounts of usage information about a person are collected over a period of time, then this kind of information becomes PII, because it can lead to a person.

Cloud Computing provides, among other things, the storage of user data by a vendor that owns and controls the underlying infrastructure. Additionally, information such as PII is stored and processed by the vendor. Necessary requirements for users are visibility and control of their own data, and the freedom to determine who gets access to personal information. This also applies to authorized secondary usage, whereby a user should be able to decide whether personal information is transferred to third parties (e.g., for advertising). Cloud providers are responsible for the backup of user data, which is realized by replicating data in multiple data centers. If multiple parties are involved in such a procedure, a user should know about the participating parties. This also means that data proliferation should take place in a certain jurisdiction. Another point is the involvement of third parties to ensure legal requirements for personal information. But it is unclear how these parties can be categorized as trustworthy and how data can be assigned for legal processing. [PB11]

2.6.2 Security of Outsourced Data

Cloud computing facilitates software mechanisms to communicate with the outside world by providing web interfaces that customers use to negotiate service level agreements, and application programming interfaces to execute commands and process data. These applications transfer sensitive information and huge amounts of data that are stored in the backend of the Cloud. Thus, the need to secure software and data management is transferred from the customer to the cloud supplier. To provide the most reliable practice for the customer’s data and applications, the cloud supplier has to develop a secure execution environment and secure communications.

To provide a secure execution environment, software can be configured depending on many parameters, which is a very complex task. An important issue is to prevent malicious programs from transferring and executing code embedded in data at a high privilege level [KV10]. The entry point to the Cloud, through which a customer can, for instance, control leased applications, is a potential door for malicious attacks and must be protected through strong authentication methods. Another security vulnerability stems from the programming language. The C language cannot prevent buffer overflows. Consequently, the programmer bears the burden of implementing safe programming techniques that check boundary limits and function calls. Better security mechanisms are provided by the object-oriented language Java, in which the Java Virtual Machine (JVM) prevents buffer overflows, the use of uninitialized variables, and invalid code.

As mentioned above, secure communication is another mechanism to establish reliability. To reach this goal the transmission and storage of data must be secured by cryptographic methods. A secure cloud communication includes “the structures, transmission methods, transport formats, and security measures that provide confidentiality, integrity, availability, and authentication for transmissions over private and public communications networks” [KV10].
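As an illustration of the integrity and authentication properties named in the quotation, the following minimal sketch tags a message with an HMAC from the Python standard library. The shared key and the payload name are hypothetical, and confidentiality is deliberately out of scope here, since it would additionally require encryption.

```python
import hashlib
import hmac
import secrets


def sign(key: bytes, message: bytes) -> bytes:
    """Return an HMAC-SHA256 tag that binds the message to the shared key."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify(key: bytes, message: bytes, tag: bytes) -> bool:
    """Compare tags in constant time to detect tampering or a wrong sender."""
    return hmac.compare_digest(sign(key, message), tag)


# Hypothetical shared secret, assumed to be agreed on out of band
# between the customer and the cloud provider.
shared_key = secrets.token_bytes(32)
payload = b"backup-chunk-0001"
tag = sign(shared_key, payload)
```

A receiver that holds the same key would call verify(shared_key, payload, tag) and reject the transmission if the result is False.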

In their security guide “Security Guidance for Critical Areas of Focus in Cloud Computing V2.1” [Clo09], the Cloud Security Alliance has identified several domains of concern for cloud computing. The domains are divided into strategic and policy issues within a cloud computing environment, and more tactical security concerns and implementation within the architecture. The following characteristics concentrate on the tactical security concerns and implementation:

• Incident Response, Notification, and Remediation: Cloud providers host a huge number of customers that can deploy applications to the cloud fabric. The design of these applications can be deficient with respect to data integrity and security. Monitoring these applications as well as external incidents is very complex, with an exponentially increasing volume of notifications. Application-level firewalls, proxies, and application logging tools help to narrow incidents down to a specific customer.
• Encryption for Confidentiality and Integrity: Generally, the cloud provider network is more secure than the open Internet. With an architecture consisting of different components, and possibly different organizations sharing the cloud, there is a need to encrypt multi-use credentials, such as credit card numbers, passwords, and private keys, in transit over the Internet and within the cloud provider’s network. To control data that is stored by the cloud provider, customers can hold the cryptographic keys and decrypt the data themselves. IaaS environments commonly provide encryption solutions supported by third-party tools. Data encryption within PaaS and SaaS environments is more complex, and customers need to request it from their providers.
• Key Management: If cloud providers offer the protection of data, then they have to support robust key management schemes. This means that the stored keys are protected, that access to the keys is limited to specific roles, and that backup and recoverability of keys are implemented.
• Identity Provisioning: An enterprise outsourcing parts of its business activities needs to extend its user management processes to cloud services. Customers should avoid proprietary solutions and leverage standard connectors, such as SPML⁴⁰.
• Authentication: A vital requirement for companies is the authentication of users that utilize cloud services. A dedicated VPN is a valuable authentication strategy for leveraging existing systems and processes. To leverage existing identity management systems, such as an SSO⁴¹ solution or LDAP⁴²-based authentication, a dedicated VPN tunnel is the better solution. Further authentication mechanisms that should be provided are one-time passwords, biometrics, digital certificates, and Kerberos.
• Authorization and User Profile Management: An organization with a certain number of employees operating in the Cloud is concerned about the privileges of its employees. To give them access to a specific scope of actions, it is necessary to establish trusted user profiles and policy information in an auditable way. Access control involves the identification of authoritative sources of policy and user profile information, the request and enforcement of policies, and a format to specify this information. A company should consider that information about the identity of its employees stored by the cloud provider could be a risk. Secured access to the cloud is possible through a direct VPN or through an industry standard such as SAML⁴³ and strong authentication. Proprietary solutions for Identity-as-a-Service (IDaaS) are a risk because of their lack of transparency. A better solution is to use open standards for the components of the Identity and Access Management (IAM).

⁴⁰ Service Provisioning Markup Language (SPML) is an XML-based framework that facilitates the exchange of user, resource, and service provisioning information between cooperating organizations.
⁴¹ Single sign-on (SSO)
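The key management characteristic above (protected keys, role-limited access, backup and recoverability) can be made concrete with a small sketch. The class KeyStore, the role name, and the key name are hypothetical and serve only as an illustration; a real scheme would additionally encrypt the stored keys and the backup.

```python
import secrets


class KeyStore:
    """Toy in-memory key store (hypothetical); keys are readable only by allowed roles."""

    def __init__(self, allowed_roles):
        self._allowed_roles = set(allowed_roles)
        self._keys = {}

    def create_key(self, name: str) -> None:
        # Generate a random 256-bit key under the given name.
        self._keys[name] = secrets.token_bytes(32)

    def get_key(self, name: str, role: str) -> bytes:
        # Access is limited to specific roles.
        if role not in self._allowed_roles:
            raise PermissionError(f"role {role!r} may not read keys")
        return self._keys[name]

    def backup(self) -> dict:
        # Return a copy so that later changes do not affect the snapshot.
        return dict(self._keys)

    def restore(self, snapshot: dict) -> None:
        # Recover all keys from a previously taken snapshot.
        self._keys = dict(snapshot)


store = KeyStore(allowed_roles={"key-admin"})
store.create_key("customer-data")
snapshot = store.backup()
```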

2.6.3 Trustworthy Cloud Computing

With the possibility to use remote resources provided by specific vendors, customers risk the violation of the privacy and security of their outsourced data. Resources can be controlled securely by the application of virtual private clouds, as mentioned in Chapter 2.4.3. If the vendors do not allow customers to control the cloud resources, or provide control mechanisms that are insufficient for the overall protection of private data, it is hard for the customer to prevent unauthorized access or secondary usage by technical means. Consequently, the customer must trust the vendors. This trust is generally based on contracts (SLAs) and on mechanisms to detect and remedy violations, or on penalties for the violation of SLAs. Stronger security mechanisms make customers more willing to outsource their data. [PB11]

Trust can be classified according to the information origin. There are two models based on interactions and observations of members: the direct and the indirect model [BEv+11]. At this point members have software agents that are responsible for the interaction between the members. In the direct model, trust is the result of the direct interaction with other agents (past interactions are also considered) and indirect observation of the interaction among agents. In both cases, an agent calculates a trust level that can be stored in the database of the agent. Agents in the indirect model get information from their neighbors, because a direct interaction with specific agents is not possible. The indirect model is used in the initial communication phase, in which the interaction with unknown members is low. [BEv+11]
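The two models can be illustrated with a minimal sketch. The class TrustAgent, the rating scale from 0 to 1, and the use of a plain average are assumptions made for illustration; [BEv+11] does not prescribe this calculation, and the observation of interactions among other agents is omitted here.

```python
from statistics import mean


class TrustAgent:
    """Software agent that keeps a trust level per peer (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.ratings = {}    # peer name -> ratings in [0, 1] from own interactions
        self.neighbors = []  # other TrustAgent objects this agent can ask

    def record(self, peer, rating):
        # Store the outcome of one direct interaction (past ones are kept).
        self.ratings.setdefault(peer, []).append(rating)

    def direct_trust(self, peer):
        # Direct model: trust results from the agent's own interactions.
        history = self.ratings.get(peer)
        return mean(history) if history else None

    def indirect_trust(self, peer):
        # Indirect model: ask the neighbors for their trust levels.
        reports = [n.direct_trust(peer) for n in self.neighbors]
        reports = [r for r in reports if r is not None]
        return mean(reports) if reports else None

    def trust(self, peer):
        # Fall back to the indirect model while the peer is still unknown.
        own = self.direct_trust(peer)
        return own if own is not None else self.indirect_trust(peer)


alice, carol = TrustAgent("alice"), TrustAgent("carol")
alice.record("bob", 1.0)
alice.record("bob", 0.5)
carol.neighbors.append(alice)
```

In this sketch carol has never interacted with bob, so carol.trust("bob") is answered from the neighbor alice, which corresponds to the initial communication phase described above.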

⁴² Lightweight Directory Access Protocol (LDAP)
⁴³ Security Assertion Markup Language (SAML)

Another way to increase trust is reputation, which is derived from a social background. The opinions of the members of a community can be aggregated to illustrate the level of trust. Thus, the members of the community can rank a specific vendor (e.g., based on security properties), and others can select a good vendor depending on the ranking. A trust model based on reputation should provide the following features [BEv+11]:

• foster the production of assessments,
• provide facilities for collecting and storing assessments,
• offer mechanisms that reward assessments based on their level of correctness.

The trust level of members can be made globally visible and accessible to all members of a community. In this case, the trust value of a specific member x is calculated from the opinions that all other members have given about member x. Globally visible trust assumes that each member of the community applies the same concept for assessing good and bad characteristics of members. The need for a personal representation of trust leads to local visibility. In such a case, the members can have different concepts for the assessment of other members; something that is acceptable for one member may be unacceptable for another one. Thus, the members cannot rely completely on the opinions of the other members, which results in more interaction between members to build a trust level. Consequently, the local approach is recommended for small or intermediate communities. [BEv+11]
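The difference between global and local visibility can be sketched as follows. The function names, the opinion scale from 0 to 1, and the weighting rule in local_trust are assumptions for illustration only; [BEv+11] leaves the concrete aggregation open.

```python
def global_trust(opinions, member):
    """Global visibility: average of all opinions that other members gave about member."""
    scores = [given[member] for rater, given in opinions.items()
              if rater != member and member in given]
    return sum(scores) / len(scores) if scores else None


def local_trust(opinions, viewer, member):
    """Local visibility: the viewer's own opinion, else others' opinions
    weighted by how much the viewer trusts each rater (assumed rule)."""
    own = opinions.get(viewer, {})
    if member in own:
        return own[member]
    weighted = [(own[rater], given[member]) for rater, given in opinions.items()
                if rater not in (viewer, member) and rater in own and member in given]
    total = sum(weight for weight, _ in weighted)
    if total == 0:
        return None
    return sum(weight * score for weight, score in weighted) / total


# Hypothetical opinions: rater -> {rated member: score in [0, 1]}
opinions = {
    "alice": {"bob": 1.0, "carol": 0.5},
    "bob": {"dave": 0.0},
    "carol": {"dave": 1.0},
    "dave": {},
}
```

Under these assumptions the global value for dave is the same for every member, whereas alice arrives at a lower local value because she trusts bob, who rated dave poorly, more than carol.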

Furthermore, trust in relation to cloud computing is differentiated into persistent trust and dynamic trust. Persistent trust is related to the level of confidence in long-term underlying properties or infrastructure. This kind of trust is based not only on technological concerns but also on social influences, because customers are interested in knowing who (e.g., auditors, security experts, regulators) is controlling the infrastructure and what exactly they are controlling. Dynamic trust is related to certain states, contexts, or short-term and variable information. Dynamic technology-based trust needs to be combined with technological and social mechanisms to provide persistent trust. [PB11]

Customers transfer their data to the Cloud providers, which means that all the information can be accessed by the provider (if not encrypted) and used for unauthorized purposes. An increased focus on privacy and security would gain the trust of consumers and lead to a wider acceptance and further growth of Cloud Computing.

2.7 Summary

This chapter presented the essential characteristics, service models, and deployment models of cloud computing. The most relevant models for this thesis are infrastructure as a service, community cloud, regional cloud, and personal cloud. The service model IaaS, discussed in Chapter 2.3.1, offers on-demand provisioning, control, and management of storage and computing resources within a computing infrastructure. The community cloud, discussed in Chapter 2.4.2, is used by a group of people who have common interests. The members own the resources that establish the cloud infrastructure, and every member can be a consumer, provider, or coordinator in the resource sharing process. Chapter 2.5.1 discussed regional clouds and showed that the collaboration of departments of communities, such as local governments, may offer the advantages of effective resource utilization, cost savings, and independence from public cloud providers. The personal cloud, discussed in Chapter 2.5.3, presented the possibilities for individuals or smaller groups to share resources and data from anywhere at any time. Finally, privacy and security mechanisms to create a trustworthy cloud were presented in Chapter 2.6.

3 Requirements Analysis

This chapter analyzes the requirements that must be met in order to develop a system component¹ that is an integral part of a personal cloud environment. The system component should be responsible for providing computer resources, such as storage capability and computing performance, as well as data and software, as Resource-as-a-Service (RaaS). While IaaS (see Chapter 2.3.1) generally represents a system that can be dynamically extended or modified with any amount of computer resources, the RaaS approach presented in this thesis concentrates on the discovery and management of resources, data, and software in personal clouds, which are also able to utilize remote resources, data, and software. A detailed analysis of the problem is presented in this chapter. To this end, the local network consisting of multiple devices and the dependence on applications, which consume or provide resources and information about them, as well as data and software, are surveyed to delimit the main responsibilities of the RaaS component. Based on these considerations, the concrete requirements are formulated.

3.1 Problem Context

Nowadays, cloud computing solutions are offered by popular vendors, such as Amazon, Microsoft, and Google, which provide different service models to utilize highly scalable virtualized resources over the Internet. Customers benefit from the simplified management of infrastructure, platform, and software, because these tasks are taken over by the Cloud provider. But outsourcing of computing resources and data also means loss of control and security, which makes companies reluctant to use cloud computing solutions that are not on their own premises. To solve these problems, the junior research group FlexCloud² was founded. The goal of FlexCloud is to develop methods and mechanisms for supporting a secure Cloud lifecycle to enable companies and home users to securely move their applications and data into the Cloud. The research results provided in this thesis contribute, in the context of FlexCloud, to allowing home users to provide resources, data, and applications to the Cloud. The following considerations describe how the system component developed in this thesis depends on the underlying infrastructure, which establishes the foundation for the scenario (see Chapter 1.2) and other applications of FlexCloud.

Figure 3.1 presents the layered view of the architecture that integrates resources, data, and software, which should be utilized by friends of a resource sharing community in a provider/consumer manner. Here, the main focus lies on the local network and aims at getting all of the available resource information of a user’s personal devices. This information is propagated to make it visible to other user applications of the Cloud and to help utilize resources, data, and software.

Figure 3.1: Layered view of the integration of resources and data into FlexCloud.

¹ The system component can also be regarded as a service, which delivers a specific functionality to serve the overall system.
² http://flexcloud.eu

The layer at the bottom of Figure 3.1 contains the different personal devices and their resources, which constitute the infrastructure of the personal home network. The personal infrastructure consists of multiple heterogeneous devices (e.g., storage server, personal computer, laptop, mobile phone, and USB mass-storage device) that are located in a local network or connected via short distance connections (e.g., Bluetooth). Storage capacities of the devices range from a few gigabytes to terabytes, and processors may reach up to server class. The connection between devices ranges from wireless network to Gigabit Ethernet.

The layer Resource-as-a-Service summarizes the essential functional units to integrate the heterogeneous personal devices into the Cloud, as shown in Figure 3.1. Integration means that the devices should be discovered, the resources of the devices should be determined, communication and data transfer with devices should be possible, and the resources, data, and software should be manageable. Most of these features are summarized in the unit Resource Integration. Because a manual definition of device and resource properties is a time-consuming process, an automatic device and resource discovery should be provided to reduce the time needed to identify the properties. Once properties are discovered, applications can use the provided device and resource information, which is propagated by the API. A discovery of device data and software should also be possible, whereby software is regarded as data. Such a simplification covers the case in which software is an executable file that can be transferred between computers. Thus, if data is mentioned in the following text, it can also mean software. Device Data can be used and shared by applications, including applications on remote computers, depending on specific access rights. Furthermore, the device discovery is not restricted to the home user’s devices. As described in use case two of the scenario (see Chapter 1.2), it can also be applied to friends’ devices. Under the precondition that a located device can be identified as a friend’s device, it should be possible to share data between them.

For the management of resources, the unit Resource Management is responsible for allocating, reserving, and releasing resources. It should also be responsible for migrating resources to other devices. Thus, the access to storage resources can be managed transparently. The unit Data Management is responsible for adding, deleting, and modifying datasets. It should also migrate datasets to other storage resources. Furthermore, this unit has to handle access rights to datasets and the provision of data for remote machines of friends.
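The responsibilities of the unit Resource Management can be sketched for the case of storage resources. The class StorageResource, the function migrate, and the unit of gigabytes are hypothetical names chosen for illustration; reservation is left out, since it could be modeled like an allocation for a future holder.

```python
class StorageResource:
    """A storage resource of one device with per-holder allocations (hypothetical)."""

    def __init__(self, name, capacity_gb):
        self.name = name
        self.capacity_gb = capacity_gb
        self.allocations = {}  # holder -> allocated gigabytes

    @property
    def free_gb(self):
        return self.capacity_gb - sum(self.allocations.values())

    def allocate(self, holder, size_gb):
        # Refuse allocations that exceed the remaining capacity.
        if size_gb > self.free_gb:
            raise ValueError("not enough free capacity")
        self.allocations[holder] = self.allocations.get(holder, 0) + size_gb

    def release(self, holder):
        # Free everything the holder had allocated and report the amount.
        return self.allocations.pop(holder, 0)


def migrate(holder, source, target):
    """Move a holder's allocation to another device; the source is only
    released after the target has accepted the allocation."""
    size_gb = source.allocations.get(holder, 0)
    target.allocate(holder, size_gb)
    source.release(holder)


nas = StorageResource("nas", capacity_gb=100)
laptop = StorageResource("laptop", capacity_gb=50)
nas.allocate("alice", 40)
```

Calling migrate("alice", nas, laptop) would then move the 40 GB allocation to the laptop without the holder having to know on which device the storage resides.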

The layer at the top of Figure 3.1 contains the most important applications. They either want to receive or deliver information about existing resources or datasets of the local network, or they are necessary to identify resources of specific members of a community. In the following, the dependence on these applications is described (a summary can be found in Table 3.1):

• Resource/Data Provisioning: A resource provisioning application needs a concrete description of the available resources to divide them into virtual resources, which are later provided on a market place or reserved for friends of a community. Such a description would be delivered as a resource plan. If a virtual resource was specified, the capacity of the virtual resource must be allocated to prevent further virtualization by other consumers, which leads to a modification of the resource plan. In the case of a virtual resource leased via the market place, the allocated virtual resource must be identifiable so that the holder can use it. The provisioning of device properties is also necessary. A data provisioning application creates indexes in a dataset that is stored in the distributed storage resources of the personal home cloud, but instead of obtaining a detailed resource description, it consumes data and metadata transparently. On the other side, data provisioning would provide data and metadata if a client is searching for a data item.
• Adaptive Synchronization: This application synchronizes data with multiple heterogeneous devices of the personal home cloud. Because of the different properties of the devices, such as performance, storage, and display size, there is a need to adapt media files so that they can be effectively utilized. Thus, additional data and metadata are provided to serve a specific device. Therefore, this application needs information about the original media type (metadata), the media file itself (data), the device type property, as well as performance properties to handle the adaptation process.
• Data Storage Utilization: Such an application is deployed on a machine in the personal home network and is used to survey leased virtual resources and shared data on the local and remote machines (e.g., a machine of a friend), as well as to attach access rights to local personal files. To utilize storage resources or to access or provide shared files of community members, consumption values of leased virtual resources should be provided (consumes resource plan), as well as information about the type and size of media files, and access rights to data of community members. On the one hand, data and metadata are provided if a friend has attached specific access rights so that data is accessible. On the other hand, data and metadata are consumed to summarize locally accessible datasets. It should also be possible to discover new devices to survey or integrate their resources and data. The integration would also require the consumption of device properties.
• Social Network: Defines a community of friends or employees of a business who want to utilize resources and share data depending on the relations within the community. If a member lacks resources (e.g., storage resources) in his or her own personal home network, it should be possible to utilize resources of other members. In this way, members can back up their data remotely. Consequently, the community members should be able to access information about the resources provided by their buddies. The sharing of files requires the propagation of access rights and information about the provided media files. Such functionality will be provided by the applications mentioned before. The most important benefit of a community is that the members can be identified. By querying the social network application, other applications can determine the identity of community members and allow resource utilization and access to files accordingly.

The dependence on the aforementioned applications is summarized in the following table:

Dependence                   | Resource/Data Provisioning, Adaptive Synchronization, Data Storage Utilization, Social Network
Consumes Device Properties   | X X X
Provides Resource Plan       | X
Consumes Resource Plan       | X X
Provides Data / Metadata     | X X
Consumes Data / Metadata     | X X X
Provides Data Access Rights  | X
Consumes Data Access Rights  | X
Provides User Identity       | X

Table 3.1: Dependence on applications of FlexCloud.

3.2 Requirements

Based on the description of the problem context and the dependence on the introduced applications (see Chapter 3.1), the concrete requirements are presented in the following.

The task description (see the beginning of this thesis) already states three basic requirements: The discovery of resources by common protocols in local networks and short distance connections, the assignment of resources by trust metrics in the context of SLAs and the connection to social networks, and the synchronization of the devices’ resources, data, and software with a local directory. These basic requirements are further divided to discover and structure the most important features, which are necessary to fulfill the task. An overview can be found in Table 3.2.

The discovery of locally available devices should be performed with common protocols that do not necessarily require the installation of additional software libraries to advertise the device to be found. However, the device must support the protocol. If a device was found, the RaaS component has to check what kind of resources are available and determine the resource properties. This also requires the use of protocols that the device supports. The information about the discovered devices and resources should be provided in a device and resource plan to make it consumable by other applications. Since other applications can modify resources, the RaaS component has to modify the device and resource plan accordingly. Leased or reserved resources become virtual resources, whose properties should be provided and modified. Since the implementation of all available discovery protocols is not possible in this thesis, the discovery should be capable of integrating new protocols.
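The last requirement, a discovery that can integrate new protocols, could be approached with a simple registry. The following minimal sketch uses hypothetical names (register_protocol, discover_all) and a stand-in protocol that returns fixed sample data instead of querying the network; it is not the design chosen in this thesis.

```python
# Registry of discovery protocols: protocol name -> discovery function.
_PROTOCOLS = {}


def register_protocol(name):
    """Decorator that adds a discovery function under a protocol name."""
    def decorator(func):
        _PROTOCOLS[name] = func
        return func
    return decorator


@register_protocol("static-demo")
def discover_static():
    # Stand-in for a real discovery protocol; returns fixed sample data.
    return [{"id": "nas-01", "type": "storage server"}]


def discover_all():
    """Run every registered protocol and merge the results into one plan,
    keyed by device id and recording which protocols found each device."""
    plan = {}
    for protocol, discover in _PROTOCOLS.items():
        for device in discover():
            entry = plan.setdefault(device["id"], dict(device, protocols=[]))
            entry["protocols"].append(protocol)
    return plan
```

Under this assumption, supporting an additional protocol only requires one more decorated function; discover_all itself stays unchanged.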

The RaaS component should be able to discover datasets on devices of the local personal network. It should furthermore provide and modify the discovered datasets, and transfer data by common protocols.

For the identification of devices, resources, and datasets, the RaaS component should analyze the identification items that are stored on the device or attached to a virtual resource or a dataset. Thus, if an incoming request asks for information about a specific virtual resource or dataset, the corresponding information would be delivered. Such information could include the owner, the usage context, and the principal utilizer. For security reasons, a complete device should have only one owner and no principal utilizer, to prevent a malicious user from controlling the complete device from the perspective of the RaaS component. For now, there is no need to define special security requirements for software, since software is regarded as data, as mentioned in Chapter 3.1.
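The identification items and the rule for complete devices can be expressed as a small data structure. The class Identification, its field names, and the three kind values are hypothetical; the sketch only shows how the rule "one owner, no principal utilizer" could be enforced when an item is created.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Identification:
    """Identification items attached to a device, virtual resource, or dataset."""
    kind: str                                  # "device", "virtual_resource" or "dataset"
    owner: str
    usage_context: Optional[str] = None
    principal_utilizer: Optional[str] = None

    def __post_init__(self):
        # Every item needs exactly one owner.
        if not self.owner:
            raise ValueError("an owner is required")
        # A complete device must not be handed over to a principal utilizer.
        if self.kind == "device" and self.principal_utilizer is not None:
            raise ValueError("a complete device must not have a principal utilizer")


device = Identification(kind="device", owner="alice")
lease = Identification(kind="virtual_resource", owner="alice",
                       usage_context="backup", principal_utilizer="bob")
```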

Resources, virtual resources, and datasets that were discovered should be synchronized by the RaaS component with a directory on a local machine. The synchronization should allow applications to access resources and data of multiple devices of the local personal network in a uniform way.
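One possible reading of this requirement is a single directory in which every dataset is addressed by a uniform key. In the following minimal sketch the function synchronize, the key scheme device/dataset, and the metadata fields are assumptions for illustration only.

```python
def synchronize(directory, device, datasets):
    """Bring the directory entries of one device in line with its current datasets.

    directory: dict mapping a uniform key "device/dataset" to metadata
    datasets:  dict mapping dataset names on the device to metadata
    """
    prefix = device + "/"
    current = {prefix + name: meta for name, meta in datasets.items()}
    # Remove entries of this device that no longer exist on the device.
    stale = [key for key in directory if key.startswith(prefix) and key not in current]
    for key in stale:
        del directory[key]
    # Add new entries and refresh the metadata of existing ones.
    directory.update(current)
    return directory


directory = {}
synchronize(directory, "nas", {"photos": {"size_mb": 120}})
synchronize(directory, "phone", {"contacts": {"size_mb": 1}})
```

Applications would then look up datasets of any device through the same directory, independent of the protocol that was used to discover them.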

3.3 Summary

This chapter has described in detail the problem this thesis addresses and closes with an overview of all the requirements named in the previous section. They are grouped under four basic requirements, each of which contains the corresponding concrete requirements (Table 3.2).

Basic Requirements | Abbr.  | Concrete Requirements
Resource Discovery | R-RD-1 | Discover devices by common protocols
                   | R-RD-2 | Discover resource information by common procedures
                   | R-RD-3 | Provide and modify device and resource description plan
                   | R-RD-4 | Provide and modify virtual resource information
                   | R-RD-5 | Discovery should be capable of integrating new protocols
Data Discovery     | R-DD-1 | Discover datasets of distributed local resources
                   | R-DD-2 | Data transfer by common protocols
                   | R-DD-3 | Provide and modify datasets
Trust Metrics      | R-TM-1 | Provide / Attach information about the owner to a device
                   | R-TM-2 | Provide / Attach information about the usage context to a virtual resource
                   | R-TM-3 | Provide / Attach information about the principal utilizer to a virtual resource
                   | R-TM-4 | Provide / Attach information about the owner and principal utilizer to a dataset
Synchronization    | R-S-1  | Synchronize distributed local (virtual) resources with a local machine
                   | R-S-2  | Synchronize distributed local datasets with a local machine
                   | R-S-3  | Uniform access to resources and data of multiple devices

Table 3.2: Overview of all requirements. 4. STATE OF THE ART 38

4 State of the Art

This chapter presents personal cloud products together with their mechanisms for discovering hardware and resource information, as well as their respective features and Cloud concepts. The main part consists of an analysis of relevant projects (research and commercial domain) in the context of community, personal and home clouds, regarding the discovery of hardware and resource information and the application of Service Level Agreements (SLAs). The scenario (Chapter 1.2) has presented some use cases to demonstrate what kind of functionality the system component should provide. Accordingly, devices (e.g., mobile devices, hardware devices, or network computers) should be discovered or specified in the local network to share data with friends. Furthermore, local network devices can contain resources (e.g., storage resources) that can be provided to members of a community or traded on a marketplace. The requirements analysis (Chapter 3) has explained the dependencies on other applications and the need to provide information about the available resources. Thus, the following investigation also covers tools that can provide detailed resource information (e.g., the total or available capacity of particular storage resources). This information can help to generate customized SLAs and thus increase the reliability of resource provisioning. Additionally, the presented projects are examined as to whether they apply SLAs.

Another study concentrated on the discovery of devices and therefore analyzed different service discovery protocols (see Appendix A: “Recherche über das Auffinden von Geräten in mobilen Szenarien”). A service discovery can deliver information about the service address, which includes the machine address and, additionally, the machine name, port, or a textual service description. Zeroconf technologies, such as Avahi1 or Bonjour2, allow devices to connect to a local network automatically by sending multicast messages (e.g., mDNS or DNS-SD) to other machines in the local network. For example, a machine can advertise Secure File Transfer Protocol (SFTP) services to other machines, which can discover and browse the advertisement with specific client tools. The SFTP services are then utilized by sending messages, including identity credentials, to the advertised address. The study also presented projects, such as Webinos and Amigo, which use service discovery protocols to establish a flexible communication between local network devices; these projects are not discussed further in this chapter.
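The multicast-based browsing of advertised services can be sketched as follows. The example builds a DNS question for PTR records of a service type and sends it to the mDNS multicast group as a one-shot query from an ordinary UDP socket. The function names are hypothetical, and the service type `_sftp-ssh._tcp.local` is an assumption about how an SFTP service would be advertised; the sketch only collects raw replies and does not decode them.

```python
import socket
import struct

MDNS_GROUP = "224.0.0.251"   # IPv4 multicast address reserved for mDNS
MDNS_PORT = 5353             # UDP port used by mDNS


def encode_name(name: str) -> bytes:
    """Encode a domain name as length-prefixed DNS labels."""
    out = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        out += struct.pack("B", len(raw)) + raw
    return out + b"\x00"


def build_ptr_query(service_type: str) -> bytes:
    """Build a DNS question asking for PTR records of a service type."""
    # Header: ID 0, flags 0 (standard query), 1 question, no other records.
    header = struct.pack("!6H", 0, 0, 1, 0, 0, 0)
    # Question: name, QTYPE 12 (PTR), QCLASS 1 (IN).
    question = encode_name(service_type) + struct.pack("!2H", 12, 1)
    return header + question


def browse(service_type: str, timeout: float = 2.0) -> list:
    """Send the query to the mDNS group and collect (sender, raw reply) pairs."""
    replies = []
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_ptr_query(service_type), (MDNS_GROUP, MDNS_PORT))
        try:
            while True:
                data, addr = sock.recvfrom(9000)
                replies.append((addr[0], data))
        except socket.timeout:
            pass
    return replies
```

A call such as `browse("_sftp-ssh._tcp.local")` would return the raw answers of responding machines; client tools such as those shipped with Avahi or Bonjour additionally resolve the answers into host name, port, and textual description.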

4.1 Personal Cloud Products

This section introduces two providers that offer NAS devices for the personal cloud. In the following, it is explained which technologies are used to discover devices. Furthermore, technologies that facilitate the access to data are presented, and general device features and the handling of shared data are described.

1 Avahi facilitates service discovery on a local network via mDNS/DNS-SD protocols.
2 Bonjour is a Zeroconf implementation by Apple and supports mDNS/DNS-SD to discover services.

The Iomega Personal Cloud3 is a technology that provides access to Iomega storage and management devices over the local network or the Internet. Iomega provides several storage devices (NAS devices) for the personal home network or businesses with remote access to data. The NAS devices are “VMware Ready” certified for Network File System (NFS) and Common Internet File System (CIFS) and can be utilized by VMware users to consolidate and virtualize storage resources [Iom09a].

Iomega NAS devices support two ways of discovery [Iom09b]:

• SendTarget: SendTarget is the default discovery method and requires an initiator that knows the IP address and port number of the target (NAS device). Once the device has been found, an Internet Small Computer System Interface (iSCSI)4 initiator can log on with or without authentication.
• Internet Storage Name Service (iSNS): iSNS is a protocol to automate the discovery, management, and configuration of iSCSI and Fibre Channel devices over TCP/IP. To launch the iSNS device discovery, users have to activate the local iSNS server, or specify the host name or IP address of an external iSNS server through the Iomega Web interface [Iom11].

Iomega offers the open source code of the NAS Cloud Edition as management software. This software consists of Debian packages, which include, inter alia, mDNSResponder, hotplug (see also Section 4.3), dbus, libsoap, bluez5, and sobexsrv6.

The most common features for Iomega Personal Cloud devices are:

• Bluetooth upload, UPnP-AV7 media server (StorCenter ix2-200 and ix4-200d)
• upload and download of files (via client software)
• local access to own/shared files (depends on user account) via web browser (login/password)
• remote access to own/shared files (depends on user account) via web browser (user has to send invitation to herself or to friends)
• iSCSI (Internet Small Computer System Interface) for storage consolidation
• backup and restore (RAID technology)
• backup between network devices in the personal cloud
• USB storage extension
• print server

3 http://www.iomegacloud.com/landing_page.php
4 The iSCSI protocol was developed to transport SCSI packets over TCP/IP to provide an interoperable solution for the communication. SCSI is a family of protocols for endpoint communication with I/O devices, especially storage devices. The initiator requests services from targets (e.g., logical units of a server) [ISC04].
5 Bluez is the Bluetooth stack for Linux.
6 sobexsrv is a full Bluetooth OBEX server with Bluetooth security support. It implements OPUSH (put), OPULL (get) and OBEX-FTP (setpath + directory listing).
7 See Appendix A.

The following options are provided to share data:

• additional user accounts: can access own/shared files
• individual folders for guests
• select data for sharing or specific users
• sharing of folders and files (similar to FTP) via invitations

Akitio8 provides several Gigabit NAS devices and media servers to facilitate the establishment of a personal cloud. When the NAS is initially connected to the local network, there are two ways to discover it from another network device. The first is the discover tool (Windows, Mac OS), which sends a socket message on port 8100 to all connected network devices (the source code9 of the discover tool reveals further details). The NAS device interprets the message and sends a proprietary response to the requesting client. The discover tool then shows the host name, IP address, MAC address and firmware version of the NAS. The second possibility is to use the Safari web browser with Bonjour bookmarks. Bonjour interprets the multicast advertisements sent by the NAS and shows a link with the NAS name, which leads to the web configuration interface.

The most common features for Akitio NAS devices are:

• file access via Server Message Block (SMB)10, WebDAV and FTP
• media files can be viewed via UPnP-AV and DLNA clients (local network)
• local/remote access and file browsing (file browser with built-in media player for web browser or iPhone/iPad)
• USB 2.0 host port for USB drives (FAT32 read/write, NTFS read only)
• print server (USB 1.1 host port for USB printer)

File access via SMB, WebDAV and FTP requires specific system tools that can handle these protocols. When an SMB share is mounted locally, it also provides storage properties (e.g., used and available space).
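How the storage properties of a locally mounted share could be read is shown in the following minimal sketch. It relies on `shutil.disk_usage` from the Python standard library; the function name `storage_properties` and the example mount point are assumptions, and whether a mounted SMB share reports meaningful values depends on the server and the mount helper.

```python
import shutil


def storage_properties(mount_point: str) -> dict:
    """Return total, used and available bytes of the file system at mount_point."""
    usage = shutil.disk_usage(mount_point)
    # 'free' is the space available to the calling (unprivileged) user.
    return {"total": usage.total, "used": usage.used, "available": usage.free}
```

For a share mounted at a hypothetical path such as /mnt/nas, `storage_properties("/mnt/nas")` would deliver the values that a resource description could include.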

The following options are provided to share data:

• additional user account: these users can access files in the public directory or album section
• guest account: local users connect via SMB, remote users connect via FTP (public directory/album section)
• send link of an image (plugin feature) to friends, who can view it in their web browser

8 http://www.akitio.com/network-attached-storage
9 http://www.akitio.com/software/nettool-nt2lan
10 SMB is a protocol that facilitates the sharing of files, printers, serial ports, and communications abstractions (e.g., named pipes or mail slots) between computers [Sha02].

4.2 Research Projects

The Anzere system [ANZ] provides management solutions for data in the personal cloud to deliver privacy and control as well as resilience to provider failures. Anzere replicates the personal data of a single user across multiple physical and virtual devices. Replication is based on a device- and content-neutral policy model that consists of a set of rules, filters, and constraints to control user data [RYJ+11]. Policies are expressed in Constraint Logic Programming (CLP). Besides a replication subsystem (consistency and partial replication) and a CLP solver (knowledge base and reasoning), an overlay network is responsible for collecting information about each node in the personal cloud (see Figure 4.1). The overlay network is built on Rhizoma11, a constraint-based runtime system that can allocate and release virtual machine resources in a self-managing fashion. Rhizoma uses the CoMon monitoring system [PP06], which provides status information about individual nodes (e.g., node reachability, load, hardware specifications, free CPU, CPU speed, DNS failure rate).

Figure 4.1: The Anzere system architecture with overlay sensors [RYJ+11].

Sensors run on every node. Storage sensors monitor policies and item metadata. Overlay sensors detect node failures and collect device information, such as device type and status, the number and type of available network interfaces, and the latency and bandwidth between any two devices. The monitoring information is divided into device, node and link information, which is briefly presented in the following:

• device(devid,location,type,cost,processor,mem,disk)
• olnode(hostname,cpuspeed,freecpu,fiveminload,mem,freemem,gbfree)
• ollink(src,dst,link-type,latency,bandwidth)

11 http://www.systems.ethz.ch/research/projects/rhizoma

The research projects in the field of community clouds concentrate predominantly on the distributed architecture of community networks, which are characterized by heterogeneous devices, different demands and conditions for quality of service, and additional security issues. Most of the challenges concern the modeling of the community infrastructure; the publications therefore discuss the management of the distributed member nodes as well as identity and authorization mechanisms that facilitate secure access to member resources ([MB09], [BS10], [BMA10]). The following research projects in the field of community clouds were examined with regard to hardware and resource discovery and the involvement of SLAs; the descriptions also contain information about their architecture and resource management.

CloudCDI [WN10] is a platform that enables members of a community to register, upload, update and share data of arbitrary formats in the Cloud. Its authors developed a unified data model called Resource Net Model, in which resource items can represent any data item uploaded by members. Data items, in turn, can represent structured data (e.g., tables), semi-structured data (e.g., XML documents), or unstructured data (e.g., images). The platform allows users to register for one or more groups (1), which share all resource items uploaded by group members. A group resource view (2) is composed of all these resource items. Each member has their own resource view (2) and can only edit their own resources (3). To merge changes of resources into the group resource view, members have to check in their modifications. The platform provides a scalable query model, which is based on the Resource Net Model and facilitates the discovery of resources of other community members by means of a simple keyword search or a complex structural query (5), and automatic query recommendations (4). CloudCDI uses Hadoop12 as task coordinator and network communication layer. Hadoop is a highly scalable distributed system for data storage that provides, inter alia, status information about physical and virtual memory13, CPU, files, and the file system. It accesses application data through the Hadoop Distributed File System (HDFS).

CloudCDI data exchange and share layer provides the following managers:

• (1): User and resource registration manager
• (2): Resource view manager
• (3): Update exchange module
• (4): Query constructor
• (5): Query engine

J.P. Barraca, one of the authors of the paper “User Centric Community Clouds” [BMA10], is involved in the development of a community mesh network that implements service discovery mechanisms for communities. The architecture of this network contains the module WIP-SD (Wireless Service Discovery) [BFSR08], which provides service location mechanisms to find services and communities in local and remote wireless networks.

12 http://hadoop.apache.org/
13 http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/util/LinuxResourceCalculatorPlugin.html

Figure 4.2: WIP-SD as module for the community management in [BFSR08].

WIP-SD is based on mSLP (mesh-enhanced Service Location Protocol) [ZS03], which is an extension of the standard SLP (more information about SLP can be found in Appendix A). mSLP is used in a dynamic architecture with directories (directories are not mobile) that share information and a query-based service discovery. Four further functional blocks were added to overcome the shortcomings of mSLP and to properly support communities:

• Mobility support: advertisement of communities and services based on multicast advertisements.
• Community support: communities can be discovered and browsed, and are used to discover other services.
• Message propagation control: improves the protocol efficiency and scalability by a single hop limit14 and a traffic limit.
• Caches: reduction of protocol traffic by obtaining information from overheard queries and advertisements.

14 For example, a single hop limit for multicast messages in wireless broadcast networks, which allows messages to be sent to several nodes without any additional traffic.

Event mechanisms in WIP-SD are based on directories that make discoverable communities and services available. Users can query the directories to get information about communities and their services. However, when services are unavailable for a certain time, frequent polling of the directories should be avoided. Thus, users can subscribe to a service and are informed about the services to which they subscribed as soon as these services become available. They also get information about services in distant locations. Figure 4.2 shows how the WIP-SD module is integrated in the overall architecture. One component for the communication is the Network Abstraction Layer, which is responsible for the communication between users, services, communities and the network stack. This component was not regarded further. The management of distributed information on communities is based on a Distributed Hash Table (DHT) that stores and delivers objects, and provides redundancy and caching mechanisms. Finally, the publication does not deal with hardware and resource discovery, or service level agreements.
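The difference between polling a directory and subscribing to a service can be illustrated with a minimal sketch. The class `Directory` and its methods are hypothetical and kept in memory; they do not reproduce the WIP-SD protocol messages, which are not described here.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional


class Directory:
    """Hypothetical in-memory directory of discoverable services."""

    def __init__(self) -> None:
        self._services: Dict[str, str] = {}                       # name -> address
        self._subscribers: Dict[str, List[Callable]] = defaultdict(list)

    def lookup(self, name: str) -> Optional[str]:
        """One-off query; returns None while the service is unavailable."""
        return self._services.get(name)

    def subscribe(self, name: str, callback: Callable[[str, str], None]) -> None:
        """Register interest in a service instead of polling lookup()."""
        self._subscribers[name].append(callback)
        if name in self._services:            # already available: notify at once
            callback(name, self._services[name])

    def advertise(self, name: str, address: str) -> None:
        """Make a service available and notify everyone who subscribed to it."""
        self._services[name] = address
        for callback in self._subscribers[name]:
            callback(name, address)
```

A user who subscribes to an unavailable service causes no further traffic until the service is advertised, whereas repeated calls to `lookup` would correspond to the polling that should be avoided.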

Figure 4.3: Social Cloud Architecture [CCRB10].

An approach that combines cloud computing with social networks is presented in [CCRB10]. This paper describes an architecture for a social cloud, in which users provide resources as a service to friends (with a focus on storage services). Friends within a community are able to provide and consume resources through a Facebook application (social cloud application) that represents the hosting environment for a generic service marketplace. Services are included in the marketplace by advertising their capacity, availability and pricing, and can be discovered by the Facebook application. The advertisement is an XML document that contains the advertised capacity of any resource a user wants to share and is stored and updated in the Globus Monitoring and Discovery System (MDS)15 (see Figure 4.3). Globus is further described in Section 4.3. The implementation details illustrate that the application requests a service, or more precisely a storage service, and gets a response in the form of JSON. If a user selects a cloud storage service, the application creates an SLA containing specific pricing and utilization information and passes it to the service for acceptance. In the SLA generation process, the required service levels are transformed into an agreement using the SLA creation component of SORMA16 (Self-Organizing ICT Resource Management). The SORMA market is a decentralized Open Grid Market that facilitates the easy integration of computing resources for heterogeneous local resource managers. The included contract management converts bids and offers to SLAs and initiates the SLA enforcement, which is responsible for the surveillance of contracts [Neu09]. After receiving the SLA, the storage service checks the local policy and the current resource capacity. If the agreement is accepted, the service instantiates the requested state by creating a representative resource and an associated working directory for each storage instance. The service monitors the negotiated service level and has interfaces to list storage contents, retrieve the amount of storage (used/available), and upload, download, preview and delete files. Furthermore, a banking service transfers credits between users and stores the agreements they participate in as well as information about current reservations. Such a credit-based system rewards users for contributing resources and charges users when they consume resources.
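The interplay of advertisement, agreement check, and credit transfer can be sketched as follows. The XML element names, the class `StorageService`, and the plain dictionary that stands in for the banking service are assumptions for illustration; the actual advertisement schema and the SORMA SLA format are not specified here.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


def build_advertisement(owner: str, capacity_mb: int, price_per_mb: float) -> str:
    """Serialize a storage offer as XML (element names are illustrative)."""
    root = ET.Element("advertisement", owner=owner)
    ET.SubElement(root, "capacity", unit="MB").text = str(capacity_mb)
    ET.SubElement(root, "price", unit="credits/MB").text = str(price_per_mb)
    return ET.tostring(root, encoding="unicode")


@dataclass
class StorageService:
    owner: str
    free_mb: int
    price_per_mb: float

    def accept(self, consumer: str, requested_mb: int, balances: dict) -> bool:
        """Check capacity and credits; on success reserve space and move credits."""
        cost = requested_mb * self.price_per_mb
        if requested_mb > self.free_mb or balances.get(consumer, 0) < cost:
            return False
        self.free_mb -= requested_mb
        balances[consumer] -= cost
        balances[self.owner] = balances.get(self.owner, 0) + cost
        return True
```

The check in `accept` corresponds to the comparison of the requested service level with the current resource capacity; a local policy check would be added at the same place.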

15 http://www.globus.org/toolkit/mds/
16 http://sorma-project.org/

In their paper “Volunteer Computing and Desktop Cloud: the Cloud@Home Paradigm” [CDPS09], Salvatore Distefano et al. define the computing paradigm Cloud@Home. This paradigm describes the sharing of resources between users within a community with similar goals. The user hosts are active participants in a resource sharing process by providing locally existing resources to the community. By grouping the available resources of single users, a resource capability can be achieved that is similar to or higher than that offered by commercial cloud providers. Cloud@Home can be combined with commercial clouds to establish a marketplace where users can buy and sell their resources or services. The Cloud@Home architecture (Figure 4.4) includes mechanisms to reserve physical execution and/or storage resources that are utilized by execution and/or storage services. A hypervisor allocates and runs virtual machines, which are necessary to implement an execution service. The storage service makes use of the Cloud@Home file system (C@H FS) to dedicate a portion of the local storage space to the Cloud. Because the capacity of contributed storage resources varies, the file system can split data and files into chunks of fixed or variable size. For this purpose, directory servers, called storage masters, are responsible for indexing the distributed data stored in the associated chunk providers of multiple contributing hosts. The resource subsystem contains the execution Cloud and the storage Cloud, which gather all local and distributed management functionalities. The management subsystem provides a unique view on the virtual resources and offers execution and/or storage services.

Figure 4.4: Basic architecture of the Cloud@Home system [CDPS10].
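The splitting of files into chunks and their indexing by a storage master, as described above for the Cloud@Home file system, can be illustrated with a minimal sketch. The class `StorageMaster`, the round-robin placement, and the in-memory stores are assumptions; the publications do not prescribe this placement strategy.

```python
from typing import Dict, List, Tuple


def split_fixed(data: bytes, chunk_size: int) -> List[bytes]:
    """Split data into chunks of chunk_size bytes (the last one may be shorter)."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


class StorageMaster:
    """Hypothetical index: which chunk provider holds which chunk of a file."""

    def __init__(self, providers: List[str]) -> None:
        self.providers = providers
        self.index: Dict[str, List[Tuple[int, str]]] = {}
        self.stores: Dict[str, Dict[Tuple[str, int], bytes]] = {p: {} for p in providers}

    def put(self, name: str, data: bytes, chunk_size: int) -> None:
        """Distribute the chunks of a file and remember their placement."""
        placement = []
        for number, chunk in enumerate(split_fixed(data, chunk_size)):
            provider = self.providers[number % len(self.providers)]  # round robin
            self.stores[provider][(name, number)] = chunk
            placement.append((number, provider))
        self.index[name] = placement

    def get(self, name: str) -> bytes:
        """Reassemble a file from the chunk providers listed in the index."""
        return b"".join(self.stores[p][(name, n)] for n, p in self.index[name])
```

Chunks of variable size could be supported by replacing `split_fixed` with a function that adapts the chunk length to the space each provider contributes.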

Furthermore, Cloud@Home contains tools that translate user requirements into physical resource demands, which are passed to a resource provider. Requesting resources also means checking their reliability and availability. If a matching resource is found, the SLA engine negotiates a specific QoS. The fulfillment of the negotiated QoS is observed by monitors [CDPS10]. Some of the participating resources have monitoring tools that can be queried from a remote node, so that the SLA engine can process information from the monitored resources. Furthermore, mobile agents can monitor resources by moving between the virtual resources to collect the respective performance values. A context service provides context information to applications that are executed in the Cloud. Context information can include the nature of the device, the quality and availability of wireless connections, the power of a VM and the physical location [AAC+10]. Figure 4.4 shows that XMPP is used to exchange resource information between contributing nodes, but this is not further described in the publications. Technologies for mobile agents and the context service were not further described either.

The Cloud4Home project [KGS11] addresses data services (with a focus on storage, access, and manipulation of data) that leverage devices in home and office environments and also include resources offered by remote clouds. If a low-end device accesses data of another device in order to process it, then it should be possible to perform the processing on other devices that are better suited to executing such services. The approach is based on a virtual object storage system called VStore++. VStore++ is a set of services that implement methods for data storage and manipulation. It implements a dynamic overlay layer to facilitate the cooperation of home devices and active resource management, whereby a distributed key-value store enables the access to data and the monitoring of dynamic resources. In this way, the VStore++ system handles data transparently and tracks the resource availability of home devices. The metadata and resource management layer provides the following features:

• lookups for available data objects and services in the distributed key-value store.
• routing of requests to an appropriate location.
• tracking of information about the availability of resources in the home environment and remote clouds.

The key-value store contains object locations and metadata (e.g., tags and access information), and is based on a Distributed Hash Table (DHT). The authors mention that the decision on where to execute services and store data may be guided by resource availability and constraints, such as response-time SLAs and data privacy requirements. Furthermore, SLAs exist for different types of nodes.

The implementation of VStore++ is based on the Xen hypervisor17. Virtual machines and remote nodes communicate through command-based interfaces that use TCP/IP sockets. The data transfer between the host and the guest VM is realized by XenSocket, a socket-based module for high-throughput interdomain communication on the same system. A DHT-based key-value store is built on top of the Chimera18 peer-to-peer system. The Cloud4Home prototype enhances Chimera with capabilities for dynamic overlay reconfiguration, caching, and replication. Chimera also constitutes a logical tree view of other nodes in the overlay network (see Figure 4.5). A node can register a list of services with the key-value store. The service name concatenated with the service ID is used as key, and a list of nodes supporting the service plus a service policy is used as value. Furthermore, the resource monitoring is realized by the Linux libgtop library (more information about libgtop can be found in Section 4.3) and integrated in a resource monitoring utility module that was not further explained. Each node updates its resource information in the key-value store with the node ID (key) and a serialized resource information structure (value) produced by the monitoring module. Updates of resource information take place after a configurable time period. VStore++ calls the Chimera function GetDecision() (see Figure 4.5) to get a list of nodes and queries the key-value store for each node’s resource information. Finally, all received resource information is analyzed to find the most suitable node for a service request.

17 http://xen.org/products/xenhyp.html
18 Chimera is a light-weight C implementation of a structured overlay supporting similar functionality as the prefix-routing protocols Tapestry and Pastry.

Figure 4.5: Resource Monitoring in Cloud4Home [KGS11].
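The use of the key-value store for service registration and resource monitoring can be illustrated with a minimal sketch. A plain dictionary stands in for the DHT, JSON stands in for the serialized resource information structure, and the selection rule (most free CPU among nodes with enough memory) is an assumption; the publication does not state which criteria the prototype applies.

```python
import json
from typing import Dict, List, Optional

store: Dict[str, str] = {}   # stands in for the DHT-based key-value store


def register_service(name: str, service_id: str, nodes: List[str], policy: str) -> None:
    """Key: service name concatenated with service ID; value: nodes plus policy."""
    store[name + service_id] = json.dumps({"nodes": nodes, "policy": policy})


def update_resources(node_id: str, free_cpu: float, free_mem_mb: int) -> None:
    """Key: node ID; value: serialized resource information."""
    store[node_id] = json.dumps({"free_cpu": free_cpu, "free_mem_mb": free_mem_mb})


def choose_node(name: str, service_id: str, min_mem_mb: int) -> Optional[str]:
    """Pick the node with the most free CPU among those with enough memory."""
    entry = json.loads(store[name + service_id])
    candidates = []
    for node_id in entry["nodes"]:
        if node_id not in store:          # no resource report yet
            continue
        info = json.loads(store[node_id])
        if info["free_mem_mb"] >= min_mem_mb:
            candidates.append((info["free_cpu"], node_id))
    return max(candidates)[1] if candidates else None
```

In the prototype, the periodic call of `update_resources` would be performed by the monitoring module of each node after the configurable time period.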

The experimental evaluation shows that dynamic request routing, which finds a matching node for a processing task, results in performance gains. While small files can be processed on the requesting low-end device because its processing power is sufficient, the total computation time for larger files is smaller if the data is transferred to a device with higher processing power.

The project Clouds@Home19 aims to find methods to guarantee the availability of computational and storage resources for groups of volatile Internet resources. The authors present an optimization approach to reduce hardware costs for large-scale Web Services by distributing the workload over dedicated (highly available) and non-dedicated (can become unavailable at any time) hosts [AKA10]. For example, a personal storage application, such as Dropbox20, could utilize non-dedicated resources close to its users to process and store data. When specific tasks are passed to non-dedicated hosts, the provisioning of resources becomes critical if hosts are temporarily unavailable. Thus, a small set of dedicated resources is required to prevent a complete service breakdown in the worst case and to supply enough bandwidth for the service utilization. An operational model is presented to guarantee long-term availability of resources based on prediction and ranking approaches. Initially, the availability of non-dedicated hosts is monitored over a period of several weeks (10,000 hosts because of the large-scale scenario). The results are ranked and affect the composition of a working set of dedicated and non-dedicated hosts. The selection of the working set is repeated periodically to replace hosts and migrate data based on prediction values. The results show that if a service provider requests 50 hosts to fulfill a service provision, an allocation of 25 dedicated hosts out of 55 given minimizes the total costs (migration rate not considered) at 3.60 USD/hour. The migration rate can be reduced by choosing more dedicated hosts. Finally, an allocation of 44 dedicated hosts out of 52 given results in a minimization of the migration rate.

19 http://clouds.gforge.inria.fr/pmwiki.php?n=Main.HomePage
20 http://www.dropbox.com/

They also developed a probabilistic model [AKS10] to find matching Spot Instances21 based on user SLA constraints, such as availability, performance or job completion. With this model, users can obtain the bids that meet given SLA constraints. However, the discovery of hardware devices or of information about hosts is not mentioned in their scientific publications.

Antonio Celesti et al. [CTVP10] present a horizontal cloud federation approach, with which small, medium and large providers are able to enlarge their capabilities by on-demand resource provisioning. Although this approach covers the resource discovery between clouds, it presents techniques that can be applied to personal clouds. The concept considers the federation of home clouds and foreign clouds. Both own virtual resources, but only the home cloud can compensate for a resource deficit by renting virtual resources from foreign clouds. A discovery agent is responsible for discovering the free resources of all available clouds. The agent is implemented in a distributed fashion and follows the publish-and-subscribe (pub-sub) pattern. This means that information about services is published to a centralized location and authorized for consuming clouds, which act as subscribers. To automatically discover a foreign cloud that provides the requested resources, the home cloud defines policies that consider functional (e.g., a QoS metric greater than a threshold, resource availability for a particular period of time, etc.) and non-functional (e.g., the trusted identity of providers and services, blacklisted clouds) properties. Information about the resources of foreign clouds also contains functional and non-functional properties, and can be stored by the home cloud in native XML format within a cache. The communication between discovery agents is based on the presence concept, in which agents broadcast information about supported features and resource status to subscribers. The presence concept is implemented with the Extensible Messaging and Presence Protocol (XMPP). To discover information about other clouds, a home cloud sends iq-messages (get) to all subscribed foreign clouds. Each foreign cloud answers either with an error or with an iq-message (result), which contains information about CPU, memory, storage, etc. Finally, a match-making algorithm analyzes this information to identify the clouds with which a federation can be established. The architecture is based on a Virtual Infrastructure Manager, which automates the setup, deployment and management of the virtual environment independently of the underlying Virtual Machine Manager layer, which can include Xen, KVM, or VMware. The publication does not describe how the hardware and resource properties within a cloud are determined.
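The exchange of iq-messages and the subsequent match-making can be sketched with the Python standard library. The payload namespace `urn:example:cloud-resources`, the element names `cpu`, `memory`, and `storage`, and the simple threshold comparison are assumptions; the publication does not specify the payload format or the matching criteria, and a real deployment would use an XMPP library for the transport.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

NS = "urn:example:cloud-resources"   # hypothetical payload namespace


def build_get(home: str, foreign: str, request_id: str) -> str:
    """Build an iq stanza of type 'get' asking a foreign cloud for its resources."""
    iq = ET.Element("iq", {"type": "get", "from": home, "to": foreign, "id": request_id})
    ET.SubElement(iq, "{%s}query" % NS)
    return ET.tostring(iq, encoding="unicode")


def parse_result(stanza: str) -> Dict[str, float]:
    """Extract resource values from an iq 'result'; an 'error' yields no offer."""
    iq = ET.fromstring(stanza)
    if iq.get("type") != "result":
        return {}
    query = iq.find("{%s}query" % NS)
    if query is None:
        return {}
    return {child.tag.split("}")[1]: float(child.text) for child in query}


def match(offers: Dict[str, Dict[str, float]], need: Dict[str, float]) -> List[str]:
    """Return the foreign clouds whose offer covers every requested quantity."""
    return sorted(cloud for cloud, offer in offers.items()
                  if offer and all(offer.get(k, 0) >= v for k, v in need.items()))
```

Non-functional properties, such as a blacklist of clouds, could be applied as an additional filter before `match` is called.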

4.3 Tools

This section gives a more detailed description of tools and libraries (which in turn are based on specific tools) that provide the discovery of resource information. Some of these tools were already mentioned in the previous section. All of these tools can be used on Linux-based systems.

The Globus toolkit22 is an open software toolkit for grid computing that includes software services and libraries for resource monitoring, discovery, and management, as well as security and file management. The information service component of the Globus toolkit is called Monitoring and Discovery System (MDS)23 and provides information about the available resources on the Grid and their status. MDS includes a framework (Aggregator Framework), which provides mechanisms to build services that collect and aggregate data. Data can be collected via a WSRF24 subscription/notification mechanism (Subscription Aggregator Source) or by external executable programs (Execution Aggregator Source). The discovery of resources is based on two distributed monitoring systems: Hawkeye and Ganglia (see below). Information providers gather data from these systems and notify a Globus service, which publishes the resource information. The Index Service provides a query/subscription interface to access this information. Thus, the web-based user interface WebMDS can show all the resource information by querying the Index Service.

21 Spot Instances in Amazon Elastic Compute Cloud (EC2) are resources that are allocated if a user bids a specific price. If the instance price is lower than the bid price, the instance is made available to the user. Thus, the bid process influences the resource availability.
22 http://globus.org/toolkit/

Hawkeye [Haw] is a monitoring and management tool for distributed systems and provides mechanisms to collect, store, and utilize information about a system or a collection of systems. It consists of modules that monitor specific system information, such as free disk space (based on the Unix df program), system memory usage, system load (based on the Unix uptime and w programs), etc. Hawkeye has to be configured to run the modules and can be started as a process via the command line. The documentation is restricted to the most important information (“Documentation - Better documentation for Hawkeye is being developed.”) and does not reveal how Hawkeye can be integrated into a programming environment (see Globus toolkit above) to utilize the system information and monitoring functionality.

Ganglia [Gan] is a scalable distributed monitoring system (modules are written in C and Python) for high-performance computing systems (e.g., clusters or grids). The Ganglia monitoring system is divided into three main parts: gmond, gmetad and ganglia-web. gmond (Ganglia Monitoring Daemon) is a daemon that must be installed on every system of a cluster whose resource information is to be monitored. It can send and receive information, which allows it to announce relevant changes or to listen to information from other systems. Discoverable resource information includes machine type, CPU properties, memory properties, disk space (free/total), etc. gmond transmits information (unicast or multicast transport) either in external data representation (XDR25) as UDP messages or as XML via a TCP connection. gmetad (Ganglia Meta Daemon) periodically polls the gmond daemons of other nodes to collect distributed system information, saves the information in a database (RRD26) and transfers all information as an XML file over a TCP socket to clients. ganglia-web is a PHP web frontend that shows information via real-time dynamic web pages. It receives an XML tree from gmetad containing all system information from the nodes and has to be installed on the same machine as gmetad to access the RRD files. The documentation does not reveal how Ganglia can be integrated into a programming environment (see Globus toolkit above) to utilize the system information and monitoring functionality.

A GNU/Linux program to discover device information is udev. Udev is built on hotplug27 and maintains a dynamic device directory that only contains the devices actually present or connected to the system. If a device is added to the system, udev is notified to read the device information. To do so, it reads the sysfs directory of the added device and writes the device attributes (e.g., label, serial number or bus device number) to a database (it maintains device node files located in the /dev/... directory) that contains all devices of the system. If a device is removed, udev receives the notification and instructs the database to delete the device. Udev is executed entirely in user space. Programmers can use the libudev library28 (C programming language) directly or pyudev29 (Python programming language), a Python binding to libudev, to discover device information.

23 http://www.globus.org/toolkit/docs/4.0/info/key-index.html
24 Web Services Resource Framework (WSRF) is a generic and open framework for modeling and accessing stateful resources using Web services. WSRF v1.2 is an OASIS standard.
25 External Data Representation
26 Round-Robin Database
27 hotplug is a GNU/Linux program that notifies user mode software when hardware-related events occur (e.g., when a USB device is plugged in). It is also used to load and set up device drivers and packages, and specific kernel modules [UKA].

Libgtop30 is a platform-independent library (functions and retrieved information should not depend on a specific operating system) that provides specific data of a system, such as CPU, memory and information about running processes. It also delivers a mount list that contains the name, mount directory and type of a device. Libgtop is used by gnome-system-monitor and the system monitor applet. Similar functionality is provided by the Python package python-gtop31. This module allows the use of the Gtop system information library in Python applications.

Table 4.1 summarizes the most important information about the tools presented above. The table also contains the library Libgtop to demonstrate the differences.

For monitoring a single machine, Table 4.1 shows that udev, together with libgtop, can provide functionality similar to that of the monitoring tools. Such a combination, however, does not deliver storage information. The Linux program df32 (a GNU version of df) reports the amount of disk space available on the file system and would close this gap.
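The storage figures that df reports can also be obtained programmatically. The following is a minimal sketch, assuming Python and its standard library function shutil.disk_usage; the function name storage_info and the dictionary keys are illustrative choices and do not come from the thesis.

```python
import shutil


def storage_info(mount_point: str) -> dict:
    """Return total, used, and free bytes of the file system at mount_point.

    The key names are hypothetical; they merely mirror the kind of
    information df prints for a mounted file system.
    """
    usage = shutil.disk_usage(mount_point)
    return {
        "mountdir": mount_point,
        "total": usage.total,
        "used": usage.used,
        "free": usage.free,
    }


if __name__ == "__main__":
    # Query the file system that contains the current directory.
    print(storage_info("."))
```

Because the call works on any mounted path, the same function could in principle be applied to a remote resource once it has been mounted.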

28 http://www.signal11.us/oss/udev/
29 http://pyudev.readthedocs.org/en/v0.14/index.html
30 http://developer.gnome.org/libgtop/stable/
31 http://packages.debian.org/de/sid/python-gtop
32 http://linux.die.net/man/1/df

| Supported Features | Udev | Libgtop | Hawkeye | Ganglia |
|---|---|---|---|---|
| Autodetection of plug-in hardware (e.g. USB) | yes | no | no | no |
| Device information | yes (devices in /dev; vendor, serialId, ..) | partially (just device name, type, mountdir) | not mentioned | machine type |
| CPU information | no | yes (total, user, nice, ...) | reports the busiest processes on the system | yes |
| Memory information | no | yes (total, used, free, ...) | monitors system memory usage | yes (free/total/shared/cached) |
| Storage information | no | no | monitors free disk space | disk space (free/total) |
| Network Monitoring | no | no | yes | yes |
| Integration | C/Python libraries (contains local monitor) | C library | not mentioned | not mentioned |

Table 4.1: Features of tools and libraries that retrieve hardware and deliver resource information.

4.4 Summary and Discussion

This chapter presented an overview of products in the field of personal clouds and of research projects (in the fields of personal, community, social, and home clouds) that investigate the sharing of resources or data. Not all of the research projects specify in detail how hardware and resource information of users can be discovered. Nevertheless, they reveal how they utilize this information and can thus deliver useful insights for the conceptual part of this thesis. Table 4.2 summarizes the most important technological features of the products and research projects mentioned in this chapter.

The two products Iomega and Akitio find devices via protocols such as Bonjour and UPnP. Both can extend their storage capacity through a USB extension, but the product descriptions do not reveal how such a storage extension is implemented. Devices, as well as storage hardware (iSCSI devices), can be discovered via iSNS, and resource information can be obtained by mounting the storage resources.

The personal cloud system Anzere [ANZ] is mainly a policy-based replication system for personal data that is accessible from multiple devices. To replicate data, the system has to know the resource information of each node (a specific device in the personal cloud), which is collected by an overlay network. Each node needs overlay sensors that discover device information, such as device type and status, CPU, memory and storage information.

CLOUDCDI [WN10] utilizes resource information to construct a Resource Net Model that contains resource and data information of community members. The individual resource information can be merged into a group view that facilitates discovery and sharing with other members. This approach uses Hadoop as network manager to retrieve data files. A possible application would be to extend the model with resource properties, which can be provided by Hadoop. The general disadvantage, however, is the dependency on Hadoop on every node (see Chapter 3.2 - installation of additional client-side software libraries should be prevented).

The module WIP-SD [BFSR08] is responsible for discovering services and communities based on an extension of mSLP. It is grounded on directories that also facilitate service discovery in distant locations. Services are utilized by querying a directory and/or in a publish-subscribe fashion based on event mechanisms. This approach is not listed in Table 4.2, because the publication does not deal with system resource and hardware discovery.
But it demonstrates how a collaboration based on service advertisements can be established to discover services. If communities advertised their storage capability as a service, other communities could find and potentially utilize it. Iomega can advertise its storage service with iSNS, and FreeNAS33 with Avahi, which would facilitate the discovery of these services.

The presented social cloud architecture [CCRB10] discovers resources, in particular storage resources, by utilizing the Globus toolkit in a marketplace scenario. The resource information, such as capacity, availability and pricing, is requested from the service (client side) and visualized through a Facebook application that can create SLAs with resource demands and passes them to the service to accept the agreement. Then, the service creates storage instances and monitors the negotiated service level. Details about the technologies for device and resource discovery were not mentioned.

Cloud@Home [CDPS09] provides the sharing of resources between users of a community. Contributing hosts have virtual machines, a hypervisor to allocate execution resources, and a chunk provider to manage storage resources stored as data chunks. Both execution and storage resources are monitored and made accessible via services.

Cloud4Home [KGS11] implements a DHT-based key-value store, which provides a logical tree view of other nodes in the personal network. Each node updates its resource information (monitored with libgtop) in the key-value store within a specific time period. A dynamic overlay system can then request an optimal node for specific resource demands and thus migrate data processing jobs to qualified nodes.

The project Clouds@Home [AKA10] presents a model to guarantee long-term availability of resources in a network consisting of dedicated and non-dedicated hosts. It applies prediction and ranking approaches to determine the number of non-dedicated hosts that are needed, besides a specific number of dedicated hosts, to fulfill a complete service provision. Thus, a resource provider should be able to save costs by utilizing specific non-dedicated hosts. Clouds@Home is not listed in Table 4.2, because it reveals too few details about its technical solutions. But it is an interesting approach, which reduces costs by utilizing volatile devices. A personal or business cloud consisting of many volatile devices, such as smartphones, tablets, or laptops, could benefit from this approach. To fulfill specific services in such a network, the replication of data and services needs to be addressed too.

Antonio Celesti et al. [CTVP10] present an interesting approach to find a matching resource provisioning cloud by applying the publish-and-subscribe pattern. A home cloud that has a resource deficit requests resource information (via XMPP) from all foreign clouds that have subscribed for resource provision and calculates which one is the most suitable for resource allocation. This approach concentrates on cloud-to-cloud resource provision, but the idea of transferring resource information via XMPP is also an interesting procedure that could be applied to a personal cloud scenario. A possible solution would be the installation of an XMPP server in the local network that receives resource information from local devices and propagates it to them. This would allow community members to share resource information via their chat clients, which affects their behavior when they transfer files between their devices. A synchronization of directories based on XMPP accounts is also possible and is implemented by tools such as DVCS-Autosync34 or xmppbox35. Because communities usually communicate via chat systems anyway, installing chat software with synchronization features on client devices would not require much additional technical effort. But this does not resolve the disadvantage that additional tools have to monitor the resources (e.g., storage capacity).

33 FreeNAS is an open source storage platform that supports the sharing of storage resources and includes Avahi to advertise several services.

Section 4.3 presented libraries and tools that deliver different information about hardware. The library libgtop includes functions to find information about CPU and memory, and a small amount of information about the devices of a machine. For the discovery of detailed device information, the Linux tool udev provides useful information. There are also libraries to access the udev database (libudev, pyudev). These libraries also provide monitors to subscribe to specific device events. The missing storage information can be provided by the Linux tool df, which collects information about the disks of a machine. Hawkeye and Ganglia are monitoring tools that gather resource information about CPU, memory, or storage and can be used to monitor grids or clusters.

It can be seen that the discovery of devices in the local network is mostly based on services that advertise their resources. Such an approach supports a dynamic network consisting of different devices with individual resources. Thus, the conceptual part of this thesis will consider service advertisements. The discovery of hardware and resource information of the main machine can be achieved with tools such as udev and df, and libraries such as libgtop. It was shown that the storage information of other devices can be discovered by mounting the storage resource (see the usage of SMB or iSCSI). Because the main focus of this thesis is storage discovery and because the installation of additional client-side software libraries should be prevented, this approach will shape the concept. That does not mean that information about CPU and memory is ignored entirely. The system component should be open to the integration of additional sub-components that deliver such information. For example, the monitoring tool Ganglia could be integrated if the project requirements allowed the additional overhead on the network nodes.

34 http://mayrhofer.eu.org/dvcs-autosync
35 http://xmppbox.5sh.net/

| Features | Cloud Type | Main Topic | Device Discovery | Hardware Discovery | Resource Information Discovery | Resource Information Exchange | SLA |
|---|---|---|---|---|---|---|---|
| Iomega | Personal Cloud | Data Sharing | iSNS, UPnP | iSNS, SendTarget (USB Storage Extension) | iSCSI (via mount) | Client Software (Linux Packages) | - |
| Akitio | Personal Cloud | Data Sharing | Bonjour, UPnP | (USB Storage Extension) | SMB (via mount) | Client System Tools, Web Interface | - |
| [ANZ] | Personal Cloud | Policy-based Data Replication | Overlay Sensors (CoMon) | Overlay Sensors (CoMon) | Overlay Sensors (CoMon) | Overlay Network | - |
| [WN10] | Community Cloud | Community Information Management | - | - | (Hadoop for data discovery) | Views/Queries | - |
| [CCRB10] | Social Cloud | Cloud Computing for Social Networks | MDS (via Service Discovery) | - | MDS (Hawkeye, Ganglia) | XML via MDS | yes |
| [CDPS09] | Home Cloud | Volunteer Computing and Desktop Cloud | Context Service | - | Mobile Agents | REST/SOAP, XMPP | yes |
| [KGS11] | Home Cloud | Data Services | via Service Discovery | - | libgtop | key-value store (DHT) | yes |
| [CTVP10] | Home Cloud | Cloud Federation | - | - | - | via XMPP (pub-sub) | yes |

Table 4.2: The most important features of the presented products and projects.

5 System Design

This chapter presents the most important design decisions for the Resource-as-a-Service component. The component has the name FlexiSource, which stands for a flexible device and resource integration in the context of a resource sharing community. Initially, the global process of the resource discovery and the provision for the community is described. The following section explains in detail the concept for the discovery of resources in local networks and short distance connections. Then, the resource management is presented, which includes the description of the resource plan, which is synchronized by device and service events. The task to assign the resources with trust metrics was solved with a resource labeling procedure that is also presented in the chapter about the resource management. Subsequently, the data management is discussed in short, followed by the solution for the API. Finally, the most important design decisions of this chapter are summarized.

5.1 Global Process and Delimitation

Figure 5.1 illustrates the global process of resource provisioning in a personal cloud. It starts on the left side with the discovery of devices and resources, which can belong to the user or to someone else (e.g., a friend’s device). They correspond either to network devices, such as storage servers, or to peripheral devices, such as USB hard disks. The device and resource information is passed to the resource management, which instructs the operating system to take control of the resources (in the following, this mechanism is called mounting or to mount) and determines their resource information. To maintain resource descriptions, the resource management synchronizes device and resource information with a so-called resource plan. The trust metrics are created from information about the individual members of a community and verified according to the Public Key Infrastructure (PKI). A mapping of resources to community members is provided through a resource labeling procedure. The data management is only responsible for fetching a resource label from, or attaching it to, the resource of a mobile device, but it also offers the potential to transfer further data. The resulting system design enables a community to provide, consume, and share resources in a trustworthy fashion.

The following delimitations are applied to the system design:

• the focus lies on storage resources
• for simplification, the discovery (Requirement R-DD-1), provision and modification (Requirement R-DD-3) of datasets are related to the data necessary for the resource labeling

Figure 5.1: The global process of the resource provisioning in the personal cloud.

5.2 Resource Integration

This section presents the approach for discovering devices (e.g., mobile phones or USB storage devices), the resources they provide, and resource information. The requirements analysis (Chapter 3) has shown the dependency on applications that want to receive information about devices and their resources. Thus, FlexiSource has to gather, process, prepare, and provide this information. This procedure is summarized under the term Resource Integration. The foundation of this integration depends on different functional mechanisms to discover devices or resources, and on the different technologies that are utilized for this purpose. To provide functionality that can be extended in a flexible way, the architecture for the Resource Integration is split into functional units that provide different information about devices and resources and utilize different discovery mechanisms.

The underlying functional mechanisms are based on communication protocols and tools that are able to deliver the information mentioned above. Mobile devices can be discovered through several Bluetooth libraries and tools, which can be used to automate the discovery process in a programming environment. Such libraries and tools are able to scan the near-field environment for Bluetooth devices and deliver a list of detected devices and device information. As mentioned at the end of Section 4.4, service discovery protocols should be used to announce specific storage resources of the local network. Bluetooth implements communication protocols to exchange data, such as OBEX FTP and OBEX Object Push, and is able to announce and discover the presence of these protocols on discovered devices. Because these communication protocols facilitate the utilization of storage resources, the Bluetooth data exchange protocols are assigned to the category of storage resources. A further possibility for the discovery of storage resources of the local network is to use service discovery protocols, such as Avahi and Bonjour, mentioned in Chapter 4. These protocols can be used to discover storage resource announcements and deliver the related machine, represented by its IP address, and a port or directory with which a connection can be established. udev notifies about USB events and delivers information about device subtypes, such as disks or partitions. Storage resource information of personal network devices (which can also include personal machines) can be determined by mounting the resource in the file system and analyzing it via the df tool (libgtop).

The next sections describe in more detail the behavior of functional units, the description of functional units, and the discovery of device and resource information.

5.2.1 Functional Units

As mentioned previously, functional units have different mechanisms and technologies to discover devices, storage resources and related information. It is therefore possible that two different functional units deliver similar information. For example, the controller in Figure 5.2 requests addresses for storage resources from the functional units A and B, but one unit provides the addresses of OBEX services and the other one the addresses of NFS services. The reason for this situation could be that functional unit A implements Bluetooth and only finds related services, such as OBEXFTP, whereas functional unit B implements Avahi, which can discover service announcements, such as the advertised location of NFS services. Thus, functional units have interfaces and implement the same call syntax if the delivered information is similar.

Functional units can also implement mechanisms for the data transfer between devices. This is appropriate, for example, when a session is required to transfer data without an authorization procedure for every single data transfer. The mechanism would implement a specific session that is initialized, followed by a sequence of data transfers, and finally terminated.

Functional units implement the following mechanisms:

• discovery of devices and resources by common protocols
• discovery of resource information by common protocols
• transfer of data by common protocols

For the dynamic local network, information about devices and resources is detected automatically based on the mechanisms that functional units implement. Functional units store the current state of devices and resources, which can be requested within a synchronous communication model. Furthermore, if devices or resources join or leave the local network, the corresponding functional unit uses a notification mechanism to send the controller a message that contains information about the changes. This spares the controller from having to poll the units frequently.

Figure 5.2: Notifications of functional units.
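The interplay of stored state, synchronous requests, and notifications can be illustrated with a small sketch. It assumes Python and a plain callback mechanism; the class names FunctionalUnit and Controller, the method names, and the event keys are hypothetical and only approximate the behavior described above, not the actual FlexiSource implementation.

```python
from typing import Callable, Dict, List

Event = Dict[str, str]


class FunctionalUnit:
    """Hypothetical functional unit that stores state and pushes changes."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._devices: set = set()
        self._listeners: List[Callable[[Event], None]] = []

    def notify(self, listener: Callable[[Event], None]) -> None:
        """Register a callback that receives every device event."""
        self._listeners.append(listener)

    def get_devices(self) -> List[str]:
        """Synchronous request: return the currently known devices."""
        return sorted(self._devices)

    def device_joined(self, device: str) -> None:
        """Record a new device and inform all registered listeners."""
        self._devices.add(device)
        self._emit({"plugin": self.name, "event": "add", "device": device})

    def device_left(self, device: str) -> None:
        """Forget a device and inform all registered listeners."""
        self._devices.discard(device)
        self._emit({"plugin": self.name, "event": "delete", "device": device})

    def _emit(self, event: Event) -> None:
        for listener in self._listeners:
            listener(event)


class Controller:
    """Hypothetical controller that collects events instead of polling."""

    def __init__(self) -> None:
        self.received: List[Event] = []

    def on_event(self, event: Event) -> None:
        self.received.append(event)
```

In this sketch the controller registers once via notify and afterwards only reacts to events; get_devices remains available for the occasional synchronous request.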

5.2.2 Functional Unit Description

The integration of the functional units is based on a description file (Figure 5.3) that contains specific information about the respective functional unit. This information is necessary to access the functional units and comprises the following key-value attributes for every unit:

• Plugin: “Plugin” is the short form (or synonym) for a functional unit and describes the name of the functional unit.
• Directory: The value describes the directory where the functional unit can be found and executed.
• Command: The value describes the command with which the functional unit has to be initialized.
• Address: The value describes the location (machine address) of an executed and accessible functional unit.

Each functional unit can have several descriptions of services (e.g. storage services or storage resources) that it can discover.

• Protocol: This describes the protocol with which a discovered service is accessible. The value of the protocol must conform to a global naming convention.
• Parameter: The Parameter describes the call parameter for a specific method of the functional unit.

As one can see in Figure 5.3, the controller has access to the description file. It parses the file, initializes the described functional units and can call methods of the functional units to get device and resource information. The request methods for different functional units can be aggregated because of the common interface structure for units that deliver similar information. The following sections describe the request methods and the response content in more detail.

Figure 5.3: Description of functional units.
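How a controller might read such a description can be sketched as follows. The concrete file format is not given in the text, so the sketch assumes JSON; the key "Services", the example values (directory, command, address, service types), and all function and class names are hypothetical. Only the attribute names Plugin, Directory, Command, Address, Protocol, and Parameter are taken from the description above.

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class ServiceDescription:
    protocol: str   # protocol with which a discovered service is accessible
    parameter: str  # call parameter for a method of the functional unit


@dataclass
class UnitDescription:
    plugin: str
    directory: str
    command: str
    address: str
    services: List[ServiceDescription] = field(default_factory=list)


def parse_descriptions(text: str) -> List[UnitDescription]:
    """Parse a JSON list of functional unit descriptions (assumed format)."""
    units = []
    for entry in json.loads(text):
        services = [
            ServiceDescription(s["Protocol"], s["Parameter"])
            for s in entry.get("Services", [])
        ]
        units.append(
            UnitDescription(
                plugin=entry["Plugin"],
                directory=entry["Directory"],
                command=entry["Command"],
                address=entry["Address"],
                services=services,
            )
        )
    return units


# Purely illustrative description of one functional unit.
EXAMPLE = """
[
  {
    "Plugin": "Avahi Plugin",
    "Directory": "plugins/avahi",
    "Command": "python avahi_plugin.py",
    "Address": "localhost:9001",
    "Services": [
      {"Protocol": "NFS", "Parameter": "_nfs._tcp"},
      {"Protocol": "FTP", "Parameter": "_ftp._tcp"}
    ]
  }
]
"""
```

A missing mandatory attribute raises a KeyError in this sketch, which would let the controller reject an incomplete description before it tries to initialize the unit.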

5.2.3 Remote Device and Resource Discovery

This section explains the concept for the automatic discovery of machines in the local network and of the resources or resource services they provide. In the remainder of this section, a “remote device” is such a machine, which represents a device unit offering, inter alia, resource services.

A functional unit that delivers addresses of remote storage resources (resource discovery) is automatically coupled with the device discovery. The mechanism based on Avahi should scan the local network for services that announce storage resources and collect the services of a specific type. The discovered services automatically contain device information (e.g., name, addresses, or port). Avahi also supports the search for workstations (the Avahi parameter for this is type=_workstation._tcp and could be used, see Table 5.1) and delivers information about the workstation, but without service descriptions. The primary interest lies in provided resources, which means - in the case of Avahi - that devices without service information are not necessarily useful. On the other hand, recognizing all accessible devices - based on the discovery protocol and possibly without service information - can be suitable to cover the overall context of available devices from the perspective of the user. Thus, a separate discovery for devices only is provided too. The request methods (Table 5.1) for a functional unit that implements the Bluetooth protocol are the same, but the internal behavior is different. For example, a mechanism based on the Bluetooth protocol and a specific service type should first scan the environment for accessible devices and, once a device has been found, search for specific services on that device. Thus, functional units only need the service type as parameter to start the discovery of specific resources, and they deliver, besides the service address and further service attributes, the device information, such as the name, address or type of the device.

Table 5.1 describes the request methods and parameters for functional units that implement device and resource discovery mechanisms. The method startDiscovery starts the discovery of services of a specific type and periodically scans the environment in the background, which ensures that device and service information is up-to-date. The method getServices returns all available services of a specific type in a synchronous fashion. The method getDevices delivers information about accessible devices in a synchronous fashion. If services have already been discovered for a device, the service information is contained in the response. The method notify informs the functional unit about a requester who wants to receive events about services (e.g. added, deleted, or modified services). Whenever an event occurs, the requester will be notified. A notification registration can be terminated by the method stopNotification. Finally, the method stopDiscovery halts the discovery of devices, or services of a specific type. The response for the methods startDiscovery and stopDiscovery contains information about the success of the initialized action. The methods getServices and notify deliver information about the requested resource services and their devices.

The request parameters can differ between functional units, because the units generally search for different resource service types.

For the execution, the controller (see Figure 5.2) should initially call the method startDiscovery, which activates the environment scan and maintains an internal database that depends on the functional unit. Thereafter, the methods getServices and getDevices, which access the database, should be executed. The reason for this is that the scanning of the environment is a time-consuming process (Bluetooth does not use notifications), which would result in an efficiency problem if the functional unit scanned the environment each time the method getServices or getDevices is called.

| Request Methods | Request Parameter | Description | Response |
|---|---|---|---|
| startDiscovery | | This request facilitates the discovery of devices (e.g., _workstation._tcp), or services of a specific type (e.g., FTP,NFS,OBEXFTP), and scans the environment in the background. | success information |
| getServices | | This request delivers information about services of a specific type (e.g., FTP,NFS,OBEXFTP) in a synchronous fashion. | resource service and device information |
| getDevices | | This request delivers information about accessible devices in a synchronous fashion. | device information |
| notify | | This request notifies the requester about device and service events (e.g. added, deleted, or modified services). | resource service and device information |
| stopNotification | | This request halts the notification for device and service events. | success information |
| stopDiscovery | | This request halts the discovery of devices, or services of a specific type. | success information |

Table 5.1: Request methods of functional units for the remote device and resource discovery.

The discovery of services for a specific device through an additional method is not listed, because the design is based on notifications - and thus, the overall device and service information are always up-to-date.
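The reasoning for calling startDiscovery first can be made concrete with a sketch in which synchronous requests only read an internal database. It assumes Python; the class CachedDiscoveryUnit, the injected scanner function, the refresh method (standing in for the periodic background scan), and the example service record are hypothetical.

```python
from typing import Callable, Dict, List

Scanner = Callable[[str], List[dict]]

# Records every scan so the cost of scanning becomes visible.
SCAN_LOG: List[str] = []


def fake_scanner(service_type: str) -> List[dict]:
    """Stand-in for a slow environment scan (e.g., a Bluetooth inquiry)."""
    SCAN_LOG.append(service_type)
    return [{"protocol": service_type, "address": "192.0.2.10", "port": 2049}]


class CachedDiscoveryUnit:
    """Hypothetical unit whose getters read a database instead of scanning."""

    def __init__(self, scanner: Scanner) -> None:
        self._scanner = scanner
        self._database: Dict[str, List[dict]] = {}

    def start_discovery(self, service_type: str) -> bool:
        """Run an initial scan and keep the result in the database."""
        self._database[service_type] = self._scanner(service_type)
        return True

    def refresh(self) -> None:
        """Rescan all active types; a real unit would do this periodically."""
        for service_type in list(self._database):
            self._database[service_type] = self._scanner(service_type)

    def get_services(self, service_type: str) -> List[dict]:
        """Synchronous request answered from the database only."""
        return list(self._database.get(service_type, []))

    def stop_discovery(self, service_type: str) -> bool:
        """Stop tracking a type; report whether it was active."""
        return self._database.pop(service_type, None) is not None
```

Repeated calls to get_services leave SCAN_LOG unchanged, which is the efficiency argument made above: the expensive scan runs only on start_discovery and on each refresh.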

The response content (Table 5.2) for the methods getServices, getDevices and notify has to have the same content elements. This approach facilitates the implementation of a single response handler on the side of the controller (see Figure 5.2) that can handle synchronous and event-based responses. While the method notify returns just one device with zero or multiple services, the methods getServices and getDevices return multiple devices (a list of RC) with zero or multiple services. The response elements for the device event and the service event differ in their possible options. These options are necessary to inform the controller about the nature of an event. The option all means that the response contains all devices and related services stored in the internal database of a functional unit. The option add means that a device or service was added (notification). The option delete means that a device or service was deleted (notification). The option modify means that a device or service was modified (notification). Functional units indicate their name by a dedicated response element. All other response elements are described in Table 5.2.

| Response Content (RC) | Description | getServices / getDevices | notify |
|---|---|---|---|
| | Describes the name of the functional unit (e.g., Avahi Plugin or Bluetooth Plugin). | | |
| | Describes the event for a specific device. | options:[all] | options:[add, delete, modify] |
| | Describes the protocol that the functional unit implements. | | |
| | Describes the discovered device name. | | |
| | Contains a list of device addresses (e.g., IPv4, IPv6). | | |
| | Contains all services, for which a specific event occurred. | | |
| | Describes the event for a specific storage service. | options:[all] | options:[add, delete, modify] |
| | Describes the protocol of a specific storage service. | | |
| | Describes the specific address or a directory. | | |
| | Describes the port or channel of the device, through which the service is accessible. | | |

Table 5.2: Response content of functional units for the remote device and resource discovery.
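The idea of a single response handler for synchronous and event-based responses can be sketched as follows. The sketch assumes Python and represents each response content as a dictionary; the key names device_event and device_name are hypothetical stand-ins for the response elements, and service-level events are left out for brevity.

```python
from typing import Dict, List


def handle_response(view: Dict[str, dict], devices: List[dict]) -> None:
    """Apply a list of response contents to the controller's device view.

    getServices/getDevices would pass many entries with the option "all";
    notify would pass a single entry with "add", "delete", or "modify".
    """
    for device in devices:
        name = device["device_name"]
        event = device["device_event"]
        if event == "delete":
            # Forget the device; ignore deletions of unknown devices.
            view.pop(name, None)
        elif event in ("all", "add", "modify"):
            # Insert the device or overwrite the stored description.
            view[name] = device
        else:
            raise ValueError(f"unknown event option: {event}")
```

Because both kinds of response share the same structure, the controller only has to wrap a single notification in a list to reuse the handler.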

5.2.4 Local Device and Resource Discovery

This section explains the concept for the automatic discovery of storage devices of the machine, on which FlexiSource is executed. Such a device discovery retrieves connected USB storage devices and installed hard disks, and the partitions they contain.

The mechanisms for the discovery of local devices are encapsulated in a functional unit. In contrast to the remote device discovery (Section 5.2.3), the information about the local devices is collected from the system (e.g., the Linux kernel manages device nodes, and device information can be requested from udev) and can be provided immediately. The specification of the basic request methods and the response content is described in the following text.

Table 5.3 describes the request methods and parameters for functional units that implement mechanisms to discover storage devices of the main machine. The method getDevices delivers information about accessible devices depending on a specific subsystem. This subsystem can be a USB storage device (USB_BLOCK) or an internal block device (INT_BLOCK). An important conceptual detail is, again, the notification about device events. The method notify informs the requester about added or removed devices depending on the subsystem. The implementation should check whether the device is pluggable and only then accept the notification request. A notification registration can be terminated by the method stopNotification.

| Request Method | Request Parameter | Description | Response |
|---|---|---|---|
| getDevices | | This request delivers information about devices of a specific type (e.g., USB_BLOCK, INT_BLOCK) in a synchronous fashion. | device information |
| notify | | This request notifies the requester about device events (e.g. added or removed). | device information |
| stopNotification | | This request halts the notification for a device. | success information |

Table 5.3: Request methods of functional units for the local device and resource discovery.
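The three request methods of Table 5.3 can be illustrated with a small in-memory stand-in for a functional unit. The class name InMemoryDiscoveryUnit, the helper plug and the rule that only USB_BLOCK counts as pluggable are assumptions made for this sketch; a real unit would obtain its data from udev.

```python
# Hypothetical in-memory stand-in for a local discovery unit (not the prototype).
PLUGGABLE = {"USB_BLOCK"}  # assumption: only USB storage devices are pluggable


class InMemoryDiscoveryUnit:
    def __init__(self):
        self._devices = {"USB_BLOCK": [], "INT_BLOCK": []}
        self._listeners = {}

    def getDevices(self, subsystem):
        """Synchronous request: return all known device nodes of a subsystem."""
        return list(self._devices[subsystem])

    def notify(self, subsystem, callback):
        """Register for device events; rejected for non-pluggable subsystems."""
        if subsystem not in PLUGGABLE:
            return False
        self._listeners[subsystem] = callback
        return True

    def stopNotification(self, subsystem):
        """Terminate a registration; returns whether one existed."""
        return self._listeners.pop(subsystem, None) is not None

    def plug(self, subsystem, node):
        """Simulate a device being attached (helper for this sketch only)."""
        self._devices[subsystem].append(node)
        callback = self._listeners.get(subsystem)
        if callback is not None:
            callback({"event": "added", "device_node": node})


if __name__ == "__main__":
    unit = InMemoryDiscoveryUnit()
    unit.notify("USB_BLOCK", print)
    unit.plug("USB_BLOCK", "/dev/sdb")
```

The pluggability check happens at registration time, so a requester learns immediately whether notifications for a subsystem make sense.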

A functional unit for the local device and storage resource discovery should also deliver resource information (e.g. total, used, or available capacity), but this is not enforced. Delivering it is practical because there is no dependency on other machines of the local network, which allows a smooth combination of device and resource information discovery.

Table 5.4 shows the response content for the methods getDevices and notify. Both should provide the same content elements. The method getDevices returns multiple devices (a list of RC), each consisting of a disk and multiple partitions. The method notify delivers one device. The methods have different options for the response element event to characterize specific device events. The method getDevices provides only the option all, which means that the device is one of multiple discovered devices collected from the internal database. The method notify provides two options: added, which means that the device was added to the system, and removed, which means that the device was removed. The element device_node describes the node name of a storage device. A device is described by product-specific attributes that are contained in the element device_attributes. Finally, the element disk has multiple partitions that contain resource information, such as total and used storage capacity, and the directory where the listed partition is mounted.

Response Content (RC): Description

• Describes the name of the functional unit (e.g., Udev Plugin).
• Describes the event for a specific device.
• Describes the protocol or tool that the functional unit uses.
• Describes the type of the device (e.g., USB_BLOCK, INT_BLOCK).
• Contains the device node provided by the system (e.g., /dev/sda, /dev/sdb).
• Contains specific attributes of a device.
    • Describes the product id of the device.
    • Describes the product manufacturer of the device.
    • Describes the product name of the device.
    • Describes the serial number of the device.
• Contains all partitions of a disk.
    • Contains specific attributes of a partition.
    • Describes the total capacity of the partition.
    • Describes the used capacity of the partition.
    • Describes the available capacity of the partition.
    • Describes the percentage of used capacity of the partition.
    • Describes the directory where the partition is mounted.

Table 5.4: Response content of functional units for the local device and resource discovery.

5.2.5 Resource Information Discovery

In this section, further details about the discovery of resource information are explained. Information about the local storage resources is provided by the functional unit described in Section 5.2.4. But if an event reports a newly plugged-in device that is not mounted, the resource information is not accessible. In this case, the job of mounting the resource is delegated to the resource management. The reason for this is that the functional units should not have the privileges to mount a resource and thus retain their flexible status. What is also missing is the resource information of the other devices of the local network. To obtain this information, these resources must also be mounted, which is made possible by the resource information (e.g. device address and storage directory) delivered by the functional units. Figure 5.4 illustrates the transfer of device and resource information to the resource management. The controller notifies the resource management about device and service events. The notification contains the device and resource information and the event type. If a new device was found that provides storage resources, the resource management has to determine where the resource should be mounted. It creates a directory and attaches the resource (e.g. USB partition, local NFS, mobile phone shared folder) to this mount point. The resource information of a mounted resource can be determined with system tools, such as the df program. More information about the resource management can be found in Chapter 5.3.

Figure 5.4: Information discovery by the resource manager.
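Once a resource is mounted, its capacity information can be read without parsing the output of df. The following minimal sketch uses shutil.disk_usage from the Python standard library; the function name resource_info and the dictionary keys are assumptions chosen to mirror the descriptions in Table 5.4. The percentage is computed against the total size and may therefore differ slightly from the value that df prints.

```python
import shutil


def resource_info(mountdir):
    """Return capacity information for the file system mounted at `mountdir`."""
    usage = shutil.disk_usage(mountdir)  # named tuple: total, used, free (bytes)
    percent = 100.0 * usage.used / usage.total if usage.total else 0.0
    return {
        "mountdir": mountdir,
        "total": usage.total,
        "used": usage.used,
        "available": usage.free,
        "percent_used": round(percent, 1),
    }


if __name__ == "__main__":
    # Inspect the file system that holds the current working directory.
    print(resource_info("."))
```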

5.2.6 Manual Resource Integration

Bluetooth is standard for mobile devices and can be used to discover storage services, such as OBEXFTP. The same is true for technologies that determine the storage devices of a machine, such as udev. Service discovery protocols that use the mDNS/DNS-SD technology are available for some storage services, but not for all, as can be seen for the Iomega personal cloud (see Chapter 4.1). If the machine does not support such discovery protocols, the manual setting of a service address has to be possible. The application programming interface provides methods to connect to a storage service under the precondition that the underlying communication protocol is implemented. Such a method takes as parameters the resource address (e.g. 10.0.0.218:21), the resource type (e.g. a filter for storage resources) and the communication protocol (e.g. FTP). Internally, the method checks whether the specified resource can be integrated and returns a success or error message. Such a method call is implemented as a synchronous communication.
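A synchronous call for the manual resource integration could look roughly like the sketch below. The function name integrate_resource, the set of supported protocols and the shape of the returned dictionary are assumptions for illustration. The sketch only validates the three parameters; the actual check described above would additionally try to mount the resource.

```python
# Hypothetical sketch of a manual integration request (parameter validation only).
SUPPORTED_PROTOCOLS = {"FTP", "SFTP", "NFS"}  # assumption: implemented protocols


def integrate_resource(address, resource_type, protocol):
    """Validate a manually specified resource and return a status message."""
    if resource_type != "storage":
        return {"status": "error", "reason": "unsupported resource type"}
    if protocol.upper() not in SUPPORTED_PROTOCOLS:
        return {"status": "error", "reason": "protocol not implemented"}
    # Split "host:port" at the last colon (IPv6 literals are not handled here).
    host, sep, port = address.rpartition(":")
    if not sep or not host or not port.isdigit() or not 0 < int(port) < 65536:
        return {"status": "error", "reason": "address must be host:port"}
    return {"status": "success", "host": host, "port": int(port),
            "protocol": protocol.upper()}


if __name__ == "__main__":
    print(integrate_resource("10.0.0.218:21", "storage", "FTP"))
```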

5.3 The Resource Management

The management of the discovered or manually specified resources of the personal cloud is one of the most important tasks of FlexiSource. The resource management has to handle the different information from the resources or the resource services. As mentioned in Chapter 5.2.5, some of this information is delivered by the controller, which sends notifications about storage services and devices joining or leaving the personal cloud, as well as resource information of the machine on which FlexiCloud is installed. If a mobile phone is detected that provides a service, such as OBEXFTP, the device and resource information can be stored immediately. But service discovery implementations, such as Avahi, are decoupled from the actual storage service (e.g. SFTP). This means that the accessibility of the service has to be validated. In this case, the resource must be mounted into the system, and if this procedure is successful, the storage service is valid. If an automatic discovery of a specific service is not possible, service information can be sent manually to the resource management, which has to check whether the desired resource is valid. In this case, too, the passed information is correct if the resource can be mounted into the system. After that, the complete resource information is integrated into a resource plan that contains all detected devices and their storage resources.

The following sections reveal details about the concrete device and resource information that are synchronized and maintained within a resource plan, and finally, present the concept for the trust metrics provided by the resource labeling procedure.

5.3.1 Resource Plan

The resource plan provides an overall view of the devices of the personal cloud and the attached storage devices. It contains information about device- and resource-specific properties and their present status, and is maintained as a database depending on the delivered resource information. The synchronization of the resource plan depends on device and service events, the manual resource specification, and the attachment of trust metrics that is described in the following section. By making this resource information accessible to other cloud applications, the resource plan lays the foundation for a resource sharing community. The following text describes the features necessary to meet this requirement.

The main components of the resource plan are machines. A machine is either the main computing device, on which the FlexiSource component is installed, or other computing devices, which provide resource services. Machines have the following elements:

• identifier: To identify a machine for an application request, each machine has a unique identifier.
• name: The host name of the machine for personal cloud or near field devices.
• type: The type of the machine (e.g., Smartphone, Laptop, etc.).
• description: An additional attribute to textually describe the machine.

A machine has zero or multiple resources that represent the provided storage services. Each resource consists mainly of the features that were discovered by the plugins or manually specified. Because the main focus lies on storage services, the following resource elements are geared toward storage resources and are otherwise mainly “neutral”:

• id: Each resource has a unique identifier, which is independent of the machine. An application can use the identifier to request specific storage information.
• ownership: A resource may have an owner identity, which is identical to the identity of a member of a community. If the value of the owner identity is set, the resource can be provided to the community. This element also includes a trust value specifying the reliability of a community member.
• type: Reflects the resource type. Because of the requirements, the value is “storage”.
• protocol: The communication protocol that facilitates the access to a specific storage service.
• mountdir: The directory where the resource is mounted.
• capacity: This element contains information about the storage usage (e.g., total, used, available).

For each device, availability information can be set by the user:

• type: Specifies the availability type (e.g. permanent, durable, volatile).
• start: The date for the start of the resource provisioning.
• end: The date for the end of the resource provisioning.
• notAvailableSince: The date on which a temporarily unavailable device was last accessible.
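The elements listed above can be modeled compactly with data classes. The following sketch is one possible in-memory representation, not the database schema of the prototype. The class names, the helper is_shared and the decision to attach the availability information to the machine are assumptions; the field names follow the element names of the lists above. Data classes require Python 3.7 or later.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Availability:
    type: str = "permanent"               # e.g. permanent, durable, volatile
    start: Optional[str] = None           # start date of the provisioning
    end: Optional[str] = None             # end date of the provisioning
    notAvailableSince: Optional[str] = None


@dataclass
class Ownership:
    owner: Optional[str] = None           # identity of a community member
    trust: Optional[int] = None           # trust value of that member


@dataclass
class Resource:
    id: str
    protocol: str
    mountdir: str
    capacity: dict = field(default_factory=dict)   # total, used, available
    type: str = "storage"
    ownership: Ownership = field(default_factory=Ownership)

    def is_shared(self):
        """A resource can be provided to the community once an owner is set."""
        return self.ownership.owner is not None


@dataclass
class Machine:
    identifier: str
    name: str
    type: str
    description: str = ""
    resources: List[Resource] = field(default_factory=list)
    availability: Availability = field(default_factory=Availability)


if __name__ == "__main__":
    laptop = Machine("m1", "laptop.local", "Laptop")
    laptop.resources.append(Resource("r1", "sftp", "/mnt/flexisource/r1"))
    print(laptop)
```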

5.3.2 Resource Labeling

The automatically discovered or manually specified resources are so far stored and maintained in a database. Provided that the resource information can be requested by other applications that visualize or utilize storage resources (the programming interface for those applications is described in Chapter 6), applications can also send requests to assign specific utilization characteristics to a resource. For example, if a detected device contains storage resources that the user who manages the personal cloud would like to share with some friends of a community, he or she can use an application to choose the related action “share resource”. This initiates an action that sets specific attributes of the resource in the resource plan. These attributes can be analyzed and are used to allow other community members to access and utilize the provided resource. If only the member's own resources are behind a unique IP address that is registered in the social network and uniquely identifies the member, the member can sign a sent value with his or her private key and send the resulting signature together with the value to the requester. The requester decrypts the signature with the member's public key and compares the result with the value; if both are identical, the member's identity is valid (such a procedure is part of a Public Key Infrastructure (PKI)). But what happens if the member sets the sharing attribute for a mobile device (e.g., smartphone, USB device) that he or she also carries along when meeting community members in person to share resources? In such a situation, the modification of the resource plan cannot be used to grant a friend access to the storage resources of the mobile device, because the device is decoupled from the member's personal cloud. Furthermore, how can it be detected automatically that a device integrated in the personal cloud is a friend's device? These considerations lead to the requirement to attach information about the owner to the device.

As mentioned in Chapter 3, the social network is able to deliver information to securely identify community members. Such a procedure is based on Public Key Infrastructure (PKI), whereby public keys of members can be securely distributed by a social network application to verify messages of the members. Figure 5.5 illustrates the PKI for two members of a social network. Member one is the owner of the personal cloud 1 and member two of the personal cloud 2. The members securely transfer their public keys (P1, P2) to the social network, from where they are accessible for the community. Both personal clouds consist of different resources, but the resource E and the resource F are separated from their owner’s personal cloud and physically located in the personal cloud 1. Member two has encrypted some data with his private key S2 and stored the data in resources E and F. Member one is able to access the data of the resources E and F, and by securely fetching the public key P2 from the social network, he or she can verify the reliability of that data. The following approach uses this technique to attach a resource label to a resource of a device, with the goal to identify the owner of that resource. This also means that the device and resource information must be considered for the resource label.

Figure 5.5: Public Key Infrastructure (PKI).

This approach assumes that the underlying key exchange mechanism is secure and that a public key can be uniquely mapped to a community member. Furthermore, the presented approach should be uniform for all devices of the personal cloud. Thus, the solution is based on the device with the fewest communication features, which in this case is a USB storage device.

What does resource labeling mean? The resource labeling procedure attaches the user identification to a resource of a device. For this purpose, the unique device properties are determined (e.g. udev can deliver information about the productID, the vendor and a serial number) and written, together with the user name of the member, into a description file. Then, a hash value is generated from that description file and encrypted with the member's private key. The encrypted hash value is the signature of the description file. The description and the signature are stored in the resource. Then, the signature can be verified as mentioned above.
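The labeling steps (description file, hash value, encryption with the private key, verification with the public key) can be traced in a few lines. The sketch below is a toy: it uses textbook RSA with a small key built from two Mersenne primes and no padding, which is not secure and only shows the round trip. The function names, the layout of the description and the key parameters are assumptions; a real implementation would rely on an established cryptographic library and on the keys managed by the social network.

```python
import hashlib

# Toy key pair (assumption, NOT secure): modulus from two Mersenne primes.
P, Q = 2**127 - 1, 2**89 - 1
N = P * Q
E = 65537                              # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))      # private exponent (needs Python 3.8+)


def create_label(member, product_id, vendor, serial):
    """Build the description: user name plus unique device properties."""
    return "\n".join([
        "member=" + member,
        "productID=" + product_id,
        "vendor=" + vendor,
        "serial=" + serial,
    ])


def digest(label):
    """Hash the description and reduce it below the toy modulus."""
    value = int.from_bytes(hashlib.sha256(label.encode("utf-8")).digest(), "big")
    return value % N


def sign(label, d=D, n=N):
    """Encrypt the hash value with the private key; the result is the signature."""
    return pow(digest(label), d, n)


def verify(label, signature, e=E, n=N):
    """Decrypt the signature with the public key and compare it with the hash."""
    return pow(signature, e, n) == digest(label)


if __name__ == "__main__":
    label = create_label("alice", "0x1234", "ExampleVendor", "SN-0001")
    signature = sign(label)
    print(verify(label, signature))
```

Changing any line of the description after signing, for example the member name, makes the verification fail, which is what ties the owner to the device properties.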

A subcomponent of the resource management, the resource labeling, has to provide the following procedures for the resource labeling:

• sign: A signature is generated from a label, consisting of unique device and resource information, and a private key.
• verify: The verification process accepts or rejects the message; for this purpose it uses the message, the public key and the signature.

A subcomponent of the resource management, the resource ranger, provides the following procedures:

• create: Creates the resource label.
• fetch: Fetches the label of the resource.
• attach: The label and the signature are stored in the resource.
• remove: The label and the signature can be removed from the resource.

Figure 5.6 describes the interaction of the above-mentioned procedures. The left role illustrates a device of the personal cloud. Devices can be foreign devices of friends or personal devices. The comments on the left mark the start of a sequence of actions. The role resource ranger is, inter alia, responsible for processing the notifications received from the controller. If the resource ranger receives a notification about a detected resource, it uses the delivered device and resource information to fetch the label of the resource (check resource label). The same happens when a correct resource was manually specified. If the resource has no label, the resource plan is updated immediately. Next, a GUI command must be sent to instruct the resource ranger to create a new label and send it to the sign procedure of the resource labeling component. The resource labeling generates the signature of the label and returns it. For the signing, the social network adapter delivers the related public key and a specific trust value (the trust value is outlined below). The label and the signature are attached to the detected resource. If the resource ranger receives a notification about a detected resource that has a label and a signature, both are sent to the verify procedure of the resource labeling component. The same procedure is called when a correct resource was manually specified. The social network adapter delivers the public key and the trust value. Thereafter, the signature is decrypted with the public key and the result is compared with the label. If the label and the decrypted result have the same value, the adapter returns valid, otherwise invalid. The result, the recognized member name and the trust value must be sent to the GUI to prevent an untrusted community member from being assigned to the resource automatically (the reason for this is explained below). A valid result initiates an update of the resource plan and thereby completes the ownership attribute of the corresponding resource element by adding the member identification. The resource provisioning can be canceled through a user interaction. In such a case, the resource ranger removes the label and the signature from the resource and updates the resource plan by removing the corresponding resource element. Thus, the resource ownership is undefined for other community members and the resource is not provided to the community.

Figure 5.6: The most important actions for the resource labeling.
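The decision flow of the resource ranger for a detected resource can be summarized in a short sketch. All names are hypothetical: verify stands for the verify procedure of the resource labeling, lookup_member for the social network adapter and confirm for the confirmation in the GUI. They are passed in as plain functions so that the flow itself stays visible.

```python
# Hypothetical sketch of the ranger's reaction to a "resource detected" event.

def on_resource_detected(resource, plan, verify, lookup_member, confirm):
    """Update `plan` for one detected resource and return the outcome.

    resource      -- dict with "id" and optionally "label" and "signature"
    verify        -- callable(label, signature, public_key) -> bool
    lookup_member -- callable(label) -> (member, public_key, trust)
    confirm       -- callable(member, trust) -> bool (user decision in the GUI)
    """
    label = resource.get("label")
    signature = resource.get("signature")
    entry = {"id": resource["id"], "owner": None, "trust": None}

    if label is None or signature is None:
        # No label: the plan is updated immediately, ownership stays undefined.
        plan[resource["id"]] = entry
        return "unlabeled"

    member, public_key, trust = lookup_member(label)
    if not verify(label, signature, public_key):
        return "invalid"
    if not confirm(member, trust):
        # The user declined, e.g. because of a low trust value.
        return "rejected"

    entry["owner"], entry["trust"] = member, trust
    plan[resource["id"]] = entry
    return "valid"
```

Keeping the confirmation as a separate step reflects the requirement that an owner is never assigned without the user having seen the member name and the trust value.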

The trust value mentioned above can be used to determine how trustworthy a community member is. A high value means that many members have signed for the reliability of the corresponding member. A low value means that no or only a few members have done so. So, if an application such as NubiSave1 transparently distributes local, encrypted data over remote or foreign member devices, or even allocates their resources, by utilizing the resource descriptions provided by the resource plans of the community, it has to use the resources provided by members with a high trust value. NubiSave would also be responsible for storing data of the personal cloud in resources of friends' devices that are temporarily integrated in the personal cloud of a member.
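How a consuming application might restrict itself to resources of trusted members can be sketched as a simple filter. The function name select_trusted, the dictionary keys owner and trust and the numeric threshold are assumptions; the concept above does not prescribe a scale for the trust value.

```python
# Hypothetical filter: keep only owned resources whose owner is trusted enough.

def select_trusted(resources, min_trust):
    """Return the resources that have an owner with trust >= min_trust."""
    selected = []
    for resource in resources:
        trust = resource.get("trust")
        if resource.get("owner") is not None and trust is not None \
                and trust >= min_trust:
            selected.append(resource)
    return selected


if __name__ == "__main__":
    candidates = [
        {"id": "r1", "owner": "alice", "trust": 9},
        {"id": "r2", "owner": "bob", "trust": 1},
        {"id": "r3", "owner": None, "trust": None},
    ]
    print([r["id"] for r in select_trusted(candidates, min_trust=5)])
```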

Hypothetically, if a device of an untrusted community member (low trust value) is integrated in the personal network of another member, an automatic resource integration is possible, but NubiSave would not utilize the resources because of the low or “negative” trust value. In the case of an untrusted community member with potentially malicious intentions, it may be assumed that he or she has temporarily stolen a labeled device from a member with a high trust value (the victim may not even notice the theft), copies the labeling files to his or her own device, reads the description file of the stolen device and manipulates the descriptors of the own device to match the stolen device's descriptors (related information on such an attack can be found in “Dongles - faked hardware protections”2).

1 NubiSave is a user-space controller to securely distribute data over multiple cloud providers [Nub11].

This approach does not provide a solution for this problem, but gives the following instructions to prevent such malicious behavior. Because the GUI shows the ownership information of detected devices of the personal network before the resource plan is updated, the user can prevent an untrusted community member from integrating a resource that he or she does not own. Furthermore, the user should ensure that only authorized individuals can access the machine on which the FlexiCloud component is installed. Naturally, connecting mobile devices to a machine of an untrusted community member implies similar considerations.

As mentioned above, NubiSave distributes the user data over multiple cloud providers in a secure fashion. Furthermore, the application provides techniques to split the data into chunks that are encrypted before they are transported. Thus, the stronger the algorithms used to encrypt these chunks, the lower the chance that attackers can decrypt them, and the personal data stays protected.

5.4 Data Management

As mentioned before, the resource ranger transfers data to connected USB storage devices to perform the resource labeling. This is adequate because the ranger knows exactly where the resources are located; it just copies the created files to a specific system folder. A Bluetooth mobile phone that is connected to the USB interface can also be mounted. One solution for this is the FUSE3-based filesystem ObexFs4. ObexFs can also find calendar information and private data. This may frighten some members, who therefore prefer the OBEXFTP protocol, which only transfers data to selected folders specified by the user (folders like SHARE, PUSH, or PULL). The OBEXFTP protocol can be implemented by a plugin (functional unit). As could be seen earlier, the resource management mainly receives notifications about service information from the controller. Such data transfers are therefore delegated to the data management to keep the management clearly structured. The data management sends a message containing storage-service-specific information to the controller. Then, the controller forwards the message to a suitable plugin that stores the requested data in a directory (the path is included in the message).

2 http://www.woodmann.com/crackz/Dongles.htm
3 Filesystem in Userspace (FUSE)
4 http://dev.zuckschwerdt.org/openobex/wiki/ObexFs

5.5 Application Programming Interface

The Application Programming Interface (API) provides the functionality to accept and respond to requests from the applications mentioned in Chapter 3. There are different applications that are interested in the resource information from the resource plan. This means that FlexiSource has to handle different requests from client applications and send adequate responses. It is possible to request the resource information directly, which means that the information currently stored in the resource plan is sent to the clients. The API therefore provides methods to request information about machines, resources, availability of devices, and the ownership of resources. As mentioned before, the discovery is based on events that are triggered by the plugins. It is assumed that the applications are interested in the information delivered by these events and want to receive it immediately. In this case, frequent requests initiated by the clients (also known as polling) would not be a solution, because the information obtained this way may not be up to date. To provide the most flexible solution, client applications can use the API to register themselves for a specific event. This means that FlexiSource has to keep the address of the requester and send specific information about an event to the corresponding requester. If a requester cannot listen directly for the event notification, it can pass the desired address to which FlexiSource should send the event notifications.
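The registration for events instead of polling corresponds to a publish/subscribe scheme. The sketch below illustrates the idea in-process; the class name EventRegistry and its methods are assumptions, and a plain callback stands in for the address of a requester to which a real API would send a network message.

```python
# Hypothetical in-process sketch of event registration (publish/subscribe).

class EventRegistry:
    def __init__(self):
        self._subscribers = {}   # event name -> list of callbacks

    def register(self, event, callback):
        """Register a requester (here: a callback) for a specific event."""
        self._subscribers.setdefault(event, []).append(callback)

    def unregister(self, event, callback):
        """Remove a registration; returns False if it did not exist."""
        callbacks = self._subscribers.get(event, [])
        if callback in callbacks:
            callbacks.remove(callback)
            return True
        return False

    def publish(self, event, info):
        """Send the event information to all registered requesters."""
        callbacks = list(self._subscribers.get(event, []))
        for callback in callbacks:
            callback(info)
        return len(callbacks)


if __name__ == "__main__":
    registry = EventRegistry()
    registry.register("device_added", print)
    registry.publish("device_added", {"device_node": "/dev/sdb"})
```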

Furthermore, the communication between the applications and FlexiSource has to be reliable. Thus, the technology to transfer messages between the applications and the FlexiSource API has to provide reliable communication mechanisms including an ordered transfer of messages.

The API provides methods to add, modify and delete information about resources and machines. Furthermore, complete resources and machines can be added, modified or deleted. Adding a resource means that the user manually specifies the resource. If this resource is valid, the related machine and resource items are created in the resource plan.

5.6 Summary

Figure 5.7 illustrates the overall system architecture of FlexiSource presented in this chapter. At the bottom are the functional units (plugins), which provide device and resource discovery mechanisms for devices of the local network, and the detection of resource information for devices of the machine on which FlexiSource is installed. Plugins that deliver similar information, such as resource information of the local network, have the same request parameters and response content. A description file (description of functional units) gathers all information necessary to install and request the plugins. The controller can be instructed to parse the description file and start the plugins, either to search and listen for storage services (network services) defined in the description file or to listen to events of pluggable devices. The resource management can register for events that are fired when the controller receives device or resource events. Depending on the events, the resource management can determine the resource information and synchronize the resource plan that contains all discovered resources. Temporarily unavailable devices can stay in the plan and have a related status. The resource labeling is responsible for generating, verifying and decrypting signatures, and has to check the reliability of messages based on the PKI of the social network. A signature is generated from a file containing unique device and resource descriptors and can be attached to the related resource. Then, the labeled resource is visible to other members of the community and can give information about the member and his or her trust value. The data management has the task of sending data to and fetching data from near field devices via Bluetooth. Finally, the API implements an interface that gives web services and applications access to information from the resource plan and allows them to register for notifications about device and resource events. The API can also receive calls to manually integrate storage resources into the resource plan.

Figure 5.7: The Resources-as-a-Service system architecture (focus: storage resources).

6 Implementation

This chapter presents selected parts of the prototype that was developed in order to demonstrate and evaluate the concept of this thesis described in Chapter 5. The emphasis is on the resource integration that delivers information for specific devices. This information can be used to sign a specific storage resource in the interest of the member of a community.

The prototype is mainly implemented with Python. An important feature of Python is the possibility to seamlessly integrate information of system tools in the programming environment, which is necessary for the integration and management of resources. The implementation of the prototype is mainly structured in packages, which are folders/directories that contain modules. A Python module can provide functionality and/or multiple classes.

The following delimitations are applied to the implementation:

• machines in the local network have a unique host name and a unique IP address
• USB devices have to be mounted automatically when they are connected to the machine
• the presented solutions were implemented and tested on a Linux-based operating system (current development environment: Debian GNU/Linux 6.0)

The next section presents an overview of packages and classes that demonstrates the structure and the provided functionality of the prototype implementation.

6.1 Prototype Overview

A general overview of the prototype is introduced in this section. The programming structure of the prototype is divided into functional parts according to the concept presented in Chapter 5.

The root directory of the prototype contains the configuration file flexisource.conf, which provides the configuration of the directory, where all mountable resources are mounted. The README file gives further instructions for the usage of the prototype.

6.1.1 Package: ResourceManagement

The resource management is responsible for delivering the main functionality to mount resources according to the resource information and to manage them. In the following, the contained folders and their modules are explained:

• Machine: The class MachineHandler has a list of all machines of the local network (it is implemented as a singleton, so many classes can use the same instance to get resource information). The module resourceContainer contains classes that manage a specific resource type of a machine. The most important one is the StorageContainer, which adds, deletes and modifies storage resources of a machine. The StorageContainer can have multiple instances of a StorageContainerItem (from the module resourceContainerItem).
• Shell: This folder contains all Linux-based commands to integrate resources (e.g., ifconfig information, curlftpfs, sshfs, mount, hcitool, and rfcomm).
• SocketEventListener: Contains the event handlers that process events from the functional units.
• Types: Contains string types for protocols, interfaces, addresses, actions, and requests.

The class Ranger (module resourceRanger) handles the resource information depending on the information of the MachineHandler. The class ResourceMarker executes the resource labeling.
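The singleton behavior mentioned for the MachineHandler can be illustrated as follows. This is a generic sketch of the pattern and not the code of the prototype; the attribute machines is an assumption.

```python
# Generic singleton sketch: every instantiation returns the same object.

class MachineHandler:
    _instance = None

    def __new__(cls):
        if cls._instance is None:
            # Create the single instance on first use and initialize it once.
            cls._instance = super().__new__(cls)
            cls._instance.machines = []
        return cls._instance


if __name__ == "__main__":
    first = MachineHandler()
    second = MachineHandler()
    first.machines.append("laptop.local")
    print(first is second, second.machines)
```

Classes such as the ranger can therefore simply instantiate the handler and still see the machines registered elsewhere.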

6.1.2 Package: PluginController

This folder contains the Controller to manage the functional units, the description file for functional units (plugins.conf), and event classes to handle the plugin events (socketEvent, eventObject).

• Messaging: Contains modules that implement functionality to communicate with the functional units.
• Plugins: A container for the functional units. Currently, it contains Avahi, Bluetooth, and Udev.

6.1.3 Package: DataManagement

The data management provides additional functionality to transfer data between devices.

• DataExchange: Currently, this folder contains the class BluetoothDataExchange to send and receive data from Bluetooth devices.

6.1.4 Package: Logging

The folder Logging contains the module logger that is responsible to write the system’s actions, behavior and errors in the file flexisource.log.

6.2 Resource Integration

The resource integration is mainly based on functional units that deliver resource information of the personal network, as described in Chapter 5.2. Because the architecture can be extended by multiple functional units, similar to a plugin mechanism, the functional units are referred to as plugins. There are two plugins to discover storage resources. The Bluetooth plugin is implemented in Java and uses BlueCove1, a library for Bluetooth. For mDNS/DNS-SD service discovery, there is a plugin written in C that uses the zeroconf implementation Avahi. Both plugins accept a message to start the discovery (startDiscovery) via socket requests and return messages about resources and devices based on events. Furthermore, there is a plugin that imports pyudev (a pure Python libudev binding) to detect storage information of pluggable devices and local partitions.

Listing 6.1 shows an example of the description file for plugins (the description of functional units), with which the controller identifies available plugins. The name of the description file is plugins.conf. Each plugin description starts with the statement define_plugin and contains multiple key-value attributes. The attribute type describes the underlying service discovery protocol. The attributes name, port, and dir (the plugin directory) are used to construct the command that starts a plugin from a specific directory. A plugin should have at least one service description. Each service description starts with the statement define_service and can consist of the attributes parameter or type. A type is necessary to find the related mount procedure for the resource integration. The Bluetooth plugin accepts multiple request parameters, and thus the type of the plugin is used to identify the plugin internally. This means that the internal event handler first detects the type of the plugin and then processes the incoming request. The occurrence of multiple request parameters is expressed by the attribute grouped_request, which must have the value “yes”. Thus, the method call startDiscovery of the plugin (see Table 5.1) accepts multiple service types. In contrast, the Avahi plugin accepts just one service parameter, and thus the type is specified in the service definition. This means that the internal event handler directly processes a related event depending on the service type.

¹ http://bluecove.org/

define_plugin {
    type = bluetooth
    name = BluetoothPlugin.jar
    dir = path/to/plugin/named/Bluetooth
    port = 10445
    command = java -jar
    grouped_request = yes
    define_service {
        parameter = OBEXFileTransfer
    }
    define_service {
        parameter = OBEXObjectPush
    }
}
define_plugin {
    type = avahi
    name = AvahiPlugin
    dir = path/to/plugin/named/Avahi
    port = 10455
    command = ./
    define_service {
        type = sftp
        parameter = _sftp-ssh._tcp
    }
    define_service {
        type = nfs
        parameter = _nfs._tcp
    }
}

Listing 6.1: Description file for plugins.
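To make the structure of the description file more concrete, the following Python sketch parses a file in the format of Listing 6.1 into a list of dictionaries. It assumes one statement per line, as in the listing layout. The function name parse_plugins_conf and the error handling are assumptions of this sketch and do not reproduce the parser of the prototype.

```python
import re

SAMPLE = """\
define_plugin {
    type = bluetooth
    name = BluetoothPlugin.jar
    dir = path/to/plugin/named/Bluetooth
    port = 10445
    command = java -jar
    grouped_request = yes
    define_service {
        parameter = OBEXFileTransfer
    }
    define_service {
        parameter = OBEXObjectPush
    }
}
"""


def parse_plugins_conf(text):
    """Return a list of plugin dicts, each carrying a 'services' list."""
    plugins, plugin, service = [], None, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if re.fullmatch(r"define_plugin\s*\{", line):
            plugin = {"services": []}
        elif re.fullmatch(r"define_service\s*\{", line):
            if plugin is None:
                raise ValueError("define_service outside define_plugin")
            service = {}
        elif line == "}":
            if service is not None:        # closes the innermost open block
                plugin["services"].append(service)
                service = None
            elif plugin is not None:
                plugins.append(plugin)
                plugin = None
            else:
                raise ValueError("unbalanced '}'")
        else:
            key, sep, value = line.partition("=")
            target = service if service is not None else plugin
            if not sep or target is None:
                raise ValueError(f"unexpected line: {line!r}")
            target[key.strip()] = value.strip()
    if plugin is not None:
        raise ValueError("unterminated define_plugin block")
    return plugins
```

For the sample above, parse_plugins_conf(SAMPLE) yields one plugin whose services list holds the two OBEX parameters. All values are kept as strings, so a caller would convert the port to an integer where needed.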

Examples of the data exchange format used to transfer service event messages from the plugins to the FlexiSource controller are illustrated in Listings 6.2 and 6.3. Listing 6.2 shows a string message for the event that a new device was detected. The string consists of the plugin name (“BluetoothPlugin.jar”) and the event type (“add”), followed by the device properties and the detected services. The information on the service “OBEXFileTransfer” contains the service type, the event, the complete address, and the specific service channel.

BluetoothPlugin.jar+add#bluetooth+dimebag+001583482464
+[[OBEXFileTransfer+add+btgoep:001583482464:8;authenticate=true;encrypt=true;+8],
[OBEXObjectPush+add+btgoep:001583482464:7;authenticate=true;encrypt=true;+7]]

Listing 6.2: Device event (new) with found storage services.

Listing 6.3 shows a string message for the event that a device was modified. The event type has the value “modified” and indicates that the device has changed some properties. In this case, the modified property refers to the availability of the service “OBEXFileTransfer”, which is no longer offered by the Bluetooth device (expressed by the event type “delete”).

BluetoothPlugin.jar+modified#bluetooth+dimebag+001583482464
+[[OBEXFileTransfer+delete+btgoep:001583482464:8;authenticate=true;encrypt=true;+8]]

Listing 6.3: Device event (modified) with deleted storage services.
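The message format of Listings 6.2 and 6.3 can be decomposed with a few string operations. The following Python sketch assumes that a message arrives as a single line (the line breaks in the listings are taken to be layout only) and that device names contain no “+” character. The function name parse_event and the dictionary keys are illustrative assumptions.

```python
def parse_event(message):
    """Split a plugin event string into device and service information."""
    header, _, body = message.partition("#")
    plugin, _, event = header.rpartition("+")          # plugin name + device event
    device_part, has_services, services_part = body.partition("+[[")
    protocol, name, address = device_part.split("+")
    services = []
    if has_services:
        # Entries look like "[svc+event+url+channel]", separated by "],".
        for entry in services_part.rstrip("]").split("],"):
            svc, svc_event, url, channel = entry.strip().lstrip("[").split("+")
            services.append(
                {"service": svc, "event": svc_event, "url": url, "channel": int(channel)}
            )
    return {
        "plugin": plugin,
        "event": event,
        "protocol": protocol,
        "name": name,
        "address": address,
        "services": services,
    }


ADD_MESSAGE = (
    "BluetoothPlugin.jar+add#bluetooth+dimebag+001583482464"
    "+[[OBEXFileTransfer+add+btgoep:001583482464:8;authenticate=true;encrypt=true;+8],"
    "[OBEXObjectPush+add+btgoep:001583482464:7;authenticate=true;encrypt=true;+7]]"
)
```

Applied to ADD_MESSAGE, the function returns the device “dimebag” with two services on channels 8 and 7. A message of the kind shown in Listing 6.3 yields the device event “modified” with a single service whose event is “delete”.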

The controller offers the method start_plugins(), which initializes the plugins: it parses the plugins.conf file to retrieve port and command information and starts the plugins from a shell environment. Then, by calling the method scan_devices(), the controller starts sending messages to the plugins. These messages contain the command startDiscovery and one or multiple service types (see Table 5.1). Event handlers can be added to the controller, which passes each incoming event to the related event handler. The implementation provides multiple event handlers² to extract device information, which is passed to the resource ranger. When the resource ranger receives the information of a storage resource (service), it distinguishes two cases. If the device of the resource is newly detected, it requests a new machine object from the machine handler and adds the resource. If the machine is already stored and the resource is new, the resource ranger creates a related storageContainerItem and adds it to the machine.

² Currently, four event handlers are implemented: a Bluetooth event handler that accepts OBEXFTP and OBEXObjectPush messages, and FTP, SFTP, and NFS event handlers.
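The interplay of event dispatching and the two cases handled by the resource ranger can be sketched as follows. This is a simplified illustration, not the prototype code: the class and method names (Controller.add_event_handler, ResourceRanger.on_storage_resource) are hypothetical, and a plain dictionary stands in for the machine handler.

```python
class Controller:
    """Routes incoming plugin events to the handler registered for their type."""

    def __init__(self):
        self._handlers = {}

    def add_event_handler(self, event_type, handler):
        self._handlers[event_type] = handler

    def dispatch(self, event_type, event):
        handler = self._handlers.get(event_type)
        if handler is None:
            return None                      # no handler registered: event is ignored
        return handler(event)


class ResourceRanger:
    """Decides whether an event introduces a new machine or a new resource."""

    def __init__(self):
        # address -> set of resource names; a stand-in for the machine handler
        self.machines = {}

    def on_storage_resource(self, event):
        address, resource = event["address"], event["resource"]
        if address not in self.machines:
            # Case 1: the device is newly detected, so a machine is created
            # and the resource is added to it.
            self.machines[address] = {resource}
            return "new machine"
        if resource not in self.machines[address]:
            # Case 2: the machine is known, only the resource is new.
            self.machines[address].add(resource)
            return "new resource"
        return "unchanged"
```

A controller would be wired up with controller.add_event_handler("bluetooth", ranger.on_storage_resource). Each dispatched event then reports which of the two cases applied.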

6.3 Gathering of Resource Information

Currently, the machine handler manages the discovered resources. It can add, delete, and modify objects of machines, resource containers and related items (the actual resources).
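A minimal sketch of such a machine handler, written as a singleton so that all callers share one set of machines, might look as follows. The method names and the nested dictionary layout are assumptions made for illustration; the prototype's classes (StorageContainer, StorageContainerItem) are reduced here to plain dictionaries.

```python
class MachineHandler:
    """Singleton that keeps machines and their storage items in memory."""

    _instance = None

    def __new__(cls):
        # Create the shared instance on first use, then always return it.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance._machines = {}
        return cls._instance

    def add_machine(self, address, name):
        return self._machines.setdefault(address, {"name": name, "storage": {}})

    def add_item(self, address, item_id, **properties):
        self._machines[address]["storage"][item_id] = dict(properties)

    def modify_item(self, address, item_id, **properties):
        self._machines[address]["storage"][item_id].update(properties)

    def delete_item(self, address, item_id):
        del self._machines[address]["storage"][item_id]

    def delete_machine(self, address):
        del self._machines[address]

    def items(self, address):
        # Return a copy of the mapping so callers cannot add or remove
        # items behind the handler's back.
        return dict(self._machines[address]["storage"])
```

Because MachineHandler() always returns the same object, a resource added through one reference is immediately visible through every other reference.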

6.4 Application Programming Interface

The Application Programming Interface (API) uses sockets to serve requests for information from the resource plan. This has the benefit that nearly every high-level programming language, including PHP, can handle sockets. The most important web service is Cloudremix, which is written in PHP and JavaScript. Cloudremix users are interested in the device and resource information and want to be informed of events immediately. PHP can send a socket request (socket client), but then blocks until the answer is received. A PHP socket listener (socket server) does not block in this way; it can listen for new clients and handle their input. The waiting problem therefore cannot be resolved efficiently with a socket client alone. To notify Cloudremix, the following approach is considered. The FlexiSource socket interface accepts a notify request from a client that contains the client’s address (in this case localhost) and the socket number (port). Whenever an event occurs, FlexiSource can inform the PHP web service, which listens on the specified port, handles the input, and has to know the incoming data structure. High-level programming languages such as Java also provide sockets for communication. In contrast to PHP, they provide threads that can process specific actions in the background. Thus, an application written in Java can wait in a background thread for notifications without opening a listening socket. This means that, if the notify request contains no address information, an event is sent directly to the requesting socket.
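The two notification paths described above can be sketched in Python as follows. Since the API is not part of the first prototype, everything here is a hypothetical design: the class name Notifier, the method names, and the newline-terminated wire format are assumptions.

```python
import socket


class Notifier:
    """Keeps notify subscriptions and pushes events to them."""

    def __init__(self):
        self._subscribers = []  # ("callback", (host, port)) or ("direct", connection)

    def handle_notify(self, connection, host=None, port=None):
        if host is not None and port is not None:
            # The client runs its own listener (e.g. a PHP socket server):
            # remember where to connect when an event occurs.
            self._subscribers.append(("callback", (host, port)))
        else:
            # No address given: answer on the socket that sent the request,
            # where a background thread of the client is waiting.
            self._subscribers.append(("direct", connection))

    def publish(self, event):
        data = event.encode("utf-8") + b"\n"
        for kind, target in self._subscribers:
            if kind == "callback":
                with socket.create_connection(target, timeout=2) as conn:
                    conn.sendall(data)
            else:
                target.sendall(data)
```

In this design the callback path opens a short-lived connection per event, which matches a listener that handles one input at a time. A production version would also have to drop subscribers whose sockets have been closed.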

The first version of the prototype does not contain the implementation for the API.

6.5 Prototype Extension

Since the previous sections concentrated on the fundamental part of the prototype, the resource integration, this section discusses further implementation work to realize the concept and thus create a solid basis for the evaluation of the prototype. The current management of discovered resource information is based on an object-oriented implementation without any persistent data management. To improve this situation, the machine handler has to initiate the writing of resource information to an XML database that reflects the actual state of the resources of the personal network. Thus, whenever a machine, resource container, or resource item is added, deleted, or modified, the machine handler has to update the XML database throughout the entire runtime of the program. Based on the device and resource information, the resource labeling has to be implemented following the ideas of the concept. The resource labeling for USB devices can use the information of the Udev plugin to create a signature for a partition. Because the mount directory of a USB partition differs on different systems, Udev provides a solution by offering partition numbers. Thus, such a labeled USB device can also be identified when it is connected to a friend’s FlexCloud machine. For mobile devices that use Bluetooth, the device’s Bluetooth address is unique, which facilitates the implementation of the resource labeling. In this case, the signature can be transferred via the BluetoothDataExchange class provided by the data management. For other network devices that are only used in the personal network, the unique IP address and host name can be used to create the signature, which would be copied into the mounted folder of the resource. For other network devices whose resources should also be identified in a friend’s personal cloud, the IP address is most likely different from the one assigned in the own network, and the host name could conflict with the names of the friend’s devices.
The solution for the resource labeling of these devices is to use the unique MAC address. The MAC address can be obtained with the Linux program arp³ that implements the Address Resolution Protocol. It is called on the command line with the IP address of the network machine as parameter and delivers the MAC address and the name of the interface. A precondition for this procedure is that the requested machine does not block the requests with a firewall. To support this, the Linux-based commands in the Shell folder of the resource management must be extended to include this command. Finally, the application programming interface must be implemented to provide the resource information to other applications and to allow them to set resource information.
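The MAC address lookup could be wrapped as shown in the following Python sketch. It calls arp with the -n option (numeric output) and extracts the first MAC address and the interface name from the output. The sample output is illustrative of the usual net-tools table layout and may differ between systems; the function names are hypothetical.

```python
import re
import subprocess

MAC_RE = re.compile(r"\b([0-9A-Fa-f]{2}(?::[0-9A-Fa-f]{2}){5})\b")

# Illustrative output of "arp -n 192.168.1.23"; real output may vary.
SAMPLE_OUTPUT = """\
Address                  HWtype  HWaddress           Flags Mask            Iface
192.168.1.23             ether   00:11:22:33:44:55   C                     eth0
"""


def parse_arp_output(output):
    """Return (mac_address, interface) from arp output, or None if absent."""
    for line in output.splitlines():
        match = MAC_RE.search(line)
        if match:
            # The interface name is the last column of the matching row.
            return match.group(1).lower(), line.split()[-1]
    return None


def lookup_mac(ip_address):
    """Ask the local ARP cache for the MAC address of ip_address."""
    result = subprocess.run(
        ["arp", "-n", ip_address], capture_output=True, text=True, check=False
    )
    return parse_arp_output(result.stdout)
```

Note that the ARP cache only holds entries for machines the local host has recently communicated with, so a caller may need to contact the machine once before lookup_mac returns a result.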

6.6 Summary

This chapter presented selected parts of the prototype, which was implemented to demonstrate the realization of the concept of this thesis. The focus hereby was to describe the resource integration, from the plugins over the controller to the resource management. It was also shown how to enhance the prototype to create a solid basis for the evaluation.

³ http://linux.die.net/man/7/arp

7 Evaluation

The purpose of this chapter is to evaluate the concepts of the system design based on the requirements described in Section 3.2. Furthermore, this chapter provides a description of the evaluation of the prototypical implementation of the concept based on a set of test cases.

First, the validation of the system design is presented by discussing how far the concepts fulfill the requirements. Parts of the system design were implemented as a prototype in order to demonstrate the functioning of the contained concepts and to serve as a foundation for further evaluation work. So, in the final step, the prototypical implementation of the system design is evaluated with regard to the use cases presented in Section 1.2.

7.1 Evaluation of Requirements

At first, the evaluation of the requirements, which were described in Section 3.2, is discussed with regard to their fulfillment by the concepts of the system design presented in Chapter 5. The evaluation is based on the defined concrete requirements that were described in the requirements analysis.

Resource Discovery

R-RD-1: Discover devices by common protocols. Most of the information for the resource discovery is delivered by the functional units (plugins), as mentioned in Section 5.2. The discovery of devices is coupled with a request for resources. If the requested resource is not available, only device information is provided. This is the case for mobile and peripheral devices. The manual discovery of storage resources also adds device and resource information to the resource plan.

R-RD-2: Discover resource information by common procedures. The plugins implement the procedures to discover devices, and the manual discovery is provided by the application programming interface. The focus of this thesis and the concept lies on storage resources, and thus the discovery is limited to storage resources.

R-RD-3: Provide and modify device and resource description plan. The information of the resource plan is provided by the API. The API also provides methods to modify the resource plan, for example after the manual discovery of resources or when a friend’s resource was detected. A model of the resource plan was presented, but a concept for detailed mechanisms to modify the resource plan is missing.

R-RD-4: Provide and modify virtual resource information. A virtual resource is a part of a resource with a specific capacity. The idea was that a provisioning tool consumes the resource plan and requests the allocation of a specific resource capacity for utilization. The system design contains no concepts about the allocation of resources and the provisioning of virtual resource information.

R-RD-5: Discovery should be capable of integrating new protocols. The architecture allows the integration of additional protocols and procedures based on the plugins and the description file for plugins. Thus, the plugins are independent of the programming language and can implement several protocols. The resource manager relies on the specification of the response content, so that all plugins provide similar information.

Data Discovery

R-DD-1: Discover datasets of distributed local resources. The concept is focused on the information about resources. Data is just discovered and fetched if the label for the resource identification is required. The system design chapter initially states that this requirement is not taken into account.

R-DD-2: Data transfer by common protocols. Data is transferred if the label for the resource identification is required. The concept implements this requirement in the data management component to transfer data in a near field communication scenario.

R-DD-3: Provide and modify datasets. The API does not offer methods to provide or modify datasets. Provision and modification are only useful for FlexiSource itself to handle the resource labeling. If a storage resource was mounted, applications on the same machine can request the location of the mounted directory and utilize the resource for data management. Furthermore, the system design chapter initially states that this requirement is not taken into account.

Trust Metrics

R-TM-1: Provide/Attach information about the owner to a device. The resource labeling procedure described in Section 5.3.2 creates a signature to verify the user and the resource, and transfers this signature to a resource of the device. The device can carry multiple pieces of owner information, depending on the provided storage resources. But if no label was attached, no further owner information is available.

R-TM-2: Provide/Attach information about the usage context to a virtual resource. As mentioned above, virtual resources, that is, the allocation of storage resources, are not provided. The usage context of device resources currently refers to the social network community.

R-TM-3: Provide/Attach information about the principal utilizer to a virtual resource. As mentioned above, virtual resources, that is, the allocation of storage resources, are not provided. Utilizers of resources are not considered in the concept.

R-TM-4: Provide/Attach information about the owner and principal utilizer to a dataset. The handling of owner and utilizer information for datasets is not provided.

Synchronization

R-S-1: Synchronize distributed local (virtual) resources with a local machine. The synchronization of resources depends on events about resources, the manual discovery of resources, the resource labeling, and the API. The API provides the methods to manage the resources of the personal network and thus initiates the synchronization of the resource information with the resource plan located on the local machine.

R-S-2: Synchronize distributed local datasets with a local machine. Generally, just the labels for the ownership identification are synchronized with the local machine.

R-S-3: Uniform access to resources and data of multiple devices. The uniform access to resources is given by the application programming interface. Data provision is not supported directly, but the location of the mounted resources can be requested to utilize the contained data.

7.2 Scenario-based Evaluation

This section describes the evaluation of the prototype based on the three use cases presented in Section 1.2. To analyze the functionality of the prototype presented in Chapter 6, the prototype is integrated in the cloud control application Cloudremix, which provides a user interface to manage resources, data, and software. To this end, Cloudremix should use the API of FlexiSource (see Section 5.5) to request the resource information and to manage the resources.

Eleven test cases are used to analyze the interaction between users and the FlexiSource API. The test cases consist of actions that are delegated to the FlexiSource API, which informs the FlexiSource components, and the corresponding responses are illustrated by the behavior.

The test cases also follow the basic requirements presented in Section 3.2:

• Resource Discovery: T1-T8
• Data Discovery: T9
• Trust Metric: T9, T10, T11
• Synchronization: T2, T6, T8, T9, T10, T11

Appendix B contains the forms for the tests. Because the API was not implemented, the results of the tests are not available at the time of submission of this thesis. To carry out these tests, the prototype will be extended after the submission, and the test results will be submitted before the oral defense of the diploma thesis takes place.

T1 Auto-discovery Configuration: Cloudremix provides a list of protocols for the auto-discovery of devices and storage resources (USB, Bluetooth, Avahi). The user enables USB to be automatically informed if new USB devices are connected to the main machine. The user will receive a success message.

T2 Visibility of devices: Cloudremix provides a view that shows a list of devices. The user connects a USB device to the main machine and the integration of the device is indicated by a new entry in the list of devices.

T3 Visibility of resources: The user recognizes the new USB device entry in the list of devices and double-clicks the device entry. Then, he/she sees the device properties, such as device name and vendor, and the resource information of the partitions.

T4 Visibility of resources: Cloudremix provides a view that shows a list of resources, which includes device properties, such as the device name and the device type. Among these resources, the user can find the partitions of the USB device. Each partition corresponds to exactly one resource entry of the list of discovered resources. A resource entry also shows the capacity (e.g., total, available, and used) of the resource.

T5 Semi-automatic discovery: Cloudremix provides functionality to scan the main machine or the local network for a specific type of resources, such as USB, Bluetooth, or Avahi-announced storage services. The user activates the scan functionality for Bluetooth devices, under the precondition that the auto-discovery of Bluetooth devices is disabled and that the user has at least one visible Bluetooth device. After the scan is finished, the visible Bluetooth devices that are newly detected appear in a list.

T6 Connect to device: The user recognizes the new Bluetooth device entries and double-clicks one specific device entry. Then, he/she sees the device properties, such as device name and Bluetooth address. Because the device was newly detected, the user can connect to the Bluetooth device by entering the PIN code, followed by a confirmation. The user will receive a success message. Then, the device is visible in the list of devices.

T7 Modify resource information: The newly detected Bluetooth device provides no information about storage capacity. The user navigates to the list of resources and chooses the resource of the Bluetooth device, under the precondition that an OBEXFileTransfer service was detected. Then, he/she can enter values for the capacity (e.g., total, available, and used) of the storage resource and confirms. The user will receive a success message.

T8 Manual discovery: Cloudremix provides the functionality to manually discover resources. The user navigates to the corresponding view. He/she chooses the protocol (e.g., NFS, FTP, SFTP) that is necessary to connect to the resource, enters the address, and confirms. The user will receive a success message. If the connection was successful, the resource is visible in the list of resources.

T9 Detect Friend’s Device: A USB device of a friend of the related social network is connected to the main machine, under the precondition that the device has a resource label (same sequence of actions as T2 and T3). The device appears in the list of devices and the user can see the member name of the friend. The resources of the device appear in the list of resources and the user can see the member name of the friend.

T10 Attach resource label for provisioning: A personal storage resource should be provided for members of the community. An application that initiates this task uses the FlexiSource API to call a related method to start the resource labeling. The user will receive a success message. If the action was successful, the resource label makes the resource ready for provisioning, which is shown by related applications.

T11 Release provisioning resource: A personal storage resource was labeled to provide it for members of the community. An application that cancels the provisioning first calls a related method to check if the resource is utilized. The user will receive a success message. In the case that a resource contains data of other members, the application will receive the related information and show it to the user. The user consciously ignores this information and starts the release of the resource. If the action was successful, the removal of the resource label releases the resource from provisioning, which is shown by related applications.

8 Summary and Outlook

This thesis presented the concept for a system design that provides the functionality to support members of a social network in the management of resources within the personal network. The research topics resulted from the challenges to discover resources in local networks and short distance connections, the trustworthy assignment of resources for members of a social network, and the synchronization of the resources, including the contained data and software, with a central machine. A scenario with three use cases was presented in Section 1.2 that demonstrated how the members of a social network should utilize the resources of the personal network and interact with each other regarding the context of this thesis.

Chapter 2 introduced the basic mechanisms underlying the cloud computing paradigm. The most important foundations were the deployment model of the community cloud, the distinction of the personal cloud from other deployment models of the Cloud, and the presentation of privacy and security mechanisms for a trustworthy cloud computing.

In Chapter 3, the problem was analyzed in detail by discussing the dependence on other applications, which utilize resource information and information about the identity of community members. The analysis also included the behavior of community members utilizing their resources. Based on this analysis, the concrete requirements were formulated regarding the discovery of devices, resources, and data, the handling of trust metrics, and the synchronization of resources and data.

Chapter 4 provided an overview of the related state-of-the-art products, research projects, and technologies. Two products in the field of personal clouds were presented and analyzed regarding the employed technologies to share resources and data. The following section presented eight research projects in the field of personal, community, social, and home clouds. The research projects were examined to see what technologies and mechanisms are used to discover hardware and resource information. Then, a detailed description of tools and libraries that provide the discovery of resource information and are utilized in some of the examined projects was presented. The chapter concluded with an overview of the most important features of the presented products and projects.

The concept for the system design was presented in Chapter 5. The most important parts of the system design are the resource integration and resource management, as well as the data management and application programming interface. The resource integration presented an approach for the discovery of device and resource information by separating technologies and functionality and assigning them to functional units (plugins). The integration of the plugins depends on interface specifications (request and response schema) and a description file that contains information on the execution and parameter values of the plugins. Thus, the component can be extended with functionality that is implemented independently of the programming language. The resource management described the content of the resource plan, which contains the resource information of the discovered devices.

A synchronization of the resource plan is conducted if device and resource events occur or through the manual discovery of resources by the user. To identify resources of specific community members, the resource labeling was introduced. This procedure attaches a label to a specific resource that contains device, resource, and user information, and updates the resource plan with information on the ownership of a resource. The authenticity of the label can be verified based on the public key infrastructure. With this procedure, other applications are able to recognize that a resource has a unique owner. Furthermore, possible attack scenarios were discussed.

Chapter 6 discussed the prototypical implementation of the system design by first giving an overview of the program structure, followed by a detailed description of the implemented resource integration. Furthermore, the database for the gathered resource information was mentioned and the concrete implementation of the API was discussed. Finally, the extension of the prototype was discussed.

Finally, Chapter 7 discussed the evaluation of the requirements with regard to their fulfillment by the concepts of the system design. Furthermore, the test cases for the evaluation of the prototype were described.

8.1 Future Work

The focus of this thesis was on storage resources that should be discovered and managed. To also include information on computational resources of network machines, programs are necessary that deliver information on the CPU and memory of the resources. Because the concept of the resource integration provides for the integration of further plugins, a plugin can be integrated to communicate with programs on the resource’s machine to gather the information on computational resources. Monitoring tools, such as Hawkeye or Ganglia presented in Section 4.3, could be used for this purpose. A wrapper program has to be implemented that facilitates the communication of FlexiSource with the monitoring tool, and the description file of the plugins must be extended with information on how to start the wrapper and on its request parameters.

The description file for plugins contains no information about the response content of messages, with the consequence that the event handler of the resource management parses the content based on a fixed specification. To provide a more flexible solution, the response content could be described in the description file, which would allow the structure of response messages to be designed more individually. Under the precondition that there is a fixed vocabulary of possible response items, the response content could be modified without modifying the event handler.
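The idea of a response description with a fixed vocabulary can be illustrated by a small Python sketch. It assumes that the description file would list the order of response items for each plugin, for instance in a hypothetical attribute such as response. The vocabulary, the function name, and the “+” separator are assumptions made for illustration.

```python
# Hypothetical fixed vocabulary of response items that handlers understand.
VOCABULARY = {"plugin", "event", "protocol", "device_name", "address"}


def parse_response(message, fields, separator="+"):
    """Map the parts of a response message onto the described field order."""
    unknown = [field for field in fields if field not in VOCABULARY]
    if unknown:
        raise ValueError(f"fields outside the vocabulary: {unknown}")
    values = message.split(separator)
    if len(values) != len(fields):
        raise ValueError("message does not match the described structure")
    return dict(zip(fields, values))
```

With this approach, a plugin that sends “bluetooth+dimebag+001583482464” and one that sends “001583482464+dimebag+bluetooth” produce the same dictionary, as long as each declares its own field order. The event handler itself stays unchanged.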

The idea to connect information on negotiated service level agreements (SLAs) with the trust metrics was that members should be preferred in the utilization of storage resources if they have negotiated a specific agreement. For example, Bob provides a print service and has an agreement with Alice, who uses the print service. On the other side, Alice provides storage resources that Bob wants to utilize. If there are other community members that have the same desire as Bob, but no agreement with Alice, then Bob would be preferred. For the remote communication of community members, the solution would be to use repositories of SLAs on the member peers to check if agreements are negotiated between a requester and a provider. For the personal network with integrated foreign storage resources of other members, a similar solution is possible and is described as follows. If multiple resources of other members are detected, it is checked who uses the most negotiated services. The storage resource of the member who uses the most services is taken for utilization. Another solution is to attach information about SLAs to every resource label. But this would create an overhead of information and would not be necessary based on the considerations mentioned before. The problem with such a solution is how to keep the information about the negotiated SLAs up-to-date in an environment that is based on automated service provisioning and SLA negotiation.
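The selection rule for the personal network can be sketched as follows. The sketch assumes that the negotiated agreements are available as (member, service) pairs, where the member is the one using the service. This data layout and the function name are hypothetical.

```python
from collections import Counter


def preferred_member(detected_members, agreements):
    """Pick the detected member who uses the most negotiated services.

    agreements: iterable of (member, service) pairs, one per negotiated SLA.
    Ties are resolved in favor of the member detected first.
    """
    detected = list(detected_members)
    if not detected:
        return None
    counts = Counter(
        member for member, _service in agreements if member in detected
    )
    # max() keeps the first maximal element, so detection order breaks ties.
    return max(detected, key=lambda member: counts[member])
```

For example, if resources of Carol and Bob are detected and Bob uses two negotiated services while Carol uses one, the function returns Bob. The storage resource of that member would then be taken for utilization.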

When the information about the storage resources changes, the resource plan is synchronized with this information. Applications can receive resource information through the FlexiSource API and store it in their own database, even if these applications are executed on remote peers. The task of this thesis also states that data and application services should be synchronized. Assuming that data is provided through the discovered resources, an application can request the mount directory of resources and synchronize it with the main machine. If FlexiSource synchronizes the data, it has to verify specific signatures of a dataset to identify the owner. If the ownership is valid, the synchronization can be applied, using a specific directory of the main machine for this purpose. By contrast, the synchronization of application services with the main machine of the personal network requires execution resources, which were excluded in the requirements (Section 3.2). An extension would require that the devices provide directory services, which make it possible to discover the provided application services. Then, a synchronization with the directory service of the main machine can be applied.

9 Abbreviation

IaaS     Infrastructure-as-a-Service
PaaS     Platform-as-a-Service
SaaS     Software-as-a-Service
RaaS     Resources-as-a-Service
IDaaS    Identity-as-a-Service
GUI      Graphical User Interface
HPC      High-Performance Computing
URL      Uniform Resource Locator
SOA      Service-Oriented Architecture
API      Application Programming Interface
VM       Virtual Machine
QoS      Quality of Service
NIST     National Institute of Standards and Technology
VPN      Virtual Private Network
IPsec    Internet Protocol Security
SPML     Service Provisioning Markup Language
SSO      Single Sign-on
LDAP     Lightweight Directory Access Protocol
SAML     Security Assertion Markup Language
IAM      Identity and Access Management
NAS      Network-attached Storage
PII      Personally Identifiable Information
CDN      Content Delivery Network
CTO      Chief Technology Officer
XML      Extensible Markup Language
JSON     JavaScript Object Notation
DHT      Distributed Hash Table
IP       Internet Protocol
TCP      Transmission Control Protocol
UDP      User Datagram Protocol
SLP      Service Location Protocol
mSLP     mesh-enhanced Service Location Protocol
DNS      Domain Name System
mDNS     Multicast DNS
DNS-SD   DNS Service Discovery
XDR      External Data Representation
RRD      Round-Robin Database
CLP      Constraint Logic Programming
XMPP     Extensible Messaging and Presence Protocol
FTP      File Transfer Protocol
SFTP     Secure File Transfer Protocol
NFS      Network File System
CIFS     Common Internet File System
iSCSI    Internet Small Computer System Interface
iSNS     Internet Storage Name Service
SMB      Server Message Block
HDFS     Hadoop Distributed File System
PKI      Public Key Infrastructure
FUSE     Filesystem in Userspace

Bibliography

[AAC+10] Rocco Aversa, Marco Avvenuti, Antonio Cuomo, Beniamino Di Martino, Giuseppe Di Modica, Salvatore Distefano, Antonio Puliafito, Massimiliano Rak, Orazio Tomarchio, Alessio Vecchio, Salvatore Venticinque, and Umberto Villano. The Cloud@Home Project: Towards a New Enhanced Computing Paradigm. Proceedings of the conference on Parallel processing, 2010.

[AC11] David S. Allison and Miriam A.M. Capretz. Furthering the Growth of Cloud Computing by Providing Privacy as a Service. Information and Communication on Technology for the Fight against Global Warming, 2011.

[AKA10] A. Andrzejak, D. Kondo, and D.P. Anderson. Exploiting Non-Dedicated Resources for Cloud Computing. Network Operations and Management Symposium (NOMS), 2010.

[AKI08] Akitio introduces “personal cloud” computing with mycloud series. http://www.akitio.com/201105081291/2011/akitio-introduces-personal-cloud-computing-with-mycloud-series, 2008. [last access: 2011-12-10].

[AKS10] A. Andrzejak, D. Kondo, and Yi Sangho. Decision Model for Cloud Computing under SLA Constraints. IEEE International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS), 2010.

[ANZ] The anzere personal storage system. http://www.systems.ethz.ch/research/projects/anzere. [last access: 2012-02-27].

[App09] Unlocking the Promise of Cloud Computing for the Enterprise. Appistry, Inc., 2009.

[BEv+11] Vanderson Botêlho, Fabrício Enembreck, Bráulio Ávila, Hilton de Azevedo, and Edson Scalabrin. Using asymmetric keys in a certified trust model for multiagent systems. Expert Systems with Applications, 2011.

[BFSR08] João Paulo Barraca, Pedro Fernandes, Susana Sargento, and Rui Rocha. An Architecture for Community Mesh Networking. IEEE 19th International Symposium on Personal, Indoor and Mobile Radio Communications, 2008.

[BKNT11] Christian Baun, Marcel Kunze, Jens Nimis, and Stefan Tai. Cloud Computing: Web-Based Dynamic IT Services. Springer Berlin Heidelberg, 1 edition, 2011.

[BMA10] João Paulo Barraca, Alfredo Matos, and Rui L. Aguiar. User Centric Community Clouds. Wireless Personal Communications, 2010.

[BPV09] Rajkumar Buyya, Suraj Pandey, and Christian Vecchiola. Cloudbus Toolkit for Market-Oriented Cloud Computing. CloudCom ’09 Proceedings of the 1st International Conference on Cloud Computing, 2009.

[BS10] Fabrizio Baiardi and Daniele Sgandurra. Securing a Community Cloud. Proceedings of the IEEE 30th International Conference on Distributed Computing Systems Workshops, 2010.

[CAAS10] William Y. Chang, Hosame Abu-Amara, and Jessica Feng Sanford. Transforming Enterprise Cloud Services. Springer Science Business Media, 1 edition, 2010.

[CCRB10] Kyle Chard, Simon Caton, Omer Rana, and Kris Bubendorfer. Social Cloud: Cloud Computing in Social Networks. IEEE 3rd International Conference on Cloud Computing (CLOUD), 2010.

[CDPS09] Vincenzo D. Cunsolo, Salvatore Distefano, Antonio Puliafito, and Marco Scarpa. Volunteer Computing and Desktop Cloud: the Cloud@Home Paradigm. Eighth IEEE International Symposium on Network Computing and Applications, 2009.

[CDPS10] Vincenzo D. Cunsolo, Salvatore Distefano, Antonio Puliafito, and Marco Scarpa. From Volunteer to Cloud Computing: Cloud@Home. Proceedings of the 7th ACM international conference on Computing frontiers, 2010.

[Clo09] Security guidance for critical areas of focus in cloud computing v2.1. https://cloudsecurityalliance.org/csaguide.pdf, December 2009. [last access: 2011-12-05].

[CTVP10] Antonio Celesti, Francesco Tusa, Massimo Villari, and Antonio Puliafito. How to Enhance Cloud Architectures to Enable Cross-Federation. IEEE 3rd International Conference on Cloud Computing, 2010.

[CVKB11] Rodrigo N. Calheiros, Christian Vecchiola, Dileban Karunamoorthy, and Rajkumar Buyya. The Aneka platform and QoS-driven resource provisioning for elastic applications on hybrid Clouds. Future Generation Computer Systems, 2011.

[EHL+11] Tariq Ellahi, Benoit Hudzia, Hui Li, Maik A. Lindner, and Philip Robinson. Cloud Computing - Principles and Paradigms: 4. The Enterprise Cloud Computing Paradigm. John Wiley & Sons, 1 edition, 2011.

[ER11] Mohamed El-Refaey. Cloud Computing - Principles and Paradigms: 5. Virtual Machines Provisioning and Migration Services. John Wiley & Sons, 1 edition, 2011.

[Fie] Roy Thomas Fielding. Architectural styles and the design of network-based software architectures. http://www.ics.uci.edu/~fielding/pubs/dissertation/top.htm. [last access: 2011-11-21].

[Gan] Ganglia monitoring system. http://ganglia.sourceforge.net/. [last access: 2012-02-24].

[Goo] Google app engine. http://code.google.com/intl/de-DE/appengine/. [last access: 2011-10-29].

[Gru10] Brian Gruttadauria. The coming personal cloud: Cloud storage for the rest of us. http://download.iomega.com/com/launchkit/px4300r/personalcloud_wp.pdf, 2010. [last access: 2011-12-10].

[Haw] Hawkeye - a monitoring and management tool for distributed systems. http://research.cs.wisc.edu/condor/hawkeye/. [last access: 2012-02-24].

[IBM11] Ibm intelligent operations center for smarter cities, June 2011. [last access: 2011-12-06].

[Iom08] Iomega personal cloud. http://www.iomegacloud.com/landing_page.php, 2008. [last access: 2011-12-10].

[Iom09a] Das potenzial von vmware nutzen. http://download.iomega.com/com/eu/de/info/vmware_solutions_sheet_de.pdf, 2009. [last access: 2012-03-02].

[Iom09b] How to: Vmware esx server. http://download.iomega.com/com/nas/pdfs/howto_vmwareesx_ss.pdf, 2009. [last access: 2012-03-02].

[Iom11] Iomega iscsi setup guide. http://download.iomega.com/com/nas/pdfs/iscsi_wp_0211.pdf, 2011. [last access: 2012-03-02].

[ISC04] Internet small computer systems interface (iscsi). http://www.ietf.org/rfc/rfc3720.txt, 2004. [last access: 2012-03-02].

[KGS11] Sudarsun Kannan, Ada Gavrilovska, and Karsten Schwan. Cloud4Home: Enhancing Data Services with @Home Clouds. International Conference on Distributed Computing Systems, 2011.

[KV10] Ronald L. Krutz and Russell Dean Vines. Cloud Security - A Comprehensive Guide to Secure Cloud Computing. Wiley Publishing, Inc., 1 edition, 2010.

[LG10] Markus Lanthaler and Christian Gütl. Towards a RESTful Service Ecosystem - Perspectives and Challenges. 4th IEEE International Conference on Digital Ecosystems and Technologies, 2010.

[LMS+11] Ignacio M. Llorente, Rubén S. Montero, Borja Sotomayor, David Breitgand, Alessandro Maraschini, Eliezer Levy, and Benny Rochwerger. Cloud Computing - Principles and Paradigms: 6. On the Managements of Virtual Machines for Cloud Infrastructures. John Wiley & Sons, 1 edition, 2011.

[MB09] Alexandros Marinos and Gerard Briscoe. Community Cloud Computing. Lecture Notes in Computer Science, 2009.

[MG] Peter Mell and Tim Grance. The NIST definition of cloud computing. http://www.nist.gov/itl/cloud/upload/cloud-def-v15.pdf. [last access: 2011-11-21].

[Mur09] Alan Murphy. Regional cloud providers: Buy local with a “cloud franchise”. http://thevirtualdc.com/?p=156, July 2009. [last access: 2011-12-06].

[Neu09] Dirk Neumann. SOMAR - Final Activity Report. Information Society Technologies, 2009.

[Nub11] Nubisave - cloud storage controller. http://nubisave.org/, 2011. [last access: 2012-03-13].

[PB11] Siani Pearson and Azzedine Benameur. Privacy, Security and Trust Issues Arising from Cloud Computing. IEEE International Conference on Cloud Computing Technology and Science, 2011.

[PP06] KyoungSoo Park and Vivek S. Pai. CoMon: a mostly-scalable monitoring system for PlanetLab. ACM SIGOPS Operating Systems Review Volume 40 Issue 1, 2006.

[REL09] Bhaskar Prasad Rimal, Choi Eunmi, and Ian Lumb. A Taxonomy and Survey of Cloud Computing Systems. Fifth International Joint Conference on INC, IMS and IDC, 2009.

[Roq08] Celine Roque. Home cloud computing. http://www.theappgap.com/home-cloud-computing.html, August 2008. [last access: 2011-12-08].

[RR10] John W. Rittinghouse and James F. Ransome. Cloud Computing - Implementation, Management, and Security. Taylor and Francis Group, 1 edition, 2010.

[RYJ+11] Oriana Riva, Qin Yin, Dejan Juric, Ercan Ucan, and Timothy Roscoe. Policy Expressivity in the Anzere Personal Cloud. Proceedings of the 2nd ACM Symposium on Cloud Computing (SOCC), 2011.

[Sal] trust.salesforce.com - global privacy law landscape. https://trust.salesforce.com/trust/privacy/global-privacy/. [last access: 2011-10-31].

[Sha02] Richard Sharpe. Just what is smb? http://www.samba.org/cifs/docs/what-is-smb.html, 2002. [last access: 2012-03-02].

[Sma] Smarter government - from the local town council to international collaborations, new ways of working are underway. http://www.ibm.com/smarterplanet/us/en/government/nextsteps/solution/B235026Q03312B40.html. [last access: 2011-12-06].

[SOA] Soap version 1.2 part 1: Messaging framework (second edition). http://www.w3.org/TR/soap12-part1/. [last access: 2011-11-21].

[Spe10] Backgrounder network-attached-storage (nas). http://www.speicherguide.de/storage-hardware/nas-systeme/backgrounder-network-attached-storage-nas-9995.aspx, August 2010. [last access: 2011-12-07].

[Sto08] Jon Stokes. Syncing vs. saving, and the case for a home storage cloud. http://arstechnica.com/old/content/2008/06/syncing-vs-saving-and-the-case-for-a-home-storage-cloud.ars, 2008. [last access: 2011-12-08].

[Sub10] Krishnan Subramanian. Scaleup offers regional cloud storage. http://www.cloudave.com/192/scaleup-offers-regional-cloud-storage/, July 2010. [last access: 2011-12-06].

[Ton08] What is tonido? http://www.tonido.com/software_what.html, 2008. [last access: 2011-12-10].

[TSH11] Yuan Tian, Biao Song, and Eui-Nam Huh. Towards the Development of Personal Cloud Computing for Mobile Thin-Clients. Information Science and Applications (ICISA), 2011.

[UKA] Fumitoshi UKAI. Linux man page - hotplug(8). http://linux.die.net/man/8/hotplug. [last access: 2012-02-24].

[VBB11] William Voorsluys, James Broberg, and Rajkumar Buyya. Cloud Computing - Principles and Paradigms: 1. Introduction to Cloud Computing. John Wiley & Sons, 1 edition, 2011.

[Wes11a] How is homecloud realized? - leveraging a new network appliance and innovative software to create a cloud solution in the home. http://www.homecloud.com/pdf/HomeCloud_Realization_White_Paper.pdf, 2011. [last access: 2011-12-08].

[Wes11b] What is homecloud? - the benefits of an in-home cloud computing solution for today’s digital consumer. http://www.homecloud.com/pdf/HomeCloud_Introduction_White_Paper.pdf, 2011. [last access: 2011-12-08].

[Win] Windows azure platform. http://www.microsoft.com/de-de/azure/. [last access: 2011-10-29].

[WN10] Xu Baomin Wang Ning, Xu De. Collaborative Integration and Management of Community Information in the Cloud. International Conference on E-Business and E-Government, 2010.

[WTGB10] Neal H. Walfield, Paul T. Stanton, John Linwood Griffin, and Randal Burns. Practical Protection for Personal Storage in the Cloud. EUROSEC, 2010.

[ZS03] Weibin Zhao and Henning Schulzrinne. Mesh-enhanced Service Location Protocol (mSLP). The Internet Society, 2003.

List of Figures

1.1 Scenario

2.1 Convergence of various advances leading to the advent of cloud computing [VBB11]
2.2 Cloud computing reference model [BPV09]
2.3 Community Cloud [MB09]
2.4 Services in the Community Cloud [BMA10]
2.5 Hybrid Cloud [KV10]

3.1 Layered view of the integration of resources and data into FlexCloud

4.1 The Anzere system architecture with overlay sensors [RYJ+11]
4.2 WIP-SD as module for the community management in [BFSR08]
4.3 Social Cloud Architecture [CCRB10]
4.4 Basic architecture of the Cloud@Home system [CDPS10]
4.5 Resource Monitoring in Cloud4Home [KGS11]

5.1 The global process of the resource provisioning in the personal cloud
5.2 Notifications of functional units
5.3 Description of functional units
5.4 Information discovery by the resource manager
5.5 Public Key Infrastructure (PKI)
5.6 The most important actions for the resource labeling
5.7 The Resources-as-a-Service system architecture (focus: storage resources)

List of Tables

2.1 Cloud computing services classification [BPV09]
2.2 Characteristics of personal and home clouds

3.1 Dependence on applications of FlexCloud
3.2 Overview of all requirements

4.1 Features of tools and libraries that retrieve hardware and deliver resource information
4.2 The most important features of the presented products and projects

5.1 Request methods of functional units for the remote device and resource discovery
5.2 Response content of functional units for the remote device and resource discovery
5.3 Request methods of functional units for the local device and resource discovery
5.4 Response content of functional units for the local device and resource discovery

A Service Discovery Protocols

TECHNISCHE UNIVERSITÄT DRESDEN

FAKULTÄT INFORMATIK INSTITUT FÜR SYSTEMARCHITEKTUR PROFESSUR FÜR RECHNERNETZE PROF. DR. RER. NAT. HABIL DR. H. C. ALEXANDER SCHILL

Survey on the Discovery of Devices in Mobile Scenarios

Stephan Zepezauer

Dresden, 21 July 2011


Contents

1 Introduction

2 Related Work

3 Device Classes

4 Determining the Communication Protocols for the Individual Device Classes
4.1 UPnP
4.2 SLP
4.3 Zeroconf
4.4 JXTA
4.5 Jini
4.6 Salutation
4.7 HAVi
4.8 Bluetooth
4.9 Device Classes and Protocols

5 Description of the Communication Protocols
5.1 UPnP
5.2 SLP
5.3 Zeroconf
5.4 JXTA
5.5 Jini
5.6 Salutation
5.7 HAVi
5.8 Bluetooth

Bibliography


1 Introduction

In mobile scenarios, various devices are available for exchanging user-specific data. One such scenario is a meeting of friends who want to exchange digital pictures. Robert, Anne and Alice meet at Bob's place, who wants to show his friends his latest holiday pictures. The other friends naturally want to show their pictures as well and bring their mobile devices along. Robert has stored his pictures on his mobile phone. Anne can access her pictures with her smartphone, and Alice brings her laptop. Using an application installed on these devices, all four friends can access each other's pictures without any complicated network configuration. Pictures are accessed either by selecting a category to which certain pictures are semantically assigned (pull) or by sending pictures to a friend (push). Furthermore, not all pictures of the others can be browsed, only those that have been shared with a particular person or group. While browsing Anne's pictures, Bob discovers one that he would like to hang on his wall and tells Anne. Since Bob owns a printer, Anne can print the picture from her smartphone.

For the scenario just described and for similar scenarios, Chapter 3 introduces the device classes needed to exchange, print and display data. These device classes were then examined to determine which communication protocols each of them supports. The results of this examination are presented in Chapter 4. Chapter 5 describes the individual protocols identified.


2 Related Work

Webinos is an EU-funded project that aims to enable communication between web applications on different end devices (mobile devices, PCs, home media devices, in-car devices). A demo version (see footnote 1) showed how web pages can detect various devices and services by means of JavaScript and plugins. The following implementations of communication protocols were considered: Avahi for multicast DNS, OpenSLP for SLP and BlueZ for Bluetooth. In addition, the Simple Service Discovery Protocol (SSDP), which is part of the UPnP standard, was used.

The Amigo project (see footnote 2) develops a middleware that supports communication among heterogeneous end devices and establishes interoperability between services and devices. In a home network, end devices such as household appliances (refrigerator, lighting system), multimedia players and renderers, and mobile devices (mobile phone, PDA) are thus meant to work together. Communication can also take place between different home networks. Service discovery is based on UPnP, SLP and WS-Discovery. Interaction between the services is enabled by the SOAP and RMI protocols.

The project The European Application Home Alliance (TEAHA) dealt with the networking of devices in a home network, above all AV devices and household appliances, together with their control devices, and ran from 2004 to 2007 (see footnote 3). Drivers were integrated that allow various end devices to be connected. Proxies bind particular protocols to particular technologies. In this way, UPnP-capable devices, for example, can be integrated via IP or the European Home System (EHS). [SDC+06]

DomoNet is a framework that supports the integration, interoperability and control of devices in a home network. Services are described in XML and controlled through web services. Different communication protocols (UPnP, Jini, HAVi, X10 and others) are integrated by managers, so that devices based on different protocols can be integrated. A prototype implementation exists; it is limited to virtual devices and has the drawback that devices supporting several protocols are displayed several times. [MRA10]

For networking LANs on the basis of UPnP, [CKC08] presents an approach in which users of a campus information system receive context-sensitive information or can control devices such as printers. Gateways that process UPnP messages are intended to make devices in other LANs controllable as well.

1 http://www.w3.org/2011/04/discovery.html
2 http://www.hitech-projects.com/euprojects/amigo/
3 http://cordis.europa.eu/search/index.cfm?fuseaction=proj.document&PJ_RCN=7871342

[SSAP09] presents a research approach in which the devices of a UPnP network can be controlled depending on context. User profiles are stored on a server. The devices evaluate these profiles to check whether a user is authorized to use a device. Because UPnP provides no mechanisms for exchanging user and context information, this approach extends the UPnP standard with authentication and authorization mechanisms.

RELATE is a system for the ad hoc networking of mobile devices. It uses ultrasonic sensors to locate other peers and to exchange data [HKG+05]. The user interface displays the relative position of the devices as icons. Files are exchanged by drag and drop, by moving them onto the displayed device icons. [GFG+09]

A home network based on JXTA that connects PCs, PDAs, cell phones, TVs, refrigerators and so on is presented in [JAKP08]. The devices exchange context information so that a search for a service or a piece of information yields a context-dependent answer. In one example scenario, a user can search for photos that belong to a particular group or context and are distributed across different devices.


3 Device Classes

This chapter lists the device classes that are examined with regard to which protocols they support for discovering other devices in the immediate vicinity and the services of those devices.

Device classes:
• Computer
• Laptop
• Mobile phone
• Smartphone
• Printer
• AV devices

From various sources (research papers, the Internet), which are referenced in the following text, the most widely used communication protocols for all device classes were determined. These are UPnP, SLP, Zeroconf, JXTA, Jini, Salutation, HAVi and Bluetooth. To map protocols to device classes and thereby determine which protocols are possible for a device class, the next chapter examines these protocols more closely.


4 Determining the Communication Protocols for the Individual Device Classes

This chapter examines the protocols named above with regard to the device classes on which they can be used. The last section of the chapter summarizes the results in a table and shows which protocols a device class generally supports.

4.1 UPnP

UPnP is supported by most devices that allow IP-based communication. The UPnP SDK for Linux (see footnote 1), for example, can be tested on desktop PCs and laptops. UPnP support is available for the Nokia N80 mobile phone and other phones of this series (see footnotes 2 and 3). With the free Twonky Mobile app, an Android smartphone can be used as a UPnP streaming server (see footnote 4). Printers and AV devices can also be included in the communication. For AV devices, the UPnP AV protocol exists (see footnote 5).

4.2 SLP

OpenSLP is an open-source implementation of SLP that is available for Linux operating systems. SLP can therefore be run on devices such as desktop PCs and laptops. Hewlett-Packard and Xerox use SLP in their printers (see footnote 6). jSLP is a Java implementation of SLP that is used, among others, on servers, desktop PCs and smartphones (Nokia 9300i) (see footnote 7).

4.3 Zeroconf

Zeroconf can be operated in IP networks and is therefore suitable for desktop PCs and laptops. Avahi is a Zeroconf implementation that can also be used to find printers (see footnote 8). Bonjour is Apple's Zeroconf implementation. With the JmDNS Java library, iTunes (on a PC or Mac) can be controlled from an iPhone (smartphone). The advertisement of the service on the iPhone was carried out with Avahi (see footnote 9).

1 http://sourceforge.net/projects/upnp/
2 http://www.netzwelt.de/news/72871_3-dvb-h-upnp-neuen-nokia-handys.html
3 http://www.allaboutsymbian.com/features/item/Setting_Up_uPnP_on_Your_Nseries_Smartphone.php
4 http://blog.buerstinghaus.net/twonky-mobile-android-als-upnp-server/l
5 http://upnp.org/resources/whitepapers/UPnP Device Management White Paper_2011.pdf
6 http://www.openslp.org/doc/html/IntroductionToSLP/index.html
7 http://jslp.sourceforge.net/jSLP/index.html
8 http://avahi.org/

4.4 JXTA

According to the vendor, JXTA allows any device belonging to a P2P network to communicate with the others (see footnote 10). Mobile devices that support MIDP 2.0 can take part in the communication using the JXTA Micro Edition (see footnote 11). MIDP 2.0 supports, among other things, sound playback, streaming video and HTTPS. This technology also makes it possible for a mobile phone to communicate with a printer (see footnote 12).

4.5 Jini

The Jini Starter Kit (see footnote 13) is available for the most common operating systems and is therefore suitable for use on desktop PCs and laptops. Printers, displays and disks can also be integrated [LH02]. Jini applications can be developed with J2ME technology and can thus be used by applications on mobile devices that support J2ME (see footnote 14).

4.6 Salutation

Very few sources can be found on reference implementations of the Salutation standard. The Salutation Application Reference Model from 1997 was intended to run on the Windows 95 and Windows NT operating systems and to support software developers in developing applications (see footnote 15). The architecture also includes services such as printers and scanners [YA04].

4.7 HAVi

There are different types of devices that support HAVi. HAVi devices that comprise the full HAVi functionality are called Full AV devices and can be set-top boxes, home gateways or PCs. Control devices do not even need to contain HAVi software components. It is sufficient for these devices to be able to process Java bytecode in order to control other devices. [Tei]

9 http://dacp.jsharkey.org/
10 http://jxta.kenai.com/
11 http://weblogs.java.net/blog/2008/02/02/new-jxta-micro-edition-cldcmidp-20
12 http://www.java.net/blogs/tra/
13 http://www.jini.org/wiki/Category:Jini_Starter_Kit
14 http://entwickler.de/zonen/portale/psecom,id,102,buch,127,.html
15 http://www.thefreelibrary.com/Salutation+Consortium+to+Produce+Reference+Implementation.-a019945408


4.8 Bluetooth

Programming libraries for Bluetooth are available in Python, C (see footnote 16) and Java (see footnote 17). Bluetooth is an industry standard that is integrated into many devices or can be used via additional adapters. Mobile devices can thus, for example, exchange data with each other or communicate with PCs and laptops. Audio and TV devices that support the Bluetooth protocol are also available.

4.9 Device Classes and Protocols

Now that the protocols have been examined with regard to the device classes they support, a table can be compiled that shows which protocols are supported by a device class (see Table 4.1).

Computer Laptop Smartphone Mobile phone Printer AV devices
UPnP X X X X X X
SLP X X X X
Zeroconf X X X X
JXTA X X X X
Jini X X X X
Salutation X X X
HAVi X X X
Bluetooth X X X X X X

Table 4.1: Device classes and protocols

16 http://people.csail.mit.edu/albert/bluez-intro/
17 http://www.javabluetooth.com/development_kits.html


5 Description of the Communication Protocols

This chapter presents the protocols for communicating with different end devices in somewhat more detail. It describes the network properties of the protocols and shows on what basis devices and services are discovered. This is meant to provide a basis for selecting particular protocols for the technical realization of the scenario.

5.1 UPnP

Universal Plug and Play (UPnP) is used in home and small business networks, where it is meant to enable the exchange of data and control information between different PCs and mobile end devices. According to the vendor, UPnP technology is independent of the underlying operating system, the programming language and the network technology (??). The network technologies used are the IP-based technologies TCP and UDP [upn02]. Ethernet, Wi-Fi and IEEE 1394, for example, are supported (see footnote 1). UPnP does not use a directory service for finding services. [ZMN05]

Devices and services are discovered with the Simple Service Discovery Protocol (SSDP). To search for devices or to advertise them, HTTP messages are sent over multicast UDP. If a device matches the search criteria, a message containing the URL of the device description is sent to the searching device over unicast UDP [upn02].
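The SSDP search described above can be sketched in a few lines of Python. The sketch builds the multicast search message, parses a unicast reply and extracts the URL of the device description. The helper names, the sample reply and its address are made up for illustration; `discover` needs a real network and is therefore not exercised here.

```python
import socket

SSDP_ADDR = ("239.255.255.250", 1900)  # well-known SSDP multicast group and port


def build_msearch(search_target: str = "ssdp:all", mx: int = 2) -> bytes:
    """Build the HTTP-over-UDP search request that is sent to the multicast group."""
    lines = [
        "M-SEARCH * HTTP/1.1",
        f"HOST: {SSDP_ADDR[0]}:{SSDP_ADDR[1]}",
        'MAN: "ssdp:discover"',
        f"MX: {mx}",           # maximum number of seconds a device may delay its reply
        f"ST: {search_target}",  # search target, i.e. the search criteria
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_response(datagram: bytes) -> dict[str, str]:
    """Return the headers of a successful unicast reply, or {} for anything else."""
    text = datagram.decode("ascii", errors="replace")
    head, _, _ = text.partition("\r\n\r\n")
    status, *header_lines = head.split("\r\n")
    if not status.startswith("HTTP/1.1 200"):
        return {}
    headers = {}
    for line in header_lines:
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().upper()] = value.strip()
    return headers


def discover(timeout: float = 2.0) -> list[dict[str, str]]:
    """Send one search request and collect replies until the timeout expires."""
    found = []
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_msearch(), SSDP_ADDR)
        try:
            while True:
                data, _addr = sock.recvfrom(65507)
                headers = parse_response(data)
                if headers:
                    found.append(headers)
        except socket.timeout:
            pass
    return found


# Made-up reply of a device; LOCATION carries the URL of the device description.
SAMPLE_RESPONSE = (
    b"HTTP/1.1 200 OK\r\n"
    b"CACHE-CONTROL: max-age=1800\r\n"
    b"LOCATION: http://192.168.1.20:8080/description.xml\r\n"
    b"ST: upnp:rootdevice\r\n"
    b"USN: uuid:example-device::upnp:rootdevice\r\n"
    b"\r\n"
)
```

Calling `parse_response(SAMPLE_RESPONSE)["LOCATION"]` yields the description URL that a control point would fetch next.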

5.2 SLP

The Service Location Protocol (SLP) supports the discovery, selection and use of network devices in an IP-based network. Scalability was a key property in its design [FDW+]. Messages are exchanged over UDP or TCP (see footnote 2). A directory service for finding services can optionally be used with SLP [ZMN05].

A client (User Agent) looking for a particular service sends a multicast request over the network. If the agent of a service recognizes that it is responsible for the requested service, it sends a unicast reply to the client. To find all offered services, a client uses the technique "convergence multicast", in which only the services not yet known to the client reply [FDW+]. This reduces network traffic. The advertisement of services by their agents is supported [ZMN05].

1 http://upnp.org/about/what-is-upnp/
2 http://www.ietf.org/rfc/rfc2608.txt - Section 6.1/6.2
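The convergence idea can be illustrated with a small in-memory simulation. It does not use the network or a real SLP library. The class and function names, the addresses and the `replies_per_round` parameter (which models that not every reply reaches the client in a single round) are assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ServiceAgent:
    address: str
    service_type: str
    url: str

    def handle(self, service_type: str, previous_responders: tuple) -> Optional[str]:
        """Reply with the service URL unless the client already knows this agent."""
        if service_type != self.service_type or self.address in previous_responders:
            return None
        return self.url


def discover(agents, service_type, replies_per_round=2, max_rounds=10):
    """Repeat the multicast request until a round brings no new reply."""
    previous_responders = []
    urls = []
    for _ in range(max_rounds):
        snapshot = tuple(previous_responders)  # list carried in this round's request
        new_replies = 0
        for agent in agents:
            if new_replies == replies_per_round:  # remaining replies are "lost"
                break
            url = agent.handle(service_type, snapshot)
            if url is not None:
                previous_responders.append(agent.address)
                urls.append(url)
                new_replies += 1
        if new_replies == 0:
            break
    return urls


# Hypothetical agents: three printers and one scanner.
AGENTS = [
    ServiceAgent("10.0.0.1", "service:printer", "service:printer://10.0.0.1"),
    ServiceAgent("10.0.0.2", "service:printer", "service:printer://10.0.0.2"),
    ServiceAgent("10.0.0.3", "service:printer", "service:printer://10.0.0.3"),
    ServiceAgent("10.0.0.4", "service:scanner", "service:scanner://10.0.0.4"),
]
```

With these agents, a search for `service:printer` takes three rounds: two printers reply in the first round, the third printer in the second round, and the third round stays silent because every printer is already in the list of previous responders.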

5.3 Zeroconf

Zeroconf is an IP-based technology with which the devices of a network can announce themselves and search for other devices (advertisement, discovery). With Zeroconf, IP addresses can be assigned without a DHCP server, and names can be mapped to IP addresses by means of multicast DNS (see footnote 3) without a DNS server. The Zeroconf layered architecture allows individual layers to be exchanged, so an addressing protocol other than the Zeroconf one can be used [zer07]. Services can be found with DNS Service Discovery without the need for a directory service (see footnote 4).

Services can be advertised by using mDNS and DNS service records. It is possible to search for specific services or for services of a particular type. [zer07]
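How a search by service type works with DNS Service Discovery can be sketched with a handful of in-memory records. In a real network the records would be answered over multicast DNS; here a dictionary stands in for the responders, and the printer instance, host name and address are invented for the example.

```python
# Hypothetical records as they might be announced by one printer on the link.
RECORDS = {
    ("_ipp._tcp.local.", "PTR"): ["Office Printer._ipp._tcp.local."],
    ("Office Printer._ipp._tcp.local.", "SRV"): [("printer.local.", 631)],
    ("Office Printer._ipp._tcp.local.", "TXT"): [{"rp": "printers/office"}],
    ("printer.local.", "A"): ["169.254.10.7"],
}


def query(name: str, rtype: str) -> list:
    """Stand-in for an mDNS query: return all records of the given name and type."""
    return RECORDS.get((name, rtype), [])


def browse(service_type: str) -> list:
    """Browsing a service type means asking for its PTR records (instance names)."""
    return query(service_type, "PTR")


def resolve(instance: str) -> dict:
    """Resolve an instance via SRV (host, port), TXT (attributes) and A (address)."""
    host, port = query(instance, "SRV")[0]
    address = query(host, "A")[0]
    txt = query(instance, "TXT")
    return {
        "host": host,
        "address": address,
        "port": port,
        "txt": txt[0] if txt else {},
    }
```

Browsing `_ipp._tcp.local.` returns the instance name, and resolving that instance yields the host, address and port a client needs to contact the printer, without any directory service being involved.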

5.4 JXTA

JXTA is a technology with which different end devices can communicate and collaborate in a P2P network. Mobile devices, TV sets, refrigerators and so on can also take part in the communication. Three kinds of peers are responsible for message exchange. The edge peer sends messages from one peer to another. The rendezvous peer provides information about other peers. The relay peer allows mobile devices and desktop PCs to exchange messages. [JAKP08]

5.5 Jini

Jini technology is a service-oriented architecture that defines a programming model which extends Java technology and is used to implement distributed systems, including adaptive network systems (see footnote 5). Communication between client and service provider is enabled by a directory service (lookup service) [LH02].

3 http://www.multicastdns.org/
4 http://www.ietf.org/rfc/rfc2608.txt - Sections 6.1/6.2
5 http://www.jini.org/wiki/Main_Page

A service is registered as follows: the service provider searches for a remote lookup service by means of a multicast request. Once one has been found, the provider can register a service object and the service attributes. The service object contains the Java methods of the service that a client can invoke. A client can ask a lookup service for a copy of the service object based on the type or the attributes of the desired service. Through this copy, the client can communicate directly with the service provider ([LH02]). [ZMN05] mentions that advertisements are also possible.
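The register and lookup steps can be sketched with a small in-memory registry. This is a language-neutral illustration in Python, not the Jini API: the names `ServiceItem`, `LookupService`, `register`, and `lookup` are hypothetical, multicast discovery of the lookup service is left out, and matching here requires the type plus any requested attributes.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class ServiceItem:
    """Hypothetical registry entry: a callable proxy plus descriptive attributes."""
    service_type: str
    proxy: Callable[..., object]
    attributes: Dict[str, str] = field(default_factory=dict)


class LookupService:
    """In-memory stand-in for a lookup service (not the Jini API)."""

    def __init__(self) -> None:
        self._items: List[ServiceItem] = []

    def register(self, item: ServiceItem) -> None:
        """Store the service object and attributes offered by a provider."""
        self._items.append(item)

    def lookup(self, service_type: str,
               attributes: Optional[Dict[str, str]] = None) -> Optional[ServiceItem]:
        """Return the first item matching the type and all requested attributes."""
        wanted = attributes or {}
        for item in self._items:
            if item.service_type != service_type:
                continue
            if all(item.attributes.get(k) == v for k, v in wanted.items()):
                return item
        return None


# Provider side: register a service object together with its attributes.
registry = LookupService()
registry.register(ServiceItem("Printer", lambda doc: f"printed {doc}",
                              {"location": "study"}))

# Client side: look the service up, then call it through the returned proxy.
match = registry.lookup("Printer", {"location": "study"})
result = match.proxy("report.pdf") if match else None
```

After the lookup, the client talks to the provider only through the returned proxy; the registry is no longer involved in the call.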

5.6 Salutation

Salutation is a technology that lets different end devices cooperate while aiming for independence from the processor, the operating system, and the communication protocol in use ([LH02]). Development of Salutation began in 1995 and was carried out by the Salutation Consortium ([ALX+07]). The Salutation Consortium was dissolved in 2005 (see footnote 6).

From a technical point of view, Salutation is a decentralized system whose main component is the Salutation Manager (SLM). Every device implements a local SLM that provides information about local services and acts as a service broker. Different SLMs can locate one another and are independent of the transport medium, whose functionality is provided by Transport Managers. ([LH02], [ALX+07])

The SLMs communicate with each other to find services, using the Salutation Manager Protocol, which relies on Sun's ONC RPC for remote service calls. The cooperation of the SLMs is based on distributed managers and is conceptually the same approach as Jini's lookup service. ([ALX+07])

5.7 HAVi

Home Audio Video Interoperability (HAVi) is an interface for controlling AV devices. HAVi is based on the IEEE 1394 (FireWire) interconnect, which makes it possible to transmit multimedia streams and to change configurations at runtime. Through a local registry, each service offers software components that are responsible, among other things, for controlling devices, streaming multimedia files, and reserving resources. ([FDW+])

To locate a service in the home network, a client device has to communicate with the so-called HAVi Registry Processes, instances of which are installed on the AV controllers of the home network. ([FDW+])

5.8 Bluetooth

With Bluetooth, different devices can be coupled over a radio link. If a device wants to connect to another device whose hardware address is unknown, it sends an inquiry message followed by a page message. The inquiry message is omitted if the hardware address of the other device is known. After sending an inquiry message, the device waits for a response. If one arrives, the page message can be sent directly to the corresponding device [EY02]. Bluetooth also supports asynchronous communication [MG11].

6 http://web.archive.org/web/20050627074915/http://www.salutation.org/
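The Bluetooth connection setup with inquiry and page messages can be sketched as a small decision function. It only models the order of messages; the function name `connection_messages`, the message strings, and the `inquiry_response` parameter (a stand-in for whatever the radio layer would report) are assumptions made for illustration.

```python
from typing import List, Optional


def connection_messages(known_address: Optional[str],
                        inquiry_response: Optional[str] = None) -> List[str]:
    """Return the messages a device sends to reach a peer, in order.

    known_address: hardware address of the peer if already known, else None.
    inquiry_response: address reported by a responding peer (hypothetical
    input standing in for the radio layer); None means nobody answered.
    """
    if known_address is not None:
        # The inquiry step is skipped when the address is already known.
        return [f"PAGE {known_address}"]
    messages = ["INQUIRY"]
    if inquiry_response is not None:
        # A response reveals the peer's address, so it can be paged directly.
        messages.append(f"PAGE {inquiry_response}")
    return messages
```

Calling `connection_messages(None, "00:11:22:33:44:55")` yields the inquiry followed by the page, while passing the address as `known_address` yields the page alone.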


Bibliography

[ALX+07] AHMED, Reaz ; LIMAM, Noura ; XIAO, Jin ; IRAQI, Youssef ; BOUTABA, Raouf: Resource and Service Discovery in Large-Scale Multi-Domain Networks. IEEE Communications Surveys and Tutorials, 2007

[CKC08] CHEN, Wally ; KUO, Sy-Yen ; CHAO, Han-Chieh: Service integration with UPnP agent for an ubiquitous home environment. Information Systems Frontiers, 2008

[EY02] ERASALA, Naveen ; YEN, David C.: Bluetooth technology: a strategic analysis of its role in global 3G wireless communication era. Computer Standards and Interfaces, 2002

[FDW+] FRIDAY, Adrian ; DAVIES, Nigel ; WALLBANK, Nat ; CATTERALL, Elaine ; PINK, Stephen: Supporting Service Discovery, Querying and Interaction in Ubiquitous Computing Environments. Wireless Networks - Special issue: Pervasive computing and communications - Volume 10 Issue 6

[GFG+09] GELLERSEN, Hans ; FISCHER, Carl ; GUINARD, Dominique ; GOSTNER, Roswitha ; KORTUEM, Gerd ; KRAY, Christian ; RUKZIO, Enrico ; STRENG, Sara: Supporting device discovery and spontaneous interaction with spatial references. Personal and Ubiquitous Computing, 2009

[HKG+05] HAZAS, Mike ; KRAY, Christian ; GELLERSEN, Hans ; AGBOTA, Henoc ; KORTUEM, Gerd ; KROHN, Albert: A Relative Positioning System for Co-located Mobile Devices. Proceedings of the 3rd international conference on Mobile systems, applications, and services, 2005

[JAKP08] JUNG, Hyosook ; AHN, Jinhyun ; KIM, Heejin ; PARK, Seongbin: A JXTA-based system for Smart Home. Proceedings of MoMM, 2008

[LH02] LEE, Choonhwa ; HELAL, Sumi: Protocols for Service Discovery in Dynamic and Mobile Networks. International Journal of Computer Research, 2002

[MG11] MOVAHHEDINIA, Naser ; GHAHFAROKHI, Behrouz S.: Performance analysis of Bluetooth asynchronous connection-less service. Journal of Network and Computer Applications, 2011

[MRA10] MIORI, Vittorio ; RUSSO, Dario ; ALIBERTI, Massimo: Domotic technologies incompatibility becomes user transparent. Communications of the ACM, 2010

[SDC+06] SCHOLTEN, Hans ; VAN DIJK, Hylke ; COCK, Danny D. ; PRENEEL, Bart ; D’HOOGE, Michel ; KUNG, Antonio: Secure Service Discovery in Home Networks. Consumer Electronics, 2006


[SSAP09] SALES, Thiago ; SALES, Leandro ; ALMEIDA, Hyggo ; PERKUSICH, Angelo: Enabling User Authentication and Authorization to Support Context-Aware UPnP Applications. WebMedia 2009 Proceedings of the XV Brazilian Symposium on Multimedia and the Web, 2009

[Tei] TEIRIKANGAS, Jussi: HAVi: Home Audio Video Interoperability. Helsinki University of Technology

[upn02] Programming Guide - Intel SDK for UPnP Devices Version 1.2.1. Intel, 2002

[YA04] YUAN, Yuan ; AGRAWALA, Ashok: A Secure Service Discovery Protocol for MANET. Personal, Indoor and Mobile Radio Communications, 14th IEEE Proceedings, 2004

[zer07] Zero Configuration Networking with Symbian OS. Novel Interactions Ltd, 2007

[ZMN05] ZHU, Feng ; MUTKA, Matt W. ; NI, Lionel M.: Service Discovery in Pervasive Computing Environments. IEEE CS and IEEE ComSoc, 2005

B Scenario-based Evaluation

Test Cases (Part 1)

Choose a value between 1 and 5, if the behavior allows such a classification. Choose yes or no, if the behavior is binary.

Rating columns: 1-5, yes/no

T1 Enable auto-discovery
   action: enable USB auto-discovery
   behavior: success information

T2 Device visibility
   action: connect USB device
   behavior: automatic device list update

T3 Resource visibility
   action: list of devices; double-click new USB device
   behavior: shows device properties and resource information

T4 Resource visibility
   action: list of resources
   behavior: shows capacity information and device information

T5 Enable semi-automatic discovery
   action: enable Bluetooth; activate scan
   behavior: show list of new Bluetooth devices

T6 Connect to device
   action: double-click new Bluetooth device; enter pin-code and confirm
   behavior: success information; update device list

T7 Modify resource information
   action: list of resources; resource of Bluetooth device; modify capacity values
   behavior: success information

T8 Manual discovery
   action: manual discovery view; choose protocol, enter address, confirm
   behavior: success information; update resource list

Test Cases (Part 2)

Choose a value between 1 and 5, if the behavior allows such a classification. Choose yes or no, if the behavior is binary.

Rating columns: 1-5, yes/no

T9 Detect Friend’s Device
   action: connect USB device
   behavior: automatic resource list update; shows owner of the resource

T10 Attach resource label for provisioning
   action: initialize the creation of the resource label
   behavior: success information; shows resource that is labeled

T11 Release provisioning resource
   action: check resource for utilization
   behavior: shows utilization information
   action: release resource from utilization
   behavior: success information; shows released resource