The Network Software book 2011

Fredrik Abbors, Thomas Forss, Alonso Gragera, Petri Heinonen, Nico Hållfast, Niclas Jern, Miikka Kaarto, Tony Karlsson, Md. Nazmul Haque Khan, Jessica Laukkanen, Andres Ledesma, Sushil Pandey, Alexander Pchelintsev, Joacim Päivärinne, Fredrik Rantala, Haider Raza, Gema Román, Sumreen Mohsin Saleemi, Björn Sjölund, Kasper Välimäki, Frank Wickström, Maria Yanchuk, Guopeng Yu, Carlo Zambon

Preface

In this collection we have gathered the papers written by the participants in the Network Software course held during January–May 2011 at Åbo Akademi University, Department of Information Technologies. The students had to choose and research a topic related to the course theme, write a paper about it, get the paper reviewed by their colleagues, and present the paper in class.

This year’s course was a great success, with 24 students finishing the course: six Computer Science students, ten Computer Technology students, three Information Systems students and five exchange students. The range of topics was very broad, from social networks and virtual worlds to home networking, private smart spaces, cloud and grid computing, fault tolerance, performance testing, security, context awareness, ZigBee systems, IPv6, Host Identity Protocol, location privacy, load balancing, RIA architectures, Wi-Fi location awareness, energy-awareness, CRC, microchip implantations in animals and, last but not least, public key cryptography (RSA) and quantum cryptography.

The order of the papers in this collection is not thematic; instead, we have kept the order in which the students registered to work on their papers.

Enjoy the papers!

Luigia Petre and Petter Sandvik
Turku, June 20, 2011

http://www.users.abo.fi/lpetre/netsoft11/

Table of Contents

Nico Hållfast ...... Grid Computing versus Cloud Computing
Maria Yanchuk ...... Home Networking: the Smart House Concept
Alonso Gragera ...... Quantum Cryptography on Computer Networks
Sushil Pandey ...... Security Issues in Wireless Communications
Tony Karlsson ...... Fault Tolerance Methods for Ethernet Networks
Gema Román ...... The Networking Behind
Carlo Zambon ...... Public-key Cryptography and the RSA Algorithm
Fredrik Abbors ...... Software Performance Testing in the Cloud
Frank Wickström ...... The ZigBee Technology and its Applicability
Sumreen Mohsin Saleemi ...... Personal Smart Spaces: Future Vision and Services
Fredrik Rantala ...... CRC - Cyclic Redundancy Checking
Alexander Pchelintsev ...... IPv6
Jessica Laukkanen ...... Context-Aware Systems
Thomas Forss ...... Load Balancing
Andres Ledesma ...... Virtual Worlds
Petri Heinonen ...... Introduction to Rich Internet Application Architectures
Md. Nazmul Haque Khan ...... An Overview of Network Security and Cryptography
Miikka Kaarto ...... Security in Cloud Computing
Björn Sjölund ...... Indoor Location Aware Computing Using Wi-Fi Networks
Niclas Jern ...... Location Privacy
Joacim Päivärinne ...... Online Social Networks
Haider Raza ...... Host Identity Protocol (HIP)
Guopeng Yu ...... Energy-Aware Networking
Kasper Välimäki ...... Microchip Implantations in Animals

Grid Computing versus Cloud Computing

Nico Hållfast
Network Software, Åbo Akademi, Turku, Finland
[email protected]

Abstract—Grid computing is a technology which can provide large computing resources over a network. A technology very similar to grid computing, cloud computing, is also able to provide the end-user with a vast amount of computing resources. Due to the rising popularity of cloud computing, its definition has become very vague; therefore, a clear distinction between grid computing and cloud computing may be hard to find. In this paper we provide the reader with clear definitions of grid computing and cloud computing, as well as provide examples of when to use each of these computing paradigms. We compare grid and cloud computing with each other in order to uncover a better understanding of the advantages and disadvantages of these technologies.

Index Terms—grid computing, cloud computing, SaaS, distributed computing

I. INTRODUCTION

Cloud computing has been a buzzword in the IT community for a few years. It has been said to change the way we use computers, since it lets the user control great amounts of computing power without the user having to invest in the hardware itself [1]. Another distributed computing paradigm, grid computing, also lets the user control almost supercomputer-like computing power [2], [3]. Due to the rising popularity of cloud computing, its definition has been hard to pin down. Cloud computing has in some instances been mixed up with grid computing; in some discussions, cloud computing and grid computing have even been claimed to be the same computing paradigm [1].

In this paper we give clear definitions of both grid computing and cloud computing. We also compare the two computing paradigms in order to create a better understanding of what grid or cloud computing truly is. We achieve this by first giving definitions of grid computing and cloud computing, in Sections II and III. In Section IV we focus on comparing the two computing paradigms: we compare the architecture and features of the two, as well as security, software development and practical applications. In the concluding section, Section V, we discuss our findings.

II. WHAT IS GRID COMPUTING?

We understand the word grid to be a network of horizontal and vertical bars, sometimes used as a base to build something on, and sometimes used to connect several things to each other. Grid computing can then be understood as having a close relation to this everyday meaning of the word grid; indeed, grid computing is sometimes simply referred to as the Grid [4].

Grid computing has its origins in the USA in the early 1990s, when efforts were made to connect several supercomputers together to provide testbeds for scientific use. These testbeds were not yet defined as being a grid; instead, these interconnected supercomputers were called metasystems and metacomputers [5]. The National Technology Grid, founded in 1997, was the first computer system that was defined as being a grid of computers. Its founders, the National Science Foundation, thought of the grid as being “a name derived from the notion of the electrical power grid” [5].

In 1998 the Global Grid Forum was founded with the goal of creating open standards for grids [6]. The Global Grid Forum also defined a computational grid as “a hardware and software infrastructure that provides dependable, consistent, pervasive, and inexpensive access to high-end computational capabilities”. This definition was based on a three-point checklist given by Foster [4]. This checklist proposed three characteristics that should be achieved by a grid:

1) A grid coordinates resources that are not subject to centralized control.
2) A grid uses standard, open, general-purpose protocols and interfaces.
3) A grid delivers nontrivial qualities of service.

The checklist given by Foster has since been updated [5] in order to provide a more comprehensive grid definition: “A grid can be defined as a large-scale geographically distributed hardware and software infrastructure composed of heterogeneous networked resources owned and shared by multiple administrative organizations which are coordinated to provide transparent, dependable, pervasive and consistent computing support to a wide range of applications. These applications can perform either distributed computing, high throughput computing, on-demand computing, data-intensive computing, collaborative computing or multimedia computing.”

Grids are typically used for executing a job. A job is defined as being a program that is executed at an appropriate point on the grid. The jobs may compute something, execute one or more system commands, move or collect data, or operate machinery [3]. Depending on the job and grid type, there may be a limit to how many jobs may be run concurrently on the grid. In this case a scheduler is used to control the job execution [3].

III. WHAT IS CLOUD COMPUTING?

Cloud computing, sometimes referred to as the Cloud, consists of software that is simultaneously run on several computing units and provided to an end-user via a network, usually the Internet [1]. Cloud computing also includes the hardware and software systems which offer the software services [1]. The services can be divided into two different layers: Software as a Service (SaaS) and utility computing, as illustrated in Figure 1. Utility computing is sometimes further divided into two sub-layers, Platform as a Service (PaaS) and Infrastructure as a Service (IaaS); in this paper we use the former division into two layers, since it sometimes may be difficult to decide whether a service belongs to PaaS or IaaS [7].

Figure 1. The different layers of cloud computing. SaaS includes the software that is provided to the end user, while utility computing includes the hardware that runs the software.

IV. COMPARISON

In this section we compare cloud computing and grid computing from several perspectives. In Section IV.A we focus on the architecture of the two computing paradigms, in Section IV.B we study their security aspects, in Section IV.C we discuss software development, and in Section IV.D we outline their practical applications.

A. Architecture and features

As described in Section II, one of the first grid computing platforms was designed by connecting several supercomputers together, in order to form one big metacomputer. Therefore, in the most basic terms, a grid is made of several computers, or nodes, connected together to form one large computing unit, as illustrated in Figure 2. The connected nodes may all be running different operating systems and have different CPUs and memory; therefore grids are referred to as being heterogeneous systems [3].

Grids can be divided into two basic architectural types: computational grids and data grids [3]. The main application of computational grids is to use the processing power of available nodes to process large-scale jobs. While computational grids are used for aggregating resources, data grids focus on providing secure access to distributed pools of data [3]. A user may therefore access several distributed databases via one single virtual database. In addition to this classification, both computational grids and data grids may further be classed as being intragrids, extragrids or intergrids [3]: intragrids are used within one single organization, extragrids between several organizations, and intergrids are several grids connected to each other. Rather than being formed and used by private persons, grids are primarily used by one or several organizations to satisfy their computational needs [3].

Core features of cloud computing are scalability, agility, multitenancy and cost-efficiency [1]. We discuss these features in more detail in the following section.
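As a toy illustration of the job scheduling mentioned in Section II, the following Python sketch pairs queued jobs with nodes that report themselves free, as in a grid run in scavenging mode. All names here are hypothetical; real grid middleware such as the Globus Toolkit is far more elaborate.

```python
from collections import deque

class Scheduler:
    """Toy grid scheduler: nodes report in when idle (scavenging mode),
    and queued jobs are handed out first-come, first-served."""

    def __init__(self):
        self.pending = deque()      # jobs waiting for a node
        self.free_nodes = deque()   # nodes that reported themselves idle
        self.assignments = []       # (job, node) pairs, in assignment order

    def submit(self, job):
        self.pending.append(job)
        self._dispatch()

    def node_free(self, node):
        # In scavenging mode the node itself reports its availability.
        self.free_nodes.append(node)
        self._dispatch()

    def _dispatch(self):
        # Assign jobs as long as both a job and a free node exist.
        while self.pending and self.free_nodes:
            self.assignments.append(
                (self.pending.popleft(), self.free_nodes.popleft()))

sched = Scheduler()
sched.submit("analyse-data")
sched.submit("move-results")
sched.node_free("node-A")   # "analyse-data" is assigned to node-A
sched.node_free("node-B")   # "move-results" is assigned to node-B
print(sched.assignments)    # -> [('analyse-data', 'node-A'), ('move-results', 'node-B')]
```

The sketch also shows why a concurrency limit is easy to enforce at this point: the scheduler is the single place where jobs meet resources.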

Figure 2. A basic computational grid. [8]

The control server, or scheduler, displayed in Figure 2 has the responsibility of delegating incoming jobs to free nodes in the grid [3]. In basic computational grids, the control server may partake in the execution of the jobs; in more advanced grids, however, the control server is solely dedicated to job scheduling. In grids configured in scavenging mode, the free nodes themselves report to the scheduler when they are available, after which the scheduler may assign a suitable job to them [3].

In Section III we explained that cloud computing services can be divided into two different layers: SaaS and utility computing. The architecture of the cloud itself is also divided into two components: front end and back end. The front end is regarded as being the applications used to access the cloud itself, whilst the back end is the fabric of the cloud (e.g. databases) [7]. The fabric of the cloud may be implemented much like the hardware of a grid, with several interconnected computers with access to common databases [9]. Much like grids, clouds can be classified as private clouds (access to the cloud is restricted, for example to members of an organization), public clouds (can be accessed by anybody) or hybrid clouds (a combination of the two; essentially a public cloud, but some resources from private clouds can also be accessed by certain users) [1].

Cloud computing offers the users scalable services [1]. This means that a user of some cloud computing service may easily scale up the resources that are available to be used for his or her service. Since clouds are scalable, they are able to provide a seemingly limitless amount of resources to the user. Even though some grids are considered scalable, scalability is not considered a core feature of grid computing [4]. Since cloud computing offers scalable services, it is also very cost-efficient: the user only pays for the computing resources he or she has used [1], whereas in grid computing the user has to invest in advance in the resources he or she will have available for use.

An important feature of cloud computing is virtualization. Virtualization provides the user with an abstraction of the available computing resources, i.e. it gives the user the illusion of interacting directly with the hardware of the cloud [10]. Virtualization together with scalability makes clouds more flexible compared to grids.

B. Security

In this section we focus on comparing how grids and clouds handle security. The area of data security is vital for both grid and cloud computing, since they both handle great amounts of data in their computations [1], [3]. Since the subject of security is very broad, we focus on how grids and clouds handle data security, considering two areas: authentication and data storage. It should be noted that many standards exist on how, for example, authentication can be handled within grids and clouds; in this paper we present only a few of these standards.

With authentication we understand the process of verifying the validity of a claimed individual and identifying who he or she is. Authentication is not limited to human beings, as authentication may be done for services and applications alike [3]. With security of data storage we understand the fact that stored data may not be viewed, altered or deleted by an unauthorized user.

Within grid computing, authentication can be handled by using digital certificates and asymmetric cryptography. This is, for example, used by the Globus Toolkit, an open source toolkit for building grids [11]. The base for authentication within the toolkit is the certificate authority, which is used as a link between the hosts willing to communicate with each other.
When host A wishes to communicate with host B, they both exchange certificates that have been signed by the certificate authority and check whether the other party is really who they claim to be. Secure Sockets Layer (SSL) is used to provide security in the communication between the two hosts [11].

Authentication in cloud computing can be done in the same fashion as in grid computing. For example, if a user wishes to access a web application deployed on a cloud infrastructure, the web server will send the browser a certificate signed by a certificate authority. The browser will then check the validity of the certificate and whether it came from a trusted source [12]. This check provides the authenticity of the web server; for user authenticity, the user may be asked to fill in a username and password.

The user bases of grids and clouds differ from one another. Grids are mainly used by organizations to solve large jobs, whereas clouds have a varying user base [1], [3]. Since clouds may have both individuals and organizations as users, the issue of data storage in clouds is very important [1]. This is very different from grids. The organizations using the grids may have agreements on how data is handled for them; for example, the organizations may all have local databases which the grid then may access (see Figure 3). In such cases the responsibility of data storage is largely moved from the grid to the organizations themselves [3].

Figure 3. An extragrid where each organization has its own local database (marked as being orange in the figure). [3]

In clouds, data is stored only within the cloud; this means that no data is stored locally at the user [1]. This issue of data locality is key in providing secure data storage within the clouds. Since the user does not have immediate control over the data stored in the cloud, the user cannot say for certain that the data is correctly stored there [1]. To solve this issue, the cloud may provide continuous backups of data stored in the cloud; for example, the Amazon Elastic Compute Cloud (Amazon EC2) provides its users with free backups of the user’s relational databases [13]. In addition to this, data stored in the cloud can be encrypted in order to provide confidentiality of the stored data [1].

C. Software development

In this section we discuss and compare software development practices for grid and cloud computing. This section, together with the next section on practical applications, gives an insight into what grids and clouds can be used for, and how software development can be done for them. It should again be noted that the software development models and techniques we present in this section do not represent all the available models and techniques, but merely a small subset of them.

Software development for grids does not differ radically from traditional parallel programming. The user must, however, take into account the heterogeneity of the resources, the stability of the resources, and the fact that resources may join and leave the grid during runtime [10]. One approach for developing parallel programs for grids is to use the Message-Passing Interface (MPI) [10]. MPI addresses the message-passing parallel programming model, in which “data is moved from the address space of one process to that of another process through cooperative operations on each process” [14]. MPICH-G2 is a specific MPI implementation to be used in grid computing; for example, MPICH-G2 provides integration with the Globus Toolkit. MPICH-G2 is suited for usage in virtually any grid, since it allows heterogeneous grid components to communicate and interact [10].

Software development for clouds is closely dependent on the architecture of the cloud. For example, Amazon EC2 offers the ability to run almost any operating system on the user’s cloud instance [13]. Therefore the user may develop software in virtually any programming language. The drawback of this freedom of choice lies in developing scalable software. Some cloud platforms, such as Google App Engine (a platform for developing web applications), offer automatic scaling features which make applications scalable without user effort. The choice of programming languages that may be used for Google App Engine is considerably narrower than for Amazon EC2: software developers for Google App Engine only have the choice of either Java (or any language that is compatible with the Java Virtual Machine) or Python [15].

Software developed for one cloud platform may not necessarily run on other cloud platforms. For example, a web application developed to run on Google App Engine may not be portable to an Amazon EC2 instance [16]. In such cases, the cloud user is said to experience a data lock-in or vendor lock-in [1]. To solve this issue, standardized APIs for clouds are being developed, so that users could easily switch from one cloud to another [17], [18].
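The message-passing model that MPI standardizes — data moved from one process’s address space to another’s through cooperative send and receive operations — can be illustrated without MPI itself. The sketch below uses Python’s standard multiprocessing module rather than an actual MPI implementation such as MPICH-G2, so it only demonstrates the model, not the grid-aware features of MPICH-G2.

```python
from multiprocessing import Process, Pipe

def worker(conn):
    # Cooperative operation on the receiving side: block until the
    # peer's send() has moved the data into this process's address space.
    data = conn.recv()
    conn.send([x * x for x in data])  # send a result back the same way
    conn.close()

if __name__ == "__main__":
    parent_conn, child_conn = Pipe()
    p = Process(target=worker, args=(child_conn,))
    p.start()
    parent_conn.send([1, 2, 3])   # cooperative send on this side
    print(parent_conn.recv())     # -> [1, 4, 9]
    p.join()
```

The same pair of operations (send on one side, receive on the other) is the basic building block of MPI point-to-point communication; MPI adds collective operations, process groups, and, in MPICH-G2’s case, transport across heterogeneous grid nodes.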

D. Practical applications

In this section we give a short insight into what grid computing and cloud computing may be used for. We also discuss some known cloud and grid applications.

Grid computing is often used by large organizations to solve different jobs. Because large grids may be as powerful as conventional supercomputers [2], [19], scientific organizations have adopted grid computing for calculating solutions to scientific, large-scale problems. One such problem is tackled by the SETI@Home project, in which a grid is formed of the computers of volunteer users. This grid is then used to analyse radio signals and to determine whether these signals include signs of extraterrestrial life. The grid is run in scavenging mode, i.e. each volunteer computer sends out a message when it has free resources at its disposal for the project [20].

Cloud computing suits the needs of hosting websites or web applications, since applications deployed on clouds may scale up or down rapidly (see Figure 4) [1]. If a website is hosted on a cloud platform, it is able to scale up and down to meet the demand for resources illustrated in Figure 4. In addition to serving all of the requested pages, the website owner also only has to pay for the resources in use [1].

Figure 4. A) If a website is hosted with the anticipation of peak load, a lot of resources are wasted. B) In underprovisioning, a lot of pageviews are lost. [1]

V. CONCLUSIONS

Now that we have presented a comparison of grid computing and cloud computing, we sum up our findings. In this section we also present a table with the key differences between grid computing and cloud computing (see Table I).

Table I. Comparison results

                        Grid Computing                          Cloud Computing
Architecture            Computational grids, data grids         SaaS, utility computing
Hardware                Interconnected computers                Not clearly defined (may be grids)
Features                Great computing power;                  Scalable; pay by usage;
                        pay in advance; jobs                    applications/services
Security                Certificates,                           Certificates, asymmetric cryptography;
                        asymmetric cryptography                 dependent on browser security
Software development    Traditional parallel programming        Platform specific
Practical applications  Data-intensive computing                Web application hosting
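The provisioning trade-off in Figure 4 can be made concrete with a small back-of-the-envelope calculation (all numbers hypothetical): a site provisioned for its peak load pays for peak capacity around the clock, while a pay-per-use cloud deployment pays only for the capacity actually used each hour.

```python
def provisioned_cost(peak_servers, price_cents_per_server_hour, hours):
    # Fixed provisioning: pay for peak capacity every single hour.
    return peak_servers * price_cents_per_server_hour * hours

def pay_per_use_cost(load_per_hour, price_cents_per_server_hour):
    # Cloud-style billing: pay only for the servers used in each hour.
    return sum(load_per_hour) * price_cents_per_server_hour

# Hypothetical daily load profile: quiet at night, a short peak of 100 servers.
load = [10] * 8 + [40] * 8 + [100] * 2 + [40] * 6   # 24 hourly samples
price = 10  # price per server-hour, in cents (hypothetical)

fixed = provisioned_cost(max(load), price, hours=len(load))  # 100 * 10 * 24 = 24000 cents
elastic = pay_per_use_cost(load, price)                      # 840 * 10 = 8400 cents
print(fixed, elastic)  # -> 24000 8400
```

With this made-up profile, peak provisioning costs roughly three times as much as pay-per-use billing, which is exactly the waste labelled A) in Figure 4; underprovisioning (case B) would instead drop the 100-server peak hours.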

As we can see from Table I, even though we at first saw no clear differences in the architecture of the two computing paradigms, there exist differences between grid computing and cloud computing. The differences are greatest in features, software development and practical applications. In this paper we have given definitions of both grid computing and cloud computing, and we have also provided a comparison of the two computing paradigms. According to our comparison, grid computing suits the needs of high-throughput and data-intensive computing, as grids are able to provide the user with almost supercomputer-like computing power. Cloud computing, on the other hand, suits the needs of hosting web applications and websites, as clouds are scalable and offer an economical advantage over traditional hosting services.

REFERENCES

[1] M. Armbrust, A. Fox, R. Griffith, and A. D. Joseph, “Above the clouds: A Berkeley view of cloud computing,” Electrical Engineering and Computer Sciences, University of California at Berkeley, Tech. Rep., 2009.
[2] Top500, “Top500 list - November 2010 (1-100),” http://www.top500.org/list/2010/11/100, Retrieved: February 2011.
[3] B. Jacob, M. Brown, K. Fukui, and N. Trivedi, Introduction to Grid Computing. IBM Corp., 2005.
[4] I. Foster, “What is the grid? A three point checklist,” Argonne National Laboratory and University of Chicago, Tech. Rep., 2002.
[5] M. Bote-Lorenzo, Y. Dimitriadis, and E. Gomez-Sanchez, “Grid characteristics and uses: A grid definition,” School of Telecommunications Engineering, University of Valladolid, Tech. Rep., 2004.
[6] OpenGridForum, “Overview,” http://www.gridforum.org/About/abt_overview.php, Retrieved: February 2011.
[7] M. Armbrust, A. Fox, R. Griffith, and A. D. Joseph, “A view of cloud computing,” Communications of the ACM, vol. 53, no. 4, April 2010.
[8] DataMiningAndExploration, “Grid computing,” http://dame.dsf.unina.it/grid_comp.html, Retrieved: February 2011.
[9] R. Buyya, C. Yeo, S. Venugopal, J. Broberg, and I. Brandic, “Cloud computing and emerging IT platforms: Vision, hype, and reality for delivering computing as the 5th utility,” Grid Computing and Distributed Systems (GRIDS) Laboratory, Department of Computer Science and Software Engineering, The University of Melbourne, Tech. Rep., 2008.
[10] I. Foster, Y. Zhao, I. Raicu, and S. Lu, “Cloud computing and grid computing 360-degree compared,” Department of Computer Science, University of Chicago, Tech. Rep., 2009.
[11] Globus, “Toolkit,” http://www.globus.org/toolkit/, Retrieved: February 2011.
[12] M. Jensen, J. Schwenk, N. Gruschka, and L. L. Iacono, “On technical security issues in cloud computing,” Horst Görtz Institute for IT Security, Ruhr University Bochum and NEC Laboratories Europe, Tech. Rep., 2009.
[13] AmazonWebServices, “Amazon relational database service (Amazon RDS),” http://aws.amazon.com/rds/, Retrieved: February 2011.
[14] MPI: A Message-Passing Interface Standard, http://www.mpi-forum.org/docs/mpi-2.2/mpi22-report.pdf.
[15] Google, “Google App Engine,” http://code.google.com/appengine/, Retrieved: February 2011.
[16] C. Ecker, “Analysis: Google App Engine alluring, will be hard to escape,” http://arstechnica.com/old/content/2008/04/analysis-google-app-engine-alluring-will-be-hard-to-escape.ars, Retrieved: February 2011.
[17] TheOpenCloudConsortium, “The Open Cloud Consortium,” http://opencloudconsortium.org/, Retrieved: February 2011.
[18] CloudStandards, “Major standards development organizations collaborate to further adoption of cloud standards,” http://cloud-standards.org/wiki/index.php?title=Press_Release, Retrieved: February 2011.
[19] BoincStats, “SETI@home stats,” http://boincstats.com/stats/project_graph.php?pr=sah, Retrieved: February 2011.
[20] SETI@Home, “The science of SETI@home,” http://setiathome.berkeley.edu/sah_about.php, Retrieved: February 2011.

HOME NETWORKING: THE SMART HOUSE CONCEPT

Maria Yanchuk

Abstract – Home Networking is a rather specific and still not very widely spread way of using technologies in daily life. However, smart houses are becoming popular due to technological developments and our accelerated lifestyles. In this paper, we present an overview of the smart house concept. We discuss definitions, describe the applicability of home automation, and examine the most common software and hardware technologies used for its implementations. We also cover device interconnection technologies, control and automation network technologies and data network technologies, thus offering the reader a general perspective on home networking.

I. INTRODUCTION

The field of Home Automation is also referred to as Smart Houses, Intelligent Homes or Domotique (Domotics). In this paper, we study the ideas behind these terms, the features that make the systems seem “smart”, as well as the techniques used to reach a high level of automation in daily life. We also take a look at the numerous benefits provided by the installation of intelligent home technology and discuss future development directions of the domotiques. As our topic is rather broad, we focus on the main definitions, concepts and techniques that make the fantastic dreams of people about ‘living’ and ‘thinking’ houses come true.

We proceed as follows. In Section II we overview historical aspects of the development of home automation and home networking and give the main definitions related to the subject. In Section III we discuss control methods used in the field of home automation. Then we proceed to the consideration of the technologies involved in providing the proper functioning of home networks. Next we make an overview of the benefits of the usage of domotiques, and in the last section we discuss the future perspectives of home automation.

II. HISTORICAL OVERVIEW AND DEFINITIONS

Domotics refers to the automation of the home, of the work around the home, and of the household activities. It translates to an automatic control over the functions within the home [2]. To better understand home automation, we first take a brief look at the main steps of its historical development.

The first electrified and automated houses were demonstrated at the World Fairs in Chicago (1934) and New York (1939, 1964-65). With the invention of the microcontroller in 1971, the costs of electronic control fell significantly, opening new perspectives for adopting automation control technologies in industry and manufacturing. The term ‘home automation’ itself and the first ideas of a complete home control system were proposed in 1978 by the Japanese companies Hitachi and Matsushita. In the 1980s, several European and US companies (e.g. General Information Systems, Home Automation Limited, Honeywell Control Systems) [1] concentrated on the development of control systems that included such functions as lighting, heating and air-conditioning control, energy saving and home security handling. In the beginning of the 21st century, home automation systems became able to control a household via Windows media technologies, with wireless control and access through mobile phones and PDAs; touch screen control panels were also introduced. Nowadays, one of the most crucial directions of development in the Domotics sphere is the development of highly capable components that take into account the behavior and habits of the home’s inhabitants, e.g. remembering the preferred levels of heating and lighting. Home automation for elderly or disabled people is also of great concern; many companies now concentrate on this area, creating products such as smart floors detecting moving persons, tracking systems, etc.

To fully understand the meaning of the term ‘home automation’, we sum up its main functions below:
• Heating, Ventilation and Air Conditioning (HVAC), i.e., temperature and humidity control
• Lighting control system (switching lights on/off, changing the ambient color, etc.)
• Audio and video control (switching, distribution)
• Security (simulation of presence, camera control, intrusion detection, etc.)
• Intercoms (communication via loudspeaker and microphone between rooms)
• Robotics (control of home robots)
• Caregiver systems control (aimed at taking care of the elderly and disabled)
• Pet feeding, plant watering, etc.

In the following we proceed to the overview of the techniques that are used in performing all these functions.

III. CONTROL METHODS

Home networking is a rather complicated system, built out of a considerable number of elements that must be controlled in order for the system to function properly. There are two ‘dimensions’ of

controlling: one is performed by the user of the system and the other by the centralized control of the system elements, via networking. Here we discuss them both.

Human control is carried out in several ways. The most common methods are described below. There are two kinds of remote control, namely from inside and from outside.

Remote control from inside is done either via a single remote for all automated items at home or via a panel on the wall (possibly a touch-screen). Infrared technologies and Bluetooth are commonly used in providing the functionality of the remote. One of the most recently introduced types of inside remote control is carried out with voice commands or certain actions (e.g. claps); this method requires the involvement of special technologies such as speech recognition. The major disadvantage of control via a remote is that the user has to be within eyeshot of the controlled device in the infrared case, or within up to 100 meters in the case of Bluetooth.

Outside remote control involves the use of a phone (the user interacts with the system by pressing appropriate sequences of numbers on the phone dial) or the Internet (via special kinds of applications). This control method has significant benefits, such as the possibility to monitor from a distance the behavior of the automated devices at home and to change their working schedule if needed.

Programmable control can be performed by the user or by the system itself. The user can create a schedule that the system will follow in its functioning. In its turn, the system can store the patterns of the user's behavior (switching lights, opening windows and curtains, etc.) and apply them according to the current situation.

The components of a home automation system need to be connected in a certain network that handles a centralized control of their activity. This requires a special bus and protocol that provides proper communication between the isolated parts of the domotic system, including such functions as message exchange between the devices and the centralized controller, rules for message routing, as well as message processing and queuing. There are several network technologies used in home networking, but the most widespread are Ethernet and Wi-Fi. Among the numerous protocols that can be used are X10, Universal Powerline Bus, ZigBee, EnOcean and EHS.

IV. TECHNOLOGIES

There is a huge number of technologies used in home networking, out of which we consider the most popular ones. Before investigating them, we start with a brief definition of the protocol functionality in the smart home context. The job of a protocol in the field of domotics is to provide the communication between all the devices included in a network. The protocol makes them speak the same language [2] and fulfill the tasks the user wants them to do.

The most well-known and frequently used protocol technology in the field of domotics is X10 [3]. This technology is more than 30 years old. It provides the communication between devices via electrical wires, sending data signals properly mixed with the existing ones; for this reason, the protocol is quite affordable. X10 uses the existing wires, so there is no need to rewire (or even reconstruct) one's house. Affordability is also manifested in the installation of X10: one simply has to plug the transmitter in at one location in the house and it will send signals to the receiver plugged in at another location [4]. X10 supports more than 100 types of compatible products, including ones from IBM, JDS, ACT and HomePro. Each product can be assigned one of 256 addresses using simple buttons and dials. If we need to operate some of the devices simultaneously, we simply assign them to the same address (e.g., if we want different lights to switch at the same time). Such simplicity is one of the key points in the popularity of X10.

To understand how X10 operates the devices, we describe the digital signals of X10 in more detail. These digital signals include the address of the device and a simple command or status like “on”, “off”, the light brightness level, the temperature level and some others. Each signal is a package that consists of a 4-bit house code (a number given to every house at the setup of the system; unique for every house), a 4-bit unit code (a number given to every device at the moment of including it in the system; unique within a certain system) and a 4-bit command [3]. For operational convenience, the house code is represented by a letter from A to P and the unit code by a number from 1 to 16. We thus get 256 possible addresses for the technical devices of the smart house. For instance, assume that a certain device is assigned the code C15. Then, the message turning it off in X10 is “select C15” “turn off”.

As with every technology, X10 also has certain weak points. One of these is the noise in the power line that can be generated by some appliances (mostly those with motors, such as vacuum cleaners and dryers). This problem can be solved by plugging the device into a noise filter (e.g. FilterLinc) [4]. Another weak point is the difference in phases at the transmitter and at the receiver; this can be overcome via a plug-in phase coupler (e.g. SignaLinc) [4].

The main competitor of X10 in the field of home automation is the Universal Powerline Bus (UPB). UPB is an extremely reliable and inexpensive solution for private and commercial powerline applications. Its reliability, defined as the percentage of correctly operating transmitter/receiver pairs upon installation, is over 99.9%, in comparison with figures of 70%-80% for X10. The speed of UPB's data transmission is 20 to 40 times the speed of X10 and is equivalent to over 10 full commands per second. The UPB addressing scheme has more than 64000 addresses, thus much more than the 256 addresses of X10 [7]. UPB also allows peer-to-peer communication, so no central controller is needed for simple point-to-point or group control. UPB is quite affordable and easy to install, and its maintenance does not require additional wiring. The main disadvantage is that the number of devices compatible with UPB is significantly smaller than the number that support X10.

Another technology under consideration is that provided by Ethernet-based systems, widely applicable in home automation. They provide a hard-wired connection between a

central controller and peripheral components (slave devices, human interface components, e.g., a touch panel) of the smart house system, using IEEE standard cabling. The core components are plugged into the home Ethernet data network and use it to interconnect. Usage of the hard-wired network provides a high level of reliability as well as good performance. The connection with the Internet gives such important advantages as the opportunity to check and change the states of the devices online and the ability to upgrade software or firmware directly from the manufacturer's website [6]. The major negative sides consist in the limited range of supported devices (fewer than in the case of X10) and the hard-wire installation costs, which can be rather high.

Another approach to home automation is its implementation via wireless systems. The communication between the core controller and the external components is provided by radio frequency. The most well-known wireless control systems are Insteon for lighting control, Z-Wave for lighting and climate control, and Control4 for wholly automated systems [4]. Radio frequency technology makes these systems reliable and fast. Another big advantage is that there is no need for extra wiring. As a weak point, this technology has limited support for sub-systems in case they do not support IEEE or other industry standards. Interference from other appliances (such as microwaves) can influence the reliability of the radio frequency communications.

In order to be operated from various electronic devices such as a PC, a PDA or a mobile phone, these hardware modules need proper software components. These programs control the devices via certain interfaces. In the following we briefly survey them. Since X10 is the most widespread technology, we outline the interfaces related to it.

The most well-known X10 interfaces are CM11A, CM15A, CM17A, CM19A, Cosmic House, PowerLinc Serial, LynX-10 PLC and USB PowerLinc. All the X10 interfaces support the functionality listed below:
• Operating with basic X10 commands such as ‘ALL UNITS OFF’, ‘ALL LIGHTS ON’, ‘ON’, ‘OFF’, ‘BRIGHT’, ‘DIM’, ‘STATUS REQUEST’ and some others.
• Receiving X10 activity: the interface receives commands from X10 devices and generates a proper response.
• Tracking device status: the interface monitors the devices’ statuses and performs certain actions in case they are needed.
• Supervisor mode that runs a control program: in this mode a control program is uploaded and the system acts automatically according to it.
• SmartHome's Primary Address Programming and Scene Address Programming: there is an opportunity to write a program addressing the devices in the system by their codes. The actions can be combined in scenes that can be uploaded by a single click through the usage of their own codes (unique for every single scene within a system).

There are also functions that are specific to concrete interfaces. For instance, CM11A supports setting the time, downloading the status of any or all of the controllable devices and polling them, while the home interface of Cosmic House supports extended dimmer operation and the uploading of a large control program (not just a limited number of actions).

Further, we consider some of the numerous pieces of software that help the owner control the functioning of the house. As X10 is the most popular technology used in home automation, the number of software products operating in the X10 context is significantly large. Here, we discuss some of them.

One of the representative software products is ActiveHome Pro, which provides the user with a PC program for operating the modules. The main common feature of domotics operating software is a simple user-friendly interface that does not require any programming skills; ActiveHome Pro is one of that kind. Modules are organized by rooms and operating them is done by just clicking and using drag-and-drop actions. ActiveHome Pro provides the user with functionalities such as:
• Lifestyle mode. By turning this mode on, the system remembers the user's routines. It records the user's activity and repeats it every 24 hours; e.g., if the user turned the lights off at 11 pm, the next time the system will do it itself at the same time. If the user changes the activities, the system will also change the remembered pattern automatically.
• Creating macros and events. Using simple drag-and-drop actions the user can create certain sets of operations that run simultaneously with just one click. These can be macros attached to routine events such as ‘coming home’, ‘going to bed’, ‘watching movies’ and many others. Hence, the user only needs to press the button of the ‘wake up’ event that determines the light to turn on and the coffee machine to start brewing coffee.
• Distant control from any computer or smartphone that has an Internet connection.
• ActiveHome Pro SDK. ActiveHome Pro also provides the users with a special software development kit that gives them an opportunity to create their own applications to operate the automated devices in their home. Any programming language from C++ to JavaScript can be used to extend the standard functionality of ActiveHome Pro.

Another relevant software technology that works with different protocols, including X10, is the Event Control System (ECS). ECS is considered to be one of the most powerful software tools in the field of home automation. Let us recollect the features that distinguish ECS from other software technologies:
• It implements tasks via schedules and/or English-like scripts. Scripting supports if/then/else syntax; hence, turning on the light at 10 pm in case the user is at home employs a script like:
o If Clock.time becomes 10 pm
o And At Home is true

o Then Living room light set ON
• ECS can be remotely accessed by means of web browsers or web devices (e.g. iPhone, BlackBerry).
• Support of e-mail commands (e.g. Living room light set ON) and responding with the result of performing the command. ECS processes only e-mails that come from approved e-mail and IP addresses.
• Reminding the user about upcoming events by playing audio sounds, sending text messages or e-mails.
• Reacting to alarm conditions (in case of fire, smoke, flood, or home intruders).
• Support of NetCams/WebCams (recording only on detected motion in a 24/7 mode of working; e-mailing images to the house owner in case of motion detection).
• Automated tweeting of all the specified events.
• Displaying and recording caller IDs (time, number and name identifiers for known numbers).
• Communicating with other programs via TCP/IP.
• Downloading, using, monitoring and displaying information from the Internet (weather forecasts, stock rates).

These are only the main features of ECS, which show it as one of the most multifunctional tools in home networking. There are many other software tools, such as Zeus, Home Control Assistant, Easy X10 and Cyber House, that are similar in functionality to the ones described above.

V. BENEFITS AND APPLICABILITY

The benefits of home automation lie in the achievement of the desirable level of convenience, comfort, efficiency, connectivity, and safety in the users' houses. We consider below the advantages of implementing domotics according to each of its sub-systems.

The entertainment center is a sub-system that includes multimedia devices such as the TV, home theatre, CD/DVD player, video game systems and karaoke machines, all controlled via a single remote control. This remote control also provides the connectivity to the home computer network for retrieving multimedia files (movies, music and photos) on the devices of the entertainment center, controls the room environment (automated drapes, blinds, projection screen), and provides the support for multiple video and audio streams [7].

The climate sub-system controls the heating, the ventilation and the air conditioning systems. The control is carried out via programmable thermostats that have functions such as following working patterns depending on the time or on the user's presence in the house, supporting a flexible energy saving mode. The system can automatically detect opened doors and windows to adjust its behavior to their position. Such functionality is achieved via implementing various motion and temperature sensors.

The lighting control sub-system has two main functions: controlling the state of the light (on/off) and controlling the brightness level of the light. “Intelligently” carrying out these functions brings energy savings and, consequently, money. The lighting sub-system provides the user with the opportunity to create light scenes (combinations of the lights' states and brightness levels) and invoke them manually by pressing a single button on the control device, by timetable, or according to special conditions (e.g. the user's presence).

The security (alarm) sub-system is responsible for the safety of the house. It includes zone intrusion detectors (perimeter detectors, mounted into doors and windows; interior detectors, such as floor pads and motion sensors), smoke detectors for fire alarm notification, and outdoor intrusion detectors (motion detectors along the territory and at the gates). In case of alarm, these systems can perform a wide variety of actions: activation of the siren; turning on blinking lights; calling or e-mailing the owner or the neighbors; calling the police in case of intrusion or the fire brigade in case of a fire alarm. The security sensors can also control the climate and lighting sub-systems in case of alarm.

The outdoor control sub-system is responsible for sprinkling, swimming pool temperature control, and the outdoor parts of other sub-systems (e.g. the loudspeakers of the entertainment system). This system can include weather sensors, for instance to balance the sprinkling level according to the weather. This can lead to significant savings in the water bill and also prevents over-watering of the plants in the garden.

Home automation makes the user's life simpler, more convenient, more comfortable, and safer. Moreover, it reduces house costs: for example, automatically turning off the lights in a certain scene (e.g. some rooms of the house) when the brightness of the sunlight is at least 80% saves 20% of the electricity used for that particular scene [4].

VI. FUTURE PERSPECTIVES

Home automation is an intensively developing branch of technology. One modern tendency is to integrate domotics technologies with the latest technical achievements, such as neuro-technologies and neuro-programming, used to implement artificial intelligence in home automation. These technologies bring to smart house systems functionality such as voice, face and fingerprint recognition, carried out with the help of special biometric sensors. According to the recognized person, the home automation system acts in a certain way: it lights up the needed rooms, turns on certain devices (e.g. the TV, the audio player or the coffee machine), sets the preferred temperature in the house, etc. There are already tools of this kind on the market. One of them is the XS PRO-1000 access control device together with the FaceLogOn Xpress program [8]. This type of tool is not very widespread yet, but it grows in popularity, not only in the field of private home automation, but also in the commercial markets. One of the examples consists in the usage of the XS PRO-1000 in the hotel business: the system is able to recognize a guest of the hotel, automatically open the door and make the elevator move to the proper floor. Such systems provide a rather extreme level of convenience and are also helpful from the point of view of reducing costs: by using automatic recognition the hotel can decrease the number of people working at the reception.

Another important tendency in the home automation field

consists in simplifying the products and making them more generic. The most urgent key points in achieving that are the integration of smart house protocols with the IP protocol, due to its popularity, and the need for compatibility with the most widely spread devices, such as the iPhone, iPod, iPad and Android-based devices. These will bring home automation to the mass market and make it more affordable for the general user.
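The IP-integration trend described above can be illustrated with a small sketch. This is our own hypothetical example, not an existing product API: we assume a home controller that bridges IP to a powerline protocol such as X10, and a phone or PC app that switches a device by sending it one small JSON command. The host address, port and message format below are invented for this sketch.

```python
import json
import socket

def send_command(host: str, port: int, device: str, action: str) -> None:
    """Send one JSON command, e.g. {"device": "C15", "action": "off"},
    to a (hypothetical) IP-to-powerline home controller over TCP."""
    message = json.dumps({"device": device, "action": action}).encode("utf-8")
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall(message)

# A smartphone app could then turn device C15 off with a single call:
# send_command("192.168.1.50", 9000, "C15", "off")
```

Any IP-capable device can issue such a command, which is exactly the compatibility argument made above: the controller translates it once into the powerline protocol, so the phone never needs to speak X10 itself.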

VII. CONCLUSION

In this paper we have briefly overviewed the home automation and home networking concepts. We discussed their definitions and surveyed the history of domotics, thus aiming to understand the emergence of home automation. We have considered the control methods and technologies used in intelligent homes. We have also put forward the most important protocols, interfaces and software components that are used in the field of home networking. Eventually, we have discussed the benefits provided by home automation and home networking and considered the possibilities of future smart housing development.

REFERENCES
[1] A. Cawson, L. Haddon, I. Miles. The Shape of Things to Consume. Delivering Information Technology into the Home. Avebury, 1995.
[2] Wings of Success. Home Automation. Inside Out! Wings of Success.
[3] G. Meyer. Smart Home Hacks. O'Reilly Media, Inc., 2005.
[4] http://www.homeautomationinfo.com/Drupal/technology
[5] A. Venkatesh, E. Kruse, and E. Chuan-Fong Shih. The networked home: an analysis of current developments and future trends. In Cognition, Technology, and Work, Vol. 5, Number 1, pages 23-32, Springer-Verlag, 2003. http://www.springerlink.com/content/77q68cklmunb9ny1/
[6] http://www.homeautomationinfo.com/Drupal/technology_ethernet
[7] http://www.homeautomationinfo.com/Drupal/technology_upb
[8] http://www.cepro.com/article/is_facial_recognition_the_future_of_home_automation/

Quantum Cryptography on Computer Networks

Alonso J. Gragera Aguaza, Student Member, IEEE

Abstract — Nowadays the confidentiality of the information transmitted on computer networks is crucial. In addition, traditional cryptography can enhance its performance significantly by applying quantum physics principles. Thus, improvements in communication security are one of the key research areas for our society, and therefore understanding their principles is essential for the future development of our network-based software. In this paper we aim to study the quantum cryptography notion, surveying its definition and state of the art, as well as existing and forthcoming protocols, in an approach meant to be easily understood by someone without deep prior knowledge of quantum physics.

Index Terms — Cryptography, Data Transmission Security, Quantum Cryptography, Quantum Mechanics.

I. INTRODUCTION

Throughout this document, first of all, in Section II, we survey the fundamental principles of quantum mechanics on which quantum cryptography is based. After that, in Section III, we briefly examine the different protocols, analyzing their design and security. At last, in Sections IV and V, we give a glimpse of the future of this kind of systems: which are the trending topics and how these systems may be used.

II. QUANTUM MECHANICS

A. Quantum state

A quantum state is just a “number” that allows us to know all the information about the state of a system. Assume that we flip a coin in a box; then we can have an infinite number of possible states, all of the form

x|heads⟩ + y|tails⟩.

We are particularly interested in a subset of these states, namely the ones that, in the end, when we open the box, we can observe. These states are called auto-states:

|heads⟩, |tails⟩.

They have the important property of being incompatible between themselves; in other words, a coin cannot “be in” ‘heads’ and ‘tails’ at the same time.

B. Quantum superposition

But what is the point of having these auto-states, if we can easily say whether the result of throwing a coin is ‘heads’ or ‘tails’ at the end? The meaning of having this artifact is to mathematically express the state of a system before we can observe it. For example, if we flip our coin in a box again, we can describe the state of our system as

x|heads⟩ + y|tails⟩,

in other words, x possibilities of having ‘heads’ and y possibilities of having ‘tails’ before opening the box. We can represent our space as the following square area, in which any point is a possible auto-state.
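To make the amplitude notation concrete, here is a small Python sketch of our own (an illustration, not part of the paper): a state x|heads⟩ + y|tails⟩ is stored as the pair (x, y), a state is taken as valid when x² + y² = 1, and “opening the box” samples one of the two auto-states using the squared amplitudes as probabilities (the standard Born rule, which the paper glosses as “possibilities”).

```python
import random

def is_valid_state(x: float, y: float, tol: float = 1e-9) -> bool:
    """A state x|heads> + y|tails> is valid when the squared
    amplitudes sum to 1 (the circle of radius 1)."""
    return abs(x * x + y * y - 1.0) < tol

def measure(x: float, y: float) -> str:
    """Opening the box: the state collapses to 'heads' with
    probability x^2 and to 'tails' with probability y^2."""
    return "heads" if random.random() < x * x else "tails"

half = 2 ** -0.5                      # 1/sqrt(2), the equal superposition
assert is_valid_state(half, half)     # a valid superposition
assert not is_valid_state(1.0, 1.0)   # the point (1, 1) is not a state
assert measure(1.0, 0.0) == "heads"   # a pure auto-state always collapses to itself
```

Note that this is only classical sampling of the measurement statistics; it does not simulate quantum dynamics, but it matches the collapse behavior described in the following sections.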


But we quickly discover that, as well as there being valid points (or auto-states) like A (1,0), being tails, and B (0,1), being heads, there are also invalid ones, like C (1,1), with 100% chance of becoming heads and tails at the same time, or D (0,0), with no chance of becoming one or the other. So we should define the valid auto-states as the ones for which the combination of the individual possibilities is equal to 1, or in other words, the ones that belong to a circumference of radius 1 centered in (0,0):

x² + y² = 1.

In this way, we ensure that the auto-states will be consistent with the reality observed and with the laws of probability. And the middle point will be at (1/√2, 1/√2), getting as a result the auto-state represented by

(1/√2)|heads⟩ + (1/√2)|tails⟩,

in other words, half a possibility of having ‘heads’ and half a possibility of having ‘tails’ before opening the box. And, when we open it, the resultant state is

[a] 1|tails⟩ + 0|heads⟩, or
[b] 0|tails⟩ + 1|heads⟩.

We are completely sure that it is ‘tails’ [a], or completely sure that it is ‘heads’ [b]; this is called state collapse.

C. Quantum entanglement

This last definition only describes that we can have a system with two coins such that, if we flip them both in their respective boxes, we still do not know the resultant state of each coin, but we are completely sure that they are opposite. It is denoted by

(1/√2) |heads⟩⊗|tails⟩ + (1/√2) |tails⟩⊗|heads⟩,

where the symbol ⊗ is read as “entangled with”. In this case it means that even if both coins individually have one half possibility of being heads and one half of being tails, we know that overall, when we open both boxes, there will be one that is tails and the other heads.
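The entangled two-coin state above can also be illustrated with a short sketch of our own (again, classical sampling of the measurement statistics, not a simulation of quantum dynamics): each of the two branches occurs with probability 1/2, and within a branch the coins always come out opposite.

```python
import random

def measure_entangled_pair(rng: random.Random) -> tuple[str, str]:
    """Sample the state (1/sqrt(2))(|heads>|tails> + |tails>|heads>):
    each branch has probability 1/2, and the two coins are always
    opposite, so opening one box reveals the other."""
    if rng.random() < 0.5:
        return ("heads", "tails")
    return ("tails", "heads")

rng = random.Random(42)
outcomes = [measure_entangled_pair(rng) for _ in range(1000)]
# Perfect anti-correlation in every single run:
assert all(a != b for a, b in outcomes)
```

Individually, each coin looks like a fair coin over many runs; only the joint outcome reveals the correlation, which is exactly the property the E91 protocol below exploits.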


That is really important, because it allows us to know beforehand the result that is going to appear in one box by just opening the other one.

III. BASIC PROTOCOLS

A. Protocol BB84

BB84 is a protocol for key distribution, developed by Charles Bennett and Gilles Brassard in 1984. This schema is algorithmically more complex than E91, but physically much simpler, as it only uses the no-cloning theorem. Bennett and Brassard propose an algorithm that can be divided in 5 steps.

Step 1. Alice and Bob choose a couple of orthogonal polarization pair directions, which we are going to refer to as “bases”. In our case, the bases are denoted as + and X, and bit values are assigned to each possible direction of these polarizations.

[Table: the two bases, + and X, with the photon polarization directions assigned to the bit values 0 and 1.]

We observe that both Alice and Bob need to generate polarized photons in the same way; it does not matter if a third person knows this combination, so these bases could be standard for the device.

Step 2. Alice generates a random sequence of polarized photons using both common bases; she then writes down in which direction they were polarized.

[Table: the bases used by Alice and the photons generated by Alice.]

Then, Alice sends them to Bob. If there were a third person spying on the conversation (as a man-in-the-middle), this person could not know which base was used for measuring each pair. Moreover, if the spy tries to check them randomly, then the chain is corrupted.

Step 3. Bob measures the photons with a randomly chosen order of bases and then writes down the obtained results.

[Table: the bases used by Bob and the photons measured by Bob.]

Step 4. Alice and Bob share, in a public channel, the order of the bases used to generate or measure the photons. They also store the ones in common.

[Table: the bases used by Alice, the bases used by Bob, and the common ones, marked ✓.]

Step 5. Alice and Bob create a key with the validly measured common-base photons, as illustrated below.

[Table: the common bases, the corresponding photons, and the resulting key 1 1 1 0.]

Then, they share a couple of values to confirm that both have the same resultant key and that there has not been a third party involved in spying on the communication. They can use the rest of the values to codify the messages.

B. Protocol E91

This protocol for key distribution was developed by Artur Ekert in 1991. This schema is algorithmically simpler than BB84, but was developed later, as it physically takes advantage of the more complex properties of quantum mechanics. Here we are going to use two properties of quantum entanglement, namely the correlation and Bell's inequality. Ekert divided this algorithm in 3 steps.

Step 1. Alice selects a base and creates a randomly generated series of orthogonally polarized entangled photon pairs.

[Table: the base selected by Alice, with the polarization directions for the bit values 0 and 1.]

Then, she sends both the base and one photon of each entangled pair to Bob.

Step 2. Alice and Bob check that their photons are still entangled, so there is no third party involved (using man-in-the-middle techniques), because just reading the information of one single photon would destroy it.

Step 3. Alice and Bob measure their photons' polarity, which will be opposite.

[Table: the photons of Alice and the photons of Bob.]

Then, they create the private key based on the sequence of one of them, in this case Alice: 0 1 0 1 0 0 1 1.

IV. OTHER PROTOCOLS

Among the considerable number of new protocols that have recently been proposed, we can highlight the following ones as possibly significant improvements and mark them for further study.

A. Protocol SARG04

SARG04 is a protocol equivalent to BB84 in its theoretical part; it differs from BB84 by using attenuated laser pulses instead of single-photon sources.

B. Protocol S09

S09 is a qubit-exchange-based protocol that allows massive key distribution between n-1 computers and one key message distribution center. This protocol is also immune to man-in-the-middle attacks, as it does not use the classical communication channels.

C. Protocol KMB09

KMB09 is an alternative quantum key distribution protocol in which Alice and Bob use two bases, one for encoding ‘0’ and the other for encoding ‘1’, instead of using the two directions of one single base.

V. FUTURE OF CRYPTOGRAPHY

The cryptography methods described so far are starting to be used in a few very specific cases, mostly as experiments. In this section we want to outline a couple of remarkable cases.

Spanish islands experiments. In 2006, a 144-kilometer transmission was performed between La Palma and Tenerife, a pair of Spanish islands, using the E91 protocol for sharing the key. Another experiment was realized later in the same year, over 148.7 kilometers of optical fiber, but this time using BB84 as the protocol.

VI. CONCLUSION

After this essay we can conclude that, even if several recent protocols have been developed, it is the first ones themselves that present a solid improvement over traditional cryptography and that built the foundations of quantum key distribution.

REFERENCES
[1] C. H. Bennett, G. Brassard, and A. K. Ekert, "Quantum Cryptography", Scientific American, October 1992, pp. 50-57.
[2] C. H. Bennett, G. Brassard, and N. D. Mermin, "Quantum cryptography without Bell's theorem", Physical Review Letters, vol. 68, no. 5, 3 February 1992, pp. 557-559.
[3] C. H. Bennett, "Quantum Cryptography: Uncertainty in the Service of Privacy", Science, vol. 257, 7 August 1992, pp. 752-753.
[4] A. K. Ekert, "Adventures in quantum cryptoland" (in Japanese), Parity, vol. 7, February 1992, pp. 26-29.
[5] P. Gomez-Esteban and G. Crick, "Cuántica sin fórmulas – Criptografía cuántica" (in Spanish), El Tamiz, 24 November 2009. [Online]. Available: http://eltamiz.com/2009/11/24/cuantica-sin-formulas-criptografia-cuantica/
[6] M. Elboukhari, M. Azizi, and A. Azizi, "Quantum Key Distribution Protocols: A Survey", International Journal of Universal Computer Sciences, vol. 1, 2010, pp. 59-67.
[7] A. Sharma, V. Ojha, and S. K. Lenka, "Security of Entanglement Based Version of BB84 Protocol for Quantum Cryptography", Computer Science and Information Technology (ICCSIT), 2010 3rd IEEE International Conference, July 2010, vol. 9.

Gragera Aguaza, Alonso J. (M'10) was born in Granada on June 27th, 1988. He is a 5th-year student of Computer Science at the University of Granada (Spain), but is currently completing his studies as an exchange student at Åbo Akademi (Finland). He has been working as an intern in Development Platform Evangelism at Microsoft Iberica, participating in diverse projects related to .NET technologies, web page design and the adaptation of the different learning resources to the Bologna study plans. Nowadays he is actively interested in Artificial Intelligence, Algorithmics, Computability and Computational Complexity. Mr. Gragera has been awarded the Microsoft Student Partner title during the years 2008, 2009, 2010 and 2011.

Security Issues in Wireless Communications

Sushil Pandey (35176) Åbo Akademi University

Abstract — Wireless communications are becoming more essential than ever in today's modern society. With wireless communications used in offices, homes or for travelling, there are new security issues to be dealt with. Along with the continuously growing markets for wireless technology and its mobility, security is a big unresolved issue. In this paper we present the security threats and vulnerabilities in wireless communication and review the security functions specified in the wireless standards IEEE 802.11, Bluetooth and HiperLAN, also comparing these standards. We also discuss how to mitigate risks by applying security countermeasures, such as management, operational and technical countermeasures, to address specific threats and vulnerabilities.

Index Terms — Wireless LAN, Wireless standards, Wireless security, Security Countermeasures

I. INTRODUCTION

Wireless technology enables one or more devices to communicate using radio frequency transmissions as the means for transmitting data. Despite their popularity and success, wireless networks are more exposed to various additional vulnerabilities than networks using wired communication. In this paper we present security threats and vulnerabilities in wireless communication. We proceed as follows. In Section II, we present a general overview of WLAN networks and their associated standards. In Section III, we discuss WLAN security and the threats and vulnerabilities that a wireless network may face. In Section IV, we present various countermeasures to mitigate those threats and vulnerabilities, and in Section V we conclude the paper.

II. OVERVIEW OF WIRELESS LAN AND STANDARDS

A. Wireless Local Area Network

A Wireless Local Area Network (WLAN) [1] is a data communication system that provides wireless peer-to-peer and point-to-point connectivity using Radio Frequency (RF) transmissions. The data transfer rate is slower than that of a wired LAN that uses cables for the connectivity. Wireless communication has made communication simple and more economical with respect to installation costs. It has also made it possible for multiple users to access and share data, an Internet connection, various applications, as well as a network printer. Wireless-enabled devices such as laptops, smart phones or PDAs allow the user to take advantage of WLAN features such as flexibility, convenience and portability. Various other devices, such as video game consoles, digital cameras, digital audio players and consumer electronics, are also taking advantage of wireless communication.

A network interface card (NIC), either built in or installed separately, is needed for all devices to access any wireless network. Wireless NICs can be PCMCIA cards, SD cards or USB adaptors. Wireless networks are basically configured in one of two forms, ad-hoc or infrastructure mode [1].

Fig. 1. Ad-hoc mode

Ad-hoc networks are created by establishing wireless communications between two or more devices directly. The ad-hoc mode is designed such that only the clients within transmission range of each other can communicate.

Fig. 2. Infrastructure mode

In infrastructure mode, one or more access points (AP) are installed and the devices communicate through the AP.

B. Wireless Standards

The IEEE (Institute of Electrical and Electronics Engineers) is the leading authority in the specification and ratification of standards relating to networking technology. The IEEE 802.11 standard for WLAN is one of the most widely adopted standards for wireless Internet access. In the following we discuss this and two other widely used standards, Bluetooth and HiperLAN.

Manuscript received March 2, 2011.


1) IEEE 802.11
IEEE 802.11 is a set of WLAN standards developed by working group 11 of the IEEE LAN/MAN standards committee (IEEE 802). The initial standard was finalized in June 1997, with an operating frequency of 2.4 GHz and a data rate of 1 to 2 Mbps. The 802.11 family includes over-the-air modulation techniques that share the same basic protocol. Within the family, 802.11a was the first wireless networking standard, but 802.11b was the one widely adopted. The 802.11g and 802.11n standards have followed 802.11b. 802.11n is a recent amendment based on the previous 802.11 standards, using the new multi-streaming modulation technique MIMO (Multiple-Input Multiple-Output).

2) Bluetooth
Bluetooth was created by Ericsson in 1994 and is now managed by a group called the Bluetooth Special Interest Group (SIG). Today most of the telecommunication companies have joined the group and use the technology in various devices. Bluetooth was developed as an alternative to data cables. It can be used to connect one device to another or to create an ad-hoc network of several such devices. Bluetooth uses the radio frequency range of 2.45 GHz. Bluetooth devices are classified into three classes. A Class 1 device has a maximum transmission power of 100 mW and covers a range of 100 m. Class 2 and Class 3 devices have maximum transmission powers of 2.5 mW and 1 mW, and cover ranges of 10 m and 1 m, respectively [7].

3) HiperLAN
HiperLAN (High Performance Radio Local Area Network) is another wireless LAN standard, operating in the 5 GHz range. It was defined by a technical committee of ETSI (European Telecommunications Standards Institute) and consists of a family of standards referred to as BRAN (Broadband Radio Access Network). The family consists of HiperLAN types 1 and 2, HIPERACCESS and HIPERLINK. They have a very high transmission rate of up to 54 Mbit/s. HiperLAN also includes optional encryption and power saving [8].

III. WLAN SECURITY, THREATS AND VULNERABILITIES

A. WLAN Security
Despite the commercial popularity and success of wireless LANs, security has always been one of the highest concerns for them. This is due to the fact that transmitting through the air is open to everyone, in contrast to wired transmissions. The vulnerabilities and attacks of earlier implementations have also boosted the perception of WLANs having security problems. However, the latest developments and improvements in WLAN standards have taken significant steps towards minimizing the security risks [2].

B. Threats and Vulnerabilities
The sole feature that any network (wired or wireless) has to provide is that users can exchange information across a certain distance and over a shared medium. This transmission of data over a shared medium means that an attacker does not necessarily have to sit in front of a server or any user computer in order to gain access to files or messages. In wireless LANs, this threat is even higher, as the attacker does not even need to be in the same physical location as the communicating parties.

The minimal requirements for any wireless LAN to ensure secure communication consist of addressing certain security attributes, namely confidentiality, integrity and availability [3]. Confidentiality refers to allowing only authorized persons to view the data. Integrity refers to the transmitted message not being tampered with or modified in transit. Availability refers to the network being accessed by authorized users only, thus providing access control.

Attacks on wireless communication can be classified as either passive or active. In the following, we describe these types of attacks and the security attributes they aim to violate. The classification of attacks is also illustrated in Fig. 3.

In passive attacks, an unintended or unauthorized person gains access to a resource but does not modify its contents. A passive attack can be eavesdropping or traffic analysis. In the case of eavesdropping, the transmitted message is monitored by an unauthorized person, and hence the confidentiality attribute of secure communication is breached. In the case of traffic analysis, the attacker monitors and analyses the transmission of data, possibly identifying patterns. This type of attack also breaches confidentiality. In addition, the data being monitored is read by unauthorized users, thus affecting the network availability attribute as well.

Fig. 3, Attacks on wireless communication: a tree with Attacks at the root, divided into Passive Attacks (Eavesdropping, Traffic Analysis) and Active Attacks (Masquerade, Replay, Message Modification, Denial-of-Service).

In active attacks, an unauthorized person gains access to the resources and modifies the contents. These kinds of attacks are detectable and are divided into four types: Masquerade, Replay, Message Modification and Denial-of-Service. In a masquerade, an attacker impersonates an authorized user and gains certain unauthorized privileges, breaching the integrity feature of secure communication. In a replay, the attacker monitors the transmission and replays the message acting as the authorized user; thus the confidentiality, availability and integrity features of secure communication are all violated. In message modification, the attacker modifies the original message by adding to, deleting from or changing it, and hence breaches the integrity feature of secure communication as well as the confidentiality. Denial-of-Service (DoS) attackers prevent normal users from accessing certain facilities, breaching the availability feature of secure communication.

IV. WIRELESS SECURITY COUNTERMEASURES
The threats outlined above can be mitigated by applying suitable countermeasures against them. Here we describe three kinds of countermeasures, namely management, operational and technical countermeasures [3].

a. Management Countermeasures:
Management countermeasures consist of the effective security policies in place for a network, put there in order to mitigate the attacker threats. The security policy and the ability to enforce its compliance lay the foundation for the other two kinds of countermeasures. The security policy should clearly address the following issues [3]:
- Identify which users or groups of users are authorized to use the network, and also who is responsible for installing and configuring wireless equipment.
- Define physical security requirements for the WLAN and its devices, including limitations on the service areas of the WLAN.
- Define the types of information that may and may not be sent over the WLAN.
- Define how WLAN transmissions should be protected, including requirements for the proper use of cryptographic key management.
- Define the conditions under which WLAN client devices are or are not allowed to be operated.
- Define standard hardware and software configurations that must be implemented on WLAN client devices to ensure the proper level of security.
- Define limitations on how and when WLAN client devices may be used.
- Describe the guidelines on reporting losses of WLAN client devices and reporting security incidents.
- Mention the guidelines for the protection of WLAN client devices against theft.
- Define the frequency and scope of WLAN security assessments.
- Define the actions to be taken to address rogue or mis-configured devices that are identified.

b. Operational Countermeasures:
Physical security must be ensured so that only authorized users have access to the equipment. No network can be considered secure unless it is secured physically. Physical security includes activities such as [3]:
Access control – Personal identification such as photo identification, card readers or biometric readers should be used to minimize the risk of unauthorized access to the facilities by unintended persons.
External boundary protection – This can include activities like having secured locks and installing CCTV for surveillance of the surroundings, to discourage unauthorized access to the various devices. Despite this, the attacker may be outside the physical perimeter but still within the proximity of the wireless network; hence some wireless security assessment tools, such as vulnerability assessment, should be used. Also, security audits should be conducted at regular intervals.

c. Technical Countermeasures:
Various hardware and software solutions are used to secure the wireless environment [3].

Software solutions:
- The operational and security settings on access points (APs) have to be configured properly.
- Network administrators should regularly check for new software patches and upgrades and install them as needed.
- Only authorized persons should be allowed to use the network. Authentication can be done by passwords, biometrics or smartcards.
- Intrusion detection systems (IDS) can be used to determine whether any unauthorized user is attempting to access, or has accessed, the network.
- A personal firewall, which resides on a user device and can be managed by individual users or centrally, also helps to increase WLAN security.
- Encryption provides a higher level of security, and for this various encryption techniques may be used.

Hardware solutions:
- Smart cards or biometric systems may be used to prevent access to the equipment by unauthorized persons.
- A virtual private network (VPN) provides a secure communication mechanism for data and other information transmitted over a wireless network by creating a VPN tunnel.
- A public key infrastructure (PKI) helps bind public keys to the respective user identities using a certificate authority.
- Network segregation also helps to maintain a higher level of security, by separating one network into various parts and keeping the critical one on a shielded network.
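To illustrate how the integrity attribute from Section III can be enforced in software, a message authentication code (MAC) lets a receiver detect message modification by an active attacker. Below is a minimal sketch using Python's standard library; the key and messages are made-up examples, not tied to any specific WLAN standard:

```python
import hashlib
import hmac

def tag(key: bytes, message: bytes) -> bytes:
    """Compute an HMAC-SHA256 tag the receiver can verify."""
    return hmac.new(key, message, hashlib.sha256).digest()

def verify(key: bytes, message: bytes, received_tag: bytes) -> bool:
    """Recompute the tag and compare in constant time."""
    return hmac.compare_digest(tag(key, message), received_tag)

key = b"shared-secret-key"          # hypothetical pre-shared key
msg = b"transfer 100 EUR to Alice"  # hypothetical transmitted message

t = tag(key, msg)
assert verify(key, msg, t)                                 # untampered message passes
assert not verify(key, b"transfer 900 EUR to Mallory", t)  # modification is detected
```

Note that a MAC provides integrity but not confidentiality; in practice it is combined with encryption, as in the encryption-based countermeasures listed above.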


V. CONCLUSION
Individuals and organizations can only benefit from wireless communications if these are protected. Wireless environments contain various threats and vulnerabilities in addition to those of wired ones. However, by applying suitable countermeasures to address specific threats, these risks can be reduced. Management, operational and technical countermeasures do not completely prevent all the vulnerabilities, but they are quite effective in reducing the major risks.

REFERENCES
[1] Becta, "Wireless Local Area Networks (WLAN)", http://www.becta.org.uk/subsections/foi/documents/technology_and_education_research/w_lans.pdf
[2] R. Nichols and P. Lekkas, Wireless Security: Models, Threats, and Solutions, McGraw-Hill Professional, 2002.
[3] National Institute of Standards and Technology, U.S. Department of Commerce, "Guide to Securing Legacy IEEE 802.11 Wireless Networks", http://csrc.nist.gov/publications/nistpubs/800-48-rev1/SP800-48r1.pdf
[4] S. Uskela, "Security in Wireless Local Area Networks", Helsinki University of Technology, http://www.tml.tkk.fi/Opinnot/Tik-110.501/1997/wireless_lan.html
[5] Siemens Enterprise Communications, "Wireless Security Today: Wireless more secure than wired", 2008, http://enterasys.com/company/literature/WLAN%20Security%20Today-Siemens%20whitepaper_EN.pdf
[6] J.-C. Chen, M.-C. Jing and Y.-W. Liu, "Wireless LAN Security and IEEE 802.11i", National Tsing Hua University, http://wire.cs.nctu.edu.tw/wire1x/WC02-124-post.pdf
[7] J. T. Vainio, "Bluetooth Security", Helsinki University of Technology, http://www.mowile.com/bluesec.pdf
[8] J. Korhonen, "HiperLAN/2", Helsinki University of Technology, http://www.tml.tkk.fi/Studies/Tik-110.300/1999/Essays/hiperlan2.html

Other Links:
[1] Wikipedia – IEEE 802.11, http://en.wikipedia.org/wiki/IEEE_802.11
[2] Wikipedia – Bluetooth, http://en.wikipedia.org/wiki/Bluetooth
[3] Palowireless – HiperLAN and HiperLAN2 resource center, http://www.palowireless.com/hiperlan2

ÅBO AKADEMI UNIVERSITY, NETWORK SOFTWARE, SPRING 2011

Fault Tolerance Methods for Ethernet Networks
Tony Karlsson, M.Sc. Student, Åbo Akademi University
(This paper was written as a course assignment at Åbo Akademi University)

Abstract—In the Internet-centered society we live in today, we put many hard requirements on our technical systems. One of the most important requirements is the need for high reliability; we need to be able to access our data at any time, and we expect the systems we use to always be available and fully functional. For network connectivity, this need for high reliability includes an implicit need for fault tolerance and redundancy. In this paper we provide a brief review of open methods that can provide such fault tolerance and redundancy. We compare methods working in OSI layer 2, focusing on standardized IEEE 802 methods that can be applied in Ethernet networks. The surveyed protocols include STP, RSTP, MSTP and LAG. This paper provides a brief introduction, explaining the basic concepts of each protocol.

Index Terms—Fault tolerance, Fault tolerant systems, Redundancy, Local area networks, Ethernet networks.

I. INTRODUCTION
In the ever increasingly Internet-connected world of today, high availability is of utmost importance. In order to be able to access our online data at any time, we rely on various network connections that we expect to always be available in fully functional condition. This requirement of high availability has become very strict, and unexpected downtime (i.e. network outages) is generally not accepted, but rather frowned upon by unhappy, demanding users. For this reason, commonly available computer network technology should have sufficient redundancy and fault tolerance mechanisms in place, in order to be able to prevent unnecessary network outages and provide high availability.

Undebatably, Ethernet (IEEE 802.3) has become the de facto industry standard for most local computer networking. Therefore, it would be reasonable to assume that Ethernet technology can provide sufficient mechanisms for such redundancy and fault tolerance. These mechanisms can then be utilized in order to fulfill the aforementioned requirement, in cases where such functionality is needed or wanted. More specifically, a critical backbone Ethernet network should be able to operate with a broken piece of equipment (e.g. a dysfunctional switch or bridge), or provide connectivity to its users despite having an interconnecting cable unplugged or broken.

In this paper, we will provide a brief review of some existing methods for redundancy and fault tolerance in Ethernet-based physical Local Area Networks (LAN). By doing this, we will see what functionality can be achieved by using these methods, and determine whether the functionality provided by these methods is sufficient, or if the redundancy and fault tolerance is somewhat lacking. We will focus on methods that have been standardized and published by The Institute of Electrical and Electronics Engineers (IEEE), and more specifically on the OSI layer 2 LAN/MAN standards as defined and published by IEEE 802. Thus, this paper offers a rather brief introduction explaining the basic concepts of these methods.

While proprietary methods can sometimes provide better functionality than open alternatives, their largest drawback is that they are generally manufacturer specific, i.e. compatible only with other network equipment of the same brand [1], [2]. For this reason, we will instead focus on common, open IEEE standards, which should be supported by most Ethernet equipment available on the market. Proprietary methods are mentioned briefly, but are otherwise outside the scope of this paper.

This paper is structured as follows. In Sections II through IV, we explain three different Spanning Tree Protocol redundancy standards, and how they are related. The methods are approached in historical order, in order to give a better overall understanding of the subject. We will describe the background and functionality of the methods, while at the same time comparing them to each other. Each section ends by discussing the eventual drawbacks of the method. In Section V, we briefly discuss the Link Aggregation Group (LAG), an IEEE standardized method that can be used for fault tolerance. Finally, we conclude the paper by summarizing and discussing the methods and their combined potential uses. Ultimately, we should have gained some understanding of IEEE standardized methods for fault tolerance and redundancy that can be utilized in local Ethernet networks today.

II. SPANNING TREE PROTOCOL (STP)

A. Standardization
In 1985, while working at Digital Equipment Corporation, Radia Perlman developed and published an algorithm and protocol based on spanning trees, for the purpose of preventing loops in a Local Area Network (LAN) [3], [4]. This algorithm and protocol were adapted by IEEE, which simply named them the Spanning Tree Algorithm and Protocol. The method is generally referred to as the Spanning Tree Protocol (STP), the term Spanning Tree Algorithm (STA) only being used when the actual inner workings of the algorithm are discussed [1], [2], [4].

The STP algorithm and protocol became officially recognized standards when IEEE in 1990 published them as part of the Media Access Control (MAC) Bridges standard document, 802.1D-1990 [5]. As is the case with many IEEE standards, 802.1D has further evolved since its first release. Consecutive revisions of 802.1D have been published in 1998 as 802.1D-1998 [6] and in 2004 as 802.1D-2004 [7].

B. Purpose
The primary purpose of STP is to prevent logical loops in networks. This is necessary because many network standards,
Fig. 1. A graph representing a network with six nodes and two loops. Fig. 2. A spanning tree representing the same network. E is the root node. including Ethernet, do not allow such structures [1]. If multiple nodes need to be able to send each other relevant data. Nodes network nodes, such as switches or bridges, are incorrectly in a STP network do this by exchanging Bridge Protocol Data connected to each other by more than one path, either directly Unit (BPDU) frames, which containing unique identification or indirectly (through other network nodes), the whole network information, and other data needed by STA [4]. By analyzing can potentially be wiped out; theoretically, data packets can these BPDU frames, all nodes in a spanning tree can agree on traverse the logical loop forever, circulating with such high which node will be the root [8]. We will not go into details on speed that nearly all network bandwidth is consumed for this how this is done here. In our example network, we will just purpose, thereby effectively blocking all other traffic [1], [3]. assume that the root bridge is the node E. Network loops can easily be constructed as a result of By definition, each non-root node is allowed exactly one negligence or human mistake, e.g. by simply plugging a possible path to the root node [7]. The open path between network cable into the wrong place. This was one of the any non-root node and the root node will be the logically primary reasons for developing STP in the first place [3]. shortest path, which is calculated in terms of port cost and Redundant links between bridges or switches could also de- path cost [4]. Port cost represents the speed of a link between sirable, added intentionally for the purpose of providing fault two directly connected nodes, while the path cost simply is tolerance in case a network device or cable breaks. 
Clearly, in the sum of port costs between two nodes, whether directly or a normal network without STP (or other similar functionality), indirectly connected. In other words, the path with the lowest redundant links cannot be implemented without automatically path cost is the logically shortest path. obtaining undesired network loops. For simplicity, we will assume that all the links in our network are of the same speed, and thereby that all the port C. Method costs are equal. Assume that all port costs are 1 (one). As such, In order to explain the inner workings of STP, we rely all path costs in our example simply depend on the length of on graph theory. Assume that a network is represented by the path, and represent the number of hops between the given a graph, where the vertices represent switches and bridges. nodes. For example, the path cost of A-D is 1 (one), while the Interconnecting links between the switches and bridges are path cost of D-E-F-C is 3 (three). represented by the edges in the graph. Computers and other Knowing how to calculate the path cost, determining the network devices will not be included in the graph, as STP shortest path between any node and the root node is very is only concerned with bridges and switches [7]. From here straightforward. For example, for node A, the cost of the path on, we will refer to our switches and bridges as our network A-D-E is higher than the cost of the path A-E. For node D, nodes. the results are very similar; the path D-E is shorter than D- In Fig. 1 we show a graph with six vertices labeled A- A-E. It is clear that the path A-D is the redundant link that F, representing network nodes, and seven edges representing can and will be disabled. Fig. 2 shows the result of applying links between the nodes. In graph theory terms, the graph STA on our example network. The dotted edges, A-D and B-C, is connected, i.e. every node is connected to at least one represent the redundant links that have been disabled. 
other node, and there is a path between any two given nodes. When deciding which network ports on each bridge or Also, the graph is cyclic, i.e. the represented network includes switch are allowed to communicate, STP differentiates be- loops. In this state, the network would not work very well. tween three different port roles [4], [5]: Essentially, the network loops need to be eliminated, which 1) Root port: The port which provides the lowest cost path can be achieved by blocking some redundant links. In graph to the root node. The root bridge is the only node that does theory, this corresponds to constructing an acyclic connected not have a root port. graph from the cyclic graph, by excluding some edges, i.e., the 2) Designated port: Ports which are not pointed at the root, ones representing the redundant links. This acyclic connected but allowed to communicate. Per definition, all ports of the graph follows the very definition of a spanning tree, hence the root bridge are designated ports. protocol name. 3) Non-designated port: Ports which are not allowed to The first step that needs to be done is to make one node communicate, i.e. network ports that have been blocked. the root node or root bridge, i.e. the root of our tree. STA is a By regularly exchanging and analyzing BPDU frames, STP distributed algorithm, in the sense that all the nodes take part nodes are able to detect and react to changes in the network in the decision making [4], [8]. In order for this to work, the topology. If a network node or link disappears or stops KARLSSON: FAULT TOLERANCE METHODS FOR ETHERNET NETWORKS 3 functioning, STP will be able to reconstruct the network tree, As of 2004, the RSTP specification from 802.1w-2001 has enabling disabled paths if needed [7]. 
In the same way, if the been incorporated into the updated Media Access Control root node for some reason would stop functioning, the other (MAC) Bridges standard revision 802.1D-2004 [7], thereby nodes will be able to elect a new root, which will take over making regular STP as defined in IEEE 802.1D-1990 [5] the role [4], [8]. and 802.1D-1998 [6] obsolete. In other words, RSTP is the currently preferred STP protocol for all new implementations of IEEE 802.1D. D. Drawbacks

STP has primarily been designed to prevent loops in net- B. Purpose works, not to provide fast recovery times [1]. The time needed for STP to reconfigure after an error situation depends on the In 1990, when the original STP standard was published, number of network nodes, as well as the complexity of the recovery times of 30 to 60 seconds were considered acceptable network topology [1]. The recovery time a network using STP [11]. Much has happened since then, though, and proprietary needs in order to restore full functionality after an error has protocols have been needed to get better recovery times occured, is generally between 30 and 60 seconds [2], [9]. In a than STP has been able to offer [1], [2], [11]. The reasons complex network topology with many nodes, the recovery can IEEE had for developing RSTP was to be able to provide be even slower, with recovery times of up to several minutes adequate recovery times using a standardized, open protocol, [1]. The fact that STP provides a relatively long recovery time, while at the same time providing backward compatibility with is one of the worst drawbacks the protocol suffers from. equipment utilizing the original STP standard [10], [11]. The recovery time provided by STP can still be acceptable in many office environments, where there are no critical C. Method systems relying on short response times, and a short network When STP builds its spanning tree, deciding which network outage can be tolerated. For many other network environments, ports on each node should be allowed to communicate, each however, e.g., networks used for industrial automation, the network port is assigned one of five possible port states: recovery performance provided by STP is way too slow. disabled, blocking, listening, learning or forwarding [4]. 
RSTP For example, some automation applications used in chemical redefines three of these states; disabled, blocking and listening industry can have a required response time from 0.5 to 3 are combined into one state called discarding [4], [11]. seconds [2]. By adjusting some parameters, the recovery time Instead of relying on many different port states, RSTP im- of STP can be slightly tweaked [1], but still not enough to proves on the concept of port roles. Where STP differentiates fulfill these requirements. between only three different port roles (root port, designated STP also suffers from low resource utilization, as well as port and non-designated port) [5], RSTP also identifies edge a complete lack of load balancing [1], [9]. As every network ports (i.e., ports that are not connected to other bridges or node can only have one open path to the network root node, switches, and can thereby never cause loops to occur), and STP blocks redundant links completely. Physical, redundant further divides non-designated (blocked) ports into alternate links are thereby completely unutilized, when they could rather ports and backup ports [11]. Alternate ports can provide be used for load balancing of some sort. Two nodes that have guaranteed alternative paths to the root bridge, while backup a blocked direct link between them, are instead forced to ports cannot [11]. communicate via links and network nodes closer to the root. By differentiating between these different port states and In other words, instead of using a logically very short path, the port roles, RSTP is able to precalculate alternate paths, that data packets can be forced to traverse very long paths, closer can be easily looked up and applied when a problem situation to the network root node [1]. As a result of this, the network arises [1]. 
An important performance improvement is also load is increased for the links closest to the root, which in turn provided by RSTP, allowing individual switches and bridges to leads to an increased risk of network congestion [9]. inform the whole network about errors and fixes using broad- cast notifications, whereas STP requires all such information III.RAPID SPANNING TREE PROTOCOL (RSTP) to be sent via the root bridge [9]. The recovery time of RSTP is, just as for STP, dependant A. Standardization on the topology and complexity of the network. However, the In 2001, the IEEE 802.1 working group introduced an recovery times achieved by RSTP are considerably better than evolution of the Spanning Tree Algorithm and Protocol (STP), the corresponding STP values; in cases where STP needs from called Rapid Spanning Tree Algorithm and Protocol (RSTP). 30 to 60 seconds for full recovery, RSTP can perform the same RSTP was published by IEEE as an amendment to the Media recovery in one or a few seconds [2]. In very simple cases, Access Control (MAC) Bridges standard document 802.1D- where the network consist of only a few nodes, RSTP can 1998 [6], entitled Rapid Reconfiguration and named 802.1w- even manage to recover the network to full functionality in 2001 [10]. less than a second [1], [2]. RSTP is generally referred to as Rapid Spanning Tree RSTP is fully backwards compatible with equipment us- Protocol (RSTP), omitting the Algorithm part of the complete, ing the original STP standard [11]. Equipment running the official name [1], [2], [4], [9]. Even in the 802.1w-2001 802.1D-1990 or 802.1D-1998 versions of STP do not under- standard document, the names are used interchangeably [10]. stand the Bridge Protocol Data Unit (BPDU) type 2 frames 4 ABO˚ AKADEMI UNIVERSITY, NETWORK SOFTWARE, SPRING 2011 that the 802.1w-2001 RSTP standard defines [10]. However, any additional functionality for such environments [13], [17]. 
when recognizing STP equipment, network bridges using RSTP are able to automatically run individual ports in STP compatibility mode [11]. An RSTP compatible bridge or switch can then communicate with other RSTP compatible network nodes using RSTP and BPDU type 2 frames, while using STP and BPDU type 1 frames to communicate with legacy STP equipment [4].

D. Drawbacks

Even though RSTP provides significantly improved recovery times compared to STP, the improvement is not sufficient for all purposes. Of course, the radically improved recovery times make RSTP usable in many network setups where STP cannot fulfill the requirements. However, many applications, e.g. systems used for critical industrial automation, have too high requirements for both STP and RSTP; for example, applications used for conveyor belt automation in factories can require response times of down to 50 milliseconds [2].

As a result of the backwards compatibility with STP, RSTP also retains many of the same drawbacks; RSTP, as well as STP, suffers from low utilization of network resources, complete lack of load balancing, and risk of network congestion near the root node [9].

IV. MULTIPLE SPANNING TREE PROTOCOL (MSTP)

A. Standardization

In 2002, the IEEE 802.1 working group published an amendment to the Virtual Bridged Local Area Networks standard document 802.1Q-1998 [12], entitled Multiple Spanning Trees and named 802.1s-2002 [13]. The 802.1s-2002 standard defines an addition to the spanning tree family of protocols, named the Multiple Spanning Tree Algorithm and Protocol (MSTP). MSTP is based on the Spanning Tree Algorithm and Protocol (STP) as defined by standard 802.1D-1998 [6], as well as on the Rapid Spanning Tree Algorithm and Protocol (RSTP) as defined by standard 802.1w-2001 [10].

In the very same way as STP and RSTP, MSTP is also generally referred to as the Multiple Spanning Tree Protocol, leaving out the Algorithm part of the name [4], [9].

Because MSTP requires and completely relies on Virtual LANs (VLANs), IEEE has considered the protocol a member of the family of VLAN standards. As such, in 2003 the MSTP specification from 802.1s-2002 was incorporated into the updated Virtual Bridged Local Area Networks standard revision 802.1Q-2003 [14]. The 802.1Q standard was also updated in 2005, making the latest official standard 802.1Q-2005 [15]. As MSTP has been standardized in 802.1Q, IEEE has chosen not to include the protocol in 802.1D (where STP and RSTP are defined).

As a side note, Cisco Systems claims that MSTP was originally inspired by its proprietary protocol Multiple Instance STP (MISTP) [4], [16].

B. Purpose

The general STP and RSTP protocols can be used within networks using VLANs, but the STP protocols do not provide any VLAN-specific features. In other words, STP and RSTP are VLAN-aware and can coexist in the same network as multiple VLANs, but do not offer any VLAN-specific features. In a network with multiple VLANs, it may sometimes be desirable to direct network traffic from different VLANs to different spanning tree instances. Unfortunately, STP and RSTP are unable to provide any such functionality [13], [17]. As such, the purpose of MSTP is to adapt the STP and RSTP protocols for efficient use in networks with multiple VLANs [13]. In particular, MSTP is supposed to allow the existence of multiple spanning tree instances within a network, and to let traffic from individual VLANs be linked to some specific spanning tree instance [4], [16].

C. Method

MSTP defines a single Common and Internal Spanning Tree (CIST), and introduces a concept called network regions, into which the network is divided. Each region contains an Internal Spanning Tree (IST), and zero or more Multiple Spanning Tree Instances (MSTIs) [9], [13]. Each MSTI has its own root bridge, and is basically an instance of RSTP [9]. The root bridges of the ISTs of each region are connected together through a Common Spanning Tree (CST) [13], [17]. VLANs can be assigned to the different MSTIs residing in the regions [9]. It is possible to balance VLANs over different MSTIs in order to achieve effective load balancing [9], [17].

D. Drawbacks

Although sharing its inheritance with STP and RSTP, MSTP does not inherit all of their drawbacks. The nature of MSTP makes it possible to achieve working load balancing by assigning VLANs to different MSTIs. Low utilization of network resources does not apply to MSTP either, since MSTP can, with a little effort, be configured to employ practically every single physical link there is.

The risk of network congestion near the root bridges still applies to MSTP, but the problem has moved to the root of each MSTI. Recovery times after an error situation are around the same as for RSTP, basically leading to the same limitations [16], [17].

V. LINK AGGREGATION GROUP (LAG)

A. Standardization

In 2000, the IEEE 802.3 working group published an amendment to the 802.3-1998 standard [18], entitled Aggregation of Multiple Link Segments and named 802.3ad-2000 [19]. The 802.3ad-2000 standard defines the Link Aggregation Group (LAG) and its Link Aggregation Control Protocol (LACP).

In 2002, the 802.3ad-2000 standard was incorporated into the updated 802.3 standard revision 802.3-2002 [20]. The standard remained in the subsequent 2005 standard revision 802.3-2005 [21], but as of 2008, and the release of 802.3 standard revision 802.3-2008 [22], LAG has been moved to its own standard, entitled IEEE Standard for Local and Metropolitan Area Networks - Link Aggregation and named 802.1AX-2008 [23].

KARLSSON: FAULT TOLERANCE METHODS FOR ETHERNET NETWORKS 5
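The VLAN-to-MSTI assignment at the heart of MSTP's load balancing (Section IV) can be illustrated with a small sketch. This is not taken from any IEEE standard; the VLAN IDs, instance count and round-robin policy below are invented purely for illustration.

```python
# Illustrative sketch (not from IEEE 802.1s): spreading VLANs over the
# Multiple Spanning Tree Instances (MSTIs) of a region. Each MSTI is its
# own spanning tree with its own root bridge, so mapping VLANs to
# different instances lets their traffic use different physical links.

def assign_vlans_to_mstis(vlans, num_mstis):
    """Map each VLAN to an MSTI id (1..num_mstis), round-robin."""
    mapping = {i: [] for i in range(1, num_mstis + 1)}
    for idx, vlan in enumerate(sorted(vlans)):
        mapping[1 + idx % num_mstis].append(vlan)
    return mapping

# Hypothetical example: eight VLANs balanced over two instances.
print(assign_vlans_to_mstis([10, 20, 30, 40, 50, 60, 70, 80], 2))
```

A real MSTP configuration assigns VLANs to instances explicitly per region, typically grouping VLANs with similar traffic patterns rather than purely round-robin.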

B. Purpose

The purpose of Link Aggregation is to provide an open, standardized method for combining multiple physical network links into one virtual link, also known as the Link Aggregation Group (LAG). There are two reasons for doing this. The first reason for constructing LAGs is to overcome the bandwidth limitations of individual physical links. The second reason is to provide fault tolerance in the form of resilience, by using two physical links which are both able to function independently of each other [23].

C. Method

LAG utilizes the Link Aggregation Control Protocol (LACP) in order to exchange information about LAGs between network nodes. LAG supports fully automatic configuration, i.e. network devices with automatic LAG configuration enabled will be able to automatically configure LAGs if multiple links are plugged in. LACP identifies network ports by using Aggregation Keys, in order to be able to tell LAGs apart. When switches or bridges analyze network ports using LACP, ports that carry the same Aggregation Key are potentially part of the same LAG. LAGs can also be manually configured [23].

The links used for a LAG must be point to point, full duplex, and share the same data rate for each port. The total bandwidth of a two port LAG is not necessarily two times the data rate of one port, but rather a little lower. This is caused by protocol overhead [23].

LACP can be implemented in software, which in practice means that a computer with two or more Network Interface Cards (NICs) can achieve the same fault tolerance and bandwidth improvement as LACP enabled switches or bridges. For example, the Linux kernel has supported 802.3ad Link Aggregation (LAG) for years [24].

D. Drawbacks

When relying on LAGs for resilience, one must always remember that the overall bandwidth is reduced when a link, i.e. a network port or cable, breaks. For a two port LAG, removing one of the links naturally cuts bandwidth in half for the remaining connection. In the same way, for a four port LAG the decrease in speed is one fourth per disconnected physical link. If a LAG is heavily loaded, then the desired fault tolerance might not work as expected; if the remaining part of the LAG is congested, the total required bandwidth cannot be provided. In other words, the LAG must have enough bandwidth to be able to withstand the loss of a physical link, if the resilience is to work as desired.

Another, minor but notable, drawback is the increased use of network ports. Large LAGs quickly consume network resources in the form of physical ports, e.g. a two port LAG between two nodes consumes a total of four network ports (two per network node), and a four port LAG consumes a total of eight network ports. If many large LAGs are to be included in a network, the added links may directly affect the hardware costs. In particular, more and/or larger switches or bridges may have to be included in the network, in order to fill the increased demand for network ports.

Fig. 3. A graph representing a network using STP and two LAGs.

VI. CONCLUSION

We have established that there are cases where the recovery time provided by the STP, RSTP and MSTP protocols is not sufficient. In such cases, proprietary redundancy methods and/or another network standard than Ethernet might be a better choice. In other cases, such as various office environments, even the relatively long recovery time provided by STP can be sufficient. This is something to be considered on a case by case basis.

We need to always consider the size of a network. If a network is small enough, there might be no need for any spanning tree protocol at all; if the network contains only two switches or bridges, a two port LAG between them could provide adequate fault tolerance. In that specific case, STP and RSTP would provide no additional features. Thus, it is important not to use bigger tools than a specific task needs, just because it would be possible. There is no need to implement unnecessary redundancy where it is not needed. Instead, one should put more effort into the important cases, where redundancy and fault tolerance are really important.

In many cases, implementing a combination of a spanning tree algorithm and LAGs could be a wise choice. In Fig. 3 we illustrate the same spanning tree as constructed in Section II, but with two port LAGs added between nodes E-F and F-C. In this case, for example, one cable could break between nodes E-F without causing any disruptions in connectivity. In the original spanning tree, nodes B, C and F would in this case have been isolated, unable to contact the root node.

In this paper, we have given an overview of some methods for fault tolerance and redundancy in Ethernet networks. As expected, there is no single protocol or standard that would be perfect in all situations. Hence, it is necessary to be familiar with the tools at our disposal, and use this knowledge to make wise choices when needed.

REFERENCES

[1] G. Prytz, "Redundancy in industrial Ethernet networks," in Factory Communication Systems, 2006 IEEE International Workshop on, 2006, pp. 380-385.
[2] K. Hansen, "Redundancy Ethernet in industrial automation," in Emerging Technologies and Factory Automation, 2005. ETFA 2005. 10th IEEE Conference on, vol. 2, Sept. 2005, pp. 941-947.
[3] R. Perlman, "An algorithm for distributed computation of a spanning tree in an extended LAN," SIGCOMM Comput. Commun. Rev., vol. 15, pp. 44-53, September 1985.
[4] W. Lewis, C. N. A. Program, and I. C. Systems, LAN Switching and Wireless: CCNA Exploration Companion Guide. Indianapolis, Ind.: Cisco Press, 2008, ch. 5, pp. 227-330.
[5] "IEEE standards for local and metropolitan area networks: Media access control (MAC) bridges," IEEE Std 802.1D-1990, 1991.

6 ÅBO AKADEMI UNIVERSITY, NETWORK SOFTWARE, SPRING 2011

[6] "IEEE standard for information technology - telecommunications and information exchange between systems - local and metropolitan area networks - common specifications. Part 3: Media access control (MAC) bridges," ANSI/IEEE Std 802.1D, 1998 Edition, 1998.
[7] "IEEE standard for local and metropolitan area networks: Media access control (MAC) bridges," IEEE Std 802.1D-2004 (Revision of IEEE Std 802.1D-1998), 2004.
[8] L. L. Peterson and B. S. Davie, Computer Networks: A Systems Approach. Amsterdam; Boston: Morgan Kaufmann, 2007, ch. 3, pp. 187-192.
[9] M. Huynh, S. Goose, and P. Mohapatra, "Resilience technologies in Ethernet," Computer Networks, vol. 54, no. 1, pp. 57-78, 2010.
[10] "IEEE standard for local and metropolitan area networks - common specification. Part 3: Media access control (MAC) bridges - amendment 2: Rapid reconfiguration," IEEE Std 802.1w-2001, 2001.
[11] "Understanding Rapid Spanning Tree Protocol (802.1w)," http://www.cisco.com/en/US/tech/tk389/tk621/technologies_white_paper09186a0080094cfa.shtml, Cisco Systems, Oct. 2006, document ID: 24062.
[12] "IEEE standards for local and metropolitan area networks: Virtual bridged local area networks," IEEE Std 802.1Q-1998, 1999.
[13] "IEEE standards for local and metropolitan area networks - virtual bridged local area networks - amendment 3: Multiple spanning trees," IEEE Std 802.1s-2002 (Amendment to IEEE Std 802.1Q, 1998 Edition), 2002.
[14] "IEEE standards for local and metropolitan area networks. Virtual bridged local area networks," IEEE Std 802.1Q, 2003 Edition (Incorporates IEEE Std 802.1Q-1998, IEEE Std 802.1u-2001, IEEE Std 802.1v-2001, and IEEE Std 802.1s-2002), 2003.
[15] "IEEE standard for local and metropolitan area networks. Virtual bridged local area networks," IEEE Std 802.1Q-2005 (Incorporates IEEE Std 802.1Q-1998, IEEE Std 802.1u-2001, IEEE Std 802.1v-2001, and IEEE Std 802.1s-2002), 2006.
[16] "Understanding Multiple Spanning Tree Protocol (802.1s)," http://www.cisco.com/en/US/tech/tk389/tk621/technologies_white_paper09186a0080094cfc.shtml, Cisco Systems, April 2007, document ID: 24248.
[17] P. Lapukhov, "CCIE blog: Understanding MSTP," http://blog.ine.com/2010/02/22/understanding-mstp/, Feb. 2010.
[18] "IEEE standard for information technology - telecommunications and information exchange between systems - local and metropolitan area networks - specific requirements. Part 3: Carrier sense multiple access with collision detection (CSMA/CD) access method and physical layer specifications," IEEE Std 802.3, 1998 Edition, 1998.
[19] "Amendment to carrier sense multiple access with collision detection (CSMA/CD) access method and physical layer specifications - aggregation of multiple link segments," IEEE Std 802.3ad-2000, 2000.
[20] "IEEE standard for information technology - telecommunications and information exchange between systems - local and metropolitan area networks - specific requirements. Part 3: Carrier sense multiple access with collision detection (CSMA/CD) access method and physical layer specifications," IEEE Std 802.3-2002 (Revision of IEEE Std 802.3, 2000 edn.), 2002.
[21] "IEEE Std 802.3-2005. Part 3: Carrier sense multiple access with collision detection (CSMA/CD) access method and physical layer specifications," IEEE Std 802.3-2005 (Revision of IEEE Std 802.3-2002 including all approved amendments), 2005.
[22] "IEEE standard for information technology - telecommunications and information exchange between systems - local and metropolitan area networks - specific requirements. Part 3: Carrier sense multiple access with collision detection (CSMA/CD) access method and physical layer specifications - section three," IEEE Std 802.3-2008 (Revision of IEEE Std 802.3-2005), pp. 1-315, 2008.
[23] "IEEE standard for local and metropolitan area networks - link aggregation," IEEE Std 802.1AX-2008, pp. c1-145, 2008.
[24] "Linux Ethernet Bonding Driver HOWTO," http://www.kernel.org/doc/Documentation/networking/bonding.txt, Sept. 2009.

09, Roman 1

The Networking behind Facebook

Gema Román López

Abstract — Since Information and Communication Technologies (ICT) appeared, the way we interact with other people has undergone a great change, and it is in a process of unstoppable development. The huge number of ICT services has had a deep impact on the way people get to know other people, how they keep in contact, how they organize events, and so on. In the late 1970s, social networks were created as a new way of interaction between people using online services, promoting the sharing of personal information, making new contacts, and keeping in touch with them. In this paper, we uncover some fundamental issues about Social Network Sites, and we especially investigate Facebook. We talk about several of its main features, as well as the applicability which has given Facebook more popularity than other social networks and communication means. We also study the networking infrastructure behind Facebook, with the aim of understanding the functioning mechanisms of the most widely used social network these days.

Index Terms — Communication, ICT, Social Networks, Social Network Site, Infrastructure

Manuscript received March 1, 2011. This work was supported in part by Åbo Akademi University. Gema Román López, Åbo Akademi student, 83778 (e-mail: [email protected]).

I. INTRODUCTION

THIS paper is generally about Social Network Sites and more specifically about Facebook. First of all, we will try to give the reader a general impression of what Social Network Sites are, how they have had a deep impact on how people communicate with each other, and why Facebook has become one of the most important and most used Social Network Sites nowadays.

In the third chapter, we will talk about the features that Facebook offers to its users and how communication is established between them.

After that, we will go through the privacy aspects and risks that all Social Network Sites can pose to their users, and we will analyze a particular study, which took place at Carnegie Mellon University in Pittsburgh, related to how users behaved when they became Facebook members and how important privacy was for them.

Finally, we will talk about a few technical features of Facebook, such as the infrastructure it uses, how it stores so much data, and how it makes communication possible among so many users.

II. SOCIAL NETWORKS AND FACEBOOK

Informally, a social network is a large structure that one uses to build relationships with other people.

Formally defined, a social network site is a "web based service that allows individuals to construct a public or semi-public profile within a bounded system, articulate a list of other users with whom they share a connection and view and traverse their list of connections and those made by others within the system" [1].

Social Network Sites (from now on referred to as SNSs) are therefore based on the idea that people with similar interests are sharing them on the web. These sites also enable users to maintain connections that, although based on previous offline connections, would not be possible or lasting without social network sites.

Amid the numerous features of SNSs, the most important one is the profile of each user, sometimes defined together with a list of the system users declared as the user's friends. This profile allows the user to write personal information, sometimes divided by topics, such as birth data, interests, an "about me" section, academic information, political opinion, sexual orientation, contact information, and so on. Some SNSs encourage the user to also upload a personal profile picture, while others have various modules, such as games or applications, that can be added to enhance the profile. Another particular feature of some SNSs is the possibility of using them on mobile devices, as well as adding contents in this way.

The visibility and privacy policy each SNS has is also one of the characteristics that differentiate some SNSs from others, which we will discuss shortly in Section 4.

A. History of Social Networks

Initially, most of the sites that are nowadays considered SNSs began as web sites with various purposes. For instance, QQ was a Chinese instant messaging service, Cyworld started as a Korean discussion forum tool, while MiGente and BlackPlanet were simple ethnic community sites with a few features and little functionality. Many sites were eventually re-launched after SNSs were proposed elsewhere, by adding numerous new features and organizing the site based on a predefined structure and requirements.
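The formal SNS definition quoted in Section II (a profile within a bounded system, a list of connections, and the ability to view and traverse the connection lists made by others) can be modeled with a minimal sketch. The user names and data here are invented for illustration only.

```python
# Toy model of a social network site (invented data): profiles plus
# per-user connection lists, with traversal of the lists made by others.

PROFILES = {
    "alice": {"interests": ["music"]},
    "bob": {"interests": ["hiking"]},
    "carol": {"interests": ["music"]},
}

CONNECTIONS = {
    "alice": ["bob"],
    "bob": ["alice", "carol"],
    "carol": ["bob"],
}

def friends_of_friends(user):
    """Traverse the connection lists of the user's own connections."""
    result = set()
    for friend in CONNECTIONS[user]:
        result.update(CONNECTIONS[friend])
    result.discard(user)                         # not yourself
    result.difference_update(CONNECTIONS[user])  # not direct connections
    return sorted(result)

print(friends_of_friends("alice"))  # ['carol']
```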


Figure 1: Timeline of the launch years of many SNSs and dates when community sites re-launched with SNS features. [1].

According to [1], the first SNS (SixDegrees.com) was launched in 1997, as illustrated in Figure 1. This site was the first to be named a SNS, although most of the features it contained had existed before in other applications and sites, such as the personal profile on dating sites or the list of friends at sites such as ICQ. The revolutionary point of SixDegrees.com was the fact that all these features were put together in the same site. SixDegrees.com was a huge success, but after three years, in 2000, it closed, because most people who had joined the site did not have many online friends. The majority of them thought there was not much to do apart from accepting Friend requests or sending messages. The other possibility the site offered, meeting unknown people, was not very popular among the users.

Following SixDegrees.com, a huge number of SNSs appeared, among them some of the most popular ones: Friendster, YouTube, Flickr, Last.FM, MySpace, Microsoft's Windows Live Spaces, Google Orkut and Hi5. In early 2004, a new site called Facebook appeared as a "Harvard-only SNS", as Cassidy cited in [1]; this SNS was later to become one of the most used SNSs all over the world.

B. Facebook

Initially, Facebook was created as a private SNS for Harvard students, which in practice implied that one needed a Harvard.edu email account in order to join the site. After a while, Facebook was expanded to support other university members, i.e., one needed an email account belonging to a university institution, to maintain the site as a private community. In September 2005, Facebook started its biggest expansion, initially including high school students and corporate network professionals, and eventually everyone.

At first, Facebook aimed to offer online support to previous relationships that had occurred offline, or to relationships that had some offline elements in common. For instance, assume a Harvard University student X, who already had some friends A and B at Harvard, but who did not know students F and K. Since A is a friend of F and K, X has an element in common with F and K. The main idea was to support one's offline community.

III. COMMUNICATION BETWEEN FACEBOOK USERS

In this section we will go through the main features that Facebook offers to its users, as well as the two modes of communication that can be used on this social network.

A. Facebook features

Since the very beginning, Facebook has offered its users a way to get to know their "online friends" and to interact with them. The main feature on Facebook is the personal profile that one can create and modify after registering on the site. Personal information, contact information, hobbies, pictures, etc. can be added to the profile to give the other users an overview of oneself [3]. One can also list the networks to which one belongs, such as universities or home towns, and find those people belonging to the same networks who are likely to be or become one's friends [3]. Facebook also allows users to create groups based on offline activities, connecting those users that are interested in the same activities.
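The network-based matching described above (finding people who belong to the same university or home town networks) amounts to a simple set intersection. The users and network names in this sketch are invented.

```python
# Sketch of network-based friend candidates (invented data): users who
# share at least one network with you are likely to be or become friends.

USER_NETWORKS = {
    "ann": {"Abo Akademi", "Turku"},
    "ben": {"Abo Akademi"},
    "eva": {"Helsinki"},
}

def people_in_shared_networks(user):
    """Other users belonging to at least one of the user's networks."""
    mine = USER_NETWORKS[user]
    return sorted(other for other, nets in USER_NETWORKS.items()
                  if other != user and mine & nets)

print(people_in_shared_networks("ann"))  # ['ben']
```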

Facebook allows other websites and companies to include Facebook's personal plugins. For this purpose Facebook has the Social Plugins, to make this process as easy as possible. Developers can also create their own applications, based on previously existing ones or on new ones, such as games or interactive maps. These kinds of applications can be created either for personal computers or for mobile devices. For these cases, Facebook already has many samples that can be used as a starting point for the development process.

B. Synchronous vs. Asynchronous communication

We can define synchronous communication as the communication established between users when one of them remains blocked, waiting for the other user to respond before continuing to do something else. On the other hand, we talk about asynchronous communication when the sender can continue with his/her tasks immediately after sending the information to the receiver, that is, without waiting for a response.

The synchronous tool in Facebook is the chat application. It allows users to communicate with other users through an instant messaging service. This application identifies the users who are connected at a given time and those who are absent (active some minutes ago).

The asynchronous tools are the wall, the message system, the e-mail system, photo comments, the like button and the status. All of them allow users to leave messages to their friends in a way that no immediate response is needed.

IV. SHARING INFORMATION AND USERS' PRIVACY

The structural variations in visibility of and access to user profiles are one of the most important differences between SNSs.

In some SNSs, such as Friendster and Tribe.net, when one creates a personal profile, that profile becomes visible to anyone, independently of whether the people "checking" this profile have an account on the SNS or not. In some other SNSs, a user will be able to see the profiles of others depending on having paid a membership fee to the SNS.

Facebook takes a completely different approach to sharing profiles. By default, everyone using Facebook appears in searches by other users, with certain information available, including the user's name, profile picture and the academic institution the user is attending. The full profile information will be available only to the users belonging to the same "network/group". However, these are the default settings, and Facebook offers the necessary mechanisms to control the searchability and visibility of each profile, as well as to control the privacy of contact information. As a recommended action, each user should redefine his/her own privacy, with several levels, restricting which users can have access to his/her personal profile. However, research studies related to SNSs revealed that users tend not to change their privacy settings [4].

To clarify some points related to privacy, consider the following scenario. Imagine a user X, who belongs to the group of friends (assume X is, in fact, the best friend) of a user Y, and a user Z who does not belong to Y's group of friends. By default, X would have access to Y's profile, including pictures, personal information, the list of friends, the lists of groups which Y belongs to, etc. Z would not have the same access privileges. On the other hand, imagine another friend of Y's, say W, who is a colleague from Y's work. By default, this one would have the same "privileges" as X. Y can decide which parts of his/her personal profile will be accessible to X and which ones to W, because they are different kinds of friends and Y does not want to share the same contents with them. Since each user may have different groups of friends, and sometimes may want to share different information with some of them than with others, Facebook allows different levels of privacy depending on the group of users, or even on each user visiting someone's personal profile. Further assume that X does not want to meet new people. Facebook offers every user the possibility to restrict who can send private messages to them.

A. Security Risks: The CMU Case Study

In the following we discuss some security aspects based on a research study that took place in 2005 among college students from Carnegie Mellon University (CMU) [4]. In this study, the researchers used the Facebook advanced search feature to extract 4540 profile IDs that allowed the researchers to download all those profiles, obtaining the whole CMU Facebook population at that time.

Apart from many interesting results in the study, the following private information was obtained from the profiles: 90.8% of profiles contained an image, 87.8% revealed their birthdate, 39.9% listed their phone number, 28.8% contained their cellphone number, and 50.8% listed their current residence. They also stated dating preferences, current relationship status (62.9% identifying their partner by name and a link to their Facebook profile), political views and interests, as well as sexual orientation [4]. The results of the study are illustrated in Figure 2.
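The X/Y/W/Z scenario above amounts to group-based access control over profile sections. The sketch below models it with invented field and group names; Facebook's real privacy model is, of course, much richer.

```python
# Sketch of per-friend-group profile visibility (invented data): Y's best
# friend X sees everything, colleague W sees a restricted view, and the
# stranger Z sees only the default searchable information.

PROFILE = {
    "name": "Y",
    "pictures": ["p1.jpg", "p2.jpg"],
    "friend_list": ["X", "W"],
    "work_info": "works with W",
}

GROUP_ACCESS = {
    "best_friends": {"name", "pictures", "friend_list", "work_info"},
    "colleagues": {"name", "work_info"},
    "strangers": {"name"},
}

MEMBERSHIP = {"X": "best_friends", "W": "colleagues"}

def visible_profile(viewer):
    group = MEMBERSHIP.get(viewer, "strangers")
    return {k: v for k, v in PROFILE.items() if k in GROUP_ACCESS[group]}

print(sorted(visible_profile("X")))  # ['friend_list', 'name', 'pictures', 'work_info']
print(sorted(visible_profile("Z")))  # ['name']
```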

Figure 2: Percentage of CMU profiles revealing various types of personal information. [4].

Thus, one important point about security risks is the presence of fake information in SNSs. In most of them, and particularly in Facebook, users are encouraged to give their real name and surname and real personal data. However, there is no proof that they are doing this, which can be a double-edged sword. On one hand, providing real personal information can reveal personal data that might be used to damage its owner. On the other hand, not stating true information in the profiles would create fake profiles and users, which are also dangerous for the SNSs. One should carefully consider who is being accepted as a Friend in SNSs, and check that every person who has access to one's information is who they are supposed to be.

Then, when accepting a user in Facebook as a Friend, another problem might appear. Some people are willing to accept other users as Friends just because they met them once, even if they do not really know them or do not know if they can trust them. This means sharing personal information with strangers.

B. Principles and Privacy Policy

Among its main principles, Facebook intends to build a social community which engages applications, and to help its users share their contents. Facebook wants to respect users' privacy and not to mislead, confuse or surprise them.

According to Facebook's terms of service and privacy policy, on one hand it assumes certain rights over users' data. As an example, its privacy policy reports that the site will collect additional information about its users (for instance, from instant messaging), not originated from the use of the service itself. The policy also states that participants' information may include information that the participant has not knowingly provided (for example, her IP address), and that personal data may be shared with third parties. This fact might be understood as interference in users' rights. On the other hand, Facebook has also established privacy rules that third parties should follow carefully when using users' data obtained from Facebook: "You (third parties) will not include data you receive from us (referring to Facebook) concerning a user in any advertising creative, even if a user consents to such use." Or: "You will not sell any data. If you are acquired by or merge with a third party, you can continue to use user data within your application, but you cannot transfer data outside your application." [2].

V. WHAT WE CANNOT SEE IN SNS

In order to offer its users the features we have talked about, Facebook needs a physical infrastructure behind it. This infrastructure is responsible for supporting the data storage, communication establishment, information transmission and so on that Facebook employs to fulfill its users' needs.

A. The infrastructure

Facebook has been developed from the ground up using open source software. Developers building their own applications are using many of the same infrastructure technologies that power Facebook.

Until now, Facebook's platform engineering team has released and maintains open source SDKs for Android, C#, iPhone, JavaScript, PHP and Python.

Facebook Infrastructure [2]:

. Cassandra is a distributed storage system for managing structured data.
. Hive is a data warehouse infrastructure. It provides tools for data summarization, ad hoc querying and analysis of large datasets.
. FlashCache is a general purpose write-back block cache for Linux.
. HipHop for PHP transforms PHP source code into highly optimized C++.
. The Open Compute Project is an open hardware project to accelerate data center and server innovation and to increase computing efficiency.
. Scribe is a scalable service for aggregating log data streamed in real time from a large number of servers.
. Thrift provides a framework for scalable cross-language services development in C++, Java, Python, PHP, and Ruby.
. Tornado is a relatively simple, non-blocking web server framework written in Python. It is designed to handle thousands of simultaneous connections, making it ideal for real-time Web services.
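To make the infrastructure list above more concrete, the following toy sketch mimics the idea behind the log-aggregation service in it (Facebook's Scribe): many servers stream categorized log messages to a central collection point. The class, method names and messages are invented; the real system is distributed and persistent.

```python
# Toy log aggregator (invented API) illustrating the Scribe idea: clients
# on many servers push (category, message) records to one aggregator.

from collections import defaultdict

class Aggregator:
    def __init__(self):
        self.store = defaultdict(list)  # category -> list of messages

    def log(self, category, message):
        self.store[category].append(message)

agg = Aggregator()
for server in ("web1", "web2", "web3"):
    agg.log("access", server + ": GET /profile")
agg.log("error", "web2: timeout talking to cache")

print(len(agg.store["access"]))  # 3
print(agg.store["error"])        # ['web2: timeout talking to cache']
```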

Facebook Engineers contribute to: C. How to manage data in Facebook

. Apache Hadoop that provides reliable, scalable, Facebook uses MySQL to store users’ data [2]. Facebook distributed computing infrastructure that is used for data has a special team for these databases issues, called “Facebook analysis, Database Team”, which is divided into three sections: . Apache HBase is a distributed, colum-oriented data store database operations, database performance and engineering built on top of the Hadoop Distributed Filesystem. teams. Some data related to the transactions Facebook . Cfengine is a rule-based configuration system used to performance with its databases: automate the configuration and maintenance of servers. - Query response time: 4ms reads, 5ms writes . Jemalloc is a memory allocator, fast, consistent and it - Network bytes per second: 38GB peak supports heap profiling. Facebook engineering added - Queries per second: 13M peak heap profiling and made many optimizations. - Rows read per second: 450M peak . MySql is the backbone of its database infrastructure. - Rows changed per second: 3.5M peak . memcached is a distributed memory object caching system. It was not originally developed at Facebook, but One of the most important issues Facebook has to deal with it has become the largest user of the technology. is the photo storage. It is one of the features preferred by the . PHP is the scripting language which makes up the users and it is the most used one. According to Facebook majority of its code-base. official blog [2], these are some numbers related with this . Varnish servers are the servers involved each time a user issue: load photos and profile pictures. - 1.7 billion user photos - 2.2 billion friends tagged in user photos - 160 terabytes of photo storage used with an extra 60 B. Developer tools terabytes available - 60+ million photos added each week which take up 5

terabytes of disk space
- 3+ billion photo images served to users every day
- 100,000+ images served per second during peak traffic windows

Apart from these numbers, we should multiply the number of pictures by four, because Facebook stores four image sizes for each photo. The current growth rate is 220 million new pictures per week, which translates into 25 TB of additional storage consumed every week.

The newest photo infrastructure Facebook uses is called Haystack. It is based on merging the photo serving tier and the storage tier into one physical tier. It implements an HTTP-based photo server which stores photos in a generic object store called Haystack. The main achievement of this system is to eliminate unnecessary metadata from the I/O operations. The main layers of this system are: HTTP server, Photo store, Haystack Object Store, Filesystem and Storage [2].

In Figures 3-6, we can see how each Haystack object has its first 8 KB occupied by the superblock. After that come the needles, each consisting of a header, the data and a footer. Each needle is uniquely identified by its tuple.

One of the interesting things that Facebook offers to its developers is a collection of tools that make development easier. Some of them are [2]:
- codemod assists with large-scale codebase refactors that can be partially automated but still require human oversight and occasional intervention.
- Facebook Animation is a JavaScript library for creating customizable animations using DOM and CSS manipulation.
- flvtool++ is a tool for hinting and manipulating the metadata of FLV files. It was originally created for Facebook Video.
- Online Schema Change for MySQL allows altering large database tables without taking the cluster offline.
- PHPEmbed makes embedding PHP truly simple for developers. Facebook developed the PHPEmbed library as a more accessible and simplified API built on top of the PHP SAPI.
- phpsh provides an interactive shell for PHP that features readline history, tab completion, and quick access to documentation. It is ironically written mostly in Python.
- Three20 is an Objective-C library for iPhone developers, used behind Facebook's iPhone application.
- XHP is a PHP extension which augments the syntax of the language such that XML document fragments become valid expressions.
- XHProf is a function-level hierarchical profiler for PHP with a simple HTML-based navigational interface.
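The needle layout described above (header, data, footer, identified by a tuple) can be sketched as a simple record serializer. This is a minimal illustration only: the field names, sizes and the CRC footer are assumptions for the sketch, not Facebook's actual on-disk format.

```python
# Hypothetical sketch of a Haystack-style needle record. Field layout
# (8-byte key, 4-byte alternate key, 4-byte size, CRC footer) is an
# illustrative assumption, not the real Haystack format.
import struct
import zlib

HEADER = "<QII"                       # key, alternate key, data size
HEADER_SIZE = struct.calcsize(HEADER) # 16 bytes

def pack_needle(key: int, alt_key: int, data: bytes) -> bytes:
    """Serialize one photo as a needle: header, data, footer."""
    header = struct.pack(HEADER, key, alt_key, len(data))
    footer = struct.pack("<I", zlib.crc32(data))   # integrity check
    return header + data + footer

def unpack_needle(blob: bytes):
    """Recover the identifying tuple and the photo bytes."""
    key, alt_key, size = struct.unpack_from(HEADER, blob, 0)
    data = blob[HEADER_SIZE:HEADER_SIZE + size]
    (crc,) = struct.unpack_from("<I", blob, HEADER_SIZE + size)
    assert crc == zlib.crc32(data), "corrupt needle"
    return (key, alt_key), data
```

Packing all needles contiguously after the superblock is what lets the store serve a photo with a single read, once the (key, offset) mapping is held in memory.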


Figure 3: Layout of the Haystack store [2]
Figure 4: Needle tuples [2]
Figure 5: Haystack object store [2]
Figure 6: Information store for each photo [2]

REFERENCES

[1] Danah M. Boyd, Nicole B. Ellison. Social Network Sites: Definition, History, and Scholarship. Journal of Computer-Mediated Communication, Volume 13, Issue 1, pages 210-230, October 2008. Available online at: http://onlinelibrary.wiley.com/doi/10.1111/j.1083-6101.2007.00393.x/full
[2] Facebook Developers Official Site. Last access 15.05.2011. http://developers.facebook.com
[3] Catherine Dwyer, Starr Roxanne Hiltz, Katia Passerini. Trust and privacy concern within social networking sites: A comparison of Facebook and MySpace. Available online at: http://csis.pace.edu/~dwyer/research/DwyerAMCIS2007.pdf
[4] Ralph Gross, Alessandro Acquisti. Information Revelation and Privacy in Online Social Networks (The Facebook case). Pre-proceedings version. ACM Workshop on Privacy in the Electronic Society (WPES), 2005. Available online at: http://www.heinz.cmu.edu/~acquisti/papers/privacy-facebook-gross-acquisti.pdf
[5] The Unofficial Facebook Blog. Last access 10.03.2011. http://www.allfacebook.com/
[6] Cliff Lampe, Nicole Ellison, Charles Steinfield. A face(book) in the crowd: social Searching vs. social Browsing. In CSCW '06: Proceedings of the 2006 20th Anniversary Conference on Computer Supported Cooperative Work, 2006. Available online at: http://portal.acm.org/citation.cfm?id=1180901
[7] Ralph Gross, Alessandro Acquisti. Imagined Communities: Awareness, Information Sharing, and Privacy on the Facebook. Springer-Verlag Berlin Heidelberg, 2006. Available online at: http://www.springerlink.com/content/gx00n8nh88252822/
[8] The social software weblog. Last access 10.03.2011. http://socialsoftware.weblogsinc.com/

VI. CONCLUSIONS

SNSs in general, and Facebook in particular, have had an important impact on everybody's life. Communication nowadays would be really different without them. Facebook is constantly growing and improving the features and possibilities it offers to its users. From a technical point of view, Facebook needs to expand and improve its servers and databases constantly because of the huge demand for services from its users. It is reaching its aims with a big group of professionals who make it happen.

Public-Key Cryptography and the RSA Algorithm

Carlo Zambon Åbo Akademi 83791 [email protected] May, 2011

Abstract ─ Symmetric-key encryption methods have a number of disadvantages, such as the high number of keys needed to allow a group of users to securely exchange messages among themselves, and the necessity of exchanging the keys before the communication can start. Public-key cryptography has enabled us to overcome the main limitations of the symmetric-key methods. Since its creation in 1977, the RSA algorithm has become one of the most used cryptosystems in the world. Its security is based on the alleged difficulty of factorizing a product into its primes. In this paper, we review the central aspects of this technology, starting with an overview of the public-key cryptography domain and the problems that it resolves. Then, we study the RSA algorithm, its mathematical aspects and its applicability in the field of digital signatures. We also present some known common attacks on the RSA algorithm.

Index terms ─ Attacks, confidentiality, digital signature, non-repudiation, public-key cryptography, RSA, security.

I. INTRODUCTION

The need of achieving confidentiality in communications arose early in human history [1]. However, only the inventions and events of the last century, in particular starting from the Second World War, led to a widespread interest in cryptography, first among scientists and governments and, in the last decades, also among companies and common people, due to the general availability of personal computers. With the rapid growth of the Internet and of the use of wireless networks, the necessity of confidentiality in electronic transmissions became even more pronounced.

Modern symmetric-key encryption makes it possible to create a secure channel between two users [1]. In the following we refer to these two users as Alice and Bob, and we denote them with the letters A and B, respectively. We also assume that they share an encryption key. Using their secure channel, the two users can send each other messages in a secure and private way. This system is effective for sharing information when the two users can agree on the key before the secure communication takes place.

But how can these two users agree on this key if they cannot meet each other in advance? They must rely on a safe channel to exchange the key, avoiding the possibility that it falls into an attacker's hands. An example could be to use a trustworthy courier, for instance one internal to Alice and Bob's organization, who can reach B with the key created by A. Obviously this is not always possible, and moreover we cannot be absolutely sure that the courier is not an attacker himself.

This key-related problem is not the only one that affects a symmetric-key cryptosystem. In fact, another big difficulty is due to the necessity of having a key for every pair of users. This is not a problem if the purpose of the cryptosystem is just to protect the communication between two users: they share the key "in some way" and use it to encrypt all the data that they send to each other, from instant messaging sessions to emails. However, it becomes a big issue when the users of a company need to communicate with each other. If every pair of users of the company has to communicate in a secure way, without the possibility that the contents of the messages could be revealed to other colleagues, then every possible pair of users needs to agree on a key. Hence, the number of keys that are needed is given by the formula

  C(n, 2) = n(n − 1)/2 = Θ(n²),    (1)

where n denotes the number of users of the cryptosystem [1]. This would confront companies having hundreds or thousands of workers with the non-trivial task of creating and managing the keys.

The public-key cryptography techniques overcome these two problems in an elegant and innovative way. The major problem to tackle consists of the need to decide on a key and share it among the users who want to communicate, before the communication can begin. This seemed to be an insurmountable obstacle: how can a message be sent in a secure way without agreeing on a key beforehand?
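The growth rate in formula (1) can be checked numerically; the contrast with the per-user key count of public-key cryptography (discussed in Section II) is a small sketch:

```python
# Key-management cost for n users: pairwise symmetric keys grow as
# Theta(n^2), whereas public-key cryptography needs one key pair per user.
def symmetric_keys(n: int) -> int:
    return n * (n - 1) // 2      # one shared key per pair: C(n, 2)

def public_key_pairs(n: int) -> int:
    return n                     # one (public, private) pair per user

for n in (10, 100, 1000):
    print(n, symmetric_keys(n), public_key_pairs(n))
```

For a company of 1000 workers this is 499,500 symmetric keys versus 1000 key pairs.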
A solution appears when thinking about the double-lock riddle [1], which is a kind of three-way handshake. The user A wants to send a pack to her friend B, but the channel between them is insecure. To avoid that someone could open the pack, A sends the pack closed with a lock that she has, without sending the corresponding key to B. At this point B receives the pack, but he cannot open it, because he does not have the right key. However, B also has a personal lock, which he applies to the pack, already protected by A's lock. Then B sends the pack back to A, who removes her lock with her own key and finally sends the pack again to B, who can now open it using his key. Using this procedure A has sent a pack to B in a secure way, without the need of agreeing on a common key. This simple story tells us that it is possible to send a private message over an insecure channel, without the necessity for the users to meet each other beforehand or to rely on a secure channel to share the key.

So how can this idea be applied in practice? It is possible to take it as it is and use a symmetric and commutative encryption algorithm to do the transformations, and send the message in three steps [1]. The procedure can be mathematically described in this way: A calculates and sends f_{E_A}(M) to B, where M is the original message; B applies his encryption function f_{E_B}(·) to it, obtaining f_{E_B}(f_{E_A}(M)), which, due to the commutative property of f, is equal to f_{E_A}(f_{E_B}(M)), and he sends it back to A; A, using her decryption function f_{D_A}(·), gets f_{E_B}(M) and sends it to B, who, after applying his decryption function f_{D_B}(·), can finally read M.

Even if this approach works correctly, it has several major drawbacks:
- the necessity of having three communication steps instead of one; this is a serious issue if the communication should be done in real time;
- this method restricts us to encryption functions that are commutative;
- we cannot be sure of the receiver's identity (the receiver can be an attacker and not B);
- we must be sure that an attacker cannot break the system using the information obtained by sniffing the channel (in practice f_{E_A}(M), f_{E_B}(f_{E_A}(M)) and f_{E_B}(M)).

Regarding the last issue, it can be easily shown that even a perfect cryptosystem like the one-time pad [1] can lead to the exposure of sensitive information if it is improperly used in this procedure [1]. Even if this is not a practical method, it shows us that the need of exchanging the key in advance is not a necessity at all.

The remainder of the article is organized as follows. In section 2 we describe the basic idea behind public-key cryptography. In section 3 we present the RSA algorithm, as it was invented by Rivest, Shamir and Adleman in 1977; we first give the mathematical background needed to understand the algebraic properties on which RSA is based, then we explain the encryption and decryption steps of the RSA method, prove the correctness of the RSA algorithm, and finally explain how the encryption and decryption keys are chosen. In section 4 we describe the basic concept behind digital signature schemes, and we present two implementations of this idea using the RSA algorithm. In section 5 we present some simple attacks on a straightforward implementation of the RSA algorithm. Section 6 concludes the paper.

II. PUBLIC-KEY CRYPTOGRAPHY

The true revolution in modern cryptography comes with the mathematical concept of the so-called "one-way" functions (based on research done by Diffie and Hellman that dates back to 1976) [2]. These are invertible functions that are easy to calculate but seem difficult to reverse, unless some secret (the decryption key) is known. For this reason they are also known as "trap-door" functions [2]. Two examples are the discrete logarithm over a finite cyclic group and the factoring of an integer into its prime factors [3].

We explain intuitively how public-key cryptography works. First of all, instead of having a unique key k (or a unique pair of keys k_e and k_d, respectively for the encryption and decryption phase), common to both A and B, we have two pairs of keys, one for each user. A pair is composed of a public key (used for encryption) and a private key (used for decryption). In this case we have Alice's pair (e_A, d_A) and Bob's pair (e_B, d_B). The keys that are used for encryption (in this case e_A and e_B) are public, while the keys used for decryption are private to the respective users (d_A is private to Alice and d_B is private to Bob).

When A wants to send a private message to B, she does not use her own encryption key, but instead (and here is the clever step) B's public key. B is the only one who has the private key to decrypt the message, and so the message will reach B in a secure way. Similarly, if B wants to send a private message to A, he will apply A's public encryption key to the message before sending it to A [1].

The procedure assumes that it should be significantly difficult for an attacker to obtain a user's private key starting only from the public key of that user. If this assumption holds true, then the system can be used to exchange messages in a confidential and secure way [1].
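The three-step "double lock" procedure of the Introduction can be sketched with XOR as the commutative cipher f. This is an illustration of the protocol only; XOR is exactly the kind of cipher the sniffing drawback warns about, as the last line shows.

```python
# Three-step commutative exchange sketch: f_Ea and f_Eb are XOR with
# private pads, which commute. Illustrative only -- this instantiation
# is insecure, as the final assertion demonstrates.
import secrets

def xor(x: bytes, y: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(x, y))

M = b"attack at dawn!!"
ka = secrets.token_bytes(len(M))   # Alice's private pad (her "lock")
kb = secrets.token_bytes(len(M))   # Bob's private pad (his "lock")

step1 = xor(M, ka)                 # A -> B: f_Ea(M)
step2 = xor(step1, kb)             # B -> A: f_Eb(f_Ea(M))
step3 = xor(step2, ka)             # A -> B: A removes her lock, f_Eb(M)
assert xor(step3, kb) == M         # B removes his lock and reads M

# The sniffing drawback: a passive attacker XORs the three
# transmissions together and recovers M without any key.
assert xor(xor(step1, step2), step3) == M
```

This is precisely the point made above about the one-time pad: perfect as a cipher, yet improperly used in this three-step procedure it exposes the message.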
Public-key cryptography does not only remove the problem of needing a secure way to share the key beforehand; it also resolves the rapidly growing number of keys when a large group of users wants to communicate with each other. In fact, a user needs only his own pair of private and public keys to communicate with everyone in the world. If someone wants to send a message to this user, it is enough to use the recipient's public key (which can be put, for example, on the receiver's webpage or in a public archive). For this reason, if a public-key encryption method is used, then the number of keys is Θ(n), where n is the number of users of the cryptosystem [1].

We continue with the description of the RSA algorithm, which is based on the fact that it appears much easier to multiply two integer numbers than to factor their product [3].

III. THE RSA ALGORITHM

As we said above, the security of the RSA algorithm is based on the alleged difficulty of factoring a number into its primes. To fully understand the mathematical aspects on which RSA is based we need some definitions and theorems. To keep the exposition short, we do not present the methods that are used to create the huge numbers, prime except with negligible probability, that are needed for the RSA algorithm. An interested reader can find examples of such probabilistic procedures in [1], [3] and [4].

A. Mathematical Background

We start by defining what we mean by "divide".

Definition 1: Let a, b, c ∈ ℤ. Suppose c = ab. In this case we say that a (and b) divides c [3].

According to the definition we have that both 3 and 5 divide 15 (since 15 = 3·5), that 1 divides every integer m (since m = 1·m, where m ∈ ℤ) and that every integer divides 0 (since 0 = 0·m, ∀m ∈ ℤ). Denoting with div(x) the set of the positive integers that divide x, we have, for example, div(7) = {1, 7} (notice that 7 is prime), div(15) = {1, 3, 5, 15} and div(0) = ℕ.

Then we need to introduce congruences.

Definition 2: Let a, b, c ∈ ℤ. Then a and b are congruent modulo c if c divides b − a (or equivalently a − b) [3]. We denote this as a ≡ b (mod c).

It can be proved that congruence is well-defined (so we can add and multiply in the usual way) and that it is an equivalence relation [3]. In our setting c will always be strictly greater than zero (c > 0). In this case we can read the above definition as "a is congruent to b if and only if they have the same remainder when divided by c" [3]. For example, 13 is congruent to 7 modulo 2, because both 13 and 7 have the same remainder (1) when divided by 2. We write this as 13 ≡ 7 (mod 2). Moreover, we denote by [a]_c (or simply [a] if it is clear to which c we are referring) the unique remainder resulting from the division of a by c. We note that the remainder [a]_c is always greater than or equal to zero, but strictly less than its divisor c, that is, 0 ≤ [a]_c < c [3].

As we already said, in our setting congruences are naturally related to remainders. This fact, together with the "good definition" of congruences, gives us the possibility to understand the following proposition and what it implies.

Proposition 1: Let x₁, x₂, y₁, y₂, c ∈ ℤ, and suppose that x₁ ≡ x₂ (mod c) and y₁ ≡ y₂ (mod c). Then [3]
  i. x₁ + y₁ ≡ x₂ + y₂ (mod c),
  ii. x₁y₁ ≡ x₂y₂ (mod c).

As stated in the hypotheses of the proposition, this holds for every c ∈ ℤ. However, if we restrict ourselves to the case in which c > 0, from this proposition comes the following property:
  [xy]_c = [[x]_c [y]_c]_c,
where every remainder is modulo c > 0. This fact permits us to calculate in an efficient way the remainder of huge numbers, for instance of powers, as in the following example:
  [19^2073]_6 = [[19]·[19]·…·[19]]_6 = [1·1·…·1]_6 = 1,
where every remainder is modulo 6. Even though this is a simple case, the basic idea remains the same for more complicated cases, and the final remainder can be efficiently calculated using the "squaring algorithm" [1], [3], or with even more efficient algorithms [5].

Based on the well-known definition of prime numbers, we recall here the definition of relatively prime numbers.

Definition 3: Two numbers a and b are called relatively prime if
  gcd(a, b) = 1,
where gcd stands for greatest common divisor [3].

For example, gcd(3, 17) = 1 (this is easily seen because both 3 and 17 are primes), as well as gcd(4, 15) = 1, because we have div(4) = {1, 2, 4} and div(15) = {1, 3, 5, 15} and so the greatest common divisor is 1. The greatest common divisor can be efficiently calculated with an algorithm due to Euclid [1], [3].

Another necessary definition is the famous Euler totient function. Before defining it, we must think of the set of the natural numbers strictly less than a certain n ∈ ℕ\{0}, i.e. the set of possible remainders of a division by n. This set is usually denoted as ℤ/n [3]. For example, if n = 5, then ℤ/5 = {0, 1, 2, 3, 4}.

Definition 4: The Euler totient function calculated for a certain n ∈ ℕ\{0} is defined as the cardinality of the set
  (ℤ/n)* = {x ∈ ℤ/n : gcd(x, n) = 1},
where we have denoted with (ℤ/n)* the set of possible remainders of a division by n that are relatively prime to n [3]. The function is usually denoted by the Greek letter φ, and for this reason it is also called the Euler φ-function.

Before stating two fundamental properties of the φ-function, we show in the following table the values that it takes for the first 20 numbers greater than 0.

  n      1   2   3   4   5   6   7   8   9   10
  φ(n)   1   1   2   2   4   2   6   4   6   4

  n      11  12  13  14  15  16  17  18  19  20
  φ(n)   10  4   12  6   8   8   16  6   18  8

This table has been generated by listing, for every number n, the numbers x ∈ ℕ, 0 ≤ x < n, and counting among these the ones relatively prime with n. This is a really slow procedure, and it is computationally infeasible for large n [3]. However, from this table we can notice two important properties:
- if p is a prime, then φ(p) = p − 1;
- if m, n ∈ ℕ\{0} are relatively prime, then φ(mn) = φ(m)·φ(n).

The former comes from the fact that, by the definition of prime, every number x ∈ ℕ, 1 ≤ x ≤ p − 1, is relatively prime with p. For the latter, an interested reader can find a demonstration in [3]. These two properties form the "trap-door" of the RSA algorithm [4].

We introduce now two propositions and an important algebraic theorem due to Euler that are used when proving the correctness of the RSA algorithm.

Proposition 2: Let m, n ∈ ℤ. Then there exist λ, μ ∈ ℤ such that
  λm + μn = gcd(m, n).
Moreover, if there exist λ, μ ∈ ℤ such that λm + μn = 1, then gcd(m, n) = 1 [3].

So for example gcd(3, 5) = 1 and 2·3 + (−1)·5 = 1. The reader should observe that there are infinitely many λ, μ ∈ ℤ such that λm + μn = gcd(m, n) [3]. Suppose we have already found suitable λ, μ ∈ ℤ: if we denote λ′ = λ + kn and μ′ = μ − km, then
  λ′m + μ′n = gcd(m, n)
holds true for every k ∈ ℤ. Taking the example above with k = 2, we get λ′ = 12 and μ′ = −7, and indeed 12·3 + (−7)·5 = 1. From a computational point of view, we can say that (a pair of) these λ, μ ∈ ℤ can be efficiently computed with the so-called extended Euclidean algorithm [1], [3].

Proposition 3: Let a, b, c ∈ ℤ. If a and b are relatively prime, a divides c and b divides c, then the product ab divides c [3].

For example, if a = 2, b = 9 and c = 36, we have that gcd(2, 9) = 1, 2 divides 36, 9 divides 36, and so 18 divides 36. On the contrary, it should be noted that if a = 3, b = 6 and c = 24, we have that both 3 and 6 divide 24, but 18 does not divide 24 (because gcd(3, 6) = 3 ≠ 1).

Theorem (Euler): Let a ∈ ℤ, n ∈ ℕ\{0} with gcd(a, n) = 1. Then [3]
  a^φ(n) ≡ 1 (mod n).

An interested reader can find a proof of this theorem in [3]. We point out that, in the particular case in which n is a prime number, we have
  a^(n−1) ≡ 1 (mod n),
that is, the so-called Fermat's little theorem [3].

B. Encryption and Decryption Steps

In the RSA system, the public key, used for encryption, is composed of two positive integers, denoted by N and e. The former is a product of two huge and different primes, which we will refer to as p and q, while the latter is the encryption exponent. Here, "huge primes" denotes numbers of around 100 decimal digits or more [4].

We denote by M a mathematical representation of the message that we want to send, so that M is a natural number strictly less than N, i.e. 0 ≤ M ≤ N − 1. If the message is too long to be represented using a number strictly less than N, the message is split into several parts, so that each of them can be represented with a natural number strictly less than N [1]. Since every part will be separately encrypted, we assume here that the message that we want to send can be represented (in some way) with a number M so that 0 ≤ M ≤ N − 1.

At this point, the encryption is very simple. A user of the cryptosystem has only to raise the number M to the e-th power modulo N. For example, if Alice wants to send a message M to Bob she has to calculate
  C_B = M^{e_B} (mod N_B) = [M^{e_B}]_{N_B},
where B's public key is (N_B, e_B) and with C_B we denote the resulting encrypted version of M [4]. Notice that 0 ≤ C_B ≤ N_B − 1, so the encrypted message is no longer than the original one [4].

To decrypt C_B, a user has to raise it to the d_B-th power modulo N_B, where d_B is his own private key. In our example, B will calculate
  D_B(C_B) = C_B^{d_B} (mod N_B) = [C_B^{d_B}]_{N_B},
where d_B is B's private key [4]. In this way we obtain M = [C_B^{d_B}]_{N_B}, hence the original message that we wanted to send.

C. Correctness of RSA

Denoting with p and q two prime numbers, and with N their product, we state the following proposition, on which RSA is based.

Proposition 4: Let X be any integer and k a natural number. Then [3]
  X^{k(p−1)(q−1)+1} ≡ X (mod N).

Before moving to the proof of this proposition, we recall that if p and q are prime numbers then φ(p) = p − 1, φ(q) = q − 1 and φ(N) = φ(p)·φ(q) = (p−1)(q−1) [3]. So the proposition can be rewritten as
  X^{kφ(N)+1} ≡ X (mod N).

Proof: By Proposition 3 it is sufficient to prove that the two congruences
  X^{k(p−1)(q−1)+1} ≡ X (mod p),
  X^{k(p−1)(q−1)+1} ≡ X (mod q)
hold true. We focus now only on the first one, due to the fact that the proof for the second one is similar. So we must now prove that
  X^{k(p−1)(q−1)+1} ≡ X (mod p)
is true. We distinguish two situations: p divides X, and p does not divide X.

If p divides X, then X ≡ 0 (mod p) by definition of congruence. Hence, using Proposition 1(ii),
  X^{k(p−1)(q−1)+1} ≡ 0 ≡ X (mod p),
and we get the desired result.

If p does not divide X, then they are relatively prime; in fact div(p) = {1, p} (because p is prime) and p ∉ div(X), so gcd(X, p) = 1. So we can apply the Euler theorem, which gives us
  X^{φ(p)} = X^{p−1} ≡ 1 (mod p).
Using Proposition 1(ii) we obtain
  X^{k(p−1)(q−1)} ≡ (X^{p−1})^{k(q−1)} ≡ 1 (mod p).
Multiplying the congruence by X, we get
  X^{k(p−1)(q−1)+1} ≡ X (mod p),
and so our assertion is proved [3].

As we have restricted the possible message lengths to values of M between 0 and N − 1, there is no ambiguity in the decryption phase. In fact, the uniqueness of the remainder of the Euclidean division by N guarantees us that, even if there is an infinite number of X's for which
  X^{kφ(N)+1} ≡ X (mod N)
holds true, there is only one such X with 0 ≤ X ≤ N − 1 [3].

D. How to Generate the Encryption and Decryption Exponents

Before we explain how two suitable exponents e and d can be created, we recall that this procedure is done only once for every user [1], [6].

If N = pq, with p and q primes, the encryption exponent e is chosen as a natural number relatively prime with φ(N) = (p−1)(q−1), i.e. gcd(φ(N), e) = 1 [3]. From Proposition 2 we know that there exist λ, μ ∈ ℤ such that
  λφ(N) + μe = 1,
and that there is an infinite number of them. We choose the pair λ, μ for which 0 < μ < φ(N) (it is easy to prove that such a μ exists and is unique [3]). This μ is the decryption exponent, so we denote it by d, and the equation can be rewritten as
  λφ(N) + de = 1.
Since φ(N), d and e are strictly greater than 0, we have that λ < 0. Denoting k = −λ, we have that
  kφ(N) + 1 = de,
and so, recalling the discussion in the last two subsections, we correctly have that
  [[X]^e]^d = [X^{ed}] = [X^{kφ(N)+1}] = X,
where every remainder is modulo N and the last equality comes from the fact that 0 ≤ X ≤ N − 1 [3].

An important fact, from a security point of view, is that the encryption exponent should not be less than log₂(N), so that every possible message (except M = 0 or M = 1) undergoes a reduction modulo N during encryption [4].

IV. RSA AND DIGITAL SIGNATURE

As already said, public-key cryptography has enabled us to resolve the problem of managing a high number of keys and to remove the necessity of agreeing on a common key beforehand. Other interesting possibilities of public-key cryptosystems are based on the fact that such schemes permit us to digitally sign every message we want to send [1], [6]. Moreover, digital signature schemes achieve three important goals [6]:
- the signature is publicly verifiable;
- the integrity of the message is guaranteed;
- the sender cannot repudiate the message after it has been sent.

We present here the basic idea behind digital signatures, its straightforward employment of the RSA algorithm, and a slightly different procedure that permits us to overcome some technical problems. We denote with e_A(·) an encryption procedure that uses A's public key, and with d_A(·) a decryption procedure that uses A's private key.

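Putting Sections III.B-III.D together, an end-to-end RSA run can be sketched with deliberately tiny primes (real keys use primes of 100 or more decimal digits, plus the padding discussed in Section V):

```python
# Toy RSA with tiny primes -- illustrative only, not secure.
p, q = 61, 53
N = p * q                   # public modulus, 3233
phi = (p - 1) * (q - 1)     # phi(N) = 3120
e = 17                      # public exponent, gcd(e, phi) = 1
d = pow(e, -1, phi)         # private exponent via the extended
                            # Euclidean algorithm (Python 3.8+)
assert (e * d) % phi == 1   # the defining relation de = k*phi(N) + 1

M = 65                      # message, with 0 <= M <= N - 1
C = pow(M, e, N)            # encryption: C = [M^e]_N  (2790)
assert pow(C, d, N) == M    # decryption recovers M, by Proposition 4
```

The same pair of operations, with the key roles swapped, is reused for signatures in the next section.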
A. Basic Idea

In the following, we explain the idea behind digital signatures. Suppose A wants to send a message M (encrypted or not) to B, certifying that the message comes from her and moreover that it has not been tampered with. For achieving this, A calculates, using her private key d_A, a signature that is message-dependent, σ = d_A(M). When B receives the pair (M, σ) he computes e_A(σ) and then checks if M = e_A(σ). If the last equation holds true, the message has not been tampered with and the sender is definitely A [1], [4].

This procedure assumes that nobody (except A) owns d_A and that the cryptosystem is secure, i.e. there are no feasible procedures to calculate d_A(M) without knowing the private key d_A [1].

This procedure works whether M is encrypted or not. However, it should be noted that, to ensure the privacy of the message M, the message should be signed first, and only afterwards encrypted (carried out by encrypting the pair (M, σ), where M is a plain message) [1]. The verification procedure can be carried out by everyone, since we suppose that e_A is public.

B. Applying RSA

The application of the RSA algorithm to digital signing is described in [6]. In practice the sender A calculates
  σ = [M^{d_A}]_{N_A},
where we assume that A's public key is (N_A, e_A), the corresponding private key is d_A, and M is the message that A wants to send. The receiver B verifies the sender's signature and the integrity of the message by calculating
  [σ^{e_A}]_{N_A}
and checking if it is equal to M. If this is not the case (i.e. the sender is not A, or the original message has been modified), B simply rejects the message [6].

Unfortunately, this simple procedure has some drawbacks:
- the signature's length is comparable with the message size, and so the overall message length is doubled (on average) [1];
- this procedure is not secure (we shortly describe some possible attacks) [6].

C. Hashed RSA

Numerous modifications to the method described above have been proposed [6]. The common idea is not to sign the message directly, but instead to apply a one-way function, and in particular a hash function H(·), to the original message, and then to sign the output of this function [1], [6]. The hash function should be a "random-looking" one-way function with strong collision resistance (like SHA-2 [1]). At this point, if A wants to send a signed message to B, she must calculate
  σ = [H(M)^{d_A}]_{N_A}
and send the pair (M, σ) as usual. Then B calculates [σ^{e_A}]_{N_A} and the value [H(M)]_{N_A}. If the equation
  [σ^{e_A}]_{N_A} = [H(M)]_{N_A}
holds true, then the message has not been tampered with and A can be recognized as the sender [6].

Since the output of a hash function is limited in length (e.g. for SHA-2 it can be 224/256 bits or 384/512 bits [1]), the resulting overall message length is only slightly bigger than the length of the unsigned message. The strong collision resistance property of the hash function guarantees (except with negligible probability) that H(M) ≠ H(M′) if M and M′ are two different messages [1]. As we describe in the next section, the increased security of this method comes from the fact that a "good" hash function, like every one-way function, is difficult to invert [1], [6].

V. COMMON ATTACKS TO RSA

In this section we explain some known attacks against the most straightforward implementation of the RSA algorithm. However, it should be noted that current cryptosystems that are based on the RSA algorithm should follow the recommendations described in RSA Laboratories' Public Key Cryptography Standard PKCS#1, a widely-used and standardized encryption scheme [6], [7]. It uses a preprocessing padding algorithm, called Optimal Asymmetric Encryption Padding (OAEP) [8], devised by M. Bellare and P. Rogaway, that is applied to the message before the encryption phase takes place [1], [7].

This addition, combined with an accurate choice of the primes p and q and of the encryption and decryption exponents, leads to increased security [1], [6]. In this way we overcome the simple attacks described in this paper as well as other, more complex attacks. Hence, at the time of this writing, the implementations of the RSA algorithm that are based on the PKCS#1 standard are considered secure [1], [6].

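As a concrete taste of the attacks examined in this section, a sketch of the textbook small-exponent failure: with e = 3 and a message so short that M³ < N, "encryption" performs no modular reduction, and an integer cube root recovers M. The modulus below is a stand-in product of small primes chosen only to make the inequality hold.

```python
# Small-exponent attack sketch: if M^3 < N, the ciphertext is just M^3
# and an integer cube root inverts it without the private key.
def icbrt(n: int) -> int:
    """Integer cube root by binary search."""
    lo, hi = 0, 1 << ((n.bit_length() + 2) // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo

N = 61 * 53 * 7919 * 104729        # stand-in modulus (about 2.7e12)
M = 1234                           # a short message, e.g. a 4-digit PIN
C = pow(M, 3, N)                   # "encryption" with e = 3
assert M ** 3 < N                  # no modular reduction happened...
assert icbrt(C) == M               # ...so the attacker recovers M
```

This is exactly why the text requires e ≥ log₂(N), and why padding (Section V's PKCS#1/OAEP) randomizes and lengthens M before exponentiation.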
A. Attacks Against Plain RSA Implementation

To keep the exposition short and simple, we present two basic attacks described in [6]. We notice that other simple attacks can be carried out if the system is misused, for instance:
• if the decryption key is too small (a brute force attack is feasible) [6];
• if the same N is used for more than one user [6];
• if the decryption key has been partially exposed [1], [6].

There are also algorithms that attempt to directly factorize N, for instance Pollard's (p − 1)-algorithm [3], Pollard's ρ-algorithm and the quadratic sieve algorithm [9]. These algorithms can be tried against every cryptosystem that is based on the RSA algorithm. However, a successful factorization depends heavily on the attacker's computational power and on the choice of the primes p and q [1], [3].

Encrypting short messages using a small encryption exponent e

As stated above, it is important that the encryption exponent is not small, i.e. not less than log_2(N), so that every message undergoes a reduction modulo N [4]. In fact, if the encryption exponent e is too small (for example e = 3), then the encryption phase of short messages M < N^(1/3) does not involve any modular reduction, since the calculated integer (in our example M^3) is strictly less than N [6]. In this case we have that

C = M^e (mod N) = M^3 [N] = M^3.

If this happens, then the calculation of the e-th root (in our example a normal cube root over the integers) gives us the plain message M [6]. Since N is a huge number, this can be a realistic attack for several messages.

A general attack for small encryption exponent and common message

Assume again that e is small (say e = 3) and a certain message M has been sent to multiple different receivers (say three users). These users have public keys (N_1, 3), (N_2, 3) and (N_3, 3), respectively. Suppose that an eavesdropper that is listening to the channel sees the three different encrypted messages M^3 [N_1], M^3 [N_2] and M^3 [N_3]. We can assume that gcd(N_i, N_j) = 1 for all i, j with i ≠ j: in fact, if this is not the case, the calculation of gcd(N_i, N_j) gives us a non-trivial factor of N_i and N_j, and so we have broken the cryptosystem [3], [4], [6]. So, if gcd(N_i, N_j) = 1 for all i ≠ j, we are in the hypotheses of the so-called Chinese remainder theorem [3], which guarantees us that the system

c̃ = M^3 (mod N_1)
c̃ = M^3 (mod N_2)
c̃ = M^3 (mod N_3)

has a solution and that this is unique for 0 ≤ c̃ < N, where N = N_1 · N_2 · N_3 [3]. This solution can be efficiently computed using the extended Euclidean algorithm [1], [3].

At this point we have c̃ = M^3 [N] (notice that the reduction is modulo N). Since M is strictly less than min_i{N_i}, necessarily M^3 < N = N_1 N_2 N_3, hence we can calculate, as above, the cube root to obtain the message M, since there has been no modular reduction with respect to N [6].

B. Attacks on the Basic RSA Signature Scheme

The two attacks that are presented below work only against the basic RSA signature scheme that we have presented in Section 4.B, and cannot be carried out against schemes that use some good hashing algorithm (like the standard RSASSA-PSS, described in [7]) [6].

The no-message attack

For an attacker it is easy to create a valid pair (M, σ) without knowing the proper private key d and even without obtaining any valid signature from the legitimate signer [6]. It is sufficient to choose an arbitrary σ ∈ (ℤ/Nℤ)* (a legal signature) and calculate M = σ^e [N], where e is the public encryption exponent. In defense of the basic scheme one may argue that the attacker has no control over the message M. Nevertheless, we can observe that the possibility of creating a valid pair (M, σ) is not a good property for a digital signature scheme [6], in particular considering the non-repudiation property. Moreover, the attacker has "some" control over the message M: by choosing several random values of σ ∈ (ℤ/Nℤ)* the adversary can set (with high probability) certain bits of the output message M [6].

Forging a signature on an arbitrary message

A serious attack on the basic RSA signature scheme can enable an adversary to obtain a valid pair (M, σ), with arbitrary M [6]. Suppose the attacker wants to sign a certain M ∈ (ℤ/Nℤ)* (a valid message), where the public key of the legitimate signer is (N, e) and his private key is d. The adversary creates two messages M_1 and M_2, such that the former is chosen at random, while the latter is set to M_2 = M · M_1^(−1) [N] [6]. We notice that M_2 appears to be random, but actually it is chosen ad hoc by the attacker.

Suppose the adversary obtains (in some way) the valid signatures of the two messages M_1 and M_2 (denoted with σ_1 and σ_2, respectively). Then we claim that he can calculate a valid signature σ for the message M as σ = σ_1 · σ_2 [N], where the reduction is modulo N [6]. This can be proved in the following way [6]:

σ^e ≡ σ_1^e · σ_2^e ≡ (M_1^d · M_2^d)^e ≡ M_1^(ed) · M_2^(ed) ≡ M_1 · M_2 ≡ M (mod N),

and so we have σ^e ≡ M (mod N).

Even if this is clearly not a good property for a digital signature scheme, one can argue that it is difficult to convince a signer to sign two arbitrary messages. Unfortunately, this is not always true: in fact, signing a random value is sometimes used as an authentication method within a three-way handshaking procedure [6]. In this case it is easy for a fake router to get the desired signatures σ_1 and σ_2, thus enabling the attacker to correctly sign an arbitrary message M [6].
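The common-message attack for e = 3 can be sketched in a few lines of Python. The moduli and message below are toy values chosen purely for illustration (real RSA moduli are hundreds of digits long), and the CRT and cube-root routines are plain textbook implementations, not an optimized attack tool:

```python
# Sketch of the "small exponent, common message" attack described above:
# the same M is encrypted under three pairwise-coprime moduli with e = 3,
# the CRT recovers M^3 exactly, and an integer cube root recovers M.

def crt(remainders, moduli):
    """Combine x = r_i (mod n_i) for pairwise coprime n_i into one residue."""
    N = 1
    for n in moduli:
        N *= n
    x = 0
    for r, n in zip(remainders, moduli):
        m = N // n
        x += r * m * pow(m, -1, n)   # pow(m, -1, n): modular inverse (Python 3.8+)
    return x % N, N

def icbrt(n):
    """Integer cube root by binary search."""
    lo, hi = 0, 1 << ((n.bit_length() + 2) // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo

# Three users share e = 3 but have different pairwise-coprime toy moduli.
N1, N2, N3 = 11 * 13, 17 * 19, 23 * 29
M = 42                                        # common plaintext, M < min(N_i)
ciphertexts = [pow(M, 3, N) for N in (N1, N2, N3)]

# The eavesdropper combines the three ciphertexts; since M^3 < N1*N2*N3,
# no modular reduction ever happened and the cube root is exact.
c_tilde, _ = crt(ciphertexts, [N1, N2, N3])
recovered = icbrt(c_tilde)
print(recovered)                              # -> 42
```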

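The multiplicative forgery on the basic signature scheme can likewise be demonstrated with toy parameters. The primes, exponent, and message below are illustrative values only, far smaller than anything used in practice:

```python
# Toy-sized demonstration of the multiplicative forgery on the plain
# ("textbook") RSA signature scheme described above.

p, q = 61, 53
N = p * q                      # 3233
phi = (p - 1) * (q - 1)        # 3120
e = 17
d = pow(e, -1, phi)            # private exponent (modular inverse, Python 3.8+)

def sign(m):                   # the legitimate signer's operation
    return pow(m, d, N)

def verify(m, s):
    return pow(s, e, N) == m % N

M = 1234                       # message the attacker wants signed
M1 = 5                         # chosen at random by the attacker
M2 = (M * pow(M1, -1, N)) % N  # looks random, but M1 * M2 = M (mod N)

# The attacker obtains signatures on the two "harmless" values M1, M2...
s1, s2 = sign(M1), sign(M2)

# ...and multiplies them to forge a valid signature on M.
forged = (s1 * s2) % N
print(verify(M, forged))       # -> True
```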
VI. CONCLUSIONS

In this paper, we have briefly presented the main limitations of symmetric-key encryption techniques, and discussed how public-key cryptosystems overcome these limitations. We have then explained the well-reputed RSA algorithm, on which many widely used cryptosystems are based. We have put forward the applicability of RSA to the important field of digital signatures. Moreover, we have outlined some simple attacks that can be carried out against a straightforward and deterministic implementation of this algorithm, without a proper padding applied before the encryption takes place [6]. At the time of this writing, cryptosystems based on the RSA algorithm which meet the recommendations specified in [7] can be used to guarantee confidentiality, integrity and non-repudiation of the messages exchanged among a group of users. Cryptosystems that meet those recommendations are in fact trusted during online bank transactions and other kinds of digital communication where security and privacy are key issues.

REFERENCES

[1] A. Languasco and A. Zaccagnini, Introduzione alla crittografia, Milano, IT: Ulrico Hoepli Ed., 2004.
[2] W. Diffie and M. Hellman, "New directions in cryptography", IEEE Transactions on Information Theory, Vol. 22, No. 6, November 1976, pp. 644-654.
[3] N. Lauritzen, Concrete Abstract Algebra: from numbers to Gröbner bases, Cambridge, UK: Cambridge University Press, 2003.
[4] R. Rivest, A. Shamir, and L. Adleman, "A Method for Obtaining Digital Signatures and Public-Key Cryptosystems", Communications of the ACM, Vol. 21, No. 2, February 1978, pp. 120-126.
[5] D. E. Knuth, The Art of Computer Programming, Vol. 2: Seminumerical Algorithms, Addison-Wesley, Reading, Mass., 1969.
[6] J. Katz and Y. Lindell, Introduction to Modern Cryptography, Chapman & Hall/CRC, 2008.
[7] RSA Laboratories, PKCS #1: RSA Cryptography Standard, Version 2.1, June 14, 2002. [Online]. Available: http://www.rsa.com/rsalabs/node.asp?id=2125
[8] M. Bellare and P. Rogaway, "Optimal asymmetric encryption -- How to encrypt with RSA", Advances in Cryptology - Eurocrypt 94 Proceedings, Lecture Notes in Computer Science Vol. 950, A. De Santis ed., Springer-Verlag, 1995.
[9] C. Pomerance, "A tale of two sieves", Notices of the American Mathematical Society, Vol. 43, 1996, pp. 1473-1485.

Software Performance Testing in the Cloud

Fredrik Abbors
Department of Information Technologies, Åbo Akademi University
Joukahaisenkatu 3-5 A, 20520 Turku, Finland
Email: fredrik.abbors@abo.fi

Abstract—Performance characteristics such as response times, throughput, and scalability are becoming key quality attributes for modern software applications. These attributes can be very hard to test, because software applications are usually deployed on different platforms and the amount of data needed to test these systems is overwhelming. However, researchers within the field of performance testing have found several ways to address this dilemma. The most novel solutions for performance testing take advantage of the benefits that cloud computing provides in terms of seemingly unlimited computational resources and automatic scalability at lower costs. In this paper, we discuss the idea of performance testing of software in the cloud.

I. INTRODUCTION

The complexity of software systems and applications is increasing. Additionally, software systems are usually deployed within different types of distributed environments. Performance characteristics such as throughput, response time, and scalability are becoming increasingly important for such applications and systems. For this reason, it is critical to verify that the system satisfies its performance requirements. Studies [1] show that a significant percentage of deployed applications have performance issues in practice.

Customers demand highly efficient, low-cost, and reliable software products. Companies are forced to build high-end products with a low budget in a short time. The increasing demand for software products forces companies to develop new products at a very fast pace. We see a constant decrease in time-to-market and customers demanding more flexible systems, which results in growing system complexity. Missing the deadline for the time-to-market can have a huge negative impact on a company's profit. Unfortunately, this fast pace leaves the companies with less time for testing their products.

The key to successful product engineering in the software industry today is in many cases good quality assurance and deployment of software systems. In a software development process, testing is the means to ensure the quality of a product and to verify that the product meets its requirements. The purpose of testing is to find faults that have been introduced during the development of the system, starting with the initial specification phases and ending with its implementation. Today software is often deployed onto several different platforms, which usually causes problems for the testers. Buying equipment for maintaining several different platforms, simply for testing purposes, may not be a practical solution for most companies. Additionally, more and more applications are being deployed on the Internet and are accessible by millions of users. Testing the scalability of such applications is not a trivial task. Companies may not have enough money to spend on testing equipment for such large-scale applications. All of these factors, and more, contribute to the complexity and challenges that performance testers are facing.

Fortunately, there is a new trend on the rise called cloud computing. Cloud computing offers a solution to many of the challenges presented above by offering computer resources as an on-demand and pay-as-you-use service. With this new technology companies can rent computer resources, configure them as they like, and build their entire testing infrastructure upon them. This opens up a totally new window of opportunity for companies that are facing a hard time dealing with performance testing issues. With cloud computing, companies could reduce the overall expenses of testing and possibly also the total testing time, leaving the developers with more time for actually designing and implementing the software. This would result in better and more reliable software.

We proceed with a short introduction to cloud computing in Section II, followed, in Section III, by a brief overview of performance testing of software applications and the associated challenges. In Section IV we describe how the cloud can be used to tackle many of the challenges concerning performance testing of software applications, whereas in Section V we present the conclusions.

II. CLOUD COMPUTING

In this section we put forward a short introduction to cloud computing, followed by a brief description of the different types of cloud services offered to customers.

A. Definition of cloud computing

As we move on in the 21st century, the way we deliver services is changing. Customers need services that are offered on demand and are accessible from anywhere. Today, it is not uncommon to find services that are only offered through the Internet and on demand, rather than as a piece of software installed on one computer. This has led to what we today refer to as cloud computing.

Experts disagree on the exact definition of cloud computing, but most people agree that it includes the notion of web-based services that are available on demand from an optimized and highly scalable service provider [2]–[4], e.g. Amazon or Google. Think of it as the electricity delivered to every house. To get electricity, all we need to do is plug in the cord. We do not need to worry about generators and turbines; the electric company (the cloud) will do that for us. Additionally, we get the electricity on demand, we pay by the amount we use, and the more electricity we need, the more electricity is available to us. So, the term "cloud" is just a metaphor to conceal complex processes that we do not need to deal with. This example also applies to cloud computing.

In its simplest form, cloud computing consists of shared computing resources that are accessible from anywhere and offered as an on-demand service using the Internet. By sharing computer resources, cloud computing can also offer services such as automation, scalability, and reliability. A very simple example is a web-based e-mail account such as G-Mail. With G-Mail one can log on to one's account anywhere in the world. The application and the information are stored on a collection of servers rather than on the user's computer; hence, there is no need to download and install any software to use this service. By hosting the application in the cloud, companies can provide increased availability and scalability. However, cloud computing is more than just software.

B. The Cloud Pyramid

From a user perspective, cloud computing is offered to its users as a normal web service, but behind this lies a very advanced and powerful 3-tier layered architecture [2]. Figure 1 depicts a typical cloud architecture. By architecture we mean the logical architecture, rather than the physical architecture.

Fig. 1: Overview of the cloud pyramid [5]

The easiest way to describe cloud computing systems is to divide them into two sections: the front end and the back end. They are connected to each other through a network, usually the Internet. At the front end we have the actual application, while in the back end we have the infrastructure supporting it.

1) The Application: At the topmost level we have the application layer. This is the side the user sees and is the front end used to access the cloud system. The front end includes the client's computer (or computer network) and the application required to access the cloud computing system. In fact, almost everyone has already used this in the form of G-Mail, Wikipedia, or any search engine found on the web. The most general way to access cloud-based services is through a web browser, such as Safari, Mozilla Firefox, or Chrome. There are also other techniques for a user to access cloud services. Most of these require installation of third-party software on the user's machine. Consider Dropbox as an example: users can access content via a web browser or via an application installed on their computers.

2) The Platform: When we move down one level (to the platform layer) we enter the back end of the cloud system. The back end is the "cloud" part of the system and is made up of the various computers, servers and data storage systems used to create the cloud. The platform level is mainly targeted at developers. Developers write their code, upload it, and deploy it in the cloud, where the application is running "magically" somewhere. Usually this means that the application is running in a virtual machine, which in turn is running on physical hardware. The cloud takes care of scaling the application automatically when the application usage increases. In practice this means that the application runs simultaneously on several identical virtual machines. The cloud itself takes care of redirecting user requests to different virtual machines as the workload increases. To the end user it appears as if the application is running on a single machine.

3) The Infrastructure: At the bottom of the cloud architecture we have the infrastructure layer. This layer offers the most powerful type of cloud service. Here, developers get access to general computing, storage, networks, queueing, and other resources to run their applications with the fewest limitations. With this, virtually any application and any configuration that is fit for the Internet can be mapped to this type of service. This solution offers everything one would normally need to build a complete infrastructure in a company. The difference is that it is offered through the cloud.

C. Cloud service levels

The three architectural levels mentioned in Section II-B are often offered to users as a whole service. This enables cloud computing vendors to offer different types of services depending on the needs of the user. All service levels allow users to run applications and store data online. However, each offers a different level of user flexibility and control.

1) Software-as-a-Service (SaaS): This type of service allows users to run existing online applications. The service is offered via the Internet, hence there is no need to install any application on the customer's own computer [6]. It can for instance be an on-demand service where users pay via subscription, or it can be offered to the users free of charge. In the latter case, the revenues for the service are generated for instance from advertisement or by selling user information to other companies. Examples of this type of service are Facebook and Google, where the revenues are largely generated from advertisement.

2) Platform-as-a-Service (PaaS): In this service type, users can create their own cloud applications using supplier-specific tools and languages. In other words, the cloud provider provides a complete software infrastructure for customers to build, deploy, and run their own applications and services without having to worry about and manage the underlying software and hardware layers beneath the platform, reducing the time, cost, and complexity for the customer of managing this by themselves [7]. For example, Google has a product called App Engine [8] that enables anybody to develop, run, and maintain their own online applications on Google's infrastructure.

3) Infrastructure-as-a-Service (IaaS): This type of service offers a whole virtualized platform environment to the customer. This means that users can run any application they please on cloud hardware of their own choice. Hence, there is no longer a need for customers to buy their own servers, data warehouses, software, or network equipment; instead, existing applications can be migrated from a company data center in order to reduce IT costs. Sometimes this service is also referred to as Hardware-as-a-Service [7]. For example, Amazon offers a product called "Amazon Elastic Compute Cloud (Amazon EC2)" [9], which enables a variety of virtual server instances to be rented or purchased by the hour. These virtual server instances can be launched in minutes and are configured according to user specifications.

III. PERFORMANCE TESTING

In this section we briefly describe what performance testing means and go through the most common types of performance testing sub-genres. We also describe a few of the challenges that performance testers face today for each sub-genre.

A. Definition of performance testing

Performance testing (PeT) is one of the activities of the Software Performance Engineering process as defined by Smith and Williams [10]. Performance testing is usually carried out manually in the later stages of development and constitutes a tedious and time-consuming task. Performance testing is the means to evaluate the design of the system with respect to the performance requirements and to determine how well critical aspects of a system perform under various workloads. With a sound and stable performance testing plan one can measure different characteristics of a system, for instance the response times and the reliability and scalability of the system, detect possible workload thresholds, and measure the system's resource usage. All these factors play an important role in today's software systems.

B. Different types of performance testing

There exist many different types of performance testing. Ho and Williams and Denaro et al. list a few common types in [11] and [12], respectively. In this section we mention three types of performance testing and the techniques used to perform them.

1) Throughput testing: The focal point in testing for throughput is to examine how much data can flow through a system without a decrease in performance. The purpose of this type of testing is to understand the boundaries of the system's capacity. Common techniques used for measuring the throughput of applications are stress testing and endurance testing. The goal of stress testing is to determine if the system can perform adequately when the load increases well over the expected maximum workload. In endurance testing one is typically interested in whether the system can handle a continuous expected workload for longer periods of time.

2) Scalability testing: In this sub-genre the focus of the testing lies in analyzing how well the system or application responds to an increase in the workload. The purpose of this type of testing is to identify certain workloads and eliminate bottlenecks that can depress the scalability of the application. For example, scalability testing can refer to the ability of an application to increase its throughput under an increased load when more resources are added.

3) Response time testing: When performing this type of testing, the tester is usually interested in the response time of the system for a particular workload. Typically one would first "flood" the system with a particular load and then measure the response time for a single transaction. A common technique used for measuring the response time of a system is load testing. In load testing one would typically estimate the number of concurrent users and the expected number of transactions per user, and apply that to the system. The time taken for a single transaction to execute is the system's response time for that particular load.

C. Challenges with performance testing

Although performance testers are facing a huge number of challenges today, in this section we mention only a few of the most relevant for this paper. In fact, Weyuker and Vokolos report in their study [13] on the weakness of the published scientific literature in the domain of software performance testing. First of all, before conducting performance testing one needs to have stable enough software to test, so that the results are conclusive and not influenced by, for example, a software crash or freeze. This is why performance testing is typically done in the later stages of a software project, when functional problems have been removed. The definition of the metrics that are going to be used is also important before one commences performance testing. Without proper metrics the results may be hard to analyze.

Performance testing is usually done before actual users start using the system. One of the biggest challenges with performance testing today is replicating, for testing purposes, an environment similar to the one in which the system is going to be deployed [14], e.g. different types of platforms with different types of configurations. Maintaining such an environment may in some cases be impractical, too expensive, or not even possible. Other challenges that performance testers are facing are the generation of appropriate test data (workload) [14], the identification and specification of proper performance requirements, and the establishment of performance testing goals. Generating meaningful and large quantities of test data for performance testing is not a trivial task, for example generating test data for a system with millions of users with multiple user profiles, user names, passwords, etc.

IV. PERFORMANCE TESTING IN THE CLOUD

In this section we discuss how cloud computing could help tackle a few of the challenges that performance testers are facing. As stated above, performance testers are facing many challenging tasks. Many of these could be tackled more easily by utilizing the benefits of cloud computing.

A. How cloud computing can help

One of the biggest challenges performance testers are facing today is to test the system/application in the same environment as the one in which it is going to be run. There are plenty of different server platforms to choose from; e.g. Microsoft and Apple each have at least 3 active server platforms today, not to mention all the different distributions of the Linux server platform. Further, all these platforms can be configured in different ways, for instance running different versions of the Java Runtime Environment, etc. To ensure that the system/application runs smoothly on every platform, one would have to buy all the different platforms and configure each of them separately. In reality, this may not be practical for most companies. This is where cloud computing can help. For instance, with Platform as a Service (PaaS) a company could automatically rent all the platforms they need together with the desired configurations, run their tests, and only pay for the actual time they used the service. At the end of the day, they would have saved time and money by utilizing the power of cloud computing.

Another challenge that performance testers must overcome is the generation of the large quantities of test data needed, e.g. for scalability and throughput testing. Generating the test data needed to test a system/application with millions of simultaneous users is not a trivial task. It could take hours just to generate the amount of data needed. To overcome this hurdle, testers use specialized load generators to generate data. Figure 2 shows a simplified architecture used by most performance testing tools.

Fig. 2: An abstract architecture for performance testing tools

The downside is that the load generators cost a lot of money. As pointed out in [15], by using the Software as a Service (SaaS) option of cloud computing, the application generating the load could be deployed in the cloud. This way the cloud would take care of running the application on as many machines as needed to generate the desired workload in real time. This alternative would not only solve the dilemma but would also save the company a lot of money, as you only pay for the time you use the service.

Performance testing requires large quantities of test data. This implies that the amount of output data can sometimes be overwhelming. Chief Technology Officer Scott Barber states in one of his presentations [16] that collecting and analyzing all that data is very challenging. By using the processing power and the data storage possibilities of the cloud, collecting and analyzing large quantities of data can be made less painful.

In this section we have highlighted a few of the challenges with which cloud computing can help performance testers. However, there still remain many difficulties that performance testers have to face, e.g. understanding the system under test, creating the actual performance tests, deciding what to monitor, and understanding the metrics that are collected. These are just a few examples of challenges that performance testers are facing on a daily basis. Unfortunately, there is currently not much help that cloud computing can bring in helping performance testers to overcome these types of challenges.
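The load-testing idea described in Sections III and IV can be sketched in a few lines of Python. This is a minimal illustration, not a real load generator: the transaction function below is a stand-in that simulates service time, whereas a real test would issue a request to the system under test and the workload sizes would be far larger:

```python
# Minimal sketch of a load test: fire N concurrent "transactions" at a
# system and record per-transaction response times.

import random
import time
from concurrent.futures import ThreadPoolExecutor

def transaction():
    """Stand-in for one user transaction against the system under test."""
    started = time.perf_counter()
    time.sleep(random.uniform(0.01, 0.05))   # simulated service time
    return time.perf_counter() - started

CONCURRENT_USERS = 20
TRANSACTIONS_PER_USER = 5

# Apply the estimated workload: concurrent users, each issuing a number
# of transactions, and collect every response time.
with ThreadPoolExecutor(max_workers=CONCURRENT_USERS) as pool:
    futures = [pool.submit(transaction)
               for _ in range(CONCURRENT_USERS * TRANSACTIONS_PER_USER)]
    times = [f.result() for f in futures]

times.sort()
print(f"transactions:    {len(times)}")
print(f"median response: {times[len(times) // 2] * 1000:.1f} ms")
print(f"95th percentile: {times[int(len(times) * 0.95)] * 1000:.1f} ms")
```

Reporting percentiles rather than only the mean is the usual choice here, since response-time distributions under load tend to have long tails.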

V. CONCLUSION

In this paper we have given a short introduction to the fields of cloud computing and performance testing. We described the cloud pyramid and discussed the different types of cloud services that are offered to the customer. Further, we also highlighted a few challenges that performance testers are facing, and presented ideas on how cloud computing could help to overcome these challenges. Our opinion is that performance testing still remains difficult and very challenging, but cloud computing could address a few of the most painful tasks that performance testers have to deal with. Cloud computing can help performance testers overcome the painful task of maintaining a legion of different testing environments by offering the PaaS service type. With this service type, testers can easily rent the platforms they need together with the desired configurations and only pay for the actual time they used the service. Using the SaaS option of cloud computing, the task of generating large quantities of data can be made less agonizing for performance testers. We hope that this paper will trigger interesting discussions during the course.

REFERENCES

[1] Compuware, "Applied performance management survey," Oct. 2006.
[2] T. von Eicken, "The three levels of cloud computing," Web blog, July 2008. [Online]. Available: http://linux.sys-con.com/node/581961
[3] ServePath, "Glossary of dedicated server hosting terms," Web page, February 2011. [Online]. Available: http://www.servepath.com/support/definitions.php
[4] Citizendium, "Definition of cloud computing," Web page, February 2011. [Online]. Available: http://en.citizendium.org/wiki/Cloud computing
[5] GoGrid, "Skydiving through the clouds," White paper. [Online]. Available: www.GoGrid.com
[6] Wikipedia, "Software performance testing," Web page, February 2011. [Online]. Available: www.wikipedia.com
[7] D. Hilley, "Cloud computing: A taxonomy of platform and infrastructure-level offerings," Georgia Institute of Technology, April 2009.
[8] Google, "Google App Engine," Web page, March 2011. [Online]. Available: http://code.google.com/appengine/
[9] Amazon, "Amazon Elastic Compute Cloud," Web page, March 2011. [Online]. Available: http://aws.amazon.com/ec2/
[10] C. U. Smith and L. G. Williams, Performance Solutions, ser. Object Technology. Addison-Wesley, 2002.
[11] C. Ho and L. Williams, "An Introduction to Performance Testing."
[12] G. Denaro, A. Polini, and W. Emmerich, "Early performance testing of distributed software applications," in Proceedings of the 4th International Workshop on Software and Performance. ACM, 2004, pp. 94-103.
[13] E. Weyuker and F. Vokolos, "Experience with performance testing of software systems: issues, an approach, and case study," IEEE Transactions on Software Engineering, vol. 26, no. 12, pp. 1147-1156, Dec. 2000.
[14] "Performance testing challenges," Web page, February 2011. [Online]. Available: http://sites.google.com/site/performancetestingfun/challenges
[15] Mindtree, "Solutions to your performance testing challenges," Web page, January 2010. [Online]. Available: http://www.mindtree.com/blogs/solutions-to-your-performance-testing-challenges-part-2
[16] S. Barber, Performance Testing Challenges, presentation, PerfTestPlus, Inc., 2006. [Online]. Available: http://www.perftestplus.com/resources/perf challenges ppt.pdf

The ZigBee Technology and its Applicability

Frank Wickström
Network Software
Åbo Akademi

Abstract—In this paper we discuss the applicability and potential of ZigBee, a low-power and low-cost technology for sending data over short distances. The technology has endless possibilities, such as remote metering, automation, and remote control. The main advantage over competing technologies is that ZigBee devices can be manufactured much more cheaply and draw a lot less power. Security is also taken into consideration by ZigBee, as it allows its users to choose from several cryptography methods. ZigBee aims to be a competitor to and become the replacement for Bluetooth and similar technologies.

Index Terms—zigbee, bluetooth, low-power, low-cost

I. INTRODUCTION

There is a need to monitor almost everything today. We have sensors in our cars, cellphones, washing machines and sometimes even in our bodies. For instance, for a little device such as a pacemaker we need to monitor the state of the equipment so that it does not fail, causing the user to have irregular heartbeats. We also need to know what the weather is going to be like in the upcoming days, or how much rainfall we have had in the last week. All this needs to be done by small sensors that consume only a fraction of the power consumed by a computer. When data has been collected, it needs to be transported to a larger database or to another storage system and processed there, so that humans can view the results in a more practical way.

There are several technologies that enable us to send this data from the sensors to a computer to be processed. For instance, for home automation there is 1-Wire, a wired solution for sending data. There are also wireless solutions such as Insteon, Z-Wave and ZigBee. ZigBee, Z-Wave, and the other technologies are very suitable for automation. But ZigBee has other functions as well. The constantly increasing demand for transferring data wirelessly has made it lucrative for companies to do more with the technology they have created. There are sensors being built for health care [1], for controlling electrical motors [2] and for large mesh networks that can control entire buildings [3]. For instance, the ZigBee Gateway enables transferring data over IP networks [1], thus creating endless possibilities for the technology.

Bluetooth is also a technology that is still evolving [4]. Version 4 of Bluetooth has seen the need for lower power consumption and a longer transmission range. This is a big threat against ZigBee and similar technologies, as Bluetooth already has a large market share when it comes to mobile phones and laptop computers. Power consumption and transmission range seem to be the most important factors to compare for these technologies.

In this paper we aim to survey the applicability of ZigBee technology for automation and remote metering, due to its low manufacturing cost and low power usage. We also make a comparison to Bluetooth, as this is one of the leading low-cost and low-power wireless technologies. We proceed as follows. In Section II we give a description of the ZigBee technology, in Section III we look at the applicability of ZigBee, in Section IV we make a comparison to Bluetooth, and Section V consists of the conclusion and discussion of this paper.

II. ZIGBEE

ZigBee is a wireless technology created by the ZigBee Alliance and built upon the IEEE 802.15.4 standard [5]. Like WiFi and Bluetooth it can operate over the 2.4 GHz band, but it can also use the 868 MHz band in Europe or the 915 MHz band in the USA [5]. The data rates vary from 20 kbps on the 868 MHz band to 40 kbps on the 915 MHz band to 250 kbps on the 2.4 GHz band [3], [5], [6]. Using the sub-GHz frequencies ZigBee can achieve a range of up to 100 meters between nodes. This is about the same range as that of an 802.11g wireless network card, but one of ZigBee's true strengths comes from its ability to create ZigBee Wireless Mesh Networks (ZigBee-WMNs) by linking nodes together and thus creating a mesh network [3].

ZigBee-WMNs are "self-configuring, self-healing and easy-to-maintain networks" [3]. This means that one can easily add new nodes to the network and they will configure themselves automatically. Also, if there is a problem and one or more nodes fail, a self-healing property of the network starts to reconfigure the network to use different routes so that all nodes can still be accessed. This makes ZigBee-WMNs really easy to configure as well as robust [3].

Another strength of ZigBee is that it consumes very little power. In fact, hardware using ZigBee for wireless communication often lasts for more than 100 days before requiring battery replacement. This is a feature that battery-powered WiFi and Bluetooth devices never come close to, because these often last no longer than a few days, as illustrated in Table I [3], [7].

A. Device types

ZigBee is a standard that uses Personal Area Networks (PANs) to create a connection between devices. Whereas technologies such as Bluetooth use a star topology to create PANs, ZigBee can be used to create more complex topologies such as mesh or tree [6], [8]. For ZigBee PANs to be constructed, there are three main types of devices used by ZigBee: the ZigBee coordinator (ZC), ZigBee routers (ZRs) and the ZigBee end devices (ZEDs) [9]. Together, these form the backbone of ZigBee PANs.

Table I COMPARISONOFWIRELESSTECHNOLOGIES, BASEDON [7] AND [3].

ZigBee Bluetooth Wifi Bandwidth 250 kbps 1 Mbps >200 Mbps Current Draw 30 mA 40 mA 400 mA (transmission) Current Draw <0.1 µA 0.2 mA 20 mA (standby) Transmission Range 1-100 1-10 1-100 (meters) Battery Life 100-1000 1-7 0.5 – 5 (days) Figure 1. Digi International Inc. XBee-PRO, ZigBee module [11]. Network Size >64,000 7 32 (# of nodes) Application Monitoring & Cable Web, Email, control replacement Video one node to another until it reaches the root or an end device much like a router in an Ethernet switched network. It is the duty of the router to discover and connect new these device are that ZCs and ZRs are Full Function Devices nodes, be them other routers or end devices, to the network. (FFDs) that can forward frames from other devices and act This makes ZRs an important part of the network, and also as coordinators. ZCs and ZRs are also often line powered means that the routers need more storage capacity than the [10], which means they get their power directly from the end devices [9], [10]. The storage is mainly used for storing telecommunications circuit to which it is connected. ZEDs network information and to be able to hold multiple frames for are Reduces Function Devices (RFDs) which only can interact forwarding. Routers are often powered by the line instead of with a single FFD [6], [9]. ZigBee end devices can be sensors, batteries because of there storage capability, mainly to prevent switches or other automation equipment. loss of data. 1) ZigBee coordinator: The coordinator is the central 3) ZigBee end device: End devices are the part of the device of a ZigBee PAN and there can only be one ZC in a ZigBee technology made to be low-powered and low-cost PAN. Of the 16-bit network addresses in a ZigBee network, the [6]. These devices range from remote controls to sensors to address 0 is always reserved for the ZC [9]. This also implies lamp-switches and are battery powered, with a battery-life of that the ZC is the root of the network. 
The ZC administers several hundred days. a network, controls the ZRs and ZEDs and also controls the Unlike the coordinator and routers, ZEDs are RFDs and security of the network. can not forward frames in the network [9]. This means that The coordinator is in many cases connected to a computer ZEDs always work as leaves in a network, being the outermost as this makes the ZC more reliable as it makes it easier to devices. It should also be noted that ZRs can be used as ZEDs monitor and administer [10]. It also supplies the coordinator when the network can not handle any more ZRs [9]. with power which means that no batteries needs to be changed. There are two types of frames used in a ZigBee network, the normal frames that are used to transport data and the B. Hardware superframes, that are used by the ZC to administer and control the network. Superframes are used to send out beacons at a ZigBee hardware is manufactured by several companies predetermined time, to let every node get an equal time slice to including Atmel, Texas Instruments, Digi International Inc. send and receive data [6], [9]. This ensures that every node gets and Samsung Electro-Mechanics [12]. These companies to send data and that the network operates with low latency. manufacture hardware supporting the ZigBee stack, but there The use of superframes also allows the coordinator to sleep are also companies such as Panasonic that create hardware while the superframe is sent out [6]. The PAN can also work using the 802.15.4 standard. These devices do not support without superframes and lets the coordinator constantly be on, the ZigBee stack, mainly because their modules do not have that is, allowing it be able to receive data at any time. The enough flash memory [12]. These modules “run plain 802.15.4 control of the network is then done by the coordinator polling protocol, sometimes with a lighter wireless protocol on top” the network nodes and letting them know when there is data [12]. 
They often have their own stacks, for instance the to be received. The nodes periodically wake up and check if Freescale Z-Stack and the Ember ZNet Stack are used by there is data to be received. Panasonic [12]. 2) ZigBee router: Routers, like coordinators are FFDs and Modules vary in size, from being as small as a one euro are also able to forward frames from another nodes. They can coin (16.25 mm in diameter) as the one in Figure 1, to being be used as leaf nodes in the network, but are most often used larger than a 500 euro bill (160x82mm). The enclosure for the as nodes inside the network, controlling the ZEDs [9]. There devices can of course be much larger, and often include more can be multiple ZRs in a PAN and they can be connected to than just the ZigBee module itself. This is mainly because of other ZRs. Their main function is to pass on information from all the different applications ZigBee can be used for. 3
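The self-healing mesh behavior described in Section II can be illustrated with a toy model. This is only a conceptual sketch, not the actual ZigBee routing protocol: the node names, the neighbor table and the `route` helper are invented for the example, and the graph gives the end device two neighbors purely to show rerouting (a real ZED associates with a single parent). The point is that when every node is in range of at least two others, traffic can be routed around a failed router.

```python
from collections import deque

# Toy ZigBee-style mesh: every node is in range of at least two neighbors.
# Node names (ZC = coordinator, ZR = router, ZED = end device) are invented.
links = {
    "ZC":  ["ZR1", "ZR2"],
    "ZR1": ["ZC", "ZR2", "ZED"],
    "ZR2": ["ZC", "ZR1", "ZED"],
    "ZED": ["ZR1", "ZR2"],
}

def route(src, dst, failed=frozenset()):
    """Shortest path from src to dst by breadth-first search,
    skipping failed nodes; returns None if no route remains."""
    queue, seen = deque([[src]]), {src}
    while queue:
        path = queue.popleft()
        if path[-1] == dst:
            return path
        for nxt in links[path[-1]]:
            if nxt not in seen and nxt not in failed:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

assert route("ZC", "ZED") == ["ZC", "ZR1", "ZED"]                  # normal route
assert route("ZC", "ZED", failed={"ZR1"}) == ["ZC", "ZR2", "ZED"]  # self-healed
```

With redundancy in place, the failure of one router merely lengthens or shifts the path; only when all neighbors of a node fail does it become unreachable.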

C. Bandwidth and range

The maximum bandwidth of ZigBee is 250 kbps, making it unfit for large data transfers. ZigBee instead offers good sensitivity, range and battery life, factors that play a major role when creating automation and remote controls. The true strength of ZigBee lies not in its bandwidth, but in its range and power consumption, and in being able to create large networks by linking several nodes together into a mesh network. This extends the physical network range from 100 m to several kilometers, while not having to change batteries for hundreds of days [3], [13].

On the physical layer, ZigBee supports three different frequency bands: 2450 MHz (2.4 GHz) with 16 channels, 915 MHz with 10 channels and 868 MHz with one channel. All bands use Direct Sequence Spread Spectrum (DSSS) access mode, but the 2.4 GHz band uses Offset Quadrature Phase Shift Keying (O-QPSK) for modulation, while 868/915 MHz implements Binary Phase Shift Keying (BPSK) [6].

There is, however, a problem with running ZigBee at 250 kbps, mainly because it uses the 2.4 GHz spectrum. This spectrum is also used by many WiFi routers today, and these routers tend to disturb the ZigBee signal, making it almost unusable. The ZigBee Alliance does see this as a problem, and has implemented an automatic channel selection method for using the best channel available for ZigBee [14].

Figure 2. Illustration of a possible ZigBee automation and metering network [16].

III. APPLICABILITY OF ZIGBEE

ZigBee technology can be applied in several situations. In this section, we discuss the three most common uses of ZigBee: automation, remote metering and remote controls. All of these have been shown to work well with ZigBee, reducing power usage and cost as well as increasing the transmission range. There are many more applications in which ZigBee can be used, such as toys, motion tracking and inventory monitoring [6].

A. Automation and remote metering

By automation in this paper we mean creating a system that can automatically monitor and change the states of various parts of a building, also known as home or building automation. There are several other types of automation, but this is what suits ZigBee the best and is also the most common type of automation. To better understand the strengths of ZigBee, we first outline some of the current technologies used for automation.

There are several issues to consider when building an automation system, such as range, delay and lifetime. One of the most common wireless technologies for home automation is X-10, a technology created at the end of the 1970s [15]. X-10 is best known for its simplicity and ease of use, but it also contains many flaws, such as a lack of security, low reliability and slow speeds. With a transfer rate of a mere 60 bps, X-10 does not support any meaningful data transmissions such as larger data transfers [15]. The reason technologies like Bluetooth and WiFi are not used for automation is mainly their high power consumption compared to ZigBee. There is also the problem of Bluetooth and WiFi being technologies meant for continuous data transfers, whereas ZigBee is only used when it is needed [15].

One of the main concerns when building automation systems is reliability. When monitoring a home, it can be crucial to know the temperature of the rooms and of the water, in order to optimize the fuel used to warm the house. When creating systems for larger buildings, this becomes even more important, as a flaw could cost the building owner a significant amount of money. This concern is taken into consideration by ZigBee, as it supports the creation of mesh networks. By placing sensors so that each is in range of at least two other sensors, redundancy is achieved and the network has a much lower risk of failing [15].

ZigBee mesh networks also ease the administrator's work by being easy to maintain. The option to configure the network wirelessly is a great advantage of ZigBee, as this enables administration of sensors inside water tanks and other hard-to-reach places [15].

As ZigBee is an open standard, there are many different manufacturers and vendors to choose from. This increased competition makes it easier to find the sensors and other hardware needed to create automation and remote metering networks such as the one in Figure 2, where several sensors are connected to a ZigBee gateway that can communicate via SMS with mobile devices. The sensors can collect data about power usage and temperature and also control security sensors, all of which is stored in the gateway. The sensor data is then analyzed, for example to optimize the heat pump or to sound the alarm if an intrusion is detected.

B. Remote control

As multimedia equipment such as televisions (TVs) and set-top boxes (STBs) get more advanced, they also require more advanced ways to control them. For example, the media player Boxee Box by D-Link has a keyboard integrated into its remote [17]. Features such as this draw more power and require a longer range from remote controls.

The most common type of remote controls today are the ones using infrared (IR) light to send signals to devices. However, there are many drawbacks to this technology. There is the issue of line of sight, which is required when using IR remote controls and which also has an impact on the effective range of the remote. IR remote controls also consume a lot of power compared to the remotes that ZigBee can offer [13].

The main difference between IR remote controls and remotes using ZigBee is that ZigBee uses a low-power radio frequency (RF) to communicate with devices [13]. As the ZigBee Alliance points out, the battery life of a traditional button remote control using ZigBee technology can be as much as three times that of one using IR [13]. Another feature that ZigBee can offer is two-way communication. This means that TVs or STBs can communicate back to the remote. This can enable features such as viewing the TV guide on your remote, or letting your remote know exactly what channel you are watching and change the button configuration accordingly.

IV. A COMPARISON AGAINST BLUETOOTH

Bluetooth and ZigBee are both designed to provide Wireless Personal Area Networks (WPANs), using the same 2.4 GHz spectrum for their highest data rate. Both also try to achieve the goal of being low-cost and low-power. While Bluetooth is a widespread technology, used in laptops, mobile phones and tablets, ZigBee still lacks a large market share. Bluetooth, however, still consumes too much power to be applicable for devices that need to be battery-powered for several months, and this is where ZigBee might make a difference. One thing to note is that Bluetooth was originally created as a cable replacement technology, while ZigBee's data rate suggests that it is made as a technology to enable automation, remote metering and remote controls [3]. In this section we describe the main differences and similarities of the two technologies, and determine which technology is best to use in certain circumstances.

A. Power

As we can observe in Table I, ZigBee draws significantly less power than Bluetooth. This is mainly because ZigBee devices can go to sleep for long periods of time, whereas Bluetooth tries to provide a fast response time by being awake for much longer periods [3]. This makes it possible for some ZigBee devices to run for over three years on two AA batteries, in excellent conditions. Bluetooth, however, is only able to run for about a week on the same amount of power.

B. Range

The range difference between devices using the two technologies is fairly large. Bluetooth is able to connect devices 10 meters away; ZigBee devices can transmit data for up to 100 meters, as can be seen in Table I, but they are also able to create mesh networks, which can then function over much greater distances. This is because mesh networks link devices together, forming a larger network with several nodes. Often one node is connected to two or more nodes, giving the network the ability to choose an alternative route if one node were to fail.

C. Data rate

High data rates are not ZigBee's strong suit. ZigBee's maximum data rate is 250 kbps over the 2.4 GHz frequency, making it suitable for communication between devices, but not for larger file transfers. Bluetooth can transfer data at up to 1 Mbps, making it easier to transfer larger files than with ZigBee, as can be seen in Table I. Comparing WiFi with Bluetooth, WiFi still draws a lot more power, but is much more suitable for larger data transfers and has a longer range. This makes Bluetooth a technology lacking in both range and performance compared to similar technologies.

D. Cost

Both ZigBee and Bluetooth aim at being low-cost technologies, but ZigBee manufacturers have still managed to make their hardware significantly cheaper, as can be seen in several studies [3], [10], [18]. While the studies claim different pricing from each other, one can see that the price of ZigBee modules is most of the time half that of modules using the Bluetooth technology.

E. Difference in use

The main difference between ZigBee and Bluetooth is that they were created for different purposes. ZigBee was created for monitoring and controlling devices, whereas Bluetooth was meant as a cable replacement technology [3]. Bluetooth has somewhat succeeded in becoming the mobile phone industry standard for wireless data transfers, but WiFi is a strong competitor, as its hardware is getting cheaper and draws less power. ZigBee technology is not used as much as Bluetooth, as it is not meant for data transfers and has less applicability on the mobile device market. ZigBee has still proven that it can handle a range of tasks, such as monitoring greenhouses [18], electric motor rotor measuring [2], RFID reader networks [10] and remote controls [13].

V. DISCUSSION AND CONCLUSIONS

Despite being the less obvious choice when creating wireless networks, ZigBee still manages to offer many of the features of both Bluetooth and WiFi, and also includes features not found in other wireless technologies. ZigBee is one of the leaders in low-cost, low-energy wireless transmitters when comparing price and battery life. ZigBee has also proven that it can do much more than just automation. There are several applications that can still be explored, such as large-scale traffic control using ZigBee devices as the transmitters of sensor data, and remote control of robots.

The biggest problem ZigBee currently has is its low transfer rate. Whereas WiFi can transmit data at over 100 Mbps, ZigBee only manages to establish 250 kbps links; this is a problem for future ZigBee applications, which will most likely require higher speeds. Another problem concerns the use of the 2.4 GHz band for transmitting at higher rates. This frequency band is highly sensitive to disturbance not only from other network equipment, but also from some electrical devices such as microwave ovens.

Despite these problems, ZigBee can be used in many situations, the main ones being automation and remote metering. Here, its qualities outweigh the problems, as the low transfer rates are not an issue and a longer range is often a must. When constructing new buildings, companies should look into integrating ZigBee technology into their control systems, as it is fairly easy to maintain and keeps costs down. The ability to create mesh networks is one of the main strengths of ZigBee, as the network can remedy broken nodes itself, without the need of a network administrator. This of course depends on how the network is built; if built according to the proper ways of building mesh networks [3], [6], each node in the network should be in range of at least two other nodes.

If used in the right situations, ZigBee can be a good alternative to other wireless technologies, as it can keep costs low and does not require as much maintenance as many other technologies, due to its self-healing properties and its easy maintainability.

REFERENCES

[1] ZigBee Alliance, "Understanding zigbee gateway," White Paper, September 2009. [Online]. Available: www.zigbee.org/imwp/download.asp?ContentID=16883
[2] V. Särkimäki, R. Tiainen, T. Lindh, and J. Ahola, "Applicability of zigbee technology to electric motor rotor measurements," in Power Electronics, Electrical Drives, Automation and Motion, 2006. SPEEDAM 2006. International Symposium on, May 2006, pp. 137-141. [Online]. Available: http://ieeexplore.ieee.org/xpls/abs all.jsp?arnumber=1649759
[3] W. Guo, W. M. Healy, and M. Zhou, "Zigbee-wireless mesh networks for building automation and control," Networking, Sensing and Control (ICNSC), pp. 731-736, April 2010. [Online]. Available: http://ieeexplore.ieee.org/xpl/freeabs all.jsp?arnumber=5461566
[4] Bluetooth SIG, "SIG introduces bluetooth low energy wireless technology, the next generation of bluetooth wireless technology," December 2009. [Online]. Available: http://www.bluetooth.com/English/Press/Pages/PressReleasesDetail.aspx?ID=4
[5] P. Kinney, "Zigbee technology: Wireless control that simply works," White Paper, ZigBee, October 2003. [Online]. Available: http://www.zigbee.org/imwp/idms/popups/pop download.asp?contentID=5162
[6] P. Baronti, P. Pillai, V. W. Chook, S. Chessa, A. Gotta, and Y. F. Hu, "Wireless sensor networks: A survey on the state of the art and the 802.15.4 and zigbee standards," Computer Communications, vol. 30, no. 7, 2007. [Online]. Available: http://cone.informatik.uni-freiburg.de/lehre/seminar/adhoc-s08/RoutingMAC/zigbeeSurvey.pdf
[7] J. M. Wilson, "The next generation of wireless lan emerges with 802.11n," Technology@Intel Magazine, August 2004. [Online]. Available: http://www.ncs-in.com/downloads/ieee%20802.11n.pdf
[8] M. Maupin, "Zigbee: Wireless control made simple," Presentation at Wireless Mobile WorldExpo 2005, 2005. [Online]. Available: http://www.wowgao.com/2005wirelessandmobile/PDFfiles/2005/presentation files/MattMaupin FreescaleSemiconductor.ppt
[9] L.-H. Yen and W.-T. Tsai, "Flexible address configurations for tree-based zigbee/ieee 802.15.4 wireless networks," in 22nd International Conference on Advanced Information Networking and Applications, March 2008, pp. 395-402. [Online]. Available: http://ieeexplore.ieee.org/xpls/abs all.jsp?arnumber=4482734
[10] V. L. Sheridan, B. Tsegaye, and M. K. Walter-Echols, "Zigbee-enabled rfid reader network," Silicon Valley, Tech. Rep., 2006. [Online]. Available: http://www.wpi.edu/Pubs/E-project/Available/E-project-041706-150556/unrestricted/ZigBee Enabled RFID Reader Network Report.pdf
[11] Digi International Inc., "Xbee-pro 802.15.4 oem rf modules," Internet, [Online; accessed 20-February-2011].
[12] Wikipedia, "Comparison of 802.15.4 radio modules — Wikipedia, the free encyclopedia," 2010, [Online; accessed 20-February-2011]. [Online]. Available: http://en.wikipedia.org/w/index.php?title=Comparison of 802.15.4 radio modules&oldid=405154434
[13] ZigBee Alliance, "Advantages of energy-efficient zigbee remote controls," White Paper, January 2011, [Online; accessed 22-February-2011]. [Online]. Available: http://www.zigbee.org/imwp/download.asp?ContentID=19575
[14] D. Egan, "Designing a zigbee network," ESS 2006, 2006. [Online]. Available: https://intranet.ee.ic.ac.uk/t.clarke/projects/Resources/zigbee/zigbee%20data/zigbee-network-design-and-performance.pdf
[15] Schneider-Electric, "Wireless controller networks for building automation," White Paper, June 2006. [Online]. Available: http://www.tac.com/data/internal/data/08/99/1262115381268/Wireless+Controller+Networks.pdf
[16] Freescale Semiconductors, "IEEE 802.15.4 technology from freescale," Brochure, 2010. [Online]. Available: http://www.freescale.com/files/wireless comm/doc/brochure/BRZIGBEETECH.pdf
[17] Boxee Inc., "Boxee box by d-link," Website, 2010, [Online; accessed 24-February-2011]. [Online]. Available: http://www.boxee.tv/buy
[18] Z. Qian, Y. Xiang-long, Z. Yi-ming, W. Li-ren, and G. Xi-shan, "A wireless solution for greenhouse monitoring and control system based on zigbee technology," Journal of Zhejiang University SCIENCE A, vol. 8, no. 10, pp. 1584-1587, 2007. [Online]. Available: http://www.springerlink.com/content/t562v78465802v25/

Fredrik Rantala 31820 Network Software 2011 paper

Cyclic Redundancy Check (CRC)

Abstract

In 1961 William Wesley Peterson invented the Cyclic Redundancy Check. This is a simple method for calculating a check code that always has the same length, no matter how much data the checksum is calculated for. The checksum is sent together with the data, so that the receiving device can decode the checksum and decide, with a high degree of certainty, whether the received data is error free; if the received data has errors, it asks the sending device to send the data again until the received data corresponds with the checksum, making the data error free. This is today the basis for our working networking infrastructure: error checking is needed, because nothing is error free, not even our computers and their peripheral devices.

Introduction

In today's world, computers and the networks between them have become more and more common, and these networks are used to electronically transmit data between computers and their peripheral devices. Though computers and electronic equipment are far more exact in their function than humans, they are also less tolerant of errors, and this is a problem because electronic devices and the networks between them are not error free. This means that errors can and eventually will occur in data transmission, which is a big problem, because we put our faith in electronic equipment doing our job exactly and reliably.

In this paper we will present the method of creating a CRC checksum, discuss the usefulness of and problems around Cyclic Redundancy Checking, hereafter known as CRC or CRC check, and debate, mostly theoretically, whether this method is useful and, if so, to what extent.

In the first chapter of this document I will describe where the need for a technology like CRC came from, why exactly it was needed, what it was needed for, and what would happen in the long run without CRC. The inventor of CRC will also be briefly presented, along with what his revolutionizing idea was and what it did, and still does, for our everyday life in information technology.

In the second chapter I will present what exactly CRC is, what it does, and the up- and downsides of the technology itself. We will also present what CRC is not and what it cannot be used for.

The third chapter will focus on separating the two main kinds of CRC, software and hardware based ones; we will briefly describe these and demonstrate their main differences.

In the fourth chapter I will concentrate on different methods and algorithms that are connected to CRC; there are literally tens of these, so I will only briefly present a few of the most important ones.

The fifth chapter will summarize my report into a conclusion of what I have found out, so that it can be read easily and quickly in a fast skim-through. This is also the last chapter of this document, before the sixth chapter, which holds the in-text references to the websites and literature used for this paper.

The beginning

Computers and electronic equipment started to spread widely in the 1960s, which meant that the amount of data transferred was to become larger and larger over time. This resulted in a need for error checking of the data, because electronics are not error free at all, as many of us seem to think, and the problem is that electronic equipment cannot utilize data that contains errors, so the data has to be error free or it is in most cases completely useless [2]. Even a one-bit error in a billion-bit data stream could result in a non-working program, or the problem could snowball: a small problem in the beginning could result in a massive corruption of data in the end, which would in turn result in huge problems everywhere, rendering our automated data processing equipment useless. This is why the need arose to implement error checking in data handling circuitry, to get rid of the errors and the indirect consequences that errors cause.

Luckily, William Wesley Peterson invented CRC in 1961: a rather simple algorithm that appends a small string of bits to every data stream, so that the receiving device can verify by itself that the received data is correct, just by reversing the calculation that the sending device implemented as a few bits at the end of the transmitted data.

What is CRC?

Cyclic Redundancy Check is a method of creating and adding a small checksum to the end of a data block, either by software or by hardware coding [4]. This checksum is then read at the receiver and calculated back against the rest of the data in the data packet. If this calculation matches, there is a very high probability that the data is intact, but depending on which CRC method is used, there is still room for error: because the CRC does not represent the whole data block, but is only a product derived from it, a small chance still exists that there is corruption in the data even though the CRC checksums match [1 page 2]. So, to keep in mind: the CRC block is not the whole computation; it is rather a figure derived from the data package.

Theory

The CRC calculation is based on polynomial division: the data block is treated as a polynomial and divided by a predefined generator polynomial (these could be of the form x^4 + x^3 + x^2 + x and x^3 + x + 1, respectively [5 page 247]), and the remainder of this division is the check value, which is compared to the one reported by the sending device. If the receiving device's calculated remainder and the sending device's remainder differ, the receiver re-requests the data from the sender. This automatic comparison and automatic re-requesting of possibly corrupted data makes CRC, together with the hardware and the rest of the software, to some extent an error-correcting system, but it does not correct the errors itself; rather, it requests the corrupted data to be resent.

What CRC is not

We must keep in mind that CRC is not an error correction method or error correction code (ECC): it only detects errors, and sends an acknowledgement or a negative acknowledgement to the sending device, thus accepting the data, or rejecting it and requesting that it be resent. So CRC is not an algorithm that repairs errors; it just detects errors with a fairly good success rate, reports them to the sending device and then requests the sender to send the data again.
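The polynomial-division idea can be sketched in a few lines of Python. This is a simplified illustration, not a production implementation (real software CRCs are usually table-driven variants such as CRC-16 or CRC-32, and the function names here are invented): the generator x^3 + x + 1 is written as the bit pattern 1011, the message is shifted left by the degree of the generator, and repeated XOR steps leave the remainder that serves as the check value.

```python
def crc_remainder(data_bits: int, data_len: int, poly: int = 0b1011) -> int:
    """Remainder of data * x^r divided by the generator polynomial,
    here x^3 + x + 1 (bit pattern 1011), using mod-2 (XOR) arithmetic."""
    r = poly.bit_length() - 1        # degree of the generator, r = 3
    reg = data_bits << r             # append r zero bits for the check value
    for i in range(data_len - 1, -1, -1):
        if reg & (1 << (i + r)):     # leading bit set: subtract (XOR) the poly
            reg ^= poly << i
    return reg                       # the r remaining bits are the CRC

def verify(data_bits: int, data_len: int, crc: int, poly: int = 0b1011) -> bool:
    """Receiver side: recompute the remainder and compare it to the one sent."""
    return crc_remainder(data_bits, data_len, poly) == crc

msg = 0b11010011101100                    # a 14-bit example message
crc = crc_remainder(msg, 14)              # -> 0b100
assert verify(msg, 14, crc)               # clean transmission: remainders match
assert not verify(msg ^ 0b100, 14, crc)   # a flipped bit changes the remainder
```

When `verify` fails, the receiver would send a negative acknowledgement and request retransmission, exactly as described above; note that the code only detects the corruption, it does not repair it.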

Imagine that you had to manually check every text document or other file when you transfer it. A half-page text document would be an easy case, but imagine checking all the pictures on your hard drive by eye after downloading them from your digital camera, or even worse, doing this for a game's data files that are compressed in every way; that is something most of us would not be up to, and it would consume a great deal of time. Instead, the computer now does this automatically, keeping our data in the shape we like it.

FIGURE 1. Diagram of CRC calculation [6 page 198]


Software or hardware CRC

The consistency check of the data can be done by two main means [3]: either by computer software that performs the check, or by hardware that encodes and decodes the transferred data. Both of these offer advantages and disadvantages, and the choice between the two methods most often depends on monetary issues, as well as on practical issues and on the goal of the check itself: is the check to be only a fast check, or an in-depth one?

Software CRC means that we implement a calculation, one of the different CRC methods like CRC-1 (parity check), CRC-16, CRC-32 or other, even more complex algorithms, to calculate a checksum on the sending device.

Hardware CRC means that we implement circuitry that processes the data and adds a checksum to the end by direct electronic means rather than software. These electronic means involve, of course, XOR (exclusive OR) transistor circuitry that processes the data. This has the disadvantage that the circuits can be very complex and much slower than software, but again more reliable and error resistant.

FIGURE 2. CRC hardware

Alternatives to CRC

CRC has become the most used and widespread error checking technique because it is fairly easy and simple to implement, but this does not mean that CRC has no alternatives or competitors, if one wishes to call them that.

An alternative to CRC is to calculate a simple checksum over the whole data; MD5 is a fairly well-known implementation of this method. MD5 involves a program that calculates a checksum with an algorithm, but this checksum is not necessarily transmitted with the data itself; it may only be reported as a separate number, so that the user has to run a checking program to verify that the received data is not corrupted.

Another alternative to CRC is the parity check, maybe the simplest method of error checking for data, implemented in most data transmission hardware: from the ancient COM ports in our computers, to simple analogue telephone modems, network cards, and also today's DSL modems. A parity check involves adding one bit to the end of every seven bits of data; this bit tells the receiver how many ones (1) the seven bits of data include, and if the number of ones in the seven data bits does not match the parity bit, the data is corrupt [5 page 246].

Different methods

The choice between these methods depends basically on what hardware we have available: we cannot execute complex algorithms on a slow processor, as that would create a significant bottleneck, slowing down the entire system; but we cannot compromise the integrity of data sent in bigger parts either, so we always have to choose a compromise between data integrity and how much money and processor time is spent on the check.

This is why we have CRC-1 for hardware based CRC: it is a simple parity check, which applies one bit to the end of every eight bits, and this parity bit is simply calculated from the eight "main" bits of the data. The parity check is the most common and simplest technique used in hardware for error checking. This technique is considered rather reliable and fast to execute, and does not require big circuitry to be built.

Of course, on both the software and the hardware side we have different implementations: the hardware side offers different circuits that we can create to get a CRC check, and the software side offers the possibility to create different mathematical algorithms to do the same operation for various purposes.

algorithms and a couple different competing approaches like parity checking and checksum like MD5, the harsware side is somewhat limited to only the simple circuitry. CRC can be a great timesaver and also spares the user from checking the data consistency manually, this would be a time consuming task in most cases, and even impossible in bigger cases. A one page raw text document isn’t a too big match, but a big game on one’s hard-drive might be too much to tackle for anybody. To remember is that CRC is not an error correcting code, it only detects errors, no matter if its software or hardware based, for this we have invented error correction circuitry for servers (ECC) etc.

FIGURE 3. Parity bit calculation
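To make the two ideas concrete, here is a small Python sketch (an illustration of the techniques discussed above, not code from the paper): an even-parity bit of the kind used by CRC-1-style checks, and a bit-by-bit CRC-32 using the reflected polynomial 0xEDB88320, the same CRC used by Ethernet.

```python
import binascii

def even_parity_bit(data_bits):
    """Even parity: the bit is 1 when the data contains an odd number
    of ones, so that data plus parity always has an even count of ones."""
    return sum(data_bits) % 2

def crc32_bitwise(data: bytes) -> int:
    """CRC-32 computed bit by bit (reflected polynomial 0xEDB88320,
    as used by Ethernet, ZIP and PNG)."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0xEDB88320 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF

# Four ones in seven data bits -> even-parity bit is 0:
assert even_parity_bit([1, 0, 1, 1, 0, 0, 1]) == 0
# The standard CRC-32 check value for "123456789":
assert crc32_bitwise(b"123456789") == 0xCBF43926
# Agrees with the table-driven version in the standard library:
assert crc32_bitwise(b"hello world") == binascii.crc32(b"hello world")
```

If a transmission error flips bits, recomputing the CRC at the receiver yields a different value, which is how the mismatch is detected; as noted later, this detects errors but does not correct them.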

On the software side there is a huge number of CRC algorithms of varying length and complexity, suited to most kinds of data: there is one for USB data transfer, the GSM protocol uses a 40-bit CRC, and Ethernet uses a 32-bit CRC, as does Serial ATA inside computers; not to forget that there are several variations of the same bit-length CRC. CRC-16 and CRC-32 are probably the most commonly used on the software side [4, p. 5]; they append a 16- or 32-bit checksum to the data. Of course there are many more CRC algorithms around; these are just a few of the most commonly used ones.

Conclusion

Now we can finally state that CRC is not just one method for checking the consistency of transmitted data; it is rather a bunch of methods collected under one name that we refer to when having data checked for errors. Earlier I presented the need for and the inventor of the methodology, followed by a bit of theory about CRC and how it can be divided into two sections, software and hardware. The software side has many different implementation algorithms and a couple of competing approaches, such as parity checking and checksums like MD5, while the hardware side is somewhat limited to simple circuitry. CRC can be a great timesaver and spares the user from checking data consistency manually, which would be a time-consuming task in most cases and even impossible in bigger ones: a one-page raw text document is not too big a match, but a big game on one's hard drive might be too much for anybody to tackle. Worth remembering is that CRC is not an error-correcting code: it only detects errors, whether it is software or hardware based. For correction we have invented error-correcting code (ECC) circuitry for servers, etc.

References

1. Norman Matloff. Dept. of Computer Science, University of California at Davis, 7.9.2001. http://frogchunk.com/documentation/network/theory/CRC/CRC.pdf
2. Ritter, T. The Great CRC Mystery. Dr. Dobb's Journal of Software Tools, 11.2.1986, pp. 26-34, 76-83. http://www.ciphersbyritter.com/ARTS/CRCMYST.HTM
3. Patrick Geremia. Texas Instruments application report SPRA530, 1999. http://focus.ti.com/lit/an/spra530/spra530.pdf
4. Hacker's Delight, chapter 14 (CRC). Addison-Wesley, 2003. http://www.hackersdelight.org/crc.pdf
5. Tanenbaum. Computer Networks, fourth edition.
6. Forouzan. Data Communications and Networking, third edition.

IPv6 (February 2011)

Alexander Pchelintsev

Abstract - In this paper we discuss the sixth version of the IP protocol (IPv6). This protocol was developed in order to replace IPv4, a version of IP that has been obsolete for a long time, although it is still widely used all over the world. Currently, IPv6 is barely used and it is not a standard, but the transfer to IPv6 seems inevitable in the near future. We consider the most important aspects of that transfer. First of all, we put forward the reasons for transferring by discussing the following questions: What problems of IPv4 might force us to drop that protocol for some other solution? How can IPv6 help to overcome those problems? In order to answer these questions we explore the basics of IPv6 and compare its structure to IPv4. Furthermore, we investigate the key factors that can make the process of transferring to IPv6 smoother.

I. INTRODUCTION

The demand for a successor of IPv4 has existed for a long time. We discuss the most plausible and obvious candidate for this role: the sixth version of the IP protocol (IPv6). In this paper we first summarize the existing reasons convincing us that IPv4 is obsolete and that IPv6 is what will substitute it. After that we briefly review the main concepts of IPv6 and observe its address types and the functions provided by the protocol. Almost everywhere we provide a comparison with the similar features of IPv4. A special part of the paper is devoted to the security issues of IPv6 and another to the interoperability of the two versions of the Internet Protocol. Finally, we shed some light on how IPv6 can support mobility and what issues it might face.

II. PROBLEM FIELD OVERVIEW

For several decades the IPv4 protocol has been used for building efficient and flexible networks. This protocol is the cornerstone of public and commercial networks, as the Internet is based on it. However, nowadays we witness an extremely fast developing ICT sector and rapidly expanding global networks. Hence, we have to face the problems that IPv4 is not able to solve. The protocol has a variety of drawbacks caused by different kinds of reasons. In the following we investigate some of these problems.

Address space problem

The most serious issue of IPv4 is address space exhaustion. It is not a secret that in the near future there will be no free IPv4 addresses available for allocation. On the one hand, the number of Internet users increases exponentially every year, and applications requiring huge amounts of IP addresses are becoming more common. On the other hand, IPv4 uses 32-bit address identifiers, which in theory means that 4.3 billion addresses can be allocated. In practice, however, class-based allocation implies that only 0.5-1 billion addresses are in use [1]. Special additional technologies and protocols have been developed in order to save the IPv4 space, such as Network Address Translation (NAT). However, it is obvious that large technology-intensive countries such as China, Japan or the USA will not get enough addresses in the future. Discussions about the date of complete address exhaustion of IPv4 took place for the first time in 1994. Currently, the most optimistic estimation is 2020 [1].

Architectural drawbacks of IPv4

IPv4 was developed in the early 1980s (Request for Comments (RFC) 791 in 1981) and its concepts have not changed much since. The protocol was developed to be as simple as possible; its main purpose was to build ordinary end-to-end networks. For modern networks with complex topologies and an enormous number of nodes, bare IPv4 is obviously not enough. Currently, IPv4 is used together with a variety of additional protocols and technologies installed with it, for instance DHCP (used for dynamic configuration of computers so they can communicate in the network), ARP (Address Resolution Protocol), NAT, etc. These additional protocols generate significant mutual compatibility overhead that might even make some of them useless. Thus, the initially light-weight, conceptually simple and transparent IPv4 has become heavy and badly structured. In fact, the basic idea of simple end-to-end communication is lost due to the introduction of NAT, because of which two communicating nodes do not see each other directly.

IPv6 as a possible solution

Considering these problems, it is reasonable to propose a new protocol which would unite the features of IPv4 with those of the additional protocols. In addition, this new protocol must have a large enough address space and support some method of interoperation with IPv4.
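The scale gap behind the address-space requirement is easy to check with two lines of arithmetic (a quick verification of the figures discussed above, not taken from the paper):

```python
# 32-bit IPv4 identifiers versus 128-bit IPv6 identifiers:
ipv4_space = 2 ** 32
ipv6_space = 2 ** 128

assert ipv4_space == 4_294_967_296           # the "4.3 billion" IPv4 figure
assert ipv6_space // ipv4_space == 2 ** 96   # 2^96 IPv6 addresses per IPv4 address
```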

Premises of transferring to IPv6

IPv6 is not yet a de facto standard. However, the leading software companies and Internet Service Providers have already started taking actions related to the transfer to the new protocol. It is worth mentioning the activity conducted by Google, the world-leading provider of online services and the most popular search engine. The company invests a lot into developing its own IPv6 infrastructure for obvious reasons: maintaining its image as a world-leading company in case of an address space collapse; making sure that its own growth, which requires an enormous number of addresses, is protected from the problem; and anticipating the profits from a burst of interest in IPv6. However, Google does not intend to harm the existing IPv4 services, so the technologies do not intersect: Google has a separate domain, bandwidth and routing for IPv6 services [6]-[7]. Of course, many other big companies providing dense, traffic-consuming services (Facebook, NTT Communications) are also working on the problem [8]-[9]. A very important step, somewhat of a proof of concept, will be June 8, 2011, the "World IPv6 Day": on that day the IPv6 services of the companies listed above and many more will be available for public use for 24 hours. This will hopefully prove that even though the era of IPv6 is in its infancy, the protocol is viable and efficient.

III. IPV6

Address basics

The biggest advantage of IPv6 consists in its huge, practically unlimited address space. This feature is provided by the fact that IPv6 has a 128-bit address identifier, four times longer than the address identifier of IPv4. This outstanding property enables convenient features in IPv6, such as hierarchical management of the address space and automatic configuration of Internet devices.

Address notation

The hexadecimal system is used for address representation. Pairs of bytes (commonly denoted "blocks") are separated from each other by colons, and the hexadecimal digits (half-bytes) are called "nibbles" [2, p. 21]. An example of an IPv6 address is:

fe80:0000:0000:0000:020c:f1ff:fefd:d2be

A block of zeroes (:0000:) can be written as a single digit:

fe80:0:0:0:020c:f1ff:fefd:d2be

Any sequence of zero blocks in an address can be reduced to a double colon. For instance, the sample address above can be reduced to the following notation:

fe80::20c:f1ff:fefd:d2be

However, such a reduction can be made only once per address in order to avoid ambiguities.

Header format

IPv6 has a slightly different header format than IPv4. The main aim of the changes was the reduction of the system information overhead transmitted with each packet. Thus, IPv6 distinguishes the main header, which is compulsory, from the extension (optional) headers. The obvious advantage of such an approach is the simplification of the job of routers, which only process the main headers. In IPv4, routers have to parse all the header options along with the main information, whereas the options really need to be parsed only at the final nodes. Hence, the IPv6 approach with its simplified and fixed-size header (40 octets) increases network bandwidth and performance. Moreover, the large number of additional and optional parameters boosts the flexibility of the protocol.

Version | Priority | Flow label (Quality of Service protocol support field)
Payload Length | Next Header (Protocol field replacement) | Hop Limit (TTL replacement)
Source address
Destination address

Table 1. IPv6 header format [1].
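To make the fixed 40-octet layout of Table 1 concrete, here is a sketch (not from the paper) that unpacks the header with Python's standard struct and ipaddress modules. The source address reuses the sample address from the notation discussion; the destination is an invented documentation address.

```python
import struct
import ipaddress

def parse_ipv6_fixed_header(packet: bytes) -> dict:
    """Unpack the 40-octet fixed IPv6 header (field names as in Table 1)."""
    vtf, payload_length, next_header, hop_limit = struct.unpack("!IHBB", packet[:8])
    return {
        "version": vtf >> 28,             # 4 bits
        "priority": (vtf >> 20) & 0xFF,   # 8 bits (traffic class)
        "flow_label": vtf & 0xFFFFF,      # 20 bits
        "payload_length": payload_length, # 16 bits
        "next_header": next_header,       # replaces the IPv4 Protocol field
        "hop_limit": hop_limit,           # replaces the IPv4 TTL field
        "source": ipaddress.IPv6Address(packet[8:24]),
        "destination": ipaddress.IPv6Address(packet[24:40]),
    }

# Build a header: version 6, Next Header 59 ("no next header"), hop limit 64.
src = ipaddress.IPv6Address("fe80:0000:0000:0000:020c:f1ff:fefd:d2be")
dst = ipaddress.IPv6Address("2001:db8::1")
header = struct.pack("!IHBB", 6 << 28, 0, 59, 64) + src.packed + dst.packed

fields = parse_ipv6_fixed_header(header)
assert len(header) == 40 and fields["version"] == 6
# The zero blocks collapse to "::" exactly once, as described above:
assert str(fields["source"]) == "fe80::20c:f1ff:fefd:d2be"
```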

The following types of extension headers are currently known:
• Routing – contains the information about the whole route from the source (source routing);
• Fragmentation – contains the information about packet fragmentation, processed at the final nodes only;
• Authentication – used to maintain packet integrity and to carry the data needed for authentication of the destination and source nodes;
• Encapsulation – provides the information needed for maintaining the privacy of the transmitted data by means of encryption/decryption;
• Hop-by-Hop Options – special parameters that are used when the hop-by-hop algorithm is applied to the transmitted data;
• Destination Options – additional information for the destination node.

Unicast addresses

The unicast type of address is the one most resembling the regular IPv4 address, representing an interface in the network [2, p. 25]. The main conceptual difference is that IPv4 allows only one real address for each interface (excluding virtual addresses, aliases, etc.), whereas IPv6 allows an arbitrary number of addresses assigned to a single interface. The address is evenly divided between the Subnet Prefix and the Interface ID (64 bits each). Such a fixed division simplifies address configuration and allows abandoning net masks. There are several kinds of unicast address scopes supported by IPv6:
• Local scope
  o Link-Local Unicast Addresses;
  o Site-Local Unicast Addresses;
• Global scope – when the unicast address identifies the interface uniquely in the whole network.

There are some special types of unicast addresses as well:
• the loopback address (0:0:0:0:0:0:0:1, or simply ::1), which has basically the same functions as in IPv4 (test functions, the ability of an interface to send packets to itself);
• the unspecified address (0:0:0:0:0:0:0:0), used for communication with the neighbor nodes before a certain address is acquired by an interface.

Anycast addresses

Anycast addresses are used to implement one-to-many communication. Basically, an anycast address is created by assigning the same unicast address to several interfaces (it is important that the change of address status must be announced). A packet sent to an anycast address will be received by the nearest interface that has the specified anycast address. An anycast address consists of a subnet prefix (n bits) followed by a sequence of zeroes (128 - n bits). The n-bit subnet prefix identifies a topological area where all the group members belonging to the same anycast address reside [1, p. 34].

The most common application of the anycast address is in improving routing by assigning it to a set of routers. Including the anycast address in the Routing Header (possibly recursively) would enhance the routing process significantly [2]. Currently this type of address is not widely used due to a number of complications and restrictions, such as the problems of forwarding anycast-sourced packets.

Multicast addresses

Multicast addresses are also used for one-to-many communication support. Similarly to anycast, a multicast address belongs to a group of interfaces. However, contrary to anycast, a packet sent to a multicast address will be delivered to the whole group. Multicast was supported in IPv4; IPv6 extends the multicast address range by introducing scopes of address validity. Multicast routing is the cornerstone of self-configuring network services and "intelligent broadcast" services like "Internet Radio" [1].

IPv6 and DNS

Before transferring to IPv6 completely, the problem of its incompatibility with the Domain Name System (DNS) should be solved. DNS assigns symbolic representations to IP addresses, thus letting the user not care much about the digital address representation. DNS servers are responsible for the conversion; they hold the domain data inside records with information about the domain names, including the assigned IP addresses. All the record types existing before the introduction of IPv6 refer to 32-bit addresses (A-type records), automatically becoming incompatible with the new version of the protocol. In order to store a 128-bit IPv6 address, the AAAA-type record was developed; the address itself is kept inside the information part of the record. For the symbolic representation the special ip6.int domain was developed, so the name of the address is stored as a set of symbol blocks separated by dots and ending with the ip6.int suffix. In order to keep IPv4/IPv6 compatibility, clients and servers should support certain properties; for instance, a server must keep both A and AAAA records for every assigned domain name. This and some other major incompatibilities imply that DNS should be significantly reconsidered.

Security in IPv6

In order to provide enhanced security in network operations, Internet Protocol Security (IPsec) is used. IPsec provides transparency, integrity and confidentiality for end-to-end communications [5]. IPsec consists of a suite of open protocols whose main goal is to provide communication security on the network layer, which automatically means security for all the upper-layer protocols. IPsec can operate in two different modes:
• Transport mode – implies the encryption of the whole packet except the header. Thus, end-to-end communication can be performed with the headers readable;
• Tunnel mode – a gateway is used to encrypt and wrap the original packet into a new one. This approach allows the encryption or authentication of the whole packet (but the need for the gateway is a drawback).

IPsec can be used in IPv4. However, due to the simplicity principles of IPv4, it is hard to include IPsec in IPv4 naturally; some bridge mechanisms and technologies are needed. On the contrary, IPv6 embraces IPsec easily because of its flexible, extension-friendly architecture.

Overall, the concept of the new protocol improves the security of networks significantly. The upgraded robustness is achieved by means of the following important features:
• Network Address Translation becomes unnecessary. Even though it has proved to be rather useful, NAT is a reason for the exponentially increasing complexity of network applications. Also, it breaks the natural principle of end-to-end communication in IP networks;
• The packet header is simplified and fixed-size;
• Fragmentation of IP packets is abandoned;
• The virtually infinite number of addresses makes it impossible to scan networks by "brute force", which is one of the approaches used by malware to detect victims.
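The brute-force point is easy to quantify with back-of-the-envelope arithmetic (an illustration, not a figure from the paper): even the 64-bit interface-ID space of a single subnet is far beyond exhaustive scanning.

```python
# Probing one /64 subnet at one million addresses per second:
probes_per_second = 1_000_000
seconds = 2 ** 64 / probes_per_second
years = seconds / (365 * 24 * 3600)
assert years > 500_000   # on the order of 585,000 years for one subnet
```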

IV. INTEROPERATION WITH IPv4

Tunneling

In order to understand what a tunnel actually is, we can consider two protocols:
• Protocol P1 – a network layer protocol;
• Protocol P2 – an arbitrary protocol which is not a link-layer protocol.
A tunnel is a mechanism that allows P1 to use P2 as a link-layer protocol. Usually tunnels utilize some network layer protocol as P2. In such cases the protocol used as the link layer (P2) is called the outer protocol and P1 is called the inner protocol [2].

Tunneling is a simple and flexible approach for connecting networks when utilizing different versions of IP. The main advantage of this method is that it requires minimal changes in the network framework. The tunnel can be built over the IPv4 protocol in order to reach an IPv6 host, or it may be created over the IPv6 network; the latter method is considered to be the final step of the transfer to IPv6. Tunneling is also a good approach for starting the switch from IPv4 to IPv6, because it does not affect the existing IPv4 framework. Moreover, it allows Internet service providers to create prototypes of real IPv6 networks easily. In addition, a tunnel over IPv4 can act as a bridge between isolated IPv6 hosts. There are many implementations of IPv6/IPv4 tunneling.

Dual stack

The most obvious way to provide interoperability between the two versions of IP is to make sure that a node supports them both. A node that implements both IPv6 and IPv4 is called a dual stack system. Such systems utilize the appropriate protocol depending on the requirements of the communicating peer (e.g., if the peer supports IPv4 only, then IPv4 is used; otherwise IPv6 is preferred). The biggest drawback of this approach is that it does not work when there are no IPv4 addresses available anymore. Hence, this is a good transition solution from IPv4 to IPv6, but methods that do not involve IPv4 must prevail in the future.

Protocol translation

The protocol translation approach refers to the process of translating the packets of one protocol version to the other one, and vice versa. There are several ways of implementing protocol translation:
• the usage of gateway protocols between IPv4 and IPv6;
• high-level translation via a proxy server;
• a transport-level translator processing the headers of incoming packets.

V. CONCLUSION

In this paper we have revised the basics of IPv6. The topic is highly topical nowadays due to the inevitable exhaustion of the existing standard, IPv4. We have explained the reasons why IPv6 should become the main network protocol in the future, and we have given a brief comparison of it and IPv4. We have paid significant attention to the interoperation issues: IPv6 is certainly a more flexible, architecturally sound and feature-rich version of IP, but IPv4 will not disappear for a long time. Specialists predict an era of coexistence of the two versions of the Internet Protocol. Nevertheless, we can certainly expect a variety of innovations related to the rise of IPv6 in the near future.

REFERENCES
[1] Benedikt Stockebrand. IPv6 in Practice. Springer, 2007.
[2] Youngsong Mun, Hyewon K. Lee. Understanding IPv6. Springer, 2005.
[3] http://isoc.org/wp/newsletter/?p=2902
[4] http://en.wikipedia.org/wiki/IPv6
[5] http://www.faqs.org/rfcs/rfc2460.html
[6] http://www.cisco.com/web/about/ac123/ac147/archived_issues/ipj_10-2/102_ipv6.html
[7] http://cert.inteco.es/extfrontinteco/img/File/intecocert/EstudiosInformes/cert_inf_security_implications_ipv6.pdf
[8] http://www.google.com/intl/en/ipv6/
[9] http://www.ietf.org/proceedings/73/slides/v6ops-4.pdf
[10] http://www.facebook.com/notes/facebook-engineering/world-ipv6-day-solving-the-ip-address-chicken-and-egg-challenge/484445583919
[11] http://www.ntt.com/ipv6_e/data/e_about_com.html


Context-aware systems

Jessica Laukkanen
Network Software Course
Åbo Akademi

Spring 2011

Abstract—Mobile computing is becoming more and more popular. People no longer use computers only at their desktops, but also in very different surroundings while on the move. This raises the need for varied information about the environment of the users (for example, their location). Systems that can take these environmental factors into account are called context-aware. In this paper we present an overview of context-aware systems, with examples from the tourism domain.

Index Terms—context-awareness, mobile computing, mobile tourism

I. INTRODUCTION

The tourism industry is an information-intensive industry, which makes it ideal for context-aware systems. Context-aware systems can help tourists find relevant information at the right time with little effort. Tourists are also a heterogeneous group, which makes tourism very suitable for context-awareness, as different tourists are interested in different things. However, many of the existing mobile tourism applications today only make use of one aspect of context: location.

In this paper, we present an overview of context-aware systems and their main design issues. First, in Section II, we provide definitions for context and for context-aware systems; then, in Section III, we discuss what kinds of context are important in tourism applications. In Sections IV-VI, we explain how contextual information is gathered, how context is modeled, and what issues arise when the contextual information is interpreted.

II. TERMINOLOGY

There is no clear definition for what context is. Dey [1] defines context as "any information that can be used to characterize the situation of an entity. An entity is a person, place, or object that is considered relevant to the interaction between a user and an application, including the user and applications themselves". Jrad, Aufaure & Hadjounil [2] define context as additional, often implicit, information which makes it possible to fully understand a situation, such as an interaction or communication. Context can include location, time, user aspects (such as interests, nationality, gender), surroundings (e.g. indoor/outdoor, weather), social aspects (e.g. whether there are other people around), activity, device and cognitive aspects (e.g. the user's state of mind) [3].

A context-aware system is a system that takes contextual information into consideration. Context-awareness was first introduced by Schilit and Theimer in 1994 [4]. Dey [1] has defined context-awareness in the following way: "A system is context-aware if it uses context to provide relevant information and/or services to the user, where relevancy depends on the user's tasks". A context-aware system should only use relevant contextual information; e.g., for an indoor guiding system the weather outside is not relevant, although it can be classified as contextual information [1]. This means that context-aware systems cannot only gather data, but also have to take into account that some context data is not needed and some data might be incomplete. Context data can be collected through various sources, for example sensors connected to the device, fetching data through networks, and receiving input from the user. Since devices are very mobile nowadays, the system also has to take into account that the context is continuously changing [5].


III. CONTEXT IN TOURISM APPLICATIONS

As mentioned earlier, the tourism industry is information-intensive. This means that tourists require information on a large range of services, for example accommodation, transport, and destination. A more thorough view of the information that a tourist might need is illustrated in Table 1.

Table 1: Tourism information services [6]
Category | Examples
Accommodation | Hotels, hostels, camping
Emergency, safety | Medical services
Gastronomy | Restaurants, pubs, clubs
Navigation | Maps, road conditions
News | Political news, business news
Practical information | Tourist information office, car rentals, cash points, currency exchange
Shopping | Gifts, souvenirs, clothes
Sports | Hiking/skiing/biking routes and tours, spectator sports
Tourist attractions | Museums, sights, churches, amusement parks
Transport | Flights, buses, trains, boats
Weather | Weather forecast, temperature

In the tourism domain, a tourist needs information in three different phases: pre-trip, on-trip or post-trip. These three phases naturally put different requirements on a context-aware system. When the tourist is planning the trip, i.e., in the pre-trip phase, the focus is on user preferences and interests. While the user is on a trip, the system should also take location and time into consideration. It is also important that the system can provide up-to-date information on, for example, changes in opening times, special offers and current happenings [7,8]. The post-trip phase is perhaps the least context dependent, as this usually consists of giving feedback or sharing vacation memories with others [7]. One of the first prototypes for context-aware applications within tourism is Cyberguide, which was developed in 1996 to support both indoor and outdoor tours [3]. Today there are many context-aware systems within tourism, but many of them are either for a quite restricted domain (a certain city or a certain area, or focused on only one service type, such as restaurants) or use only a few aspects of context (mostly either location or user profile, without combining these two) [6].

IV. GATHERING CONTEXTUAL INFORMATION

A context-aware system can gather contextual information in many ways. Indulska & Sutton [9] have divided data sources for contextual information into three different groups: physical sensors, virtual sensors and logical sensors. Physical sensors refer to hardware sensors, such as a GPS module in a mobile device. In Table 2 we show some further examples of physical sensors, such as cameras for detecting visual context, accelerometers for detecting motion and microphones for detecting audio. Virtual sensors refer to obtaining contextual data from other applications or by checking the user's activity, for instance in the form of mouse movements or keyboard input. An example of using virtual sensors is getting information about the user's location or activity through calendar events, e-mails or a travel booking system. Logical sensors combine data from multiple sources, for example combining data from physical or virtual sensors with additional information from databases [9, 10].

Table 2: Physical sensors for detecting context [11]
Type of context | Sensors
Light | Photodiodes, colour sensors, infrared sensors, UV sensors
Visual context | Cameras
Audio | Microphones
Motion, acceleration | Mercury switches, angular sensors, accelerometers, motion detectors, magnetic fields
Location | Outdoor: GPS, GSM; Indoor: Active Badge system
Touch | Touch sensors
Temperature | Thermometers
Physical attributes | Biosensors to measure skin resistance, blood pressure

On a more concrete level, contextual information can also be gathered from the user, in the form of filling in a user profile containing background information and interests. The user can also give feedback on services proposed by the system. The system can then store information about previous trips: which sights have been visited and how the user has rated them. Databases can store similar data from various users, in order to be able to give recommendations based on these experiences [3, 6].
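The three sensor groups can be sketched as follows (a toy illustration with invented names and data, not an API from the cited papers): a physical sensor supplies raw GPS coordinates, a virtual sensor reads another application, and a logical sensor fuses both with a database of landmarks into a higher-level context fact.

```python
def physical_gps():
    """Physical sensor: raw coordinates from a hardware GPS module."""
    return {"lat": 48.8738, "lon": 2.2950}

def virtual_calendar():
    """Virtual sensor: context read from another application (a calendar)."""
    return {"event": "lunch", "city": "Paris"}

def logical_location(gps, calendar, landmarks):
    """Logical sensor: fuse physical and virtual data with a database."""
    for name, (lat, lon) in landmarks.items():
        # Crude proximity test (about 1 km) against known landmarks:
        if abs(gps["lat"] - lat) < 0.01 and abs(gps["lon"] - lon) < 0.01:
            return {"place": name, "activity": calendar["event"]}
    return {"place": "unknown", "activity": calendar["event"]}

landmarks = {"Arc de Triomphe": (48.8738, 2.2950)}
context = logical_location(physical_gps(), virtual_calendar(), landmarks)
assert context == {"place": "Arc de Triomphe", "activity": "lunch"}
```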

V. CONTEXT MODELING

The concept of context is vague, and in order to be able to compare contextual data and turn it into a machine-processable form, context models are used. There are a number of different ways of modeling contexts, for example key-value models, markup scheme models, object-oriented models, graphical models, logic-based models and ontology-based models [12]. The models can be seen as creating a vocabulary for the context; for example, if we have a context type called temperature, it could consist of the contexts cold, normal and hot [10]. The model should also represent the relations and dependencies between entities. A partial example of context modeling can be seen in Figure 1, where the main types Service, Activity, Destination and Tourist Type are pictured. Service includes a bungee jumping service, which is offered at a national park, which is a tourist destination. A tourist classified as the type Thrillseeker might be interested in the bungee jumping service, if he/she is at a location where this service is offered [7]. It is apparent that a model like this quickly becomes very complex. It is important to keep the entities and relations easy to understand, so that errors do not occur due to misunderstandings. The model should also be flexible, so that new entities or relations can be added later on. Furthermore, the model should be generic, so that it can be applied to many different types of context, and expressive, so that contexts can be described at a detailed level.

VI. CONTEXT REASONING

The contextual data retrieved from sensors or other sources can be considered raw data. In order to use the retrieved contextual data, it is necessary for the system to do some reasoning about it. The system should map this data to the chosen context model, i.e. convert lower-level data to higher-level data. It is also necessary for the system to consider which data is relevant and how it can be used within the system. The system should also be able to predict, detect, and react to changes in context [13].

As an example of reasoning in a context-aware system, we consider a system that gets GPS coordinates from a GPS module in a mobile device. These coordinates are mapped to the location Arc de Triomphe in Paris. The system also gets a timestamp from the mobile device or the network, which is mapped to "lunch time". According to what the system knows about the user, e.g. what budget is suitable and which food preferences the user has, the system can then suggest restaurants in the vicinity of the user. Or, if the system has noticed that the user had an alarm set at a late time, it can assume that the user is not hungry yet, and instead suggest points of interest based on the user's interests.

We point out the complexity behind this type of reasoning with the meaning of the expression "in the vicinity". The meaning of this phrase might be influenced by factors such as:
- the transportation available,
- the user's physical activity,
- the weather,
- the user's familiarity with the region,
- the terrain,
- the activity or goal [14].
For a pedestrian tourist located in the middle of a large city and looking for a restaurant, "in the vicinity" might mean only a few hundred meters; for a nature tourist interested in rare birds, it might mean hundreds of kilometres.

and looking for a restaurant, "in the vicinity" might mean only a few hundred meters; for a nature tourist interested in rare birds, "in the vicinity" might mean hundreds of kilometres.

Figure 1: Part of a context model [7]

VII. OTHER DESIGN ISSUES

There are also other design issues to take into consideration when creating a context-aware system. Although the capacities of mobile devices are improving all the time, there are still some limitations compared to desktop computers: a smaller screen and limited bandwidth, energy, processing, and storage capacities. In the case of tourism, the way the system connects to a network can also be an important factor:
- connecting via cellular data can incur high costs, especially abroad
- connecting via wireless local networks can possibly be free of charge, but has a limited range if WLAN stations are not widely available
- Bluetooth can be used for free, but has a very limited range

The system should therefore also consider whether it is possible to use the system without a continuous connection to a network [15]. Contextual information often includes sensitive information about the user, such as his/her location and activity. Therefore it is also important to consider security in context-aware systems and how the user's privacy and integrity can be protected [10].

VIII. CONCLUSION

In this paper we have briefly overviewed context-aware systems and some of the issues that have to be taken into consideration when developing them. Context-aware systems can lead to a more enjoyable tourist experience, since these systems can help tourists to find information easily. Context is, however, a very complex matter; contextual information is not always easy to obtain and it is not easy to employ this information in an efficient way. To create a context-aware system within the tourism domain, it is important not only to understand the technology that lies behind it, but also to know which factors influence tourists and their decision-making.

REFERENCES

[1] Dey, Anind K. 2001. "Understanding and Using Context". Personal and Ubiquitous Computing. 5(1), pp. 4-7.
[2] Jrad, Zeina, Marie-Aude Aufaure & Myriam Hadjouni. 2007. "A Contextual User Model for Web Personalization". WISE 2007 Workshops, LNCS 4832, pp. 350-361.
[3] Abowd, Gregory D., Anind K. Dey, Robert Orr & Jason Brotherton. 1997. "Context-Awareness in Wearable and Ubiquitous Computing". Proceedings of the 1st IEEE International Symposium on Wearable Computers (ISWC '97), pp. 179-
[4] Schilit, Bill & Marvin Theimer. 1994. "Disseminating Active Map Information to Mobile Hosts". IEEE Network, 8(5), pp. 22-32.
[5] Williot, Romain & Dan Grigoras. 2008. "Context Modeling for Urban Mobile Applications". Proceedings of the 2008 International Symposium on Parallel and Distributed Computing, pp. 359-266.
[6] Grün, Christoph et al. 2008. "Assisting Tourists on the Move - An Evaluation of Mobile Tourist Guides". Proceedings of the 7th International Conference on Mobile Business, pp. 17-180.
[7] Ihlström Eriksson, Carina & Maria Åkesson. 2007. "Multi-users and Multi-contextuality - A Mobile Tourism Setting". Proceedings of HCII 2007: Human-Computer Interaction. 4, pp. 48-57.
[8] Barta, Robert et al. 2009. "Covering the Semantic Space of Tourism - An Approach based on Modularized Ontologies". Proceedings of the 1st Workshop on Context Information and Ontologies, pp. 1-8.
[9] Indulska, Jadwiga & Peter Sutton. 2003. "Location Management in Pervasive Systems". Proceedings of the Conferences in Research and Practice in Information Technology series. 21, pp. 143-152.
[10] Baldauf, Matthias, Schahram Dustdar & Florian Rosenberg. 2007. "A Survey on Context-Aware Systems". Ad Hoc and Ubiquitous Computing. 2(4), pp. 263-277.
[11] Schmidt, Albrecht & Kristof Van Laerhoven. 2001. "How to Build Smart Appliances?". IEEE Personal Communications. 8(4), pp. 66-71.
[12] Strang, Thomas & Claudia Linnhoff-Popien. 2004. "A Context Modeling Survey". Workshop on Advanced Context Modeling, Reasoning and Management, UbiComp 2004, pp. 33-40.
[13] Nurmi, Petteri & Patrik Floréen. 2004. Reasoning in Context-Aware Systems. Helsinki Institute for Information Technology. Position paper.
[14] Zipf, Alexander. 2002. "Adaptive context-aware mobility support for tourists". IEEE Intelligent Systems 17(6), pp. 57-59.
[15] Kenteris, Michael, Damianos Gavalas & Daphne Economou. 2009. "An innovative mobile electronic tourist guide application". Personal and Ubiquitous Computing. 13(2), pp. 103-118.

NETWORK SOFTWARE COURSE 2011 LATEX

Load Balancing
Network Software Course
Thomas Forss

Abstract—Load balancing is used as a way to direct load among servers. The load is directed according to a predefined algorithm. There is no universal algorithm for load balancing, and as such we need to decide what scenario we are going to balance before we can decide which algorithm to use. In this paper we take a look at load balancing in general and at three different load balancing algorithms: a local load balancing algorithm, a peer-to-peer load balancing algorithm, and an algorithm for distributed web servers.

Index Terms—Computer Science, Load Balancing, Domain Name System, Distributed Hash Table

1 INTRODUCTION

There are several different ways to manage load balancing, as well as different kinds of load balancing algorithms. The purpose of this paper is to familiarize the reader with the concept of load balancing and how a few different implementations work.

There are web applications of different sizes. Some applications are used by so few people, and experience so little traffic, that hosting the application on one server is enough. Other applications, for example Facebook, have so many users that they have a whole network of servers in different countries. The problem with having many servers is how to distribute the load when users connect to the server. That is why load balancing is needed. Firstly we will define relevant topics in networking and load balancing. After the definitions we will analyze when the different load balancing algorithms are used.

In chapter 2 we will take a look at some concepts relevant to load balancing. In chapter 3 we will look at three different approaches to load balancing. In chapter 4 we continue looking at the different approaches by examining three different examples. In chapter 5 we compare the different approaches and try to draw a conclusion.

2 DEFINITIONS

Here we will go into some concepts relevant to load balancing. These definitions will help the reader understand the different approaches to load balancing later in the paper.

2.1 Distributed Hash Table

A distributed hash table (DHT) manages data by distributing it across a number of computers and implementing a routing scheme which allows efficient lookup of the nodes on which a specific data item is located. Each node in a DHT is responsible for a subrange of the total range of items, Steinmetz [6] concludes. A normal hash table consists of a table of keys that have assigned values; a DHT has a similar structure. Each pair in a table consists of a key and a value, and any node in the network can retrieve information about other nodes.

Routing is a core functionality of a DHT. Based on a routing procedure, messages with their destination IDs are delivered to the DHT node which manages the destination ID. The fundamental principle behind a DHT is to provide each node a limited view of the whole system by storing on it a bounded number of links to other nodes. When a node receives a message headed for a destination that it is not responsible for itself, it forwards the message to the relevant node, Steinmetz [6] continues.

NETWORK SOFTWARE COURSE 2011 LATEX
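The partitioning idea described above, where each node is responsible for a subrange of the total identifier space, can be sketched as follows. This is an illustrative toy, not the routing scheme of any particular DHT implementation, and the node names are invented:

```python
import hashlib

ID_SPACE = 2 ** 16  # size of the toy identifier space


def hash_id(name: str) -> int:
    """Map a key or node name into the identifier space."""
    return int(hashlib.sha1(name.encode()).hexdigest(), 16) % ID_SPACE


class ToyDHT:
    """Each node is responsible for the subrange of IDs up to its own ID."""

    def __init__(self, node_names):
        # Sort nodes by their position in the identifier space (ring order).
        self.ring = sorted((hash_id(n), n) for n in node_names)

    def responsible_node(self, key: str) -> str:
        kid = hash_id(key)
        for node_id, name in self.ring:
            if kid <= node_id:
                return name
        return self.ring[0][1]  # wrap around: smallest node ID takes over


dht = ToyDHT(["node-a", "node-b", "node-c"])
print(dht.responsible_node("item-42"))  # always the same node for this key
```

A real DHT additionally gives each node only a bounded number of links to other nodes and forwards lookups hop by hop; here the whole ring is visible for simplicity.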

2.2 Domain Name System

The Domain Name System (DNS) is basically a database of host information. DNS is indexed by domain names. Each domain name is essentially a path in a large inverted tree, called the domain namespace, writes Liu et al [5]. The DNS tree has a root at the top and can branch any number of ways at each node. Domain names are always read from the node toward the root, with dots separating the names in the path, according to Liu et al [5]. A domain is a subtree of the domain namespace. The domain name is the same as the domain name of the node at the very top of the domain. The data associated with domain names is contained in resource records, also known as RRs. Figure 1 shows an example of how a Domain Name System can be set up. A domain name system tree has several branches. Each branch has its own name that is added as a suffix at the end of each child node. For example, a branch with the name org will add the suffix .org to each child node in that branch.

Fig. 1. Example of a Domain Name System tree. Source: http://bio3d.colorado.edu/tor/sadocs/dns/dns.html

3 LOAD BALANCING

Website administrators, as well as administrators of other load varying applications, face a constant need to improve server capacity. One approach is to replicate information across a mirrored-server architecture, Cardellini et al [4] conclude. Mirroring servers allows users to change websites manually when they notice that a site is working slowly; however, this is not a satisfactory solution. A better solution is to transparently route traffic among several servers that all answer to the same URL.

According to Bourke [1] the purpose of a load balancer is to intercept traffic destined for a site and redirect that traffic to the appropriate server. The redirection is done according to the algorithm that has been chosen for the load balancer. The load balancing is completely transparent to the end user, and there are often up to hundreds of servers operating behind a single URL [1].

3.1 Local algorithm

There are many ways to do local load balancing. According to Gehrke et al [2], any algorithm that probes the servers in question can be said to be a local algorithm. A natural local load balancing algorithm is to periodically have each server communicate with the other servers in the same system. With this algorithm we need to form a ring network so that each server can communicate its data forward. The servers with higher load can then send some of their load to neighboring servers in the system, Gehrke et al [2] conclude. A static load balancing problem like this assumes that each processor has some initial measurement to balance and that the load stays the same during the balancing period, write Gehrke et al [2]. According to Gehrke et al [2], the load is measured in tokens of CPU utilization and memory usage, and it is sent clockwise to other servers in the server ring if their load is lower than that of the server that sends the request.

3.2 Peer-to-peer algorithm

According to Karger et al [3], a core problem in peer-to-peer systems is the distribution of items to be stored, or computations to be carried out, by the nodes that make up the system. A computation can be seen as any process, algorithm or measurement that needs to be processed by a computer. The standard approach to distributing computations over peer-to-peer systems is using a distributed hash table, also known as a DHT [3]. As mentioned in the definitions, each pair in a table consists of a key and a value, and any node in the network can retrieve information about other nodes.

An important issue in DHTs is load balancing. According to Karger et al [3], all DHTs make efforts to balance load. The load balancing can be done by randomizing the addresses associated with each item or by making each DHT node responsible for a portion of the total DHT space. A problem with randomizing addresses is that some nodes can and will end up having a larger portion of the addresses and thus receiving more load [3]. We can make each node responsible for a portion of the total DHT space by using virtual nodes, so that each physical node's load is the sum of the loads of the virtual nodes that it handles [3].

3.3 Distributed Web-server system

A distributed web-server system is any architecture consisting of multiple web-server hosts with a mechanism to spread incoming client requests among the servers, write Cardellini et al [4]. Each server in the system can respond to any client request. Information can be distributed among server nodes in two ways: content tree replication on a local disk, or a distributed file system [4].

A DNS-based approach uses a DNS server to translate the URL of the site to the IP address of one of the servers in the DNS cluster. This way many policies can be implemented, since the approach allows the DNS server to select which server to use and can spread the load among servers [4]. The DNS also specifies a validity period for the IP address, which will have to be reassigned when the period runs out.

4 EXAMPLES

In this chapter I will show examples of the different load balancing approaches that I described in chapter 3. There are of course an unlimited amount of different systems, but I have chosen an example for each approach that will give a good understanding of that approach.

4.1 Local example

As mentioned in chapter 3, the local algorithm that we take a close look at is a ring system. Say we put a limit of 70 % of CPU utilization as the line where a computer has to decrease its amount of computations. In figure 2 we can see that if we measure the CPU utilization of each of the servers in the ring system, we can decide when a system needs to reduce its load. In the figure we can see that computer 1 has CPU utilization at 90 % and thus sends some of the computations over to computer 2, as computer 2 is the following server according to the local algorithm described in chapter 3 and has a load below 70 %. In this example none of the other servers have to reduce their load.

Fig. 2. Example of local load balancing

4.2 Peer-to-peer example

According to Karger et al [7], the main problem with DHTs is that they do not offer as good load balancing as standard hash tables. This is due to standard hash tables evenly partitioning the space of hash-function values, while DHTs do not. To cope with that problem most DHTs use virtual nodes, as was mentioned in chapter 3. That means that each machine is seen as more than one machine, as shown in figure 3. In this example computer 1 and computer 2 are of the same type, which means they get an equal amount of virtual nodes. Computer 3 is slightly faster and thus gets a third virtual node. The DHT table is then divided among the virtual nodes. This example is simplified to make it easier to understand.

Fig. 3. Example of peer-to-peer system

4.3 Distributed Web-server system example

As an example of load balancing in a distributed web-server system, I have chosen to describe a system that uses a DNS, as mentioned in chapter 3. As can be seen in figure 4, the first thing that happens is that a request to the URL www.reddit.com is sent. To be able to change the URL into an IP address we use DNS. The DNS looks up the appropriate node for the URL and redirects the user to one of the servers under that node. There can be several servers under one of the nodes. Which server is accessed can be determined in different ways according to Cardellini et al [4]; in this example the server accessed after the DNS has done its work is chosen by round robin, which means that requests are given in turn to one of the four reddit servers in the figure.

Fig. 4. Example of distributed web system

5 CONCLUSION

There are many situations where load balancing is used to balance traffic between different nodes and/or computers. We have looked at three different approaches to load balancing that fit different scenarios and use cases. A local load balancing algorithm can use a ring of nodes that share load in a clockwise order. A peer-to-peer algorithm can use a distributed hash table as a basis for load balancing. A web-server system can use a network of nodes and a DNS server to distribute the traffic among the available servers.

The immediate difference that is noticed between these approaches is that the local algorithm and the peer-to-peer algorithm do not need any kind of centralized load balancing; instead the servers or nodes communicate with each other in some other way. The DNS-based approach uses centralized servers to route traffic.

When talking about differences, each of the approaches is meant for different situations. The local algorithm is meant for use with a limited amount of known servers, and the load balancing is mainly for utilization of resources. The DHT algorithms that are used in peer-to-peer systems are made to allow adding and removing hosts, which means that they are meant for load balancing between a huge amount of computers. The DNS-based approach can be used on a large scale, just as the DHT approach, but is more static. That means that while DNS is more secure and widely used, it will not be able to add and remove computers from its structure as fast as an algorithm based on a DHT.
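One clockwise balancing round of such a ring can be sketched as follows. The 70 % threshold and the token step size are illustrative choices, not values taken from Gehrke et al:

```python
def balance_ring(loads, threshold=70, step=10):
    """One clockwise balancing round over a ring of servers.

    `loads` holds each server's load in tokens (e.g. of CPU utilization).
    A server above `threshold` hands `step` tokens to its clockwise
    neighbour if that neighbour is less loaded. Illustrative only.
    """
    n = len(loads)
    new = list(loads)
    for i in range(n):
        right = (i + 1) % n  # clockwise neighbour in the ring
        if new[i] > threshold and new[right] < new[i]:
            new[i] -= step
            new[right] += step
    return new


print(balance_ring([90, 40, 60, 55]))  # → [80, 50, 60, 55]
```

Note that the total number of tokens is conserved; repeated rounds gradually even out the load around the ring.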
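The virtual-node idea, where a faster machine is placed on the ring several times and therefore owns a larger share of the table, can be sketched as follows. The machine names and the weight of the faster machine are invented for illustration:

```python
import hashlib


def hash_id(name: str) -> int:
    """Map a name into a small identifier space."""
    return int(hashlib.sha1(name.encode()).hexdigest(), 16) % 2 ** 16


# Faster machines get more virtual nodes and thus a larger share of the table.
machines = {"computer-1": 2, "computer-2": 2, "computer-3": 3}

ring = sorted(
    (hash_id(f"{m}#{i}"), m)          # one ring position per virtual node
    for m, n_virtual in machines.items()
    for i in range(n_virtual)
)


def responsible_machine(key: str) -> str:
    kid = hash_id(key)
    for pos, machine in ring:
        if kid <= pos:
            return machine
    return ring[0][1]                  # wrap around the ring


print(responsible_machine("some-key"))
```

Removing a machine only removes its own virtual nodes from the ring, so the remaining assignments are largely undisturbed; this is why the approach suits systems where hosts join and leave frequently.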
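Round-robin selection hands out the server addresses in rotating order, which can be sketched in a few lines. The hostname and addresses below are invented placeholders, not reddit's actual records:

```python
from itertools import cycle

# Hypothetical pool of A records returned for one name (addresses invented).
pool = cycle(["192.0.2.1", "192.0.2.2", "192.0.2.3", "192.0.2.4"])


def resolve_round_robin() -> str:
    """Return the next server address in rotating (round-robin) order."""
    return next(pool)


answers = [resolve_round_robin() for _ in range(5)]
print(answers)
# → ['192.0.2.1', '192.0.2.2', '192.0.2.3', '192.0.2.4', '192.0.2.1']
```

In actual DNS deployments the rotation is typically done by the name server reordering the record set it returns, and each answer is cached by clients for the record's validity period.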

ACKNOWLEDGMENTS

The author would like to thank the participants of the course and the lecturers Luigia and Petter for an interesting and innovative course.

6 BIBLIOGRAPHY

[1] Tony Bourke (2001). Server Load Balancing.
[2] Johannes E. Gehrke, C. Greg Plaxton, Rajmohan Rajaraman. Rapid Convergence of a Local Load Balancing Algorithm for Asynchronous Rings.
[3] David R. Karger, Matthias Ruhl. Simple Efficient Load-Balancing Algorithms for Peer-to-Peer Systems.
[4] Valeria Cardellini, Michele Colajanni, Philip S. Yu. Dynamic Load Balancing on Web-Server Systems.
[5] Cricket Liu, Paul Albitz. DNS and BIND.
[6] Ralf Steinmetz, Klaus Wehrle. Peer-to-Peer Systems and Applications.
[7] David R. Karger, Matthias Ruhl. New Algorithms for Load Balancing in Peer-to-Peer Systems. MIT Laboratory for Computer Science.

Virtual Worlds

Andres Ledesma

Abstract—The interaction among people is evolving constantly. Using networks, individuals are able to interact almost continuously and everywhere. On top of the networks, the virtual worlds provide new grounds where users can communicate and interact in various ways. It is not surprising to say that one can truly have a second life by interacting with other persons and exhibiting particular behaviors that would normally not be seen in real life. In this paper, we review the most important characteristics of the virtual worlds together with their reliance on the computer networks. We will also propose some definitions to enable a better understanding of the conforming concepts of Virtual Worlds.

Index Terms—Virtual World, Persistency, Avatar, Social Networking, Computer Networking, Virtual Place.

I. INTRODUCTION

THE Internet has brought a wide range of possibilities for people to interact. Besides the social networks, email, instant messaging, and online gaming, the virtual worlds (VWs) provide a unique form of experience. In this paper, we review some existing definitions and explain the main features of virtual worlds.

Before we examine the concept of virtual world, we first present some of its building blocks, which will help us to understand this concept. Richard A. Bartle is a well known authority in this matter. He was the co-writer of one of the first known VWs, called MUD ("Multi-User Dungeon"), in 1978, and has been publishing books about VWs ever since. Before we understand the definition of VW, we take a closer look at the term virtual. Bartle defines [1] reality as "that which is" and imaginary as "that which isn't". Between these two, there is a bridge that he identifies as virtual, defined as "that which isn't, having the form or effect of that which is". Then, according to him, VWs are "places where the imaginary meets the real" [1].

Bartle [1] considers that a VW is "implemented by computers (or network of computers) that simulate an environment". He states that inside this environment there are entities, some of which are controlled by people. This author considers that "the environment continues to exist and develop internally (at least to some degree) even when there are no people interacting with it; this means it is persistent". He points out that "because several such people can affect the same environment simultaneously, the world is said to be shared or multi-user" [1]. We can think of VWs as common grounds where people interact not only between them but also with the environment; we explore the ideas underlying VWs in detail in the next sections of this article. There is no standard definition of the concept, and the ideas sometimes differ between authors. We also explore a historical background of VWs, where no electronic devices are employed by the participants.

Yet there is another important attempt to define the term VW. Bell [2] combines the definitions of several authors and proposes his own definition. According to him, a VW is "a synchronous, persistent network of people, represented as avatars, facilitated by networked computers", and he discusses the terms used in his definition [2]. We examine these characteristics shortly.

We consider both definitions objective because they present a general idea of the concept. However, both authors consider that a VW needs computers or networks of computers to exist. We can argue against this idea, since there are numerous examples of VWs that do not use any electric equipment. As a short example, we can think of a role playing game where each participant represents an avatar and the manager of the game is constantly narrating the adventures and changes that affect the imaginary world. These games are usually based on imagination, and possibly paper and pencil, but still they are VWs. From this perspective, we propose the definition of a VW as an imaginary space that exists and evolves independently of the presence of human participation and contains elements that interact with each other, some of which are controlled by humans. This is an attempt to exclude the dependency on computers and to make the definition more general. In the following, we present another example of an early VW in modern Japanese societies as a historical background.

Historical Background - Ikegami and Hut [3] argue that "the way in which Japanese art circles—fluid, casual, yet vibrant networks of non-political activities—effectively formed a 'virtual world'". These circles took place during the last few centuries in Tokugawa Japan. These authors state that these activities helped renew the Japanese society and politics. They point out that "the human search for experimenting with alternative identities in a kind of 'second life' is not new at all". This is true, and we can read in their essay about the Japanese society during the Meiji period. In those times, people had different social roles, but inside the art circles, none of the social roles mattered anymore. All the participants were equal partners sharing their interests towards the different expressions of art [3].

Bringing Communication to a New Level - Goffman [3] states that VWs (understood as supported by computer networks) provide a stronger sense of "co-presence" than any other form of electronic communication. If we think of a video conference or any other type of electronic communication, we see that, apart from exchanging audiovisual elements, we do not interact with the other persons directly. We cannot participate in an activity that involves physical contact with the other persons. VWs bring us one step closer to this type of interaction, because with our avatars we can "explore the totality of their joint spaces", as Ikegami and Hut mention [3]. Even if we cannot have physical contact with the other person, there is an "intimate sense of co-presence and being-togetherness" that we cannot find in any other kind of electronic communication [3].

We proceed as follows. In Section II we look at the characteristics of VWs and in Section III we examine the computer supported VWs along with the main technical implications. In Section IV we take Second Life as a case study to illustrate the computer-supported VWs, and we discuss some opportunities and peculiarities of this software. We conclude in Section V.

II. CHARACTERISTICS

In this section we aim to explain the most important aspects regarding the idea behind VWs. We focus on the features that make VWs so different from other types of networking software.

A. Space and Physics

Bell mentions that VWs give an "awareness of space" where the participants and other elements are situated [2]. He points out that the "virtual worlds also offer an awareness of space, distance, and co-existence of other participants found in real life spaces giving a sense of environment". According to him, the idea of "'near' and 'far'" is hard to conceive in other ways of electronic communication, such as websites [2]. Bartle mentions that the "physics" in a VW defines the "automated rules" of how an environment is changed by its participants [1]. The "awareness of space" and the "physics" in a VW are key factors that give a sense of reality to the participants. For example, in LegendMUD [1] the participant is situated in a space that is regarded to be near or distant from other particular spaces. The principles of physics apply in the interaction with some items that might be unreachable, heavy, or fixed.

B. Avatars: The Other Me

Bell [2] defines the term avatar as a "digital representation (graphical or textual), beyond a simple label or name, that has agency (an ability to perform actions) and is controlled by a human agent in real time." Therefore, according to him, a simple sheet of paper that contains a description of a character in a role playing game is an avatar. Bell says that avatars are "controlled by a human agent in real time". He attributes to an avatar the faculty of performing actions when its master commands it. This is why he mentions that a profile on a social network website cannot perform an action. He describes the avatars as "user-controlled puppets" and he points out that the timing in which a human commands an avatar is in "real time" [2]. This has many implications, especially regarding the performance of the network in which the VW software operates. We discuss some technical issues shortly.

Based on the literature we can therefore understand the term avatar as follows. We consider an avatar to be an entity that is controlled by a human. The human is the master of the avatar, which performs actions commanded by its master. An avatar is formed by a description suitable for the virtual world in which it exists.

C. Persistency

Persistency is a wide term and an essential characteristic of VWs. Bell [2] focuses on the ongoing interaction "with or without a participant's presence". He says that "A Virtual World cannot be paused". The participant is not the "center of the world", but another element of a "dynamic community" [2]. Bartle focuses on another aspect of persistency. He refers to it as "the amount of a virtual world's state that would be retained intact were the whole system to be shut down and restarted". He mentions that VWs implement different "degrees of persistency". There are VWs that only retain the initial state and the record of users, but there are also some that save "the entire world state" [4].

We consider that when a VW has a high "degree of persistency", it "feels" more real. In order to keep a VW constantly evolving, it is important to preserve the consequences of the participants' actions. Imagine how constructions, wars, parties, exchanges of items, and many more actions can affect the state of a VW.

III. COMPUTER SUPPORTED VIRTUAL WORLDS

A. Computers as Facilitating Agents

When we participate in a role-playing game without any computers, we need to consider many aspects. We need to keep sheets with the description of the characters (or avatars), the items that we carry, the setting conditions and so on. The person in charge of the adventure has the most demanding role because of the amount of information that needs to be stored and tracked. This information is essential, as it gives the participants a good experience in the adventures that take place in the imaginary world. Bell [2] explains the great improvement that networked computers bring to these games. We consider that this also applies to the VWs. Computers give many contributions to the experience of a participant in a VW. For instance, we consider that the administrative tasks are mostly done automatically, although they might require some maintenance [2]. Besides this automation, the computers bring a completely new set of multimedia experiences to the participants, for instance music streaming, voice, video transmission, and 3D animations. These are some of the advantages that networked computers bring to the VWs. However, there are technical implications attached to these contributions.

B. Networking Technical Aspects and Concepts

There are many technical issues behind computer supported VWs. In the following we discuss the main aspects that involve design, networking communication, and architecture.

Virtual Places - Inside a VW, there are locations that constitute a domain where users can transport their avatars and share a common channel of communication with other users. We propose the concept of virtual place as a way to refer to the many domains inside a VW. This is similar to a chat room where a person exchanges messages with a group of individuals with similar interests. In a VW, these chat rooms are "physical" places (virtual places) with some distinctive elements, where the users might find like-minded people represented by their avatars. The main difference is that in a VW we have audiovisual elements that give distinctive characteristics to a particular place. A theme is mostly represented in these virtual places as a common and constant reminder of the aspects that like-minded users seek in a VW. For example, if we like 80's disco music, we can transport our avatars to a virtual place where we might find other avatars dressed in fancy disco outfits; we might also listen to disco music from the 80's, watch the spinning disco ball hanging from the rooftop of the dance floor, and find many other elements that will make us feel in a discotheque of the 80's in such a virtual place. The variety of these places depends most likely on the size and interests of the community behind a given VW.

Exchange of Messages - The communication inside a VW can happen in many ways. The most common way is instant shared messaging (like in a chat room). In this way, if we want our avatar to say something, then all the other avatars are able to read it. If we send a private message, then only the respective receiver(s) can read it. If we think in terms of networking communication, a private message is a unicast or a multicast to a selected group of receivers. According to the CISCO CCNP Glossary, a unicast is a "message sent to a single network destination" and a multicast is defined as "single packets copied by the network and sent to a specific subset of network addresses" [5]. We observe that, if our avatar says something in a place, then the VW server is in charge of delivering our message as a multicast to the users that are in the same place as our avatar. There are other possible ways of communication. For instance, if we experience network problems and the VW server cannot reach our client computer, then private messages are sent to our mailbox. It might be the case that the server will store the messages until we are back online; this depends on the design of the architecture of the server.

Communication and Latency - Virtual worlds are mostly based on synchronous communication. The CISCO CCNP Glossary defines synchronization as the "establishment of common timing between sender and receiver" [5]. We can understand that the actions and events that happen in a place are constantly seen by the users, and thus there is a "common timing" between the server and the clients. In the same way we can see how our avatars perform the commands that we issue. This gives us the impression that we interact with the virtual world fast enough to feel ourselves a part of it. However, if the server cannot deliver a message directly to the receiver, then the communication becomes asynchronous. This is because the server will either send the message to an email address, or store it until the receiver is online.

Another important element that affects the synchronous communication is the latency. The CISCO CCNP Glossary defines latency as the "delay between the time when a device receives a frame and the time that frame is forwarded out to the destination port" [5]. We know that IP packets are carried in frames, and thus the information between the server and the client travels via IP packets through the Internet. As IP packets travel through several routers in order to reach the destination, the latency accumulates. We can see that this is a problem that lowers the quality of the user experience. If we see the actions that happen in the virtual world with a large delay, or if we cannot operate our avatar fast enough, then we get the feeling that we no longer interact with enough speed with the virtual world and thus we no longer have this sense of "reality". There is another way of communication in virtual worlds that uses broadcast. The CISCO CCNP Glossary defines a broadcast as a "data packet that will be sent to all nodes on a network" [5]. This is useful when the maintainers of the virtual world want to send a message to all the users about upcoming events, problems or fixes.

A Word about the Architecture - We can determine that virtual worlds are based on client-server computing. The CISCO CCNP Glossary defines client-server computing as a "term used to describe distributed computing (processing) network systems in which transaction responsibilities are divided into two parts: client (front end) and server (back end). Both terms (client and server) can be applied to software programs or actual computing devices" [5]. In this case the clients are personal computers with client software, and the server is the back end computer that runs an application that is in charge of managing the communication between the several clients and the VW. Depending on the number of users connected, a server needs to manage a considerable amount of connections. For this reason, virtual world servers need to have an efficient architecture to manage several transactions at a database and networking level. Furthermore, there are audiovisual elements

(and in some cases 3D graphics) that the servers constantly under a server schema of at least 6 clusters, and from there it transmit to the clients. This can be more problematic when the can escalate as much as needed. The following illustration streaming of audiovisual elements is critical. This is the case brings a clearer picture of the inner implementation of the when some DJs or artists perform through their avatars in server architecture of a VW. virtual world and the servers need to multicast the stream of information to all the clients that are connected. This gives us the idea of how complex and critical the architectures of virtual worlds are. In the next section we try to approach to this architecture and explain it in detail.

C. Architecture The Architecture may vary depending on the VW, but in general there are some common guidelines and issues that we discuss in the following sections.

Overall Architecture - The architecture differs depending on the virtual world, but according to Bartle [6], the general idea is to have some scheme as described in the following illustration.

Fig. 2. This is a more detailed view of a VW site presented by Bartle. This figure shows how there are two separate databases and how the connection between the servers and the clients is designed.

In an ideal environment [6] each sub-server should handle the same amount of connections. These sub-servers are the instances of the VWs and they are connected to at least two database servers. Generally these databases are separated for maintainability purposes. One database stores the data of the characters and the other the environmental data of the worlds. The connectivity between the sub-servers varies according to the design decisions of the VW. For instance in MMORPG (Massive Multiplayer Online Role Playing Game) EverQuest , the zoning system allows the characters to move from one zone to another. In this case each sub-server has a connection Fig. 1. This is an overview of the Architecture presented by Bartle that shows to an exclusive environment database, but they all share the the architecture of a VW site at a general level. same characters' database. Basically, the way in which the servers communicate depends on the design of the VW. The center of this schema is the User Database. This database needs to run on very high performance hardware in Synchronization - According to Bartle [6] there are two order to provide quick response queries to the rest of the sources of delays that affect the synchronization between the servers. The main servers connecting to the database and clients and the servers. The first on is the time it takes for the handling requests from the users are the Front-End and the information to travel. There is very little we can do here main purpose server. The Front-End server will handle the because no matter how small or compressed the information requests from HTTP users mainly, querying for information, might be, it always depends on the routers and the traffic in a news, profiles, articles, and other web content. These servers given network. The other source refers to the time that a given handle the requests coming from several web browsers. The instruction takes executes in a server. 
We can minimize this main purpose servers are in charge of creating an time by using powerful hardware and optimizing the execution "incarnation" of the VW itself. They run instances of the VW code. The most common approach to overcome this lack of in what Bartle refers to as "shards". These servers need synchronization is to retain the image of what is happing in the information from several data sources. More about this VW in the clients screen. Bartle mentions as a hypothetical internal architecture is discussed on the next chapter. example that even if the sun explodes, we will not realize that until 8 minutes. So keeping the current vision of the state of a Server Architecture - According to Bartle [6] the VW runs VW in the clients screen doesn’t seem to be a bad work typically on server clusters. The number of clusters pretty around to these problems. However there are other proposed much depends on the number of users. Normally a VW runs solutions that attempt to predict the next movement of the 5 characters and objects when the communication is not as fast being totally explored and there are many implications that fall as needed for the server to perform positioning calculations. outside the scope of this paper. This approach has the problem of warping, which consists on having the illusion that an object is in one position and after a VWs provide new ways in which people can interact. But couple of seconds we see it changing to another without any there are also other issues detached from these implications. smooth transition between them. There is a possibility to For instance, we can think of the legal and moral implications reduce the dependency con the synchronization of the of having a parallel virtual life. This gives us ample room for communication. 
For instance we can have a VW in which the further research in many different disciplines like sociology, actions happen after a command is issued, without any psychology and even other areas of knowledge like marketing, necessity of reflex response. As an example Bartle mentions administration and economics. The multiple ways in which the case of an archer firing an arrow to another player, and as social experiments can be carried out in a VW remains a hotly a result the player gets hit no matter if the character is moving. debated topic among the academics, but it clearly falls apart This means that we cannot rely on reflex based actions due to from the scope of this paper. No doubt the many applications the fact that the information won’t probably reach its of VWs have opened a new door for scientific observation and destination within the required speed. technological challenges. We hope that this paper has presented to the reader a basic knowledge behind the idea of VWs and that the reader is more prepared to enter the new IV. CONCLUSION arenas of digital and social life that VWs offer. The purpose of a VW is to connect people in various ways. The elements that enrich the communication between the users REFERENCES make this type of network software unique. It gives us a [1] R. A. Bartle, Designing Virtual Worlds . Indianapolis, Indiana: Pearson possibility to experience a virtual parallel life that can Education, 2003. Chapter 1: Introduction to Virtual Worlds, pp. 1-10. [2] M. W. Bell, Towards a Definition of "Virtual Worlds" . Journal of approximate the real one in complexity. Just as in the real Virtual Worlds Research. "Virtual Worlds Research: Past, Present & world, one can play board games, football, or have a chat in a Future", vol. 1 no. 1, Jul. 2008. Available: café. This parallelism is what we consider to be the key point http://journals.tdl.org/jvwr/article/view/283/237 . Visited 12.03.2011. [3] E. Ikegami, P. 
Hut, Avatars Are For Real: Virtual Communities and of VWs, giving the users a richer experience of the Public Spheres . Journal of Virtual Worlds Research. "Virtual Worlds interactions. Research: Past, Present & Future", vol. 1 no. 1, pp. 2-8, Jul. 2008. Available: http://journals.tdl.org/jvwr/article/view/288/242 . Visited 22.02.2011. We have seen that VWs can function even without [4] R. A. Bartle, Designing Virtual Worlds . Indianapolis, Indiana: Pearson computer networks, because they have existed long time Education, 2003. Chapter 1: Introduction to Virtual Worlds, pp. 53-61. before computers appeared. As an example, we have [5] CISCO Network Academy Program, CISCO CCNP Glossary. Cisco Systems 2006-2009. Available: mentioned the modern Japanese society that used the idea of http://www.cisco.com/web/learning/netacad/index.html (requires login VWs in the form of art circles. We have also considered role with a CISCO account). Visited 21.2.2011. playing games to be VWs that do not require computers. [6] R. A. Bartle, Designing Virtual Worlds . Indianapolis, Indiana: Pearson Education, 2003. Chapter 2: How to make Virtual Worlds, pp. 94-114. However, it is with the advent of networked computers that [7] Britannica Encyclopedia, “Second Life” Article Entry . Academic VWs have got a greater reach and a higher degree of realism. Edition, Britannica online. Available: In fact, computer supported VWs are far more powerful, http://www.britannica.com/EBchecked/topic/1264411/Second-Life. detailed, and attractive. This has many implications regarding Visited 19.2.2011 . [8] Linden Research Inc, What is Second Life? . Second Life, Linden the technical complexity underlying those enhancements. The Research Inc. Available: http://secondlife.com/whatis/ . Visited design, architecture, and communication processes are 24.02.2011. complex. [9] M. Wagner, 12 Things To Do In Second Life That Aren't Embarrassing If Your Priest Or Rabbi Finds Out . Information Week. 
Available: http://www.informationweek.com/blog/main/archives/2007/04/10_fun_t There are many examples of computer supported VWs. We hings_t.html;jsessionid=ZIEAAZEWPEG2HQE1GHOSKHWATMY32 can enumerate the many popular videogames that are played JVN . Visited 19.2.2011. [10] K. Lyons, Towards a Theoretically-Grounded Framework for over the Internet along with the different VWs that have as a Evaluating Immersive Business Models and Applications: Analysis of main purpose to socialize and connect people. For instance, Ventures in Second Life . Journal of Virtual Worlds Research. "Virtual Second Life is a representative example of a popular computer Worlds Research: Past, Present & Future", vol. 1 no. 1, Part 1: Second Life, pp. 3-5, Jul. 2008. Available: supported VW [7, 8]. The depth of this software has proven to http://journals.tdl.org/jvwr/article/view/289/243 . Visited 10.03.2011. be greater than any other networking software [3]. There are many activities that enrich the interaction of the users (referred as residents in the Second Life terminology), such as exploring landmarks, playing games, chatting and crafting items [9]. There are also many business-related activities that can provide real cash revenue to the residents [10]. As any other social space, in Second Life there are many examples of marketing, propaganda, branding and teaching related activities. However, the possibilities of this VW are far from 1 Introduction to Rich Internet Application Architectures

Petri Heinonen – Åbo Akademi

Abstract—In this article we overview different Rich Internet Application (RIA) frameworks for creating Web 2.0 applications. We look at the evolution of web technologies and see how RIAs fit in the picture. We present the fat-/thin-client concept and discuss the technologies used in RIA frameworks. We end our survey by summarizing our findings and proposing pros and cons for both client and server side RIA frameworks.

Index Terms—Rich Internet Application, Web 2.0, Fat client, Thin client, RIA

I. INTRODUCTION

Web development has almost exploded during the 21st century. The transition from traditional web pages to interactive web applications has transformed the way we browse the web today. Huge portals like Facebook and Amazon would not be possible if web development had stayed document based, as it was in the early days of the web. Web 2.0 technologies enabled a new kind of web, a web where a document is not just a static piece of text with some links in it, but an interactive application that has animations, dynamic contents and a sleek look and feel. This, however, came with a price. The traditional web, Web 1.0, had a very lightweight information exchange that could be used with just about any computer from any place, as long as you had an Internet connection. However, given the increased content size and graphics, issues such as latency started to play a big role. Web 2.0 came to the rescue by introducing asynchronous communication that effectively hid the latency from the user.

At the same time, it turned out that developing Web 2.0 applications by hand and from scratch was hard and time consuming. To answer this call for help from the web developers, Web 2.0 frameworks started to appear. They handled things like browser independence, cryptic event handlers, and communication protocols, but were still written and used in the same "low level" language as the original web. This was suitable for relatively small websites, but as technology evolved, so did the size of the websites. To tackle this newest predicament, a subset of Web 2.0 was created. In this new subset, all of the benefits from the early "low level" frameworks were incorporated under higher-level languages like Java and .NET. These higher-level languages enabled developers to create huge web applications that were maintainable and easy to extend. But how did this all happen, and what are these Rich Internet Applications (RIA)?

In this article, we take a look at what RIAs actually are and how they have evolved. We start in Section II with a history lesson on the evolution of Web 1.0 and Web 2.0 and point out why RIAs are Web 2.0, but all Web 2.0 sites are not necessarily RIAs. We continue in Section III by studying fat and thin client architectures and explain how different RIA frameworks enable us to create these fat or thin clients. In Section IV we talk briefly about the technologies behind RIAs and in Section V about the security aspects associated with RIA development. Section VI will overview some of the RIA frameworks in use today, and in Sections VII and VIII we aim at categorizing them as either client side or server side RIA frameworks. Moreover, we discuss their advantages and disadvantages. Finally, we conclude in Section IX.

II. WHAT IS RIA?

The Web 1.0 standard dates back to the beginning of the 90's, when the Internet started to gain popularity with home users. Tim Berners-Lee created the HTML standard in 1989-1990 to provide a unified way of sharing documents [3]. The HTML language was originally designed solely as a document sharing language. The idea was to enable users to share documents with anyone over a network. The new thing that HTML brought was the possibility to link documents to other documents. This mix of text and linking got the name hypertext [3][10]. Later on, the HTML standard was extended with formatting, styling and scripting tags that allowed better looking, more interactive documents. However, the original design is still there, and the Web 1.0 standard is still document based: a single document is fetched from a server and displayed to the user. The fetched document might have links to other documents that are fetched after this, but the principle is always the same: one document at a time per view. As the user clicks on a link, or hyperlink, the current document is discarded and the process starts over from the beginning. In many cases this leads to unnecessary data transfers and, from the user's perspective, idle time between page refreshes [8].

As the Internet evolves, so must its technologies. Users demand more interactive, responsive, and visually stunning websites. This inevitably leads to more complex documents that require more scripting, more processing, and become larger. That in turn leads to longer response times and larger transmissions. The way Web 2.0 addresses these issues is by splitting up these tasks. In Web 2.0 applications users do not navigate between documents, they navigate within a document. This document is in Web 2.0 called a Web Application, and does not need to be a document at all. When a user enters a Web 2.0 application, a series of scripting tasks starts to run. In browser-based applications, the initial document will contain some static HTML to enable the scripting process to begin. In non browser-based systems the need for a traditional HTML document might be omitted completely. The initial scripts will download additional scripts or files that are needed to start an interactive application that will present the data to the user. All of these scripts are run inside a virtual environment like the JavaScript engine in a browser, or a virtual machine like the Java Virtual Machine (JVM) or the ActionScript Virtual Machine (AVM). [3][5][8]

When the loading phase finishes and the initial content is loaded, the user is presented with the first page of the application. Until this point the process between modern Web 1.0 and Web 2.0 has not differed much. But when the user navigates within the application, the entire page is no longer discarded. Only needed data and scripts are fetched from the server, while the scripts that were already used are reused if needed. What's more, this fetching is done asynchronously, which means that a request is sent to the server and the client thread returns immediately while waiting for a callback from the server. What this means in practice is that while the client is waiting for a response from the server, the application is fully responsive to the user's inputs, for example scrolling or typing. This gives the user the illusion of a fast and responsive UI. When the server request returns with the data, the client updates only the parts that need to be updated. Hence, the entire UI does not need to be rendered completely, saving time and computing power. [3][5][8]

Another key feature in Web 2.0 applications is the sharing of processing power. In traditional Web 1.0 applications all the processing, like data storage and UI manipulation, was done on the server. If a website has complex data structures and a deep Document Object Model (DOM) structure, then this required significant processing power from the server. In Web 2.0 applications some preprocessing of the data can be done on the client side before it is sent to the server, cutting down on the processing power needed for data manipulation on the server. When the server returns data to the client, the client can handle data post processing and the actual rendering of the page, so that the server does not have to spend computation cycles on these. This sharing of computation is one of the key aspects of Web 2.0 applications. [3][5][8]

Finally, a distinguishing characteristic of a Web 2.0 application is the ability to store data on the client system. [5] Modern browsers support caching of documents to speed up the fetching of documents that have already been fetched. But Web 2.0 applications take this further, by allowing storage of raw data on the client side. [5]

While all of these technologies are a part of the Web 2.0 standard, Web 2.0 and RIA are not the same thing. All of the features mentioned above for Web 2.0 could be added to a traditional Web 1.0 document one at a time; a true RIA application is one that utilizes all of the features mentioned above at the same time. [5] To aid the developers in fully exploiting these features, RIAs are often developed with frameworks that are a collection of APIs providing a level of abstraction from these technologies. The frameworks usually also bundle some ready-to-use components that are often needed, such as buttons, text fields and forms, along with more complex add-ons like multimedia, 2D/3D graphics and animations. In the world of browser-based RIA, different browsers follow the web standards to a different degree. A true RIA framework will abstract these shortcomings away from the developer, so that the developer can focus on the task at hand. [5]

Summarizing, a true RIA application is an application, not a document, which exploits asynchronous data transfers and data distribution, distributed processing and an enhanced user interface. A true RIA framework will create a true RIA application and aid the developer by abstracting away the technical parts of the development.

III. FAT VS. THIN CLIENT

The fat and thin client concept refers to the amount of processing that takes place on the client side in a client-server architecture. The "fattest" fat clients are desktop applications that handle all of the logic and rendering on the client side. In these cases, the server is used only as data storage. [3][5]

Figure 1 - Fat and Thin client architecture [3]

A thin client is one that has minimalistic logic in it. A thin client will handle only the required rendering of a document or page. This is basically how the early browsers worked, and how simple web documents are displayed even today. What this means in practice is that the server is responsible for the data storage, business logic and presentation logic, while the client only displays the content according to the specifications given by the server. [3][5] Figure 1 displays an overview of the different levels of computation that can be shared between a client and a server. The fat client architecture is illustrated to the left (1) and the thin client architecture to the right (5). In between these we have the client architectures that distribute the computational load more evenly between the client and the server (2, 3, 4).

In web technologies, RIA applications are generally considered fat clients, due to the move of presentation logic and parts of business logic from the server side to the client side. This brings RIAs closer to desktop applications. The distribution of logic can vary from RIA to RIA and from application to application. [3][5]

IV. TECHNOLOGIES BEHIND RIAS

Effective communication is one of the cornerstones of modern RIA applications. RIA applications tend to be data intensive and complex. This results in large quantities of data transfer. Another thing that increases the need for communication is the nature of interactivity in RIA applications. In very fat client implementations this might not cause a problem, since the changes are dealt with on the client side. But for RIAs that use middleweight client implementations, the increased interactivity leads to more communication with the server. In order to cope with this, RIAs use technologies that provide effective means of communication between the client and the server. [5]

In the browser-based world of RIAs that do not rely on plug-ins, the standard way of communication is Ajax. The term Ajax stands for Asynchronous JavaScript and XML, and is in fact a working model that employs several technologies in a specific manner. In Ajax based systems, the presentation of data is done through XHTML and CSS, which are both well defined, standardized and widely spread technologies. Interaction and dynamic content updating are done through the DOM, while the client-server communication is done asynchronously through XMLHttpRequests. Finally, all of this is put together with an Ajax engine that is written in JavaScript. This engine works as the common link between all of these technologies. What this does in practice is that it breaks the old fetch-render-wait cycle of traditional web pages and masks the latency of the roundtrip to the server. [11]

In the plug-in-based world of RIAs, a native implementation of an interpreter is needed. The interpreter usually takes the form of a virtual machine like the Java Virtual Machine (JVM) or the ActionScript Virtual Machine (AVM). These interpreters work as an intermediary between the client system and the program code that is used in the RIA application. Before the application is deployed, it is compiled into a form that can be executed on the client system. These virtual machines work on byte code that is generated by the compiler. Byte code is an intermediary, platform independent version of the original code that is specifically generated for the target virtual machine. This byte code is then mapped by the virtual machine against the native machine instructions for the client system when the application is run. The AVM works on byte code generated from ActionScript, which is a variant of ECMAScript, the standard behind JavaScript. [7] The JVM interprets Java bytecode created by the Java compiler from Java source code. This mapping against native instructions gives the plug-in-based RIA frameworks the ability to use more of the available resources of the native platform, like multithreading and socket based communication. [14][16]

V. SECURITY ASPECTS OF RIAS

More and more of the logic of an application moves from a relatively secure server to a potentially insecure client. This inevitably creates more security vulnerabilities. In traditional web applications that consisted of static pages, there was not too much a hacker could do to compromise security. In RIA applications, some of the logic is on the client side, which effectively gives a hacker access to it. As the content of the application is dynamically loaded, a hacker could for example alter the way inputs are validated, to allow inconsistent data to be sent back to the server. Fortunately, this can be double checked with server side validation. [1][2][18]

Another security concern for RIAs are cross-site scripting (XSS) attacks, where a hacker inserts some malicious code into an otherwise legitimate webpage. This could allow a hacker to gain access to the user's personal information or to the user's system. XSS attacks can be done completely invisibly; a hacker could for example insert a post to a forum containing JavaScript that would run as soon as a user loaded the post. The embedded JavaScript could then connect to the hacker's server to download more harmful code, and so on. [1][2][18]

A third big security concern for modern web applications is the cross-site request forgery (CSRF) attack, which exploits the trust between the server and the client. In CSRF attacks a hacker uses a compromised system to send unauthorized requests to a service to which a user has logged on. This means that a hacker could for example make a post on a forum with a false username, or withdraw cash from a bank account as the user was paying his or her bills. [2][18]

VI. OVERVIEW OF RIA FRAMEWORKS

Now that we have surveyed RIAs and some of their mechanisms, we move on to study the actual frameworks that exist today and their defining characteristics. We do not aim at presenting all of them, but instead focus on a short list of some of the most used ones.

A RIA framework that every web developer should be familiar with is the Google Web Toolkit (GWT). GWT is a RIA framework that was created by Google to make RIA development easier and more straightforward, so that large applications would be easier to develop and maintain. GWT development is done in Java and supports development patterns such as Object Oriented Programming (OOP). Such patterns become useful as RIA applications get larger. The core of GWT consists of a Java-to-JavaScript compiler that takes Java code and cross compiles it to JavaScript. The compiler also compresses and optimizes the generated JavaScript, so that the resulting JavaScript file becomes as small as possible and runs as fast as possible. The framework also provides a large amount of widgets that the developers can use or extend to create RIA applications. On its own, GWT creates JavaScript that has all the logic on the client side and runs like any Ajax application. When used in combination with Google AppEngine, or any other server running the application business logic and the data storage, GWT can create RIA applications with mixed amounts of computation and logic distribution. GWT supports a vast number of methods for client-server communication, including XMLHttpRequests and JavaScript Object Notation (JSON). [2][1]

A RIA framework that aims at a different approach is Vaadin. Vaadin is, in the words of its creators, "…a server-side AJAX web application development framework…" [1] that tries to minimize the "web" part of web application development. The Vaadin toolkit minimizes the need to write JavaScript, HTML and CSS by providing a client-side engine that can be programmed directly from the server side Java code. In practice, Vaadin employs GWT to render the UI, and a JSON based language called User Interface Definition Language (UIDL) to forward all of the user interactions asynchronously to the server. The framework maps the user interactions back to the server side code, where they can then be validated and handled. As the server side of Vaadin runs entirely inside a Java Servlet, it can be used with any Java application server and connect to any data storage that Java can connect to. [1]

Another server side RIA framework that uses a thin client side engine and a fat server side architecture is Echo 2. In Echo 2, the server side is written in Java and runs inside a Java Servlet. Therefore, it can be used with a number of data storage solutions and can handle the business logic securely on the server side. The client side engine is written in JavaScript and supports additional JavaScript modules that can be added to extend functionality. The server side is kept up to date with the client side through ClientMessages, which are XML messages sent asynchronously over XMLHttpRequests. The server responses are sent as ServerMessages, which are also XML messages, containing the components and data that should be updated on the client side. [12][13]

Adobe Flash is a plug-in based RIA application framework that runs in a web browser. Flash applications are written in ActionScript 3.0, an Object Oriented Programming language that is compiled and run in a separate virtual machine called the ActionScript Virtual Machine (AVM). The Flash platform does not come as a standard part of any OS today, although it might be preinstalled on some systems; this means that users have to download and install it manually. Flash can also be used in conjunction with other frameworks, which makes it a powerful add-on. RIA applications created with ActionScript can also be run separately in native mode through Adobe AIR, so that the application can take full advantage of the client system. [15]

Silverlight is a .NET based RIA application framework that is plug-in based but can run both inside and outside of a web browser. It represents Microsoft's vision of a perfect RIA application framework that mixes plug-in based RIAs with the .NET framework and the Windows Presentation Foundation (WPF), the Microsoft desktop application UI framework. Silverlight uses the eXtensible Application Markup Language (XAML), just like WPF, to define UI elements, while the application code is written in .NET. When deployed, the actual application code is compiled into an Intermediate Language (IL) that is then interpreted by the Silverlight player. [16][17] Silverlight can employ several ways to communicate with a server, including Simple Object Access Protocol through the Windows Communication Foundation (WCF), and XML or JSON through serialization. [17]

VII. CLIENT-SIDE RIA

We now aim at classifying RIAs. It is actually easier to describe the characteristics not displayed by a client side RIA framework than it is to say "these are the characteristics of client side RIA frameworks". In the end, it all comes down to the definitions of the fat and thin client architectures. In essence, a client side RIA framework should be a framework that creates a fat client RIA application. However, how fat is open to interpretation. In this article, we define a client side RIA framework as a framework that encourages the user to place the following things on the client side:

• Rendering engine
• Presentation logic
• Most of the business logic

This means that the server will handle:

• Data access and persistence
• A small part of the business logic

So what are the pros and cons of the client side RIA frameworks? As we have noted earlier, the offloading and delegation of computations and logic to the client is what RIAs are all about. In client side RIA implementations this effectively reduces the computations done on the server. When the business logic and data manipulation are done on the client, then the server becomes almost just a persistence server that also hosts the compressed client side application code. In essence, this means that servers can serve more clients with fewer resources and less bandwidth. On the downside, client side RIA implementations pose a potential security risk. When the business logic and data validation are done on the client, there is always the possibility of misuse.

From the RIA frameworks discussed above, Flash is probably the clearest client side framework, due to the fact that Flash applications tend to keep all of the logic on the client side and only fetch new data from the server. GWT in its simplest form (no AppEngine or server running the business logic) is also a strong contender in this section. However, we should notice that GWT applications can be created so that the business logic and parts of the presentation logic are on the server, which becomes clear from the Vaadin framework.

VIII. SERVER-SIDE RIA

Server side RIAs are even trickier to characterize than client side RIAs. The point of the RIA architecture is to distribute the load between the client and the server. So where do we draw the line: when is the distribution of computations and logic so big that a framework can no longer be called a server side framework? In this article, we draw the line at the business logic and data storage. If the business logic is on the server, and the primary data storage and persistence can be found on the server, then we consider it a server side RIA framework. The presentation logic can be either on the client side or shared. Server side RIA framework characteristics in this article are therefore as divided below:

[…] integrity you should lean towards the server side frameworks, as these will help you to stay organized and at ease, knowing that security is less of an issue.

REFERENCES Client side: Rendering engine [1] M. Grönroos, Book of Vaadin, www.vaadin.com/book • [2] R. Cooper, C. Collins, GWT In practice, Manning 2008 • Presentation logic [3] M. Linnenfelser, S. Weber, J. Rech, An Overview of and Criteria for the Differentiation and Evaluation of RIA Architectures, Server side: [4] Daniel Woolston, Pro Ajax and the .NET 2.0 platform, 2006 Presentation logic [5] M. Busch, N. Koch, Rich Internet Applications – State-of-the-Art, 2009 • [6] P. Fraternali, G. Rossi, F. Sánches-Figueroa, Rich Internet Applications, • Business Logic http://www.computer.org/portal/web/csdl/abs/html/mags/ic/2010/03/mic • Data Access and Persistence 2010030009.htm [7] C. O’Rourke, A Look at Rich Internet Applications, The server side RIA frameworks tend to work against the http://wolfpaulus.com/wp-content/uploads/2010/06/richinetapp.pdf, [8] J. Farrell, G. S. Nezlek, Rich Internet Applications The Next Stage of distributed architecture of RIAs and move back towards the Application Development, traditional Web 1.0 model where the server does all the work. http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=4283806 However, there are benefits in keeping the logic on the server, [9] P. J. Deitel, H. M. Deitel, AJAX, Rich Internet Applications, and Web one of which is the security aspect. There is little a hacker Development for Programmers, 2008 [10] C. Musciano, B. Kennedy, HTML & XHTML: the definite guide, 2006 could tamper with the validation logic, as it’s never present on [11] J. J. Garret, Ajax: A new approach to Web Applications, 2005, the hackers system. 
And when the rendering engine and http://experiencezen.com/wp-content/uploads/2007/04/adaptive-path- presentation logic is on the client side we still get distributed ajax-a-new-approach-to-web-applications1.pdf processing as the server doesn’t have to care about how the [12] Echo2 Fundamentials, http://echo.nextapp.com/site/node/32 [13] Echo2 Client/Server synchronization, content is shown on the client side. A potential drawback here http://echo.nextapp.com/site/node/33 is the increased bandwidth requirements as the client sends [14] Adobe Flex – Training from the source updates to the server for each user interaction. [15] G. Lawton, New Ways to Build Rich Internet Applications, IEEE http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=4597127 [16] J. Beres, B. Evjen, D. Rader, Professional Silverlight 4, The server side RIA frameworks that are designed as server http://www.amazon.com/gp/product/0470650923?ie=UTF8&tag=variou side RIA frameworks can be easily recognized from the ssite07- presented frameworks. Vaadin and Echo 2 are clearly made so 20&linkCode=as2&camp=1789&creative=9325&creativeASIN=047065 that the business logic is strictly on the server side, while the 0923 [17] Accessing Web Services in Silverlight, http://msdn.microsoft.com/en- client side engine handles the rendering and the presentation us/library/dd470099(VS.95).aspx logic. However, as can be seen from the Vaadin – GWT [18] G. Lawton, Web 2.0 Creates Security Challenges, relationship, a client side RIA framework can be used so that http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4343682&tag=1 it works like a server side RIA framework. If the developer [19] wants to, he or she can always move the business logic to the server. Hence, a definite categorization here is very difficult to make.

IX. CONCLUSION In this article we have presented how RIA frameworks work and how they evolved from Web 1.0 to their current form. We even presented a few RIA frameworks and discussed their potential classification - client or server side framework. We also noted that the definitions are quite hard to make, as the actual RIA application implementation can tip the fragile balance in one way or the other. The end result is that the choice of framework depends on the market that you are targeting. If you need a good-looking simple RIA application that runs fast on a lightweight server, you should probably look towards the client side frameworks. These will give great performance and enable the usage of heavy computations on the client system. However, if your target is a very large and mission critical system with absolute weight on security and

SECURITY IN CLOUD COMPUTING

Miikka Kaarto

34353

Table of Contents

1 Abstract
2 Introduction
3 Clouds and cloud services
3.1 Public clouds
3.2 Private clouds
3.3 Hybrid clouds
3.4 Services in a cloud
4 Security issues in cloud computing
4.1 Solutions for the security
4.2 Responsibility in the cloud, legal issues and SLA
5 Conclusions
REFERENCES

”A Cloud is a type of parallel and distributed system consisting of a collection of inter-connected and virtualized computers that are dynamically provisioned and presented as one or more unified computing resource(s) based on service-level agreements established through negotiation between the service provider and consumers.”, (Rajkumar Buyya, Chee Shin Yeo, Srikumar Venugopal, James Broberg, and Ivona Brandic: Cloud Computing and Emerging IT Platforms: Vision, Hype, and Reality for Delivering Computing as the 5th Utility. In Future Generation Computer Systems, Volume 25, Number 6, Pages: 599-616, ISSN: 0167-739X, Elsevier Science, June 2009)

1 Abstract

The need for cost-effective, flexible, scalable and secure ways for complex computer systems to operate has led to the so-called cloud-computing model. This is a new technical solution for the IT world, where a system virtually interacts with a third party, for example for storing and sharing files, for leasing a hardware system for a project, or simply for letting the company's whole antivirus system run from it. These examples and their benefits are fascinating and lead to many possibilities. On the other side of the coin, there is a variety of issues that need to be examined before completely relying on these platforms. For example, a basic firewall or an Intrusion Detection System and their locations need to be carefully re-considered. System failure situations uncover several questions, as does the fact that proper juridical rules have not yet been set to secure the clouds. In this paper, we set out to survey cloud computing security.

2 Introduction

In this paper we go through security issues and solutions for cloud computing. Chapter three introduces the different cloud types and the services that these clouds provide. Security issues and solutions are examined in chapter four. The juridical side of the topic and responsibilities in the cloud are explained along with SLAs (Service Level Agreements). Finally, in the conclusion, a few questions and concerns are presented.

3 Clouds and cloud services

In order to understand how to build and ensure the secure usage of cloud computing, we first survey the different types of clouds and their associated services. There are three main types of clouds, namely the public, private and hybrid types. These cloud types have associated services they can provide individually or together [1].

3.1 Public clouds

These clouds are easily accessible and the most cost-effective to use, since they are located outside the users; hence, no infrastructure needs to be built or maintained. This makes public clouds the most tempting option when considering what type of cloud to choose. Public clouds are typically based on the pay-per-use model and make it easier for cloud clients to calculate their IT expenditure needs. At the same time, expenditure on IT infrastructure can be re-invested. Basically, this means that money that would otherwise be invested in building the organisation's network, servers and physical IT can now be invested elsewhere [1]. In turn, public clouds are the most insecure, because they are accessible through mainstream web browsers that are often exposed to security risks. Public clouds also place a great burden on the SLA (Service Level Agreement) [1], [2].

3.2 Private clouds

These clouds are like an intranet, i.e., limited in usage to an inner circle of users. This feature makes them more secure when authenticating services and enables wider monitoring. The virtual cloud environment is built on the company's own servers, not on third-party servers as for the public clouds. Companies and users that want complete control over the infrastructure thus prefer private clouds. "The key advantage of private clouds is control" [3]. There is some criticism towards private clouds, because not everyone sees them as clouds, just as intranets. Right or wrong, they are reminiscent of traditional intranets. Also, private clouds still require investments in the IT infrastructure, which can be seen as a disadvantage.

3.3 Hybrid clouds

The definition of a hybrid cloud is somewhat debated. George Reese describes it as "a computing environment in which both public and private cloud computing environments are presented" [3]. The benefits are that hybrid clouds are centrally managed, provisioned as a single unit and circumscribed with a secure network along with secure authentication [1]. In conclusion, hybrid clouds provide the security of private clouds in protecting data and applications, and benefit from the compatibility and possibilities for modification of public clouds.

3.4 Services in a cloud

Services in a cloud are classified into three types: Infrastructure as a Service (IaaS), Platform as a Service (PaaS) and Software as a Service (SaaS) [2].

Software as a Service (SaaS) uses a third party to host software and services for customers. According to Ramgovind S, Eloff MM and Smith E, this is the most flexible solution among cloud services, because the service is accessible through web browsers and can be hosted from inside the organisation itself. At the same time, great security risks appear because of the web browsers and the security issues in them; therefore, maintenance needs to put great effort into securing the applications during the implementation and usage of the cloud service [1]. A vital issue in SaaS is the need for trust towards the service provider. The client needs to trust the third party with the sensitive data that SaaS applications process, while having almost no control over the data itself [4].

Platform as a Service (PaaS) saves costs in that the clients do not need to build the complete infrastructure themselves; they only borrow the platform they need to operate with [1]. These platforms are located on the service provider's virtual servers and need to be protected like ordinary servers, with "platform rules" that dictate how security policies are put to use [4]. Like ordinary servers, these virtual servers/platforms need to be protected against malicious attacks, by securing authentication during all activity and the integrity of data transfers sent to or received from the cloud [4].

Infrastructure as a Service (IaaS) includes the whole infrastructure, which is virtually provided through the Internet. The main benefit of this type of service is its cost effectiveness, gained as the organisation is released from building and maintaining the infrastructure [1]. In other words, in IaaS the cloud provider sets the rules for the physical security of servers, hosts and networks [1].

Cloud computing is not something one should just buy and use without planning. The organisations or cloud clients need to first understand the correct requirements for their operations, as well as the different levels of service that the cloud provider offers. Choosing the right kind of cloud and level of service is less expensive, less problematic and more secure in the long run.

4 Security issues in cloud computing

When comparing the listed types of systems and services to more traditional ones, it is apparent that a new way of taking information security into consideration is needed. The traditional thinking that only the systems inside the organisation need to be secured with firewalls and other security measures does not immediately fit the cloud-computing environment. Security needs to go further and reach the whole system, i.e., the cloud [4]. John Steven and Gunnar Peterson list a few important concerns when dealing with the infostructure (applications and data) in cloud services [4]:

• service bindings: service protocols and message formats,
• service mediation: services that mediate access to applications and data,
• message and communication encryption: confidentiality services at the data and transport level,
• message and data integrity: tamper-proofing messages and data, and
• malicious usage: dealing with asset abuse.

These listed examples can seem like obvious facts when talking about network security, but in a cloud everything is a little more complicated, especially when planning and implementing security in the cloud environment.

Picture 1 categorizes John Steven and Gunnar Peterson's infostructure, metastructure and infrastructure [4]:

• Infostructure (content and context): applications, data, metadata, and services
• Metastructure (glue and guts): Internet Protocol address management, Internet access management, Border Gateway Protocol, Domain Name System, Secure Sockets Layer, and public-key infrastructure
• Infrastructure (sprockets and moving parts): computing, network, and storage

Taking the metastructure (security policies) into examination, John Steven and Gunnar Peterson list these important concerns [4]:

• security token exchanges: the ability to validate and issue security tokens;
• security policy management: policy definition, enforcement, and lifecycle management;
• policy enforcement points: mapping namespaces, resources, uniform resource identifiers, channels, and objects;
• policy decision points: the workflow for determining access;
• message exchange patterns: defining claims and schemas;
• detection services: logging and monitoring (recording and publishing events); and
• key management processes: key generation, distribution, and lifecycle management.

The listed tasks from John Steven and Gunnar Peterson cover the entirety of the security aspects in a cloud system. Examining these tasks and points, we can conclude that monitoring and observing the different activities in the cloud is vital to the overall security.

4.1 Solutions for the security

Some practical measures for securing the cloud are referred to as "the four patterns" [4]. These four patterns are gateways, monitoring and logging, security token services, and policy enforcement points. Gateways provide the first line of defence by controlling the traffic in and out of the cloud. Monitoring and logging can be seen as the eyes of the security and enables countermeasures against threats. In cloud computing, monitoring and logging are more challenging than in traditional networks, because the infrastructure is more complex. Security token services are vital in a cloud system when moving between and authenticating to several virtual interfaces. Without proper and secure token exchanges, the services in the cloud are vulnerable to a great number of risks. Policy enforcement points go hand in hand with secure token exchanges; these points ensure that proper security is exercised during the usage of services in the cloud. In practice, this means that a person who can authenticate to one level of service or security cannot access other (more secure) levels with the same privileges as on the previous level without proper permission.

4.2 Responsibility in the cloud, legal issues and SLA

The cloud is a concept that we cannot see or touch like a traditional company server located in the basement of our organisation. This causes some problems on the legal side, raising in particular the question "whose responsibility is the service in the cloud?". The SLA (Service Level Agreement) defines an operational guideline for both the cloud customer and the provider [2]. Ramgovind S, Eloff MM and Smith E mention that there is a certain degree of inexperience in cloud security and that the SLA therefore works as documentation; in this respect, in a more practical way, it is the first line of defence for the cloud customer [1]. Thus it seems that the more complex a service is, the more precise an SLA is needed to secure both the cloud customer and the cloud provider. As cloud computing becomes more standardized, the legislation will adapt through practical achievements.

5 Conclusions

Cloud computing is evolving and becoming a bigger part of our IT society. There are many cloud-related aspects that need addressing, among them legislation issues and a thorough evaluation of the cloud service. The more complicated the requirements of the cloud customer, the more detailed the possible implementation of the respective cloud service needs to be, so that both the customer and the provider share the same understanding. The fast pace of business often puts significant pressure on implementations, sometimes leading to unwanted worst-case scenarios. Some of these cases have gained publicity, exposing the risk we all take part in, with or without our accord. It is, however, possible that such negative publicity puts more pressure on the investments and security considerations along with the service execution, in this relatively young and explosively growing cloud.

REFERENCES:

[1] The Management of Security in Cloud Computing, Ramgovind S, Eloff MM, Smith E, 2010.

[2] Security and Control in the Cloud, Klaus Julisch and Michael Hall, 2010.

[3] Cloud Application Architectures: Building Applications and Infrastructures in the Cloud, George Reese, 2009.

[4] A Security Architecture Stack for the Cloud, John Steven, Gunnar Peterson, 2010.

Indoor Location Aware Computing Using Wi-Fi Networks (March 2011)

Björn Sjölund

Abstract - Location-aware computing (LAC) refers to a computing paradigm driven by the physical positioning of a device. The outdoor functioning of LAC is nowadays dominated by the Global Positioning System (GPS). However, for indoor functioning, GPS is not suitable. Given the numerous applications for indoor LAC, researchers investigate alternative LAC technologies, such as employing Wi-Fi networks. In this paper, we study indoor LAC and review the most important problems related to the available indoor LAC technologies. Moreover, we also investigate potential solutions for achieving an indoor infrastructure stable enough to allow navigation.

I. INTRODUCTION

The field of location aware computing is expanding at a rapid pace and there is currently a large drive to bring it indoors. LAC [1] has brought a variety of intelligent services to the outdoor world with the aid of GPS technology. But the implementations of LAC for indoor use have yet to make a big breakthrough. There is a high interest in making it work, but it all relies on finding a dependable way of achieving indoor positioning.

Currently, most of the efforts are being put into using the existing infrastructures, for example wireless network routers. The routers provide a constant radio signal output that can be used to achieve indoor positioning, but is it accurate enough? In this paper we will examine some existing indoor positioning techniques and try to answer a few questions about why indoor positioning is so difficult.

We will start by looking at the current state of the indoor positioning field. In the following sections we will examine what methods are being used and what advantages and disadvantages they carry. We will conclude by viewing two use cases that implement LAC and indoor positioning.

II. THE PROBLEM WITH NAVIGATING INDOORS

The Global Positioning System (GPS) refers to a navigation technology based on a satellite system devised and maintained by the US government. GPS has proven so useful that, although initially intended only for military purposes, the technology spread quite rapidly into commercial applications. The first products to make use of GPS were quite basic, often only capable of displaying longitude and latitude. Thus, the technology was quite useless without an actual map. This situation changed quite fast as computers became more portable and maps were digitalized. The first commercially available GPS units with integrated maps changed the world of personal navigation forever.

But however great GPS is as a technology, it has its limitations. For example, it requires a constant uplink to a grid of satellites in order to triangulate an accurate position, a requirement that has turned out not to be so successful in big cities and especially inside buildings. As GPS relies on satellite links in order to function, this link is easily interrupted if the sky is obstructed. Thus, GPS is fundamentally flawed for indoor use, although people have tried using it [2]. Some claim that the solution can be found in cell tower triangulation [3], but this remains to be proven for commercial applications.

The most interesting research direction nowadays is to try to employ what is already indoors, e.g. the Wi-Fi networks that most large buildings today are completely covered with. Wi-Fi somewhat resembles GPS, in that neither technology was intended to function as a means for indoor navigation. However, the following property can be employed for indoor navigation: the Wi-Fi access points/routers are stationary and have a constant signal output; hence, they can be seen as makeshift indoor satellites [4].

However, taming this continuous output into a form that can be used for positioning is not an easy task. This is due to the fact that a building is much like a canyon with loads of nooks and crannies; GPS is notoriously bad in such places, even when the sky is unobstructed. The fact that the human body consists of roughly 70% water adds another potential signal dampener to the equation. Building architecture also plays a huge role: how open are the areas inside the building, what type of building materials were used, how many rooms are there?

These are just a few of the problems that make indoor navigation difficult. However, many researchers are pursuing it, because the potential payoffs are enormous. There are a lot of undeveloped services due to the fact that indoor navigation is not yet widely available. By looking at what industries have sprung up around GPS, it is quite obvious why the interest is so high. Even though there are quite a few companies [5] offering indoor navigation, none of them has yet opened up their technology in the same way that GPS has. There are many reasons for this, some economical, others related to the difficulties of implementing the technologies.

III. INDOOR POSITIONING THEORY

The basics of indoor navigation theory are the same as for outdoor navigation, i.e., we need to measure certain indicators (satellites, radio signals, light sources etc.) and from those measurements extrapolate values that can be used to determine one's position. There are a variety of ways of doing this, and every method comes with its own set of advantages and disadvantages. The basic principles behind most Wi-Fi based methods of indoor navigation are the following:

A. Time Difference Of Arrival (TDOA)

This technique uses the measured time difference of a packet that has been sent within the network. The time difference is measured by looking at the time the packet was sent by a router and the time it was received. TDOA often makes use of very complicated mathematical models that can, based on the known router positions, create estimates of what the TDOA values should be at the different locations in the environment. This technique is very sensitive to its environment and is therefore considered quite difficult to implement. Large amounts of people and changes in the decoration can lead to disruptions in the positioning.

B. Relative Signal Strength Indicator (RSSI)

The use of RSSI for indoor navigation is widely thought of as the most effective technique. The RSSI technique requires that one map the entire Wi-Fi network characteristics of the area in which the navigation is to work. The mapping process creates so-called fingerprints of how the routers' RSSI values are seen at the different locations. In reality, RSSI is not an actual value being measured but a value that is calculated from the measured decibel-milliwatt (dBm) output of the routers. The RSSI scale runs between 0 and 100, whereas the routers' outputted signal dBm values typically range between -10 and -110. This conversion is done to make the values easier to interpret.

C. Alternative methods

The methods mentioned above represent the most common techniques for achieving positioning indoors. However, there are a lot of variations [6] in how they are used, and sometimes the navigation includes an additional source of information. These sources usually tend to be something with a very limited signal range, so as to allow for greater accuracy in critical areas. Some technologies used for this are Bluetooth and infrared light sources.

There is also a multitude of ways of calculating the position based on either of the measuring techniques. Most techniques make use of program algorithms that search for patterns. Others may use a more mathematical signal processing approach. And then there are artificial neural networks (ANN) [7] and hidden Markov models, which both introduce the element of artificial intelligence.

The amount of effort put into research about indoor navigation clearly shows that there is a high interest in this kind of technology. All the different approaches that are being examined show the lack of a clear path to follow. There is no absolute way of achieving perfect positioning indoors, but there are multitudes of techniques that together can improve the accuracy of one another.

IV. INDOOR POSITIONING USING RSSI

We will now examine RSSI more closely and see how it can be used to achieve indoor navigation. As mentioned in the previous section, RSSI is the signal strength of a router as seen by the receiving device. In a building that has one or more routers there will be a variation in these RSSI values depending on the receiver position. If you record these variations at a given number of locations, then you are in essence creating a signal map [8].

Figure 1. Image showing an indoor environment with a Wi-Fi network.

In the indoor environment shown in Fig. 1, the routers have been marked with black dots and their signal coverage with dashed circles. The gray dots represent locations where the signal variations have been recorded. To record these signals a person would have to walk around the indoor environment, stand on the marked spots and record the signal variations. Once the whole area has been recorded, we have a clear picture of how the signals vary, if at all, between the different locations.

The next step would be to create a way of analyzing these variations in order to determine the location of the device that observes them. This requires the development and use of a pattern recognition algorithm that can identify and compare patterns with each other. As an example, a user standing inside the area depicted in Fig. 1 requests that his location be calculated, most probably via a mobile phone. The task of the mobile phone is to record the signal variations that it can detect and to compare them with the variations that were previously recorded. The pattern recognition algorithm handles the comparison and, once a match has been found, returns the answer to the user.

This represents one of the most basic and reliable ways of determining the position of a device. Even though the technique may appear simple, it is far from it. The problem lies in the recorded signals: they will almost certainly contain irregularities, and before any pattern recognition algorithm can compare the data, these irregularities must be filtered out. This adds another layer to the problem, and one that is quite difficult to solve. The devices used in the recording and the positioning process may contain different radio chips, making them detect signal variations differently. To solve this dilemma, one must either limit the devices allowed to be used for positioning or create a model for tracking the differences between the chips. This model must then be used in the filtering process, adding to its complexity.

V. TECHNOLOGIES AVAILABLE FOR LAC

In order to create an indoor LAC service [9] it is necessary to have an environment in which a user's location can be determined. The accuracy needed depends on the type of service intended and on what is physically possible inside the environment. The other two critical parts are a Wi-Fi network and a device capable of receiving the signals outputted by the Wi-Fi network routers. In order for the positioning to work there must be some way of comparing the signals that are present in the environment; this is what the receiver is for. It continuously listens to the outputted router signals and, using any of the presented techniques, analyzes them and extrapolates a position.

The type of LAC service that can be offered to the end user depends on the device being used to determine the position. If it is a mobile phone, then the possibilities are enormous. However, if the device is more of an embedded system, an electronic tag of sorts, then the service will probably be more oriented towards the administrators of the environment. Modern touch screen mobile phones offer great possibilities for developers to create very immersive LAC services. They allow visual content to be displayed in such a way that a user can get lots of added value from using it. The electronic tag alternative could be used for tracking assets inside a building, e.g. keeping track of shipping containers and boxes.

What is required of both of these devices is a fully programmable Wi-Fi radio chip. The chip must be able to receive and transmit data over a Wi-Fi network and perform scans of all the available routers in its vicinity.
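As a small illustration of the fingerprinting approach described above, the matching step can be sketched as follows. This is a minimal sketch, not a real positioning system: the location names, the survey values and the linear dBm-to-scale mapping are all invented for the example, and a real implementation would add the filtering discussed in Section IV.

```python
import math

def dbm_to_scale(dbm):
    """One plausible linear mapping from a dBm reading in [-110, -10]
    to the 0-100 RSSI scale mentioned in Section III.B (clamped)."""
    return max(0, min(100, dbm + 110))

# Hypothetical fingerprint database: surveyed location -> dBm readings
# from three routers (A, B, C), as recorded during the mapping walk.
fingerprints = {
    "lobby":    [-42, -71, -88],
    "corridor": [-60, -55, -79],
    "office":   [-85, -62, -48],
}

def estimate_location(scan, database):
    """Return the surveyed location whose stored fingerprint is
    closest (by Euclidean distance) to the live scan."""
    def distance(fp):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(scan, fp)))
    return min(database, key=lambda loc: distance(database[loc]))

print(dbm_to_scale(-60))                                 # 50
print(estimate_location([-58, -57, -80], fingerprints))  # corridor
```

A live scan of [-58, -57, -80] dBm is matched to "corridor" because that fingerprint is nearest in signal space; this nearest-neighbour comparison is one of the simplest pattern recognition algorithms usable for signal maps.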
This means terminal there can be several thousands of containers that developers of LAC services must often develop and they are moved around when certain containers their own or pay high licensing fees for others. We will need to be accessed. This leads to a problem of locating disregard this problem and take look at two examples them. There is a need for a service that keeps track of that make use of Wi-Fi positioning. The first example all the containers as they are moved so that they can be is a shopping mall application and the second an asset easily found when they are needed. tracking service. This kind of environment is perfect for a LAC service A. Shopping mall application since it needs a real-time system for tracking the Shopping malls are often very large and can positions of the containers. The position of the contain hundreds of stores. This makes it very difficult containers could be obtained by attaching small Wi-Fi and time consuming for new visitors to make the most transceivers to them. These transceivers would of their visit. Traditionally the shopping malls have communicate with a central administration service that made use of static and digital signs that inform and is continuously updated with the positions of each guide the visitors. These signs are only available at container. When a container is to be retrieved it can be certain places, and thus require the visitor to first find found by looking in the central administration service. one of them. This system of information spreading is quite primitive and the user can’t take it with them. This type of service would help the people in charge of finding and moving the containers and would also give The Solution is a mobile phone based LAC service that the managers a better overview of what is stored where. can guide the visitor wherever they might be inside the shopping mall. This application running on the mobile phone would act as a guide to the shopping mall. 
Now the visitors can check for information when they need VII. CONCLUSION it. The LAC features come into play when you start to In this paper we have looked at the state of some think of the marketing related features of the indoor position techniques and how they can be used. application. We have looked at a few examples of LAC services When a visitor moves around in the shopping mall and and how they utilize indoor positioning. We began by uses the application it is possible to use the positioning looking at the problem of indoor navigation and what information from the mobile device. This information challenges exist. The conclusion we can draw from can be used to direct more appropriate ads and offers to these sections is that the technology exists and it the visitors. For example, a visitor is standing outside a works, but it is very sensitive to changes in the sporting goods store when the application is accessed, environment. This means that there needs to be a by knowing their location the service could display variety of filtering methods to cope with the ever information, ads and offers related to that store. This changing environment. way the user is presented with the geographically most We examined the use of RSSI for achieving reliable accurate information. indoor positioning. We learned that it is possible to Another use for LAC in a shopping mall could be for create a positioning service with this technique but the finding, either places or products. By using the position more exact it has to be the more complex it gets. The data the service could find the closest result for the RSSI approach avoids some of the complexity visitor, be it the closest rest room or a cheap laptop. associated with the TDOA technique. TDOA involves a lot of complex signal processing and often requires specially designed hardware to function properly.
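The fingerprint-matching idea described above can be sketched in a few lines. Below is a minimal nearest-neighbour sketch, assuming each recorded spot is represented as a vector of RSSI values (one per router, in dBm) and using Euclidean distance in signal space as the comparison metric; the location names and sample values are purely illustrative, not from the paper:

```python
import math

# Hypothetical fingerprint database: location name -> RSSI (dBm) per router.
# In practice each vector would be averaged over many scans at that spot.
fingerprints = {
    "entrance":   [-40, -70, -85],
    "food court": [-75, -45, -60],
    "store 12":   [-85, -65, -42],
}

def locate(scan, database):
    """Return the recorded location whose fingerprint is closest
    to the observed scan (Euclidean distance in signal space)."""
    def distance(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(database, key=lambda loc: distance(scan, database[loc]))

print(locate([-78, -47, -62], fingerprints))  # a scan taken near the food court
```

A real service would add the filtering the conclusion calls for (e.g. averaging several scans) to cope with signal variation over time.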

The technology required for indoor LAC is widely available and ready to be developed for. New types of devices can easily be developed due to the standardized Wi-Fi technology. The problem at the moment is that there is no API available to program against, making the development of LAC services very time consuming. If the technology continues to advance at its current pace, then soon developing indoor LAC services will be as easy as developing for GPS.

Table of contents

I. INTRODUCTION
II. THE PROBLEM WITH NAVIGATING INDOORS
III. INDOOR POSITIONING THEORY
IV. INDOOR POSITIONING USING RSSI
V. TECHNOLOGIES AVAILABLE FOR LAC
VI. APPLICATIONS OF LAC
VII. CONCLUSION

References

[1] A. Berman, S. Lewis and A. Conto, "Location-aware Computing", 2008, pp. 1-7.

[2] C. Kee, D. Yun, H. Jun, B. Parkinson, S. Pullen, T. Lagenstein, "Centimeter-Accuracy Indoor Navigation Using GPS-Like Pseudolites", GPS World, November 1, 2001, http://www.disa.bi.ehu.es/spanish/asignaturas/17223/Centimeter_Accuracy_Indoor.pdf

[3] E. Hansberry, "GPS Accuracy Without A GPS Chip", InformationWeek, September 9, 2009, http://www.informationweek.com/blog/main/archives/2009/09/gps_accuracy_wi.html;jsessionid=YYANZGUPFV4WVQE1GHRSKH4ATMY32JVN

[4] F. Evennou, F. Marx, "Improving positioning capabilities for indoor environments with WiFi", IST Mobile and Wireless Communication Summit, 2005.

[5] T. Green, "World's biggest cruise ship sails through wireless challenges", Network World, January 12, 2010, http://www.networkworld.com/news/2010/011210-royal-caribbean-cisco-wireless.html

[6] F. Lassabe, P. Canalda, P. Chatonnay, F. Spies, "Indoor Wi-Fi positioning: techniques and systems", Institut TELECOM and Springer-Verlag, 2009, pp. 652-663.

[7] U. Ahmad, A. Gavrilov, M. Iqbal, S. Jin, S. Lee, "In-building Localization using Neural Networks", IEEE International Conference on Engineering of Intelligent Systems, 2006.

[8] G. V. Záruba, M. Huber, F. A. Kamangar, I. Chlamtac, "Indoor location tracking using RSSI readings from a single Wi-Fi access point", Wireless Networks, vol. 13, 2006, pp. 221-235.

[9] M. Hazas, J. Scott and J. Krumm, "Location-aware computing comes of age", Computer, vol. 37, issue 2, IEEE Computer Society, 2004, pp. 95-97.

Niclas Jern, 31601, Department of Information Technologies / Computer Engineering. Network Software, period III / 2010-2011.

Location Privacy

The diminishing difficulty of finding you in the age of HTML5 and smartphones.

Abstract

In this paper we discuss the impact of new consumer-oriented technologies on location privacy. We focus on the technical aspect of the issue, i.e. how easy it is for a would-be attacker to find someone's location and use it for nasty purposes. We also briefly discuss what location data can be used for and why many users are seemingly giving away their location data for "free".

The technology discussion is focused on the impact made by HTML 5 on the computer side and, on the mobile side, the increasing market share and complexity of mobile operating systems.

1. Introduction

Over the last few years, the smartphone segment of the mobile device business has seen huge growth [1] and by now smartphone sales represent 19 percent of total global mobile phone sales. With 1.6 billion devices sold last year [1], this means roughly 300 million smartphones were sold to consumers during 2010 alone.

Simultaneously, HTML 5 has gone from a discussed future standard to a standard actively in use and supported by all major browsers.

[2] defines location privacy as "the ability to prevent other parties from learning one's current or past location." In this paper, we argue that it is trivial for a would-be attacker to gain detailed information regarding the user's location whether the user likes it or not. This completely shatters the entire concept of location privacy control currently pushed by major mobile operating systems like Apple's iOS and Google's Android.

In this paper, we focus on the technical approach in chapter II and do not discuss social engineering or game theory-based methods where the user is fooled into giving his location away either for social reasons (Facebook Places) or for virtual rewards (Gowalla, Foursquare).

The different approaches to control over a user's location data offered by the most popular mobile operating systems and browsers are briefly discussed and analyzed in chapter III to inform the reader how the user is given the illusion of control over his data. A few common attack vectors are then discussed, giving the reader insight into how their location may be inadvertently revealed.

In chapter IV of the essay body we briefly discuss the possible uses for users' location data, both more and less nefarious cases. We also briefly mention possible future threat scenarios stemming from massive data hoarding of location data by entities like Facebook, Foursquare and advertisers.

Finally, the conclusion sums up the current situation and suggests some guidelines for safely using location services on mobile devices and within modern browsers.

2. Technology

We begin by defining the problem we are studying. We have now been using mobile phones for nearly two decades and the Internet has been widely available to consumers for almost as long. What has changed? The answer is two-pronged.

2.1 HTML 5

The latest draft for the HTML 5 Geolocation API was published on February 10th, 2011 [3]. In practice, most browser vendors have supported the Geolocation features since the middle of 2009 [4].

This allows web pages that you visit to offer location-based services by asking the browser for your current position. A simple function call might look like this [5]:

Figure 1: Geolocation function call.

In other words, finding your location once involved using complex vulnerabilities to gain local access to your data, but today it's accessible through a simple API.

2.2 Smartphones

Until a few years ago, a smartphone was a phone that could hold both your calendar and your contacts. If you were lucky, it even included a low-resolution camera and an MP3 player. This all changed in 2007 when Apple launched the iPhone. "We wanted to make a phone so great, you couldn't imagine going anywhere without it," Jobs said in an interview [6].

While we in this paper are not seeking to pick the "greatest" smartphone platform, it's clear that the launch of the iPhone was the start of what some might call a smartphone renaissance. When other handset makers noticed iPhone sales taking off, touch screen-equipped phones quickly became the new standard to aspire to. Today there are at least five major mobile touch-enabled smartphone operating systems competing for market share: Apple's iOS, Google's Android, Nokia's Symbian, Palm's WebOS and Microsoft's WP7. We focus here on the first two OS's as they represent the fastest growing ecosystems today.

The relevant things these platforms have in common are that they all have application distribution services ("app stores"). These app stores are open to anyone registering themselves as a developer, and the platforms have very accurate and sophisticated Geolocation APIs that are easy to use, even for inexperienced developers. Figure 2 illustrates just how easy it is to initialize the relevant objects and start receiving location data from a smartphone.

3. Control

According to the HTML 5 specification [3], when a function call like the one in Figure 1 is made, "An … implementation of this specification must provide a mechanism that protects the user's privacy and … ensure that no location information is made available through this API without the user's express permission."

In other words, the responsibility is placed on the user of the browser to protect his own privacy by denying or accepting every request made while surfing the web. Unfortunately, it has been confirmed time and time again [7] that most users are not very careful and will gladly click any OK buttons thrown at them as long as they get to see the latest video of a funny cat or help a poor Nigerian prince with his money transfers.

Smartphones handle this issue in a similar way. The user is asked either at install time (Android) or at runtime (iOS) whether the application should be allowed to access location information or not. It is debatable which of these models provides the user with better protection. At first glance, iOS has the upper hand as it allows the user to block location services at runtime, while the Android user is forced to choose between installing the application, accepting its use of the location APIs, or simply not being able to use the application at all.

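The runtime permission flow described above can be modeled in a few lines. This is a minimal sketch in plain Python, not the real iOS or Android API (all names here are hypothetical): a location manager only delivers a position when the simulated user grants the request, and otherwise fires a denial callback, so the application never sees the coordinates directly.

```python
class LocationManager:
    """Toy model of a runtime (iOS-style) location permission flow."""

    def __init__(self, user_grants, position):
        self.user_grants = user_grants  # simulated user decision at the prompt
        self.position = position        # (latitude, longitude)

    def request_location(self, on_success, on_denied):
        # The OS prompts the user; the app only ever sees a callback.
        if self.user_grants:
            on_success(self.position)
        else:
            on_denied("user does not want to share the location at this time")

results = []
manager = LocationManager(user_grants=False, position=(60.45, 22.28))
manager.request_location(results.append, lambda msg: results.append(("denied", msg)))
print(results[0])  # the app learns only that the request was denied
```

The install-time (Android-style) model differs only in when the `user_grants` decision is made: once, before the application runs, rather than per request.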
Figure 2: Android vs. iOS code sample.

Research has shown [8] that as many as 67% of Android applications that request location services use location data and other private data to increase their ad revenues or otherwise misbehave in a way not documented in the application description. A valid assumption would be that the percentage is similar for other smartphone platforms.

However, so far application developers have relied on the user's ignorance or willingness to accept reduced location privacy to access otherwise inaccessible applications or features. What happens if we have an intelligent user that blocks the application's location API request?

3.1 What if the user says no?

For the moment, let us assume the position of a malicious application developer, interested in knowing the location of the users of an application, whether they agree or not. For the purposes of this exercise, let's assume we are developing an iOS application.

The developer begins by trying the regular way, asking the user for permission to access the phone's location, much like in Figure 3. If the user accepts, that's that. If the user denies the request, the location object will issue a callback informing him that the user does not want to share the location at this time [9].

Figure 3: Typical iOS location request. Figure 4: Typical Android location request.

One option at this stage is to simply not allow usage of the application if location services are not enabled, thus forcing the user to enable them. However, a much nicer option is to explore alternative methods. For example, a developer might connect the fact that most smartphone users tend to prefer browsing over WiFi, since mobile data is somewhat expensive, with the fact that Google has built up an extensive global database of wireless router BSSIDs (Basic Service Set Identifier, basically your router's MAC address).

With a simple API call, one can retrieve the BSSID of the AP (Access Point) the phone is currently connected to, and with another one send it off to Google. Google will respond with a JSON¹ object containing the latitude / longitude / accuracy for the given BSSID.

For the purposes of this essay, I wrote a short example application using the approach described above and ran it on my iPhone while in my apartment. Even though I denied the original location request, the application managed to pinpoint my location within 30 meters in under five seconds without giving the user any indication that his location had been accessed.

It has been demonstrated that similar approaches can be used to attack browser users, with location accurate to within 30 feet [10].

These cases clearly demonstrate that the privacy controls offered by browsers and smartphones are mostly just an illusion, designed to protect the user against obvious attacks and make him feel safe.

¹ Javascript Object Notation – an open standard for human-readable text exchange.

4. Using the data

There are three main types of location privacy abuse:

• Targeted location privacy abuse, such as when a spouse installs a malicious application on their partner's phone to be able to access their location at any time, or a developer writes a malicious application that only activates its more nefarious features on certain phones (e.g. if (thisIsSteveJobsiPhone == true), goto LocationAbuse;).

• Untargeted location privacy abuse, where developers choose to send location data to advertisers to get better fill rates or better CPM (cost per impression) for their ads.

• Finally, invisible untargeted gathering of location, where a potentially malicious entity gathers location data for a large amount of phones and stores it away for analysis.

The first case is perhaps the most obvious one. A jealous spouse, convinced that his / her partner is having an affair, installs an application on their phone to be able to read their SMS's or track their location. Very little can be done to prevent cases like these, since the person with malicious intent has physical access to the device, thus overriding any user safety warnings that might pop up the first time the application is run.

The second case is perhaps the most common one. Most advertisers require that the developer send the location of the end user to them so they can provide context relevant mobile advertisements. Since many apps rely on a "free" model, where the end user pays by enduring ads in an application or on a web page, developers are sometimes strong-armed into making this less than ethical choice. While it has been shown that users are prepared to share their location information for the right price [11], this is not always the case, especially if the main functionality of the app is available even without location services enabled.

The third case is perhaps the scariest one. Today, most social networks (Facebook, Twitter, Foursquare to name a few) are actively gathering location data for their users by offering them virtual rewards for providing accurate location data, or simply recording any data available to them if the user denies location requests. While we have yet to see any major abuse of this available data, it's not a big stretch of the mind to think that if profits stop soaring, some of these companies might consider selling this data to third parties. Of course, many other applications available for smartphones may also be recording our movements and saving them in databases in much the same way, with the difference that social networks are, for the most part, open and unashamed about it.

It has been shown [12] that with access to enough accurate location data, it becomes trivial to predict user movements with a high degree of certainty. Imagine then a malicious entity with access to data from your home network about the number and type of devices you have connected, and access to your normal work schedule (e.g. goes to work at 8 am, never home before 3 pm). It's easy to argue that as criminals become more sophisticated, IT assisted robberies might become more commonplace.

Conclusions

In this paper, we have demonstrated that location privacy has become a more pressing issue with the continuing development of location-enabled consumer technologies. The argument is made by both demonstrating the lacking safety precautions taken by browser vendors and mobile OS vendors to ensure location privacy and simultaneously presenting possible nefarious use cases for this information. In addition, the paper gives a brief introduction to the different safety models utilized by mobile OS vendors.

References

[1] Gartner Press Release, Feb 9, 2011, http://www.gartner.com/it/page.jsp?id=1543014

[2] Alastair Beresford, Frank Stajano, "Location Privacy in Pervasive Computing", Pervasive Computing, vol. 1, 2003, pp. 46-55.

[3] Andrei Popescu, Google, Inc., HTML Geolocation API Specification, http://dev.w3.org/geo/api/spec-source.html

[4] Steve Block, Noam Ben Haim, "The blue circle comes to your desktop", http://google-latlong.blogspot.com/2009/07/blue-circle-comes-to-your-desktop.html

[5] Daniel Bryant, "A quick test of the HTML5 geolocation API with Firefox 3.5", http://tai-dev.blog.co.uk/2009/07/17/a-quick-test-of-the-html5-geolocation-api-with-firefox-3-5-spookily-accurate-on-my-work-pc-6533920/

[6] Jefferson Graham and Edward C. Baig, "iPhone's launch gives Apple's Steve Jobs butterflies", USA TODAY, http://www.usatoday.com/tech/wireless/phones/2007-06-28-iphone-launch_N.htm

[7] Serge Egelman, Lorrie Faith Cranor, and Jason Hong, "You've been warned: an empirical study of the effectiveness of web browser phishing warnings", in Proceedings of the twenty-sixth annual SIGCHI conference on Human factors in computing systems (CHI '08), ACM, New York, USA, 2008, pp. 1065-1074.

[8] William Enck, Peter Gilbert, Byung-Gon Chun, Landon P. Cox, Jaeyeon Jung, Patrick McDaniel, Anmol N. Sheth, "TaintDroid: An Information-Flow Tracking System for Realtime Privacy Monitoring on Smartphones", 9th USENIX Symposium on Operating Systems Design and Implementation (OSDI '10), 2010.

[9] CLLocationManager Class Reference, http://developer.apple.com/library/ios/#documentation/CoreLocation/Reference/CLLocationManager_Class/CLLocationManager/CLLocationManager.html

[10] "Hacker uses XSS and Google Street View data to determine physical location", http://www.securityweek.com/hacker-uses-xss-and-google-streetview-data-determine-physical-location

[11] George Danezis, Stephen Lewis and Ross Anderson, "How Much is Location Privacy Worth", University of Cambridge, Computer Laboratory, 2005, http://infosecon.net/workshop/pdf/location-privacy.pdf

[12] Andrei Papliatseyeu, Oscar Mayora, "Mobile Habits: Inferring and predicting user activities with a location-aware smartphone", University of Trento, 2010, http://popleteev.com/static/pdf/2008_UCAmI-2008.pdf

Online Social Networks

Joacim Päivärinne Åbo Akademi University, Information Technologies [email protected]

Abstract—Ever since the inception of Web 2.0, a flurry of further understand the network structure. online social networks (OSN) have risen. These networks enable users to share content, as well as keep social relation(ship)s alive. One of the latest OSNs to take center stage is Twitter. Twitter is II. SOCIAL NETWORK SITES more commonly known as a microblogging service. These kinds of services deviate from typical blog sites and OSNs such as Users, links and groups are the usual building blocks of a Facebook, in that its approach is very minimalistic. Generally, social network site. Users are registered participants of the light-weight network applications such as Twitter require less network, and make up the main actor of the site. To sign up to time and effort from its users. a social network site, one usually most provide a username, e- This paper consists of two parts. The first part contains an mail and password. The user may add additional information analysis of Twitter’s network properties. In the second part we’ll about himself/herself, which makes up his or her profile. The analyze user intention and why these kinds of networks matter. links represents the relationship between users. These can vary within different networks, for example: business contacts, Index Terms—social network sites, Twitter, microblogging, network properties. family, friends and others with whom one shares similar interests. It’s also quite common for similarly interested users to create groups. Members of such groups can share various media related content such as pictures, videos and I. INTRODUCTION communication within their own sub-network [1]. While microblogging and social network sites (SNS) are fairly new concepts, their building blocks are not, but rather a collection of popular functionality from other sites. A few of A. 
Definition these functionalities including being able to share content, According to [4] a social network site is a web-based maintain social relations and keep up to date on what’s service enabling its users to: happening. SNSs have become immensely popular with sites 1. Create a profile in the network. The profile can be either like MySpace, Facebook and Twitter leading the way. In this public or semi-public. paper, the latter will be our point of focus. All these sites have 2. Add other users as their friends, resulting in what’s a seen a rapid growth over some period of time, with Twitter referred to as a “friends list”. expanding over 2000% in 2007 to 2008 [1]. 3. Go through their friends list as well other users’ friends Sites like Twitter are linked to Web 2.0, and are centered lists. around its users. Users share content, converse, and seek out Boyd et al. [4] also makes a distinction between social and provide information to those of similar interest. With network sites and social networking sites, with the latter Twitter also being used by celebrities, major corporations and having a greater focus on creating new relations. This is not new sites, it provides a perfect ground for keeping up to date necessarily the primary function of SNSs, where members on what the current trends and hot topics are. Twitter, as a typically seek out and communicate with real life friends light-weight and fast communication tool, provides an easy rather than strangers. way of sharing what one is currently doing. Typical content is The most vital part of any SNS is the user profile. Each user created by posting messages of one’s daily activities, opinions has a unique page, containing personal information, as well as and status. Users can choose whether or not these messages, the already aforementioned friends list. The most basic known as tweets, are to be made public or not. SNSs also information displayed is usually gathered when registering. 
provide a home for discussion for various social groups Afterwards, user have the option of adding additional content online, be it racial, sexual, religious, or nationality based. such as interests, profile photo, nationality etc. SNS usually With SNSs being such an important tool in online differ from each by their technical features. Facebook, interaction, it’s important to understand why people use SNSs probably the most popular SNS in time of writing, is also and how they go about it. SNSs are believed to have to a great known for having a wide range of different applications such impact on the future of the Internet. Their popularity and as games and quizzes [4]. familiarity alone provide a good starting bases for further Different SNS have different policies on profile visibility. evolving such systems, their clients and structure. Thusly, this User profiles on sites such as Friendster and Tribe.net are paper will present some of the user intentions in Twitter, one visible to anyone, due to search engine crawling. Visibility on of the most popular SNSs to date. We will also present how LinkedIn is determined upon account type. MySpace profiles one might go about identifying “important” people on said are public or restricted to friends only. Facebook changes their site, as well as some of Twitter’s network properties to try to policy ever so often, but by default users in the same network 29 2 can view each other’s profiles [2]. had to endure having their employers and former classmates Upon joining a SNS, the service allows identification of online as well. Friendster also began imposing restriction upon friends. Typically a bi-directional approval is used, i.e. both its fan base. Since profile viewing were restricted to four- users must confirm the relationship. However, this is not degrees, users started collecting friends to view additional always the case. Twitter uses a one-directional tie for what is profiles. 
The biggest collectors were fake profiles, referred to called “following”, which we discuss later in this paper [2]. as Fakesters. Fakesters represented famous people and other Another key part of any SNS is the friends list. These lists, abstract and general ideas. This did not go over well with the often public, visualize and make up social networks. Not only company, and it increased the rapture between Friendster and does this allow for users to iterate through a friends list, but its users. one can also view which contacts they share, or have in SNSs became widely popular in around 2000 and other common with another user. Not all friends list are public, content oriented sites like Flickr, Last.FM and YouTube began although, if user A can view user B’s profile, A will more often adding SNS features to their sites. With Friendster deleting than not also be able to see B’s friends list. There are Fakesters and moving more towards a fee-based service, its exceptions however, as most SNS define their own set of users encouraged the user of other SNSs. At that time, privacy settings [2]. MySpace was the most popular choice. MySpace welcomed Users can also communicate via comments and private bands, especially indie rock bands. The same group of people messages. Comments are publicly visible, depending on the were among the majority of those kicked out of Friendster privacy settings of the users interacting, messages on the because of breaching profile regulations. MySpace openly recipient’s profile. Private messages are SNSs’ equivalent to e- supported musicians, and promoters used the service for mail. SNSs also provide numerous of popular features and advertising events at clubs etc. Another thing that made services like photo and video sharing (as well as tagging), IM MySpace popular were that the site added features according (instant messaging), and mobile specific interactions [2]. to user demand. But MySpace was not without its controversy. 
In 2005, safety issues related to sexual predators resulted in an exaggerated panic.

B. History

Given the SNS definition in the previous section, Boyd et al. [4] lists SixDegrees.com as the first true social network site. It featured the three main features of creating a profile, listing friends and being able to browse through other users' friends lists. Note that these features were not new and introduced with the birth of SNSs, but rather gathered from other online services such as dating and community sites. However, these sites did not combine all these features in one package. As an example, AIM and ICQ users could browse through their friends lists, but these contacts were not visible to other users.

SixDegrees closed in 2000. The creator has later said that he thought the service was ahead of its time. Other possibilities that may have contributed to its demise were that people did not have enough friends online to keep the network relevant, and that after the initial registration there wasn't that much to do, as people did not want to meet nor reach out to strangers.

In and around 2000, a number of SNSs launched. Friendster, LinkedIn, Ryze and Tribe.net all tried to work together, specifically focusing on different aspects of the social network. However, Ryze never attracted any major attention, Tribe.net attained a niche fan base, LinkedIn became a business-oriented network and Friendster, although the most important of the four in SNS history, was called "one of the biggest disappointments in Internet history" [4]. Friendster focused on the idea of matchmaking friends of friends, with the hypothesis that it would make for better relationships compared to those between strangers.

However, as the service grew and more users flocked to it, Friendster experienced serious difficulties, both technically and socially. Technically, the hardware (servers) and databases used were not intended to handle the enormous amount of traffic. Socially, as the number of users increased, online relations grew beyond those of close friends.

The most popular site as of this writing is Facebook. It started out as a Harvard-only college network. However, as its popularity increased, other colleges and institutions were invited to join, and eventually Facebook opened up to everyone. One of the main attractions of Facebook in the early days, with the college/institution email requirement, was that it had a feeling of an "intimate, private community" [4]. Another distinct feature of Facebook is that it allows developers to write applications. Applications can be anything from a multiplayer game to a profile editing tool.

Research shows that SNSs are growing ever more popular all around the world. While investors and corporations see this as a business opportunity, other companies block their employees from accessing these sites at work. For example, the US military blocked MySpace, the Canadian government restricted Facebook access, and the US Congress is also considering blocking access to SNSs altogether in schools and libraries [4].

C. Privacy

In the media, privacy is often raised in conjunction with the concern of protecting young SNS users. More generally, privacy concerns could arise with any user unaware of the "public nature of the Internet" [4]. In 2005, Gross and Acquisti conducted research which showed that information stored in users' profiles can be used to potentially figure out one's social security number.

Dwyer et al. [11] implied that people are more willing to share content on a SNS if they feel that they can trust the site, and depending on why they are using the site. Jagatic et al. [12] pointed out another point of concern in SNSs in 2007: publicly available profile data can be used for phishing, since friends (of "exploited" profiles) will be more likely to share information, especially if said friends think it originates from a trustworthy source.

Boyd et al. [4] reports that 55% of teenagers have online profiles. Approximately two thirds of those profiles are not public, and roughly half of all the public profiles (associated with teenagers) contain some incorrect data.

Preibusch et al. [13] state that since people have different opinions and conceptions of what privacy is, the flexibility currently offered by SNSs to handle security concerns is not sufficient should complications arise. Besides the user aspect of privacy, there is also the legal aspect. Hodge [14] claimed the Fourth Amendment to the United States Constitution to be ill-equipped to handle privacy concerns related to SNSs.

III. TWITTER

Twitter was created in San Francisco by a 10-man team [5]. It is a free social network site and a microblogging service. Users communicate via messages limited to 140 characters, more commonly referred to as "tweets". Tweets are status updates, answering the question "what's happening?". However, according to [5], only 58.5% of the tweets in their test data answered that question. Tweets are displayed on a user's profile in reverse chronological order, and optionally addressed via the @ notation [5]. Forwarding messages, or "re-tweeting", makes said messages more publicly available.

Twitter differentiates itself from other SNSs as it is not as focused on social relations as it is on interest [6]. Users can choose to "follow" one another, be it out of common interests or friendship. Tweets of the user being followed are then sent to all of their "followers". A list of followers and those being followed by a specific user is accessible from that user's profile on Twitter.com.

Most profiles on Twitter are public, a conscious design decision made by Twitter's developers, since the site is based on informing people of current events. This makes Twitter relevant in real time. An example of this is CNN, which tweets breaking news on the site [6]. Honeycutt et al. [5] report that Twitter activity is higher during weekdays compared to weekends, and that the typical user is between 25 and 44 years of age. There are more than 100 third-party clients [5].

Twitter is a simple service. Its structure can be divided into nodes (users) and directed links (relations), making it a directed graph, ideal for analysis [1]. Twitter's front end is built on Ruby on Rails; their development favors and primarily uses Rails, as they like its capabilities and its use of AJAX for handling user interfaces. Their back end saw a shift from Ruby to Scala and the JVM, as reliability and heavy processing became major concerns [10].

A. Growth

Twitter employs sequential identification for both their users (user IDs) and tweets. This data is provided by the Twitter API. Thus an estimate of the service's growth can be made by following these numbers over a period of time. Historical data is not provided [3].

Twitter received attention in 2007 when it won the SXSW conference's Web Awards. According to [3], new tweets are approximately experiencing a twofold increase monthly. Its biggest growth was recorded between December 2008 and April 2009, with a peak in March 2009 [1][6]. In January 2010, Twitter had 75 million users [6]. Erick Schonfeld of TechCrunch reported on July 8th 2010 that Twitter saw a 190 million visitor count on a monthly basis, with 65 million tweets posted every day [8].

Java et al. [3] gives us the following definitions:
1. "A user is considered active during a week if he or she has posted at least one post during that week."
2. "An active user is considered retained for the given week, if he or she reposts at least once in the following X weeks."

IV. NETWORK PROPERTIES

The Internet, Twitter and other SNSs are all part of a class known as scale-free networks [1]. These networks are prone to exhibiting "a small world phenomenon", and their degree distribution follows a power law [1].

Teutle [1] asserts that a network with high degree correlation and high reciprocity equals mutual acquaintances. Most of Twitter's new users are a result of friend invitations. New acquaintances are added by browsing through the "followers" and "following" lists on other users' profiles. Statistically, the power law exponent for the in and out degree is approximately -2.4, similar to that of the Web (-2.1) [1].

A. Reasons for measurement

As social networks evolve and expand, they play an important part in online interactions, be it on a personal or commercial level. Due to this fact, it is important to understand the building blocks of SNSs, as they will most likely affect the future of the Internet. Trend-wise, they account for one of the biggest and most rapid growths in Internet traffic. SNSs also provide a platform for sociologists and marketeers performing research on various behavioral and relationship patterns. Twitter's one-year growth from March 2008 to March 2009, as reported by [1], can be seen below:

    Site        March 2008   March 2009   Growth
    Twitter        520,000   13,858,000    2565%
    Ning         1,463,000    5,609,000     283%
    Facebook    24,940,000   69,151,000     177%
    Bebo         2,483,000    6,149,000     148%
    LinkedIn       787,700   15,815,000     101%
    Multiply       780,000    1,527,000      96%

B. Geographical distribution

For those who choose to, Twitter can incorporate location into tweets. Java et al. [3] performed a test that included 76,000 users. Among these, 39,000 users' locations could be parsed, using Twitter's API to access the user information and Yahoo!'s Geocoding API to determine the geographic location.
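Because user IDs are handed out sequentially, the growth rate of the service can be estimated from two observations of the highest user ID. The sketch below illustrates the idea with invented sample values; the real numbers would come from the Twitter API.

```python
from datetime import date

# Hypothetical samples (invented values): (date, highest user ID seen).
# Since IDs are assigned sequentially, the difference between two samples
# approximates how many accounts were created in between.
samples = [(date(2009, 4, 11), 30_000_000),
           (date(2009, 4, 29), 32_700_000)]

def growth_per_day(samples):
    """Average number of new accounts per day between first and last sample."""
    (d0, n0), (d1, n1) = samples[0], samples[-1]
    return (n1 - n0) / (d1 - d0).days

print(f"~{growth_per_day(samples):,.0f} new accounts per day")
```

The same trick works for tweet IDs, giving an estimate of posting volume rather than account growth.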

American, European and Japanese users dominated the test group, with Tokyo, New York and San Francisco being the highest ranked cities.

Twitter enjoys a great following worldwide, and its social network is intercontinental. International relations were verified by extracting the start and end points' geographic coordinates (latitude and longitude, excluding elevation) for every edge in the graph. The intracontinental connections exceed the intercontinental ones. This could partially be explained by the fact that language plays a vital role in communication; people are also more prone to becoming acquainted if they are geographically close by. Java et al. [3] also concluded that the Asian and European communities exhibit a "higher degree correlation and reciprocity than their North American counterparts". Approximately half of Twitter's social network is based in the US. The following table represents Twitter's geographical distribution in 2007 as reported by [3]:

    Continent       User Count
    North America        21064
    Europe                7442
    Asia                  6753
    Oceania                910
    South America          816
    Africa                 120
    Others                  78
    Unknown              38994

C. Descriptive measures

Analyzing social networks means using social psychology methods rooted in the 1960s and 1970s. These methods were the result of a collaboration between sociologists and researchers. Social network analysis techniques today also incorporate mathematics, statistics and graph theory, amongst others. Nodes and links play an important part in these analysis techniques. Consider the following:

Strongly connected components: Directed graphs in which every vertex is connected by a path to all other vertices are called strongly connected. A decrease (in a sub-graph) increases the probability of re-tweets being available to a larger number of users.

In-degree: The number of incoming edges for a vertex. Corresponds to the follower count of a user.

Out-degree: The opposite of in-degree, i.e. the number of outgoing edges for a vertex. Corresponds to the following count of a user.

Complete graph: A graph in which there is an edge between every pair of vertices.

Clustering coefficient: A measure comparing the relative similarity between a vertex and its neighbors to a corresponding complete graph. If the coefficient sees an increase, the affected network becomes more internally connected. In Twitter, this would mean that tweets would become more widely available.

Network density: A network's edge count, divided by the network's greatest conceivable number of edges. In other words, a network density of 1 is the densest possible network.

Betweenness: If a vertex is part of a great number of shortest paths between pairs of random nodes, it has a high betweenness.

Closeness: Defined as "the mean of all shortest paths between a vertex and all vertices reachable from it." [1]

Following ratio: In-degree divided by out-degree. If this value is less than 1, the user's following count is greater than their follower count; this kind of account is typical for information seekers, and possibly bots. If it is close to 1, the degrees are similar. If it is above 1, the user's follower count exceeds their following count; these are typically popular users, who are themselves active users sharing valuable information with their followers. A user with a following ratio above 10 is quite possibly someone whom the media takes interest in, or exhibits too big a fan base to return the favor of following back [1].

V. USER INTENTION

Based on a blogger survey, Nardi et al. [15] report that the reason why people create and share content online is that they want to talk about their everyday experiences and opinions. Bloggers can also provide support for those going through a similar difficult time or situation as the bloggers themselves. This support goes both ways, as bloggers who get feedback from their readers, and from other users of the same community, are more likely to keep on posting content. A user often joins a network because of friends already present on said network. Without familiar faces, and no one to converse with, a network is more prone to becoming unimportant and irrelevant.

To explore user intention, Java et al. [3] used HITS, the hyperlink-induced topic search algorithm. This algorithm identifies hubs and authorities in a network, calculated as follows: the authority score of a page is derived from the hub scores of the pages linking to it, and the hub score from the authority scores of the pages it links to. Regarding a specific topic, authoritative web pages are "authoritative sources of information" [7]. A hub page is a web page that contains links to other web pages of authority. A hub is considered good if it contains links to several good authorities. An authority is considered good if it is contained in many good hubs [7].

Users with high authority and hub values have many followers and friends. Those that have a high authority but a low hub score have many followers but not as many friends. Low authority and high hub scores indicate a low friend count but a high following count. In [3], these three types are divided into 1) information sharing, 2) information seeking and 3) friendship-wise relationships.

After having identified hubs and authorities, communities are sought after. Java et al. [3] define a network community as "a group of nodes more densely connected to each other than to nodes outside the group". Since only friendship-wise communities were considered, Java et al. [3] used CPM, the clique percolation method, to detect communities that are dense and overlap one another. A community member is not necessarily connected to all other community members; however, the vast majority of members are connected to several other members in a community. Identifying communities using CPM is based upon finding every union of k-cliques reachable from one another by adjacent k-cliques. A pair of k-cliques are adjacent when sharing k-1 nodes [3].

Communities are generally brought together by people that share an interest or are looking for a certain topic. Needless to say, users are more prone to engaging in community activity where they share similar interests with the rest of the community members. Some of the members function as information providers, while others are simply searching for new or interesting information. Based on their analysis and test data, using HITS and CPM, Java et al. [3] concluded that the most popular user intentions on Twitter are the following:
1) Daily chatter, which remains the most popular use of Twitter.
2) Conversations, corresponding to approximately 13% of all posts by 21% of users.
3) Information sharing, 13% of all posts, usually containing URLs.
4) News reporting, which has seen an increase due to the availability of the Twitter API [3].

VI. TWITTER CRAWLING & EVALUATION

There are at least two ways of crawling Twitter for data: one of them is a web-based strategy, whilst the other uses the Twitter API to gather information. A web-based crawling method simply parses the HTML tag data present in a web page. Not only is this method time consuming, but it can also return a lot of unimportant and unessential data. API-based crawling is much more specific, and there needs to be very little, or close to no, overhead in the parsing. Twitter4J, or Twitter for Java, is a Java library for the Twitter API, released under the Apache License 2.0. Twitter moved from Basic Authentication to OAuth in August 2010, claiming an improvement in security and experience. Examples of how to set up OAuth and Twitter4J are available at [9]. To gather numerical data, [1] used breadth-first search to create a snapshot of Twitter's network. The snapshot, represented as a tree, included 14,148 nodes [1].

A. Numbers

Using four different captures between April 11th 2009 and April 29th 2009, the following descriptive measures were captured: messages posted, in-degree (followers), out-degree (following), following ratio, clustering coefficient, betweenness, closeness, network density and strongly connected components. Teutle [1] reported the following data:

    Messages posted   Capture 1   Capture 2   Capture 3   Capture 4
    Max                 173,846     176,690     177,004     178,374
    Mean                  1,144       1,243       1,254       1,300
    Min                       0           0           0           0

Per user, 80% have contributed with a post count over 1,500, whilst 5% had a post count over 5,000. Using a cumulative distribution function, Teutle [1] determined that the "overall distribution maintains its shape", meaning "most of the active users post regularly."

    In-degree   Capture 1   Capture 2   Capture 3   Capture 4
    Max           884,631   1,318,608   1,350,777   1,504,914
    Mean            4,925       6,420       6,584       7,293
    Min                 0           0           0           0

In capture 1, approximately 25% of the users have 50 followers or more. Changes in the in-degree CDF were a result of an increased number of followers, and in capture 2 the group of people with 50 or more followers only made up approximately 20%. It is also noteworthy that famous people, or people with more than 600 followers, tend to have a very fast increase in their follower count. Many Twitter users register only to follow their favorite "celebrity". The user aplusk (Ashton Kutcher) saw an increase from 812,697 followers in capture 1 to 1,504,914 followers in capture 2.

    Out-degree   Capture 1   Capture 2   Capture 3   Capture 4
    Max            628,151     628,151     763,578     765,408
    Mean             1,492       1,819       1,846       1,954
    Min                  0           0           0           0

Changes in out-degree are not as eventful as in in-degree. CDF changes could be the result of the following:
1. Newly registered users that initially follow a great number of people fine-tune their following.
2. Highly followed people begin to follow their subscribers.
3. Bots collecting data.

    Following ratio   Capture 1   Capture 2   Capture 3   Capture 4
    Max                 884,631     183,579     183,579     211,879
    Mean                    194         194         197         215
    Min                       0           0           0           0
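The hub and authority scores used in the user-intention analysis above can be computed with a few lines of power iteration. The following is a minimal, illustrative sketch of the HITS algorithm on a toy "who follows whom" graph (the node names are invented), not the exact setup used in [3].

```python
def hits(edges, iterations=50):
    """Iteratively compute HITS hub and authority scores: a node's authority
    score is the sum of the hub scores of nodes linking to it, and its hub
    score is the sum of the authority scores of nodes it links to."""
    nodes = sorted({n for e in edges for n in e})
    hub = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        auth = {n: sum(hub[u] for u, v in edges if v == n) for n in nodes}
        norm = sum(a * a for a in auth.values()) ** 0.5 or 1.0
        auth = {n: a / norm for n, a in auth.items()}
        hub = {n: sum(auth[v] for u, v in edges if u == n) for n in nodes}
        norm = sum(h * h for h in hub.values()) ** 0.5 or 1.0
        hub = {n: h / norm for n, h in hub.items()}
    return hub, auth

# Invented example: 'seeker' follows two accounts; 'celebrity' is followed
# by both of the others. An edge (u, v) means "u follows v".
follows = [("seeker", "celebrity"), ("seeker", "news"), ("news", "celebrity")]
hub, auth = hits(follows)
print(max(auth, key=auth.get))  # 'celebrity': highest authority (many followers)
print(max(hub, key=hub.get))    # 'seeker': highest hub score (follows good authorities)
```

This mirrors the classification in the text: high authority corresponds to many followers (information sharing), high hub score to a high following count (information seeking).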

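The clique percolation method used for community detection in [3] can likewise be sketched compactly. The version below is a naive, brute-force CPM intended only for tiny example graphs (real implementations enumerate maximal cliques instead of all k-subsets); the example graph is invented.

```python
from itertools import combinations

def k_clique_communities(edges, k=3):
    """Naive clique percolation method (CPM): a community is the union of
    all k-cliques reachable from one another through adjacent k-cliques,
    where two k-cliques are adjacent if they share k-1 nodes."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    nodes = sorted(adj)
    # Enumerate every k-clique (exponential; fine for small graphs only).
    cliques = [frozenset(c) for c in combinations(nodes, k)
               if all(b in adj[a] for a, b in combinations(c, 2))]
    # Union-find over cliques: merge cliques sharing k-1 nodes.
    parent = list(range(len(cliques)))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) >= k - 1:
            parent[find(i)] = find(j)
    groups = {}
    for i, c in enumerate(cliques):
        groups.setdefault(find(i), set()).update(c)
    return [frozenset(g) for g in groups.values()]

# Two triangles sharing node 2: CPM finds two overlapping communities,
# and node 2 belongs to both, as the text describes.
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (2, 4)]
print(sorted(sorted(c) for c in k_clique_communities(edges, k=3)))
# -> [[0, 1, 2], [2, 3, 4]]
```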
Approximately 50% of users have the same in- and out-degree. The following ratio's proportion did not experience any major changes over time. Changes in in-degree (also affecting the following ratio) are more likely the result of a certain user experiencing a follower increase, rather than one user increasing their following count drastically.

The clustering coefficient was only recorded during a one-day span, with no change in value. Teutle [1] also notes the possibility that 45% of the users have a zero or almost non-existent out-degree. Users no longer using Twitter would also affect this number. Users with a strong clustering coefficient "deploy stronger communities that exchange messages among them." Compared to users with a lower clustering coefficient, they will receive network updates faster.

The betweenness CDF in the snapshot of [1] indicated that almost 20% of users have small in- and out-degree values, playing an irrelevant part in the network. Those with a high betweenness value usually have a following ratio close to 1, i.e. they both follow other users and have subscribers.

The closeness value was between 0.22 and 0.3 for 70% of the users in the study. Re-tweets and trending topics have a higher tendency of being broadcast in a rapid fashion by users with a high closeness value.

On a day-to-day basis, the network's strongly connected communities saw a very subtle change. This should be interpreted in such a way that communities are not easily and quickly built. The network density also did not experience a radical change in value, as saturation sets in to people's subscriber counts. As Twitter keeps on growing, however, this value will likely see a small decrease.

VII. CONCLUSION

There is no denying that SNSs are an important part of today's society and social online interaction. Their user numbers alone indicate their status. With such a vast group of people using them, SNSs contribute and provide a familiarity to not only their UI, but their functionality as well. This familiarity will most likely dictate and shape the future of the Internet.

This paper presented an overview of key terms in online social networks, their history, security concerns, algorithms used to determine user intention, descriptive measures for analyzing social graphs as well as network properties, while focusing on Twitter as our prime example of a rapidly growing SNS. Twitter is an ideal example because its idea is fairly simple and analysis techniques work well with its structure. Studies have shown that Twitter might also be a good platform for online research collaboration. This would bring another element to SNSs, as they would not only be a platform for keeping up to date with friends, various news sites and celebrities, but for researchers and collaborators as well.

REFERENCES
[1] Abraham Ronel Martínez Teutle, "Twitter: Network Properties Analysis", 2010 20th International Conference on Electronics, Communications and Computer (CONIELECOMP), p. 180-186
[2] Bernardo A. Huberman, Daniel M. Romero and Fang Wu, "Social networks that matter: Twitter under the microscope", Social Computing Laboratory, HP Labs, First Monday, Volume 14, Number 1 - 5, January 2009
[3] Akshay Java, Xiaodan Song, Tim Finin, and Belle Tseng, "Why We Twitter: Understanding Microblogging Usage and Communities", Proceedings of the Joint 9th WEBKDD and 1st SNA-KDD Workshop 2007
[4] Danah M. Boyd and Nicole B. Ellison, "Social Network Sites: Definition, History and Scholarship", Journal of Computer-Mediated Communication, 13 (1), article 11
[5] Courtenay Honeycutt and Susan C. Herring, "Beyond Microblogging: Conversation and Collaboration via Twitter", Proceedings of the Forty-Second Hawai'i International Conference on System Sciences (HICSS-42), p. 1-10
[6] Pieter Noordhuis, Michiel Heijkoop and Alexander Lazovik, "Mining Twitter in the Cloud: A Case Study", 2010 IEEE 3rd International Conference on Cloud Computing (CLOUD), p. 107-114
[7] Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze, "Introduction to Information Retrieval", Cambridge University Press, 2008, chapter 21
[8] Erick Schonfeld, "Costolo: Twitter Now Has 190 Million Users Tweeting 65 Million Times A Day", URL: http://techcrunch.com/2010/06/08/twitter-190-million-users/, retrieved 2nd March 2011
[9] Twitter4J, URL: http://twitter4j.org/en/index.html, retrieved 2nd March 2011
[10] Bill Venners, "Twitter on Scala", URL: http://www.artima.com/scalazine/articles/twitter_on_scala.html, retrieved 2nd March 2011
[11] Catherine Dwyer, Starr R. Hiltz and Katia Passerini, "Trust and privacy concern within social networking sites: A comparison of Facebook and MySpace", URL: http://csis.pace.edu/~dwyer/research/DwyerAMCIS2007.pdf, retrieved 2nd March 2011
[12] Tom Jagatic, Nathaniel Johnson, Markus Jakobsson and Filippo Menczer, "Social phishing", URL: http://www.indiana.edu/~phishing/social-network-experiment/phishing-preprint.pdf, retrieved 2nd March 2011
[13] Sören Preibusch, Bettina Hoser, Seda Gürses and Bettina Berendt, "Ubiquitous social networks – opportunities and challenges for privacy-aware user modelling", URL: http://vasarely.wiwi.hu-berlin.de/DM.UM07/Proceedings/05-Preibusch.pdf, retrieved 2nd March 2011
[14] Matthew J. Hodge, "The Fourth Amendment and privacy issues on the 'new' Internet: Facebook.com and MySpace.com", URL: http://www.law.siu.edu/research/31fallpdf/fourthamendment.pdf, retrieved 2nd March 2011
[15] Bonnie A. Nardi, Diane J. Schiano, Michelle Gumbrecht and Luke Swartz, "Why We Blog", URL: http://www.darrouzet-nardi.net/bonnie/pdf/Nardi_why_we_blog.pdf, retrieved 2nd March 2011

Host Identity Protocol

Network Software

Haider Raza 34160 Åbo Akademi 2011

Abstract:-

This essay covers the main aspects of the Host Identity Protocol (HIP): its structure, its security mechanisms, multihoming, and a comparison of HIP with other protocols. It explains why HIP is needed for secure communication and why it should be deployed in the current Internet environment. The essay also covers some advantages of the Host Identity Protocol and its usage in wireless networks.

Table of contents:-

Introduction

Structure

Security

Multihoming

Comparison

Summary

References

Introduction:-

Nowadays, two namespaces are used in the Internet, provided by the Internet Protocol (IP) and the Domain Name System (DNS). The role of these namespaces is to provide the essential packet transport infrastructure and services. IP addresses provide both the location name and the networking interface name; hence, IP addresses are responsible both for identifying hosts and for delivering data packets safely to their destination.

The DNS namespace is used for assigning DNS names to IP addresses, and the DNS name corresponding to an IP address can easily be acquired.

Several problems occur in relation to these namespaces: they become overloaded as more functionality extensions (such as the WWW protocol) are added on top of them; dynamic re-addressing is not possible, since the IP address works both as an identifier and as a locator; and DNS names and IP addresses are public and registered, hence insecure.

In order to address these problems, a new namespace that provides a solution to them was introduced: the Host Identity Protocol (HIP) namespace. HIP was placed between the transport layer and the networking layer.

The Host Identity Protocol separates the host's identity from its location. It is used for host identification in Internet Protocol networks and provides a new cryptographic namespace. It accommodates security, mobility and multihoming. Although IPv4 and IPv6 do not work together natively, HIP provides interoperability between them. HIP is built around the concept of host identifiers.

The next sections describe further aspects of HIP: its structure, its security, multihoming, and a comparison of HIP with other protocols.

STRUCTURE:-

The HIP protocol stack consists of the usual IP stack with a HIP layer inserted between the transport layer and the network layer. As in the old IP protocol stack, the transport layer is accessed through standard sockets and remains unchanged.

In this new stack, sockets are initialized with host identifiers instead of IP addresses. When the transport layer interacts with the HIP layer, the host identifiers are translated locally into IP addresses; the HIP layer then finds the correct IP route and passes the packet down to the network layer.

When the HIP layer orders the network layer to start a session, two Security Associations (SAs) are created using IPsec in transport mode. After the Security Associations have been established, they control the rest of the communication.

The figure below shows the architecture of the HIP protocol stack.

SECURITY:-

The host identity is a public key, and ownership of it is proven using the corresponding private key. It is used for setting up the HIP association and for host authentication in the Host Identity Protocol. Public-key cryptography involves the use of asymmetric algorithms and includes digital signatures.

In the Host Identity Protocol, traffic is secured with the IPsec Encapsulating Security Payload (ESP). Different Security Associations are kept distinct by the Security Parameter Index (SPI) field in the ESP header. The ESP SAs are established during the HIP base exchange and are bound to Host Identity Tags (HITs). A HIT is produced by taking a cryptographic hash of the public key that constitutes the Host Identity, giving a fixed-length, 128-bit representation of the Host Identity.
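As a rough illustration of how a HIT could be derived from a public key, the sketch below simply truncates a SHA-256 digest to 128 bits and prints it like an IPv6 address. This is a deliberate simplification made for this essay: actual HIP HITs follow the ORCHID format (RFC 4843), which reserves a fixed IPv6 prefix and uses a specific hash construction, and the key bytes below are placeholders.

```python
import hashlib

def host_identity_tag(public_key: bytes) -> str:
    """Illustrative 128-bit Host Identity Tag: a cryptographic hash of the
    host's public key, truncated to 16 bytes and formatted like an IPv6
    address. (Real HIP uses the ORCHID construction, not this sketch.)"""
    digest = hashlib.sha256(public_key).digest()[:16]  # keep 128 bits
    return ":".join(digest[i:i + 2].hex() for i in range(0, 16, 2))

# Placeholder key material; a real Host Identity would be an RSA/DSA key.
key = b"-----BEGIN PUBLIC KEY----- example key material"
print(host_identity_tag(key))  # same key always yields the same HIT
```

The important property this preserves is that the tag is a fixed-length, statistically unique name derived from (and verifiable against) the public key itself.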

MULTIHOMING:-

This aspect of the Host Identity Protocol divides into two parts: host mobility and multihoming. Host mobility is the phenomenon in which a host changes its location inside a network, moving between IP addresses or between different access technologies.

HIP host multihoming is a technique in which several communication paths are available to a HIP host. The host has several options for delivering packets to a peer node, as it has several IP addresses on different interfaces, linked to different access networks.

A HIP host has to inform its peer nodes that it has multiple IP addresses representing its location in the network. For this purpose, the LOCATOR parameter was introduced in the HIP protocol; it contains all the IP addresses at which the host can be reached.
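A minimal sketch of how a host might track a peer's locator set is shown below. The class name, fields and failover policy are illustrative assumptions for this essay, not the actual HIP wire format or state machine.

```python
# Hypothetical sketch: a HIP host keeps, per peer, the Host Identity Tag
# and the set of addresses announced in the peer's LOCATOR parameter, and
# falls back to the next address when the preferred one becomes unreachable.
class PeerState:
    def __init__(self, hit, locators):
        self.hit = hit                   # peer's Host Identity Tag
        self.locators = list(locators)   # addresses from the LOCATOR parameter

    def update_locators(self, locators):
        """Peer moved or gained an interface: replace the locator set."""
        self.locators = list(locators)

    def next_locator(self, failed=()):
        """Pick the first locator not known to have failed."""
        for addr in self.locators:
            if addr not in failed:
                return addr
        return None  # no reachable locator left

peer = PeerState("2001:10::1", ["192.0.2.1", "198.51.100.7"])
print(peer.next_locator())                      # 192.0.2.1
print(peer.next_locator(failed={"192.0.2.1"}))  # 198.51.100.7
```

The key point mirrored here is that upper layers keep talking to the stable HIT while the set of usable IP addresses underneath changes.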

COMPARISON:-

This section compares HIP with two other protocols, MIP and SHIM6. We first define both MIP and SHIM6 before comparing them with HIP.

MIP

The purpose of Mobile IP (MIP) is to provide location-independent routing of IP datagrams, giving a mechanism for efficient roaming in the Internet. Using this protocol, nodes can change their point of attachment without changing their home IP address, which allows transport and higher-layer connections to be maintained while roaming.

SHIM6

SHIM6 is a functional module at layer 3. The SHIM6 module translates upper-layer identifiers into currently active forwarding-layer locators, and it uses an IPv6 end-to-end header to signal the SHIM6 context. Its control elements include the initial (4-way) handshake and locator set exchange, locator list updates, explicit locator switch requests, keepalives, reachability probe exchanges and No-Context error exchanges.

Below is a comparison of HIP, MIP and SHIM6 [4].

SUMMARY:-

In this essay we have given an overview of the Host Identity Protocol, discussing several aspects of its architecture, security and multihoming, and comparing the Host Identity Protocol (HIP) with two other protocols, MIP and SHIM6.

HIP gives good support for host mobility, provides both network security and data security, and supports multihoming scenarios well.

REFERENCES:-

1. Patrik Salmela, "Host Identity Protocol", M2NM, Sydney, 17 October 2007

2. Pekka Nikander, "Host Identity Protocol", Ericsson Research NomadicLab and Helsinki Institute for Information Technology. (Date not available)

3. Henning Schulzrinne, "HIP and identifiers". (Date not available)

4. Dave Thaler, "A Comparison of Mobility-Related Protocols: MIP6, SHIM6, and HIP", draft-thaler-mobility-comparison-01.txt. (Date not available)

5. Lars Jessen Roost, Per Nesager Toft, Gustav Haraldsson, "The Host Identity Protocol: An Experimental Evaluation", Communication Networks 2005, Spring 2005, Aalborg University.

6. Teemu Koponen, Janne Lindqvist, Niklas Karlsson, Essi Vehmersalo, Miika Komu, Mika Kousa, Dmitry Korzun, Andrei Gurtov, "Overview and Comparison Criteria for the Host Identity Protocol and Related Technologies", 22.11.2005, Helsinki Institute for Information Technology, Helsinki University of Technology.

> Only for School Course Network Software Use < 1

Energy-aware Networking

Guopeng Yu

Åbo Akademi University Department of Information Systems

It has been found that networking devices, such as those in data centers and sensor networks, consume an enormous amount of energy, and this is one of the reasons for the IT industry's weak return on investment today. Given that the IT infrastructure has been experiencing rising operational costs, many researchers and professionals are calling for introducing energy-awareness into the operation and design of networks. In this paper, we focus on characterizing the term 'energy-aware networking' by adopting the experimental results of HP Labs and of Shah et al. (Energy Aware Routing for Low Energy Ad Hoc Sensor Networks). Additionally, we find that energy-aware networking can be achieved by adopting different algorithms.

Key Words—energy-aware network, sensor, algorithms

I. INTRODUCTION

The diminishing of fossil resources drives energy efficiency to become a key issue in both our daily life and all industries, including information technology. People are starting to be aware of the importance of energy resources, and researchers have been making efforts to provide solutions that save energy. For instance, in daily life, Loove Broms et al. [1], Swedish designers, created the 'Energy Aware Clock' with the main purpose of alerting people to how much energy their home uses throughout a whole day. The clock is designed with a 24-hour circular pattern to measure the use of energy; as the pattern becomes larger, it alerts people that more energy is being used.

In the IT industry, with the advent of the Cloud Computing model, large data centers are built to provide a great number of services over the Internet or enterprise networks. For instance, the 10-year electricity cost for each kW of IT load in a data center is at most $20,000 [2]. In Germany [3], information and communication technology consumes 55.4 TWh, which is nearly 10.5% of the complete energy consumption of the nation. Hence, it is all the more important to make networking more energy efficient in both enterprise networks and data centers.

Additionally, there has been a rise of interest in building and deploying sensor networks [4]: dense wireless networks of heterogeneous nodes collecting and disseminating environmental data. These sensor networks are based on small, lightweight, low-cost network elements, named PicoNodes, and they must adopt ultra-low power to eliminate frequent battery replacement.

In this paper, by discussing two research papers, from HP Labs [5] and from Shah et al. [4], we aim at exploring what energy-aware networking is and how it is used in network operations. The structure of the paper is arranged as follows: in section 2, we present our research methodology and research questions; in section 3, we discuss the two research papers and their results; in section 4, we provide the answers to the research questions; and section 5 is reserved for the conclusion.

II. RESEARCH METHODOLOGY AND RESEARCH QUESTIONS

Research methodology

Secondary data is adopted in this paper. According to Frankfort-Nachmias and Nachmias [6], secondary data analysis refers to research findings based on data collected by others. Bishop [7] states that 'in the secondary data analysis of qualitative data, good documentation cannot be underestimated as it provides necessary back ground and much needed context both which make re-use a more worthwhile and systematic endeavor'. Additionally, a research finding gains more credibility if it appears in a number of studies: the primary data can be compared with the data collected in earlier studies, providing a follow-up. Adopting secondary data analysis in this paper can enhance measurement, thereby gaining new insights into the terminology of energy-aware networking. Additionally, with secondary data analysis, the sample size, the representativeness of the samples, and the number of observations can be enlarged, and this can lead to more encompassing generalizations in the analysis of energy-aware network operations.

Research questions

Given that this paper is primarily written for a course assignment, we aim at exploring two research questions:
- What is energy-aware networking in the context of data centers and sensor networks?
- How is energy-aware networking used in network operations?

Manuscript received April, 2011 (date on which paper was submitted for review). Corresponding author: Guopeng Yu (e-mail: [email protected]).

III. ENERGY AWARE NETWORK OPERATIONS [5] AND ENERGY AWARE ROUTING FOR LOW ENERGY AD HOC SENSOR NETWORKS [4]

The two studies [4] [5] are chosen because they represent the basic use of energy-aware networks in the IT industry. In the following part, we are going to discuss how

> Only for School Course Network Software Use <

the energy awareness in these two research studies is adopted, separately for each study.

A. Energy Aware Network Operations

The research in [5] was carried out at HP Labs. The researchers found that networking devices nowadays occupy 15% of a data center's entire energy consumption. However, little attention has been paid to making the networks in enterprises and data centers energy efficient. Additionally, the majority of devices deployed today are far from energy proportional, and they expose only a limited set of knobs for controlling their energy consumption. Therefore, the researchers at HP Labs provide energy saving algorithms for the networking components in enterprise and data center networks, to enable the adoption of network-wide energy saving schemes.

In order to figure out the various control knobs that can be disabled to save networking power, the research group conducted a detailed power instrumentation study in their own data center. Figure 1 shows the resulting power consumption model for the enterprise switches.

(Figure 2: Data center architecture)

As presented in Figure 2, every single server holds two network interface cards, which are connected to different switches in the rack, named rack switches. However, at any given time, only one rack switch is in charge of forwarding all the traffic to and from a given server. Under this arrangement, the servers ensure load balancing among all rack switches. 'At the next level, each rack switch is joined via a trunk of four 1 Gbps links to each of two tier-2 switches. These tier-2 switches are connected to tier-3 switches that perform both L2 forwarding and L3 routing.' This topology, according to the research group, is typical of most data centers, though there will be differences in the exact link bandwidths.
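The redundant rack layer described above, with two uplinks per server of which only one is active at a time, can be sketched as follows. The switch names and the alternating assignment policy are illustrative assumptions, not details from the HP Labs paper:

```python
# Sketch of the 1-redundant rack layer: every server has two uplinks to
# two rack switches, but only one uplink forwards traffic at a time.

def assign_active_uplinks(servers, switches=("rack-sw-A", "rack-sw-B")):
    """Spread the servers' active uplinks across the rack switches round-robin."""
    return {srv: switches[i % len(switches)] for i, srv in enumerate(servers)}

active = assign_active_uplinks([f"server-{n}" for n in range(4)])
# Each server forwards all its traffic through exactly one rack switch;
# if all active uplinks were consolidated onto one switch instead, the
# other switch could be powered down, as the NTC/SLC schemes below exploit.
```
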

(Figure 1: Power consumption model for enterprise switches [5])

Here, power_chassis is the power used by the switch's chassis; power_linecard is a linecard's power without active ports; and num_linecards is the number of linecards that have been plugged into the switch. The variable configs in the summation stands for the possible configurations of any single port's speed, which typically could be 10 Mbps, 100 Mbps, or 1 Gbps. (The figure gives a model of the form power_switch = power_chassis + num_linecards × power_linecard + a summation over the port-speed configs.)

Given the nature of enterprise networks, they should be always on. Therefore, the network topologies are created to be over-provisioned and highly redundant. The research group provide a picture (see Figure 2) that represents the architecture of the data center in their lab; they name it a 1-redundant tree because of the redundancy it holds at each level of the tree.

By adopting energy-aware job-allocation algorithms, the research group declare that networks can be made more energy efficient. Before assigning the different kinds of services of a job, the placement algorithms take into account the network traffic specifications of the job and the utilization of the current network. The three energy saving schemes can be summarized as follows:
 Link State Adaptation (LSA): each link can operate in four states in total, namely disabled, 10 Mbps, 100 Mbps, and 1 Gbps. This scheme merely makes sure that the traffic can be accommodated, without regard for performance and availability.
 Network Traffic Consolidation (NTC): a traffic engineering approach to routing can be adopted, so that the traffic is transmitted on fewer links, while the non-utilized links and switches can be disabled. This reduces energy consumption significantly, as it removes redundancy in the network. For instance, 'the 1-redundancy at every level can reduce to 0, with exactly 1 switch operational for a rack: all the servers on a rack transmit their traffic to the one operational switch, while the other switch can be turned off.'
 Server Load Consolidation (SLC): an indirect way to consolidate network traffic onto fewer links and allow the controller to turn off non-utilized ports and switches. Fewer servers are then being used, on the condition that server resources such as CPU and memory are adequate to handle the assigned jobs.

The three schemes above achieve energy savings and enable cost efficiency. For instance, when a link with capacity 1 Gbps only needs to carry 5 Mbps of traffic, rate-adapting this link from 1 Gbps down to 10 Mbps will save energy, though it will lead to an increase in latency because of queuing delay. However, a constraint can be added to make sure that a link's utilization never exceeds a certain bound while using the lower rate. Hence, Service Level awareness can be incorporated by adding further constraints to make sure that a minimum performance is achieved, yielding the SL-aware LSA, SL-aware NTC and SL-aware SLC schemes. For example, SL-aware LSA keeps each link's delay under a threshold when it incorporates performance guarantees, and it ensures that there is redundancy in the system to counter failures. This level of awareness can be applied to the NTC and SLC schemes as well.

To test the practicality of these schemes, the research group examined the effects of the algorithms on a real Web 2.0 workload from an operational data center that hosts an e-commerce application. In the testing, they use the system activity reporter toolkit, which is ready to use on Linux, to observe the CPU, memory and network statistics, including the amount of bytes transmitted and received, from 292 servers. According to their findings, 16% network energy savings can be achieved with no performance penalty and only a slight decrease in system availability. By also using traffic management and server workload consolidation in the data center, a significant additional saving of 75% can be achieved. Additionally, they gather traffic data from the data center using SNMP or other tools, so they can compute the utilization of each port on all the switches. The controller communicates with all the switches and performs actions such as turning off unused switches, disabling unused ports and adapting link capacity. The research results are shown in Figure 3 and Figure 4 below.

(Figure 3: Average power consumed by adopting the different schemes; in the SL-aware version of the three schemes, each link's utilization is kept under 70%)

(Figure 4: Comparison of availability and power consumed for the various schemes)

B. Energy Aware Routing for Low Energy Ad Hoc Sensor Networks

The research done by Shah and Rabaey [4] aimed at proposing a new scheme, energy aware routing, that uses sub-optimal paths to provide substantial gains. The research focuses on sensor networks, whose uses can be found in environmental control of office buildings, robot control, interactive toys, and so on. However, to make these networks successful, PicoNodes are needed, which means the sensors must be small, lightweight, low-cost network elements. Additionally, these nodes must be no more than 100 grams in weight and smaller than 1 cubic centimeter, and their price cannot exceed 1 dollar. More importantly, these nodes need to be ultra low power, so that frequent battery replacement can be eliminated.

The research concentrates on the performance of routing protocols, for which the authors use a very useful metric named network survivability. Network survivability means that the protocol needs to make sure connectivity in the network is maintained for as long as possible, and that the health of the nodes across the entire network stays of the same order. According to their findings, communication in the sensor networks should use different paths at different times instead of a single path, so that the energy of no single path is depleted. In addition, this gives quick response to nodes moving in and out of the network, with less routing overhead.

Shah and Rabaey focus on three main layers in designing the PicoNode, namely the physical, media access control and network layers.
1. Physical layer: a physical link between two radios needs to be created for the communication between two nodes. The physical layer takes responsibility for the communication across the physical link, which includes modulating and coding the data.
2. Media Access Control (MAC) layer: the functionality of the MAC layer lies in access control, channel assignment, neighbor list management and power control. A location subsystem is embedded in this layer to compute the x, y, z coordinates from the signal strength of the neighboring nodes. In the MAC coordination process, each node obtains a locally unique channel for transmission, and these channels are globally reused. By sending a wake-up signal on the broadcast channel, the MAC can send data as well. Additionally, the MAC layer keeps a record of its neighbors together with metrics containing the position of each neighbor and the energy needed to reach it.
3. Network layer: the main functions of the network layer lie in the routing and addressing of the nodes. In sensor networks, nodes can be addressed according to their geographical position. This allows the routing protocol to direct the communication to the right place. For the PicoRadio, Shah and Rabaey adopt class-based addressing, where the addresses are in the form of location, node type, and node subtype. The location indicates a point or region in space; the node type selects the kind of nodes that are needed; and the node subtype narrows down the scope of the address.

As the protocols are one of the emphases of the research, Shah and Rabaey categorize them into two types, namely proactive and reactive protocols. Proactive routing protocols focus on maintaining consistent, up-to-date routing information from every single node to every other node in the network; Destination-Sequenced Distance-Vector Routing [9] is one of the typical examples. Reactive routing protocols create a route only when there is a need for it, which can be source initiated or destination initiated; Ad hoc On-Demand Distance Vector Routing [10] is an excellent example of a reactive routing protocol.

As introduced previously, the main aim of the research done by Shah and Rabaey is to propose a new protocol, named energy aware routing, to enhance the survivability of the networks. According to the definition provided by the authors, energy aware routing is a reactive and destination-initiated protocol.

IV. ANSWERS TO THE RESEARCH QUESTIONS

What is energy-aware networking in the context of data center and sensor networks?

In data center networks, energy saving algorithms are adopted for the networking components in enterprise and data center networks to enable network-wide energy saving schemes. In this context, energy-aware networking means implementing placement algorithms that take into account the network traffic specifications of a job and the utilization of the current network, to make the network low cost and low energy consuming.

In sensor networks, energy aware routing uses sub-optimal paths to provide substantial gains. To make these networks successful, PicoNodes are needed, which means the sensors must be small, lightweight, low-cost network elements: no more than 100 grams in weight, smaller than 1 cubic centimeter, and priced at no more than 1 dollar. More importantly, these nodes need to be ultra low power, so that frequent battery replacement can be eliminated. Here, energy-aware networking thus means using technical methods to reduce the energy difference between different nodes.

How is energy-aware networking used in network operations?

In the testing of the three algorithms, the researchers use the system activity reporter toolkit, ready to use on Linux, to observe the CPU, memory and network statistics, including the amount of bytes transmitted and received, from 292 servers. After the testing, 16% network energy savings can be achieved with no performance penalty and a slight decrease in system availability; by also using traffic management and server workload consolidation in the data center, a significant additional saving of 75% can be achieved. Additionally, they gather traffic data from the data center using SNMP or other tools.
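The port-level bookkeeping described here, byte counters gathered over SNMP and turned into per-port utilization figures, can be sketched as follows. The port names, counter values, poll interval and threshold are illustrative assumptions, not data from the paper:

```python
# Sketch: derive per-port utilization from two successive byte-counter
# readings, then flag ports idle enough for the controller to disable.

def port_utilization(bytes_t0, bytes_t1, interval_s, capacity_bps):
    """Fraction of link capacity used during the polling interval."""
    return ((bytes_t1 - bytes_t0) * 8) / (interval_s * capacity_bps)

def ports_to_disable(samples, threshold=0.01):
    """Ports whose utilization stayed below the threshold are candidates."""
    return [port for port, util in samples.items() if util < threshold]

# Two ports on a 1 Gbps switch, polled 300 s apart (made-up counters):
samples = {
    "gi1/0/1": port_utilization(0, 937_500_000, 300, 1_000_000_000),  # above threshold
    "gi1/0/2": port_utilization(0, 37_500, 300, 1_000_000_000),       # nearly idle
}
idle = ports_to_disable(samples)  # -> ["gi1/0/2"]
```
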
Therefore, they could compute the utilization of each port on all the switches. The controller communicates with all the switches and performs actions, for instance, turning off unused switches, disabling unused ports and adapting link capacity.

In the sensor networks, energy aware routing is used as follows. In the protocol, the consumer of the data initiates the request for a route and also maintains the route. The protocol has three phases:
1. Setup phase: also named interest propagation; all the routes from source to destination are explored, together with their energy costs. This is the phase in which the routing (interest) tables are established.
2. Data communication phase: also named data propagation; using the information from the earlier phase, the data is sent from source to destination, based on the calculated energy costs.
3. Route maintenance: to keep all the paths alive, localized flooding is performed infrequently.
The simulations of the new protocol were conducted in OPNET. In a typical office setup with 76 nodes, 65 sensors and 7 controllers were static and 4 nodes were mobile. The results of the simulations indicate that energy aware routing reduces the average energy consumption per node from 14.99 mJ to 11.76 mJ, an improvement of 21.5%. Additionally, it reduced the energy difference between different nodes.

V. CONCLUSIONS

The research in this paper has explored what energy-aware networking means and how it is used in real networking operations. Both of the studies we introduced emphasize the importance of network survivability, which is a key criterion for judging the efficacy of networking. While designing their schemes, both studies use technological approaches to make the findings logical. Networks constitute an important part of the IT infrastructure, and the two studies show that energy efficiency has become a high priority objective in most IT operational environments. Last but not least, since enterprises always expect low cost and long-term benefits in return, it is reasonable and necessary for them to examine the protocols and stay aware of the network performance, as redundancy is always traded off against energy savings.
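As a quick check, the 21.5% improvement quoted above follows directly from the simulation numbers reported by Shah and Rabaey:

```python
# Relative saving implied by the reported drop in average energy per node
# from 14.99 mJ to 11.76 mJ (figures quoted from the paper).
before_mj, after_mj = 14.99, 11.76
improvement = (before_mj - after_mj) / before_mj
print(f"{improvement:.1%}")  # -> 21.5%
```
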

REFERENCES
[1] Doug Osborne, Energy Aware Clock concept displays home energy use in style. Retrieved 14th May, 2011, from http://www.geek.com/articles/gadgets/energy-aware-clock-concept-displays-home-energy-use-in-style-20090910/
[2] Neil Rasmussen, Implementing Energy Efficient Data Centers. White Paper #114, APC, Legendary Reliability.
[3] L. Stobbe, N. Nissen, M. Proske et al., Abschätzung des Energiebedarfs der weiteren Entwicklung der Informationsgesellschaft, survey, Fraunhofer IZM and ISI, 2009.
[4] Shah and Rabaey, Energy Aware Routing for Low Energy Ad Hoc Sensor Networks. This research was supported by DARPA under grant no. F29601-99-1-0169, "Communication/Computation Piconodes for Sensor Networks".
[5] Priya Mahadevan, Puneet Sharma, Sujata Banerjee, Parthasarathy Ranganathan, Energy Aware Network Operations, HP Labs.
[6] Frankfort-Nachmias and Nachmias, Research Methods in the Social Sciences, fourth edition, pp. 151-152.
[7] Bishop, L. (May 2007), 'A reflexive account of reusing qualitative data: beyond primary/secondary dualism', Sociological Research Online [Online], Special Section on Reusing Qualitative Data, 12(3), http://www.socresonline.org.uk/12/3/2.html
[8] G. Chen, W. He, J. Liu, S. Nath, L. Rigas, L. Xiao, and F. Zhao, "Energy-aware server provisioning and load dispatching for connection-intensive internet services," in Proceedings of NSDI, April 2008.
[9] C. E. Perkins and P. Bhagwat, "Highly dynamic destination-sequenced distance vector routing (DSDV) for mobile computers", Comp. Commun. Rev., Oct. 1994, pp. 234-244.
[10] C. Perkins and E. Royer, "Ad hoc on-demand distance vector routing", Proc. 2nd IEEE Wksp. Mobile Comp. Sys. & Apps., Feb 1999.

Microchip implantations in animals
By: Kasper Välimäki, 23.5.2011

Table of contents:

1. Introduction
2. Microchip technology, uses and components
2.1. Uses
2.2. RFID technology
2.3. Worldwide use
2.4. Components of the chip and implant locations
3. Chip standards and scanners
3.1. Standard issues
3.2. Microchip scanners
4. Conclusion
5. List of references

I

Abstract: This paper is about microchip implantations in animals and the technology behind them. Microchip implantations in animals are mainly used for identification purposes, and they are a very effective way of recognizing a pet or other animal and instantly knowing of any illnesses or other medical conditions. The chip will also tell you the owner's name and address and the insurance policies of the animal. The microchips use a technology called RFID, which is short for Radio-frequency identification; it communicates via radio waves, and the information can be accessed with a microchip scanner. The microchip, also known as an "RFID tag", used for implantation purposes is a so-called passive tag, which means it has no power source of its own, but it reacts when acted upon.

II

The technology behind microchip implants is widely used in pet identification around the world. RFID technology is spreading to other uses as well; for example, bus tickets could be microchips implanted into your wallet. The uses are virtually limitless.

Uses for RFID tags, or radio-frequency identification tags, are virtually limitless: the range where they are, and more importantly could be, used goes from grocery products all the way to Las Vegas casino chips and paying for your bus ride. They are also used more and more by pet owners around the world. But why go through all that trouble? Microchip pet identification is often used in cases where collars and identification tags have been lost. For example, cats often lose their collars, and an implanted microchip is a device that can't be lost. A lot of breeders, private persons and families have started having microchips implanted in their puppies and kittens before they come to their new homes, as the primary means of identification. Also, at airports, harbors and borders the microchips can save you from a lot of trouble, since the authorities there can scan your pet's chip, which contains its breed, illnesses, allergies, and of course basic identification.

The technology behind the chip is not too complicated.
The basic idea is that you have an RFID tag (in this case an implantable one) which uses communication via radio waves to exchange data between a reader and an electronic tag attached to an object (e.g. implanted in your puppy), for the purpose of identification and tracking. There are three types of RFID tags: passive RFID tags, active RFID tags and battery assisted passive (BAP) RFID tags. Passive RFID tags have no power source and require an external electromagnetic field to initiate a signal transmission. Active RFID tags contain a battery and transmit signals once an external source (an 'interrogator') has been successfully identified. Battery assisted passive (BAP) RFID tags, on the other hand, require an external source to wake them up, but have significantly higher forwarding capability, providing greater range for the scanner.

RFID technology makes it possible to give each product in a grocery store its own one-of-a-kind identifying number. You could compare this to the fact that today grocery stores use bar codes, which are far more complicated to use, since you have to actually "show" the bar code to the reader, and it is only possible to identify the brand and type of package using them. The thing is that RFID tags can be read if passed within close enough range of an RFID tag scanner; some RFID tags can be read from several meters away.

The implantable chips use passive RFID technology. The chip itself is about the size of a large grain of rice and, as told earlier, it contains no internal power source, so it is designed not to act until acted upon. The chip is built of three basic elements: a silicon chip (with an integrated circuit), a coil inductor (or a core of ferrite wrapped in copper wire), and a capacitor. The silicon chip itself contains the identification numbers and the electronic circuits that pass the information on to the scanner. The inductor is basically an antenna that receives electric power from the scanner used. The capacitor and the inductor act like the tuner of a radio, forming an LC circuit. The scanner then presents an inductive field that energizes the coil and charges the capacitor, which in turn powers the IC (integrated circuit). The IC then transmits the data via the coil to the scanner. The components are enclosed in a special biocompatible glass made from soda lime, which is sealed to prevent any moisture or fluid from getting to the chip and is known for good compatibility with living tissue.

III

The greatest issue with the standards is that there are multiple standards, with varying compatibility with each other. In most countries, ID microchips used in pets go by an international standard, enabling wide compatibility between the chips and the scanners. But in the U.S. there are three types of chips competing for market share, as well as the international type. Scanner models coming to U.S. vets were able to read at best three of the four types. (These "types" are also known as transmission protocols or standards.) Now there are scanners that are able to read all four different standards or types. The most recognized and accepted standard internationally is the ISO Full Duplex type, which has been common in many countries, including European countries, since the late 1990s, and is now widely used in Canada. It is one of the two chip standard types (along with the "Half Duplex" type that is used in farm and ranch animals) which conform to International Organization for Standardization (ISO) standards 11784 and 11785. To support international application, each of these chips contains either a manufacturer code or a country code along with its identifying serial number. The other three standards (used mainly in the US) are the Trovan Unique, FECAVA (or Destron) and AVID brand FriendChip types. These are all good, working chip standards, but still the ISO standard is advised and is taking over the market.

IV

RFID technology is a very sophisticated science that, I believe, will replace pet identification as we know it (referring to a collar with a name tag) as well as the bar codes in grocery stores and the bus cards and train tickets. My conclusion would have to be that from any angle you look at it, the smartest thing you can do for your pet is to get it chipped right away, and we'll just have to wait for the rest to come to us.
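The LC tuning described in section 2.4 can be illustrated numerically: the resonant frequency of the coil-capacitor pair is f = 1 / (2π√(LC)). The component values below are assumed for illustration; the 134.2 kHz figure is the activation frequency specified by ISO 11785 for animal-ID readers:

```python
import math

# Resonant frequency of the implant's coil-capacitor (LC) tuner.
# Component values are illustrative assumptions chosen so the circuit
# lands near the 134.2 kHz activation frequency of ISO 11785 readers.
L = 1.4e-3   # coil inductance in henry (assumed)
C = 1.0e-9   # capacitance in farad (assumed)

f = 1.0 / (2.0 * math.pi * math.sqrt(L * C))
print(f"resonant frequency ~ {f / 1e3:.1f} kHz")  # -> about 134.5 kHz
```
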

Microchip scanners:

Microchip scanners are small and relatively cheap devices for reading your pet's RFID tag. Using a scanner is very simple: it's just a matter of positioning the scanner close to the animal's fur, obviously where the chip is implanted, and pressing a button. The scanner then interrogates the chip to capture the identification number, and it simply displays that number on its LED screen. The passive ID tags used in implanted chips get their electricity from the scanner. Different scanners understand different standards, but most of them support, besides the international ISO standard, some combination of the three other standards.
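Under the ISO 11784 standard mentioned above, the identification number a scanner displays is a 15-digit string: the first three digits are a country or manufacturer code and the remaining twelve are the animal's serial number. The decoding step can be sketched as follows (the sample number is invented for illustration):

```python
# Sketch of decoding a scanned ISO 11784 identifier: 15 digits, of which
# the first 3 are a country or manufacturer code and the remaining 12 the
# animal's national serial number. The sample value is made up.

def parse_iso_id(digits: str) -> tuple[str, str]:
    if len(digits) != 15 or not digits.isdigit():
        raise ValueError("ISO 11784 IDs are exactly 15 digits")
    return digits[:3], digits[3:]

country, serial = parse_iso_id("246098100123456")
# country -> "246", serial -> "098100123456"
```
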

V

I. Microchip implantation sites, 2011, WSAVA (http://www.wsava.org/site1099.html)

IX. Microchip Scanners, Pet Chip Company Ltd., 2010 (http://www.pet-detect.com/microchip-scanners.html)

II. Grain-sized microchip can be Fido's ticket home, Norma Bennet Woolf, 2011 (http://www.canismajor.com/dog/microchp.html)

III. Microchip implants to ID pets, Charla Dawson, 2011 (http://www.suite101.com/content/microchippetidentification-a1328)

IV. Microchips Everywhere: a Future Vision, Todd Lewan, 29.1.2008 (http://seattletimes.nwsource.com/html/businesstechnology/2004151388_apchippingamericaiii29.html)

V. Microchip implant (animal), Wikipedia (http://en.wikipedia.org/wiki/Microchip_implant_(animal))

VI. Animal identification won't be mandatory, Libby Quaid, 22.11.2006

VII. Chip Implants Linked to Animal Tumors, Todd Lewan, 8.9.2007 (http://www.washingtonpost.com/wp-dyn/content/article/2007/09/08/AR2007090800997_pf.html)

VIII. GPS Collars vs. Microchip Implants for Pets, Michele McDonough, 5.9.2009 (http://www.brighthub.com/electronics/gps/articles/43365.aspx)