Università degli Studi di Roma “Tor Vergata” Facoltà di Ingegneria

Dottorato di Ricerca in Informatica ed Ingegneria dell’Automazione Ciclo XXIII

Open BAR: A New Approach to Mobile Backup And Restore

Vittorio Ottaviani

Academic Year 2010/2011

Advisor: Prof. Giuseppe F. Italiano
Coordinator: Prof. Daniel P. Bovet

to my parents because an example is worth a thousand words

Abstract

Smartphone owners store ever more information, and ever more important data, in the internal memory of their devices. Mobile devices are prone to being lost, stolen or broken; unless the data are backed up, this causes the loss of all the information they contain. While many solutions for making backups and restoring data are known for servers and desktops, mobile devices pose several challenges, mainly due to the plethora of devices, vendors, operating systems and versions available in the mobile market. In this thesis, we propose a new backup and restore approach for mobile devices, which helps to reduce the effort of saving and restoring personal data and of migrating from one device to another. Our approach is platform independent: in particular, we present prototypes based on different mobile operating systems: Google Android, Windows Mobile 5 and 6, and Symbian S60. The approach secures the information backed up and restored using novel cryptographic techniques optimized for mobile devices. Another feature of our approach is its capability to offer additional services to the end user or to the administrator of the system. As an example, for users, we provide a service that enables the sharing of information stored in mobile devices among a group of selected persons. This can be useful in many situations, e.g., in creating a mobile business network among a group of people. For administrators, we offer a social network extractor which, starting from information contained in the smartphone and data publicly available on the web, generates a social graph of the backup network. This can be useful in situations such as forming teams within an enterprise.


Acknowledgements

During the years of my PhD several people have passed through my life; some of them have left a mark that will never be erased.

First of all I want to thank Pino: your way of approaching things, always searching for the best, inspires me every day; I learned some of the most important lessons of my life thanks to you.

I want to thank all the colleagues and friends who believed in me during hard times and who shared my successes with me; Emanuele, Cristina, Danilo and Paolo, thank you guys for the support and for sharing your experience with me.

Special thanks go to Fabio and to Ermanno... I will not write another thesis to explain these thanks: each one of you knows...

Thanks to my family for your unconditional love and trust in me. Words cannot fully express how important you are to me.

Finally, thank you Ramona: you are my love, my best friend and the reason why every morning I wake up and do my best to be a better person...


Table of Contents

1 Introduction
1.1 Motivation
1.1.1 How much does data loss cost?
1.1.2 Focusing on mobile
1.2 Our solution
1.3 Contributions
1.4 Thesis Outline

2 Backup & restore in the third millennium
2.1 Backup features
2.1.1 Full backup
2.1.2 Incremental backup
2.1.3 Differential backup
2.1.4 File-based vs. device-based
2.1.5 Scheduled backup vs continuous data protection
2.1.6 Local backup vs. remote backup
2.2 Mobile
2.3 Local backup for mobile device
2.4 Remote backup for mobile device

3 Our approach to backup
3.1 A new approach to backup & restore
3.1.1 Server
3.1.2 Client
3.2 Sharing backup data
3.3 Social network analysis
3.4 Security

4 Data extraction
4.1 Forensic Style Approach
4.1.1 Our methodology
4.1.2 Symbian implementation
4.1.3 Windows Mobile implementation
4.1.4 Some remarks on this approach
4.2 Selection of interesting data
4.2.1 Symbian
4.2.2 Android
4.3 Performances
4.4 Concluding remarks

5 Data elaboration
5.1 Remote elaboration
5.2 Our step-by-step Methodology
5.2.1 Stage 0: Choice of the objective
5.2.2 Stage 1: Files of interest identification
5.2.3 Stage 2: Data hypotheses and entities injection
5.2.4 Stage 3: Sequences similarity discovery
5.2.5 Stage 4: Data interpretation
5.2.6 Stage 5: Meta-format building
5.2.7 Stage 6: Error correction
5.2.8 Stage 7: Parser building
5.2.9 Stage 8: Testing and debugging
5.3 Remote elaboration results
5.4 Local elaboration

6 Protecting saved data
6.1 Key agreement algorithm
6.1.1 Mathematical setting: key agreement protocol
6.1.2 J2ME implementation
6.1.3 Performance testing methodology
6.1.4 Performance evaluation
6.1.5 Experimental results
6.1.6 Concluding remarks
6.2 Encryption algorithm
6.2.1 Performances
6.2.2 Statistically testing QP-DYN and RC4
6.3 Protecting inter process communication
6.3.1 State of the art
6.3.2 The framework
6.3.3 The framework implementation
6.3.4 On a real device

7 Value added services on backup data
7.1 Sharing backup data with closed groups
7.1.1 Social backup in business environment
7.1.2 Sharing conference data
7.1.3 Shared backup for smartphone
7.1.4 Running the application
7.2 Extracting social network
7.2.1 Introduction
7.2.2 Related work
7.2.3 Smartphone Data Analysis (SDA)
7.2.4 Web Data Analysis (WDA)
7.2.5 Clustering Analysis (CA)
7.2.6 The Final Result: The Social Network
7.3 Conclusions

8 Conclusions and Future Work

A The Symbian S60 format
A.1 Address book
A.2 Calendar
A.3 Events log
A.4 SMS

B The Backup communication protocol
B.1 Backup item
B.2 Contact item
B.3 Calendar item
B.4 Message item
B.5 Generic file item
B.6 Setting item
B.7 List methods
B.8 Restore
B.8.1 Listing items on the server
B.8.2 Choosing data to be restored

C The Sharing communication protocol
C.1 Sharing methods
C.1.1 Item listing
C.1.2 Share a item
C.1.3 Location based sharing
C.1.4 Listing shared data
C.2 Groups methods
C.2.1 Creating group
C.2.2 Listing groups
C.2.3 Handling invitations

Bibliography


List of Figures

1.1 Costs of data loss per industry sector (values are in million $ per year)
1.2 Smartphone and PC sales forecast in millions of units
1.3 2007 - 2010 trend mobile operating systems market share.
1.4 Mobile cellular, subscriptions per 100 people, 2009.

3.1 Backup and Restore system architecture.
3.2 Example of data model for a contact.
3.3 Example of a request of a contact.
3.4 Example of client server interactions.

4.1 Data collection workflow
4.2 Windows Mobile 5.0 memory architecture.
4.3 (a) Symbian S60 tool’s screenshot, (b) Windows Mobile tool’s screenshot.

5.1 The methodology flow
5.2 The format of the Ω operations sequence. The figure shows an example with contacts discovery as objective
5.3 These figures show an example of a DBMS binary file before and after Stage 3. In (a) the sample file after making pairs of calls of the same duration (Stage 2). In (b) equal sequences highlighted. In (c) the formatted file Φ̂0
5.4 These three figures depict an example of the application of Stage 5 on a file containing the phone’s address book.
5.5 The architecture of the backup server

6.1 Key Agreement process using conjugate.
6.2 Public data and Key Agreement generation time: all tests
6.3 Public data and Key Agreement generation time: results with an upper bound of 1 sec.
6.4 Overall encryption and decryption time comparison between (sizes in bytes) (a) RC4 512-bit and QP4, (b) RC4 768-bit and QP5, (c) RC4 1024-bit and QP6.
6.5 Overall encryption and decryption time comparison between AES CFB 256-bit and QP3 (sizes in bytes).
6.6 Overall encryption and decryption time comparison between AES 256-bit and QP3 (sizes in bytes).
6.7 Mutual Authentication phase.
6.8 Session Authentication phase.
6.9 Session Encryption phase.
6.10 SAVED framework main packages.

7.1 Use case of meeting backup and share.
7.2 Android Backup and Restore client.
7.3 Android Backup and Restore client.
7.4 The graph representation of contacts (a) and their relationships with the phone’s owner (b), which are revealed by the number of calls and number of SMS/MMS. In (c) is shown the graph after the execution of SESORR; the edges represent the relationships extracted from the Web (web-edges).
7.5 Frequency distribution of URLs (domains) providing relationships.
7.6 Contact-to-cluster assignment.
7.7 Clustering metrics trends. The profile graph, used in the example, has 218 contacts and 1242 Web edges; the black vertical line is relative to k = 10, the chosen value for the input parameter k.
7.8 The final result of the whole process: the social network clusters.

B.1 Example of XML payload for a backup item.
B.2 Example of XML payload for a contact item.
B.3 Example of XML payload for a calendar item.
B.4 Example of XML payload for a message item.
B.5 Example of XML payload for a generic file item.
B.6 Example of XML payload for a contact list response.
B.7 Example of XML payload for a setting item.
B.8 Restore method response.

C.1 Example of XML payload for a list of items.
C.2 Example of XML payload to share an item with a group.
C.3 Example of XML payload to share an item with a group using location.
C.4 Example of XML payload for a list of items.
C.5 Example of XML payload to create a group.
C.6 Example of XML payload of a list of groups.
C.7 Example of XML payload to invite users to a group.
C.8 Example of XML payload of invitations received by the user.


List of Tables

1.1 Cost and causes of data loss

2.1 Comparison of backup approaches

4.1 Files generated during the Extraction Process
4.2 Windows Mobile 5.0 relevant files
4.3 Extraction tool consistency analysis
4.4 Time overhead of the backup operation per data type

5.1 Symbian files of interest

6.1 Time used from algorithms to generate the secret to agree a SSK.
6.2 Time overhead for the framework phases.

A.1 Possible values for the rows of table “DATA TYPE TABLE”. They describe the type of attributes present in the “DATA BLOCK”. (Symbian S60 v2)
A.2 This table lists all contact’s data which can be found in the Contacts.cdb. Since data are located in three logical file areas, the table is split in three parts.
A.3 This table lists all calendar entries such as Notes Meetings Anniversaries stored in the Calendar file.
A.4 This table lists all event entries such as SMS, MMS, voice and data calls, SIM change.
A.5 This table lists all fields characterizing an SMS.


1 Introduction

Serva me, servabo te
Save me and I will save you
Petronius Arbiter

1.1 Motivation

Backup is a crucial task, since hardware failures and software or human errors can lead to the loss of important information. Beyond failures, backups are even more important for devices such as laptops and smartphones, since they are more prone to loss or theft. Currently, smartphones are used more as handheld computers than as mobile phones, and consequently a lot of data is stored on those devices. This makes it more critical to keep the data stored on them safe from loss. In addition, the rapid technological evolution of mobile devices makes it more difficult to restore data saved from old devices to new ones. Thus, mobile devices pose new challenges in the backup and restore problem.

Backing up data on external memory devices, such as Secure Digital (SD) cards or laptop disks, suffers from the same risks of failure or loss. Moreover, the growth of cloud services, and the capability of modern smartphones to be always online without consuming too much power, is pushing backup systems to save data online using cloud services. Unfortunately, the plethora of devices, operating systems and vendors available on the market causes interoperability problems, and users often lose their information when a device fails, is lost or stolen, and even when migrating to a new device.

Cause                        Percent   Cost
Hardware or system failure   78%       $9.36 billion
Human errors                 11%       $1.32 billion
Software corruptions          7%       $0.84 billion
Natural disasters             1%       $0.12 billion
Other                         3%       $0.36 billion

Table 1.1: Cost and causes of data loss
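The cost column of Table 1.1 can be reproduced by applying each percentage to a yearly total. The following is a minimal sketch of that arithmetic, assuming a total of exactly $12 billion per year (the text quotes "over $12 billion", so this value and the variable names are illustrative assumptions):

```python
# Share of data loss attributed to each cause, as listed in Table 1.1.
SHARES = {
    "hardware or system failure": 0.78,
    "human errors": 0.11,
    "software corruptions": 0.07,
    "natural disasters": 0.01,
    "other": 0.03,
}

# Assumption: the yearly total is taken as exactly 12 billion dollars.
TOTAL_LOSS_BILLION = 12.0


def cost_per_cause(total, shares):
    """Split a total yearly loss across causes according to their shares."""
    return {cause: round(total * share, 2) for cause, share in shares.items()}


costs = cost_per_cause(TOTAL_LOSS_BILLION, SHARES)
for cause, cost in costs.items():
    print(f"{cause:<28} {SHARES[cause]:>4.0%}  ${cost:.2f} billion")
```

Under this assumption the computed values match the table, and the shares add up to 100%.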

1.1.1 How much does data loss cost?

Some studies report that a company that experiences a computer outage lasting more than 10 days will never fully recover financially, and that 50 percent of companies suffering such an incident will be out of business within 5 years [1], [2]. Other studies by the National Archives & Records Administration in Washington show that 93% of companies that lost their data center for 10 days or more due to a disaster filed for bankruptcy within one year of the disaster; 50% of businesses that found themselves without data management for the same period filed for bankruptcy immediately.

Statistics about data recovery [3], [4], [5] say that U.S. businesses lose over $12 billion per year because of data loss. This loss is due primarily to hardware or system failure, which accounts for 78%; human error accounts for 11%, software corruption for 7% and natural disasters for only 1% of all data loss. Table 1.1 summarizes the economic impact of each factor.

Moreover, disaster prevention and recovery plans are often overlooked or outdated, and are frequently considered a boring and time-wasting activity, partly because users perceive backup tools and techniques as not 100% reliable. The 7th Annual ICSA Labs Virus Prevalence Survey [6] says that file corruption and data loss are becoming much more common as users commonly cooperate on shared documents or resources, although loss of productivity continues to be the major cost associated with a virus disaster. Ontrack statistics estimate how much data loss costs each industry sector; the chart in Figure 1.1 summarizes these costs [7].

Figure 1.1: Costs of data loss per industry sector (values are in million $ per year)

The cost of losing data depends on the type of data. If an enterprise loses historical data about room cleaning, the loss is not a huge problem for the business. On the other hand, if archives containing contract and invoice data, architectural drawings, or the source code of mission-critical software that would have to be rewritten by highly skilled developers are lost, then the loss is huge. In the first case the institution will have to face legal problems, because the law regulates the management of official data; in the second the architect will have to inspect all the areas covered by the lost drawings and redo the work; in the third the enterprise will have to spend time and money to re-implement the software or face a migration to similar software. As far as mobile environments are concerned, if a manager’s smartphone fails and she loses her family pictures, this is not a huge problem; if instead she loses the address book containing all her business contacts, it takes days of work to recover part of these data. She will probably never recover all the lost information, and for a manager this is a great loss.

Moreover, in recent years users have been storing ever more important data, such as PIN codes or bank account numbers, on their mobile phones or laptops, because they trust the reliability of such devices. For day-to-day business, users find it very convenient to store private information on their mobile device, as it allows them to access the information instantly. Unfortunately, these devices are liable to be lost or stolen. Cpp Fonesafe states that, in Italy, a mobile phone is lost or stolen every four minutes; a report by AXA insurance states that the majority of stolen devices are smartphones [8]. The phenomenon is even bigger in other countries; in the UK, for example, 228 mobile phones are reported stolen every hour [9]. When a device is lost, the information it contains is usually of no interest to whoever finds it, who will just reset the device and use it. On the other hand, if the device is stolen, the information can be used by the thief, who may know the owner and can therefore exploit such information more easily. A security layer must protect these data.

In any case, when somebody loses a device, the most valuable thing lost is the information within it, so there is a need for a reliable mobile backup system.

Figure 1.2: Smartphone and PC sales forecast in millions of units

1.1.2 Focusing on mobile

According to RBC analysts [10], 2011 shipments of smartphones will approach 400 million units, equaling PC sales. Figure 1.2 illustrates the trend for 2005–2011; “A” indicates actual values, “E” indicates estimated values. Nokia is still the mobile device market leader, probably thanks to its policy on low-cost devices. Apple iOS and Android equipped devices are gaining market share on Windows Mobile, Palm and OS, while RIM Blackberry is quite stable, probably because of its focus on business customers.

The diffusion of smartphone operating systems changed in the last year. In 2009 Symbian led the market with 52%, followed by RIM with 17%, Windows Mobile 12%, iPhone 8%, Palm 2%, Android 1% and others 9% [11]. In Q4 of 2010 Android gained a huge part of the market, growing 886% year-over-year [12]. The 2010 OS market is still led by Symbian with 38% of the market share; RIM has grown, reaching 16%; Apple iOS gained 5 points after the iPhone 4 launch and holds 16% of the market; but the fastest growing OS is Android, with 23% of the whole market. Android’s growth is driven by key products from HTC, Motorola, Samsung, Sony Ericsson and LG, among others, as they provide smartphones running Android [13]. Figure 1.3 shows the distribution of smartphone operating systems over the last 4 years; the figure also reflects the sales trend for vendors.

Figure 1.3: 2007 - 2010 trend mobile operating systems market share.

The map in Figure 1.4 shows the spread of mobile devices in the world at the end of 2009 [14], when there was roughly one mobile device per person. In some countries, such as the United Arab Emirates, a person uses two or more mobile devices in everyday life. In most cases, using more than one device forces the user to switch continuously from one vendor, operating system or version of the same operating system to another.

Figure 1.4: Mobile cellular, subscriptions per 100 people, 2009.

Moreover, using more than one device spreads personal data across all of them, making it more difficult for the user to find information. The solution is to keep the devices synchronized, but this is really hard to do. It is even harder, if not impossible, if the devices come from different vendors and run different operating systems or different versions. Currently, in some cases, the easiest way to synchronize two devices is to manually copy data from one to the other.

For example, if a user wants to switch from a Symbian-equipped Nokia smartphone to an Android device, she can synchronize her device with her Gmail account (if she has one) in order to have her address book copied to the new device. One way to copy messages (SMS, MMS) is to use a migration tool such as SPB Migration Tool, available on the market for $9.95. Unfortunately, judging from users’ comments, the application does not work properly on every source device; even Android OS versions are not fully supported (only 2.0 and higher). Moreover, the application migrates address book, SMS, MMS and gallery data to Android only, and does not work if the user wants to migrate from Android to another operating system.

Another way to move SMS from Symbian to Android is to install Nokia OVI on a laptop, synchronize the messages on the smartphone with OVI, then download and run Nokia2AndroidSMS.exe, which should automatically find all datastores created by Nokia OVI, select the first one and generate an XML file. The user should then install SMS Backup & Restore on the Android device, connect the phone to the PC and select “Disk drive” as the connection type. Next, the user should copy the XML file into the SMSBackupRestore folder on the phone and run SMS Backup & Restore to import the messages. Even such a “straightforward” procedure is one-way: it works only from Symbian to Android.

We have given here some examples of how to migrate from one device to another. To keep different devices synchronized, Microsoft Exchange can be used, but the devices are only partially synchronized and are not kept fully up to date. It is clear that saving personal information and restoring it to a new device is not as simple as it should be. In some cases it is impossible to save some kinds of data. In this introduction we did not mention application settings, but restoring those settings to a new device, and having all applications (if available on the new platform) already installed and configured, would save a great deal of time and pain.


Currently there is no solution that allows the user to back up data from one device and restore it to a new one, with all contacts, calendars, email, SMS, MMS and even application settings available on the new device, through a quick and painless procedure.

1.2 Our solution

The solution proposed in this thesis is to provide a common interface for exchanging data among the plethora of devices on the market. This common interface is based on the structure of the data to be exchanged. The mobile phone itself extracts the information to be backed up, either by using the API provided by the mobile operating system or by saving the whole content of the device and leaving the extraction of the useful information to an ad-hoc server application.

As smartphones tend to be always connected to the Internet, it seems natural to move the information online and to provide backup and restore services based on the cloud computing paradigm, which end users consider more reliable and less expensive [15], [16], [17]. This approach also reduces the risk of data loss and decouples the data from a specific device.

Once backup information moves online, it can be used in several ways, for example in a shared application, or to extract social networks and profile users. In an enterprise scenario, for example, it can be useful for users to share business or personal data contained in their mobile backups, such as calendars or business cards, with selected contacts of their choice. At the same time, management could be interested in analyzing the social relations that naturally grow between employees, and in exploiting these relationships to build workgroups.

In such a scenario, it is easy to imagine a community of people willing to share some of their data within their mobile network. A backup that allows data sharing, however, can suffer from the same security and privacy issues present in social networks [18]; such limitations can be addressed in different ways depending on the environment where the system is used. In an enterprise scenario, data sharing can be monitored by administrators, who can enforce the company’s privacy policies. In a general-purpose environment, such as a mobile social network, ownership of data must be verified and sharing must be allowed only by the data owner.
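The idea of a common interface based on the structure of the data, rather than on the source platform, can be illustrated with a small sketch. It is not the data model or the protocol used in this thesis: the `Contact` fields, the two source layouts and the XML element names are hypothetical, chosen only to show two differently shaped platform records converging on one exchange format.

```python
from dataclasses import dataclass
import xml.etree.ElementTree as ET


@dataclass(frozen=True)
class Contact:
    """Platform-neutral contact record (hypothetical field names)."""
    name: str
    phones: tuple = ()


def from_android_like(row: dict) -> Contact:
    # Hypothetical source layout: {"display_name": ..., "numbers": [...]}
    return Contact(row["display_name"], tuple(row["numbers"]))


def from_symbian_like(row: dict) -> Contact:
    # Hypothetical source layout: {"first": ..., "last": ..., "mobile": ...}
    return Contact(f'{row["first"]} {row["last"]}', (row["mobile"],))


def to_xml(contact: Contact) -> str:
    """Serialize the neutral record into an XML payload for the server."""
    root = ET.Element("contact")
    ET.SubElement(root, "name").text = contact.name
    for number in contact.phones:
        ET.SubElement(root, "phone").text = number
    return ET.tostring(root, encoding="unicode")


def from_xml(payload: str) -> Contact:
    """Rebuild the neutral record from a payload, e.g. during a restore."""
    root = ET.fromstring(payload)
    phones = tuple(node.text for node in root.findall("phone"))
    return Contact(root.findtext("name"), phones)


a = from_android_like({"display_name": "Ada Rossi", "numbers": ["+39 06 0000"]})
s = from_symbian_like({"first": "Ada", "last": "Rossi", "mobile": "+39 06 0000"})
payload = to_xml(a)
print(payload)
```

In this sketch only the two small adapter functions know anything about the source platform; everything that travels to the server, or back to a different device, is expressed in the neutral record.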

1.3 Contributions

The goal of this thesis is to present a backup system for smartphones that allows users to share part of their personal backup data with a selected set of contacts. In order to be platform independent, our approach is based on a novel way of managing data, and hinges on a data model that abstracts from the underlying platform and focuses on the data type. The same backup and restore method can be applied on mobile, desktop and interconnected TV platforms. With such a system, users can manage different devices running different operating systems, and keep data synchronized across platforms. In order to assess the feasibility and impact of our approach in a real scenario, we built three prototypes of our backup and restore system, for Android, Windows Mobile OS (versions 5 and 6) and Symbian OS, and tested them on actual mobile devices. Our contribution to this project covers the following areas:

Smartphone data extraction: we proposed two different approaches to extract internal data from a mobile device and send these data to a remote server using a common format, based on the structure of the data type to be exchanged between the mobile client and the server.

Smartphone data elaboration: we designed a methodology to reverse engineer raw data coming from mobile devices, implement specific parsers able to extract personal information, and elaborate such information to make it compatible with other devices.

Securing the system: we proposed a brand new key agreement algorithm based on the matrix conjugation method and a new model for implementing secure inter-process communication in the Android OS, and we verified the usability of new encryption algorithms compared to standard ones in mobile environments.

Services on backup data: we proposed some services using stored data to be provided to users or to administrators of the backup system. Such services are just a starting point for other possible uses of data. The services implemented are a shared backup and a social network extractor.

1.4 Thesis Outline

This thesis is organized into three parts. The first part introduces the problem and describes the proposed solution.

The first part consists of Chapter 2 and Chapter 3. Chapter 2 is a survey of backup techniques in both desktop and mobile environments; Chapter 3 summarizes how we approach the backup problem and presents the proposed idea. In this chapter we show the components of the system implemented to allow users to back up and restore data while granting interoperability across vendors, operating systems and versions.


The second part deals with the operations on data and consists of Chapter 4 and Chapter 5. Chapter 4 details the two methods proposed to extract data from the device, the forensic-style approach and the selection of interesting data, and shows how these tasks are performed and integrated. Chapter 5 explains the data reverse engineering methodology proposed to extract personal information from raw backup data, and how such data are processed to make them interoperable across vendors, operating systems and versions. This chapter also describes the architecture of the server side and how the server stores backup data coming directly from a device or from a raw backup.

Chapter 6 and Chapter 7 compose the third part, which proposes some services to be provided to the users of the system. Chapter 6 describes the security services deployed to protect the information: the new key agreement algorithm, the approach proposed to protect inter-process communication in Android, and some considerations on whether to use new encryption algorithms or standard ones in mobile environments. Chapter 7 shows some possible value-added services on backup data, such as the opportunity to share part of the backup with selected contacts and the possibility of extracting users’ social networks using data from the backup and information available on the web.

The thesis ends with a chapter which summarises the findings of the thesis and considers directions for future work.

2 Backup & restore in the third millennium

Introduction

According to [19], backups can be classified along several dimensions: the data repository model distinguishes full, incremental and differential backups; data can be stored in a file-based or a device-based style; and the data repository management can be classified as online vs. off-line. These approaches can be combined in different ways, according to accessibility, security and cost needs. In case of failure, a full backup is able to restore the entire content of a device: this process is slow in the backup phase and introduces a huge overhead in the data stored, but allows for faster restores. On the other hand, incremental backups reduce backup times and sizes but imply higher restore times. Backups can operate on files (file-based approach) or on data physically saved on the disk (device-based approach): although a file-based approach tends to be slower than a device-based backup, it allows for more flexibility and is easier to manage. Online backups make it possible to save and restore data while the system is running, while off-line backups require the system to be idle: online backups are more convenient, as they do not interfere with the users’ work, but are more complex to handle, as the system needs to deal with updates carried out during the backup. In all cases, backups can be stored locally, e.g., on an external device, or remotely, e.g., on a remote server.

Backups for mobile devices can be stored locally on an SD card or on a personal computer, or remotely on a server accessible via network connectivity. Several synchronization protocols have been proposed for mobile devices, including Microsoft’s ActiveSync, HotSync for Palm OS devices, Pumatech’s Intellisync, SyncML and CPISync. We refer the interested reader to [20] for a detailed analysis of these protocols. Google’s Android Sync, Google Sync and Apple’s MobileMe are examples of applications enforcing data synchronization among different devices through cloud services. The main problem with existing mobile backup solutions is that they are usually bound to specific platforms and vendors. Even SyncML [21], which was launched to provide an open standard for synchronizing devices with different operating systems, is confined to the products of the Open Mobile Alliance companies. Sharing data, such as business contacts or calendar events, among different users is spreading, but currently available solutions (e.g., VCard [22] via SMS) are still too complicated to use, as they require physical proximity and suffer from a lack of portability across platforms.

2.1 Backup features

In the following subsections we will describe in more detail the main features of backups, i.e., full vs. incremental backups; file-based vs. device-based schemes; support for online backups; the use of snapshots and copy-on-write mechanisms; local and remote storage. All these features will be analyzed both for the mobile and for the desktop/server environments.

2.1.1 Full backup

The simplest way to protect a file system against disk failures or file corruption is to copy the entire contents of the file system to a backup device. The resulting archive is called a full backup. If a file system is later lost due to a disk failure, it can be reconstructed from the full backup onto a replacement disk. Individual lost files can also be retrieved. Full backups have two disadvantages: reading and writing the entire file system is slow, and storing a copy of the file system consumes significant capacity on the backup medium. A full backup is designed to allow the entire device to be recovered without reinstalling the operating system, the application software and the data. This spares the user the time expense of a full system recovery, i.e., the hours needed to rebuild the device up to the point where the last data backup can be restored. In other words, a full system backup makes a complete image of the device which, if needed, can be copied back to the device. Restoring the system in this way requires specific software, for example Ghost.

2.1.2 Incremental backup

Faster and smaller backups can be achieved using an incremental backup scheme, which copies only those files that have been created or modified since a previous backup. Incremental schemes reduce the size of backups, since only a small percentage of files change on a given day. A typical incremental scheme performs occasional full backups supplemented by frequent incremental backups. Restoring a deleted file or an entire file system is slower in an incremental backup system; recovery may require consulting a chain of backup files, beginning with the last full backup and applying the changes recorded in one or more incremental backups.

        Backup speed    Restore speed   Information saved
MORE    Incremental     Full            Full
        Differential    Differential    Differential
LESS    Full            Incremental     Incremental

Table 2.1: Comparison of backup approaches
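The restore chain of an incremental scheme can be illustrated with a minimal Python sketch. The `Backup` class and the function names are hypothetical and introduced only for illustration; file contents are modeled as an in-memory dictionary, and file deletions are deliberately not tracked.

```python
from dataclasses import dataclass, field


@dataclass
class Backup:
    kind: str                                   # "full" or "incremental"
    taken_at: float                             # time at which the backup was taken
    files: dict = field(default_factory=dict)   # path -> content


def incremental_backup(live, mtimes, last_backup_time, now):
    """Copy only the files modified after the previous backup."""
    changed = {p: c for p, c in live.items() if mtimes[p] > last_backup_time}
    return Backup("incremental", now, changed)


def restore(chain):
    """Rebuild the file system from a full backup plus later incrementals."""
    if not chain or chain[0].kind != "full":
        raise ValueError("the chain must start with a full backup")
    state = {}
    for backup in chain:            # oldest first: later copies overwrite earlier ones
        state.update(backup.files)
    return state
```

The loop in `restore` is what makes restores slower than with a single full backup: every archive in the chain has to be read, in order.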

2.1.3 Differential backup

A third scheme, in between incremental and full backup, is the differential backup: it performs a full backup and later saves all files modified since the last full backup. The main difference between the incremental and the differential style is that an incremental backup saves all files that have changed since the last backup, whether that was a full, an incremental or a differential backup, while a differential backup checks the type of the backups performed and saves all files modified since the last full backup. To restore a compromised device, this style is faster than incremental backup but slower than full backup. In the backup phase, the differential approach is faster than full but slower than incremental. The storage needed for the backup data is less than for full backup and more than for incremental backup. Table 2.1 summarizes the comparison between backup and restore techniques. Incremental and differential backups can be considered reverse delta approaches; in these schemes the backup system stores only the differences between current and previous versions. Such backups start with a full backup and periodically synchronize data with the live copy; the data between the live copy and the full backup can be archived or erased, depending on whether the system wants to allow recovery to intermediate versions. Backup systems using such an approach are rdiff-backup and Time Machine.
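The difference between the incremental and the differential scheme described in this section comes down to which earlier backup is used as the reference point. The following Python sketch shows this under simplifying assumptions: the backup history is a list of `(kind, timestamp)` pairs and files are identified by their last modification time. All names are hypothetical.

```python
def reference_time(history, scheme):
    """Return the timestamp against which modification times are compared.

    history: list of (kind, timestamp) pairs, oldest first, where kind is
    "full", "incremental" or "differential".
    """
    if scheme == "incremental":
        return history[-1][1]                   # last backup of any type
    if scheme == "differential":
        return max(t for kind, t in history if kind == "full")  # last full backup
    raise ValueError(f"unknown scheme: {scheme}")


def files_to_save(mtimes, history, scheme):
    """Select the files modified after the reference backup."""
    ref = reference_time(history, scheme)
    return sorted(path for path, mtime in mtimes.items() if mtime > ref)
```

With a full backup followed by an incremental one, the differential selection keeps growing until the next full backup, while the incremental selection only covers the changes made after the most recent backup.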

2.1.4 File-based vs. device-based

Files are saved on disk in logical blocks, which usually all have the same size (e.g., 8 KiloBytes). A file in a working system will usually be saved in blocks which are not contiguous. Backup software can operate either on files or on physical disk blocks. File-based backup systems understand the structure of files and copy entire files and directories to the storage media. Such an approach is really powerful when one wants to back up or recover a single file; unfortunately, in large backup operations on hard disks it is slowed down by the seek times needed to reach file parts contained in non-contiguous blocks. A file-based backup scheme also suffers from the problem that even a small change to a file requires the entire file to be backed up. For small files the problem is negligible, but for multimedia files performance is strongly affected. On the other hand, device-based backup systems make a low-level copy of the content of the drive block by block; this improves backup performance on hard disks, since the backup software performs fewer seek operations. Device-based backup, if performed with a reverse delta approach, also performs better on big files, as a small modification, even to a big file, costs at most the size of the modification plus 7 KiloBytes (with 8-KiloByte blocks). Unfortunately, this approach complicates and slows down file restores, since files may not be stored contiguously on the backup medium. Moreover, to allow file recovery, backups must include information on how files and directories are organized on disk, in order to correlate blocks on the backup medium with particular files. As a consequence, device-based programs are usually specific to a particular file system and not easily portable.
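The advantage of a block-level delta on large files can be sketched in Python as follows. This is only an illustration under stated assumptions: the device content is modeled as a byte string, the block size is the 8 KiloBytes used as an example above, and the case of a device that shrinks is ignored. The function names are hypothetical.

```python
BLOCK_SIZE = 8 * 1024  # 8 KiloBytes, the example block size used in the text


def split_blocks(data, block_size=BLOCK_SIZE):
    """Split a byte string into consecutive fixed-size blocks."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def changed_blocks(old, new, block_size=BLOCK_SIZE):
    """Return {block_index: new_block} for the blocks that differ or are new."""
    old_blocks = split_blocks(old, block_size)
    new_blocks = split_blocks(new, block_size)
    delta = {}
    for index, block in enumerate(new_blocks):
        if index >= len(old_blocks) or old_blocks[index] != block:
            delta[index] = block
    return delta
```

Changing a single byte in a 64-KiloByte image yields a delta of one block, whereas a file-based scheme would have to save the whole file again.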

2.1.5 Scheduled backup vs continuous data protection

Backup software can require the file system to be quiescent during backups; such systems usually perform scheduled backups. Online (or active) backup systems instead allow users to continue accessing files during the backup. This kind of system can perform both scheduled backups and continuous data protection. In continuous data protection, data are continuously saved from the device to the backup medium in a way that is transparent for the user. Online backup systems offer higher availability but introduce consistency problems; another problem is that device resources are consumed by the backup system continuously performing operations in the background. In a server or desktop environment the resources consumed by the backup system do not affect the usability of the system but, for example, on a mobile device with limited capability it is really important to save resources for user interaction, and battery to preserve the autonomy of the device. By contrast, with scheduled backups the save operations are performed at given moments (e.g., once a week). This approach does not guarantee that data are continuously protected, but has the advantage of being less resource consuming, as operations are performed rarely with respect to continuous data protection. Another advantage is that operations do not interfere with the user’s activity, if the backups are scheduled in a smart way: on mobile devices, for example, operations can be performed when the device is idle, such as during the night.

In the cases described above the backup is performed while the device can run programs, so the file system can be modified during the backup; this can lead the backup to save files in an inconsistent state. A possible solution to this problem is to take a “snapshot” of the filesystem at a consistent point in time and to back up the snapshot. A snapshot may need to be created when the approach is a full backup or at the first execution of a reverse delta approach; in the following executions of an incremental or differential backup a copy-on-write scheme can be followed, in which each time a file is modified the snapshot is updated and kept consistent with the live copy.

2.1.6 Local backup vs. remote backup

Backup data can be stored in several locations; historically, backups were saved on magnetic tapes labeled both internally and externally to avoid losing backup data. Unfortunately, magnetic tapes are not really reliable, as tapes are prone to wear and to loss of magnetic capacity. Currently backup data are stored on hard disks or other media. A backup can be considered local when the medium where data are saved is locally connected to the device backed up (e.g., a second hard disk mounted on the same computer where the data to be saved reside). A remote backup is the case when data are saved on another computer; this remote storage can be an FTP server inside the LAN or a server accessible through the Internet. When data are saved locally, the backup and restore processes are faster than with a remote resource, as transmission times, the real bottleneck in this kind of operation, are avoided. On the other hand, performing backup operations remotely grants a better level of safety in case of theft: if the data of a PC, for example, are saved locally on a second hard disk installed in the device, the second hard disk will be stolen with the device. The same problem can happen with an external hard disk, for example for laptops: if the laptop is stolen or lost, it is really probable that the external hard disk is contained in the laptop’s bag. If data are saved remotely, it is really improbable that both the backed up device and the data container are lost or stolen at the same time. The remote approach therefore grants a better level of reliability to the user saving the data. Cloud backup systems [23], [24] are the last frontier of remote backup: such systems grant the best reliability, even if backup and restore times are slowed down by connectivity, which is the bottleneck of remote backup systems. Unfortunately, with these approaches the final user is locked to a particular provider; the cloud solutions available need specific software installed both on the client and on the server. The backup software optimizes backup operations but reduces portability.

In [25] the authors propose a cloud backup approach based on simple operations available in every remote storage system:

Get: Given a pathname, retrieve the contents of a file from the server.
Put: Store a complete file on the server with the given pathname.
List: Get the names of files stored on the server.
Delete: Remove the given file from the server, reclaiming its space.

The proposed method moves all critical operations to the clients; the server must provide just the interfaces to perform the four operations listed above. Such an approach should ease migration to new, less costly and more powerful solutions; moreover, the backups could be stored using several providers located in different geographic areas, to increase fault tolerance even in case of natural disasters. A similar approach can be used to back up data stored on mobile devices.
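The four-operation interface can be sketched in Python as follows. The `InMemoryStore` class is a hypothetical stand-in for a remote storage service, and the `backup`, `restore` and `replicate` helpers are assumptions made for illustration; they are not the implementation of [25].

```python
class InMemoryStore:
    """Stand-in for any remote storage service offering the four operations."""

    def __init__(self):
        self._files = {}

    def get(self, path):
        return self._files[path]

    def put(self, path, data):
        self._files[path] = data

    def list_files(self):
        return sorted(self._files)

    def delete(self, path):
        del self._files[path]


def backup(store, local_files, prefix):
    """Client-side backup: the store is only asked to Put complete files."""
    for path, data in local_files.items():
        store.put(f"{prefix}/{path}", data)


def restore(store, prefix):
    """Client-side restore built on List and Get only."""
    start = prefix + "/"
    return {name[len(start):]: store.get(name)
            for name in store.list_files() if name.startswith(start)}


def replicate(stores, local_files, prefix):
    """Store the same backup on several providers to increase fault tolerance."""
    for store in stores:
        backup(store, local_files, prefix)
```

Because all the logic lives in the client, any provider that exposes these four operations can be swapped in without changing the backup code.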


2.2 Mobile

Data stored on a mobile device are usually critical for the device’s user: when data are lost, the effort required to recover all the information saved on the device is really high, and sometimes it is impossible to recover all of it. Moreover, mobile devices are prone to being lost or stolen, even more than laptops and desktop devices; furthermore, they suffer from storage and performance problems. For these reasons backups are usually performed on remote devices, such as the device owner’s laptop.

2.3 Local backup for mobile device

Following the desktop idea, a local backup should be performed on a storage medium directly connected to the device, e.g., a memory card. For mobile devices, storing backups on the user’s laptop can also be considered a local backup; this kind of storage suffers, more or less, from the same problems as local backups in laptop environments. Saving data on the device memory card offers a good level of usability, as the backup can be done transparently for the user: the backup process can run in the background, saving data when they are modified on the device. Saving data on the device memory card can be useful in case of migration, but unfortunately gives no reliability in case the device is lost or stolen. Saving backup data on a laptop increases the reliability of the backup system, but usability is affected by the need to connect the device to the laptop. Usually backup from a mobile device to a laptop is performed using Bluetooth or cable connections, which requires the mobile device to be near or connected to the laptop. For some classes of devices (basically old devices) connecting is not a usual operation for users, so backups are performed rarely, with the consequence that the saved data are not up to date when a restore is performed. For other classes of devices, such as Android devices or iPhones, connecting the device to the laptop can propagate malware infections [26].

2.4 Remote backup for mobile device

Improved and 4G connectivity features provided for mobile devices have opened in the last years the possibility to perform backups remotely using the Internet. Such an approach is characterized by the issues and advantages described in Section 2.1.6 for desktop devices. Due to the reduced hardware capability typical of mobile devices, performance problems are increased. Furthermore, mobile devices suffer from battery autonomy problems and, for a mobile device, using connectivity features significantly increases battery consumption. On the other hand, saving backup data remotely using network connectivity allows the backup system to run as a background application and to transparently keep the backup data up to date with the device data. Moreover, saving data remotely can increase the usability of the system, since there is no tedious need to connect the device to a laptop using cables or via Bluetooth. There is also no need for proximity between the device and the storage media; this allows the system to perform backups more frequently when the device is idle, freeing resources when they are needed by the user. If the storage of the backup system is based on a cloud architecture, the reliability of the whole system is increased; cloud backup systems allow users to access their personal data in a secure way from different platforms. A cloud-based approach also frees the user from managing his/her personal backup files, avoiding potential data loss due to user errors.

[27] proposes a collaborative mobile backup approach based on a peer-to-peer architecture; the approach is interesting but opens several security and privacy problems. The information stored on mobile devices is usually really personal: owners are used to storing PIN codes, private messages and other privacy-sensitive information on their mobile devices. Backups stored in the memory of other peers could be analyzed or modified by the owner of the peer where the data are stored, making these data useless or unavailable when needed.


3 Our approach to backup

Introduction

Several solutions to the backup and restore problem are available for desktop and server systems. The heterogeneity of mobile environments makes the backup problem harder. New vendors, operating systems, versions and devices frequently appear in a changing market, and new solutions are continuously proposed to solve the problem vertically for each new device/platform launched.

In this Chapter we present the new approach we proposed to handle backups from heterogeneous mobile devices and to grant interoperability with new devices. The main results obtained by applying the proposed approach have been recently presented at the 4th IFIP International Conference on New Technologies, Mobility and Security [28].

Some of the ideas presented in this chapter, and more generally in this thesis, are being applied in the Telecom Italia CuboVision backup project.


3.1 A new approach to backup & restore

Our approach tries to overcome the limitations in saving and restoring data from mobile devices by using online remote backups as a uniform interface for sharing data among different users and multiple platforms. In particular, we present an online remote backup system based on a Service Oriented Architecture (SOA): the services offered by our solution make it possible to back up and restore not only files but also more structured data such as contacts, calendar events, and text messages (SMS). In order to be able to access those services, a mobile device must be equipped with a client capable of retrieving internal data from the device and sending them to the server via a common interface. This interface is designed so as to exploit the common features of mobile data models: e.g., independently of the platform used, a contact in an address book is always identified by fields such as first name, last name, address, phone numbers, etc. All the communication exchanged between the client and the server is based on an extensible standard language (i.e., XML). The communication format for each kind of data is detailed in Appendix B.

Thanks to the common data interface, data saved on the server are available for all types of devices, mobile or not, equipped with the client application. This grants interoperability between vendors, platforms, and operating systems. Using our general data model, backup data can be shared among different users: parts of the backups can be shared and are transparently kept up to date on all the devices that can access the information.

In our architecture (see Figure 3.1), the server provides its services using a representational state transfer (REST) architecture [29], [30]. For each type of platform, a different client is implemented: each client connects to the server using the HTTP protocol to exchange information in XML format. In the following, we describe in more detail the main functionalities implemented by the server and the clients.

[Figure 3.1: Backup and Restore system architecture. Server layers: Persistence (DBMS), Application (Business Logic), Web Services (REST API). Clients reach the server over the Internet through a secure connection, using a common format (XML).]


[email protected] NAME ********* 2 ******** 1 2010-07-07 12:20:12.997 new

Figure 3.2: Example of data model for a contact.

3.1.1 Server

The server has been designed as RESTful: in a REST architecture requests and responses are built around the transfer of representations of resources. In our case, a resource is exchanged as the XML representation of its state, for example a contact (as in Figure 3.2) or a contact list. A REST architecture is based on the HTTP protocol and uses all the HTTP facilities, such as the security layer provided by HTTPS, in a transparent way.

The server allows mobile clients to perform full backups and incremental backups. When a user performs a backup, all the user’s data previously stored on the server are still accessible from the mobile client; old data are kept on the server, and made accessible to the mobile client, to allow the user to revert to old backups in case of loss or failure.

Our server implementation offers two REST methods: PUT, used to insert new entries into the server’s database, and GET, which allows the mobile client to perform queries for a single entry or for entry lists. Figure 3.2 shows a typical body of a PUT request to the server: the body contains the XML representation of the serialized object (in this specific case the entity saved is a contact item). In Figure 3.3 we show a typical URI of a PUT/GET request (this specific case shows a request for a contact).

https://someserver.com/backup/{backupType}/device/{imei}/contacts/{contactItemName}

Figure 3.3: Example of a request of a contact.

When receiving a GET request at a URI like the one shown in Figure 3.3, the server will answer with the “contactItemName” item, for the “imei” device, from the “backupType” resource, using the XML shown in Figure 3.2; otherwise, if a PUT request is received, the server expects the body of the HTTP(S) request to contain the contact details to be processed.
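As an illustration of the URI scheme of Figure 3.3, the following Python sketch builds the address of a contact resource. The host is the placeholder that appears in the figure; the function name and the example values (a backup type called "full", the IMEI and the contact name) are assumptions made for illustration and are not taken from the actual server.

```python
from urllib.parse import quote

BASE_URL = "https://someserver.com/backup"  # placeholder host from Figure 3.3


def contact_uri(backup_type, imei, contact_item_name):
    """Build /backup/{backupType}/device/{imei}/contacts/{contactItemName}."""
    parts = [
        quote(backup_type, safe=""),
        "device",
        quote(imei, safe=""),
        "contacts",
        quote(contact_item_name, safe=""),   # percent-encode spaces and slashes
    ]
    return BASE_URL + "/" + "/".join(parts)


# Hypothetical values: the same URI would be used for both GET and PUT.
uri = contact_uri("full", "356938035643809", "John Doe")
```

A client would then issue a GET on this URI to retrieve the contact, or a PUT with the XML body of Figure 3.2 to store it.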

3.1.2 Client

The client can be implemented for different types of devices (mobile, desktop, game console, Internet TV, etc.). The software should be implemented to access private data residing on the device and to send such data to a remote server which will store them. Clients must be able to handle HTTP message bodies, get the data sent by the server and store them on the device, for example in the address book, in the device-specific format.


[Figure 3.4: Example of client server interactions. Message sequence: the client sends GET(id_list); the server returns id_list; the client sends PUT(id_x1), ..., PUT(id_xn).]

Usually devices need to be built on purpose to interact with a backup server; in some cases they need to handle dirty flags in order to manage the status of the resources to be saved. In our approach, in order to interact with the server, clients need only to be able to read and write resources to be saved and to implement just some basic HTTP methods.

To improve the performance of incremental backup operations the client may handle the list of items to be sent to the server. Figure 3.4 shows a typical interaction for an incremental backup. First, the client asks for the list of identifiers of the items in the last backup; the server sends the list of identifiers to the client in XML format, together with the last backup date. At this point, the client computes its internal list of identifiers and compares the two lists: now the client knows all the data that have been updated on the device, and can build the list of modified contents. If the last modification date in the client’s list is more recent than the one on the server, then the client adds the item to the list of data to back up. Note that our approach ensures compatibility with all old devices that can run third party applications able to access private data. The restore process gets the list of items on the server and saves all contents on the client. If some contents appear on the client but not on the server, such contents are preserved on the client. In case of migration to a new device, or of a restore after a hard reset, the device is empty, so the device contents after the restore will be those contained in the last backup.
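The comparison step of the incremental backup and the merge rule of the restore can be sketched in Python as follows. This is a minimal sketch, assuming that both lists map an item identifier to a last modification date; the function names and the per-item dates on the server side are assumptions made for illustration.

```python
def items_to_backup(server_items, client_items):
    """Return the identifiers that the client must PUT to the server.

    Both arguments map an item identifier to its last modification date.
    An item is sent if the server does not know it, or if the client copy
    is more recent than the server copy.
    """
    to_send = []
    for item_id, modified in client_items.items():
        if item_id not in server_items or modified > server_items[item_id]:
            to_send.append(item_id)
    return sorted(to_send)


def restore_contents(server_contents, client_contents):
    """Save all server contents on the client, preserving client-only items."""
    merged = dict(client_contents)   # contents found only on the client are kept
    merged.update(server_contents)   # server contents are written to the client
    return merged
```

On an empty device `client_contents` is an empty dictionary, so the result of the restore is exactly the content of the last backup.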

3.2 Sharing backup data

Following a user-centric idea of the collaborative Web, the proposed approach can often be useful for sharing data among different users and different devices. It is easy to imagine a community of people willing to share some of their data within their mobile network. In a closed group of people, such as friends or work colleagues, some classes of data stored in mobile devices are usually the same for the entire group. Collaborating people usually share each other’s mobile phone numbers, emails, calendars, addresses, documents and so on. In an enterprise scenario, for example, it can be useful for people to share business cards or calendar events contained in their mobile backups with some selected contacts of their working group. At the same time, in such a collaborative backup it will be easier to recover lost data even if these data were not saved in a personal backup; in fact, in a closed group that collaborates, it will be easier to ask one of the members for some data that one member lost and another still owns. Members of the same social community tend to share more or less the same data [31, 32]: if a member of the group changes his/her mobile phone number, he/she will have to spread the new number to all the network; in the same way, if a new member joins the group, the other members will have to save his/her contacts in their mobile devices. The approach proposed here aims at speeding up the sharing of updated information among project teams, study groups or, more generally, social communities.

3.3 Social network analysis

As a side effect of building a shared backup system, we have data allowing us to analyze the social network of the users participating in the shared backup. The available information is not limited to the backup data; we can access a lot of information that the user spreads on the web both consciously, using services where the user wants to give information about himself (e.g., linkedIn, myspace, flickr), and unconsciously, by signing up to other services, mailing lists, and so on. Even college web sites sometimes give information about students, such as matriculation number or even notes. In such a context we are able to access a lot of information which makes little sense or is quite useless if not filtered.

We can use shared backup data to filter this information and profile the user, building his/her social network and crossing it with the social networks of other users. Once we have built the user’s social network, we can use it to build workgroups, or use this information for marketing purposes or Customer Relationship Management. Section 7.2 details how we can build a social network from the data of a mobile device backup and from the web.


3.4 Security

A backup that allows data sharing, however, can suffer from the same security and privacy issues present in social networks [18]. As personal data are more affected by privacy issues than common data, both because of the interest they arouse and because of the problems a data theft can cause to the user, some additional measures to grant privacy should be applied. Depending on the size of the sharing group, privacy issues can be approached differently. In small and medium groups an administrator can handle permissions and grant users access to data. For example, in a medium scenario like an enterprise, data sharing can be monitored by administrators, who can enforce the company privacy policies. For bigger groups, like widespread social communities, privacy cannot be delegated to security managers or privileged users; each user must prove his/her ownership of the data he/she wants to share. For example, if a user wants to share an email contact, the system will send a verification code to this email address and the user will have to prove ownership by replying to the challenge. For such an approach privacy issues are more challenging than security; a security layer is provided by deploying secure connections and data encryption. Communication security is provided using HTTP over TLS connections, while data encryption can be done transparently via common DBMS encryption functions. For some classes of information (e.g., calendars) time-limited sharing could be an improvement to grant privacy to users who are interested in sharing data with someone just for a limited period. Mobile devices also introduce a new feature applicable to backup and sharing, the geographic position: location-limited sharing could solve the problem of a user who wants to share data only with people in a certain geographic area. Obviously all these solutions should be combined to grant different levels and configurations of privacy settings. Chapter 6 details some results that can be applied to grant a higher level of security to the whole system.

4 Data extraction

Introduction

The backup process is basically divided into three steps: the first is to get the information from the memory of the device, the second is to save such information in a different store, and the third is to restore the information from the backup into the device. Extraction, for mobile devices, is one of the most challenging problems, due to the differences between devices and operating systems. In this Chapter we describe how we solved the extraction problem.

We approached the problem in two different ways: a forensic style extraction, and a smarter approach based on an extraction performed using the APIs of the underlying operating system.

The main results of the application of the forensic approach have been presented at the 2008 High Performance Computing & Simulation Conference [33]; the improvements of the extraction methodology have been later published in the International Journal of Electronic Security and Digital Forensics [34]. The generalization of the extraction methodology and the testing results on the Windows Mobile and Symbian S60 operating systems have been published as Chapter 19 of the Handbook of Electronic Security and Digital Forensics [35]. The Italian Carabinieri are currently experimenting with the MIAT tool (a tool based on the extraction approach presented in this chapter) to forensically extract information from seized devices.

The extraction approach which exploits the operating system’s API has been used to extract information from more powerful devices; this approach has been presented at the 4th IFIP International Conference on New Technologies, Mobility and Security [28].

Different ways for information extraction

Given the plethora of devices, vendors and operating systems present in the mobile market, and as new devices implementing new technologies and giving more and more capabilities to the user are continuously deployed, getting data from the device internal memory and restoring these data should be approached in a different way for each case. This is due to limitations specific to each platform, operating system and even version of the operating system. In such a scenario the easiest way to approach the device backup is to implement a differential (see Section 2.1.3), file-based (see Section 2.1.4), scheduled backup (see Section 2.1.5) with a snapshot approach. Such an approach is really powerful when the user wants to restore the whole device, replacing even configurations and system files as they were at the moment of the backup. A detailed description of how such an approach can be performed is given in Section 4.1. Its drawback is that the backups are not easily portable from one model to another, and it is impossible to restore a backup performed in this way to a device of a different vendor.


Another approach is to extract information just from some selected files, those containing personal information such as contacts, calendars, text messages (SMS), multimedia messages (MMS), etc. Such an approach needs to put more logic on the mobile device but allows the server side to manage the data coming from the devices in an easier way. Filtering and pre-formatting information on the smartphone also allows the server to manage data in a more flexible way. In this way the server is able to handle backup data independently of the client where the data are backed up. Such an approach to backup grants a higher interoperability and allows the user to migrate easily from one device/vendor/operating system to another. Moreover, applications can use the facilities offered by modern operating systems to access personal information inside the device. This approach is described in further detail in Section 4.2.

4.1 Forensic Style Approach

In this section it is described how the backup can be performed iterating recur- sively on the filesystem and performing a snapshot of the state of the device. Such approach is near to the forensic technique described in [33], [34], [35], [36], due to the reduced capability of the smartphone internal memory such memory can be copied to the external memory (i.e., the SD card or the MMC). After the internal data, the most critical, have been saved to the external mem- ory such data can be sent to a remote server, described in Section 5.4 via WIFI or 3G connectivity and later elaborated (see Section 5.1) as data contained in the snapshot can be used for the restore. We developed a forensic tool, available for Windows Mobile 5 and 6 and Symbian, to extract data from inside the memory of a smartphone granting the non modification of the content of the files. Such tool is able to create a

37 CHAPTER 4. DATA EXTRACTION

logical dump of the internal memory of the device into the external memory. In some cases the tool modified some files inside the device memory or was not able to save all system files from the internal memory. Luckily, these files are not key files in the backup process we propose. Moreover, to perform the extraction forensically the device must in some cases be restarted, to allow the application to gain the privileges needed to unlock system files. In the backup case there is no need to access these files, so the device does not need to be restarted and the tool can run in the background on the device. The external memory does not contain locked files, so its content can be sent to the server without significant problems and without the need to perform a snapshot.

4.1.1 Our methodology

The approach we propose focuses on acquiring data from a mobile device’s internal storage memory and copying them to an external removable memory (such as SD, mini SD, etc.). This task is performed without connecting the device to a PC, which makes the backup process very easy for the user: when the device is idle it performs the logical dump, and when the dump is complete it can be sent to a remote server. The complete data extraction process is shown in Figure 4.1. The extraction tool spiders the whole filesystem of the mobile device recursively and hashes each file before and after the copy, to ensure the integrity of the acquired information. The report containing the file hashes is saved in a log file (checksum.xml). The extraction tool also compiles a log file named info.xml with all remarkable events and another log file, named errors.xml, summarizing the errors encountered (Table 4.1 shows the log files produced by the extraction tool). Log files are saved in XML format. Data stored in the original memory card can also be acquired using an MMC or SD reader (USB or integrated): binary data are read from the source and stored as an image file representing every single byte, including the file system’s metadata. After that, it is possible to analyse the file allocation table to recover, in some cases, even deleted data.

[Flowchart nodes: Start; more files? (no: Stop; yes: MD5); opened? (no: copying using specific OS APIs; yes: normal chunked copy); check integrity (MD5).]

Figure 4.1: Data collection workflow
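As an illustration only, the hash report described above could be produced along the following lines. This is a minimal Python sketch, not the tool itself (which runs natively on the device); the thesis does not specify the schema of checksum.xml, so the element and attribute names used here, as well as the function names, are hypothetical.

```python
import hashlib
import os
import xml.etree.ElementTree as ET


def md5_of(path, chunk_size=64 * 1024):
    """Return the hex MD5 digest of a file, read in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_checksum_log(root_dir):
    """Walk root_dir and return an XML tree with one <file> element per file.

    Element and attribute names are assumptions made for this sketch;
    the real checksum.xml layout is not documented in the text.
    """
    log = ET.Element("checksum")
    for folder, _subdirs, names in os.walk(root_dir):
        for name in sorted(names):
            path = os.path.join(folder, name)
            info = os.stat(path)
            entry = ET.SubElement(log, "file")
            entry.set("name", os.path.relpath(path, root_dir))
            entry.set("size", str(info.st_size))
            entry.set("modified", str(int(info.st_mtime)))
            entry.set("md5", md5_of(path))
    return ET.ElementTree(log)
```

Calling `build_checksum_log(path).write("checksum.xml")` would then save the report; reading files in chunks keeps memory usage low, which matters on a constrained device.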

4.1.2 Symbian implementation

The extraction tool for Symbian was developed to support and test the methodology described above. Symbian is an operating system derived from the Epoc operating system; Symbian OS supports a wide range of device categories with several user interfaces, including Nokia S60, UIQ and the NTT DoCoMo common software platform for 3G FOMA™ handsets. The commonality of Symbian OS APIs enables development that targets all of these phone platforms and categories. In order to produce executable code that does not need any other software layer (e.g., a JVM to interpret the bytecode), the application was originally developed in C++, the native language of Symbian OS.

File            Contents
checksum.xml    File size, file typology, file name, MD5 hash, extraction duration,
                and creation, access and modification time.
info.xml        Information about the device (IMEI, device ID, platform type, model,
                manufacturer) and about the extraction process (duration, battery
                consumption, date of extraction).
errors.xml      Information about errors that may happen during the process.

Table 4.1: Files generated during the Extraction Process

Most relevant files are always open and locked by system processes. For example, the file Contacts.cdb, which contains the contacts database, is locked by PhoneBook, the address book process. In the past ([36]) we used the OS Backup service to seize locked data. This service is a utility that allows the backup of memory contents even if they are locked. An application or a service can register itself together with the files it locks. The Backup Server notifies a backup request to the registered applications, so that they can temporarily release the lock. Once the file has been saved, the application can notify this to the Backup Server, and the system process can then re-acquire the lock. In a more recent work ([37]) we adopted an alternative way to access locked files, based on the ReadFileSection method of the Symbian RFs API, which allows a file to be read without opening it. With this method it is possible to seize the entire file system tree, including files that hold a persistent lock; furthermore, this strategy preserves integrity because access is established in read-only mode, as guaranteed by the OS. Some files and folders are more relevant to the backup case; on Symbian S60 they are:

• Calendar, containing memos, day notes, meetings and anniversaries;

• Contacts.cdb, containing the contacts available from the address book;

• Mail, a folder containing all SMS/MMS/Email files with sender, receiver and body;

• Images, a folder containing all pictures taken by the user or available in the gallery application;

• Video clips, a folder with the user’s video recordings and videos downloaded or received;

• Sound clips, a folder where the system saves the user’s audio recordings, ringtones and received audio files.

4.1.3 Windows Mobile implementation

The tool implementing the methodology described above has also been realized and tested for Windows Mobile 5 and 6. Because of the differences between the two environments, the Windows Mobile tool is not a port of the Symbian version. Implementing the Windows Mobile version required a design phase of its own, as the problems to be faced were different from those faced when implementing the Symbian version.


PocketPC internal memory and storage architecture

In Windows Mobile 2003 PocketPC and earlier, the device memory was split into two sections: a ROM section, containing all operating system core files, and a RAM section intended to hold the user storage (Storage Memory) and the memory space for running applications and their data (Program Memory). The user could choose the amount of memory reserved for Storage Memory and, consequently, for Program Memory. The RAM chip was built on a volatile memory scheme, so a backup battery was required to keep the RAM circuitry powered, even when the device was merely suspended. If the battery power supply failed, all user data were lost. This scenario forced the user to recharge the battery within a time limit of 72 hours (as mandated by Microsoft to device manufacturers).

[Diagram labels: RAM (64M), ROM (64M), Core OS Stuff (32M), User Storage (32M). Memory sizes reported could change among different PPC models.]

Figure 4.2: Windows Mobile 5.0 memory architecture.

Since Windows Mobile 5, the memory architecture has been redesigned to implement non-volatile user storage. Currently, the memory is split into two sections (see Figure 4.2): the RAM is intended to hold the data of running processes, whereas the ROM keeps core OS code and libraries (called modules), the registry, databases and user’s files. This memory, also called Persistent Storage and contained in a flash memory chip, can be built using several different technologies [38]:

• XIP model, based on NOR memory and volatile memory; this technology enables the device to store modules and executables in XIP (execute-in-place) format and allows the operating system to run applications directly from ROM, without first copying them to the RAM section. NOR memory has poor write performance.

• Shadow model, which boots the system from NOR and uses NAND for storage. This model is power-expensive, because the volatile memory must be constantly powered.

• NAND store and download model, which reduces costs by replacing NOR with OTP (one-time programmable) memory.

• Hybrid store and download model, which mixes SRAM and NAND, covering them with a NOR-like access interface (to support the XIP model).

Windows Mobile 5 and above place most applications and system data in the Persistent Storage. Core OS files, user’s files, databases and registry are seen by applications and users in the same file system tree, which is held and controlled by the FileSys.exe process. This process is also responsible for handling the Object Store, which maps objects like databases, registry and user’s files in a contiguous heap space. The Object Store’s role is to manage the stack and the heap memory, to compress and expand files, and to integrate ROM-based applications and RAM-based data. For a comprehensive explanation of how Windows Mobile uses the Object Store and manages linear flash memory, see [39] and [40].

The strategy for storing data is based on a transactional model, which ensures that the store is never corrupted by a power loss while data is being written. Finally, the Storage Manager manages storage devices and their file systems, offering a high-level layer over storage drivers, partition drivers, file system drivers and file system filters.

Algorithm 1 Extraction
Input: A path p.
Output: none.
for all objects obj (files and directories) in p do
    if obj is a directory then
        Create a directory named p/obj on the SD Card
        Recursively call Extraction(p/obj)
    else if obj is a file then
        Compute MD5 hash of obj
        Copy obj in path p on the SD Card
        if obj has not been copied then
            Access obj with CEDB APIs
            if obj could be accessed then
                Recreate a similar database in path p on the SD Card
            end if
        end if
        Compute MD5 hash of the copied obj on the SD Card
    end if
end for
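The recursion in Algorithm 1 can be sketched in Python as follows. This is only an approximation under stated assumptions: the real tool is native C++ on Windows Mobile, and the `fallback` callable is a hypothetical stand-in for the CEDB-based re-creation of locked databases, which has no Python equivalent.

```python
import hashlib
import os
import shutil


def md5_of(path):
    """Hex MD5 digest of a file, read in 64 KiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def extraction(src, dst, fallback=None, report=None):
    """Recursively copy src into dst, hashing each file before and after.

    fallback(src_path, dst_path) is a hypothetical hook for files that the
    plain copy cannot read; it must return True if it created dst_path.
    Returns a dict mapping source path -> (hash_before, hash_after),
    where a missing hash is reported as None.
    """
    report = {} if report is None else report
    os.makedirs(dst, exist_ok=True)
    for name in sorted(os.listdir(src)):
        src_path = os.path.join(src, name)
        dst_path = os.path.join(dst, name)
        if os.path.isdir(src_path):
            extraction(src_path, dst_path, fallback, report)
            continue
        try:
            before = md5_of(src_path)
        except OSError:
            before = None  # source could not be read for hashing
        copied = False
        try:
            shutil.copyfile(src_path, dst_path)
            copied = True
        except OSError:
            if fallback is not None:
                copied = bool(fallback(src_path, dst_path))
        readable = copied and os.path.exists(dst_path)
        after = md5_of(dst_path) if readable else None
        report[src_path] = (before, after)
    return report
```

Returning both hashes for every file, rather than a single pass/fail flag, keeps enough information to tell apart files that were not copied from files that were copied but changed.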

Implementation details

We have chosen to develop the application using a native C++ approach, fulfilling the requirement of having a tool that can be launched from an external memory card, without the need for a pre-installed runtime environment (like a Java virtual machine) or for installing the tool on the device. The application runs in stand-alone mode and does not require any third-party DLL. Since the tool uses the standard Windows Mobile APIs to access the file system (like Open, Read and Write, FileCopy), we can reasonably expect that these APIs will not change in future versions of the OS, so forward compatibility should be preserved. Algorithm 1 shows the pseudo-code of the seizure process, which starts after the main application has killed all the other non-vital running processes. The algorithm performs two main tasks:

• the copy task, which copies all the files in the internal memory of the mobile device to the memory card;

• the hash task, which ensures the integrity of the copied files and makes it possible to discover which files have been modified during the seizure process.

The Extraction algorithm works using APIs like CopyFile, Open and Close, and recursively copies every internal file system entry to the memory card. This task preserves the directory structure, copying files according to their original position. The hash task computes the MD5 hash of each file found in the device internal memory. Hashes are written to a log file saved in a separate directory. The hash task can also be launched as a separate function, in which case it traverses the whole filesystem to compute the hash of every file. The Extraction algorithm invokes the hash function before and after the copy of every single file, which makes it possible to detect changes that happen during the copy from the internal filesystem to the Storage Card.

As reported in Section 4.1.3 in the discussion of internal memory and storage architecture, Windows Mobile places OS files in many file-like objects in the same file system seen by the user (under the /Windows directory). Most of these files are inaccessible through the standard file system APIs because they are objects in XIP format: most of the headers are removed and the addresses are fixed up so that the programs can run without first being loaded into RAM. The binary has been stripped down and customized for that particular device [41]. Such files are also flagged with file attributes like FILE_ATTRIBUTE_INROM and FILE_ATTRIBUTE_ROMMODULE. Our application skips these files: there is no reason to look for a method to access them, because they are firmware modules that could be replaced with new ones only by an advanced user (using the ROM flashing technique, e.g., if she is willing to upgrade her firmware to a new version of the operating system or wants to modify things like the boot splash). Moreover, there is another set of files that cannot be accessed through standard APIs: database objects locked by operating system processes that cannot be killed. We access their data using the CEDB APIs and are able to recreate such files on the external memory card. Table 4.2 shows where the most relevant data about user and system are stored in the file system.
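The skipping of ROM-resident objects amounts to a bitmask test on the file attributes. The sketch below illustrates the idea in Python; the numeric values of the two flags are assumptions recalled from the Windows CE headers and should be checked against the SDK, and the `(path, attributes)` input format is hypothetical.

```python
# Assumed flag values (Windows CE headers); verify against the SDK before use.
FILE_ATTRIBUTE_INROM = 0x00000040
FILE_ATTRIBUTE_ROMMODULE = 0x00002000
ROM_MASK = FILE_ATTRIBUTE_INROM | FILE_ATTRIBUTE_ROMMODULE


def is_rom_object(attributes):
    """True if either ROM-related flag is set in the attribute bitmask."""
    return bool(attributes & ROM_MASK)


def files_to_copy(entries):
    """Keep only the paths whose attributes carry no ROM flag.

    entries is an iterable of (path, attributes) pairs, a hypothetical
    representation of what a directory enumeration would return.
    """
    return [path for path, attributes in entries if not is_rom_object(attributes)]
```

For example, an entry flagged with both ROM attributes is dropped, while an ordinary file with only the archive bit set is kept.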

Experimental results

The Windows Mobile extraction tool has been tested on a physical HTC device and on an emulated one (on a Windows XP computer). The extraction tool saves all the files containing the user’s information to be backed up. The hashes show that some files were modified: for these files it was necessary to create a new file and refill it with the original file contents using the CEDB APIs. Table 4.3 lists the files that encountered problems during the saving phase; the right column shows whether the final file has not been saved (−), differs from the original (?) or matches it (√). As previously described, the core OS files were not saved because they are just virtual files. The testing phase was performed on an AMD Athlon64 X2 Dual PC with 1GB of RAM and a QTEK9000 PDA (HTC Universal) equipped with a 2GB Kingston SD card.

Filename            Location                                   Description
System.hv           /Documents And Settings/system.hv          System registry hive.
User.hv             /Documents And Settings/default/user.hv    User registry hive for default user.
Default.vol         /Documents And Settings/default.vol        Object store replacement volume for
                                                               persistent CEDB databases. This file
                                                               contains MSN contacts.
Mxip_system.vol,    /                                          Metabase volumes, including
Mxip_lang.vol,                                                 language-specific data and storage
Mxip_notify.vol,                                               for notifications.
Mxip_initdb.vol
Cemail.vol          /                                          Default SMS and e-mail storage.
Pim.vol             /                                          Personal Information Manager (PIM)
                                                               data, such as address book, schedules,
                                                               SIM entries, call logs.

Table 4.2: Windows Mobile 5.0 relevant files
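The three consistency outcomes follow directly from comparing the hash computed before the copy with the one computed after it. A minimal Python sketch of this classification is given below; the function name and the convention of using None for a missing hash are assumptions made for illustration.

```python
NOT_COPIED = "−"  # file not copied
DIFFERS = "?"     # file copied but its hash does not match
MATCHES = "√"     # file copied and hash matches


def consistency(hash_before, hash_after):
    """Classify one file from its MD5 digests before and after the copy.

    hash_after is None when no copy exists on the memory card. A copy
    whose source could not be hashed (hash_before is None) is reported
    as differing, since its integrity cannot be confirmed.
    """
    if hash_after is None:
        return NOT_COPIED
    if hash_before != hash_after:
        return DIFFERS
    return MATCHES
```

A file recreated through a database API rather than copied byte by byte would normally fall in the second class, because the new file is not bit-identical to the original even when it holds the same records.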

4.1.4 Some remarks on this approach

This section has discussed a methodology to extract data from a smartphone by recursively copying the content of the internal memory filesystem to the external memory. To prove the effectiveness of the solution, two prototypes have been implemented, one for Symbian S60 (Figure 4.3 (a)) and another for Windows Mobile 5 and 6 (Figure 4.3 (b)). The prototypes have been tested on a set of real devices, and the test results show that the solution is able to extract the internal device files containing the user’s personal information and settings. The application could certainly be improved to support more recent devices, such as the brand new Windows Mobile 7.

Unfortunately this approach is not sufficient to obtain an interoperable backup and restore system; the logical dump can be, and has been, used to restore devices of the same vendor and model as the one from which the dump was extracted. The logical dump can be analyzed using the methodology proposed in Chapter 5 to extract interesting data, which makes it possible to abstract from the specific device and focus on the data.

File                                       Consistency
/Documents And Settings/default.vol        ?
/Documents And Settings/system.hv          −
/Documents And Settings/default/user.hv    −
/Windows/*.dll                             −
/mxip_notify.vol                           ?
/cemail.vol                                ?
/mxip_system.vol                           √
/mxip_lang.vol                             √
/pim.vol                                   √

−  file not copied
?  file copied but its hash does not match
√  file copied and hash matches

Table 4.3: Extraction tool consistency analysis

Figure 4.3: (a) Symbian S60 tool’s screenshot, (b) Windows Mobile tool’s screenshot.

4.2 Selection of interesting data

Better interoperability between devices from different vendors can be achieved by delegating the extraction and part of the analysis of data to the mobile client. In our approach the application focuses on how data are structured in the device memory rather than on the internal system structure. The application is installed on the mobile device and acts as a client that filters the personal data and configurations present on the device, formats them following the common format proposed in Chapter 3 and sends them to a remote server, which interprets the format and saves the data into a common database. All smartphone operating systems provide APIs to access the internal databases containing personal data such as address books, calendars, notes and messages (SMS, MMS, emails); such APIs can be used to collect the data to be sent to a server to perform a remote backup (see Section 2.1.6) of the mobile device.


Unfortunately these APIs are usually fully featured only when the application is developed in the operating system’s native programming language (i.e., Objective-C for iOS; the Java interface of the Dalvik Virtual Machine for Android; Symbian C++ for Symbian; Visual C++ or .NET for Windows Mobile). Portable code such as J2ME cannot access some contents and has write limitations for others. Moreover, a Java virtual machine is not available for all operating systems (e.g., iOS) or has a different implementation (e.g., Android’s Dalvik), so J2ME code is not fully portable. Another problem of implementing mobile applications in non-native languages is the execution speed and resource consumption caused by the virtual machine overhead.

Considering the limitations of developing a more or less portable client in J2ME, and the difficulties of implementing a client for each operating system in its specific native programming language, the better approach is the second one, which at the cost of some additional development effort offers a more stable, better performing and more powerful backup client application.

The implemented applications retrieve data from the internal databases of the mobile device. These data are sent, using the proposed data model, to the REST web services provided by our backup server.
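To make the client-to-server step more concrete, the following Python sketch builds (without sending) an HTTP request carrying a batch of extracted items. Everything specific here is an assumption: the JSON encoding, the field names, the URL layout and the host are hypothetical, since the data model of Chapter 3 and the server interface are not reproduced in this section.

```python
import json
import urllib.request


def build_backup_request(base_url, device_id, kind, items):
    """Prepare a POST request for one batch of backed-up items.

    The payload layout and the /backups/<kind> path are hypothetical;
    they only illustrate a device-independent representation being sent
    to a REST service.
    """
    payload = {"device": device_id, "type": kind, "items": items}
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/backups/{kind}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_backup_request(
    "https://backup.example.org/api",  # placeholder host
    "device-001",
    "contacts",
    [{"first_name": "Ada", "last_name": "Lovelace", "phone": "+39000000"}],
)
```

The request would then be sent with `urllib.request.urlopen(request)`. Because the payload names fields such as `first_name` rather than platform-specific record layouts, the same batch could in principle be restored on a device running a different operating system.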

Our data model allows different operating systems to communicate; in particular, we describe how it is possible to back up data on a Symbian S60 device, store them on a remote server and then restore them on an Android 2.1 device. We chose to implement our clients first on Symbian and Android to show how older and newer devices can easily cooperate with our approach.


4.2.1 Symbian

We realized the backup and restore client for Symbian with a basic user interface, just to show that collaboration was possible. The Symbian Socket framework was used to establish a TLS connection with the server. Symbian’s CActive class makes it possible to perform long-running tasks in the background and to communicate asynchronously with the server; this behaviour is similar to Android’s AsyncTask class. Asynchronous communication between client and server lets the application run in the background while the user continues using the device. To access data it was necessary, for each data type, to open a session with the respective server, which manages the communication with the underlying databases or files:

• to extract address book data from Contacts.cdb, the CContactDatabase class has been used; this class gives access to all the contacts databases;

• to handle Calendar data a client-server session is necessary, so a CCalSession object must be used;

• to access messages (SMS, MMS and emails) it is necessary to establish a communication channel with the Message Server through the CMsvSession::OpenSyncL() method;

• all the other files in the multimedia folders (see Section 4.1.2), such as pictures or videos, can be accessed directly as files using the approach described in Section 4.1 and sent to the server.


4.2.2 Android

On Android we developed a complete prototype. We designed a user interface that allows the user to choose the type of data to back up (i.e., contacts, calendars, files, SMS, application settings) and the type of backup (full or incremental). Before a restore, it is possible to select which backup to restore. Backup and restore have been realized as asynchronous background tasks using the AsyncTask class provided by the framework itself, which allows results to be notified to the UI thread without the need for specific handlers. HTTP requests and responses have been managed using the well-known HTTP Client of Apache’s Jakarta Commons project. To extract data it was necessary to bypass Android’s access policies. Each Android application has its own sandbox, which other applications cannot invade unless some permissions are explicitly declared. It is possible to access an application’s private data only if the application provides a Content Provider, which makes private data accessible in a uniform manner. The private data of the contact, calendar, SMS and media file applications have therefore been accessed through the respective content providers. Calendar data have been accessed directly from the SQLite database. Each application stores its own persistent settings in an XML file contained in the shared_prefs private directory; there is no way to access this information if the application does not implement a Content Provider. This limitation was overcome by elevating the access permissions of the backup application to root (http://www.koushikdutta.com).
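Reading calendar data straight from an SQLite file, as mentioned above, reduces to an ordinary SQL query once the database is readable. The Python sketch below illustrates this on a desktop copy of such a database; the table name `Events` and the columns `title`, `dtstart` and `dtend` are assumptions made for the example and may not match the schema of a given Android version.

```python
import sqlite3


def read_events(connection):
    """Return calendar events as dictionaries, ordered by start time.

    The Events table and its column names are assumed for this sketch.
    """
    rows = connection.execute(
        "SELECT title, dtstart, dtend FROM Events ORDER BY dtstart"
    )
    return [{"title": t, "start": s, "end": e} for t, s, e in rows]


def open_calendar(path):
    """Open a copy of the calendar database in read-only mode."""
    return sqlite3.connect(f"file:{path}?mode=ro", uri=True)
```

Opening the file read-only avoids altering the backed-up copy while it is being parsed.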

52 4.3. PERFORMANCES

Data Type         msec      units   msec/unit
contact           81315     150     542
SMS               80877     170     476
calendar event    5544      14      396
file              278980    3       92993

Table 4.4: Time overhead of the backup operation per data type

4.3 Performances

The system developed has been tested on an HTC Legend device connected to a 54 Mbps WiFi network over a secure HTTPS channel. We aimed at measuring the time overhead introduced by our system, and thus we measured the time needed to execute single backup functions. Table 4.4 shows the times needed to back up a commonly used smartphone, with 150 contacts, 170 text messages, 14 calendar events and 3 files of size 104.796 KB, 5.659 KB and 161.166 KB. Clearly, the most expensive operations are those on files: to save the 3 files the application needs about 279 seconds, which is about 62% of the total time needed for the backup. The total overhead to perform a full backup of the device amounts to 447 seconds (about 7 minutes), which preserves usability for real use cases. The most common operations are incremental backups; hence the last column of Table 4.4 reports the time needed per single unit of each data type.
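The derived figures can be recomputed directly from the measurements in Table 4.4. The short Python sketch below does so; the variable and function names are ours, and only the numbers come from the table.

```python
# (total msec, number of units) per data type, as reported in Table 4.4
MEASUREMENTS = {
    "contact": (81315, 150),
    "SMS": (80877, 170),
    "calendar event": (5544, 14),
    "file": (278980, 3),
}


def per_unit(measurements):
    """Average time per unit in milliseconds, rounded to the nearest integer."""
    return {kind: round(msec / units) for kind, (msec, units) in measurements.items()}


def total_msec(measurements):
    """Total time of a full backup in milliseconds."""
    return sum(msec for msec, _units in measurements.values())


def share_of_total(measurements, kind):
    """Fraction of the total backup time spent on one data type."""
    return measurements[kind][0] / total_msec(measurements)
```

With these figures the total is 446716 ms (about 447 seconds) and files account for roughly 62% of it.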

4.4 Concluding remarks

In this chapter two different approaches to, and implementations of, backup systems for mobile devices have been described. These approaches must be combined to realize a powerful backup system implementing a differential, resource-based, scheduled backup with a snapshot approach. Since the backup targets mobile devices, for some classes of data the backup can be performed online, monitoring frequently updated resources such as the address book and the calendar, and keeping the data most important to the device’s user up to date.

Combining the two approaches described with the proposed data model, the resulting backup system takes advantage of the first approach to maintain a snapshot of the system’s status for a fast restore; moreover, the first approach can be used to handle resources such as multimedia files and non-structured databases. The second approach is very powerful for handling structured data. This kind of information is sent over the network using the proposed data model. The server takes advantage of data “formatted” using the data model to build an interoperable data structure accessible from all classes of mobile devices and mobile operating systems.

Even if we use the first approach to save the data residing on the device to the external memory and send them to the backup server, personal information is contained in the backup in raw format. The first part of Chapter 5 presents a methodology to extract personal data from raw backup files.

5 Data elaboration

Introduction

The second step to be performed in a backup process is to save data to a location different from the one being backed up. This phase can be performed in several ways and, obviously, the results obtained will differ. The most useful approach is not to back up the device as is, with all the system files, but to get only the information useful to the user. A mobile device operating system can usually be restored using some key combination or specific commands. For example, on Symbian devices typing *#7780# resets the device without erasing the user’s data, while *#7370# performs a deep reset that also erases user data. Saving system data is therefore of no use; the user is instead interested in restoring his/her personal data, i.e., contacts, messages, calendars, files, etc.

The first part of this Chapter shows how personal data can be extracted from the raw backup of a mobile device; the extraction is performed using the methodology we proposed in the International Conference on Ultra Modern Telecommunications 2009 [42].

55 CHAPTER 5. DATA ELABORATION

In the second part of the chapter we describe how data can be managed in a smarter way both on the device and on a server, using the approach published at the 4th IFIP International Conference on New Technologies, Mobility and Security [28]. Personal data are extracted directly on the device, using the second approach described in Chapter 4, or from a raw backup, using the approach described in Section 5.1, and are saved in a common database to grant interoperability between different devices.

Data are contained inside smartphones as files. These files can be accessed in several ways: the first approach described in Chapter 4 handles each file in the same way, whether it contains a picture or the address book. The second approach, exploiting the operating system’s API whenever possible, focuses on the data contained in the databases rather than on the files containing them. These two methodologies approach the problem in very different ways, and obviously the second approach cannot be used to extract logical data, such as contacts, from a backup stored using the first approach. The information contained in some files backed up using the first approach, such as Contacts for Symbian, is useless unless the files are processed to extract the interesting data. Files containing data stored in a smartphone can be divided mainly into two classes:

Non structured files are files such as multimedia files, text files or PDF documents saved on the device during its usage; these files are accessible and can be restored on other devices without any further processing;

Structured files are usually databases containing information used

56 5.1. REMOTE ELABORATION

by device applications such as the address book, calendar or text messaging. Saving the information contained in a structured file is more important than saving the file itself, as it enables the backup system to store such information in a common data structure that allows different devices to interoperate (see Section 4.4).

5.1 Remote elaboration

After the logical dump of the file system has been extracted from a smartphone (see the first approach, described in Section 4.1), the dump is sent to the backup server described in Section 5.4. When the logical dump is received from the client, the server needs a method to decode the personal data stored in several mobile DBMS files and to make them available to other applications. Such DBMS files contain both current and obsolete data, i.e., old or deleted entities; this occurs because the mobile OS, for performance reasons, defers deletion as long as possible, e.g., until the free space available in the file system is no longer sufficient. It could be useful for a backup system to recover even erased information, even if this information was never backed up. Unfortunately, deleted information is not accessible through the DBMS APIs provided by manufacturers (when available). Therefore we chose a Data Reverse Engineering (DRE) approach to retrieve and decode the storage format. In traditional architectures (PCs and mainframes) DRE was studied as a business solution either for the control of data handled via legacy applications or in order to reconstruct deteriorated data. The models developed are too generic for mobile environments [43], or aim mainly at discovering the data model [44], [45], [46], or have been studied to address vertical problems such as extracting data from COBOL, DB/2 [47] or Access. For our purposes, we are not interested in discovering the data model because we know a priori which data we are looking for (e.g., all the user-controllable data attributes such as a contact’s name and surname or the SMS text), and we do not care about the relational structure. Moreover, a great benefit of a methodological DRE application is that, when file formats change, re-applying the methodology lets us update our knowledge about how data are stored.

In this section we propose a methodology that makes smartphone DRE operators more flexible with respect to knowledge of mobile file formats. As a matter of fact, the mobile phone environment is composed of a plethora of manufacturers and operating systems, each released in several versions that store data in different formats. Handling such heterogeneity through a methodological approach is an important asset, as it allows the system to decode the databases of different platforms.

The DRE methodology was proposed to solve mobile forensic problems due to the lack of standards; we applied it to the backup case with success. As a case study we applied these methods to Symbian OS and obtained several results, including the mapping between a given datum and its location in the file system, the recovery of obsolete data, and the reversing of the format of the Symbian personal databases. The results obtained (see Section 5.3) show that our methodology can be successfully applied to environments different from its forensic starting point. The methodology helps to decode database files and to develop ad-hoc parsers; data extracted by such parsers can be easily converted and used to perform tasks such as backup, user profiling, device syncing and data recovery. A flow-chart of the methodology is shown in Figure 5.1.

58 5.2. OUR STEP-BY-STEP METHODOLOGY

[Flowchart. Stage 0: Choice of the objective; Stage 1: Files of interest identification; Stage 2: Data hypothesis and entities injection; Stage 3: Sequences similarity discovery; Stage 4: Data interpretation; Stage 5: Meta-format building; Stage 6: Error correction; Stage 7: Parser building; Stage 8: Testing & debugging. Decision nodes: Goal reached?; Is it sufficient to modify the hypothesis?; Objective reached?]

Figure 5.1: The methodology flow

5.2 Our step-by-step Methodology

Smartphone operating systems save personal data in many DBMS tables, which are stored in binary files. Often the format of such files is not public, and the tools available to read them rely on an operating system native API (if they run on the device) or on a porting of their code (if they run on a PC); even when available, they cannot retrieve deleted or modified data. Therefore, a solution is to interpret the binary file directly in order to give a structure to the internal data. Initially the problem was addressed through the comparison of multiple files of the same type, relying on the analyst’s ability to interpret the data content intuitively. In this way the analysis of data was often confused and led to redundant operations without any result. Therefore, in order to preserve obsolete data, we chose to design a methodology for binary file interpretation that is able to decode the required information without performing redundant operations. Furthermore, the methodology helps to retrieve data alterations and deletions.

Our main contribution is to propose a wisdom-driven DRE methodological approach to decode a smartphone’s personal data, which are stored in several DBMS-managed files; with the contribution of this chapter we provide the tools to reach the following targets:

• understand where information is stored in the mobile device’s file sys- tem;

• retrieve and decode current and obsolete personal data;

• develop a suitable parser.

Stage 0 aims at choosing which kind of information (the objective) we want to find and how it can be decoded. An objective is composed of one or more goals. We may think of an objective as an entity (e.g., a contact, a call log, or an SMS) composed of one or more fields (e.g., for a contact, the first name, the last name, the phone number, etc.), which are the goals of our objective. Stage 1 aims at identifying which files (files of interest) could contain the data (our goal) we wish to decode. With Stage 1, the methodology enters an iterative process which makes it possible to understand the binary format of data by comparing different versions of it.

In Stage 2 some assumptions about the data type are made. Such assumptions guide the choice of sample instances of entities to be inserted into the device’s databases. Instances are stored as records which are contained in one or more binary files. If required, the hypotheses made in Stage 2 will be refined in Stage 6 and the instances may change. The number of instances inserted determines the number of comparisons among binary records, which in turn affects the precision of the next Stage. Stage 3 deals with formatting the content of the binary files, in order to make the data instances inserted in Stage 2 identifiable and comparable. Usually, we try to group similar zones within the same sample binary file, and among different sample binary files, and then proceed to the interpretation.

Formatting must take into account the data interpreted successfully in previous iterations, in order to exclude them (i.e., data already analyzed) from the study of a new format. Stage 4 comprises two sub-tasks: the first identifies candidate byte sequences, and the second decodes them. Candidate byte sequences are identified by removing all the sequences that do not match the hypotheses of Stage 2. The second task tries to find the connection between the data inserted in Stage 2 (the instances) and its binary representation. As depicted in Figure 5.1, the methodology iterates through Stages 1, 2, 3, 4, and 6 (error correction) until a goal is reached, i.e., the information about the format of an entity’s field is exhaustive and a mapping between the field and its binary storage format is found. Stage 5 simply annotates all the mapping information found in a meta-format. If joining all the meta-formats found allows the decoding of the entire objective (the information needed) identified in Stage 0, the methodology goes to Stage 7. At this Stage a piece of software able to automatically decode the now-exposed file format is designed and implemented. All collected knowledge about the format turns into a set of software requirements. This process must be repeated for each file marked as a file of interest. Such a piece of software is tested at Stage 8. In the following sections we describe each Stage of the methodology.

5.2.1 Stage 0: Choice of the objective

Before starting, we must choose the data from which we want to start the decoding process. We define as objective the type of personal data (e.g., contacts, SMS, email, calendar, events log, etc.) that we want to find in the device’s file system and whose binary format we want to decode. An objective can be seen as the set of “atomic” goals that must be completed in order to reach the objective. For instance, in order to decode the contacts (the objective), after having detected in which file (or files) they are stored, we have to find how the contact’s data elements (goals) are encoded. Such goals are attributes such as name, surname, mobile phone number, e-mail, street address, etc.

Definition 1 Let an objective Γ be a set that contains the list of goals we want to reach:

$$\Gamma = \{\gamma_1, \dots, \gamma_n\}$$

In this Stage we can only define a rough approximation of Γ: thanks to the information about the objective’s data format that we will learn progressively in the next Stages, we will be able to refine Γ with more accurate goals.

5.2.2 Stage 1: Files of interest identification

Given the objective chosen in the previous Stage, this step aims at identifying the files to be analyzed and decoded in the next Stages. Mobile devices save personal data in database files stored persistently in the file system. To identify the files containing the information we are looking for, we first need to cause many changes inside these files in order to make them identifiable. These changes are objective-dependent: if we are looking for contacts, we will generate activity such as contact insertion; if we are looking for the events log, we will make calls, change SIMs, and send and receive SMS. Each of these operations generates an entity (E) which will be stored as one or more records in the file system. Each entity E is a set composed of $m \in \mathbb{N}$ attributes ($\varepsilon$).

Definition 2 For each goal $\gamma_i \in \Gamma$ there is a set of attributes $\varepsilon_j \in E$ such that, after discovering the encoding of each $\varepsilon_j$ in the set, the goal $\gamma_i$ will be reached.

Definition 3 We define Ω as the sequence $\{E_1, \dots, E_n\}$ of entities we have to insert in the device in order to modify all the files involved in the given objective.

The value of n depends on the objective’s type and on how its entities are stored. Hence n can only be estimated when the process starts, but it can be refined over the methodology’s iterations if needed.

For instance, let E be a contact’s card: each $\varepsilon_i \in E$ will be an attribute such as name, surname, date of birth, phone number, email address, and so on.

As a best practice, every attribute $\varepsilon_i \in E$ should be filled in, in order to modify all possible files involved in Γ.

Definition 4 Let A be the fileset (in our case, the whole device’s file system) before performing the Ω operation set on the device. Let B be the fileset after performing the Ω operation set. The application T of the operation set Ω on the device is:

$$T_\Omega : A \to B$$

Definition 5 Let diff denote the function which computes the differences between two filesets. The fileset C, which contains only the files modified by the application T, is:

$$C = \mathit{diff}(B, A)$$

[Figure: each entity $E_i$ of $\Omega = \{E_1, E_2, \dots, E_n\}$ is drawn as a column of attributes $\varepsilon_1, \dots, \varepsilon_m$; in the example, $\varepsilon_1$ = John / Peter, $\varepsilon_2$ = Brown / White, $\varepsilon_3$ = +123423456 / +19280023, $\varepsilon_4$ = an e-mail address, $\varepsilon_m$ = some_info.]

Figure 5.2: The format of the Ω operations sequence. The figure shows an example with contacts discovery as the objective.

C may contain garbage data, since other operations may occur while the user performs T. We must therefore “clean” C by searching for and deleting all irrelevant data.

Definition 6 Let clean denote the function which cleans a fileset of garbage data. The fileset Φ is:

$$\Phi = \mathit{clean}(C)$$
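Definitions 4 to 6 can be illustrated with a minimal sketch. It assumes that a fileset is modelled as a dictionary mapping a path to the file content, and that garbage data can be recognised by a path prefix; the paths and the `noise_prefixes` parameter are hypothetical and only serve the example.

```python
def diff(after: dict, before: dict) -> dict:
    """Return the files of `after` that are new or changed with respect to `before`."""
    return {path: data for path, data in after.items() if before.get(path) != data}


def clean(changed: dict, noise_prefixes=("/tmp/", "/cache/")) -> dict:
    """Drop files considered garbage (hypothetical rule: path starts with a noise prefix)."""
    return {path: data for path, data in changed.items()
            if not path.startswith(noise_prefixes)}


# Fileset A: before inserting the entities; fileset B: after the insertion.
A = {"/data/contacts.db": b"old", "/tmp/log": b"x", "/sys/conf": b"c"}
B = {"/data/contacts.db": b"new", "/tmp/log": b"y", "/sys/conf": b"c"}

C = diff(B, A)      # files modified by the application T
PHI = clean(C)      # candidate files of interest
```

In practice the cleaning rule depends on the device and on the activity that runs in the background while the entities are inserted.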

5.2.3 Stage 2: Data hypotheses and entities injection

After the insertion of the Ω entities, the Φ set tells us which files have been modified, but it still does not give us information about how the $\varepsilon_i$ are encoded in the storage. In Stage 2 we will perform three tasks:

1. We make assumptions about the possible format of $\varepsilon_i$. The Λ set represents the collection of assumptions we made at this Stage. Λ is composed of assumptions about data type, size and predictability. The latter indicates whether we can control the value of $\varepsilon_i$. Possible values of predictability are the following:

• controllable: the attribute corresponds to an input field and the user can fully control its value. An important property for the application of the methodology is that controllable attributes can be stored more than once in the device, and the corresponding byte sequence is always the same. In the contacts case, controllable attributes are input fields like name, surname, phone number, etc. If we guess the right type and size, we will be able to predict the binary (hexadecimal) version of the data.

• uncontrollable: the attribute does not correspond to any input field and the user is prevented from handling its value; there is no way to predict the binary version of the data. In the contacts case, the contact’s ID is an uncontrollable attribute, because it is transparently assigned by the system.

• pseudo-controllable: the attribute does not correspond to any input field and the user is prevented from handling its value, but its binary version is partially predictable. For instance, if we store two contacts on the same day, the year/month/day part of the insertion date (the 6 most significant bytes, for an 8-byte date format) will be the same for both of them.

2. Once the assumptions at the previous point have been made, we generate a set $\Omega'$ of sample entities which have all attributes but the i-th set to NULL:

$$\Omega' = \left\{ \begin{pmatrix} \varepsilon_1 = \mathrm{NULL} \\ \vdots \\ \varepsilon_i = v_1 \\ \vdots \\ \varepsilon_m = \mathrm{NULL} \end{pmatrix}, \dots, \begin{pmatrix} \varepsilon_1 = \mathrm{NULL} \\ \vdots \\ \varepsilon_i = v_k \\ \vdots \\ \varepsilon_m = \mathrm{NULL} \end{pmatrix} \right\}$$

where $|\Omega'| = k$, $\varepsilon_i = v_a \in \{v_1, \dots, v_k\}$, and $\varepsilon_j = \mathrm{NULL}$ $\forall j \neq i$. The values $v_a$ are chosen so that they can be easily identified among all the file bytes in the next Stages. A good choice of the $v_a$ values critically influences the subsequent steps; in the early iterations of the methodology, the $v_a$ should be chosen so that, arranged in the $\Omega'$ entity sequence, they follow a periodic repetitive pattern (e.g., AABB, ABAB, AAAA, etc.). Thanks to this approach, in the next Stages we will be able to retrieve them through pattern similarity matching, avoiding ambiguities in the modified file’s zones caused by insertion side effects.

3. Finally, we have to insert the $\Omega'$ entities into the device through an application $T_{\Omega'}$, and then perform a new file system dump in order to analyze the files generated by the insertion.

The output of this Stage is Λ and $\Phi'$, the set composed of all files containing $\Omega'$ entities.
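The construction of the sample entities in task 2 can be sketched as follows. This is only an illustration: an entity is modelled as a dictionary, NULL as `None`, and the attribute names and values are hypothetical.

```python
def periodic_values(symbols: dict, pattern: str) -> list:
    """Expand a pattern such as 'AABB' into concrete values, one per entity."""
    return [symbols[ch] for ch in pattern]


def build_samples(attributes: list, target: str, values: list) -> list:
    """Build one entity per value: only `target` is set, every other attribute is None (NULL)."""
    if target not in attributes:
        raise ValueError(f"unknown attribute: {target}")
    return [{name: (value if name == target else None) for name in attributes}
            for value in values]


# Hypothetical contact attributes; only the 'name' attribute is under study.
values = periodic_values({"A": "John", "B": "Peter"}, "AABB")
samples = build_samples(["name", "surname", "phone"], "name", values)
```

The AABB pattern produces two pairs of identical entities, so two pairs of equal byte sequences are expected in the modified files.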


5.2.4 Stage 3: Sequences similarity discovery

The goal of this Stage is to take the $\Phi'$ fileset, containing the sample entities, and to find all sequences of bytes which present the same similarities as the attributes of the $\Omega'$ entity set inserted in Stage 2. In the previous Stage, we injected entities which shared one or more attributes. The attributes of the entities were injected following a pattern, like pairs of calls with the same duration or contacts with the same fields. In this Stage we have to highlight the byte sequences of the file which are equal to each other. In the call duration example, if we made c pairs of calls with the same duration, we will find c pairs of equal byte sequences in the events log file. Therefore, if the assumptions of the previous Stage were correct, the current step simplifies the interpretation tasks of the next Stages by reducing the file’s complexity. The Stage 3 process iterates through the following steps:

1. Discard file zones which are not directly affected by the operations in Stage 2;

2. Identify attribute separation flags;

3. Identify, highlight and separate similar byte sequences.
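The third step above can be approximated by a simple search for repeated byte windows. The sketch below is an assumption about one possible way to automate it, not the tool used in this work; the sample bytes are hypothetical and overlapping occurrences are counted.

```python
from collections import defaultdict


def repeated_sequences(data: bytes, size: int, min_count: int = 2) -> dict:
    """Map each `size`-byte sequence occurring at least `min_count` times to its offsets."""
    offsets = defaultdict(list)
    for start in range(len(data) - size + 1):
        offsets[data[start:start + size]].append(start)
    return {seq: pos for seq, pos in offsets.items() if len(pos) >= min_count}


# Hypothetical fragment in which the same 3-byte zone appears twice.
fragment = bytes.fromhex("004600112200460033")
similar = repeated_sequences(fragment, size=3)
```

The window size plays the role of the assumed data size in Λ: a window that is too large finds nothing, while one that is too small finds too many matches.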

Figure 5.3 shows an example in which we format the event log file to detect the storage format of the voice call duration. In the previous Stage we made pairs of calls of the same duration (Figure 5.3a). All the useless information (metadata, indexes and tables) was discarded, and similar zones were searched for in accordance with the methodology described. Once similar zones are identified (Figure 5.3b), they have to be formatted in the same way, to enable the next Stage to refine the identification and to understand which parts of the file were changed after the insertion of the $\Omega'$ entities. We must separate the information added by the user from the other file data (Figure 5.3c). A good file formatting is obtained by isolating different file zones from similar ones, and then by isolating flags. The output of this Stage is $\hat{\Phi}'$, the formatted fileset.

[Figure: three hexadecimal dumps of the event log file, panels (a), (b) and (c).]

Figure 5.3: An example of a DBMS binary file before and after Stage 3. (a) The sample file after making pairs of calls of the same duration (Stage 2). (b) Equal sequences highlighted. (c) The formatted file $\hat{\Phi}'$.

5.2.5 Stage 4: Data interpretation

Stage 4 is composed of two steps: the candidate sequence identification and the candidate sequence interpretation.

Definition 7 The candidate sequences are sequences of bytes, stored in the $\hat{\Phi}'$ fileset, in which we are likely to find the data we are looking for. $\Sigma_{\Gamma,\Lambda}$ is the set of candidate sequences for a given objective Γ, under a given assumption Λ.

The candidate sequence identification relies on the hypotheses about attribute data properties made in Stage 2, and it simplifies the sequence by deleting all non-relevant data. In particular:

• If the data is constant, it is always stored in the same format, so the formatted files containing the data can be simplified by removing all the bytes that differ; if the size of the data is equal to the size in the assumptions made, such data is added to $\Sigma_{\Gamma,\Lambda}$;

• If the data is variable, the storage format will probably always be different, so all the equal parts of the formatted files can be removed to simplify them. If the data size is equal to the size in the assumptions, such data is added to $\Sigma_{\Gamma,\Lambda}$;

• If the data is pseudo-variable, the storage format will be partially constant and partially variable; we have to look for the constant parts of the file and then look in the proximity of the constant zone, in an area whose size is equal to the hypothesis. The sequence is then added to $\Sigma_{\Gamma,\Lambda}$.

If $\Sigma_{\Gamma,\Lambda} = \emptyset$ or $|\Sigma_{\Gamma,\Lambda}|$ is large (an unmanageable quantity), then in order to reduce the number of resulting candidate sequences we have to analyse the results and understand how to change the Λ assumptions made in Stage 2 (through Stage 6). Once the assumptions are modified and the new $\Omega'$ entities are inserted in the device (reiteration through Stages 2, 3 and 4), the precision of this Stage will improve.

When we reach a manageable size of $|\Sigma_{\Gamma,\Lambda}|$, the candidate sequence interpretation task can start. In this step we use Λ to better understand which part of the candidate sequence represents the data we are interested in. We look at the $\Omega'$ sequence of operations and check whether it matches the candidate sequence set. If the sequence of expected attribute values in $\Omega'$ is the same as in $\Sigma_{\Gamma,\Lambda}$, the sequence is ready to be interpreted. As the database files are usually in hexadecimal format and the target data are in a different format (e.g., string, decimal format), it is necessary to transform the data into a common format (e.g., decimal). The last step is to compare the data contained in the database with the data inserted in the $\Omega'$ entity sequence; if they match, the storage format is saved and the next Stage starts.
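The interpretation step can be sketched as a comparison between the candidate sequences and the inserted values under a few decoding hypotheses. The hypotheses listed here (unsigned big-endian, unsigned little-endian, ASCII) and the sample durations are assumptions made for illustration only.

```python
HYPOTHESES = ("uint_big", "uint_little", "ascii")


def decode_candidates(raw: bytes) -> dict:
    """Decode `raw` under each hypothesis that is applicable to it."""
    decoded = {
        "uint_big": int.from_bytes(raw, "big"),
        "uint_little": int.from_bytes(raw, "little"),
    }
    try:
        decoded["ascii"] = raw.decode("ascii")
    except UnicodeDecodeError:
        pass  # not valid ASCII: this hypothesis is simply unavailable
    return decoded


def matching_hypotheses(candidates: list, expected: list) -> list:
    """Return the hypotheses under which every candidate decodes to the inserted value."""
    if len(candidates) != len(expected):
        return []
    return [name for name in HYPOTHESES
            if all(decode_candidates(raw).get(name) == value
                   for raw, value in zip(candidates, expected))]


# Hypothetical call durations inserted in Stage 2 with an AABB pattern (seconds).
inserted = [70, 70, 200, 200]
found = [bytes.fromhex(h) for h in ("46000000", "46000000", "C8000000", "C8000000")]
result = matching_hypotheses(found, inserted)
```

If the returned list is empty, the assumptions about type or size are revised, which corresponds to the error correction of Stage 6.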

5.2.6 Stage 5: Meta-format building

After the data decoding in Stage 4, we need to store the information collected in an intermediate format. This Stage should be seen as a “methodology intermediate status saving”, which helps the operator to choose the next γ goal to process, and to refine it if required. Before compiling the meta-format, this Stage requires the compilation of a “formats table”. This table lists the data discovered at Stage 4 and, for each item, shows the following metadata:

Field Name A text placeholder associated with the data. This label will replace the data value in $\hat{\Phi}'$, in order to make its retrieval easier.

Size The size of data, expressed in bytes.

Description Other information useful to the parser building Stage, such as which information is held by the field, the type of data, the endianness, suggestions for the automatic data localization, etc.

Example An example value of the field.

Each discovered data item needs a row in the table. An example of a formats table is shown in Figure 5.4a.


Field Name    Size              Description       Example
ID            4                 Int, Bigend       B6 03 00 00
NAME LEN      1                 Int, Littleend    0E
NAME          (NAME LEN / 2)    String            43 6C 61 75 64 ...
...

(a) A table with pseudo data types, obtained as output of Stage 4.

B6 03 00 00 0E 43 6C 61 75 64 69 61 0A 44 72 61 67 6F 10 55 6E 69 72 6F 6D 61 32 09 13 00 10

(b) The meta-format file before Stage 5.

ID  NAME LENGTH  NAME  SURNAME LENGTH  SURNAME  COMPANY NAME LENGTH  COMPANY NAME  CXF1

(c) The meta-format file after Stage 5.

Figure 5.4: These three figures depict an example of the application of Stage 5 to a file containing the phone’s address book.


After compiling the formats table, the meta-format file will be equivalent to the sample binary file purged of non-relevant bytes. Data such as headers, indexes, etc., can be deleted if they are not relevant for the purposes of the objective. The first step is to identify, for each entry in the table, the values with which the data appears in the meta-format file (Figure 5.4b) and to replace them with the related labels in the table (Figure 5.4c). In this way all relevant data in the meta-format file will be replaced by placeholders that will be easily detected at the parser building Stage. The example shown in Figure 5.4 takes into account a contacts file containing two records with the following fields: name, surname and company. After this Stage, the given binary file can be interpreted automatically if all the following conditions are satisfied:

1. The meta-format’s data and values not yet identified have a static size, so they can be ignored. In this case the parser is able to skip them automatically;

2. All required meta-format’s data and values are identified;

3. If, after having tried different hypotheses of $\Omega'$, the identified zones in the meta-format did not change at all, then the meta-format file and the formats table are stable.

5.2.7 Stage 6: Error correction

This Stage is performed if the current $\gamma_i$ was not reached (e.g., Stage 4 was unable to find a correct interpretation for the $\varepsilon_i$ representation) and it is mandatory to re-iterate the methodology. The error leading to this Stage can be caused by two cases. In the following list we show the actions to be performed in the next iteration:

1. $\Sigma_{\Gamma,\Lambda} = \emptyset$ or $|\Sigma_{\Gamma,\Lambda}|$ is high (an unmanageable quantity): if there are no candidate sequences or there are too many, some backtracking needs to be performed to obtain a manageable number of candidate sequences. Some actions may be useful to do this:

(a) Changing the assumed data size. This implies reformatting $\Phi'$, building up a new $\hat{\Phi}'$. If $\Sigma_{\Gamma,\Lambda} = \emptyset$ and we are looking for matching sequences, we need to decrease the size: two different big sequences might contain two matching smaller sequences. On the other hand, if we are looking for non-matching sequences, the size needs to be increased. In the case where $|\Sigma_{\Gamma,\Lambda}|$ is high, if we are looking for matching sequences we need to increase the size, and decrease it for non-matching sequences.

(b) Modifying $\Omega'$, adding or deleting entities, or changing the $\varepsilon_i$ values. A new $\Omega'$ could give more accurate results as output. The changes should be made according to the operator’s judgment; this is the hardest part of the whole process, and the operator’s skills play the leading role.

(c) Verifying the correctness of $\Phi'$. Verify that the file we are looking into is the right one (the required information may reside in another file).

2. If the interpretation of the candidate sequence did not decode any information about the storage format:

(a) Changing the assumed data size.


(b) Modifying $\Omega'$. If an ambiguity among different candidate sequences occurred, modify $\Omega'$ in order to restrict the change to fewer bytes;

(c) Changing the data type. Changing the data type might help the decoding from hex.

If none of the above cases apply, or the suggested changes did not lead to a correct data interpretation, we need to review the current $\gamma_i$ goal in Stage 1. Each reached γ reduces the space of assumptions we are free to choose from to build Λ (and $\Omega'$ as a consequence) for the other γ.

5.2.8 Stage 7: Parser building

This Stage takes as input all the collected knowledge about the given binary file format. The operator should be able to write a program that reads data from the logical dump of the smartphone and converts them into an XML format. It is mandatory to implement a quality monitor that measures the number of entries in which the parser encounters problems. The ratio $r = \frac{F}{T}$ between the number of failures (F) and the total number of entries (T) indicates the need to perform additional iterations of the methodology. The threshold below which r is acceptable depends on the required accuracy.
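As an illustration of a parser with its quality monitor, the sketch below decodes the sample record of Figure 5.4 and computes r. It is not the parser built in this work: it assumes, from the sample bytes, that each length byte holds twice the number of characters, it keeps the ID as raw bytes without interpreting its endianness, and it returns a dictionary instead of XML.

```python
def parse_contact(record: bytes) -> dict:
    """Parse a record laid out as a 4-byte ID followed by three length-prefixed strings."""
    if len(record) < 4:
        raise ValueError("record too short for the ID field")
    fields = {"id_raw": record[:4].hex()}
    pos = 4
    for name in ("name", "surname", "company"):
        if pos >= len(record):
            raise ValueError(f"missing length byte for {name}")
        length = record[pos] // 2  # assumption: length byte is twice the character count
        pos += 1
        chunk = record[pos:pos + length]
        if len(chunk) != length:
            raise ValueError(f"truncated field: {name}")
        fields[name] = chunk.decode("ascii")  # UnicodeDecodeError is a ValueError
        pos += length
    return fields  # trailing unidentified bytes (CXF1) are skipped


def failure_ratio(records: list) -> float:
    """Quality monitor: r = F / T, failures over total entries."""
    failures = 0
    for record in records:
        try:
            parse_contact(record)
        except ValueError:
            failures += 1
    return failures / len(records) if records else 0.0


sample = bytes.fromhex(
    "B6 03 00 00 0E 43 6C 61 75 64 69 61 0A 44 72 61 67 6F "
    "10 55 6E 69 72 6F 6D 61 32 09 13 00 10"
)
contact = parse_contact(sample)
r = failure_ratio([sample, sample[:10]])
```

A value of r above the chosen threshold signals that further iterations of the methodology are needed for the entries that failed.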

5.2.9 Stage 8: Testing and debugging

In this phase the parser produced in the previous Stage is applied to several logical dumps, in order to test and debug it on real cases. In this Stage the r values of the current parser are verified, and it is established whether the precision of the implementation is sufficient.


Case Study      Information      Detailed Information
Logdbu.dat      Event Log        SMS previews, MMSs, e-mails, calls, video calls, PRSConnection, SIM/MC change
Calendar        Memo             Daynotes, meetings, anniversaries
Contacts.cdb    Contacts         Contacts information
Mail folder     SMS/MMS/Email    Sender, receiver and body

Table 5.1: Symbian files of interest

5.3 Remote elaboration results

In order to verify and to refine the methodology’s Stages, we took the Symbian S60 operating system as a case study. Applying the methodology produced the results we are going to show in this section.

Files of interest - Stage 1 helped us to find a list of files containing SMS, MMS, contacts, and all the user’s personal data; they are shown in Table 5.1.

Symbian personal data files format - Thanks to the methodology we have been able to reverse engineer the Symbian S60 DBMS file format. We applied the methodology to the contacts list, to the calendar, to the text/multimedia messages and to the phone’s event log (which contains calls, previews of sent and received SMS/MMS, and SD card and SIM changes). The complete format is explained in Appendix A.

Obsolete data - Among the information identified and retrieved in the case study, we were able to find obsolete data which had not been purged from the file system. The DBMS resource optimization strategy, in fact, reduces the high cost of the DB’s modify/delete operations by flagging the affected records as “obsolete”: for this reason the modify/delete operations are scheduled as late as possible, and the circumstances in which they are performed vary depending on the kind of file. For instance, in the Symbian case, in Contacts.cdb the delete operations are performed when the Compress() syscall is invoked. Operating system tasks, as well as third-party software, can invoke this function, and they can find out whether or not to perform compression by invoking CompressRequired() (see [48]). Let S be the total disk space, F the free disk space, and W the amount of disk space wasted; the boolean function returns true if:

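$$(W > 64\mathrm{K}) \lor \left(W > 16\mathrm{K} \land W > \tfrac{1}{2}S\right) \lor \left(W > 16\mathrm{K} \land F < \tfrac{1}{20}S\right) \lor (W > 16\mathrm{K} \land F < 16\mathrm{K})$$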

After a compression is performed, the contacts are rearranged, the space wasted by obsolete records is recovered, and there is no way to recover obsolete data. If the extraction occurs before the compression is invoked, we will find a database file that contains all data since the last compression. For the case studies related to contacts, calendar and event log, enough information was decoded to reconstruct the owner’s communication history. In the case study of messages (SMS, MMS and emails, stored in the /System/Mail folder) we were not able to find erased data, because the OS immediately purges deleted messages to optimize the available storage.

Unexpected information - Some data attributes are not controllable by the user, i.e., she cannot insert them into the system explicitly, so we were not aware of their presence. During Stage 4, the nature of our methodology helped us to retrieve such “hidden” information, such as the record’s ID and its creation date. Such an important result reinforces the effectiveness of the methodology, since it is able to detect more goals than the ones identified in Stage 0. In our case study, some unexpected information helped us to better understand the data model, and thus the application’s behaviour.
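The compression condition reported under “Obsolete data” can be written as a small predicate, which is useful to estimate whether obsolete records are still likely to be present in a dump. This is a model of the condition as stated in the text, not the Symbian API; the function name is hypothetical, the fractions are read as S/2 and S/20, and K is assumed to mean 1024 bytes.

```python
K = 1024  # assumption: "K" in the condition means 1024 bytes


def compress_required(total: int, free: int, wasted: int) -> bool:
    """Model of the condition: S = total space, F = free space, W = wasted space (bytes)."""
    return (
        wasted > 64 * K
        or (wasted > 16 * K and wasted > total / 2)
        or (wasted > 16 * K and free < total / 20)
        or (wasted > 16 * K and free < 16 * K)
    )


# Hypothetical disk of 1,000,000 bytes.
large_waste = compress_required(total=1_000_000, free=500_000, wasted=70_000)
small_waste = compress_required(total=1_000_000, free=500_000, wasted=20_000)
low_space = compress_required(total=1_000_000, free=40_000, wasted=20_000)
```

When the predicate is false, a compression is less likely to have been triggered, so obsolete records may still be recoverable from the database file.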

We applied the methodology to more than 50 device dumps. At the beginning, the first dumps we studied came from Nokia N70 devices¹, but we realized that the knowledge we had about the S60 format was still incomplete, since the parser was unable to decode an older phone’s dump (Nokia 7610). After applying a few iterations of the methodology, we built a parser able to interpret the new format.

5.4 Local elaboration

Local elaboration requires less work from the server part of the system. Unlike the remote elaboration case, most of the local elaboration is performed on the mobile client (see Section 4.2). In this case the server side of the system must provide the clients with the API to communicate, save and restore backup data. Following the cloud paradigm, these APIs are provided as web services. A set of REST web services has been implemented; we chose the REST architectural style because we wanted to exploit the facilities of the HTTP protocol. Relying on HTTP helps the system to be scalable and easy to maintain, and provides a secure transmission level (HTTPS) without implementation effort. The REST architectural style is suitable for our purpose mainly because the server’s tasks can be performed using PUT and GET requests.

Figure 5.5 shows the server architecture. We designed a three-level architecture following the Model-View-Controller (MVC) pattern [49]; the top view layer contains the interface with clients. The interface has been realized using the RESTlet framework [50], [51]. REST web services are exposed using the Apache Tomcat [52] application server and accessed via standard HTTP(S) GET/PUT/POST/DELETE methods. Data are sent over the network as XML; object representations are serialized and deserialized via the XStream library.

The control layer shown in the center of the figure implements the business logic of the backup system. The business logic does not only contain functions to handle data to be saved into the database; this layer also contains the parsers implemented as a result of the application of the methodology proposed in Section 5.2.

[Figure: layered diagram. Integration: Apache Tomcat. View: Restlet, XStream (XML (de)serialization), HTTP GET/POST/PUT/DELETE. Control: business logic. Model: ORM (Active Objects), DAOs (Java POJO). Storage: MySQL.]

Figure 5.5: The architecture of the backup server

¹ Equipped with Symbian OS v8.1a, S60 Platform Second Edition, Feature Pack 3


The model level has been developed using the Active Objects ORM [53], which allowed us to interact with the MySQL database directly using standard Java objects (POJO) [54].

The server provides a REST API to perform full and incremental backups. It provides an interface to back up and restore contacts, calendars, text messages, multimedia messages, emails, and application and system settings. To access the backup and restore services the client must authenticate with a username and password. On the first interaction between client and server the server expects a full backup, and it stores all the data sent by the client in the database. Each time the client and the server interact, in the bootstrap phase of the communication the server sends the list of files saved on the server, which represents the last version of the information; the client creates the list of files to be sent and then, using the proper methods, stores, updates or deletes files with respect to the last version of the backup.

To create the list of files to be backed up, the client uses the MD5 hash and the last modification date of the files processed using the first method described in Chapter 4. For the contents inside the databases, it uses the list, given by the server, containing the modification date of each entry; it extracts the new/updated/deleted data and creates the lists to be processed to synchronize client and server.

A typical interaction between client and server starts with the insertion on the server, through the /backup/{backupType}/device/{imei} method, of a backup item; in this way the server can identify the type of backup performed from the backupType parameter and the device from the imei parameter. Identifying the device is fundamental when the user holds more than one device and uses the system to synchronize these devices. The server answers with the list of resources composing the last backup.

Subsequently the client sends all data using the proper methods and the server updates the date of the last backup performed. The operation is straightforward, and similar for all kinds of data. The full XML communication protocol is detailed in Appendix B.
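The way the client builds its lists during an incremental backup can be sketched as follows. This is a minimal illustration, assuming that both the device state and the list returned by the server are available as dictionaries mapping a path to an (MD5, modification time) pair; the function names and paths are hypothetical and do not belong to the protocol of Appendix B.

```python
import hashlib


def md5_of(content: bytes) -> str:
    """MD5 digest used to detect content changes."""
    return hashlib.md5(content).hexdigest()


def plan_backup(local: dict, remote: dict) -> tuple:
    """Compare the device state with the server list.

    Both arguments map path -> (md5, mtime). Returns sorted lists of paths
    to store (new), to update (changed) and to delete (no longer on the device).
    """
    to_store = sorted(p for p in local if p not in remote)
    to_update = sorted(p for p in local if p in remote and local[p] != remote[p])
    to_delete = sorted(p for p in remote if p not in local)
    return to_store, to_update, to_delete


# Hypothetical state: one unchanged file, one changed, one new, one removed.
remote = {"/a.txt": (md5_of(b"a"), 100), "/b.txt": (md5_of(b"b"), 100),
          "/old.txt": (md5_of(b"o"), 90)}
local = {"/a.txt": (md5_of(b"a"), 100), "/b.txt": (md5_of(b"B"), 120),
         "/new.txt": (md5_of(b"n"), 130)}
store, update, delete = plan_backup(local, remote)
```

Each list would then be sent to the server with the corresponding store, update or delete method.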

6 Protecting saved data

Introduction

Personal data are probably the most valuable asset for a user in today’s world. Some say that “data is the new oil” [55]. This information needs to be kept safe and accessible only to people explicitly authorized by the data owner. This can be achieved using authentication, security and privacy techniques, which are usually based on cryptography. Unfortunately cryptography adds a lot of overhead to the operations performed on data and, even if mobile devices are becoming more powerful, they still encounter performance and battery life problems. Cryptographic operations affect both, by requiring the device to execute more operations to achieve the same task.

In this chapter we show a novel key agreement algorithm based on the matrix conjugation method, which we presented at the 2010 SECRYPT International Conference on Security and Cryptography [56]. The algorithm has been implemented in J2ME and tested on real mobile devices. We also show the results of some performance tests executed on a new encryption algorithm compared to standard ones, presented in [57].

At the end of the chapter we present a framework to securely manage inter-process communication under Android. The framework is detailed in Grillo’s PhD thesis [58] and was presented at the 2nd International ICST Conference on Mobile Computing, Applications, and Services [59].

6.1 Key agreement algorithm

In many cases a key agreement is needed to exchange private data by encoding them with a specific algorithm. Examples of cryptography on mobile devices are [60], in which elliptic curves are used efficiently, and [61], [62], which concern trusted text messaging. All these works focus more on the encoding/signing part than on key agreement, but a key agreement phase is of course needed before encrypting or signing. In this section we present a JavaME implementation of a new key agreement protocol (a particular case of a class recently proposed in [63]) and compare the performance of our implementation [57] against the standard and Elliptic Curve Diffie-Hellman protocols [64].

In the next section we explain the mathematical problem on which the key agreement relies, discuss possible attacks, and explain why these attacks are not effective against the algorithm. Section 6.1.2 presents the implementation choices and analyzes why they optimize performance without affecting security. Section 6.1.3 describes the testing methodology, explaining each step of the testing phase. Section 6.1.4 shows the results of the testing phase, and Section 6.1.5 analyzes these data in more detail. Section 6.1.6 summarizes all results and proposes possible improvements and applications of the algorithm.


6.1.1 Mathematical setting: key agreement protocol

We consider $\mathcal{M} = GL(d, \mathbb{Z}_p)$, where $p$ is a prime number. Fix $G \in \mathcal{M}$ and let $\varphi_G$ be the conjugation isomorphism associated to $G$:

$$\varphi_G : \mathcal{M} \ni M \mapsto \varphi_G(M) = G M G^{-1} \in \mathcal{M}$$

The following public key agreement between Alice (A) and Bob (B) (see [63] for a more general setting) exploits the property $[\varphi_G(A)]^n = \varphi_G(A^n)$.

1. A and B share $Q, S \in \mathcal{M}$, with $SQ \neq QS$ and $\det(Q) = |Q| = 1$.

2. A chooses two numbers $x_A, n_A \in \mathbb{N}$.

3. A computes $M_A = S^{n_A} Q^{x_A} S^{-n_A}$ and sends it to B.

4. B receives from A the matrix $M_A$.

5. B chooses two numbers $x_B, n_B \in \mathbb{N}$, computes $M_B = S^{n_B} Q^{x_B} S^{-n_B}$ and sends $M_B$ to A.

6. A computes $M_{AB} = S^{n_A} M_B^{x_A} S^{-n_A} = S^{n_A} (S^{n_B} Q^{x_B x_A} S^{-n_B}) S^{-n_A}$.

7. B computes $M_{BA} = S^{n_B} M_A^{x_B} S^{-n_B} = S^{n_B} (S^{n_A} Q^{x_A x_B} S^{-n_A}) S^{-n_B}$.

At the end A and B share the common matrix $M_{AB} = M_{BA}$, which represents the Secret Shared Key (SSK). In fact,

$$M_{AB} = S^{n_A + n_B} Q^{x_B x_A} S^{-(n_A + n_B)} = S^{n_B + n_A} Q^{x_A x_B} S^{-(n_B + n_A)} = S^{n_B} M_A^{x_B} S^{-n_B} = M_{BA}$$
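The seven steps above can be sketched in a few lines of code. This is a toy illustration, not the J2ME implementation discussed later: the modulus $2^{31} - 1$, the 2 × 2 matrices S and Q, the private pairs, and all function names are assumptions chosen only to keep the example small and checkable.

```python
P = 2_147_483_647  # 2**31 - 1, a prime; illustrative modulus (assumption)


def identity(n):
    return [[int(i == j) for j in range(n)] for i in range(n)]


def mat_mul(a, b, p=P):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) % p for j in range(n)]
            for i in range(n)]


def mat_pow(m, e, p=P):
    """Square-and-multiply exponentiation of a square matrix mod p (e >= 0)."""
    result, base = identity(len(m)), [row[:] for row in m]
    while e > 0:
        if e & 1:
            result = mat_mul(result, base, p)
        base = mat_mul(base, base, p)
        e >>= 1
    return result


def mat_inv(m, p=P):
    """Gauss-Jordan inversion mod a prime p; raises ValueError if singular."""
    n = len(m)
    aug = [[v % p for v in row] + unit for row, unit in zip(m, identity(n))]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col]), None)
        if pivot is None:
            raise ValueError("matrix is not invertible mod p")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = pow(aug[col][col], -1, p)  # modular inverse (Python 3.8+)
        aug[col] = [(v * scale) % p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col]:
                factor = aug[r][col]
                aug[r] = [(v - factor * w) % p for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def conjugate(s, n, m, p=P):
    """Return S^n * M * S^(-n) mod p."""
    s_n = mat_pow(s, n, p)
    return mat_mul(mat_mul(s_n, m, p), mat_inv(s_n, p), p)


def public_value(s, q, n, x, p=P):
    """Steps 3 and 5: M = S^n Q^x S^(-n)."""
    return conjugate(s, n, mat_pow(q, x, p), p)


def shared_key(s, received, n, x, p=P):
    """Steps 6 and 7: S^n * received^x * S^(-n)."""
    return conjugate(s, n, mat_pow(received, x, p), p)


# Toy public parameters: det(Q) = 1 and SQ != QS, as required by step 1.
S = [[1, 2], [3, 4]]
Q = [[2, 3], [1, 2]]
n_a, x_a = 5, 7    # Alice's private pair (illustrative)
n_b, x_b = 11, 3   # Bob's private pair (illustrative)

m_a = public_value(S, Q, n_a, x_a)        # sent by Alice
m_b = public_value(S, Q, n_b, x_b)        # sent by Bob
key_alice = shared_key(S, m_b, n_a, x_a)  # M_AB
key_bob = shared_key(S, m_a, n_b, x_b)    # M_BA
```

Both parties end with the same matrix because powers of S commute with each other, which is the step that lets $S^{n_A} S^{n_B}$ be regrouped as $S^{n_A + n_B}$ in the derivation.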


[Figure 6.1: diagram of the exchange between ALICE, holding $(n_A, x_A)$, and BOB, holding $(n_B, x_B)$, over an unsecure channel observed by EVE; the public parameters are $(d, p, Q, S)$. Alice sends $M_A = S^{n_A} Q^{x_A} S^{-n_A}$, Bob sends $M_B = S^{n_B} Q^{x_B} S^{-n_B}$, and both derive $M_{AB} = S^{(n_A + n_B)} Q^{x_B x_A} S^{-(n_B + n_A)} = S^{(n_B + n_A)} Q^{x_A x_B} S^{-(n_A + n_B)} = M_{BA}$.]

Figure 6.1: Key Agreement process using conjugate.

Note that if $|Q| \neq 1$, a possible eavesdropper Eve (E) could set up a discrete logarithm problem by considering the determinantal equation [65]

$$|M_A| = |S^{n_A} Q^{x_A} S^{-n_A}| = |S^{n_A}| \, |Q^{x_A}| \, |S^{-n_A}| = |S|^{n_A} |Q|^{x_A} |S|^{-n_A} = |Q|^{x_A}$$

with $\det(Q)$ known. If E can solve this scalar discrete logarithm problem, thus recovering $x_A$, then she can easily find, by solving a linear problem and adjusting the free parameters entering in the solution, a polynomial $X$ in the matrix $S$ of degree $\leq d$, with coefficients in $\mathbb{Z}_p$, such that $M_A X = X Q^{x_A}$. Using this, E can compute

$$X M_B^{x_A} X^{-1} = (X S^{n_B} X^{-1}) (X Q^{x_A} X^{-1})^{x_B} (X S^{-n_B} X^{-1}) = S^{n_B} M_A^{x_B} S^{-n_B} = M_{AB}$$

because $X$ commutes with $S$. In conclusion: if $\det(Q) \neq 1$, then the breaking complexity of the algorithm is essentially equivalent to the breaking complexity of a (discrete) logarithm in $\mathbb{Z}_p$, i.e., to that of (scalar) Diffie-Hellman.
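The determinantal equation can be checked numerically. The sketch below is a toy illustration under assumed parameters (2 × 2 matrices, the prime $2^{31} - 1$, a deliberately small exponent); the helper names are hypothetical, and the brute-force search only stands in for a real discrete logarithm algorithm.

```python
P = 2_147_483_647  # 2**31 - 1, an illustrative prime modulus (assumption)


def mul2(a, b):
    """Product of two 2x2 matrices mod P."""
    return [[(a[0][0] * b[0][0] + a[0][1] * b[1][0]) % P,
             (a[0][0] * b[0][1] + a[0][1] * b[1][1]) % P],
            [(a[1][0] * b[0][0] + a[1][1] * b[1][0]) % P,
             (a[1][0] * b[0][1] + a[1][1] * b[1][1]) % P]]


def det2(m):
    return (m[0][0] * m[1][1] - m[0][1] * m[1][0]) % P


def inv2(m):
    """Inverse of a 2x2 matrix mod P via the adjugate formula."""
    d_inv = pow(det2(m), -1, P)
    return [[(m[1][1] * d_inv) % P, (-m[0][1] * d_inv) % P],
            [(-m[1][0] * d_inv) % P, (m[0][0] * d_inv) % P]]


def pow2(m, e):
    """Square-and-multiply power of a 2x2 matrix mod P."""
    result = [[1, 0], [0, 1]]
    while e:
        if e & 1:
            result = mul2(result, m)
        m = mul2(m, m)
        e >>= 1
    return result


def leaked_determinant(s, q, n_a, x_a):
    """det(M_A) for M_A = S^n_a Q^x_a S^(-n_a); this is what Eve can compute."""
    s_n = pow2(s, n_a)
    m_a = mul2(mul2(s_n, pow2(q, x_a)), inv2(s_n))
    return det2(m_a)


def recover_exponent_bruteforce(det_q, target, limit):
    """Toy scalar discrete log: smallest x in [1, limit) with det_q^x = target."""
    for x in range(1, limit):
        if pow(det_q, x, P) == target:
            return x
    return None


S = [[1, 2], [3, 4]]
Q_WEAK = [[3, 1], [1, 2]]   # det = 5, violates the requirement of step 1
Q_SAFE = [[2, 3], [1, 2]]   # det = 1, as the protocol prescribes

leak = leaked_determinant(S, Q_WEAK, 9, 13)
guessed_x = recover_exponent_bruteforce(det2(Q_WEAK), leak, 50)
```

With Q_SAFE the determinant of the public value is always 1, so it carries no information about the private exponent.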


With $\det(Q) = 1$ (see step 1 of the agreement process), this “attack” cannot be performed. Figure 6.1 shows the agreement process performed by the algorithm. E could intercept $S$, $Q$, $d$, $p$, $M_A$ and $M_B$. In order to recover the private keys (e.g., $n_A$ and $x_A$), she could set up the following equation

$$M_A = S^{n_A} Q^{x_A} S^{-n_A} = (S^{n_A} Q S^{-n_A})^{x_A}$$

but this is much more difficult than a usual matrix discrete logarithm problem (DLP), as the base matrix is unknown. Other identities, such as

$$M_A S^{n_A} = S^{n_A} Q^{x_A}$$

are difficult to exploit because both $S^{n_A}$ and $Q^{x_A}$ are not known separately.

We have that $\#\mathcal{M} = \prod_{i=0}^{d-1} (p^d - p^i)$. Let $o(M)$ be the order of a matrix $M \in \mathcal{M}$, i.e., the smallest integer such that $M^{o(M)} = 1$. In order to avoid useless computations, it is sufficient to choose $n_A, n_B < o(S)$ (resp. $x_A, x_B < o(Q)$). The order of a matrix $M \in \mathcal{M}$ is in general difficult to compute, but an upper bound for it can be found as follows. For each $M \in \mathcal{M}$ let $p_M(x) = \prod_{i=1}^{k} f_i(x)^{d_i}$ be its characteristic polynomial factorized in $\mathbb{Z}[x]$, with $\alpha = \max\{d_i \mid i = 1, \ldots, k\}$. An upper bound (multiple) $m(M)$ for its multiplicative order $o(M)$ is given by the following formula [66]

$$m(M) = \mathrm{lcm}(p^{d_1} - 1, \ldots, p^{d_k} - 1) \cdot p^{\lceil \log_p(\alpha) \rceil}$$
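For very small parameters the group size and the order of a matrix can be computed directly, which makes the role of $o(M)$ concrete. The sketch below is only an illustration: the function names are hypothetical, and the brute-force search is feasible only for tiny $p$ and $d$, unlike the upper bound $m(M)$ from [66], which it does not implement.

```python
from math import prod


def gl_order(d: int, p: int) -> int:
    """Number of invertible d x d matrices over Z_p: prod_{i<d} (p^d - p^i)."""
    return prod(p**d - p**i for i in range(d))


def mat_mul(a, b, p):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) % p for j in range(n)]
            for i in range(n)]


def matrix_order(m, p):
    """Smallest k >= 1 with m^k = identity (mod p), by exhaustive search.

    The search is capped at the group order, so a non-invertible input
    ends with a ValueError instead of looping forever.
    """
    n = len(m)
    identity = [[int(i == j) for j in range(n)] for i in range(n)]
    reduced = [[v % p for v in row] for row in m]
    power = reduced
    for k in range(1, gl_order(n, p) + 1):
        if power == identity:
            return k
        power = mat_mul(power, reduced, p)
    raise ValueError("matrix is not invertible mod p")


# Toy check over Z_5: the order of S divides the size of GL(2, Z_5).
S = [[1, 2], [3, 4]]
order_of_s = matrix_order(S, 5)
group_size = gl_order(2, 5)
```

Since exponents are only meaningful modulo the order, choosing $n_A$ larger than order_of_s would just repeat powers already reachable with a smaller exponent.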

6.1.2 J2ME implementation

The operations described above to perform the key agreement have been developed in Java Micro Edition (J2ME). We chose this programming language because we need a suite that can run on different hardware platforms and operating systems. Moreover, it allows a fair performance evaluation, obtained by comparing our implementation of the key agreement algorithm with Bouncy Castle’s implementations of the Elliptic Curve and standard Diffie-Hellman key agreement algorithms. Bouncy Castle provides a plethora of APIs performing different cryptographic operations, implemented in Java, J2ME and C#; we used the J2ME implementations of the Elliptic Curve Diffie-Hellman (ECDH) and standard Diffie-Hellman (DH) key agreements to perform the comparison.

The first step in implementing the algorithm described in Section 6.1.1 is to implement the modular operations on matrices (e.g., modular matrix multiplication, power, inversion, conjugation and other ancillary operations).

It is very important, in a mobile environment, to optimize every step of every operation with respect to resource consumption. On small capacity devices any waste of resources causes a larger delay than the same waste would cause on more powerful devices: because of the shortage of RAM, CPU and storage capacity, operations need to be optimized as much as possible.

To perform the operations described in Section 6.1.1 we use a 32 bit unsigned integer data structure. Unfortunately, Java and J2ME have no unsigned integer data type; to solve this problem there are two possible approaches:

1. use bigger data structures, such as 64 bit signed long integers, simulating a 32 bit size by applying the modulus when the value exceeds $2^{32}$,

2. use the available 32 bit signed integers, combining them with arithmetical operations modulo $2^{31}$.


We have chosen the latter solution, i.e., to implement the modular matrix as an integer array (int[ ]) with modulus $2^{31}$. This data structure is, in our opinion, the best compromise between RAM usage and the CPU time needed to perform a task. The security of the key agreement is not affected by using 31 bit integers, while performance is compromised if one uses 64 bit signed integers to simulate 32 bit unsigned integers. Using long integers doubles the RAM consumption, and the system’s performance, in our opinion, degrades too much to justify the slight improvement in security.
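The effect of working modulo $2^{31}$ with signed 32 bit integers can be emulated as below. This is a sketch under assumptions, written in Python rather than J2ME: to_int32 mimics the wrap-around of a Java int on overflow, the mask keeps the low 31 bits, and key_bits assumes that each matrix entry contributes 31 bits, which reproduces the key sizes reported for the MC instances (e.g., 279 bits at dimension 3). All names are hypothetical.

```python
MASK31 = 0x7FFFFFFF  # 2**31 - 1: keeps the low 31 bits of a value


def to_int32(value: int) -> int:
    """Wrap an arbitrary integer the way a signed 32 bit int overflows."""
    value &= 0xFFFFFFFF
    return value - 0x1_0000_0000 if value & 0x8000_0000 else value


def mul_mod_2_31(a: int, b: int) -> int:
    """Multiply two 31 bit values using only 32 bit signed arithmetic.

    The product wraps modulo 2**32; masking the low 31 bits then gives the
    residue modulo 2**31, because 2**31 divides 2**32.
    """
    return to_int32(a * b) & MASK31


def add_mod_2_31(a: int, b: int) -> int:
    """Add two 31 bit values modulo 2**31 with the same masking trick."""
    return to_int32(a + b) & MASK31


def key_bits(d: int, bits_per_entry: int = 31) -> int:
    """Size in bits of a d x d matrix key (assumption: 31 bits per entry)."""
    return d * d * bits_per_entry


product = mul_mod_2_31(2_000_000_000, 1_999_999_999)
```

No 64 bit intermediate is needed, which is the point of the second approach: the int[ ] array stays at four bytes per entry.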

6.1.3 Performance testing methodology

In this section we report our performance tests of Matrix Conjugation Based Key Agreement versus Elliptic Curve and standard Diffie-Hellman on a Nokia N70 platform. The Nokia N70 is a multimedia smartphone launched in Q3 2005. In 2007, it was the second most popular cellular phone, with 8% of all sales at Rampal Cellular Stockmarket[67]. Our experiments show similar results with other mobile devices. Nokia N70 is equipped with:

• CPU : Texas Instruments OMAP 1710 (ARM architecture 926TEJ v5) – 220 MHz processor

• RAM : 55 MB

• FLASH : 19.9 MB

• MMC : 2 GB

• SCREEN : 176×208 TFT Matrix, 256K colours

• BATTERY : BL-5C (970 mAh)


• OS : BB5 / Symbian OS v8.1a, S60 Platform Second Edition, Feature Pack 3 operating system

• JAVA : MIDP 2.0 midlets

In a mobile device, in general, and using J2ME, in particular, there are several problems in measuring the time required for a given task, because the accuracy of the System.currentTimeMillis() function is not sufficient. We will use, as an estimate of the time length of a given task, the average of the time lengths, measured on several repetitions of the same task. More precisely:

Definition 8 Let n be the number of iterations of one task, and let $\theta_i$ denote the time needed to perform the i-th iteration of the task, measured using System.currentTimeMillis(). The actual time that the device needs to perform such a task will be measured as follows:

$$\Theta_n = \frac{1}{n} \sum_{i=1}^{n} \theta_i \qquad (6.1)$$

It is an empirical fact that $\Theta_n$ becomes approximately independent of n for “large” n. The required size of n depends on the task and is usually smaller for longer tasks (i.e., larger $\theta_i$); see Section 6.1.5 below.

For each algorithm tested, we performed the operation described above for the most used instances of the algorithm; e.g., for the ECDH case we tested all the curves recommended by NIST [68]. For the standard Diffie-Hellman and Matrix Conjugation Based Key Agreement analysis, we considered instances with comparable private key length, in order to have an idea of the brute force attack complexity with respect to performance.
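The averaging of Definition 8 can be sketched as follows. This is an illustration only: the function names are hypothetical, and Python's time.perf_counter() is used here in place of System.currentTimeMillis(), so the clock resolution differs from that of the J2ME measurements.

```python
import time


def measure(task, n):
    """Run `task` n times and return the n elapsed times in milliseconds."""
    samples = []
    for _ in range(n):
        start = time.perf_counter()
        task()
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples


def theta_n(samples):
    """Average of the measured times, i.e. equation (6.1)."""
    return sum(samples) / len(samples)


def busy_task():
    """Placeholder workload standing in for one key agreement."""
    return sum(i * i for i in range(10_000))


estimate_ms = theta_n(measure(busy_task, 100))
```

With a coarse clock a single sample can be off by a whole tick, while the average over many repetitions smooths that error out, which is why short tasks need a larger n.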


[Figure 6.2: bar chart of the time in milliseconds (vertical axis from 0 to 6000) for each tested ECDH, DH and MC instance; series: Public Data Generation, Key Agreement, TOT.]

Figure 6.2: Public data and Key Agreement generation time: all tests.
EC . . . bit: Elliptic Curve Diffie-Hellman with a . . . bit key
EC . . . bitK: Koblitz Elliptic Curve Diffie-Hellman with a . . . bit key
MC d (. . . ): Matrix Conjugation at dimension d with a . . . bit key
DH . . . : Diffie-Hellman with a . . . bit key

The next section shows the experimental results of the performance comparison between the different key agreement algorithms.

6.1.4 Performance evaluation

Here we show the results of all the tests performed on standard key agreement algorithms and protocols and on Matrix Conjugation Based Key Agreement.

We compared the performance of Matrix Conjugation Based Key Agreement to other reference algorithms, such as Diffie-Hellman key agreement (DH) [69] and Elliptic Curve Diffie-Hellman key agreement (ECDH) [70]. We remark that these algorithms are the most used to perform key agreement operations in desktop and mobile environments. Among the NIST suggested Elliptic Curves [71], we select both Koblitz curves (ending with a K in Figure 6.2 and Figure 6.3) and pseudo-random curves over GF(p).

[Figure 6.3: bar chart of the time in milliseconds (vertical axis from 0 to 1200) for the tested ECDH, DH and MC instances that complete within about one second; series: Public Data Generation, Key Agreement, TOT.]

Figure 6.3: Public data and Key Agreement generation time: results with an upper bound of 1 sec.
EC . . . bit: Elliptic Curve Diffie-Hellman with a . . . bit key
EC . . . bitK: Koblitz Elliptic Curve Diffie-Hellman with a . . . bit key
MC d (. . . ): Matrix Conjugation at dimension d with a . . . bit key
DH . . . : Diffie-Hellman with a . . . bit key

Figure 6.2 shows the time comparison between Matrix Conjugation Based Key Agreement (MC in Figure 6.2 and Figure 6.3), standard Diffie-Hellman and Elliptic Curve Diffie-Hellman. Conjugation based key agreement generates the public data and the SSK faster than the other algorithms. Since the differences in generation times for the secret and the key agreement are hard to appreciate in Figure 6.2, Figure 6.3 gives a closer look that shows the time differences better.

While a key agreement using an Elliptic Curve with a 571-bit key takes 5706.3 milliseconds, a key agreement using conjugation based key agreement with a 5 × 5 matrix (775-bit key) takes only 20.63 milliseconds. This difference is significant, even considering that the SSK generated by Matrix Conjugation Based Key Agreement is 50% larger than the Elliptic Curve SSK. Even for standard Diffie-Hellman the differences in a mobile environment look quite impressive; for example, a Diffie-Hellman 768-bit SSK is agreed in 343.44 milliseconds, while a Matrix Conjugation Based Key Agreement 775-bit SSK takes only 20.63 milliseconds. These differences are illustrated in Figure 6.3.
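The ratios behind these comparisons can be recomputed directly from the quoted timings. The snippet below is a small worked calculation; the function and constant names are hypothetical, and the three timings are the ones reported in the text above.

```python
def speedup(reference_ms: float, candidate_ms: float) -> float:
    """How many times faster the candidate is than the reference."""
    return reference_ms / candidate_ms


ECDH_571_MS = 5706.3   # Elliptic Curve, 571-bit key
DH_768_MS = 343.44     # standard Diffie-Hellman, 768-bit SSK
MC_5_MS = 20.63        # matrix conjugation, 5 x 5 matrix (775-bit key)

ecdh_vs_mc = speedup(ECDH_571_MS, MC_5_MS)
dh_vs_mc = speedup(DH_768_MS, MC_5_MS)
print(f"ECDH 571-bit vs MC 5x5: {ecdh_vs_mc:.1f}x")
print(f"DH 768-bit vs MC 5x5: {dh_vs_mc:.1f}x")
```

Both ratios refer to the total time, i.e., public data generation plus key agreement.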

6.1.5 Experimental results

Table 6.1 summarizes all the results obtained in the performance testing for the different classes of algorithms. The Parameters field indicates:

• In the ECDH case, the type of curve that is used to generate the agreement (K indicates a Koblitz curve) and the size of the generated SSK;

• In the DH case, the size of the generated SSK;

• In the Matrix Conjugation Based Key Agreement, the matrix dimension and the bit size of the matrix generated as key.

The Public Data Generation (Pub. Data) field indicates the time needed to generate the data exchanged to agree on an SSK. The Key Agreement (Key Agr.) field shows the time needed to generate the SSK by means of the exchanged and private data. The Total field shows the sum of the times used to generate the exchanged data and the SSK. The last field, Iterations (Iter.), indicates how many times the agreement has been performed; it is useful to understand the accuracy of the values in the Public Data Generation, Key Agreement and Total fields. In all cases but ECDH we did 100 iterations; in the ECDH cases we decided to use just 10 iterations because times were more than one order of magnitude bigger than in the other cases, so that keeping the same accuracy was not necessary.

Param.         Pub. Data    Key Agr.       Total    Iter.

Elliptic Curve Diffie-Hellman
163bit K          110.90      100.00      210.90       10
192bit            185.90      195.30      381.20       10
224bit            298.50      281.20      579.70       10
233bit K          696.90      759.30     1456.20       10
239bit           1684.40     1626.60     3311.00       10
256bit            312.50      262.50      575.00       10
283bit K          407.80      442.20      850.00       10
384bit            493.70      415.70      909.40       10
409bit K          561.00      560.90     1121.90       10
521bit           1404.60     1342.20     2746.80       10
571bit           2845.30     2861.00     5706.30       10

Diffie-Hellman
512                37.51       68.58      106.09      100
768               116.25      227.19      343.44      100
1024              282.98      539.83      822.81      100

Matrix Conjugation Based Key Agreement
3 (279)             3.27        3.13        6.40      100
4 (496)             6.72        5.94       12.66      100
5 (775)            10.32       10.31       20.63      100
6 (1116)           16.72       15.47       32.19      100
7 (1519)           23.76       22.96       46.72      100
8 (1984)           33.90       31.41       65.31      100
9 (2511)           44.35       44.85       89.20      100
10 (3100)          57.97       56.87      114.84      100
11 (3751)          74.21       71.26      145.47      100
12 (4464)          93.91       89.53      183.44      100

Table 6.1: Time (in milliseconds) used by the algorithms to generate the secret and to agree on an SSK.

6.1.6 Concluding remarks

In this section we compared a custom key agreement algorithm based on matrix conjugation with the standard Diffie-Hellman and Elliptic Curve Diffie-Hellman key agreements. Our experiments were performed on one of the most popular smartphones in the world. Experimental results showed that the key agreement based on matrix conjugation is from 8 to 450 times faster than the two DH variants.

Providing users with new services on their mobile devices increases the need for security to protect the information exchanged; such information can contain data about bank accounts, credit card numbers, PINs or simply passwords.

Currently existing cryptographic methods affect the usability of applications too much, burdening the system with the resource consumption due to cryptographic operations. Considering the growing business opportunities around the mobile world and, at the same time, the need for new, better performing applications that can run on small capacity devices such as smartphones or netbooks, this section’s results open the possibility of applying this cryptographic methodology to many scenarios of mobile device use.

6.2 Encryption algorithm

QP-DYN is an encryption algorithm, based on some ideas coming from [72], used for the encryption/decryption phase of the communication. We are not authorized to disclose information about how the algorithm works; we can only provide information on the performance and statistical testing performed in comparison with other stream cipher algorithms.

The security of QP-DYN has been statistically tested and the results are available in Section 6.2.2. These results do not prove that QP-DYN is unbreakable; however, they show that QP-DYN not only satisfies the NIST requirements for classified information but also passes tighter and more robust tests, such as the Rabbit, Alphabit, Pseudo-DIEHARD, FIPS-140-2 and Crush test batteries.

6.2.1 Performances

Performance tests were executed on a Nokia N70 (see Section 6.1.3 for the device’s details).

We compare QP-DYN with RC4 [73] and the AES CFB [74] Stream Cipher because both perform stream cipher operations, as QP-DYN does.

Figure 6.4 (a), Figure 6.4 (b) and Figure 6.4 (c) show the results of performance testing between QP-DYN and RC4 for different key sizes. The times shown in the figures are the sum of the encryption and decryption times. The key sizes tested are:

• 512-bit for RC4 compared to QP-DYN with a 4x4 matrix for a total of 496-bit (Figure 6.4 (a));

• 768-bit for RC4 compared to QP-DYN with a 5x5 matrix for a total of 775-bit (Figure 6.4 (b));

• 1024-bit for RC4 compared to QP-DYN with a 6x6 matrix for a total of 1116-bit (Figure 6.4 (c)).


[Figure 6.4: three line plots of time in milliseconds (vertical axis) against plaintext size from 32 to 608 bytes (horizontal axis); panel (a) compares RC4 512 with QP 4 (496), panel (b) RC4 768 with QP 5 (775), panel (c) RC4 1024 with QP 6 (1116).]

Figure 6.4: Overall encryption and decryption time comparison between (sizes in bytes) (a) RC4 512-bit and QP4, (b) RC4 768-bit and QP5, (c) RC4 1024-bit and QP6.


We observe that the time differences in the above figures are in the following ranges:

• From 15 up to 52 milliseconds for Figure 6.4 (a);

• From 18 up to 45 milliseconds for Figure 6.4 (b);

• From 37 up to 65 milliseconds for Figure 6.4 (c).

The time differences are within the range 15-65 milliseconds and thus they do not substantially affect the usability of QP-DYN compared to RC4. Furthermore, it is useful to remember that RC4 is not considered secure (see also the results shown in Section 6.2.2, “Statistically testing QP-DYN and RC4”).

We also compared the performance of QP-DYN on mobile environments with an AES implementation performing stream cipher operations (AES CFB Stream Cipher). In particular, Figure 6.5 illustrates the results of a comparison between an AES CFB Stream Cipher implementation using a 256-bit key and QP-DYN with 3x3 matrices (279-bit key).

[Figure 6.5: line plot of time in milliseconds (vertical axis, 0 to 120) against plaintext size from 32 to 608 bytes (horizontal axis); series: AES – Strm 256 and QP 3 (279).]

Figure 6.5: Overall encryption and decryption time comparison between AES CFB 256-bit and QP3 (sizes in bytes).


In our experiments, QP-DYN and AES took roughly the same time to encrypt/decrypt a plaintext of about 256 bytes. As can be seen from Figure 6.5, the time differences between AES CFB and QP-DYN are not dramatic:

• To encrypt/decrypt 32 bytes of plaintext AES CFB takes 22 milliseconds less than QP-DYN;

• To encrypt/decrypt 256 bytes of plaintext AES CFB and QP-DYN take the same time;

• To encrypt/decrypt 512 bytes of plaintext AES CFB takes 24 milliseconds more than QP-DYN.

QP-DYN can also be used to perform Block Cipher operations, so we compared it with AES in its standard Block Cipher implementation. As AES has been designed to perform Block Cipher operations, its encryption/decryption times are better than those of AES CFB. Figure 6.6 shows the results of our experiments for a standard implementation of AES using a 256-bit key and QP-DYN with a 3x3 matrix (279-bit).

[Figure 6.6: line plot of time in milliseconds (vertical axis, 0 to 80) against plaintext size from 32 to 608 bytes (horizontal axis); series: AES - Block 256 and QP 3 (279).]

Figure 6.6: Overall encryption and decryption time comparison between AES 256-bit and QP3 (sizes in bytes).

In particular:

• to encrypt and decrypt 32 bytes of plaintext AES takes 0.3 milliseconds while QP-DYN 28.25 milliseconds;

• to encrypt and decrypt 512 bytes of plaintext AES takes 4 milliseconds while QP-DYN 63 milliseconds.

We remark again that these time differences are very small (of the order of 60 milliseconds), and thus they should not have any impact on the practical usability of QP-DYN.

6.2.2 Statistically testing QP-DYN and RC4

QP-DYN performs encryption and decryption in a stream cipher mode; in particular, it generates a key-stream of the same size as the plaintext to be ciphered. This key-stream is XOR-ed with the plaintext, generating the ciphered text. Such operations are the same as those performed by other stream cipher algorithms, e.g., RC4 [75]. In 2005, Andreas Klein presented an analysis of the RC4 stream cipher, showing correlations between the RC4 key-stream and the key, and in 2008 Klein presented a successful attack on the RC4 key-stream based on his 2005 work [76]. These works show that if there are correlations in the key-stream generated by a stream cipher, the stream cipher itself is reversible and can be statistically attacked, recovering the key.

The National Institute of Standards and Technology (NIST) sets the guidelines to verify a stream cipher algorithm based on pseudo-random number generators (PRNG) [77]. A PRNG should successfully “pass” some statistical tests in order to be usable to cipher classified information¹. These tests are a subset of other sets of tests used to discover correlations in bit sequences generated by a PRNG. NIST gives some documentation about these tests in [78].

¹ Classified information is sensitive information to which access is restricted by law or regulation to particular classes of persons. A formal security clearance is required to handle classified documents or access classified data. The clearance process requires a satisfactory background investigation. There are typically several levels of sensitivity, with differing clearance requirements. This sort of hierarchical system of secrecy is used by virtually every national government. The act of assigning the level of sensitivity to data is called data classification.

As there is a lot of work on the cryptanalysis of the RC4 stream cipher and on the correlations observable in the generated key-stream, we decided to start by analyzing the differences between RC4 and QP-DYN that emerge from the NIST tests. We tested RC4 and QP-DYN using the TestU01 [79] C library for statistical testing; this library provides more tests than those required by NIST, and some of these tests are harder to pass. The tests in the TestU01 library are divided into batteries: SmallCrush, BigCrush, Rabbit, Alphabit, FIPS-140-2 and pseudo DIEHARD. There is no single battery performing all the tests required by NIST, but the tests are available in the library [80]; we implemented a battery performing all the required tests.

The results of our NIST test battery for the RC4 and QP-DYN algorithms are shown in [57]. RC4 does not pass some tests, while QP-DYN passes all the tests required by NIST. Moreover, while QP-DYN does not show correlations in the generated key-stream, RC4 does, and it does so in a very short time compared with the QP-DYN times (total CPU time for RC4 is 00:32:13.93, total CPU time for QP-DYN amounts to 04:00:07.85). In every run performed, RC4 always failed the same tests, while QP-DYN always passed all the tests performed.

The results of the tests performed give a clear indication that QP-DYN, when used properly with strong keys, is a strong and robust stream cipher, even though the effort required is higher than that required by other algorithms.
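The key-stream XOR construction and the simplest of the NIST statistical tests can be sketched as follows. Since the internals of QP-DYN are not disclosed, the sketch uses the publicly known RC4 key-stream generator as a stand-in; the function names are hypothetical, and the frequency (monobit) test shown here is only one of the tests in the NIST suite, not the TestU01-based battery used for the results in [57].

```python
import math


def rc4_keystream(key: bytes, length: int) -> bytes:
    """Generate `length` key-stream bytes with RC4 (KSA followed by PRGA)."""
    state = list(range(256))
    j = 0
    for i in range(256):  # key-scheduling algorithm
        j = (j + state[i] + key[i % len(key)]) % 256
        state[i], state[j] = state[j], state[i]
    i = j = 0
    out = bytearray()
    for _ in range(length):  # pseudo-random generation algorithm
        i = (i + 1) % 256
        j = (j + state[i]) % 256
        state[i], state[j] = state[j], state[i]
        out.append(state[(state[i] + state[j]) % 256])
    return bytes(out)


def xor_with_keystream(data: bytes, keystream: bytes) -> bytes:
    """Encrypt or decrypt: the same XOR is applied in both directions."""
    return bytes(d ^ k for d, k in zip(data, keystream))


def monobit_p_value(stream: bytes) -> float:
    """Frequency (monobit) test: checks that ones and zeros are balanced.

    Returns the p-value erfc(|#ones - #zeros| / sqrt(2 n)); a sequence is
    usually considered to pass when the p-value is at least 0.01.
    """
    n = len(stream) * 8
    ones = sum(bin(byte).count("1") for byte in stream)
    s_obs = abs(2 * ones - n) / math.sqrt(n)
    return math.erfc(s_obs / math.sqrt(2))


plaintext = b"Plaintext"
keystream = rc4_keystream(b"Key", len(plaintext))
ciphertext = xor_with_keystream(plaintext, keystream)
recovered = xor_with_keystream(ciphertext, keystream)
```

A balanced bit count is necessary but far from sufficient: correlations such as those exploited against RC4 are only exposed by the more demanding tests of the batteries listed above.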

6.3 Protecting inter process communication

Smartphone applications are commonly installed and stored in memory, and in modern devices the OS keeps all the application’s data safe by using a sandbox approach. This approach prevents other applications from accessing unauthorized data by insulating each application from the others [81], [82], [83].

In many cases applications installed on the same device may interoperate in their working environment using mechanisms similar to inter-process communication (IPC), made available by the mobile operating system. Unfortunately, mobile devices lack flexible solutions for making these communications secure.

This section presents a framework proposed to secure the message exchange with the services installed on Google Android mobile devices. Value Added Services (VASs) realized by different providers are discovered, used and composed by an Application Frame designed for realizing complex goals. We implemented a prototype of the proposed framework on a real device and performed extensive testing to measure the overhead introduced by the cryptographic operations required to protect the inter-process communication.

We named this framework SAVED (Secure Android Value addED services). SAVED enables secure communication between services and the applications using such services via Inter Process Communication (IPC)/Remote Procedure Call (RPC). Each VAS is realized through an Android Service. Access to such a service requires the execution of an authentication and authorization phase among the involved parties. Once this initial phase is completed, the application sets up a secure communication with the service using a symmetric encryption scheme.

6.3.1 State of the art

Android is a multi-process system, in which each application (and parts of the system) runs in its own process. Most security between applications and the system is enforced at the process level through standard Linux facilities, such as user and group IDs that are assigned to applications [84]. The Android system requires that all installed applications be digitally signed with a certificate whose private key is held by the application’s developer. The Android system uses the certificate as a means of identifying the author of an application and establishing trust relationships between applications. The Android approach grants the security of the application’s data and prevents access to all services developed by others. Every service publishes in its manifest file the permissions required to use the service. One of the permission settings in the manifest file is the Protection level. The Protection level field configures the security policies required by the service; if the level is set to signature, the service will communicate only with those applications with which it shares the same developer certificate.

The main advantage of the approach followed in the Android design is that developers have to focus their attention only on the application, while the OS guarantees that all the applications that are not allowed to access the services are prevented from doing so. This simplification comes at an extra cost: only developers sharing certificates and private keys can use already developed services in new applications. This is a huge limitation compared to the growing size of the market and the number of organizations and developers involved in publishing applications and services.

The approach of Android prevents third parties from using the framework’s VASs. Developers can use each other’s services by sharing certificates and credentials: in this case, the applications can interact, but the security of the whole framework relies on a single digital signature; if the developer’s digital signature is stolen, an attacker could sign his/her own applications, thus getting complete access to all data of the framework.

Our approach aims to promote the scalability of the framework and to grant secure access to services developed by other users without the need to share private data. We propose to insert a new layer that handles the security of inter-process communications; in this layer, trust is granted directly by the security policy of the framework, and each application can request access and publish services by interacting with the framework as in a PKI environment.

Thanks to the SAVED framework it is possible to face different kinds of threats:

• Service Spoofing: the application refers to a service by simply using an interface that establishes the name, the package and the method signatures; if the original service is replaced on the mobile device, applications that exploit that service are unaware of the substitution.

• Memory Dump: starting from Android 1.5, a new API has been introduced to generate a memory dump programmatically. The static method dumpHprofData(String fileName) of the Debug class generates a dump file that can be converted with the hprof-conv tool of the Android SDK and, subsequently, analyzed with different memory analysis tools (e.g., Eclipse MAT, JProfiler, etc.). If a fake application executes the dump periodically and exports the dump data using a connection (e.g., an HTTP connection), it is possible to steal the data exchanged among applications and services.

6.3.2 The framework

SAVED (Secure Android Value addED services) is a framework that grants secure communication between services without requiring private data sharing. Our intent is to improve interoperability between applications and services, addressing the limits of Android’s native approach. The purpose of SAVED is to allow applications to use services developed by others, to add new VASs to the framework or even to create new applications using already existing VASs. All the interactions performed using the proposed framework are performed in a secure way. SAVED adds supplementary security at the process communication level: each application is accredited to the framework, which grants privileges to access shared services and facilities in a secure way. The single process security provided by the sandboxes of the Android approach is also preserved in SAVED. In our framework we defined two main entities:

• Application, which provides the graphical user interface and all the logic implementing the task to be realized. Applications are implemented by extending the Android.Activity class.

• Value Added Service (VAS), which provides all the certified services to the applications developed using the framework. VASs are implemented as remote services extending the Android.Service class. The ProxyCA and the ProxyTSA are two special VASs in the framework; these VASs allow the communication with a Certification Authority and a Timestamping Authority, respectively.


In order to realize Applications participating in the framework, developers have to extend specific interfaces and include particular resource packages. When a new VAS is realized, its class package must be exported. Such class packages are imported by the Applications that use the services provided by the VAS, and are used to perform inter-process communication. Including these packages and extending the interfaces provides the supplementary security layer that grants secure communication between entities and denies access to the services to applications that are not allowed. Moreover, we tried to identify some best practices for creating components that participate in the framework and enforce the required security needs. Some examples follow:

• Activation code: when the Application/VAS is installed on the device, the user should be asked for an unlock code; the Application/VAS remains locked (preventing all interactions) until the user enters the proper activation code of every entity;

• Use of standard certificate: each component should have a proper X.509 digital certificate signed by a valid Certification Authority (CA); this certificate is saved in a keystore inside the component memory area, and the component is responsible for managing the keystore correctly so that the certificates of the other entities are stored securely;

• Model View Control Pattern: VAS and Applications take care of independently implementing the graphical user interfaces to be shown to the end user;

• Mutual Authentication: each entity needs to implement a mechanism to grant mutual authentication. Mutual authentication should be ensured by mutually exchanging and verifying the digital certificates. Using a handshake scheme (e.g., the TLS handshake) the involved entities exchange their digital certificates, check the validity of the certificates through the ProxyCA, and mutually authenticate themselves (Figure 6.7).

Figure 6.7: Mutual Authentication phase.

• Session Authentication: once the entities are mutually authenticated, a session key (i.e., SK) is shared. According to our approach the SK is generated by both the Application and the VAS using parameters defined by the two parties (i.e., CTRL A and CTRL B). Adopting a key agreement protocol (e.g., the Diffie-Hellman protocol) the involved entities agree on a secret SK that will be used to encrypt subsequent communications (Figure 6.8).

Figure 6.8: Session Authentication phase.

• Session Encryption: every VAS allows access to its functionalities only to “trusted” Applications, i.e., Applications that have successfully performed the Mutual Authentication and the Session Authentication phases. In order to enforce the uniqueness of each interaction with a VAS a random value (i.e., Nonce A) is used; confidentiality is granted by encrypting the exchanged data with the SK. The Application composes the results of different VASs in order to realize a complex goal. At the end of this phase, the Application interacts with a timestamping authority through the ProxyTSA in order to securely keep track of the creation time of the realized goal (Figure 6.9). The sensitive data of the operation are summarized by applying a hash function (i.e., Op Hash) and this summary is sent to the Timestamping service.

Figure 6.9: Session Encryption phase.


Mutual Authentication, Session Authentication and Session Encryption represent the secure core of the SAVED framework and must be carefully performed by every entity that joins the framework.
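To make the Session Authentication and Session Encryption phases more concrete, the following Python sketch shows two parties agreeing on a session key through a Diffie-Hellman exchange and then hashing an operation together with a nonce. It is only an illustration under stated assumptions: the group parameters, the function names, and the use of SHA-256 to derive the key and to compute the operation hash are hypothetical choices, not details of the SAVED prototype (which is an Android application); certificate verification through the ProxyCA and the actual symmetric encryption are left out.

```python
import hashlib
import secrets

# Toy group parameters, for illustration only (assumption): a real system
# would use a standardized, much larger Diffie-Hellman group.
P = 2**127 - 1  # a Mersenne prime
G = 3


def dh_keypair():
    """Return a (private, public) Diffie-Hellman key pair."""
    private = secrets.randbelow(P - 3) + 2  # value in [2, P - 2]
    public = pow(G, private, P)
    return private, public


def derive_session_key(own_private, peer_public):
    """Derive the session key SK from the shared Diffie-Hellman secret."""
    shared = pow(peer_public, own_private, P)
    # The shared value is smaller than P < 2**127, so 16 bytes are enough.
    return hashlib.sha256(shared.to_bytes(16, "big")).digest()


def operation_hash(nonce, payload):
    """Summarize an operation (nonce plus sensitive data) with SHA-256."""
    return hashlib.sha256(nonce + payload).hexdigest()


# Session Authentication: each side publishes its public value only.
app_private, app_public = dh_keypair()
vas_private, vas_public = dh_keypair()
sk_app = derive_session_key(app_private, vas_public)
sk_vas = derive_session_key(vas_private, app_public)

# Session Encryption: a fresh nonce makes every interaction unique.
nonce_a = secrets.token_bytes(16)
op_hash = operation_hash(nonce_a, b"result composed from several VASs")
```

Both parties end up with the same key without ever transmitting it, which is the property the Session Authentication phase relies on; the digest in `op_hash` plays the role of the summary that would be sent to the timestamping service.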

6.3.3 The framework implementation

We developed a prototype of the SAVED framework on an Android 1.5 platform. The main features of the proposed framework are encapsulated into jar files that contain two kinds of files (i.e., .aidl, .Stub) for inter-process communication. AIDL (Android Interface Definition Language) is an IDL [85, 86] with which it is possible to automatically generate the source code that allows two Android applications to exchange information using IPC. The AIDL/IPC interface-based mechanism is similar to the Component Object Model (COM) or the Common Object Request Broker Architecture (CORBA). Implementing an AIDL/IPC service requires the following steps:

• Create an .aidl file to define the interface (YourInterface.aidl). The interface defines the access methods and the fields available to a client.

• Add the .aidl file to the makefile and implement the methods of the interface by creating a class that extends YourInterface.Stub (the .Stub file is automatically generated by the tool) and implements the methods declared in the .aidl description file.

• Publish the interface to clients by overriding the Service.onBind(Intent) method; this method returns an instance of the class implementing the interface.


Figure 6.10: SAVED framework main packages.

This IPC mechanism needs a way to share complex information, such as non-primitive types, between two entities. To achieve this goal Android provides the Parcelable interface, which allows complex types to be serialized and deserialized. Figure 6.10 gives a simplified package diagram of SAVED. The top of the picture shows the following core .jar files:

• pkgApp.jar contains the interface InterfaceApplication, which must be implemented by every class that wants to participate in SAVED as an Application;

• pkgServ.jar contains the interface InterfaceService, which must be implemented by every class that wants to be a VAS in the framework;

• pkgCA.jar carries IProxyCA.aidl with its corresponding .Stub file; these files allow the communication between the entities of SAVED and the ProxyCA. Moreover, the jar file contains the parcelable class ReqX509, which is mandatory for the communication;

• pkgTSA.jar packages IProxyTime.aidl with its corresponding .Stub file to grant communication with the ProxyTSA;

• pkgCommBase.jar contains the three base parcelable files that grant the communication between the Application and the VASs, namely CertificatePack.java, KeyPack.java and ResourcePack.java.

To allow an Application to contact and receive services from all the VASs inside the framework, and thus to assemble the services offered by the VASs into complex applications, the ProxyTSA and the ProxyCA Android packages (apk) must be installed; these entities are shown in the lower left half of Figure 6.10.

ProxyCA is one of the underlying VASs of the framework. All entities must submit to the ProxyCA the digital certificates they receive from their communication partners. The service contacts a web service that works as an online Certification Authority, inserts the certificate in an XML file and, through a secure HTTP connection (i.e., HTTPS), asks for the certificate verification. The web service checks the certificate validity and answers with an XML response.

ProxyTSA is another basic VAS of SAVED. Like the ProxyCA, the ProxyTSA takes care of the communication with an external partner, the timestamping web service. All the communications between the proxy and the timestamping web service are managed through XML messages over HTTPS.

The lower right half of Figure 6.10 illustrates a VAS and an Application participating in SAVED. Third parties that want to contribute to the framework may easily create and add new Applications or VASs. In the remainder of this section we sketch how a developer can realize VASs and Applications.

Building a value added service

1. Create a new Android project with a class that extends the native Service class; import pkgServ.jar, pkgCommBase.jar and pkgCA.jar into the project;

2. The main class of the project must implement the InterfaceService interface and consequently all its methods;

3. Create the graphical user interface;

4. Create the IServiceX.aidl in the project as described previously;

5. Create and export pkgXVAS.jar containing IServiceX.aidl and the corresponding automatically generated .Stub file;

6. The Service class must implement all the standard methods of the native Android Service class, and the .aidl interface with all the methods defined through the description language;

7. Release the service as an .apk file for the installation on the device.

Building an application

1. Create a new Android project which contains a class that extends the native Android Activity class; import pkgApp.jar, pkgCommBase.jar, pkgCA.jar and pkgTSA.jar into the project;

2. The main class of the project must implement the InterfaceApplication interface with all its methods;


3. Create a graphical user interface to allow the user to interact with the Application;

4. Import from each VAS you want to use in the Application the corresponding jar file (i.e., pkgXVAS.jar);

5. Use each service in a proper way, taking care of managing and releasing correctly the connection with the involved VAS. Note that early versions of Android platform serialize the access to the services.

6. Release the Application as an .apk file for the installation on the device.

Assume a scenario with one Application and one VAS, each with its own digital certificate signed by a different CA. Note that in this scenario neither entity “knows” the public key or the certificate of its counterpart. If the two entities wish to cooperate, they need to authenticate each other. After contacting the ProxyCA to verify the trustworthiness of the communication partner (cf. the Mutual Authentication phase), an asymmetric cryptography session to exchange the session key can be started (cf. the Session Authentication phase). Finally, the session between the involved parties is encrypted using symmetric cryptography (cf. the Session Encryption phase). The switch from asymmetric to symmetric cryptography is motivated by the performance overhead of asymmetric cryptography: it improves the performance of the whole framework by reducing the effort spent on encryption/decryption operations.

6.3.4 On a real device

The framework has been tested on an Android HTC Magic device. The device was equipped with the Android 1.5 OS, a 3.2-megapixel camera, an integrated GPS antenna and IEEE 802.11 b/g Wi-Fi. Using the Android ADB tool, different .apk files created with the Eclipse IDE were installed on the HTC Magic. The testing phase highlighted a slower response of the Applications due to security operations, inter-process communications via AIDL interfaces and parcelable classes. We executed some performance tests using our prototype. We aimed at measuring the computational time overhead introduced by the use of SAVED, and thus we measured the time needed to execute security functions. In particular, we considered the overhead related to each of the phases described in Section 6.3.2. Table 6.2 shows the time overhead introduced by SAVED. The first row of the table (marked with *) refers to the first execution of the Mutual Authentication phase, while the second row refers to subsequent executions. In the first case the additional time is due to the need to update the keystore with the new digital certificates; this delay is paid only once. The total framework overhead, computed with the Mutual Authentication time of subsequent executions (446 + 257 + 795 = 1498 ms), amounts to about 1.5 seconds, which preserves usability in real use cases.

Phase                            Time (ms)
1. Mutual Authentication*        1197
1. Mutual Authentication         446
2. Session Authentication        257
3. Session Encryption            795
Total Framework Overhead         1498

Table 6.2: Time overhead for the framework phases.
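The totals behind Table 6.2 can be checked with a few lines of Python. The sketch below only sums the measured phase times; the figure for a first run (when the keystore must be updated) is derived here from the table values and is not reported in the table itself. The constant and function names are hypothetical.

```python
# Phase timings in milliseconds, copied from Table 6.2.
FIRST_MUTUAL_AUTH_MS = 1197   # first execution (keystore update included)
MUTUAL_AUTH_MS = 446          # subsequent executions
SESSION_AUTH_MS = 257
SESSION_ENCRYPTION_MS = 795


def total_overhead_ms(mutual_auth_ms):
    """Sum the three framework phases for a given Mutual Authentication time."""
    return mutual_auth_ms + SESSION_AUTH_MS + SESSION_ENCRYPTION_MS


steady_state_total = total_overhead_ms(MUTUAL_AUTH_MS)
first_run_total = total_overhead_ms(FIRST_MUTUAL_AUTH_MS)

print(steady_state_total)  # 1498
print(first_run_total)     # 2249
```

The steady-state value matches the "Total Framework Overhead" row, which confirms that the reported total refers to executions after the first one.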


People have really gotten comfortable not only sharing more information and different kinds, but more openly and with more people - and that social norm is just something that has evolved over time.
Mark Zuckerberg

7 Value added services on backup data

Introduction

In this chapter we present some possible use cases where the application of our backup approach can bring an improvement of the interaction among people.

In the first part of the chapter we show a system which allows the user to share part of his/her backup data with some selected contacts; a shared backup can ease communication within an enterprise environment, among friends or university colleagues and, with some constraints such as geographic location and time, in other situations such as meetings or conferences. Some results of the experimentation with the proposed shared backup have recently been presented at the 4th IFIP International Conference on New Technologies, Mobility and Security [28].

In the second part we show a methodology to extract a social network from backup data. The proposed methodology helps build the network by extracting connections from backups, and by searching the Web for information that is publicly available and can be found using standard search engines. The social network can be useful for several objectives; for example, in an enterprise it can be used to put people that are “friends” outside the office into the same workgroups, which may improve productivity by avoiding conflicts between collaborators. Such an approach exploits dynamics already present in groups of people [87].

Some ideas in the second part of this chapter have been proposed in Dellutri’s PhD thesis [31]. An extension of Dellutri’s thesis, showing some results of this chapter, has been published in the First IEEE International Workshop on Information Forensics and Security [32].

7.1 Sharing backup data with closed groups

The common interface used for backup, introduced in Chapter 3, can also be used as a basis to offer sharing services to the users of the system. Usually, in closed groups of people, users unconsciously share each other’s contacts, parts of their calendars, production files or even, in personal interactions, pictures and videos. The general idea is to give users the possibility of sharing part of their backup, used as a common synchronization interface, with selected contacts or groups of contacts of their choice in their personal or business network.

7.1.1 Social backup in business environment

In an enterprise where people collaborate daily, it can be important for employees to share commonly useful information, e.g., the calendar, part of the address book, templates for presentations or documents, etc. Moreover, if a new employee joins the team, his/her contacts are added to the common address book and shared with selected users of his/her new team; his/her new business device is added to a specific closed group, and all data, updated to the latest changes, are retrieved from the shared backup and saved on it. If somebody’s device is lost or stolen, or the employee leaves the company, the group administrator can disconnect it from the social backup, so that the privacy of the group members is preserved. Using our approach, all these updates are directly exchanged and notified on the employees’ smartphones.

Figure 7.1: Use case of meeting backup and share.

7.1.2 Sharing conference data

With some restrictions (i.e., time, location), our approach can be useful in particular kinds of events, such as meetings, conventions or conferences. In this kind of event the interest in some information (e.g., organizer contacts, event schedule) is temporary; the participant is interested in such information only for the time he/she is at the meeting location. The organizing committee can inform participants by sharing documents and other related information only within the event area and while the event is taking place. In this way each participant has just the information he/she needs directly on his/her mobile device, e.g., the conference schedule is shared via calendar, the venue address via maps, and the committee contacts via address book; this spares the attendee a plethora of useless data and the committee a lot of noisy requests for information.

Figure 7.2: Android Backup and Restore client, panels (a), (b) and (c).

7.1.3 Shared backup for smartphone

To allow users to share part of their backup with closed groups we deployed a set of REST web services (see Appendix C for the services implemented) on the backup server described in Section 5.4. In the control layer we implemented the business logic which handles the sharing services; using these REST APIs the user can manage his/her groups and shares from the client, allowing or denying other users access to a resource. Before granting access to any content, the server checks the owner’s settings to verify whether the requesting user can access the resource.

The proposed sharing approach can be generalized, under some conditions, to an open community whose members are willing to share their data. In small groups, where all participants know each other directly, the information can be shared freely; such a system does not introduce privacy or security problems. If someone wants to share a piece of information with a friend, the system just eases the task by keeping the information up to date on both “sharers’” devices.

In an open community, the information to be shared must be authorized by its owner. In this case, when a user tries to share a piece of information, its ownership must be verified, for example through a code sent to the address that the information refers to. For instance, if a user wants to share a mobile phone number, the system sends a text message with a verification code to that number; the code must then be entered into the system to verify the ownership of the information.

We equipped the Android backup client (see Section 4.2) to access the web services deployed on the server.

Figure 7.3: Android Backup and Restore client, panels (a), (b), (c) and (d).
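The two server-side checks described in this section, namely the verification of the owner’s sharing settings and the ownership verification through a code, can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the class, its fields and the function names are hypothetical and do not reproduce the REST services listed in Appendix C; delivering the code by text message is outside the scope of the example.

```python
import secrets


class SharedResource:
    """A backup item together with the sharing settings chosen by its owner."""

    def __init__(self, owner, name):
        self.owner = owner
        self.name = name
        self.allowed_users = set()
        self.allowed_groups = set()


def can_access(resource, user, user_groups):
    """Check the owner's settings before granting access to a resource."""
    if user == resource.owner:
        return True
    if user in resource.allowed_users:
        return True
    # Access is also granted through any group the owner shared with.
    return bool(resource.allowed_groups & set(user_groups))


def new_verification_code(digits=6):
    """Generate a random numeric code to be sent to the shared phone number."""
    return "".join(secrets.choice("0123456789") for _ in range(digits))


def verify_ownership(expected_code, submitted_code):
    """Compare the submitted code with the expected one in constant time."""
    return secrets.compare_digest(expected_code, submitted_code)


calendar = SharedResource(owner="alice", name="team calendar")
calendar.allowed_groups.add("project-team")
print(can_access(calendar, "bob", ["project-team"]))
```

Keeping the access decision in a single function mirrors the idea that the server, and not the client, is the place where the owner’s settings are enforced.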

7.1.4 Running the application

We ran our application on real devices. Figure 7.2 and Figure 7.3 show some snapshots of the Android client GUI that explain how the system works; the Symbian client and the server-side implementations are omitted. Figure 7.2(a) illustrates the backup setup features, where the user can choose which data to back up and the type (full or incremental) of backup to be performed. Figure 7.2(b) shows the interface where the user can select a backup to be restored: note that this backup may have been performed on another device. Figure 7.2(c) depicts a granular view of a backup: in this interface the user can choose to restore just a part of the backup, or to keep only part of his/her data updated.

Figure 7.3 illustrates how to share information based on geographic coordinates. In Figure 7.3(a) the user is presented with the actions used to manage the groups with which he/she shares information; Figure 7.3(b) shows all the possible actions which can be performed by the client. If the user chooses to share data, the interface of Figure 7.3(c) is presented and he/she can share data with his/her friends or with the groups he/she participates in. Figures 7.3(c) and (d) show the interfaces to share data geographically: by tapping on the map the user specifies the area where a resource is visible and shares a resource from a backup. When another user of his/her group enters the area in which the shared information resides, that user is notified and the information is made available.
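The geographic sharing rule can be illustrated with a small Python sketch that decides whether a user is inside the area where a resource is visible. The sketch assumes that the area is a circle described by a center and a radius in meters; the text does not specify the shape of the area, so this representation, the coordinates and the function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    # Clamp to 1.0 to protect asin from rounding errors.
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def is_visible(user_lat, user_lon, area):
    """True if the user is inside the circular area attached to a resource."""
    distance = haversine_m(user_lat, user_lon, area["lat"], area["lon"])
    return distance <= area["radius_m"]


# Hypothetical sharing area: 500 m around a conference venue.
venue_area = {"lat": 41.8500, "lon": 12.6200, "radius_m": 500}
print(is_visible(41.8510, 12.6210, venue_area))
```

A check of this kind would be evaluated whenever a group member reports a new position, so that the notification described above is triggered only inside the chosen area.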

7.2 Extracting social network

In this section we propose an approach that allows one to get information about the social network of an individual by complementing the information provided by his/her (smart)phone with the data publicly available on the net.

Our approach is based on a profile graph, whose nodes are the people involved and whose (weighted) edges represent their mutual links. In a first phase, a preliminary version of the graph is built by using all the information available in the backed-up smartphone; later, the obtained graph is refined by mining publicly available data from the Web. Finally, the graph is clustered to generate cliques of people.

All the phases of the process described above are performed by an integrated and interactive software tool that allows the user to rapidly recover a smartphone owner’s social network. Merging the information coming from the Web with the information stored on the mobile device makes it possible to reach “clearer” results, avoiding homonymy problems and improving the precision of the link weighting.

7.2.1 Introduction

The ever-increasing spread of (Web) social networks like Facebook, LinkedIn, Flickr, Twitter, MySpace, etc. provides an invaluable amount of publicly available personal data, but it is often difficult, if not impossible, to distinguish real friends from Web ones if the WWW is our only source.

However, things change considerably if we can access an individual’s mobile device data: the two sources, the phone and the Web, together provide a precise picture of his/her social network. In some cases, if the smartphone is used to access some social networks, and therefore stores the related passwords, the picture provided can be really sharp.

The social network generated can be used to profile users. User profiles can become a key point in the creation of workgroups; productivity should improve when people already know each other and share interests outside work. Besides group generation in an enterprise, the user profile might prove helpful in other fields including marketing, the bootstrapping of new social networking services, and Customer Relationship Management.

The added value of this approach is that intersecting the information provided by the smartphone with the information freely available on the Web allows one to: i) effectively filter the often too many Web contacts; ii) discover the mutual relations between the phone contacts; iii) reduce ambiguities (e.g., Facebook friends you do not really know are either filtered out, or the weight of their connection in the social network graph is really low compared with real friends that appear both in the smartphone and in the Web data); iv) provide a “closeness” score.

The above approach, aimed at performing Mobile Identity Profiling (MIP), i.e., reconstructing a user’s profile by combining the analysis of the smartphone’s data with social relationship data found on the Web, is split into three stages:

1. the Smartphone Data Analysis (SDA) (Section 7.2.3);

2. the Web Data Analysis (WDA) (Section 7.2.4);

3. the Clustering Analysis (CA) (Section 7.2.5).

The goal of the process is to build the smartphone owner’s social network, namely the profile graph, and to find all sub-graphs (clusters) which represent the social groups within the graph. The purpose of this section is to give the reader an idea of the effectiveness of our approach. We will discuss how the process is performed using an example to lead the reader through all the stages.

7.2.2 Related work

To the best of our knowledge, our approach of combining three different techniques in order to reconstruct an individual’s social network is novel. In this section we briefly discuss related work on the three distinct processes. The leitmotif connecting these processes is the concept of identity; throughout this section, by identity we mean “that part of the self by which we are known to others” [88]. A remarkable work on identity construction in social networks is given by Zhao et al. in [89], where the authors study identity construction on Facebook (http://www.facebook.com).

The first phase of the approach described in this section, Smartphone Data Analysis, is based on some previous works, where we extracted information residing in mobile devices [33], [34] and analyzed this information to trace the smartphone user’s activity for forensic purposes.

Focusing on Web Data Analysis, interesting results are presented by Mika et al. in [90], [91]; the authors, dealing with the problem of “bootstrapping” a Friend-Of-A-Friend (FOAF) based social network, proposed “the traditional Web as source of information about the social networks in a community”. They therefore introduced a system for collecting social network data which fetches data from the traditional Web by mining the index of Google. Since social networks have spread, many “common” users have put themselves on the Web and, in particular, have entered information about who their friends are. We are able to extend Mika’s experiment to common users, thanks to the part of social network data that is publicly available on the Web and that is periodically crawled by search engines.

Dealing with Clustering Analysis, we focused on the identification of locally dense subgraphs that are sparsely inter-connected, also known as the paradigm of intra-cluster density versus inter-cluster sparsity (see [92], which provides an excellent overview of graph clustering). In Section 7.2.5 we provide details about the algorithms used.


7.2.3 Smartphone Data Analysis (SDA)

Smartphone data analysis aims to decode the content of a smartphone and analyse it, generating a graph representing the interactions between the user and his/her mobile device contacts.

The decoding phase aims at generating parsers that are able to export data in XML format, or that can be integrated directly into the analysis application. The decoded data (contacts list, SMS list, event log and calendar entries list) can hardly be analysed manually by a human operator, because she has to correlate their unique identifiers in order to reconstruct situations, conversations and relationships between a device’s owner and her contacts.

The Smartphone Data Analysis is composed of four sub-phases¹:

The File Analysis, which analyses the files contained in the device filesystem, organizing them by MIME type, and runs the decoding tool over personal data files.

The Contact Analysis, which merges duplicate contact information and highlights those contacts which may represent a potential source of noise for the subsequent Web analysis.

The Event Analysis, which mines the phone’s log to reconstruct the user’s activity. Events generated by a mobile device always belong to the following macro-classes: voice calls, data calls, SMS/MMS sent or received, SIM change, SD change. Voice call and SMS/MMS logs are useful to reconstruct the phone owner’s social activity, and are used to determine the strength of the relation between the owner and each contact.

The Messages Analysis, which completes the event analysis by extending it to all SMS/MMS that have been deleted from the event log but could still persist in the saved SMS/MMS list.

After these analysis sub-phases have been completed, the profile graph is built and the information collected is organized and stored inside it. Such a data structure allows us to represent the social network given by the phone interactions between the owner and the contacts. The graph generated is an undirected² graph; the weight of each edge connecting two vertices represents the strength of the connection between the user (central vertex) and a contact (other vertex).

¹ In this section we do not deal with the calendar analysis, because it is not directly correlated with the social network discovery.

² An undirected graph is a graph in which the vertices are connected by undirected edges. An undirected edge is an edge that has no orientation.

Figure 7.4: The graph representation of contacts (a) and their relationships with the phone’s owner (b), which are revealed by the number of calls and the number of SMS/MMS. Panel (c) shows the graph after the execution of SESORR; the edges represent the relationships extracted from the Web (web-edges).

In our graph representation the phone owner is at the center of a circle composed of the contacts we found in her smartphone (Figure 7.4a). After the SDA, the graph is augmented with edges between the phone’s owner and the contacts with whom she has communicated (via SMS/MMS or call). The weight of these links is computed simply as the sum of the number of calls (made or received) and the number of SMS/MMS (sent or received) between the owner and the contact. The value of this sum is used to compute the edge length, in order to put the most frequently contacted people closer to the owner (Figure 7.4b).
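The weighting rule of the SDA phase can be sketched in a few lines of Python. The weight of the edge between the owner and a contact is the number of calls plus the number of SMS/MMS exchanged with that contact, as stated above; the event format, the function names and, in particular, the formula that turns a weight into an edge length are assumptions made for illustration, since the text only says that frequently contacted people are drawn closer to the owner.

```python
from collections import Counter

SOCIAL_EVENTS = ("call", "sms", "mms")


def edge_weights(events):
    """Count calls and SMS/MMS per contact.

    `events` is an iterable of (contact, kind) pairs taken from the event
    log; kinds other than calls and messages (e.g., data calls or SIM
    changes) do not contribute to the weight.
    """
    weights = Counter()
    for contact, kind in events:
        if kind in SOCIAL_EVENTS:
            weights[contact] += 1
    return weights


def edge_length(weight, max_length=100.0):
    """Assumed mapping: the higher the weight, the shorter the edge."""
    return max_length / (1 + weight)


# Hypothetical event log of the phone owner.
log = [
    ("anna", "call"), ("anna", "sms"), ("bruno", "call"),
    ("anna", "mms"), ("carla", "data"),
]
weights = edge_weights(log)
for contact, weight in weights.most_common():
    print(contact, weight, round(edge_length(weight), 1))
```

Any decreasing function of the weight would serve the same drawing purpose; the reciprocal is used here only because it is simple and never divides by zero.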

7.2.4 Web Data Analysis (WDA)

The goal of the Web Data Analysis component is to find the social network between the phone owner and her contacts, and among the contacts themselves, by retrieving people’s public information on the World Wide Web. As mentioned before, we follow the approach of Mika et al. [90] to retrieve relationships from search engine records. In this section we detail the relationship-retrieving algorithm and the techniques used to estimate the weight of Web edges.

SESORR Algorithm

In order to reconstruct relationships among a phone’s contacts, we used the huge amount of data collected by search engines over the years to obtain relational network data. Our approach is to submit all possible pairs of names and surnames to the search engine and to retrieve the results, i.e., the pages where the two pairs $\langle name, surname \rangle_{i,j}$ occur simultaneously, by counting the number of pages found (hits) and, for each of them, by saving the title and the short description returned by the search engine. Moreover, the non-stopwords³ contained in titles and descriptions are counted for further analysis. To accomplish this task, we designed the SESORR (Search Engine SOcial Relationships-Retrieving) algorithm. As a preliminary examination, SESORR submits the query

$\langle name, surname \rangle \lor \langle surname, name \rangle$

for each contact and stores the results in the data structures of the nodes of G. In this way it is able to discard from subsequent queries the contacts which are not present on the Web (i.e., those for which the query returned a result set R = ∅). Finally, for each pair of contacts i, j, SESORR submits the following query:

$(\langle name_i, surname_i \rangle \lor \langle surname_i, name_i \rangle) \land (\langle name_j, surname_j \rangle \lor \langle surname_j, name_j \rangle)$

and stores the results. Name and surname pairs are sent to the search engine enclosed in quotation marks: in this way the search engine is forced to retrieve only pages in which the search terms are adjacent. The software that implements SESORR is able to query both Google and Yahoo. After the execution of SESORR, the profile graph is enriched with Web edges between the owner and his/her contacts, and among contacts. An example is reported in Figure 7.4c. For each Web edge, SESORR merges the titles and the descriptions of each result set entry into a single string and computes the frequency of each non-stopword. These keywords and their frequencies are stored in the Web edge and are displayed to the operator when she clicks on the edge. Given a Web edge, the list of keywords and their frequencies provides a kind of “semantic vision” of the relationship, and the user is able to figure out the meaning of the relationship at a glance.

³ In a natural language, stopwords are function words or connectives, such as articles and prepositions, that do not provide useful information for our scope.


Figure 7.5: Frequency distribution of URLs (domains) providing relationships.

Moreover, besides title and description, SESORR stores each URL in the result set. By calculating the frequency with which each URL occurs over all relationships in a single profile, SESORR also provides a frequency distribution of the domains related to the profile and its contacts (see Figure 7.5).
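As a rough illustration of how such a domain distribution could be computed, the sketch below counts the domain of every stored URL over all Web edges of one profile. The data layout (a dictionary mapping each edge to its list of URLs) and the function name are assumptions, not the tool's actual structures.

```python
from collections import Counter
from urllib.parse import urlparse


def domain_distribution(urls_per_edge):
    """Relative frequency of each domain over all Web edges of a profile."""
    counts = Counter(
        urlparse(url).netloc.lower()
        for urls in urls_per_edge.values()
        for url in urls
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {domain: n / total for domain, n in counts.items()}
```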

Web-edge weight estimation

In order to measure a Web edge weight, i.e., how similar the two contacts connected by a Web edge are, we define a function σ(e) ∈ [0, 1] which measures the similarity between the individuals u and v joined by the edge e. In the Semantic Web area, the similarity between two classes is assessed by observing the number of instances that these classes share, their individual number of instances, and the total number of instances they contain. The most frequently used metrics are the following:


Jaccard index [93] between two sets X and Y is defined as the ratio between the size of the intersection and the size of the union of the two sets being compared:

\[ \sigma(X, Y) = \frac{|X \cap Y|}{|X \cup Y|} \]

Normalized Google Distance (NGD) [94] takes advantage of the number of hits returned by Google to compute the semantic distance between concepts. Given two search terms x and y, the normalised Google distance between x and y, NGD(x, y), can be obtained as follows:

\[ NGD(x, y) = \frac{\max\{\log f(x), \log f(y)\} - \log f(x, y)}{\log M - \min\{\log f(x), \log f(y)\}} \]

where f(x) is the number of Google hits for the search term x, f(y) is the number of Google hits for the search term y, f(x, y) is the number of Google hits for the tuple of search terms xy, and M is the number of Web pages indexed by Google (approximately ten billion pages).

In our preliminary experiments, we measured Pearson’s correlation between the Jaccard index and the NGD; the results were (approximately) in the range 0.3 to 0.4, thus exhibiting a small-to-medium correlation. The software allows the user either to choose between the metrics, or to combine them by providing relative weights.
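Both metrics can be computed directly from hit counts, as the following minimal sketch shows. Two points are assumptions rather than statements from the text: the Jaccard index is evaluated on hit counts by taking |X ∪ Y| = f(x) + f(y) − f(x, y), and the weighted combination maps the NGD (a distance) to a similarity as 1 − min(NGD, 1). The function names are hypothetical.

```python
import math


def jaccard(f_x, f_y, f_xy):
    """|X ∩ Y| / |X ∪ Y|, with |X ∪ Y| = f(x) + f(y) - f(x, y)."""
    union = f_x + f_y - f_xy
    return f_xy / union if union > 0 else 0.0


def ngd(f_x, f_y, f_xy, m=10_000_000_000):
    """Normalised Google Distance; m is the assumed number of indexed pages."""
    if f_xy == 0:
        return math.inf  # the two terms never co-occur
    log_x, log_y, log_xy = math.log(f_x), math.log(f_y), math.log(f_xy)
    return (max(log_x, log_y) - log_xy) / (math.log(m) - min(log_x, log_y))


def combined_weight(f_x, f_y, f_xy, w_jaccard=0.5, w_ngd=0.5):
    """Hypothetical combination of the two metrics with relative weights."""
    ngd_similarity = 1.0 - min(ngd(f_x, f_y, f_xy), 1.0)
    return w_jaccard * jaccard(f_x, f_y, f_xy) + w_ngd * ngd_similarity
```

With weights that sum to one, the combined value stays in [0, 1], which matches the range required of σ(e).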

7.2.5 Clustering Analysis (CA)

At the final analysis stage, we want to identify subgroups of contacts sharing similarities. Generally speaking, the goal of clustering is to group together similar elements and thereby to identify the skeleton structure of the input data.


Figure 7.6: Contact-to-cluster assignment.

In this section we have employed clustering techniques to split the phone owner’s social graph into small subgraphs (clusters). We chose spectral algorithms, i.e., algorithms based on spectral properties of the matrices associated to the input graph, because i) they are general and versatile, and ii) they proved to perform effectively in the identification of locally dense subgraphs that are sparsely inter-connected. In particular, we used the Spectral [95] and FullSVD [96] algorithms; both are based on the Singular Value Decomposition (SVD) performed on the adjacency matrix of the input graph.

Spectral. This algorithm, introduced in [95], where it was called Spectral Algorithm I, is essentially a projection onto the first k right singular vectors. The intuition behind this technique is that the matrix A describes the location of m points in an n-dimensional space. The projection onto the subspace defined by the top k right singular vectors gives the best rank-k approximation of A.

FullSVD. Drineas et al. studied in [96] k-means and its continuous version. While the discrete version is known to be NP-hard, the latter can be solved efficiently using a projection onto the top k left singular vectors. Similar to the Spectral algorithm, the cluster assignment is a discretization of the continuous solution. We refer to this method as FullSVD in order to avoid ambiguities with the SVD computation which is the core of all these algorithms.

Both algorithms output a matrix C whose rows are indexed by the nodes (contacts) and whose columns are indexed by the clusters. Intuitively, each cell of the matrix represents the weight of the “closeness” between a contact and a cluster. We assign a node to the cluster with the maximum absolute value. Figure 7.6 shows a screenshot with the details of the C matrix and, for each contact, the chosen cluster.

It is important to mention that, in both the clustering algorithms used, k, i.e., the number of clusters, is an input parameter; the quality of the results heavily depends on a good choice of its value. In the literature there are several measures to assess the quality of a clustering [92], and our tool can measure many of them, thus providing feedback to the operator. Figure 7.7 reports some snapshots depicting, for each algorithm and for each Web edge weight metric, the trend of the clustering quality indexes as k varies.

[Figure 7.7 panels: spectral (unweighted), spectral (Jaccard), spectral (Google Similarity), SVD (unweighted), SVD (Jaccard), SVD (Google Similarity). Each panel plots coverage, performance, inter-cluster conductance and intra-cluster conductance (vertical axis ticks at 0.0, 0.4 and 0.8) against k (horizontal axis from 0 to 70).]

Figure 7.7: Clustering metrics trends. The profile graph, used in the example, has 218 contacts and 1242 Web edges; the black vertical line is relative to k = 10, the chosen value for the input parameter k.

[Figure 7.8 cluster labels: University colleagues, Musician friends, Friends (from Facebook), Work colleagues, Family members.]

Figure 7.8: The final result of the whole process: the social network clusters.
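The contact-to-cluster assignment rule stated in this section (each contact goes to the cluster whose entry in its row of C has the largest absolute value) can be sketched in a few lines of Python. The sketch only covers the assignment step, not the SVD computation; C is assumed to be given as a list of rows, ties are resolved in favour of the first cluster, and the function names are hypothetical.

```python
def assign_clusters(C):
    """For each row of C, return the column index with the largest absolute value."""
    return [max(range(len(row)), key=lambda j: abs(row[j])) for row in C]


def group_by_cluster(contacts, C):
    """Map each cluster index to the list of contacts assigned to it."""
    clusters = {}
    for contact, cluster in zip(contacts, assign_clusters(C)):
        clusters.setdefault(cluster, []).append(contact)
    return clusters
```

For instance, a row such as `[0.2, -0.7]` is assigned to the second cluster, because the sign of the entry is ignored when comparing weights.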

7.2.6 The Final Result: The Social Network

After the three stages described in the previous sections, the tool is able to produce a graphical view of the clusters, shown in Figure 7.8. Here, for each cluster, the phone’s owner is represented by a black node.

It is interesting to notice that, by looking at each cluster’s Web edge keywords, we have been able to gather the area of interest shared by the individuals in each cluster. Furthermore, the graph structure in each cluster may provide an intuition about the mutual relationships of the people involved. For example, from the “work colleagues” cluster, which is far from being a complete graph, it is possible to see “who works with whom”; even more interesting, inside the “musician friends” cluster we see a complete subgraph of five nodes that corresponds to the members of a rock band, only one of whom actually plays with the phone’s owner (together with the other people/nodes shown in the cluster).

It is important to emphasize that all the above information, together with everything shown in Figure 7.8, has been obtained from a smartphone and our tool, which is able to mine Web data, with no additional information available.

7.3 Conclusions

Our profiling method relies on information stored in the smartphone, and its precision depends on the quantity and quality of such data.

Since the method we used to find a person on the Web, and her relationships with other phone contacts, relies on her first name and last name, precision is strongly dependent on the care used by the owner when she inserted each first name and last name. Sometimes only the name or the nickname of a contact is inserted (e.g., for the most intimate contacts); submitting such a weak identity to a search engine produces no results or useless ones. To deal with this aspect, the framework performs a pre-processing of all contact entries (e.g., it highlights entries which have the name or surname missing) and suggests that the operator identify the contacts (where possible) and enter their correct names and surnames.

An obvious limitation concerns the time frame that we can reconstruct. The event log stored in the device is limited to a fixed size, which restricts the view of user activity to the most recent events. The size limit on stored sent and received SMS/MMS, if set, will also limit precision.

We were not able to perform name-surname/number matching, and we can only access the log of the operations performed by the user in the most recent period, as we have no access to the mobile operator’s customer data. Even with this limited amount of information, the results look interesting, as we were able to generate clusters that map the real-life relationships between the user and her friends.

8 Conclusions and Future Work

Conclusions

In this thesis we proposed a fully integrated solution which aims to solve the backup and synchronization problem in mobile environments. We propose to focus the management of the information on how the information is logically structured (e.g., a contact will have name, surname, phone numbers. . . ). Our solution delegates to backup client applications installed on mobile devices the task of extracting data from the internal data stores of the device and sending these data, using a common format, to a remote server. The proposed approach is based on three main parts:

The first part of the system is a client application installed on the mobile device. The client extracts the information from the internal storage of the device and sends such information to the backup server using an extensible common format (i.e., XML). Moreover, the client is responsible for getting information from the server and restoring such information into the device.

Client applications have been implemented differently depending on the availability of APIs on the specific platform. On Android and newer Symbian devices we have been able to access the data in the datastores of the device being backed up via standard APIs. Moreover, on Android, under some conditions (rooted device), it has even been possible to access application settings. Unfortunately, accessing data directly in the datastores (in particular with write permissions) was not possible for the client applications implemented for older versions of Symbian OS and for Microsoft Windows Mobile 5 and 6. These applications back up the full system, and the backup is later analyzed on the backup server.

All the applications developed interact with the server using the proposed common format, to grant interoperability between vendors and operating systems.

The second part of the system is the backup server. The server implements the functionalities of receiving data from the client application, handling these data and storing the information in a common database. In case of restore, the backup server provides the clients with access to the internally stored information; access is given while granting privacy and security to users. We implemented the backup server to be as standard, scalable and extensible as possible. The basic idea is that our backup server should use a standard communication protocol that can be exploited by every class of device. Experimental results have shown that mobile clients running on different architectures/operating systems can interact with the proposed server via HTTP/HTTPS, accessing all the features provided. The backup server has also been enabled to extract personal data from raw backups of old Symbian devices. We proposed a methodology to reverse engineer the datastores where personal data are contained and to implement the parsers that extract these data.

The third component of the system is the set of services on backup data. These services can be provided by the same provider as the backup, or by other authorized service providers. In this thesis we have proposed two kinds of services: one focused on end users and another on businesses and administrators of the system. The first service gives users the capability of sharing part of the data in their personal backup with some selected contacts of their choice. The second service implements a social network extractor which, starting from backup data and data publicly available on the Web, generates a social network and the cliques of contacts in the backup; this is done by clustering the various groups of interconnected contacts.

The proposed approach has been considered by Telecom Italia for use in the cubovision project to implement the set-top box backup operations. Part of the information stored by the user in his/her set-top box device is saved on a remote backup server. The set-top box device mainly contains video/audio files, but there are some other contents, such as applications installed and configured by the user, that are backed up using some ideas presented in this thesis.

Future work

Currently we are implementing an Apple iPhone and a RIM BlackBerry client application able to interact with the system, implementing all the features of the Android application. Moreover, we are extending the Android application to improve its usability and performance.


We are also participating in the Ericsson Application Awards 2011¹ with the shared backup idea and with the improved Android application equipped with augmented reality.

The social network extraction tool described in Section 7.2 is being improved to generate the social network of the whole backup system; this, when used by a sufficient number of persons, will solve the name/surname problem. In fact, the mobile phone number can be considered a unique identifier, and allows the system to disambiguate homonyms or to merge contacts named differently in different backups. The merge could be done by considering sets of names and surnames (data sets containing common names and surnames can be found on the Web); matching the name and surname fields of a contact against commonly used ones will help to ignore nicknames and will make the system more precise.

New services on backup data can be provided to users and to administrators, such as integration with other systems like interconnected TVs, set-top boxes, laptops and tablet devices. We designed the server part of the system to be highly extensible; a large number of services using backup data can be provided. This opens the project to a plethora of novel ideas; in this thesis we described two use cases just to show how such an extensible backup system can be exploited by service providers.

¹ http://www.ericssonapplicationawards.com/
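The idea of using the phone number as a unique identifier across backups, mentioned in the future work above, can be illustrated with a small sketch. It is only an illustration under stated assumptions: numbers without an international prefix are assumed to belong to one default country (`+39` here), each backup is assumed to be a list of (name, number) pairs, and the function names are hypothetical.

```python
import re
from collections import defaultdict


def normalize_number(number, default_prefix="+39"):
    """Reduce a phone number to a canonical '+<digits>' form.

    default_prefix is an assumption: numbers without an international
    prefix are treated as belonging to a single default country.
    """
    digits = re.sub(r"[^\d+]", "", number)
    if digits.startswith("00"):
        digits = "+" + digits[2:]
    if not digits.startswith("+"):
        digits = default_prefix + digits
    return digits


def merge_by_number(backups):
    """Collect, for every normalized number, the names used in the backups."""
    merged = defaultdict(set)
    for contacts in backups:
        for name, number in contacts:
            merged[normalize_number(number)].add(name)
    return dict(merged)
```

A number that appears under a nickname in one backup and under a full name in another ends up with both names attached, which is the information needed to disambiguate or merge the two entries.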

A The Symbian S60 format

A.1 Address book

Type flag | Meaning
04024008 | fax
14020001 | H fax
2402C003 | job
04140280 | home
24028004 | mobile (work)
1C02C00C | nickname
0402000D | video
1402400D | video (home)
0C02C00D | wv user ID
0402C008 | url
04024009 | po.box
04028009 | extension
0402400A | city
0402800A | state
14024002 | extension (home)
14028002 | street (home)
14024003 | state (home)
14028003 | country (home)
24024006 | street (work)
24028006 | postal code (work)
24024007 | country (work)
3C02000B | DTMF
2402C004 | W fax
04028007 | general
0402C007 | mobile
1402C000 | mobile (home)
04028008 | ?
04020008 | pager
2402800D | video (work)
24020004 | work
14028001 | url (home)
24024005 | url (work)
0402C009 | street
0402000A | postal code
0402C00A | country
14020002 | po.box (home)
1402C002 | postal code (home)
14020003 | city (home)
2402C005 | po.box (work)
24020006 | extension (work)
2402C006 | city (work)
24020007 | state (work)
3402800B | note

Table A.1: Possible values for the rows of table “DATA TYPE TABLE”. They describe the type of attributes present in the “DATA BLOCK”. (Symbian S60 v2)


Contacts and their data are stored in the Contacts.cdb file (located under C:\System\Data). During the methodology iterations, we found that contact data were fragmented and spread across the entire file. In fact, after a contact update, Symbian preserves the old contact entry and appends a new one at the end of the file, with the same ID but fresh data. When the system performs a DB compression, obsolete entries are purged. After a first analysis, we found that the data could be grouped into three macro-areas (parts, see Table A.2). For each contact, the three parts are connected because each of them shares the same contact ID. The first part stores metadata about each contact and a block containing attributes like phone/fax/mobile numbers, snail mail address and notes:

D2 64 VD 10 00 00 00 ID FF 09 13 00 10 FF FF FF FF CXF1 20 30 30 65 31 32 20 30 30 65 31 32 30 30 66 66 61 39 30 30 66 66 61 39 64 31 32 37 36 64 31 32 37 36 76 12 9DFA 0F 20 E1 00 EDIT DATE 76 12 9DFA 0F 20 E1 00 CREATION DATE 04 00 00 00 00 00 04 00 00 00 00 00 00 00 1F 00 00 1F 1D TYPE TABLE LEN 04 00 00 00 04 | 14 02 00 00 00 00 00 00 00 00 00 00 TYPE TABLE 04 02 C0 07 00 00 00 00 00 00 00 00 | 04 00 00 00 00 04 00 00 00 00 1A DATA BLOCK LEN 20 00 FIELD FLAG 33 33 38 38 37 36 35 34 32 33 DATA BLOCK

The second part stores the contact’s name, surname and company:


    10 00 00 00                   ID
    12                            NAME LENGTH
    50 61 70 E0 20 43 65 6C 6C    NAME
    09 13 00 10                   CXF1

The third part stores email addresses:

    1C                            EMAIL LENGTH
    32                            EMAIL ID
    00 00 00 03                   EMAIL FLAG
    10 00 00 00                   ID
    24                            EMAIL ADDRESS LEN
    6F 64 6F 6D 65 6E 69 63 40 6C 69 62 65 72 6F 2E 69 74    EMAIL ADDRESS
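The layout of the second part (a little-endian ID, a one-byte length expressed in nibbles, then the text) and the append-on-update behaviour described at the beginning of this section can be sketched as follows. This is a simplified illustration, not the parser developed in the thesis: it reads only the ID and the name, it assumes that text is encoded as Latin-1, and the function names are hypothetical.

```python
def read_nibble_string(buf, offset, encoding="latin-1"):
    """Read a 1-byte length (in nibbles) followed by that many nibbles of text.

    Returns the decoded text and the offset of the first byte after it.
    The Latin-1 encoding is an assumption.
    """
    n_bytes = buf[offset] // 2
    start = offset + 1
    return buf[start:start + n_bytes].decode(encoding), start + n_bytes


def parse_name_part(buf):
    """Parse 'ID, NAME LENGTH, NAME' from the second part of a contact."""
    contact_id = int.from_bytes(buf[0:4], "little")
    name, _ = read_nibble_string(buf, 4)
    return contact_id, name


def latest_entries(entries):
    """Keep, for each contact ID, the entry that appears last in the file."""
    latest = {}
    for contact_id, name in entries:  # entries must be in file order
        latest[contact_id] = name
    return latest
```

Because updated contacts are appended with the same ID, keeping the last occurrence of each ID reproduces the effect of discarding obsolete entries without waiting for a DB compression.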

A.2 Calendar

Calendar entries are stored in the Calendar file (located under C:\System\Data). A calendar entry belongs to one of the following categories: anniversary, meeting or note. A sample calendar entry, an anniversary without alarm, is shown below:


03 FT 0F 00 00 00 VS 0A AF 52 BL 02 00 00 00 ID 01 00 00 Flag1 A4 28 52 03 CD (GG MM) 05 05 AA 28 A2 AC AD A2 AC 01 00 00 01 00 Flag3 08 00 00 Flag2 20 TL 41 6E 6E 69 76 65 72 73 Text 61 72 79 41 6F 66 66 | 0E 20 29 ET AA 28 SD AA 28 ED

An example of an anniversary with the alarm set to on is shown below:


03 FT 0F 00 00 00 VS 1A AF 52 BL 02 00 00 00 ID 01 00 00 Flag1 A4 28 5B 03 CD (GG MM) 05 05 AA 28 A2 AC AD AT 01 00 00 01 00 VS2 3C ANL 43 61 6C 65 6E 41 6C 61 ATXT 72 6D 53 6F 75 6E 64 32 | 09 01 00 09 01 00 08 00 00 Flag2 1E TL 41 6E 6E 69 76 65 72 73 Text 61 72 79 41 6F 6E | 0E 20 29 ET AA 28 SD AA 28 ED

An example of meeting with the alarm set to off is shown below:


00 FT 0F 00 00 00 SF 00 00 00 0A AF 50 50 02 00 00 00 ID 01 00 00 Flag1 A4 28 C6 02 CD (GGMM) 01 RF A6 28 ERD A8 28 A8 28 01 00 00 00 00 01 00 00 00 00 08 00 00 Flag2 1A TL 4D 65 65 74 41 6F 66 66 52 64 61 79 Text 0E 20 29 ET A6 28 64 05 SD A6 28 91 05 ED

An example of meeting with the alarm set to on is shown below:


00 FT 0F 00 00 00 SF 00 00 00 1A AF 50 50 02 00 00 00 ID 01 00 00 Flag1 A4 28 34 03 CD (GGMM) 01 RF A5 28 ERD A6 28 A6 28 01 00 00 00 00 01 00 00 00 00 3C RTN LEN 43 61 6C 65 6E 41 6C 61 RTN 72 6D 53 6F 75 6E 64 AF | 05 00 00 05 00 00 08 00 00 Flag2 18 TL 4D 65 65 74 41 6F 6E 52 64 61 79 Text 0E 20 29 ET A5 28 EC 01 SD A5 28 90 03 ED

Some meetings are saved in a different way; we call entries of this kind special meetings. A special meeting with the alarm set to off and without repetition is shown below:


00 FT 0F 00 00 00 SF 00 00 00 08 AF 50 50 02 00 00 00 ID 01 00 00 Flag1 A4 28 7A 03 CD (GGMM) 08 00 00 Flag2 1C TL 53 6D 65 65 74 41 6F Text 66 66 52 6F 66 66 | 0E 20 29 ET A6 28 64 05 SD A6 28 91 05 ED

An example of special meeting with the alarm set to on and repetition set to off is shown below:

00 FT 0F 00 00 00 SF 00 00 00 18 AF 50 50 02 00 00 00 ID 01 00 00 Flag1 A4 28 9C 03 CD (GGMM) 3C RTN LEN 43 61 6C 65 6E 41 6C 61 RTN 72 6D 53 6F 75 6E 64 AF | 05 00 00 05 00 00 08 00 00 Flag2 1A TL 53 4D 65 65 74 41 6F Text 6E 52 6F 66 66 | 0E 20 29 ET A6 28 94 02 SD A6 28 C1 02 ED


Notes are stored in a format similar to that of special meetings. An example of a note is shown below:

02 FT 0F 00 00 00 VS 08 AF 52 BL 02 00 00 00 ID 01 00 00 Flag1 A4 28 6F 03 CD {ID2} 08 00 00 Flag2 10 NL 44 61 79 4E 6F 74 65 Text 0E 20 29 ET A5 28 SD A5 28 ED

A.3 Events log

Events and status changes are stored in the Logdbu.dat file (located under C:\System\Data), and can belong to the following categories: sms, mms, voice and data calls, SIM changes. In the last case, the event is stored as an sms, so we will not examine it. Details about the fields are reported in Table A.4.
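According to Table A.4, the kind of a log entry is determined by the pair of flags that surround its 8-byte date. A minimal sketch of this classification is given below; the record layout (start flag at byte 0, date in bytes 1 to 8, end flag at byte 9) is an assumption derived from the field sizes in the table, and the names are hypothetical.

```python
# (start flag, end flag) pairs and their meaning, following Table A.4.
ENTRY_KINDS = {
    (0x03, 0xFF): "SMS, GPRS traffic or 'DATAMESSAGE' MMS",
    (0x05, 0x01): "MMS received from the operator or GPRS traffic to the operator",
    (0x00, 0x01): "incoming or outgoing call",
}


def classify_entry(record):
    """Classify a log record from the flags surrounding its 8-byte date."""
    if len(record) < 10:
        return "unknown"
    start_flag, end_flag = record[0], record[9]
    return ENTRY_KINDS.get((start_flag, end_flag), "unknown")
```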

An example of SMS is shown in the following:


03 03 BA A3 EB EB B6 2C E1 00 DATE FF FF B2 NAME FLAG 60 16 00 00 | 1C 60 16 00 00 52 61 6D 6F 6E 61 20 4D NAME LENGTH 6F 72 65 74 74 69 NAME 04 00 05 00 33 04 00 05 00 33 80 MESS LENGTH 44 4F 4D 41 4E 49 20 53 45 52 41 20 | 54 49 20 49 4E 56 49 54 4F 20 41 44 | 20 55 53 43 49 52 45 20 49 4E 53 49 MESS 45 4D 45 20 58 20 55 4E 41 20 43 45 | 4E 41 2C 6F 76 76 69 61 6D 65 6E 74 | 65 20 6F 67 | 1A NUMBER LENGHT 2B 33 39 XX XX XX XX NUMBER XX XX XX XX XX XX | 18 18 00 DIRECTION 00 00 00 01 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00 A4 00 01 00 00 00 A4

A voice call example is shown below:


00 00 1F 66 4A EF C8 2C E1 00 DATE 01 01 70 NAME FLAG 63 16 00 00 63 16 00 00 1C NAME LENGHT 52 61 6D 6F 6E 61 20 4D NAME 6F 72 65 74 74 69 | 01 01 00 DIRECTION 00 00 00 00 CALL TIME 67 02 36 67 02 36 1A NUMBER LENGHT 2B 33 39 XX XX XX XX NUMBER XX XX XX XX XX XX | 20 03 00 00 02 A6 20 03 00 00 02 A6 17 00 00 A4 17 00 00 A4

An MMS example is reported below:

05 05 41 47 86 12 05 2A E1 00 DATA 01 01 F0 F0 00 00 00 00 00 00 00 00 0E 0E 54 69 6D 20 6D 6D 73 PROVIDER 00 00 07 00 00 00 00 00 07 00 00 00 02 00 02 00 30 08 NS 36 34 31 38 2C 37 32 37 NUMBER 5A 5A

A data call, or data traffic log entry, may belong to two different categories which are related to the type of storage format used: mms-type and sms-type.


The text body of the sms or mms is used to store the content of a single packet. An example of an sms-type data call is reported below:

03 03 C1 36 5F 21 08 2A E1 00 DATA FF FF A2 03 00 00 00 03 A2 03 00 00 00 03 00 04 00 33 00 04 00 33 7E 50 65 72 20 75 74 69 6C 69 7A 7A | 61 72 65 20 69 6C 20 73 65 72 76 69 | 7A 69 6F 20 64 65 76 69 20 61 74 74 PACKET CONTENT 69 76 61 72 65 20 41 6C 69 63 65 20 | 4D 41 49 4C 20 65 20 61 73 73 6F 63 | 69 61 72 65 | 0A SEP MES NUM 34 39 30 30 31 34 39 30 30 31 18 00 00 FLAG END

A.4 SMS

SMS are stored in the first folder (assuming that the folders are ordered alphabetically) in the /System/Mail folder. An example of a received message is reported in the following table:


68 3C 00 10 68 3C 68 3C 00 10 68 3C ...... 00 10 00 00 00 00 00 10 00 00 00 00 25 3A 00 10 f l a g 1 0C 52 69 63 65 76 Text 0E ET 20 29 34 18 Received Flag Mar 00 10 45 04 01 00 00 10 45 04 01 00 01 00 00 00 02 00 01 00 00 00 02 00 01 Received Flag 00 00 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00 00 01 00 00 00 FA 54 17 46 F2 29 E1 00 Date 28 SNL 33 34 39 34 36 37 37 31 34 36 34 Number 44 NL 69 73 74 65 66 61 6E 6F 20 41 6C 65 Name 00 00 00 00 00 02 00 00 00 00 00 02 00 00 00 02 03 00 00 00 00 02 03 00 00 00 01 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 F1 5A 15 41 F2 29 E1 00 SCRD 02 91 SCF 34 SCNL 2B 33 39 33 32 30 35 SCN 38 35 38 35 30 30 | 15 00 81 ESNF 28 ESNL 33 34 39 34 36 37 37 31 34 36 ESN

An example of sent message is reported in the following table:


68 3C 00 10 68 3C 68 3C 00 10 68 3C ...... 00 10 00 00 00 00 00 10 00 00 00 00 25 3A 00 10 f l a g 1 10 TL 49 6E 76 69 61 74 6F Text 0E ET 20 29 34 18 Received Flag Mar 00 10 F5 02 01 00 00 10 F5 02 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 Received Flag 00 02 03 00 00 00 00 02 03 00 00 00 01 00 00 00 01 00 01 00 00 00 01 00 00 00 00 00 00 0C 09 22 F2 1C 32 E1 00 Date 00 91 f l a g 2 34 UNL 2B 33 39 33 34 39 32 UN 30 30 30 38 39 38 | 04 91 RNF 34 RNL 2B 33 39 33 34 39 34 RN 36 37 37 31 34 36 | 00 00 00 00 00 93 2D 4B F2 29 E1 00 SCRD


Field Name | Size (Bytes) | Description | Example

First part
VD | 2 | (Uncertain) Used by DB indexing. | D2 64
ID | 4 | Contact identifier (stored as little-endian). | 10 00 00 00
CXF1 | 9 | Flag. The first byte has to be equal to the last four. The second byte depends on the Symbian version; values can be 09, 0B, 10. "13 00 10" is constant. | FF 09 13 00 10 FF FF FF FF
EDIT DATE (ED) | 8 | 17-byte offset from CXF1. Represents the number of microseconds from year zero. | 76 12 9D FA 0F 20 E1 00
CREATION DATE (CD) | 8 | Represents the number of microseconds from year zero. | 76 12 9D FA 0F 20 E1 00
TYPE TABLE LEN (TTL) | 1 | 41-byte offset from CXF1. Stores the length (in bytes) of the TYPE TABLE. | 1D
TYPE TABLE (TT) | TTL | Table of 12-byte rows, describing the types of the corresponding data in the DATA BLOCK. The first 5 bytes are the table start flag. The last 5 bytes indicate the end table flag. The first 12-byte row does not contain useful information. For further information about data types, see Table A.1. | 04 00 00 00 04 14 02 00 00 00 00 00 00 00 00 00 00 04 02 C0 07 00 00 00 00 00 00 00 00 04 00 00 00 00
DATA BLOCK LEN (DBL) | 1 | Stores the size in bytes of DATA BLOCK. | 1A
FIELD FLAG (FF) | 2 | Flag which is repeated as many times as the number of fields in DATA BLOCK. | 20 00
DATA BLOCK (DB) | DBL/2 − 2FF − 1 | Stores the contact’s information, according to the field types described in TYPE TABLE. Each field is separated by 00. | 33 33 38 38 37 36 35 34 32 33

Second part
ID | 4 | Contact identifier (stored as little-endian); the same as above. | 10 00 00 00
NAME LENGTH (NL) | 1 | The size in nibbles of the name field. | 0E
NAME | NL/2 | The contact’s name. | 43 6C 61 75 64 69 6F
SURNAME LENGTH (SL) | 1 | The size in nibbles of the surname field. | 08
SURNAME | SL/2 | The contact’s surname. | 43 65 71 61
COMPANY NAME LENGTH (CNL) | 1 | The size in nibbles of the company field. | 0C
COMPANY NAME (CN) | 1 | The contact’s company. | 44 72 2E 77 68 79
CONTACT END | 4 | Flag. Denotes the end of a contact’s details. The first byte depends on the Symbian version (09, 0B, 10, as in the CXF1 field). The other bytes are constant. | 09 13 00 10

Third part
EMAIL LENGTH (EL) | 1 | The size in nibbles of the email address block. | 1C
EMAIL ID | EL/2 | The ID of the email address. | 32
EMAIL FLAG | 4 | Flag. | 00 00 00 03
ID | 4 | Contact identifier (stored as little-endian); the same as above. | 10 00 00 00
EMAIL ADDRESS LENGTH (EFL) | 1 | The size in nibbles of the email address string. | 24
EMAIL ADDRESS | EFL/2 | The email address string. | 6F 64 6F 6D 65 6E 69 63 40 6C 69 62 65 72 6F 2E 69 74

Table A.2: This table lists all contact’s data which can be found in the Contacts.cdb. Since data are located in three logical file areas, the table is split in three parts.
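The EDIT DATE and CREATION DATE fields of Table A.2 hold a count of microseconds from year zero. A hedged sketch of how such a value could be turned into a calendar date is shown below. Two assumptions are made and are not stated in the text: the eight bytes are read in little-endian order (as the ID field is), and the offset between the Symbian epoch and 1970-01-01 is the constant 0x00DCDDB30F2F8000 microseconds commonly used for Symbian time values. The function name is hypothetical.

```python
from datetime import datetime, timedelta

# Assumption: microseconds between the Symbian "year zero" epoch and
# 1970-01-01, as commonly used when converting Symbian time values.
SYMBIAN_TO_UNIX_US = 0x00DCDDB30F2F8000


def decode_symbian_time(raw, byteorder="little"):
    """Convert an 8-byte microsecond counter into a naive datetime."""
    microseconds = int.from_bytes(raw, byteorder)
    return datetime(1970, 1, 1) + timedelta(
        microseconds=microseconds - SYMBIAN_TO_UNIX_US
    )
```

Whether the stored value refers to local time or to UTC is not stated in the table, so the result is deliberately left as a naive datetime.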


Field Name | Size (Bytes) | Description | Example

FT | 1 | Indicates the event type: 00 is a meeting, 02 is a day note, 03 is an anniversary. | 00
VARIABLE SEQUENCE (VS) | 4 | A four-byte variable sequence. If the entry represents a meeting and the first byte is 10, the meeting needs to be processed in a different way. | 0F 00 00 00
ALARM FLAG (AF) | 1 | Indicates whether the alarm is set to ON (1A for normal events, 18 for special meetings) or OFF (0A for normal events, 08 for special meetings). | 0A
BODY LENGTH (BL) | 1 | Represents the length in nibbles of the following part of the entry. | 52
ID | 4 | Calendar entry identifier (stored as little-endian). | 02 00 00 00
Flag1 | 3 | Indicates a calendar entry in this area. | 10 00 00
CREATION DATE (CD) | 4 | Stores the creation date of the calendar entry; it is composed of GG and MM. | A4 28 52 03
DAY (GG) | 2 | Represents the day part of the CD field. | A4 28
MONTH (MM) | 2 | Represents the month part of the CD field. | 52 03
REP FLAG (RF) | 1 | Appears only for meeting-type entries; indicates whether the repetition of the meeting is daily (value 01), weekly (value 02) or monthly (value 03). | 01
REPEAT UNTIL | 2 | Appears only for meeting-type entries; indicates the date until which the event has to be repeated. | A5 28
ANNIVER DATE (AD) | 2 | Stores the date of the event as an integer counting the number of days since 1-1-1980. This field appears only if the entry type is Anniversary. | AA 28
ALARM TIME (AT) | 2 | If the alarm is set to on, stores the alarm time; otherwise it is unused. For the day note this field does not appear. | A2 AC
VAR SEQ | 5 | A variable sequence. In the case of an anniversary, the first 3 bytes are 01 00 00; if the anniversary’s alarm is set to off, the last two are 01 00; they may vary, but their value is always less than 32. | 01 00 00 01 00
AL NAME LEN (ANL) | 1 | Indicates the size in nibbles of the ALARM NAME field. | 3C
AL NAME (AN) | ANL/4 | Stores a text field indicating the ringtone name for the alarm. | 43 61 6C 65 6E 41 6C 61 72 6D 53 6F 75 6E 64 32
Flag2 | 3 | A flag characterizing a calendar event. | 08 00 00
TEXT LENGTH (TL) | 1 | Indicates the size in nibbles of the TEXT field. | 20
TEXT | TL/2 − 1 | Stores the text field of the calendar entry. | 41 6E 6E 69 76 65 72 73 61 72 79 41 6F 66 66
END TEXT (ET) | 3 | The end flag of the TEXT field; the value is always 0E 20 29. | 0E 20 29
START DATE (SD) | 2 | Stores the starting date of the entry if it is a note or an anniversary; otherwise it does not appear. | A5 28
START DATE M (SDM) | 4 | Stores the starting date of the entry if it is a meeting; otherwise it does not appear. | A5 28 EC 01
END DATE (ED) | 2 | Stores the ending date of the entry if it is a note or an anniversary; otherwise it does not appear. | A5 28
END DATE M (EDM) | 4 | Stores the ending date of the entry if it is a meeting; otherwise it does not appear. | A5 28 90 03

Table A.3: This table lists the fields of the calendar entries (notes, meetings, anniversaries) stored in the Calendar file.
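Two of the fields in Table A.3 lend themselves to a short worked example: the FT byte, which selects the entry type, and the two-byte date fields, which count days since 1-1-1980. The sketch below decodes both; the little-endian byte order of the day counter is an assumption (it is consistent with the way the ID field is stored), and the names are hypothetical.

```python
from datetime import date, timedelta

ENTRY_TYPES = {0x00: "meeting", 0x02: "day note", 0x03: "anniversary"}
CALENDAR_EPOCH = date(1980, 1, 1)


def entry_type(ft):
    """Map the FT byte to the entry type listed in Table A.3."""
    return ENTRY_TYPES.get(ft, "unknown")


def decode_day_count(raw):
    """Interpret a 2-byte field as the number of days elapsed since 1-1-1980."""
    days = int.from_bytes(raw, "little")  # assumed byte order
    return CALENDAR_EPOCH + timedelta(days=days)
```

Under this assumption, the example value AA 28 corresponds to 0x28AA = 10410 days after the epoch, i.e., 2 July 2008.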


| Field Name | Size (Bytes) | Description | Example |
|---|---|---|---|
| COMMON PART | | | |
| START DATA FLAG (SDF) | 1 | Indicates a date starting at the next byte. This flag, combined with the EDF, indicates what kind of data is stored in the session: 03 DATE FF indicates an SMS, GPRS traffic or a 'DATAMESSAGE' MMS; 05 DATE 01 indicates an MMS received from the operator or GPRS traffic to the operator; 00 DATE 01 indicates incoming and outgoing calls. | 03 |
| DATE | 8 | Stores the date on which the operation was performed. The date is stored in big endian format. | BA A3 EB EB B6 2C E1 00 |
| END DATA FLAG (EDF) | 1 | Is located with an offset of 8 after the SDF and indicates that a date finishes here. | FF |
| NAME FLAG (NF) | 1 | Is located with an offset of 1 after the EDF and indicates whether the entry refers to a contact present in the address book (for SMS the value is B2 if the message is to/from a contact in the address book; for calls the value is 70 if the contact is present, else 60). | B2 |
| NAME LENGTH (NL) | 1 | If the contact is present in the address book, is located with an offset of 4 after the NF and indicates the length in nibbles of the subsequent field NAME. | 1C |
| NAME | NL/2 | If the contact is present in the address book, is located with an offset of 1 after the NL and stores the name of the contact stored in the address book. | 52 61 6D 6F 6E 61 20 4D 6F 72 65 74 74 69 |
| SMS PART | | | |
| MESS LENGTH (ML) | 1 | If the contact is present in the address book, is located with an offset of 5 after the NAME field and indicates the length in nibbles of the subsequent field MESS; otherwise it is stored with an offset of 5 after NF. | 80 |
| MESS | ML/2 | Is located with an offset of 1 after the ML and stores the message sent/received. | 44 4F 4D 41 4E 49 20 53 45 52 41 20 54 49 20 49 4E 56 49 54 4F 20 41 44 20 55 53 43 49 52 45 20 49 4E 53 49 45 4D 45 20 58 20 55 4E 41 20 43 45 4E 41 2C 6F 76 76 69 61 6D 65 6E 74 65 20 6F 67 1A |
| NUMBER LENGTH (NUL) | 1 | Is located with an offset of 1 after the MESS and indicates the length in nibbles of the subsequent field NUMBER. | 1A |
| NUMBER | NUL/2 | Is located with an offset of 1 after the NUL and stores the number of the sender/recipient of the message. | 2B 33 39 XX XX XX XX XX XX XX XX XX XX |
| DIR | 1 | Is located with an offset of 2 after the NUMBER and stores the direction of the data stored in the section (value 00 indicates a sent message, else the value will be 02). | 02 |
| CALL PART | | | |
| DIRECTION | 1 | If the contact is present in the address book, is located with an offset of 2 after the NAME and stores the direction of the data stored in the section (value 00 indicates an outgoing call, else the value will be 02). | 02 |
| CALL TIME (CT) | 4 | Is located with an offset of 2 after the DIR field and stores the duration of the call; data is stored in big endian format. | 00 00 00 00 |
| NUMBER LENGTH (NUL) | 1 | Is located with an offset of 4 after the CT and indicates the length in nibbles of the subsequent field NUMBER. | 1A |
| NUMBER | NUL/2 | Is located with an offset of 1 after the NUL and stores the number of the sender/recipient of the message. | 2B 33 39 XX XX XX XX XX XX XX XX XX XX |
| MMS PART | | | |
| PROV LEN (PL) | 1 | Is located with an offset of 5 after the EDF and indicates the length in nibbles of the subsequent field PROVIDER. | 0E |
| PROVIDER | PL/2 | Is located with an offset of 1 after the PL field and stores the name of the MMS service provider. | 54 69 6D 20 6D 6D 73 |
| NUM START (NS) | 2 | Is located with an offset of 4 after the CT and indicates the length in nibbles of the subsequent field NUMBER. | 30 08 |
| NUMBER | NUL/2 | Is located with an offset of 1 after the NUL and stores the number of the sender/recipient of the message. | 36 34 31 38 2C 37 32 37 |

Table A.4: This table lists all event entries such as SMS, MMS, voice and data calls, SIM change.
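Several fields in Table A.4 (NAME, MESS, NUMBER, PROVIDER) are preceded by a one-byte length expressed in nibbles. A minimal Python sketch of how such a field could be read is given below; the function name `read_nibble_prefixed` is hypothetical, and the sketch assumes that the field starts immediately after its length byte, as the table states for NAME.

```python
from typing import Tuple


def read_nibble_prefixed(buf: bytes, offset: int) -> Tuple[bytes, int]:
    """Read a field whose one-byte length prefix counts nibbles (half-bytes).

    Returns the field bytes and the offset of the first byte after the field.
    """
    length_in_nibbles = buf[offset]
    size = length_in_nibbles // 2      # two nibbles per byte
    start = offset + 1                 # assumed: the field follows its length byte
    return buf[start:start + size], start + size


# NL = 1C (28 nibbles, i.e. 14 bytes) followed by the NAME example of Table A.4.
record = bytes.fromhex("1C 52 61 6D 6F 6E 61 20 4D 6F 72 65 74 74 69")
name, next_offset = read_nibble_prefixed(record, 0)
print(name.decode("ascii"))
```

The returned offset lets a caller continue with the next field; the additional offsets listed in the table (for instance the 5 bytes between NAME and ML) would have to be added by the caller.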


| Field Name | Size (Bytes) | Description | Example |
|---|---|---|---|
| COMMON PART | | | |
| flag 1 | 4 | If this flag is in the starting part of the file or at offset 5, the file does not contain SMS, so there is no need to parse it. | 25 3A 00 10 |
| REC FLAG MARK (RFM) | 4 | Received Flag Marker; indicates a received message. | 20 29 34 18 |
| REC FLAG (RFM) | 1 | Starts at byte 13 after the RFM. If its value is 01 the message is a received message; if the value is 00 the message is a sent message. | 10 |
| TEXT LEN (TL) | 1 | Generally located just after flag 1; indicates the length of the message text. | 10 |
| SPEC MES (SM) | 1 | If it appears after TEXT LEN, indicates a message from a special number. | 02 |
| TEXT | TL/2 − 1 | Stores the text of the SMS message. | 49 6E 76 69 61 74 6F |
| END TEXT (ET) | 1 | Indicates the end of the message text. | 0E |
| RECEIVED MESSAGE | | | |
| DATE | 8 | This field starts 12 bytes after the received flag (REC FLAG). | FA 54 17 46 F2 29 E1 00 |
| SEND NUM LEN (SNL) | 1 | This is an optional field: it appears only if the sender's number is stored in the address book. Indicates the length of the sender NUMBER field. | 28 |
| NUMBER | SNL/4 | Stores the number of the sender if the sender appears in the address book. | 33 34 39 34 36 37 37 31 34 36 34 |
| NAME LENGTH (NL) | 1 | This is an optional field: it appears only if the sender's number is stored in the address book. Indicates the length of the following NAME field. | 44 |
| NAME | NL/4 | Stores the name of the sender if the sender appears in the address book. | 69 73 74 65 66 61 6E 6F 20 41 6C 65 |
| SERV CENT REC DATE (SCRD) | 8 | Is stored with an offset of 23 bytes after the name field. | F1 5A 15 41 F2 29 E1 00 |
| SERV CENT FLAG (SCF) | 2 | Indicates that the message service center's number starts here. | 02 91 |
| SERV CENT NUM LEN (SCNL) | 1 | Indicates the length of the following SERV CENT NUM field. | 34 |
| SERV CENT NUM (SCN) | SCNL/4 | Stores the number of the message service provider. | 2B 33 39 33 32 30 35 38 35 38 35 30 30 |
| EFF SERV CENT FLAG (ESCF) | 3 | Indicates that the effective message service center's number starts here. | 15 00 81 |
| EFF SERV NUM LEN (ESNL) | 1 | Indicates the length of the following EFF SERV NUM field. | 28 |
| EFF SERV NUM (ESN) | ESNL/4 | Stores the effective number of the SMS message service provider. | 33 34 39 34 36 37 37 31 34 36 |
| SENT MESSAGE | | | |
| DATE | 8 | Is stored with a 14-byte offset from the end of REC FLAG. | 00 0C 09 22 F2 1C 32 E1 00 |
| Flag 2 | 2 | Is a flag indicating the presence of a received message. | 00 91 |
| UNDEF NUMB LEN (UNL) | 1 | Indicates the length of the following UNDEFINED NUMBER field. | 34 |
| UNDEFINED NUMBER (UN) | UNL/4 | It is not clear which number this field stores; possibly the number of the sender's message service provider. | 2B 33 39 33 34 39 32 30 30 30 38 39 38 |
| RECEIVER NUMB FLAG (RNF) | 2 | This flag indicates the presence of the sender's number in the next bytes. | 04 91 |
| RECEIVER NUMBER LEN (RNL) | 1 | Indicates the length of the following RECEIVER NUMBER field. | 34 |
| RECEIVER NUMBER (RN) | RNL/4 | Stores the receiver's number. | 2B 33 39 33 34 39 34 36 37 37 31 34 36 |
| SERV CEN REC DATE (SCRD) | 8 | Stores the receiving date for the message service provider; it is stored with an offset of 2 bytes from the end of RECEIVER NUMBER. | 00 93 2D 4B F2 29 E1 00 |

Table A.5: This table lists all fields characterizing an SMS.
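As an illustration of the TEXT field of Table A.5, the sketch below extracts the message text from a buffer positioned at TEXT LEN. It assumes that TEXT occupies TL/2 − 1 bytes, which is what the example row suggests (TL = 10 hex, seven text bytes), that no SPEC MES byte is present, and that the text is plain ASCII; the function name `read_sms_text` is hypothetical.

```python
END_TEXT = 0x0E  # value of the END TEXT (ET) flag in Table A.5


def read_sms_text(buf: bytes, tl_offset: int) -> str:
    """Decode the TEXT field that follows the TEXT LEN byte at tl_offset."""
    tl = buf[tl_offset]
    size = tl // 2 - 1                 # assumed size rule: TL/2 - 1 bytes
    start = tl_offset + 1              # assumed: no SPEC MES byte in between
    end = start + size
    if end >= len(buf) or buf[end] != END_TEXT:
        raise ValueError("END TEXT flag not found where expected")
    return buf[start:end].decode("ascii")


# TL = 10, the seven TEXT bytes of the example row, then ET = 0E.
sample = bytes.fromhex("10 49 6E 76 69 61 74 6F 0E")
print(read_sms_text(sample, 0))
```

Checking the END TEXT flag is a cheap way to detect that the size rule or the starting offset does not hold for a given record.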

B The Backup communication protocol

B.1 Backup item

/backup/{backupType}/device/{imei}/

HTTP method: PUT
Attributes: backupType indicates the type of backup performed; possible values are full or diff. imei identifies the backed-up device via its IMEI number.

2010-07-04 20:01:21.902 full false false false true false

Figure B.1: Example of XML payload for a backup item.


B.2 Contact item

/backup/{backupType}/device/{imei}/contacts/{contactItemName}

HTTP method: PUT and GET
Attributes: contactItemName is the unique identifier used by the client for a contact resource; backupType indicates the type of backup performed; possible values are full or diff in the case of a PUT request and restore in the case of a GET request.

value0 [email protected] [email protected] name +123456789054 2 +12309876543 1 2010-07-07 12:20:12.997 new

Figure B.2: Example of XML payload for a contact item.


B.3 Calendar item

/backup/{backupType}/device/{imei}/calendar/{calendarItemName}

HTTP method: PUT and GET
Attributes: calendarItemName is the unique identifier used by the client for the calendar items.

0 1 6 meeting name meeting description 1277769600000 somelocation 1277683200000

Meeting event 2010-07-04 20:50:47.119 new

Figure B.3: Example of XML payload for a calendar item.


B.4 Message item

/backup/{backupType}/device/{imei}/sms/{smsItemName}

HTTP method: PUT and GET
Attributes: smsItemName is the unique identifier used by the client for the SMS resources.

text +123456789000 2 2010-07-04 20:57:33.669

Figure B.4: Example of XML payload for a message item.
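The contact, calendar and message resources of Sections B.2 to B.4 share the same path shape. The sketch below shows one way a client could build these paths; the helper `item_path` is hypothetical, percent-encoding of the item name is an assumption, and so is the rule that GET requests use the restore backup type for all three collections (the text states it only for contacts).

```python
from urllib.parse import quote

COLLECTIONS = {"contacts", "calendar", "sms"}
# Assumed rule: PUT uses full or diff, GET uses restore.
BACKUP_TYPES_BY_METHOD = {"PUT": {"full", "diff"}, "GET": {"restore"}}


def item_path(method: str, backup_type: str, imei: str,
              collection: str, item_name: str) -> str:
    """Build the resource path for one contact, calendar or SMS item."""
    allowed = BACKUP_TYPES_BY_METHOD.get(method.upper())
    if allowed is None:
        raise ValueError(f"unsupported HTTP method: {method}")
    if backup_type not in allowed:
        raise ValueError(f"{backup_type!r} is not valid for {method}")
    if collection not in COLLECTIONS:
        raise ValueError(f"unknown collection: {collection}")
    # Percent-encode the client-chosen name so it stays a single path segment.
    return (f"/backup/{backup_type}/device/{quote(imei, safe='')}"
            f"/{collection}/{quote(item_name, safe='')}")


print(item_path("PUT", "full", "123456789012345", "contacts", "480"))
```

Validating the backup type on the client keeps malformed requests from reaching the server; the server would still need to perform its own checks.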


B.5 Generic file item

Files are sent in Base-64 encoding. If a file is too big for a single package, it is split into several chunks and sent chunk by chunk to the server, which keeps track of the chunks received and assembles the file once all chunks have arrived.

/backup/{backupType}/device/{imei}/files/{fileItemName}/init_byte/{init_byte}/final_byte/{final_byte}

HTTP method: PUT and GET
Attributes: fileItemName is the unique identifier used by the client for files (e.g., the path); init_byte is the first byte of the file chunk sent; final_byte is the last byte of the file chunk sent.

UBy/eQhuUlasfiUe/bocsDM3TbRsHPAfASGQj4fc1 +eRu2vnsuab0z6kYYlmo1BWtKbU/wBrGmkxtMLctJLwHjTiRSn h06ZAhwskO9kcVyaUFDUUFelcgQ4U4Jgjc3qx5fDTc9/ ...... /hiZsZZEQkoILIo6kCm30/TlRk0SktinpQ== file 169999 160000 09.jpg 2010-07-04 21:02:39.532

Figure B.5: Example of XML payload for a generic file item.
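The chunked transfer can be sketched as follows. This is only an illustration under stated assumptions: final_byte is taken to be inclusive (the example values 160000 and 169999 suggest 10000-byte chunks), the chunk size is a free parameter, and the function names `file_chunks` and `reassemble` are hypothetical.

```python
import base64
from typing import Iterable, Iterator, Tuple

Chunk = Tuple[int, int, str]  # (init_byte, final_byte, Base-64 payload)


def file_chunks(data: bytes, chunk_size: int) -> Iterator[Chunk]:
    """Client side: split a file into Base-64 chunks with inclusive byte bounds."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for init_byte in range(0, len(data), chunk_size):
        raw = data[init_byte:init_byte + chunk_size]
        final_byte = init_byte + len(raw) - 1   # inclusive, as assumed above
        yield init_byte, final_byte, base64.b64encode(raw).decode("ascii")


def reassemble(chunks: Iterable[Chunk]) -> bytes:
    """Server side: order the chunks, check that they are contiguous, rebuild the file."""
    out = bytearray()
    for init_byte, final_byte, payload in sorted(chunks):
        raw = base64.b64decode(payload)
        if init_byte != len(out) or final_byte - init_byte + 1 != len(raw):
            raise ValueError("missing, overlapping or truncated chunk")
        out += raw
    return bytes(out)


data = bytes(range(256)) * 100                   # 25600 bytes of sample content
chunks = list(file_chunks(data, 10000))
print([(first, last) for first, last, _ in chunks])
```

Because each chunk carries its own byte range, the server can accept chunks in any order and still detect gaps before assembling the file.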


B.6 Setting item

/backup/{backupType}/device/{imei}/app_settings/{fileItemName}/init_byte/{init_byte}/final_byte/{final_byte}

HTTP method: PUT and GET

Settings are managed as files: on Android they are usually Shared Preferences files, on iPhone plist files. These files can be analyzed on the server to extract data and make the data interoperable.

B.7 List methods

HTTP method: GET
These methods are used to obtain the lists of resources present in the last backup. List methods have been implemented for contacts, files, sms, calendar and settings.

/backup/diff/device/{imei}/contactsIdList

/backup/diff/device/{imei}/filesIdList

/backup/diff/device/{imei}/smsIdList

/backup/diff/device/{imei}/calendarIdList

/backup/diff/device/{imei}/appSettingsIdList

Figure B.6 shows the XML response produced by the server for a list of items requested using the contactsIdList method. Each dataItem contains two pieces of information: the itemName, which is the unique identifier used by the client, and the timestamp of the last backup. These data are used when performing the differential backup to understand which contents should be updated.

480 2010-07-04 20:25:40.0 481 ......

Figure B.6: Example of XML payload for a contact list response.
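A client could use the list response to select the items for a differential backup roughly as follows. The sketch assumes that the client knows the last modification time of each local item and that server timestamps use the format shown in Figure B.6; the function name `items_to_upload` and the sample data are hypothetical.

```python
from datetime import datetime
from typing import Dict, List

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S.%f"   # e.g. 2010-07-04 20:25:40.0


def items_to_upload(local: Dict[str, datetime], server: Dict[str, str]) -> List[str]:
    """Return the itemNames that are new or modified since the last backup.

    local maps itemName to the local modification time;
    server maps itemName to the timestamp string from the list response.
    """
    pending = []
    for name, modified in local.items():
        backed_up = server.get(name)
        if backed_up is None or modified > datetime.strptime(backed_up, TIMESTAMP_FORMAT):
            pending.append(name)
    return sorted(pending)


local_items = {
    "480": datetime(2010, 7, 4, 20, 0, 0),   # unchanged since the last backup
    "481": datetime(2010, 7, 5, 9, 0, 0),    # modified after the last backup
    "482": datetime(2010, 7, 1, 8, 0, 0),    # never backed up
}
server_items = {"480": "2010-07-04 20:25:40.0", "481": "2010-07-04 20:25:40.0"}
print(items_to_upload(local_items, server_items))
```

Only the selected items then need a PUT request with the diff backup type, which is what keeps a differential backup small.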


B.8 Restore

B.8.1 Listing items on the server

/backup/device/{imei}/backup_item_list

HTTP method: GET
This method provides the list of all the backups present on the server for the device identified by the IMEI and for all the devices owned by the authenticated user. When a user decides to restore from a backup, he/she chooses the backup to restore from the list given by this method. Figure B.7 shows a typical list of backups.

B.8.2 Choosing data to be restored

/backup_restore/device/{imei}/{data_type}/{backup_id}

HTTP method: GET
Attributes: data_type indicates the type of data to be restored; possible values are contact, calendar, file, SMS or app, depending on what is to be restored; backup_id identifies the backup on the server. Figure B.8 shows the response from the server to a restore request. The data to be restored can also be chosen individually, by identifying a single item on the server; the response to such a request has the same form as the one shown in Figure B.8.

/restore/device/{imei}/{data_type}/{item_id}


2010-06-23 01:53:05.0 full 00000001 true true true true false 1 ...... 2010-07-05 11:38:07.0 diff 00000000 false false false true false 27

Figure B.7: Example of XML payload for a list of backups.


49 item description ...... 29

Figure B.8: Restore method response.

C The Sharing communication protocol

C.1 Sharing methods

C.1.1 Item listing

Returns the list of the sharable items present in the last backup.

/sharing/device/{imei}/contactsIdList

HTTP method: GET

480 2010-07-04 20:25:40.0 481

Figure C.1: Example of XML payload for a list of items.


Similar methods are available for files (filesIdList) and calendars (calendarIdList).

C.1.2 Share an item

Method used to share an item.

/sharing/device/{imei}/{data_type}/sharing_item/{id}

HTTP method: PUT and GET
Attributes: data_type can be contact, calendar or file in the case of PUT; in the case of GET, file cannot be used. Files are managed by

/sharing/device/{imei}/file/init_byte/{init_byte}/final_byte/{final_byte}/sharing_item/{id}

id represents the itemName when the request method is PUT; when the method is GET, id is the identifier in the server's database.

7 false

Figure C.2: Example of XML payload to share an item with a group.

The payload contains one sharingItem for each datum to be shared and one group for each group with which the user wants to share the information. isLB indicates whether the information is geotagged; here the value should be false in PUT requests, as this method does not handle location-based data.


C.1.3 Location based sharing

Method used to share an item using location-based attributes.

/sharing/device/{imei}/{data_type}/lb_sharing_item/{id}

HTTP method: PUT and GET
Attributes: the method works like the non-location-based method; here too there is a separate method to handle files in the case of a GET request:

/sharing/device/{imei}/file/init_byte/{init_byte}/final_byte/{final_byte}/LB_SharingItem/{id}

sharingItem and group are used as in the non-location-based case; latitude, longitude and radius (see Figure C.3) can be defined to set the area where the information is available. isLB indicates whether the information is geotagged or not.

x 41.963706 12.501572 5000 true

Figure C.3: Example of XML payload to share an item with a group using location.

When the request is a GET and the isLB field is true, latitude, longitude and radius are used to locate the item on the map.
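The area defined by latitude, longitude and radius can be tested with a great-circle distance. The following sketch uses the haversine formula and assumes that radius is expressed in metres, which the text does not state; the function name `within_area` is hypothetical, and the source does not say whether such a check runs on the client or on the server.

```python
import math

EARTH_RADIUS_M = 6_371_000.0   # mean Earth radius in metres


def within_area(lat: float, lon: float,
                center_lat: float, center_lon: float, radius_m: float) -> bool:
    """Return True if (lat, lon) lies within radius_m metres of the area centre."""
    phi1, phi2 = math.radians(lat), math.radians(center_lat)
    d_phi = phi2 - phi1
    d_lambda = math.radians(center_lon - lon)
    # Haversine formula for the great-circle distance between two points.
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    distance = 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))
    return distance <= radius_m


# Centre and radius taken from the example payload of Figure C.3.
print(within_area(41.973706, 12.501572, 41.963706, 12.501572, 5000))
```

A point 0.01 degrees north of the centre is roughly 1.1 km away and therefore falls inside a 5000 m area, while a point 0.1 degrees north (about 11 km) falls outside.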


C.1.4 Listing shared data

This method is used to list all information shared by the user's groups; the method can be location based or not.

/sharing/device/{imei}/{data_type}/sharing_item_list

/sharing/device/{imei}/{data_type}/lb_sharing_item_list

HTTP method: GET
Attributes: data_type can take the value contact, calendar or file, depending on the kind of data to be retrieved.

University 7 2 Johnn Doe false University 7 3 Mike Black false ......

Figure C.4: Example of XML payload for a list of items.


The result does not contain data, but the metadata visible in sharingItem: group indicates the group with which the information is shared, sharing id is the identifier of the data on the server, and description contains a human-readable description of the shared content. Results can be filtered by group using the following method, with the identifier of the group in the group_id field.

/sharing/device/{imei}/{data_type}/sharing_item_list/group/{group_id}


C.2 Groups methods

C.2.1 Creating group

Using this method the user can create a new group.

/sharing/device/{imei}/group

HTTP method: PUT

[email protected] University

Figure C.5: Example of XML payload to create a group.

This method takes the groupName field to set the name of the group, and all the usernames in the memberList to set the users in the group.

C.2.2 Listing groups

Method used to get the list of groups available to the user, and the users participating in each group.

/sharing/device/{imei}/group_list

HTTP method: GET


University johnn johnn bill 7 Work mike johnn bill 12

Figure C.6: Example of XML payload of a list of groups.

C.2.3 Handling invitations

Using this method the user can invite other users to a group or handle his/her invitations to groups.

/sharing/device/{imei}/invitations

HTTP method: PUT and GET


[email protected] [email protected] 5 7

Figure C.7: Example of XML payload to invite users to a group.

In the PUT case the user sends the XML in Figure C.7, with the username fields set to the usernames of the users to be invited to the group groupId.


Lavoro [email protected] johnn 4 ...... University [email protected] jack/nickname> 8

Figure C.8: Example of XML payload of invitations received by the user.

In the GET case the user receives the XML in Figure C.8, listing all the groups to which he/she is invited. Using the invitations_response method with a PUT request, the user can decide whether to ignore, accept or refuse the request by setting the status to IGNORED, ACCEPTED or REFUSED.

/sharing/device/{imei}/invitations_response/group/{group_id}/{status}
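Since the status segment accepts only three values, a client may validate it before issuing the PUT request. The following is a minimal sketch; the names `InvitationStatus` and `invitation_response_path` are hypothetical and not part of the protocol.

```python
from enum import Enum


class InvitationStatus(Enum):
    """The three status values accepted by the invitations_response method."""
    IGNORED = "IGNORED"
    ACCEPTED = "ACCEPTED"
    REFUSED = "REFUSED"


def invitation_response_path(imei: str, group_id: int, status: str) -> str:
    """Build the path used to answer an invitation to the group group_id."""
    checked = InvitationStatus(status)   # raises ValueError for any other value
    return (f"/sharing/device/{imei}/invitations_response"
            f"/group/{group_id}/{checked.value}")


print(invitation_response_path("123456789012345", 8, "ACCEPTED"))
```

Using an enumeration keeps the set of valid answers in one place, so a misspelled status fails on the client instead of producing a request the server has to reject.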


Bibliography

[1] Jon Toigo. Disaster recovery planning : managing risk and catastrophe in infor- mation systems Yourdon Press, Englewood Cliffs N.J., 1989.

[2] Jon Toigo. Disaster recovery planning : preparing for the unthinkable. Prentice Hall, Upper Saddle River NJ, 3rd ed. edition, 2003.

[3] ADR Data Recovery. Data loss facts, 2008. http://www. adrdatarecovery.com/content/adr_loss_stat.html.

[4] Inc. ONTRACK Data International. Understanding data loss, 2001. http://www.ontrackdatarecovery.com/ understanding-data-loss/.

[5] DATAMATE. Microsoft data loss findings, 2001. http://www. datamate.com.au/content/view/14/.

[6] Lawrence M. Bridwel and Peter Tippet. ICSA Labs 7th Annual Computer Virus Prevalence Survey 2001. ICSA Lab, Upper Saddle River NJ, 7th ed. edition, 2001.

[7] Meta Group. It performance engineering & measurement strategies: Quantifying performance loss. Technical report, Meta Group, 2000.

[8] Winterthur. Un telefono cellulare rubato su due e` un iphone. Co- municato stampa, sep 2010. http://www.axa-winterthur.ch/ It/chi-siamo/media/comunicati-stampa-2010/Documents/ 20100926-axawin-iphone_it.pdf.

all the URLs reported in this bibliography have been last viewed in December 2010.

177 BIBLIOGRAPHY

[9] Rory Cellan-Jones. Government calls for action on mobile phone crime, feb 2010. The government has called on the mobile phone industry to do more to protect handset owners against theft.

[10] Lexton Snol. More smartphones than PCs by 2011. PC Ad- visor, August 2009 http://www.pcadvisor.co.uk/news/index. cfm?NewsID=3200338.

[11] Larry Dignan. Smartphone operating systems: The market share, usage disconnect, may 2009. http://blogs.zdnet.com/BTL/?p=18730.

[12] Paul Miller. Canalys: Android takes q2 smartphone market share lead in us with 886 percent year-over-year growth, aug 2010. http://www.engadget.com/2010/08/02/canalys-android-takes- q2-smartphone-market-share-lead-in-us-wit/.

[13] Christy Pettey and Laurence Goasduff. Gartner says worldwide mobile device sales to end users reached 1.6 billion units in 2010; smartphone sales grew 72 percent in 2010. Gartner press release, Gartner Inc., February 2011.

[14] International Telecommunication Union. Mobile cellular, subscriptions per 100 people, 2009. http://www.itu.int/en/pages/default. aspx. viewed 28th January, 2010.

[15] George Reese. Database Programming with JDBC and Java, Second Edition, chapter Chapter 7: Distributed Application Architecture. O’Reilly & As- sociates, nov 2000.

178 BIBLIOGRAPHY

[16] Rajkumar Buyya, Chee S. Yeo, and Srikumar Venugopal. Market-Oriented Cloud Computing: Vision, Hype, and Reality for Delivering IT Services as Computing Utilities. Aug 2008.

[17] Eric Knorr and Galen Gruman. What cloud computing really means. web, 09 2008. The next big trend sounds nebulous, but it’s not so fuzzy when you view the value proposition from the perspective of IT professionals.

[18] Alessandro Acquisti, Elisabetta Carrara, Fred Stutzman, Jon Callas, Klaus Schimmer, Maz Nadjm, Mathieu Gorge, Nicole Ellison, Paul King, Ralph Gross, and Scott Golder. ENISA position paper no.1 ”Security issues and recommendations for online social networks”. Technical report, ENISA, November 2007. http://www.enisa.europa.eu/act/res/other- areas/social-networks/security-issues-and-recommendations-for-online- social-networks/at download/fullReport.

[19] Ann Chervenak, Vivekanand Vellanki, and Zachary Kurmas. Protecting file systems: A survey of backup techniques. In Joint NASA and IEEE Mass Storage Conference, 1998.

[20] S. Agarwal, D. Starobinski, and A. Trachtenberg. On the scalability of data synchronization protocols for PDAs and mobile devices. IEEE Network, 16, 2002. http://citeseerx.ist.psu.edu/viewdoc/summary?doi= 10.1.1.17.427.

[21] Open Mobile Alliance. SyncML specifications, version 1.1, April 2002. http://www.openmobilealliance.org/tech/affiliates/ syncml/syncmlindex.html#V11.

179 BIBLIOGRAPHY

[22] F. Dawson and T. Howes. RFC 2426 - vCard MIME Directory Profile. Netscape Communications, September 1998. http://www.ietf.org/ rfc/rfc2426.txt.

[23] Marc Staimer. Why cloud backup & restore (bur) now! Technical report, Dragon Slayer Consulting, apr 2010. And How Procrastination Only In- creases Risk.

[24] Microsoft. Zmanda, software company enriches cloud-based backup so- lution with structured data storage. Technical report, Microsoft, gen 2009.

[25] Michael Vrable, Stefan Savage, and Geoffrey M. Voelker. Cumulus: Filesystem Backup to the Cloud. ACM Transactions on Storage (TOS), 5(4), dec 2009.

[26] Zhaohui Wang and Angelos Stavrou. Exploiting smart-phone usb connec- tivity for fun and profit. In Proceedings of the 26th Annual Computer Security Applications Conference, Austin, Texas, USA, 2010. ACM.

[27] Marc-Olivier Killijian, David Powell, Michel Banatre,ˆ Paul Couderc, and Yves Roudier. Collaborative backup for dependable mobile applications (extended abstract). In In Proceedings of 2nd International Workshop on Mid- dleware for Pervasive and Ad-Hoc Computing (Middleware 2004, pages 146– 149. ACM Press, 2004.

[28] V. Ottaviani, A. Lentini, A. Grillo, S. Di Cesare, and G. F. Italiano. Shared backup & restore, save, recover and share personal information into closed groups of smartphones. In 4th IFIP International Conference on New Technologies, Mobility and Security. IEEE, feb. 2011.

180 BIBLIOGRAPHY

[29] Roy Thomas Fielding. Architectural Styles and the Design of Network-based Software Architectures. PhD thesis, University of California, Irvine, 2000.

[30] Roy Thomas Fielding and Richard N. Taylor. Principled design of the modern web architecture. ACM Transactions on Internet Technology, 2(2):115–150, may 2002.

[31] Fabio Dellutri. Profiling Mobile Identities. PhD thesis, University of Rome ”Tor Vergata”, 2009.

[32] F. Dellutri, L. Laura, V. Ottaviani, and G.F. Italiano. Extracting social net- works from seized smartphones and web data. In Information Forensics and Security, 2009. WIFS 2009. First IEEE International Workshop on, pages 101 –105, dec 2009.

[33] Fabio Dellutri, Vittorio Ottaviani, and Gianluigi Me. Forensic acquisition for windows mobile pocketpc. In Proceedings of the Workshop on Security and High Performance Computing Systems, HPCS 2008, Nicosia, Cyprus June 3-6, 2008, pages 200–205, 2008.

[34] Rosamaria Berte,` Fabio Dellutri, Antonio Grillo, Alessandro Lentini, Gi- anluigi Me, and Vittorio Ottaviani. Fast smartphones forensic analysis results through miat and forensic farm. International Journal of Electronic Security and Digital Forensics (IJESDF), Inderscience, 2008.

[35] Rosamaria Berte,` Fabio Dellutri, Antonio Grillo, Alessandro Lentini, Gi- anluigi Me, and Vittorio Ottaviani. Handbook of Electronic Security and Dig- ital Forensics, chapter A Methodology for Smartphones Internal Memory Acquisition, Decoding and Analysis. Worldscience, 2008.

181 BIBLIOGRAPHY

[36] Gianluigi Me and Maurizio Rossi. Internal forensic acquisition for mobile equipments. In IEEE Computer Society Press, editor, 4th Int’l Workshop on Security in Systems and Networks (SSN2008), Proceedings of the International Parallel and Distributed Processing Symposium (IPDPS), 2008.

[37] Alessandro Distefano and Gianluigi Me. An overall assessment of mobile internal acquisition tool. Digital Investigation, 5(Supplement 1):S121–S127, 2008.

[38] Michael Santarini. Nand versus nor. EDN, October 2005.

[39] Microsoft. Linear flash memory devices on microsoft windows ce. http://www.microsoft.com/technet/archive/wce/plan/ flashce.mspx.

[40] Microsoft. The windows ce 5.0 object store. http://msdn2. microsoft.com/en-us/library/ms885891.aspx.

[41] Yost Scott. Why can’t i copy programs out of windows?, 2007. http://blogs.msdn.com/windowsmobile/archive/2007/12\ /29/why-can-t-i-copy-programs-out-of-windows.aspx.

[42] F. Dellutri, V. Ottaviani, D. Bocci, G.F. Italiano, and G. Me. Data reverse engineering on a smartphone. In Ultra Modern Telecommunications Work- shops, 2009. ICUMT ’09. International Conference on, pages 1 –8, oct 2009.

[43] P. H. Aiken. Reverse engineering of data. IBM Systems Journal, 37(2):246– 269, 1998.

[44] Chen and Associates. Reverse-DBMS (Access. 2.0) for Windows Reference Manual Version 3.0, 1994.

182 BIBLIOGRAPHY

[45] Roger H. L. Chiang. A knowledge-based system for performing reverse engineering of relational databases. Decis. Support Syst., 13(3-4):295–312, 1995.

[46] Kathi Hogshead Davis. August-ii: a tool for step-by-step data model re- verse engineering. Reverse Engineering, 1995., Proceedings of 2nd Working Conference on, pages 146–154, Jul 1995.

[47] Jean Henrard, Didier Roland, Anthony Cleve, and Jean-Luc Hainaut. Large-scale data reengineering: Return from experience. In WCRE ’08: Proceedings of the 2008 15th Working Conference on Reverse Engineering, pages 305–308, Washington, DC, USA, 2008. IEEE Computer Society.

[48] Contacts Database (CContactDatabase). Symbian developer library. http://www.symbian.com/Developer/techlib/v70docs/ SDL_v7.0/doc_source/reference/cpp/ContactsModel/ CContactDatabaseClass.html.

[49] Glenn E. Krasner and Stephen T. Pope. A cookbook for using the model- view controller user interface paradigm in Smalltalk-80. J. Object Oriented Program., 1(3):26–49, 1988.

[50] Jerome Louvel and Thierry Boileau. Restlet in Action. Manning Early Ac- cess Program, 2011.

[51] Noelios Technologies. Restlet, 2010. http://www.restlet.org/.

[52] The Apache Software Foundation. Apache tomcat, 2010.

[53] Douglas Schmidt. Pattern-oriented software architecture. Wiley, Chichester [England] ;;New York, 2000.

183 BIBLIOGRAPHY

[54] Martin Fowler. Pojo, 2000. http://www.martinfowler.com/bliki/ POJO.html.

[55] Richard D. Titus. Data is the new oil. Presentation: http://www. slideshare.net/rxdxt/data-is-the-new-oil, June 2010.

[56] Vittorio Ottaviani, Alberto Zanoni, and Massimo Regoli. Conjugation as public key agreement protocol in mobile cryptography. In Sokratis K. Katsikas and Pierangela Samarati, editors, SECRYPT, pages 411–416. SciTePress, 2010.

[57] Vittorio Ottaviani, Giuseppe F. Italiano, Antonio Grillo, and Alessandro Lentini. Benchmarking for the qp cryptographic suite. Technical report, University of Rome “Tor Vergata”, dept. of Informatics, Systems and Pro- duction, August 2009.

[58] Antonio Grillo. TIMiD: Trasferring Identities on Mobile Devices. PhD thesis, University of Rome Tor Vergata, 2011.

[59] Antonio Grillo, Alessandro Lentini, Vittorio Ottaviani, Giuseppe F. Ital- iano, and Fabrizio Battisti. Saved: Secure android value added services. In Proceedings of MOBICASE 2010 Conference, International Workshop on Mo- bile Security, 2010.

[60] Tohari Ahmad, Jiankun Hu, and Song Han. An efficient mobile voting system security scheme based on elliptic curve cryptography. Network and System Security, International Conference on, 0:474–479, 2009.

[61] Antonio Grillo, Alessandro Lentini, Gianluigi Me, and Giuseppe F. Ital- iano. Transaction oriented text messaging with trusted-sms. In ACSAC, pages 485–494. IEEE Computer Society, 2008.

184 BIBLIOGRAPHY

[62] Antonio Grillo, Alessandro Lentini, Gianluigi Me, and Giuliano Rulli. Trusted sms - a novel framework for non-repudiable sms-based processes. In Lu´ıs Azevedo and Ana Rita Londral, editors, HEALTHINF (1), pages 43–50. INSTICC - Institute for Systems and Technologies of Information, Control and Communication, 2008.

[63] Eligijus Sakalauskas, Povilas Tvarijonas, and Andrius Raulynaitis. Key agreement protocol (kap) using conjugacy and discrete logarithm prob- lems in group representation level. Informatica, 18(1):115–124, 2007.

[64] Whitfield Diffie and Martin E. Hellman. New directions in cryptography. IEEE Transactions on Information Theory, IT-22(6):644–654, 1976.

[65] Marco Bodrato. Personal communication, 2009.

[66] Frank Celler and C. R. Leedham-Green. Calculating the order of an in- vertible matrix. In In Groups and Computation II, pages 55–60. American Mathematical Society, 1995.

[67] PRNewswire. Rcs announces 2007 January-June trading data for the global cellular phone open market. Note, jul 2007.

[68] NIST. Recommended elliptic curves for federal government use. Technical report, NIST, July 1999.

[69] Eric Rescorla. Rfc 2631 - Diffie-Hellman key agreement method. Technical report, RTFM Inc., June 1999.

[70] Elaine Barker, Don Johnson, and Miles Smid. NIST SP 800-56A - Recom- mendation for Pair-Wise Key Establishment Schemes Using Discrete Logarithm Cryptography. NIST, March 2007.

185 BIBLIOGRAPHY

[71] Certicom Research. Standards for efficient cryptography - SEC 1: El- liptic curve cryptography. Technical Report 20, Certicom Corp., secg- [email protected], September 2000.

[72] M. Abundo, L. Accardi, and A. Auricchio. Hyperbolic automor- phisms of tori and pseudo-random sequences. Calcolo, 29:213–240, 1992. 10.1007/BF02576183.

[73] E. Rescorla. Diffie-hellman key agreement method. RFC 2631, 1999.

[74] FIPS. the official aes standard. FIPS PUB 197, 2001.

[75] Kalle Kaukonen and Rodney Thayer. A Stream Cipher Encryption Algo- rithm ”Arcfour”. 1999.

[76] Andreas Klein. Attacks on the rc4 stream cipher. Designs, Codes and Cryp- tography, 48:269–286, 2008. 10.1007/s10623-008-9206-6.

[77] NIST. Random number generation, dec 2000. http://csrc.nist. gov/groups/ST/toolkit/rng/index.html.

[78] NIST. Guide to the statistical tests, apr 2008. http://csrc.nist.gov/ groups/ST/toolkit/rng/stats_tests.html.

[79] Pierre L’Ecuyer and Richard Simard. Testu01, oct 2009. http://www. iro.umontreal.ca/˜simardr/testu01/tu01.html.

[80] Pierre L’Ecuyer and Richard Simard. TestU01. A Software Library in ANSI C for Empirical Testing of Random Number Generators. Departement d’Informatique et de Recherche Operationnelle Universite de Montreal, aug 2009. User’s guide, compact version.

186 BIBLIOGRAPHY

[81] Jesse Burns. Developing secure mobile applications for android. Technical report, iSEC Partners, 2008.

[82] Jesse Burns. Mobile application security on android. Technical report, Black Hat, 2009. Context on Android security.

[83] Android developers. Security and permission, jun 2010. http://developer.android.com/guide/topics/security/ security.html.

[84] Li Gong, Marianne Mueller, Hemma Prafullchandra, and Roland Schemers. Going beyond the sandbox: An overview of the new security architecture in the java development kit 1.2. In Proceedings of the USENIX Symposium on Internet Technologies and Systems, Monterey, California, dec 1997.

[85] David Alex Lamb. Sharing intermediate representations: the interface de- scription language. PhD thesis, Carnegie-Mellon University, Department of Computer Science, 1983.

[86] F.Bachmann et al. Documenting software architecture: Documenting in- terfaces. Technical report, Sofware Enginerring Institute, Carniege Mel- lon, 2002.

[87] Robert Kail. Human development : a life-span view. Wadsworth Cengage Learning, Australia ;;Belmont CA, 5th ed. edition, 2010.

[88] David L. Altheide. Identity and the definition of the situation in a mass- mediated context. Symbolic Interaction, 23(1):1–27, 2000.

187 BIBLIOGRAPHY

[89] Shanyang Zhao, Sherri Grasmuck, and Jason Martin. Identity construc- tion on facebook: Digital empowerment in anchored relationships. Com- put. Hum. Behav., 24(5):1816–1836, 2008.

[90] Peter Mika. Bootstrapping the foaf-web: An experiment in social network mining, 2004. http://www.cs.vu.nl/˜pmika/research/ foaf-ws/mining.html.

[91] Peter Mika. Flink: Semantic web technology for the extraction and anal- ysis of social networks. Web Semantics: Science, Services and Agents on the World Wide Web, 3(2-3):211–223, October 2005.

[92] Marco Gaertler. Clustering. In Ulrik Brandes and Thomas Erlebach, edi- tors, Network Analysis: Methodological Foundations, volume 3418 of Lecture Notes in Computer Science, pages 178–215. Springer, February 2005. http: //springerlink.metapress.com/content/19b5r48lqx3nx7gc.

[93] Paul Jaccard. Etude comparative de la distribution florale dans une por- tion des alpes et des jura. Bull Soc Vaudoise Sci Nat, 37:547–579, 1901.

[94] Rudi L. Cilibrasi and Paul M. B. Vitanyi. The google similarity distance. IEEE Transactions on Knowledge and Data Engineering, 19(3):370–383, March 2007. http://dx.doi.org/10.1109/TKDE.2007.48.

[95] Ravi Kannan, Santosh Vempala, and Adrian Vetta. On Clusterings: Good, Bad, Spectral. Journal of the ACM, 51(3):497–515, May 2004.

[96] P. Drineas, A. Frieze, R. Kannan, S. Vempala, and V. Vinay. Clustering Large Graphs via the Singular Value Decomposition. Machine Learning, 56(1-3):9–33, 2004.

188 BIBLIOGRAPHY

[97] E. Casalicchio, E. Galli, and V. Ottaviani. MobileOnRealEnvironment-GIS: A federated mobile network simulator of mobile nodes on real geographic data. In Distributed Simulation and Real Time Applications, 2009. DS-RT ’09. 13th IEEE/ACM International Symposium on, pages 255 –258, oct 2009.

189