Android Forensics: a Physical Approach

Lamine M. Aouad, Tahar M. Kechadi
Centre of Cybercrime Investigation, University College Dublin, Ireland

Abstract: There has been an exponential growth of Android systems in the last few years. However, the capability to perform efficient and fast forensic analyses on these devices is still limited, due to the lack of standardized processes along with the wide range of variants or versions in the operating system, the file system, and the data storage, in addition to manufacturer-specific customizations. In this paper, we present a generalized method for physical acquisition and analysis of memory images of Android devices. It is known that the main advantage of acquiring physical memory images is a more complete capture of the data, including deleted items. In addition, physical acquisition methods can work with damaged devices and generally make fewer alterations to the original device during acquisition. Yaffs2 (Yet Another Flash File System), used in the majority of existing devices, is still not fully supported by commercial forensic tools. We aim at covering this gap by presenting an easy end-to-end procedure for the acquisition of data partitions on a range of Android systems using yaffs2, as well as the mounting and analysis of these memory images on a workstation.

Keywords: Android, Memory imaging, Yaffs2.

1. Introduction

The number of mobile phone subscriptions worldwide reached more than 5.6 billion last year (Gartner research - 2011). The technology and functionality present on these phones are continually evolving. Smart phones are now becoming widespread, and have certainly contributed hugely to the phenomenal increase in mobile phone subscriptions (700% in the last ten years). Android is becoming the most common platform for these phones, with 43% penetration in the US market (Q3 2011) [7].

The amount of information stored on these devices has increased dramatically. It includes emails, SMSs, browser history, bookmarks, messages, chat, network passwords, personal notes, contacts, call logs, geolocation information, and much more. There is also a wealth of information in third-party applications. These are all potentially relevant in a forensic investigation. However, the growing number and variety of devices and customized systems and interfaces make it difficult to develop a single process or tool for effective data extraction and analysis.

Low-level analysis of complete memory images can offer a solution to this. The literature has also shown that the memory imaging approach is the “holy grail” of any forensic acquisition. Performing a bit-by-bit copy of the original media has indeed always been ranked the highest in terms of effectiveness and accuracy, such as in [2], [3]. The work presented in this paper focuses on a generalized method for physical acquisition, memory imaging, and analysis on Android devices. We specifically target the support of the native file system used by Android, namely yaffs.

The next section presents a brief state of the art and an overview of the underlying systems. Section 3 then presents the proposed method, its evaluation, and a discussion. Section 4 presents concluding remarks.

2. Background

In recent years, there has been an increasing interest in mobile device forensics, and many studies and surveys (on current methods and existing tools) have been presented, including [2] [3] [5] [6], among others. Given the huge variety of these systems and devices (the Android OS for instance is compliant with 300+ different smartphone models), it should come as no surprise that the list of specifications is quite large. Indeed, no standardized or generalized methods exist, either software or hardware. An interesting fact to mention here is that most of the existing tools are commercial, with unspecified implementation, and no or little documentation of their architecture or the way they do either logical or physical acquisitions. In this work, we aim at setting up a general physical acquisition method and documenting the fundamentals of analyzing the acquired image. In the following, we present the underlying systems and technologies.

2.1 Android OS

Android is an open source mobile device OS developed by Google, based on the Linux 2.6 kernel. The Linux kernel was chosen due to its proven driver model, existing drivers, memory and process management, and networking support, along with other core operating system services [8]. Android also has its own Java runtime engine, optimized for the limited resources available on a mobile platform, called the “Dalvik Virtual Machine”. Lastly, the application framework was created in order to provide the system libraries in a concise manner to the end-user applications [9].

Fig. 1: Android system architecture

2.2 Yaffs filesystem

Android uses the yaffs flash file system, the first NAND optimized Linux flash file system. For mobile devices, hard
disks are too large in size, too fragile, and consume too much power to be useful. In contrast, flash memory provides fast read access time and better kinetic shock resistance than hard disks. There are fundamentally two different types of flash memory: NOR and NAND. NOR is low density and offers slow writes and fast reads. NAND is low cost and high density, and offers fast writes and slow reads. Embedded systems are increasingly using NAND flash for storage and NOR for code and execution [10].

Yaffs was developed by Toby Churchill Ltd (TCL) as a reliable filing system with fast boot time for their flash memory devices. The authors initially tried to modify existing flash file systems such as JFFS (used mainly for NOR) to add NAND support, but the slow boot time and RAM consumption of existing flash file systems turned out to be unacceptable. Furthermore, there are too many fundamental differences between NOR and NAND for a single design to perform optimally on both. For instance, since erasing NOR takes much longer than erasing NAND, garbage collection methodologies for NOR are not suitable for NAND. This led to the development of a different flash file system designed specifically for NAND, taking its features and limitations into account to optimize performance and ensure robustness. Upon completion, yaffs performed better than existing flash file systems, and it can still be used with NOR flash even though it was specifically designed for NAND. The structure of yaffs is shown in figure 2.

Fig. 2: Yaffs embedded structure

2.3 MTD

Linux only understands character and block devices, such as keyboards and disk drives. With Linux on flash, however, a flash translation layer provides the system with device functionality. A Memory Technology Device (MTD) is needed to provide an interface between the Linux OS and the physical flash device because flash memory devices are not seen as character or block devices. The MTD system is simply “an abstraction layer for raw flash devices” that allows software to use a single interface to access a variety of flash technologies. For most Android devices, the MTD subsystem addresses NAND flash in blocks that are divided into 64 chunks, with each chunk containing 2048 bytes (so blocks are 128K) plus a 64-byte out-of-band/spare area (OOB) where various tags and metadata are stored, as we will see below.

3. Process overview

The main idea here is to provide the user with a generalized method that can be carried out without the need for any specific forensic tool. We present the setup, followed by the overall process and a discussion.

3.1 Setup

The presented method runs under Linux. The Android SDK tools are to be installed, including the Android Debug Bridge [11]. As yaffs2 is not supported by default in Linux, we had to incorporate it: yaffs2 was downloaded and compiled to enable kernel support. The mtd-utils package also needs to be installed. Lastly, mtd-utils is cross-compiled to be used on the devices.

3.2 The method

We tested this method on a NexusOne with Android 2.1 and kernel version 2.6.29. The phone has to be rooted. In order to acquire access to the root directory, Universal Serial Bus (USB) debugging has to be enabled on the phone. Our target partition is the 5th, and it is mounted on /dev/block/mtdblock5. Note that the process can be applied to any other partition, or a set of partitions.

3.3 Acquiring the memory image

We extract the memory contents in their entirety through the communication port. For MTD devices, nanddump can be used to collect NAND data independently of the higher-level filesystem deployed on the memory. For devices that do not employ MTD, other collection techniques can be employed. For instance, the dd utility can be used. It is also important to note that not all the data is necessarily stored in on-board memory [12]. We used an empty sdcard, in accordance with forensic best practices. The Linux shell commands are the following:

#cd mtd-utils-arm
#adb push nanddump /sdcard
#adb push mtd_debug /sdcard
#adb shell [here we are on the phone]
#mount -o remount,rw /sdcard /sdcard
#chmod 755 /sdcard/nanddump

Now we have cross-compiled copies of nanddump and mtd_debug, executable on our device. We also take note of the version of yaffs that is running on our device (cat /proc/yaffs). As we know the mounting point of our target partition, we can collect some other important information about it (via cat /proc/mtd).

From this, we can see how yaffs2 and MTD organize the NAND flash structure. We can see that totalBytesPerChunk is equal to 2048, so now we know that for this version of yaffs, and for this device, the data page size (chunk for yaffs) is 2048 bytes. The size of a block (erasesize) is 00020000 in hexadecimal, i.e. 131072 bytes (128 kilobytes). Usually, each chunk is followed by a 64-byte out-of-band/spare area (OOB) where various tags and metadata are stored. We also know there are 64 chunks (2048 bytes each) per block. This partition has 1570 blocks, so the total size is 1570 × 131072 bytes = 205783040 bytes. This is the same size that we can see in hexadecimal (running the 2nd command mentioned above, i.e. cat /proc/mtd) at mtd5 (0c440000). To check that, we can use the mtd_debug command (mtd_debug info /dev/mtd/mtd5). The structure of our yaffs2 blocks is shown in figure 3.

Fig. 3: Block structure

Now we are ready to use nanddump to make a dump of the whole userdata partition. There is a variety of options; we used -o as we also need to dump the OOB data to mount the image on the target machine, the -f option to specify the path of the output file, and --bb=padbad to specify that we want to copy the bad blocks as well.

#cd /sdcard
#./nanddump -o -f /sdcard/userdataoobpadbad.nanddump /dev/mtd/mtd5 --bb=padbad
ECC failed: 0
ECC corrected: 0
Number of bad blocks: 1
Number of bbt blocks: 0
Block size 131072, page size 2048, OOB size 64
Dumping data starting at 0x00000000 and ending at 0x0c440000...

We also generate a dump without the OOB area, as some of the techniques that we are going to use work better without the OOB:

#./nanddump -f /sdcard/userdatapadbad.nanddump /dev/mtd/mtd5 --bb=padbad
ECC failed: 0
ECC corrected: 0
Number of bad blocks: 1
Number of bbt blocks: 0
Block size 131072, page size 2048, OOB size 64
Dumping data starting at 0x00000000 and ending at 0x0c440000...

The resulting images are now on the sdcard:

#ls -l
----rwxr-x system sdcard_rw 702360 2011-09-03 21:11 nanddump
----rwxr-x system sdcard_rw 644505 2011-09-03 21:11 mtd_debug
----rwxr-x system sdcard_rw 212213760 2011-11-22 18:38 userdataoobpadbad.nanddump
----rwxr-x system sdcard_rw 205783040 2011-11-22 18:43 userdatapadbad.nanddump

It might appear that the size of the first dump is incorrect. However, that dump also includes the OOB area, so the size of each structure is now:

Size of a chunk = 2048 bytes + 64 bytes = 2112 bytes
Size of a block = (size of a chunk) * 64 = 135168 bytes (132 kilobytes)
Total size of partition = (number of blocks) * (size of a block) = 1570 * 135168 bytes = 212213760 bytes

As we can see, both the nanddump with the OOB area and the nanddump without the OOB area correspond to the expected sizes. Finally, we have to export these dumps from the sdcard to the Ubuntu workstation using adb pull.

3.4 Mounting the image

We are now going to mount the userdata partition image that includes the OOB data. We will use a simulated NAND device. The yaffs2 module is responsible for all aspects of the file system, while the MTD driver manages the writing of the data to the NAND flash. We therefore have to take an additional layer of complexity into consideration. This is a tricky step because the version of the yaffs2 module available on your system might not be compatible with the one present on the phone. The same applies to the MTD driver. In both cases you will see only the folder LOST+FOUND. First we load the driver and the yaffs2 module:

#sudo modprobe mtdchar
#sudo modprobe mtd
#sudo modprobe mtdblock
#insmod $DIR/$YOURKERNELNAME/fs/yaffs2/yaffs.ko

We can now build a simulated NAND device of 1GB:

#sudo modprobe nandsim first_id_byte=0xec second_id_byte=0xd3 third_id_byte=0x51 fourth_id_byte=0x95

Figure 4 shows other parameters for different sizes of the simulated device. We can have a look inside this simulated partition with a simple nanddump command. Using a normal hexadecimal examiner can present some problems with the OOB area.

Fig. 4: Nandsim parameter

#sudo nanddump -a /dev/mtd0 | xxd | less
0000000: ffff ffff ffff ffff ......
0000010: ffff ffff ffff ffff ......
0000020: ffff ffff ffff ffff ......
0000030: ffff ffff ffff ffff ......
0000040: ffff ffff ffff ffff ......
0000050: ffff ffff ffff ffff ......

We see only groups of ffff because, when a block is erased in a NAND device, the entire block is overwritten with 0xFF. The erase operation is the only mechanism by which a 0 can be changed to a 1 in NAND flash [4]. Optionally we can enable the debug mode of yaffs [1]. The next step is to use nandwrite to copy both the data and the OOB to the simulated NAND flash:

#sudo nandwrite -a -o /dev/mtd0 ~/userdataoobpadbad.nanddump
Writing data to block 1565 at offset 0xc3a0000
Writing data to block 1566 at offset 0xc3c0000
Writing data to block 1567 at offset 0xc3e0000
Writing data to block 1568 at offset 0xc400000
Writing data to block 1569 at offset 0xc420000

The 1570 yaffs2 blocks are now copied to the simulated device (mtd0). We can initially do a hex analysis of the simulated partition, and we can see that the content of the device has completely changed, as the following output of nanddump -c /dev/mtd0 shows:

#nanddump -c /dev/mtd0 | grep -v "00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00" | grep -v "ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff" | less
0x000197b0: 36 34 33 30 31 65 31 37 30 64 33 30 33 39 33 31 |64301e170d303931|
0x000197c0: 33 31 33 32 33 36 33 30 33 30 33 30 33 37 33 30 |3132363030303730|
0x000197d0: 33 32 35 61 31 37 30 64 33 33 33 37 33 30 33 34 |325a170d33373034|
0x000197e0: 33 31 33 33 33 30 33 30 33 30 33 37 33 30 33 32 |3133303030373032|
0x000197f0: 35 61 33 30 37 34 33 31 30 62 33 30 30 39 30 36 |5a3074310b300906|
OOB Data: ff ff a2 9e 07 30 1e a3 ce b8 0d 02 2c ed e6 a9 |.....0...... ,...|
OOB Data: 35 27 97 c2 7a 99 aa b4 bd 09 7c a7 b0 d1 7b d2 |5’..z.....|...{.|
OOB Data: 3c 34 00 00 0c 03 00 00 66 69 9b 95 55 9b 0f ff |<4...... fi..U...|
OOB Data: cf f3 03 cf 96 99 9b 96 55 a7 f3 ff cf aa 55 9b |...... U.....U.|

Finally we can mount the image:

#sudo mount -t yaffs2 /dev/mtdblock0 /mnt/mtd

We then see the full /data filesystem of our device, accessible on the Linux workstation. Simple analysis techniques can then be used. The next two sections present two examples, with carving and strings analysis.

3.5 Yaffs2 file carving

File carving is the process of reassembling computer files from fragments in the absence of filesystem metadata. The carving process makes use of knowledge of common file structures, information contained in files, and heuristics regarding how filesystems fragment data. Fusing these three sources of information, a file carving system infers which fragments belong together [14]. There are many commercial tools to carve data files, but not many of them support yaffs2. We used an open source tool called Scalpel [15]. Scalpel is a fast file carver that reads a database of header and footer definitions and extracts matching files or data fragments from a set of image files or raw device files. Scalpel is filesystem-independent and will carve files from FATx, NTFS, ext2/3, HFS+, or raw partitions with the help of a configuration file. It is useful for both digital forensic investigations and file recovery. The parameter setup is very important: it defines the file extension, the maximum size to carve, and the header and footer definitions. The final result of the analysis heavily depends on these parameters. Here we used the image without the OOB area.

#scalpel -o ~/scalpel ~/userdatapadbad.nanddump
Scalpel is done, files carved = 7998, elapsed = 93 seconds.

Note that the tested phone has been used very little, and we still recovered an extraordinary amount of information: 7998 files. All files are categorized into folders named after their extension. They can now be analyzed using the established forensic techniques usually applied to other filesystems.

3.6 yaffs2 strings analysis

For each file, strings prints the printable character sequences that are at least 4 characters long (or the number given as an option) and are followed by an unprintable character. We can use this command to extract some data from the userdata image. Let us consider two examples. In the first one, we would like to know the names of all wireless networks that the tested phone was connected to (note that the respective passwords are stored without encryption):

#strings --all --radix=x userdatapadbad.nanddump | grep ssid | less
445049 ssid="eduroam"
445119 ssid="Fitzsimons Hotel Bar"
44515f ssid="MPC - Fitzsimons 2nd Floor"
4451ab ssid="Paddy Wagon_WiFi"
4451ed ssid="Kinlay House"
44522b ssid="Wireless-Galway-1"
44526e ssid="www.izone.ie"
4452ac ssid="Stephen’s Green Free WIFI"
4452f7 ssid="eircom"
44532f ssid="bitbuzz"
445368 ssid="Harcourt Hotel FREE WiFi"
4453b2 ssid="opennet"
4453eb ssid="WaveLAN Network"
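
As an illustration, the keyword search used in this first example can also be reproduced without the strings and grep binaries. The following is a minimal Python sketch, not part of the procedure described in this paper: the function names iter_strings and grep_strings are hypothetical, and the sketch approximates strings by treating ASCII bytes 0x20 to 0x7e plus the tab character as printable. It reports each hit with its offset, which can then be printed in hexadecimal as with --radix=x.

```python
import re
from typing import Iterator, List, Tuple


def iter_strings(data: bytes, min_len: int = 4) -> Iterator[Tuple[int, str]]:
    """Yield (offset, text) for every run of printable ASCII bytes.

    Assumption: "printable" means bytes 0x20-0x7e plus tab, which only
    approximates the behaviour of the strings utility.
    """
    pattern = re.compile(rb"[\x20-\x7e\t]{%d,}" % min_len)
    for match in pattern.finditer(data):
        yield match.start(), match.group().decode("ascii")


def grep_strings(data: bytes, keyword: str, min_len: int = 4) -> List[Tuple[int, str]]:
    """Return the printable runs that contain the given keyword."""
    return [(off, text) for off, text in iter_strings(data, min_len) if keyword in text]


# Synthetic stand-in for a few bytes of a memory image (not real evidence data).
SAMPLE = (
    b"\x00\x01" + b'ssid="eduroam"' + b"\xff\xff"
    + b"abc" + b"\x00" + b'ssid="opennet"' + b"\n"
)

if __name__ == "__main__":
    for offset, text in grep_strings(SAMPLE, "ssid"):
        # Offsets are printed in hexadecimal, as with --radix=x.
        print(f"{offset:x} {text}")
```

On a real image, the data argument would be the content of a dump such as userdatapadbad.nanddump. For images of several hundred megabytes, a memory-mapped file is likely preferable to reading the whole dump into memory.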

The all option means that we want to analyze the entire file. The radix=x option means that the offset is printed in hexadecimal format. With the grep command we can search for specific keywords.

The second example shows all the places that have been searched on Google maps:

#strings --all --radix=x userdatapadbad.nanddump | grep maps.google.com | less
7381305 http://maps.google.com/?q=harcourt street
8692f0f http://maps.google.com/?q=rome
86930d3 http://maps.google.com/?q=dawson street
885e1ed http://maps.google.com/?q=barcellona
886c083 http://maps.google.com/?q=ginevra

These kinds of techniques provide a simple yet powerful framework for fast and accurate memory image analysis, retrieving targeted information in a few minutes. These and other basic data extractions can also be scripted fairly easily, depending on the user's needs.

3.7 Discussion

We proposed a method for fast imaging and analysis of a data partition of an Android device based on the yaffs2 filesystem. With this generalized procedure, we demonstrate that a wealth of information can be recovered in a forensically sound manner in a few minutes without the need for any specific tool or system.

This work can be extended in many ways, including hex analysis that makes it possible to recover raw and deleted data from the phone, fine-tuning the configuration parameters for file analysis, and building a set of significant search words (ssid, passwd, etc.) to be used in the analysis in a more automated way. The whole procedure was also scripted so that it can be used without a deep knowledge of the device or the underlying technologies and commands.

4. Conclusion

In this paper, we present a physical acquisition method for Android data partitions based on yaffs2. The method aims to be generic and easily applicable. Also, yaffs2 is not currently well supported, including by well-known forensic tools. This work therefore introduces a fast yet powerful imaging and analysis technique, and documents its fundamentals.

References

[1] yaffs official website. Available at http://www.yaffs.net
[2] Hoog A. and Gaffaney K. iPhone forensics. Via Forensics White paper, 2009.
[3] Hoog A. Android forensics. Mobile Forensics World. 2009.
[4] Hoog A. Android Forensics - Investigation, Analysis and Mobile Security for Google Android. Elsevier, 2011.
[5] Ayers R., Jansen W., Moenner L., and Delaitre A. Cell Phone Forensic Tools: An Overview and Analysis update. NIST Technical Report. 2007.
[6] Jansen W. and Ayers R. Guidelines on Cell Phone Forensics. Recommendations of the National Institute of Standards and Technology. NIST Technical Report. 2009.
[7] Nielsen Research. Android Grew Its Smartphone Marketshare; iPhone Stayed Flat. 2011.
[8] Androidology I: Architecture. Available at http://www.android.com/
[9] Frank Maker and Yu-Hsuan Chan. A Survey on Android vs. Linux. Department of Electrical and Computer Engineering, University of California, Davis. 2011.
[10] NAND vs. NOR : Technology Overview. Available at http://www.toshiba.com/taec/components/Generic/Memory_Resources/NANDvsNOR.pdf
[11] Android Debug Bridge. Available at http://developer.android.com/guide/developing/tools/adb.html
[12] Timothy Vidas, Chengye Zhang. Toward a general collection methodology for Android devices. In Proceedings of the 11th Digital Forensics Research Workshop (DFRWS 2011). New Orleans, LA. August 2011.
[13] NANDSIM options. Available at http://www.linux-mtd.infradead.org/
[14] Christiaan Beek. Introduction to File carving. White paper. McAfee. 2011.
[15] Scalpel tool. Available at http://www.digitalforensicssolutions.com/Scalpel/