information

Article Unix Domain Sockets Applied in Android Malware Should Not Be Ignored

Xu Jiang * ID , Dejun Mu and Huixiang Zhang ID

School of Automation, Northwestern Polytechnical University, Xi’an 710072, China; [email protected] (D.M.); [email protected] (H.Z.) * Correspondence: [email protected]

Received: 8 February 2018; Accepted: 2 March 2018; Published: 4 March 2018

Abstract: Increasingly, malicious Android apps use various methods to steal private user data without their knowledge. Detecting the leakage of private data is the focus of mobile information security. An initial investigation found that none of the existing security analysis systems can track the flow of information through Unix domain sockets to detect the leakage of private data through such sockets, which can result in zero-day exploits in the information security field. In this paper, we conduct the first systematic study on Unix domain sockets as applied in Android apps. Then, we identify scenarios in which such apps can leak private data through Unix domain sockets, which the existing dynamic taint analysis systems do not catch. Based on these insights, we propose and implement JDroid, a taint analysis system that can track information flows through Unix domain sockets effectively to detect such privacy leaks.

Keywords: Android; information flows; Unix domain sockets; private data; malware

1. Introduction In the second quarter of 2017, Android dominated the smartphone market, garnering an 86.1% share, according to a report by the International Data Corporation. Meanwhile, there are significant numbers of Android apps that enrich Android features. As mobile devices have become integrated into daily life, mobile devices collect increasing amounts of private data. Unfortunately, the Android has been the target of increasing attacks by third-party apps [1–3], forming a widespread, serious challenge because apps also increasingly attempt to steal private data (e.g., IMEI, and location) and send them to remote servers. This has become especially true in the era of big data [4,5]. Hackers use several different methods to steal private data while remaining undetected by existing analysis systems, including inter- communication (IPC) [6–10]. Since Android is based on a tailored environment, it inherits a subset of the traditional Linux IPCs that differ from Android IPCs [11–14]. Among Linux IPCs implemented within Android, Unix domain sockets are the only one apps can easily make use of. Although Google encourages Android developers to use Android IPCs, some still use Unix domain sockets, known as local sockets [15]. This practice occurs not only because using UNIX domain sockets for IPC is more efficient but also because Android IPCs are unsuitable for communication between the Java language in which most apps are written and native processes/threads [16]. Both the Android software development kit (SDK) and the Android native development kit (NDK) [17] provide for Unix domain sockets. To the best of our knowledge, how malicious apps exploit Unix domain sockets has not yet been systematically studied. After analyzed 2600 apps including 1500 normal apps and 1100 malicious apps, 315 (21%) normal apps and 209 (19%) malicious apps have related APIs or system calls in code. In addition, APIs for Unix domain sockets can be used in different versions of Android operating systems including the latest version (i.e., Android 8.0) according to the documentation on the official

Information 2018, 9, 54; doi:10.3390/info9030054 www.mdpi.com/journal/information Information 2018, 9, 54 2 of 16 developer website, which is verified through the implement of malware stealing private data based on Unix domain sockets [18]. What is more important, the existing taint analysis systems are unable to detect such leaks. Motivated by these insights, in this study, we conduct a systematic study on information flows through Unix domain sockets and propose and implement JDroid, an efficient dynamic taint analysis system that tracks information flows through Unix domain sockets. Because the tracking process involves taint propagation at the Java level, the native level and between both through JNI, JDroid reuses some existing modules from TaintDroid [19] and NDroid [20]. To make JDroid effective and efficient, we handle several challenging issues, such as the various approaches to the different types of Unix domain sockets, the differences between Android and Linux, etc. An evaluation using sample apps employs Unix domain sockets to transmit private data and circumvent detection by existing analysis systems and demonstrates the effectiveness of JDroid in detecting private data leakage through Unix domain sockets. We further evaluate and report JDroid’s performance. The rest of this paper is organized as follows. Section2 introduces the background, related work and describes undetected information leakage scenarios through Unix domain sockets. We detail the design, implementation, and evaluation of JDroid in Sections3–5. Finally, we report the limitations of JDroid and we conclude the paper in Section6.

2. Background

2.1. Android App Overview In an Android system, apps are commonly written in the Java language and compiled into Java bytecode, which is then translated to Dalvik bytecode and stored in .dex and .odex files that execute on the proprietary register-based Dalvik virtual machine (DVM) [21]. Apps may contain both Java and native components; these native components are simply shared libraries loaded dynamically at runtime [22]. The interaction between Java components and native components is well-defined by the Java Native Interface (JNI) specification and supported by the NDK [23]. The lowest level of Android architecture is the customized Linux kernel, which provides the basic architectural model for process scheduling, resource handling, memory management, networking, etc. Consequently, Linux mechanism applied in Android system is worth researching seriously [24].

2.2. Unix Domain Sockets A Unix domain socket is a data communications endpoint for exchanging data between processes executing on the same host operating system, and such sockets are a standard component of POSIX operating systems [25,26]. The APIs for Unix domain sockets are similar to those of Internet sockets; however, rather than using an underlying network protocol, all communication occurs entirely within the operating system kernel. Traditionally, there are three types of Unix domain socket address namespaces: FILESYSTEM, RESERVED, and ABSTRACT. An address in the FILESYSTEM namespace is associated with a file on the filesystem. RESERVED is in essence a sub-namespace of FILESYSTEM, while ABSTRACT is completely independent of the FILESYSTEM. The protocol family in the Unix domain is AF_UNIX/LOCAL [27]. The socket type specifies the communication semantics. SOCK_STREAM type sockets are full-duplex byte streams that must be in a connected state before any data may be sent or received through it [28]. The diagram in Figure1 shows the complete client/server interaction [29]. In addition, Unix domain sockets support both unordered and unreliable datagram transmissions (i.e., SOCK_DGRAM) and ordered and reliable datagram transmission (i.e., SOCK_SEQPACKET, which is similar to SOCK_STREAM). As the diagram in Figure2 shows, there is no flow control between the server and the client [30]. Each datagram message carries its destination address, its return address and a certain amount of data. Compared with SOCK_STREAM type sockets, the server Information 2018, 9, 54 3 of 16 need not create a socket for listening via the “listen” method and then call the “accept” method to wait forInformation a connection. 2018,, 9,, xx FORFOR PEERPEER REVIEWREVIEW 3 of 15

Server

Sockett( )

Biind( )

Liistten( )

Client Client Acceptt( )

Sockett( )

bllocks unttiill connecttiion from clliientt Connectt( ) connecttiion esttablliishmentt

Wriitte( ) datta(requestt) Read( )

Read( ) datta(replly) Wriitte( )

Cllose( )

FigureFigure 1.1.CConnection-basedonnection-based sockets sockets i interaction.nteraction.

Server

Sockett( )

Clliientt Biind( )

Sockett( ) Read( )

bllocks unttiill dattagram receiived from tthe clliientt Wriitte( ) datta (requestt)

Wriitte( ) datta(replly) Read( )

Cllose( )

FigFigureure 2. DDatagram-orientedatagram-oriented sockets i interaction.nteraction.

2.3.2.3. Related Work Currently, the the state-of-the-art state-of-the-art method method for detecting for detecting the leakage the leakage of private of dataprivate is called data “dynamic is called taint“dynamic analysis” taint andanalysis is typically” and is used typically in information used in information security to security enforce informationto enforce information flow policies flow to policies to preserve data confidentiality and integrity. Tracking information flows allows users to know how a program processes private data [31,32]. TaintDroid is a prominent representative of applications that track information flows in Android dynamic taint analysis systems, with some 2312 citations at the time of this writing. Many existing analysis systems such as Droidbox [33] and AppFence [34] have developed new functionality based on TaintDroid.

Information 2018, 9, 54 4 of 16 preserve data confidentiality and integrity. Tracking information flows allows users to know how a program processes private data [31,32]. TaintDroid is a prominent representative of applications that track information flows in Android dynamic taint analysis systems, with some 2312 citations at the time of this writing. Many existing analysis systems such as Droidbox [33] and AppFence [34] have developed new functionality based on TaintDroid. By modifying the Android application framework and the DVM, TaintDroid stores a 32-bit bitvector with each variable to encode a taint tag, supporting 32 different taint markers. As with local method variables and arguments, TaintDroid allocates taint tag storage by doubling the size of the stack frame allocation, and taint tags are stored adjacent to class fields and arrays inside the VM interpreter’s data structures. TaintDroid propagates the taint tags when the app is running, and monitors whether outgoing data has a taint tag in the Java layer. However, TaintDroid only loads native libraries from the firmware: it does not apply to those included in third-party apps. Currently, hackers increasingly use native code in their malicious apps to hide the program logic [35–37]. Thus, researchers have begun to pay more attention to the security of such third-party native libraries. Some systems use tools such as ptrace [38], strace [39], and ltrace [40] to collect the system-call sequences made by these libraries and use that information to analyse malicious app behavior. CopperDroid [41] collects and analyses system calls acquired by instrumenting QEMU and app behaviors related to Binder. DroidScope [42] tracks information flows by reconstructing both the OS-level and Java-level semantics simultaneously and seamlessly, although it is less efficient. NDroid tracks information flows across the boundary between the Java and native layers by instrumenting important JNI-related methods, and it monitors native code by processing each ARM/Thumb instruction. To work seamlessly with TaintDroid, NDroid reuses the modules modified by TaintDroid, and taints added by NDroid follow TaintDroid’s format in the DVM. NDroid leverages shadow registers and memory to save the taints during native-layer execution and sets the taints in the DVM stack so it can refer to them when the taints are propagated to the Java layer. Unfortunately, to the best of our knowledge, none of the existing dynamic taint-tracking approaches consider information leaks through Unix domain sockets.

2.4. Threat Model and Assumptions Android apps using APIs for Unix domain sockets need only request Internet permission [43], which is so commonly used that users are not wary about granting it. However, once the app has been granted Internet permission, the APIs for Unix domain sockets can be executed in either Java or native code. In this section, we analyze the scenarios in which private data can be transmitted through Unix domain sockets and then leaked and explain why such leakage cannot be detected by existing systems. We use TaintDroid and NDroid as references, because they are both advanced dynamic taint analysis systems and open source. An information flow from a source to a sink is the main requirement for leaking private data. We consider the source to be the APIs that can acquire private data and the sink to be any APIs that can send private data out of the Android system (usually the network interface). Dynamic taint analysis systems track how labeled data impact other data in ways that might leak private data [44]. Private data is first identified at the source, where a taint tag indicates the data type assigned. Later, the data will be checked when it gets sent to the sink. Thus, eliminating the taint tags attached by existing systems has raised concerns within the hacker community. To better understand the threat model, we define the client and the server as the sending and receiving ends, respectively, and group the apps that employ Unix domain sockets to transmit private data into three cases, depending on the client and server locations in the Android architecture, as shown in Table1. Information 2018, 9, 54 5 of 16

Table 1. The client/server combinations in information flows through Unix domain sockets.

Client Java Native Java Case 1 Case 3 Server Native Case 3 Case 2

Case 1. Information 2018As, 9, shownx FOR PEER in Figure REVIEW3a, in this case, the client and the server are both located in the Java5 of 15 layer. First, private data is transmitted from the client to the server, which is acquired by invoking the source.this case, Next, the the server server and processes the client it and are then created leaks solely it through by Android the sink. APIs It is noteworthy using specified that, in names this case,confined the server to the and Linux the client abstract are namespace. created solely TaintDroid by Android and APIs NDroid using specifiedcannot detect names such confined leakage, to thebecause Linux they abstract do not namespace. consider taint TaintDroid propagation and NDroid through cannot Unix domain detect such sockets leakage, in the because Java layer. they do not considerCase 2.taint As shown propagation in Figure through 3b, in Unix this domain case, the sockets client in and the the Java server layer. are both located in the nativeCase layer. 2. As The shown app invokes in Figure the3b, source in this to case, fetch the private client and data, the which server isare then both transmitted located in to the the native client layer.in the The native app layer invokes through the source JNI. The to fetch client private processes data, the which data isand then transmits transmitted it to tothe the server client through in the nativeUnix domain layer through sockets. JNI. Finally, The clientnative processes code can leak the datathe data and directly transmits by itcalling to the the server POSIX through socket Unix API. domainTaintDroid sockets. Finally,cannot nativedetect codesuch canleaks leak because the data it directlyloads only by callingthe app, the not POSIX the third socket-party API. native .TaintDroid Moreover, cannot it does detect not such consider leaks local because sockets. it loads NDroid only misses the app, such not leaks the third-party because it nativecannot library.track information Moreover, it flow does through not consider Unix localdomain sockets. sockets; NDroid consequently misses such it receives leaks because the information it cannot track from informationmemory without flow through corresponding Unix domain taint sockets; tags. consequently Most existing it receives analysis the systems information have from the memorycommon withoutvulnerability corresponding that they taintignore tags. POSIX Most socket existing APIs analysis as the systemssink. have the common vulnerability that they ignoreCase 3. POSIX As shown socket in APIs Figure as the 3c, sink. in this case, the client and the server are located in different layers:Case the 3. directionAs shown of ininformation Figure3c, in flow this could case, thebe from client the and Java the layer server to are the located native inlayer different, or vice layers: versa. theIn the direction Java layer, of information the client flowsends could private be data from to the the Java server layer in to the the native native layer layer, through or vice Unix versa. domain In the Javasockets. layer, Then, the client Java code sends fetches private private data to data the serverfrom the in native the native layer layer through through JNI Unixand leaks domain it through sockets. a Then,Java sink Java codeor native fetches code private processes, data from which the then native leak layer the through data directly JNI and through leaks it POSIX through socket a Java APIs. sink orTaintDroid native code and processes, NDroid cannot which detect then leak such the leakage data directly for the throughsame reasons POSIX as socketin Case APIs. 2. TaintDroid and NDroid cannot detect such leakage for the same reasons as in Case 2.

Java layer Java layer send privacy data out native layer Client LocalSocket Server Sinks native code sends out privacy data native layer

Client LocaCllSienot cket Server Sinks

(a) (b)

Java layer Java code sends out Client privacy data

t

e Sinks

k

c

o

S

l

a

native layer c native code sends out

o

L privacy data directly

Server Sinks

(c)

Figure 3. Examples of private data leakage through Unix domain sockets. (a) Case 1; (b) Case 2; Figure 3. Examples of private data leakage through Unix domain sockets. (a) Case 1; (b) Case 2; (c) Case 3. (c) Case 3.

3. Design and Implement Since JDroid must handle the taint propagation in the DVM and across the Java and native layers through JNI, we re-use some modules from TaintDroid and NDroid. Figure 4 illustrates the JDroid architecture. The general goal of JDroid is to track information flows through Unix domain sockets and detect private data leakage.

Information 2018, 9, 54 6 of 16

3. Design and Implement Since JDroid must handle the taint propagation in the DVM and across the Java and native layers through JNI, we re-use some modules from TaintDroid and NDroid. Figure4 illustrates the JDroid architecture. The general goal of JDroid is to track information flows through Unix domain sockets Information 2018, 9, x FOR PEER REVIEW 6 of 15 andInformation detect 2018 private, 9, x FOR data PEER leakage. REVIEW 6 of 15

JJaavvaa CCoommppoonneenntt JJaavvaa LLiibbrraarriieess

DDVVMM UUnniixx ddoommaaiinn ssoocckkeettss JJaavvaa--lleevveell vviieeww Hook Dalvik vm Hook Dalvik vm EEnnggiinnee

JNI Instructio JNI UUnniixx ddoommaaiinn ssoocckkeettss Instructio n TTaaiinntt n Engine TTrraaccee Engine Unix domain sockets Unix domain sockets SSyysstteemm Lib Hook OS-level view Lib Hook OS-level view Engine NNaattiivvee CCoommppoonneenntt Engine

SSyysstteemm LLiibbrraarriieess

LLiinnuuxx KKeerrnneell

QQEEMMUU OOSS--lleevveell vviieeww RReeccoonnssttrruuccttoorr

FigureFigFigureure 4.44.. JDroidJDroidJDroid overview.overviewoverview..

3.1.33.1..1.TaintTaint Propagation Propagation DevelopersDevelopersDevelopers can can can use use use both bothboth Java Java Java code code code and and and native native native code code code to implement to to implement implement communications communications communications through through through Unix domainUnixUnix domaindomain sockets. sockets.sockets. For JDroid, ForFor JDroid,JDroid, a challenging aa challengingchallenging issue is issueissue how isis to howhow correctly toto correctlycorrectly ensure taintensureensure propagation tainttaint propagationpropagation during theduringduring process thethe process ofprocess private ofof dataprivateprivate transmission. datadata transmission.transmission. ToToTo tackle tackletackle this thisthis issue, issue,issue, JDroid JDroidJDroid creates createscreates a aa structure ststructureructure called calledcalled Node NodeNode thatthat recordsrecords thethethe pathnamespathnamespathnames ofofof socketssocketssockets andandand the thethe corresponding correspondingcorresponding taint tainttaint tags, tags,tags, and andand uses usesuses a listaa listlist to toto store storestore the thethe Node NodeNode structure. structure.structure. As AsAs shown shownshown in Figure inin FigureFigure5, the 55,, Nodethethe NodeNode structure structurestructure includes includesincludes taint, tainttaint the,, taint thethe tainttaint tag for tagtag sending forfor sendingsending data; data;data; a srcname, aa srcname,srcname, which whichwhich is the isis pathname ththee pathnamepathname of the ofof sendingthethe sendingsending end; end;end; and andand a dstname, aa dstnamedstname the,, pathname thethe pathnamepathname of the ofof receiving thethe receivingreceiving end. endend..

ttyyppeeddeeff ssttrruucctt NNooddee{{ iinntt ttaaiinntt;; cchhaarr ssrrccnnaammee[[225566]];; cchhaarr ddssttnnaammee[[225566]];; ssttrruucctt nnooddee **nneexxtt;; }}NNooddee

Figure 5. The “Node” structure. FigureFigure 55.. TheThe ““NodeNode”” structurestructure.. 3.2. The Handle in the Sending End 3.2.3.2. TheThe HandleHandle intheinthe SendingSending EndEnd Prior to sending, each API related to sending data in both the Java and native libraries is tasked PriorPrior toto sending,sending, eacheach APIAPI relatedrelated toto sendingsending datadata inin bothboth thethe JavaJava andand nativenative librarieslibraries isis taskedtasked with creating a Node. withwith creatingcreating aa NodeNode.. ConnectionConnection--basedbased sockets. sockets. InIn the the first first case, case, the the client client is is located located in in the the Java Java layer. layer. SinceSince TaintDroidTaintDroid uses uses message message--levellevel taint taint trackingtracking that that represents represents the the upper upper bound bound of of the the taint taint tag tag assignedassigned toto variablesvariables containedcontained inin thethe message,message, JDroidJDroid alsoalso adoptsadopts messagemessage--levellevel tainttaint trackingtracking andand createscreates aa NodeNode toto rrecordecord thethe relevantrelevant information.information. UsingUsing thethe JavaJava APIAPI “write”“write” asas anan example,example, FigureFigure 66 showsshows howhow thethe functionsfunctions areare calledcalled andand howhow toto model model its its taint taint propagation propagation operation. operation. First, First, the the instrumented instrumented code code in in ““libcore.io.Posix.writelibcore.io.Posix.write”” obtainsobtains tainttaint andand invokesinvokes thethe nativenative methodmethod ““addTaintFileaddTaintFile”” toto createcreate thethe NodeNode andand insertinsert thethe entryentry inin the the NodeNode list.list. Note Note that that the the ARM/Thumb ARM/Thumb procedure procedure call call standard standard defines defines that that the the first first four four

Information 2018, 9, 54 7 of 16

Connection-based sockets. In the first case, the client is located in the Java layer. Since TaintDroid uses message-level taint tracking that represents the upper bound of the taint tag assigned to variables contained in the message, JDroid also adopts message-level taint tracking and creates a Node to record the relevant information. Using the Java API “write” as an example, Figure6 shows how the functions are called and how to model its taint propagation operation. First, the instrumented code in “libcore.io.Posix.write” obtains taint and invokes the native method “addTaintFile” to create the Node and insert the entry in the Information 2018, 9, x FOR PEER REVIEW 7 of 15 InformationNode list. 2018 Note, 9, x FOR that PEER the ARM/ThumbREVIEW procedure call standard defines that the first four parameters7 of 15 areparameters passed in are R0 passed to R3, whilein R0 theto R3, remaining while the parameters remaining are parameters pushed onto are stack, pushed and onto the returnstack, and value the is parameters are passed in R0 to R3, while the remaining parameters are pushed onto stack, and the placedreturn value in R0. is placed in R0. return value is placed in R0.

OutputStream.write(byte[] buffer) Outputstream OutputStream.write(byte[] buffer) Outputstream

IoBridge.write(FileDescriptor fd, byte[]bytes, int byteOffset, int byteCount IoBridge.write(FileDescriptor fd, byte[]bytes, int byteOffset, int byteCount

IoBridge IoBridge int write(FileDescriptor fd, ByteBuffer buffer}{ initn wt rtaitien(tF =il ebDufefsecrr.igpettoDr ifrde,c tBByytteeBBuuffffeerr bTuafifnetr(});{ initf t a(tianitn =t !b=u Tffaeirn.gt.etatDinitr_eclteBayr)te{BufferTaint(); if n(taatinvte !p=u Tblaiicn st.ttaatiinct _vcolieda ar)d{dTaintFile(int fd, int taint);} native public static void addTaintFile(int fd, int taint);} Posix Posix Static void Dalvik_dalvik_system_Taint_addTaintFile(const u4* args, JValue* pResult) St{atic void Dalvik_dalvik_system_Taint_addTaintFile(const u4* args, JValue* pResult) { int fd=(int)args[0];//args[0]= descriptor int fd=(int)args[0];//args[0]= libcore_io_Posix u4 taint = args[1]; // args[1] = the taint tag u4 taint = args[1]; // args[1] = the taint tag libcore_(iJoN_IP)osix struct sockaddr_un src_addr; (JNI) strsuocctk sleonc_kta dsrdcr_audnd srr_cl_enad; dr; sogcektlseonc_ktn sarmc_ea(dsodcr_klfedn,(;struct sockaddr*)&src_addr, &sock_src_len); gesttsroucckt nsaomckea(dsodcr_kufdn, (dssttr_uacdt dsor;ckaddr*)&src_addr, &sock_src_len); strsuocctk sleonc_kta ddstr_audnd dr_stl_enad; dr; socklen_t dst_addr_len; Kernel getpeername(sockfd,(struct sockaddr*)&dst_addr, &dst_src_len); Kernel geitnpseeerrtnDaamtae((psaotchkTfadi,n(stltirsutc, ti nsto ctakiandt,d srr*c)_&adsdtr_.saudnd_r,p &atdhs, td_sstr_ca_dlednr.)s;un_path);} insertData(pathTaintlist, int taint, src_addr.sun_path, dst_addr.sun_path);}

Figure 6. The “write” propagation operation. FigureFigure 6.6. TheThe “write “write”” propagation propagation operation operation.. In the second case, the client is located in native layer. JDroid adopts NDroid’s method to track InIn the the second second case, case, the the client client is islocated located in in native native layer. layer. JDroid JDroid adopts adopts NDroid NDroid’s’s method method to to track track the propagation of private data through JNI and monitor native code using shadow memory and thethe propagation propagation of of private private data data through through JNI JNI and and monitor monitor native native code code using using shadow shadow memory memory and and registers so that the corresponding taint tag always follows the private data. Consequently, JDroid registersregisters so so that that the the corresponding corresponding taint tagtag alwaysalways followsfollows the the private private data. data. Consequently, Consequently, JDroid JDroid can can retrieve a taint tag from memory in the native layer, check whether the data being sent includes canretrieve retrieve a taint a taint tag tag from from memory memory in the in the native native layer, layer, check check whether whether the data the data being being sent includessent includes a taint a taint tag and create an appropriate entry in the Node list. The rest of the process is identical to that a tagtaint and tag create and cre anate appropriate an appropriate entry entry in the in Node the Node list. The list. rest The of rest the of process the process is identical is identical to that to for that the for the Java layer. forJava the layer. Java layer. Datagram-oriented sockets. Datagram-oriented sockets use datagram communications DatagramDatagram-oriented-oriented sockets. Datagram-orientedDatagram-oriented sockets sockets use datagram use datagram communications communications between between one server and several clients. A datagram-oriented socket provides a symmetric data betweenone server one and server several and clients. several Aclients. datagram-oriented A datagram-oriented socket provides socket provides a symmetric a symmetric data exchange data exchange interface without requiring a connection to be established. The sending behavior is exchangeinterface interface without requiring without a requiring connection a connectionto be established. to be The established. sending behavior The sending is implemented behavior is by implemented by the “sendto” and “sendmsg” methods, as shown in Figure 7. implementedthe “sendto” by and the “sendmsg” “sendto” and methods, “sendmsg” as shown methods in Figure, as shown7. in Figure 7.

ssize_t sendto(intsockfd, const void *buf, size_tlen, int flags, conststructsockaddr *dest_addr, socklen_taddrlen) ssize_t sendto(intsockfd, const void *buf, size_tlen, int flags, conststructsockaddr *dest_addr, socklen_taddrlen) ssize_t sendmsg(intsockfd, conststructmsghdr *msg, int flags) ssize_t sendmsg(intsockfd, conststructmsghdr *msg, int flags)

Figure 7. The “sendto” and “sendmsg” methods. FigureFigure 7.7. TheThe “sendto “sendto”” and and “sendmsg “sendmsg”” methods methods.. Although Google provides Android APIs (e.g., LocalServerSocket, LocalSocket) for developers Although Google provides Android APIs (e.g., LocalServerSocket, LocalSocket) for developers to use with Unix domain sockets, they are not available for datagram-oriented sockets except when to use with Unix domain sockets, they are not available for datagram-oriented sockets except when the client invokes the method “connect” to establish the connection relation. If using “connect” to the client invokes the method “connect” to establish the connection relation. If using “connect” to connect the server, the client can use the method “send” to send the data. Following that path, the connect the server, the client can use the method “send” to send the data. Following that path, the handle is the same as in connection-based sockets. handle is the same as in connection-based sockets. Briefly, the SOCK_DGRAM type of Unix domain socket that does not invoke “connect” must be Briefly, the SOCK_DGRAM type of Unix domain socket that does not invoke “connect” must be implemented in native code. Therefore, JDroid hooked the “sendto” and “sendmsg” methods. For implemented in native code. Therefore, JDroid hooked the “sendto” and “sendmsg” methods. For “sendto”, JDroid parses the second parameter (i.e., buf) to check whether the data being sent has a “sendto”, JDroid parses the second parameter (i.e., buf) to check whether the data being sent has a taint tag and the fifth parameter (i.e., dest_addr) to obtain the dstname. For “sendmsg”, it uses the taint tag and the fifth parameter (i.e., dest_addr) to obtain the dstname. For “sendmsg”, it uses the msghdr structure to minimize the number of directly supplied arguments, as shown in Figure 8. msghdr structure to minimize the number of directly supplied arguments, as shown in Figure 8.

Information 2018, 9, 54 8 of 16

Although Google provides Android APIs (e.g., LocalServerSocket, LocalSocket) for developers to use with Unix domain sockets, they are not available for datagram-oriented sockets except when the client invokes the method “connect” to establish the connection relation. If using “connect” to connect the server, the client can use the method “send” to send the data. Following that path, the handle is the same as in connection-based sockets. Briefly, the SOCK_DGRAM type of Unix domain socket that does not invoke “connect” must be implemented in native code. Therefore, JDroid hooked the “sendto” and “sendmsg” methods. For “sendto”, JDroid parses the second parameter (i.e., buf) to check whether the data being sent has a taint tag and the fifth parameter (i.e., dest_addr) to obtain the dstname. For “sendmsg”, it uses the Information 2018, 9, x FOR PEER REVIEW 8 of 15 msghdr structure to minimize the number of directly supplied arguments, as shown in Figure8.

struct msghdr { Void *msg_name; //optional address socklen_t msg_namelen; //size of address struct iovec *msg_iov; // scatter size_t msg_iovlen; // elements in msg_iov void *msg_control; //ancillary data size_t msg_controllen; //ancillary data buffer len int msg_flags; // flags on received message };

FigureFigure 8.8. TheThe “msghdr”“msghdr” struct.struct. The data being sent is pointed to by the elements of themsg.msg_iov array, and dstname is The data being sent is pointed to by the elements of the msg.msg_iov array, and dstname is pointed to by msg.msg_name. Further, the Srcname can also be obtained by invoking pointed to by msg.msg_name. Further, the Srcname can also be obtained by invoking “getsockname”. “getsockname”. Based on these data items, JDroid creates a new Node. Based on these data items, JDroid creates a new Node.

3.3.3.3. TheTheHandle Handle inthe in the Receiving Receiving End End JDroidJDroid initializesinitializes thethe tainttaint tagtag forfor trackingtracking anan informationinformation flowflow enteringentering thethe serverserver usingusing twotwo steps.steps. TheThe firstfirst stepstep determinesdetermines whichwhich NodeNode shouldshould getget thethe tainttaint tagtag basedbased onon thethe peer’speer’s pathnamepathname (i.e.,(i.e., srcname) srcname and) and its itsown own pathname pathname (i.e., (i dstname)..e., dstname The). second The second step attaches step attaches the taint the to thetaint received to the data,received which data, can which used tocan continue used to trackingcontinuetaint tracking propagation taint propagation until it reaches until it a sink.reaches a sink. TwoTwo approachesapproaches areare requiredrequired toto obtainobtain thethe peer’speer’s pathnamepathname becausebecause ofof thethe differencesdifferences betweenbetween connection-basedconnection-based socketssockets andand datagram-orienteddatagram-oriented sockets.sockets. Connection-basedConnection-based sockets.sockets. ForFor communications communications between between connection connection-based-based sockets sockets (whether (whether in inthe the Java Java or or native native layer), layer), the the server server looks looks for for the the taint taint tag tag (i.e., (i.e., taint taint)) of the the corresponding corresponding NodeNode basedbased on on the the methods methods that that JDroid JDroid invokes: invokes: “getpeername” “getpeername” to obtain obtain the the client’s client’s pathname pathname and and “getsockname”“getsockname” toto getget itselfitself pathname.pathname. IfIf thethe clientclient doesdoes notnot bindbind thethe specifiedspecified pathname,pathname, thethe returnreturn valuevalue ofof “getpeername”“getpeername” may may be be NULL, NULL, differing differing from from Linux, Linux, which which returns returns the the pathname pathname allocated allocated by kernel.by kernel. Therefore, Therefore, JDroid JDroid confirms confirms that the that corresponding the corresponding Node is based Node not is only based on the not pathnames only on the of thepathnames client and of the the server, client but and also the the server sequence, but also of a theNode sequence in the list of based a Node on in its the creation list based time. onThen, its JDroidcreation associates time. Then, the JDroid taint with associates the received the taint data. with the received data. TakingTaking thethe JavaJava APIAPI “read”“read” asas anan example,example, JDroidJDroid firstfirst obtainsobtains thethe taint tag based on the Node,Node, whichwhich goesgoes back back to to the the Java Java layer layer and and was was attached attached to the to receivedthe received data. data. Figure Figure9 shows 9 howshows to how model to themodel taint the propagation taint propagation operation. operation.

InputStream.write(byte[] buffer)

IoBridge.read(FileDescriptor fd, byte[]bytes, int byteOffset, int byteCount)

int read(FileDescriptor fd, ByteBuffer buffer}{ int taint = native public static int getTaintFile(fd); Taint.addTaintByteArray((byte[])buffer, tag);}

private native int getTaintFile(FileDescriptor fd){ int recvfd = fd.getDescriptor(); struct sockaddr_un sock_addr; socklen_t sock_addr_len; getpeername(recvfd,(struct sockaddr*)&sock_addr, &sock_addr_len); //get the path of the sending end int taint=Node*findtag(sock_addr.sun_path); //look for the taint tag Return tag; }

Figure 9. “read” taint operation.

Information 2018, 9, x FOR PEER REVIEW 8 of 15

struct msghdr { Void *msg_name; //optional address socklen_t msg_namelen; //size of address struct iovec *msg_iov; // scatter size_t msg_iovlen; // elements in msg_iov void *msg_control; //ancillary data size_t msg_controllen; //ancillary data buffer len int msg_flags; // flags on received message };

Figure 8. The “msghdr” struct. The data being sent is pointed to by the elements of themsg.msg_iov array, and dstname is pointed to by msg.msg_name. Further, the Srcname can also be obtained by invoking “getsockname”. Based on these data items, JDroid creates a new Node.

3.3. TheHandle inthe Receiving End JDroid initializes the taint tag for tracking an information flow entering the server using two steps. The first step determines which Node should get the taint tag based on the peer’s pathname (i.e., srcname) and its own pathname (i.e., dstname). The second step attaches the taint to the received data, which can used to continue tracking taint propagation until it reaches a sink. Two approaches are required to obtain the peer’s pathname because of the differences between connection-based sockets and datagram-oriented sockets. Connection-based sockets. For communications between connection-based sockets (whether in the Java or native layer), the server looks for the taint tag (i.e., taint) of the corresponding Node based on the methods that JDroid invokes: “getpeername” to obtain the client’s pathname and “getsockname” to get itself pathname. If the client does not bind the specified pathname, the return value of “getpeername” may be NULL, differing from Linux, which returns the pathname allocated by kernel. Therefore, JDroid confirms that the corresponding Node is based not only on the pathnames of the client and the server, but also the sequence of a Node in the list based on its creation time. Then, JDroid associates the taint with the received data. Taking the Java API “read” as an example, JDroid first obtains the taint tag based on the Node, whichInformation goes2018 back, 9, 54to the Java layer and was attached to the received data. Figure 9 shows how9 ofto 16 model the taint propagation operation.

InputStream.write(byte[] buffer)

IoBridge.read(FileDescriptor fd, byte[]bytes, int byteOffset, int byteCount)

int read(FileDescriptor fd, ByteBuffer buffer}{ int taint = native public static int getTaintFile(fd); Taint.addTaintByteArray((byte[])buffer, tag);}

private native int getTaintFile(FileDescriptor fd){ int recvfd = fd.getDescriptor(); struct sockaddr_un sock_addr; socklen_t sock_addr_len; getpeername(recvfd,(struct sockaddr*)&sock_addr, &sock_addr_len); //get the path of the sending end int taint=Node*findtag(sock_addr.sun_path); //look for the taint tag Return tag; }

FigureFigure 9. 9. “read“read”” taint taint operation operation.. Information 2018, 9, x FOR PEER REVIEW 9 of 15 Information 2018, 9, x FOR PEER REVIEW 9 of 15 Datagram-orientedDatagram-oriented sockets. sockets.If If the the client client used used “connect”, “connect”, the server the server can employ can employ “recv” to “recv” receive to thereceive data,Datagram whichthe data,- isoriented the which same sockets. as is when the same usingIf the as connection-based client when used using “connect”, connection sockets. the In-based addition, server sockets. can for datagram-oriented employ In addition, “recv” for to sockets,receivedatagram the-oriented server data, which calls sockets, “recvfrom” is the the same server or as“recvmsg” calls when “recvfrom” using to initialize connection or the“recvmsg”- receivebased sockets. behavior to initialize In on addition, a the socket, receive for as showndatagrambehavi inor Figure-onoriented a socket, 10. sockets, as shown the in server Figure calls10. “recvfrom” or “recvmsg” to initialize the receive behavior on a socket, as shown in Figure 10. ssize_t recvfrom(int sockfd, void *buf, size_t len, int flags, struct sockaddr *src_addr, socklen_t *addrlen); ssize_t recvmsg(int sockfd, struct msghdr *msg, int flags); ssize_t recvfrom(int sockfd, void *buf, size_t len, int flags, struct sockaddr *src_addr, socklen_t *addrlen); ssize_t recvmsg(int sockfd, struct msghdr *msg, int flags); FigureFigure 10.10.The The “recvfrom”“recvfrom” and and“recvmsg” “recvmsg” methods. methods. Figure 10. The “recvfrom” and “recvmsg” methods. Since the server cannot exploit “getpeername” to get the client’s pathname, JDroid parses the functionSinceSince arguments thethe serverserver to cannotcannot obtain exploitexploit the srcname “getpeername”“getpeername” and uses that to getget to look thethe client’s clientfor the’s pathname,Nodepathname,. Then, JDroidJDroid JDroid parsesparses associates thethe functionfunctionthe taint arguments argumentswith the received toto obtainobtain buffer thethe srcnamesrcname using shadow andand usesuses memory thatthat toto lookandlook registers. forfor thethe Node.Node Only. Then,Then, in this JDroidJDroid way can associatesassociates JDroid thethecontinue tainttaint withwith to track thethe receivedthereceived data. bufferbuffer usingusing shadowshadow memorymemory andand registers.registers. Only in this wayway cancan JDroidJDroid continuecontinueFor to“recvfrom”to tracktrack thethe, data. data.src_addr represents the srcname, while for “recvmsg”, msg.msg_name specifies the srcnameForFor “recvfrom”,“recvfrom”. If the src_addr, src_addrsrc_addrormsg. representsrepresentsmsg_name thethe srcname,srcname are NULL, whilewhile, the forsocketsfor “recvmsg”,“recvmsg”, are identical msg.msg_namemsg. msg_nameto connected specifiesspecifies sockets thetheand srcname.srcname taint propagation. If If the the src_addr src_addr procedureormsg. or msg.msg_name msg_name is the same are as NULL are with NULL,, the a connected sockets the sockets are sock identical areet. identicalTaking to connected the to “recvfrom”connected sockets socketsand(without taint and a propagation NULL taint propagation src_addr procedure argument) procedure is the for is sameexample, the same as withthe as withhandler a connected a connected is shown sock socket. inet. Figure Taking Taking 11. the the “recvfrom” “recvfrom” (without(without aa NULLNULL src_addr argument) for example, example, the the handler handler is is shown shown in in Figure Figure 11 11. . ssize_t recvfrom(int sockfd, void *buf, size_t len, int flags, struct sockaddr *src_addr, socklen_t *addrlen); s{s ize_t recvfrom(int sockfd, void *buf, size_t len, int flags, struct sockaddr *src_addr, socklen_t *addrlen); { int taint=Node *findtag(src_addr.sun_path); itnati ntta_ipnati=rN ondeew T*afiinntd t(abgu(f,s rtca_iandtd)r;.sun_path); ttaaiinntt_Mpaapi.ri nnseewrTta(innetw T(abiunft,) ;taint); } taintMap.insert(newTaint); } Figure 11. Taint operation for the “recvfrom” method. Figure 11.11. Taint operationoperation forfor thethe “recvfrom”“recvfrom” method.method. 4. Experiments 4. Experiments In our experiments, we first used a simple tool, Monkeyrunner, to generate random input to test 2300In our apps. experiments, However, webecause first usedMonkeyrunner a simple tool, might Monkeyrunner, not trigger an toapp generate’s malicious random behavior input, weto testfound 2300 that apps. the However,Echo.apk leaksbecause private Monkeyrunner data by following might not Case trigger 3. From an app a positive’s malicious aspect, behavior our results, we founddemonstrate that the that Echo.apk malware leaks developers private datado not by payfollowing much attentionCase 3. From to exploiting a positive Unix aspect, domain our resultssocket. deMoreover,monstrate we that used malware two proof developers-of-concept do not(PoC) pay apps much (one attention each for to Cases exploiting 1 and Unix2) to furtherdomain evaluate socket. Moreover,JDroid’s ability we used to two track proof information-of-concept flows(PoC) apps through (one Unixeach for domain Cases 1sockets. and 2) to Finally, further weevaluate used JDroidCaffeineMark’s ability to evaluate to track JDroid information’s overhead. flows through Unix domain sockets. Finally, we used CaffeineMarkThe experiments to evaluate were JDroid performed’s overhead. in a virtual machine with 4 GB memory running Ubuntu. The hostThe was experiments an Intel(R) Core(TM) were performed i7 running in a @virtual 2.6 GHZ machine with 16with GB 4 of GB RAM. memory running Ubuntu. The host was an Intel(R) Core(TM) i7 running @ 2.6 GHZ with 16 GB of RAM. 4.1. PoC of Case 1 in Information Leakage 4.1. PoC of Case 1 in Information Leakage In this PoC, the app uses Java code to accomplish the entire transmission process based on the SOCK_STREAMIn this PoC, thesocket app type.uses JavaThe Javacode codeto accomplish first obtains the theentire device transmission’s IMEI with process the taintbased tag on (i.e.,the SOCK_STREAM0x400) and transmits socket it type. to the The server Java fromcode first the client.obtains When the device the client’s IMEI begins with tothe implement taint tag (i.e., the 0x400)sending and behavior transmits, JDroid it to creates the server an entry from in the the Node client. list When that records the client the taint begins tag to and implement the client and the sendingserver pathnames. behavior, JDroid Before creates receiving an entrythe data, in the the Node server list invokes that records “getpeername the taint tag” and and “getsockname” the client and serverthrough pathnames. JNI which Before then looksreceiving for thethe correspondingdata, the server Node invokesas discussed “getpeername earlier” and. When “getsockname” the Node is throughfound, JDroid JNI which invokes then “ Taint.addTaintByteArraylooks for the corresponding” to Nodeadd theas discussedtaint to the earlier received. When data. the The Node main is found,functions JDroid in the invokes information “Taint.addTaintByteArray flow identified by JDroid” to add are shownthe taint in toFigure the received 12. Finally, data. the The received main functionsdata is sent in theto theinformation specified flowserver. identified JDroid notby JDroid only adds are showna taint intag Figure to the 12 received. Finally, data the received but also datatracks is thesent information to the specified flow userver.ntil it reaches JDroid thenot sink. only adds a taint tag to the received data but also tracks the information flow until it reaches the sink.

Information 2018, 9, 54 10 of 16

4. Experiments In our experiments, we first used a simple tool, Monkeyrunner, to generate random input to test 2300 apps. However, because Monkeyrunner might not trigger an app’s malicious behavior, we found that the Echo.apk leaks private data by following Case 3. From a positive aspect, our results demonstrate that malware developers do not pay much attention to exploiting Unix domain socket. Moreover, we used two proof-of-concept (PoC) apps (one each for Cases 1 and 2) to further evaluate JDroid’s ability to track information flows through Unix domain sockets. Finally, we used CaffeineMark to evaluate JDroid’s overhead. The experiments were performed in a virtual machine with 4 GB memory running Ubuntu. The host was an Intel(R) Core(TM) i7 running @ 2.6 GHZ with 16 GB of RAM.

4.1. PoC of Case 1 in Information Leakage In this PoC, the app uses Java code to accomplish the entire transmission process based on the SOCK_STREAM socket type. The Java code first obtains the device’s IMEI with the taint tag (i.e., 0x400) and transmits it to the server from the client. When the client begins to implement the sending behavior, JDroid creates an entry in the Node list that records the taint tag and the client and server pathnames. Before receiving the data, the server invokes “getpeername” and “getsockname” through JNI which then looks for the corresponding Node as discussed earlier. When the Node is found, JDroid invokes “Taint.addTaintByteArray” to add the taint to the received data. The main functions in the information flow identified by JDroid are shown in Figure 12. Finally, the received data is sent to the specified server. JDroid not only adds a taint tag to the received data but also tracks the information flow until it reachesInformation the 2018 sink., 9, x FOR PEER REVIEW 10 of 15

Java layer attach 0x400 to the received data

Java code Client LocalSocket Server Java code

Note: the data related attach the taint tag 0x400 create inquire to privacy data to IMEI has leaked Native layer srcname:/dev/socket/sender dstname:/dev/socket/receiver taint:0x400

FigureFigure 12.12.PoC PoC forfor CaseCase 1.1.

4.2.4.2. PoCPoC ofof CaseCase 22 inin InformationInformation LeakageLeakage InIn this this PoC, PoC, the the client client and and the the server server are are both both located located in in the the native native layer layer and and based based on on the the SOCK_DGRAMSOCK_DGRAM socket socket type. type. Similar Similar to to the the PoC PoC for for Case Case 1, 1, this this PoC PoC first first fetches fetches private private data data related related to theto the ICCID ICCID with with the taintthe taint tag 0x1000, tag 0x1000, which which is then is transmittedthen transmitte to thed nativeto the layernative through layer through JNI. JDroid JNI. tracesJDroid the traces information the information flow through flow through JNI and locatesJNI and the locates memory the atmemory 0x4a98c9d4 at 0x4a98c9d4 associated associated with the taintwith tagthe 0x1000.taint tag Then, 0x1000. the Then, client transmitsthe client thetransmits data based the data on the based memory on the at memory 0x4a98c9d4 at 0 tox4a98c9d4 the server to usingthe server the native using librarythe nativ calle library “sendto”. call “sendto”. By hooking By “sendto”,hooking “sendto”, JDroid constructs JDroid constructs a Node by a Node parsing by theparsing “sendto” the “sendto” parameters parameters (i.e., buf (i.e., and dest_addr)buf and dest_addr to populate) to populate the taint theand taint dstname and dstname and invokes and “getsockname”invokes “getsockname” to obtain to the obtain srcname. the srcname After the. After server the calls server “recvfrom” calls “recvfrom” to receive to thereceive data, the JDroid data, retrievesJDroid retrieves the srcname the srcname by parsing by parsing the fifth the parameter fifth parameter of “recvfrom” of “recvfrom” (i.e., src_addr), (i.e., src_addr and looks), and for looks the Nodefor the corresponding Node corresponding to srcname to and srcname its own pathnameand its own (i.e., pathname dstname) and (i.e., associates dstname the) and taint associates “0x1000” withthetaint the“ received0x1000” bufferwith the memory received at 0x4a98c93c.buffer memory Finally, at 0 thatx4a98c93c. data reaches Finally, “sendto”. that data Since reaches “sendto” “sendto is a”. sink,Since JDroid “sendto notices” is a that sink, the JDroid ICCID-related notices that data the has ICCID leaked-related after checking data has the leaked parameters, after checking as shown the in Figureparameters, 13. as shown in Figure 13.

Java layer Attach the taint tag (0x1000) to privacy data Java code

native layer JNI taint map taint map 0x1000 0x4a98c9d4 0x1000 0x4a98c93c POSIX Client LocalSocket Server Socket API create inquire Node srcname:/data/data/com.example.test/sender Note that the information dstname:/data/data/com.example.test/receiver related to ICCID has leaked. taint:0x1000

Figure 13. PoC for Case 2.

4.3. Echo JDroid discovered that Echo may send private data related to IMEI to the specified server. The client and the server both use the SOCK_STREAM socket type and the FILESYSTEM namespace. Figure 14, which is an example of Case 3, shows the major functions in the information flow identified by JDroid. First, Java code invokes an Android API (i.e., getDeviceId) to obtain the IMEI that will be sent by the client in the Java layer. This type of parameter is a byte array and its taint tag is “0x400”. Before sending, the instrumented code invokes “getsockname” and “getpeername” to

Information 2018, 9, x FOR PEER REVIEW 10 of 15

Java layer attach 0x400 to the received data

Java code Client LocalSocket Server Java code

Note: the data related attach the taint tag 0x400 create inquire to privacy data to IMEI has leaked Native layer srcname:/dev/socket/sender dstname:/dev/socket/receiver taint:0x400

Figure 12. PoC for Case 1.

4.2. PoC of Case 2 in Information Leakage In this PoC, the client and the server are both located in the native layer and based on the SOCK_DGRAM socket type. Similar to the PoC for Case 1, this PoC first fetches private data related to the ICCID with the taint tag 0x1000, which is then transmitted to the native layer through JNI. JDroid traces the information flow through JNI and locates the memory at 0x4a98c9d4 associated with the taint tag 0x1000. Then, the client transmits the data based on the memory at 0x4a98c9d4 to the server using the native library call “sendto”. By hooking “sendto”, JDroid constructs a Node by parsing the “sendto” parameters (i.e., buf and dest_addr) to populate the taint and dstname and invokes “getsockname” to obtain the srcname. After the server calls “recvfrom” to receive the data, JDroid retrieves the srcname by parsing the fifth parameter of “recvfrom” (i.e., src_addr), and looks for the Node corresponding to srcname and its own pathname (i.e., dstname) and associates thetaint“0x1000” with the received buffer memory at 0x4a98c93c. Finally, that data reaches “sendto”. InformationSince2018 “sendto, 9, 54 ” is a sink, JDroid notices that the ICCID-related data has leaked after checking the11 of 16 parameters, as shown in Figure 13.

Java layer Attach the taint tag (0x1000) to privacy data Java code

native layer JNI taint map taint map 0x1000 0x4a98c9d4 0x1000 0x4a98c93c POSIX Client LocalSocket Server Socket API create inquire Node srcname:/data/data/com.example.test/sender Note that the information dstname:/data/data/com.example.test/receiver related to ICCID has leaked. taint:0x1000

Figure 13. PoC for Case 2. Figure 13. PoC for Case 2. 4.3. Echo 4.3. Echo JDroid discovered that Echo may send private data related to IMEI to the specified server. The JDroidclient and discovered the server that both Echouse the may SOCK_STREAM send private socket data type related and tothe IMEI FILESYSTEM to the specified namespace. server. The clientFigure and 14, the which server is anboth example use the of SOCK_STREAM Case 3, shows the socket major type functions and the inFILESYSTEM the information namespace. flow Figureidentified 14, which by is JDroid. an example First, Java of Case code 3,invokes shows an the Android major functionsAPI (i.e., getDeviceId) in the information to obtain flow the identifiedIMEI by JDroid.that will First, be sent Java by code the client invokes in the an Java Android layer. APIThis type (i.e., of getDeviceId) parameter is toa byte obtain array the and IMEI its taint that tag will be sent byis “ the0x400 client”. Before in the sending, Java layer. the instrumented This type of parametercode invokes is “getsockname” a byte array and and its “getpeername” taint tag is “0x400”. to BeforeInformation sending, 2018 the, 9, x FOR instrumented PEER REVIEW code invokes “getsockname” and “getpeername” to obtain11 of 15 the srcname and dstname respectively. Based on this information, JDroid creates a new Node for the obtain the srcname and dstname respectively. Based on this information, JDroid creates a new Node serverfor inquiry. the server inquiry.

Java layer Attach the taint tag Java code 0x400 to privacy data

Client create L o c Node native layer a lS srcname:/data/data/com.njt.echo/files/dsf/ o c dstname:/data/data/com.njt.echo/files/file/ k e t taint:0x400 inquire POSIX native Server API taint map 0x4a98cb3c 0x400 Note that the information related to IMEI has leaked.

Figure 14. Echo.apk. Figure 14. Echo.apk. Before to the server in the native layer begins receiving the data, JDroid looks up the Beforecorresponding to the server Node in (by the searching native layer for srcname beginsreceiving and its own the pathname data, JDroid (i.e., looksdstname up) the. Next, corresponding JDroid Nodeinitializes (by searching the taint for tag srcname in the native and its layer own based pathname on the (i.e.,dstname).found Node. More Next, precisely, JDroid it initializes associates thethe taint tag intaint the tag native “0x400 layer” with based the onmemory the found location Node. at “0x More4a98cb3c precisely,” and a shadow it associates register. the When taint the tag data “0x400” with thereaches memory the sink location (i.e., “sendto”), at “0x4a98cb3c” JDroid warns and that a shadowdata concerning register. the When IMEI has the leaked. data reaches the sink (i.e., “sendto”), JDroid warns that data concerning the IMEI has leaked. 4.4. Compare with Other Analysis Systems 4.4. CompareWhen with tested Other with Analysis the two Systems PoC apps and Echo, TaintDroid and NDroid, which are advanced dynamic taint analysis systems currently in use, were unable to detect information leaks through WhenUnix domain tested sockets. with the two PoC apps and Echo, TaintDroid and NDroid, which are advanced dynamic taintVirusTotal analysis is a systems free service currently that includes in use, 55 were of the unable latest topopular detect analysis information systems leaks for analyzing through Unix domainsuspicious sockets. files and URLs and facilitates quick detection of viruses, worms, trojans, and all types of malware. We employed VirusTotal to analyze our sample apps. None of the included analysis systems detects the malicious behavior, including Qihoo-360, McAfee, and so on. Although the analysis methods of the 55 systems are varied and some include dynamic taint analysis, at a minimum, this result illustrates that we should emphasize this issue—and indicates that hackers may not yet have realized this vulnerability in the existing analysis systems. Aijiami is another popular service that provides risk assessment for Android apps. We used Aijiami to evaluate the risk of the three sample apps. The results shows that an uncertain risk may exist based on the requested permissions, which are sufficient for leaking private data (i.e., android.permission.INTERNET and android.permission.READ_PHONE_STATE). However, the score of the three sample apps were all above 60, which means that they are considered normal apps.

5. Performance CaffeineMark is dedicated to measuring the speed of running Java programs and computing the instructions executed per second and uses an internal scoring system for relative comparisons. We used CaffeineMark to evaluate the performance of applications on unmodified Android, and when using TaintDroid and JDroid. Figure 15 shows the execution time results of a Java microbenchmark. For the sieve microbenchmark, which is how the classic sieve of Eratosthenes finds prime numbers, there is no obvious differences among unmodified Android, TaintDroid, and JDroid. The loop scores achieved by TaintDroid and JDroid are significantly less than that of unmodified Android, which indicates the testing score uses sorting and sequence generation to measure compiler optimization of loops. This result most likely occurs because TaintDroid and JDroid hook

Information 2018, 9, 54 12 of 16

VirusTotal is a free service that includes 55 of the latest popular analysis systems for analyzing suspicious files and URLs and facilitates quick detection of viruses, worms, trojans, and all types of malware. We employed VirusTotal to analyze our sample apps. None of the included analysis systems detects the malicious behavior, including Qihoo-360, McAfee, and so on. Although the analysis methods of the 55 systems are varied and some include dynamic taint analysis, at a minimum, this result illustrates that we should emphasize this issue—and indicates that hackers may not yet have realized this vulnerability in the existing analysis systems. Aijiami is another popular service that provides risk assessment for Android apps. We used Aijiami to evaluate the risk of the three sample apps. The results shows that an uncertain risk may exist based on the requested permissions, which are sufficient for leaking private data (i.e., android.permission.INTERNET and android.permission.READ_PHONE_STATE). However, the score of the three sample apps were all above 60, which means that they are considered normal apps.

5. Performance CaffeineMark is dedicated to measuring the speed of running Java programs and computing the instructions executed per second and uses an internal scoring system for relative comparisons. We used CaffeineMark to evaluate the performance of applications on unmodified Android, and when using TaintDroid and JDroid. Figure 15 shows the execution time results of a Java microbenchmark. For the sieve microbenchmark, which is how the classic sieve of Eratosthenes finds prime numbers, there is no obvious differences among unmodified Android, TaintDroid, and JDroid. The loop scores achieved by TaintDroid and JDroid are significantly less than that of unmodified Android, which indicatesInformation 2018 the, testing9, x FOR scorePEER REVIEW uses sorting and sequence generation to measure compiler optimization12 of 15 of loops. This result most likely occurs because TaintDroid and JDroid hook the JNI-related methods,the JNI-related which aremethods, invoked which frequently; are invoked thus, they frequently; increase the thus, overhead they increase considerably. the overhead For the logicconsiderably. microbenchmark, For the logic testing microbenchmark, the speed with whichtesting the the virtual speed machine with which executes the decision-making virtual machine instructions,executes decision JDroid’s-making score was instructions, markedly JDroid below those’s score of unmodified was markedly Android below and those TaintDroid, of unmodified because JDroidAndroid must and perform TaintDroid, considerable because workJDroid when must data perform is transmitted considerable across work the differentwhen data layers. is transmitted Although JDroid’sacross the other different indicators layers. are also Although somewhat JDroid lower’s other than theindicators other two are systems also somewhat because of lower its monitoring than the ofother the two native systems layer, thisbecause level of of its overhead monitoring is acceptable of the native considering layer, this JDroid’s level of improved overhead tracking is acceptable of the informationconsidering flow JDroid according’s improved to the overalltracking microbenchmark of the information that indicates flow according a cumulative to score the overall across individualmicrobenchmark benchmarks. that indicates a cumulative score across individual benchmarks.

Figure 15. Microenchmark results. Figure 15. Microenchmark results.

6. Discussion and Conclusions SinceTaintDroid works only on three Android versions, JDroid based on TaintDroid is also restricted to those same three versions. Thus, future work involves improving JDroid’s compatibility, making it applicable to a larger variety of Android versions. In addition, most virtualization-based systems have a common feature: it is difficult to completely emulate the real hardware. For example, JDroid cannot monitor some important data sources (e.g., GPS). Hence, JDroid can generate fake information related to these important sources. Nevertheless, advanced malicious apps can exploit the differences between emulators and real machines to hide by avoiding the triggering of the malicious behavior. Similar to TaintDroid and NDroid, JDroid does not track control flows. Therefore, there are some anti-taint methods in mobile device environments that can circumvent taint tracking [45]. However, if JDroid were to fully support control-flow tracking, it would suffer from even higher overhead. To ensure information flow control capability, JDroid’s trusted computing base (TCB) is comprised of both the kernel and the managed runtime. Additionally, we realized the app with root privilege can modify trusted functions. Therefore, JDroid also incorporates the functions in RootGuard [46], which monitors system calls to detect the malicious behaviors. Furthermore, we will investigate some existing evadable methods to improve JDroid in future work. In this work, we conducted a systematic study to track information flows through Unix domain sockets in Android apps. We identified a set of scenarios in which private data can be leaked that are not caught by existing dynamic taint analysis systems. Based on these insights, we proposed and implemented JDroid, an efficient dynamic taint analysis system for tracking information flows through Unix domain sockets by solving several challenging issues. The evaluation using sample apps demonstrates that JDroid can effectively track information flows through Unix domain sockets and detect private data leakage associated with the use of such sockets.

Information 2018, 9, 54 13 of 16

6. Discussion and Conclusions SinceTaintDroid works only on three Android versions, JDroid based on TaintDroid is also restricted to those same three versions. Thus, future work involves improving JDroid’s compatibility, making it applicable to a larger variety of Android versions. In addition, most virtualization-based systems have a common feature: it is difficult to completely emulate the real hardware. For example, JDroid cannot monitor some important data sources (e.g., GPS). Hence, JDroid can generate fake information related to these important sources. Nevertheless, advanced malicious apps can exploit the differences between emulators and real machines to hide by avoiding the triggering of the malicious behavior. Similar to TaintDroid and NDroid, JDroid does not track control flows. Therefore, there are some anti-taint methods in mobile device environments that can circumvent taint tracking [45]. However, if JDroid were to fully support control-flow tracking, it would suffer from even higher overhead. To ensure information flow control capability, JDroid’s trusted computing base (TCB) is comprised of both the kernel and the managed runtime. Additionally, we realized the app with root privilege can modify trusted functions. Therefore, JDroid also incorporates the functions in RootGuard [46], which monitors system calls to detect the malicious behaviors. Further more, we will investigate some existing evadable methods to improve JDroid in future work. In this work, we conducted a systematic study to track information flows through Unix domain sockets in Android apps. We identified a set of scenarios in which private data can be leaked that are not caught by existing dynamic taint analysis systems. Based on these insights, we proposed and implemented JDroid, an efficient dynamic taint analysis system for tracking information flows through Unix domain sockets by solving several challenging issues. The evaluation using sample apps demonstrates that JDroid can effectively track information flows through Unix domain sockets and detect private data leakage associated with the use of such sockets.

Acknowledgments: This work was supported in part by the National Natural Science Foundation of China (No. 61672433), the State Cryptography Foundation of China (MMJJ20170210), and the Science and Technology Foundation of Shenzhen China (201703063000517). The funds that we received cover the costs to publish in open access. Author Contributions: Xu Jiang is the corresponding author of the paper. Xu Jiang is responsible for all aspects of the paper. Dejun Mu performed part of the experiment. Huixiang Zhang also performed part of the experiment. All authors read and approved the final manuscript. Conflicts of Interest: The authors declare no conflict of interest.

References

1. Zhou, Y.; Wang, Z.; Zhou, W.; Jiang, X. Hey, you, get off of my market: Detecting malicious apps in official and alternative android markets. In Proceedings of the 19th Annual Network and Distributed System Security Symposium, San Diego, CA, USA, 7 February 2012; pp. 50–52. 2. Tam, K.; Feizollah, A.; Anuar, N.B.; Salleh, R.; Cavallaro, L. The Evolution of Android Malware and Android Analysis Techniques. ACM Comput. Surv. 2017, 49, 76. [CrossRef] 3. Mariconti, E.; Onwuzurike, L.; Andriotis, P.; De Cristofaro, E.; Ross, G.; Stringhini, G. MamaDroid: Detecting Android Malware by Building Markov Chains of Behavioral Models. In Proceedings of the NDSS ‘17: Network and Distributed Systems Security Symposium, San Diego, CA, USA, 26 February–1 March 2017. 4. Tan, J.; Drolia, U.; Martins, R.; Gandhi, R.; Narasimhan, P. Chips: Content-Based heuristics for improving photo privacy for smartphones. In Proceedings of the 7th ACM Conference on Security and Privacy in Wireless & Mobile Networks, London, UK, 23–25 July 2014; pp. 213–218. 5. Gordon, M.I.; Kim, D.; Perkins, J.; Rinard, M. Information-Flow Analysis of Android Applications in DroidSafe. In Proceedings of the Network and Distributed System Security Symposium, San Diego, CA, USA, 8 February 2015. Information 2018, 9, 54 14 of 16

6. Bugiel, S.; Davi, L.; Dmitrienko, A.; Fischer, T.; Sadeghi, A.R. Xmandroid: A New Android Evolution to Mitigate Attacks; Technical Report; Technische Universität Darmstadt: Darmstadt, Germany, 2011; p. 4. 7. Bugiel, S.; Davi, L.; Dmitrienko, A.; Fischer, T.; Sadeghi, A.-R.; Shastry, B. Towards taming privilege-escalation attacks on android. In Proceedings of the 19th Network and Distributed System Security Symposium, San Diego, CA, USA, 7 February 2012; p. 19. 8. Dietz, M.; Shekhar, S.; Pisetsky, Y.; Shu, A.; Wallach, D.S. Quire: Lightweight provenance for smart phone operating systems. In Proceedings of the 20th USENIX Conference on Security, San Francisco, CA, USA, 8–12 August 2011; p. 23. 9. Felt, A.P.; Wang, H.J.; Moshchuk, A.; Hanna, S.; Chin, E. Permission re-delegation: Attacks and defenses. In Proceedings of the USENIX Security Symposium, San Francisco, CA, USA, 8–12 August 2011. 10. Fragkaki, E.; Bauer, L.; Jia, L.; Swasey, D. Modeling and enhancing androids permission system. In Proceedings of the 17th European Symposium on Research in , Pisa, Italy, 10–12 September 2012; pp. 1–18. 11. Seo, J.; Kim, D.; Cho, D.; Shin, I.; Kim, T. FLEXDROID: Enforcing in-app in Android. In Proceedings of the 23rd Network and Distributed System Security Symposium, San Diego, CA, USA, 21–24 February 2016; pp. 21–24. 12. Chin, E.; Felt, A.P.; Greenwood, K.; Wagner, D. Analyzing inter-application communication in Android. In Proceedings of the 9th International Conference on Mobile Systems, Applications, and Services, Bethesda, MD, USA, 28 June–1 July 2011; pp. 239–252. 13. Davi, L.; Dmitrienko, A.; Sadeghi, A.R.; Winandy, M. Privilege escalation attacks on android. In Proceedings of the 2010 IEEE International Conference on Information Security, Boca Raton, FL, USA, 25–28 October 2010; pp. 346–360. 14. Grace, M.C.; Zhou, Y.; Wang, Z.; Jiang, X. Systematic detection of capability leaks in stock android smartphones. In Proceedings of the 19th Network and Distributed System Security Symposium, San Diego, CA, USA, 7 February 2012; p. 19. 15. Cinar, O.; Grant, A. Pro Android C++ with the NDK; Apress: New York, NY, USA, 2012. 16. Ghafari, M.; Gadient, P.; Nierstrasz, O. Security Smells in Android. In Proceedings of the IEEE International Working Conference on Source Code Analysis and Manipulation, Shanghai, China, 17–18 September 2017; pp. 121–130. 17. Li, Y.; Hao, Q.; Ma, X.; Zhang, P.; Fan, Y.; Jiang, J. Design and implementation of Android-based speech storage system. In Proceedings of the Information Technology. Networking, Electronic and Automation Control Conference, Chongqing, China, 20–22 May 2016; pp. 178–181. 18. Shao, Y.; Ott, J.; Jia, Y.J.; Qian, Z.; Mao, Z.M. The Misuse of Android Unix Domain Sockets and Security Implications. In Proceedings of the 23rd ACM SIGSAC Conference on Computer and Communications Security, Vienna, Austria, 24–28 October 2016; pp. 80–91. 19. Enck, W.; Gilbert, P.; Han, S.; Tendulkar, V.; Chun, B.G.; Cox, L.P.; Sheth, A.N. TaintDroid: An information-flow tracking system for realtime privacy monitoring on smartphones. In Proceedings of the ACM Transactions on Computer Systems, Vancouver, BC, Canada, 4–6 October 2010; Volume 22. 20. Qian, C.; Luo, X.; Shao, Y.; Chan, A.T. On tracking information flows through JNI in android applications. In Proceedings of the 44th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, Atlanta, GA, USA, 23–26 June 2014; pp. 180–191. 21. Backes, M.; Bugiel, S.; Gerling, S.; von Styp-Rekowsky, P. Android security framework: Extensible multi-layered access control on Android. In Proceedings of the Computer Security Applications Conference, New Orleans, LA, USA, 8–12 December 2014; ACM: New York, NY, USA, 2014; pp. 46–55. 22. Allix, K.; Bissyandé, T.F.; Klein, J.; Le Traon, Y. AndroZoo: Collecting Millions of Android Apps for the Research Community. In Proceedings of the 2016 IEEE/ACM 13th Working Conference on Mining Software Repositories, Austin, TX, USA, 14–15 May 2016; pp. 468–471. 23. Gu, Y.; Sun, K.; Su, P.; Li, Q.; Lu, Y.; Ying, L.; Feng, D. JGRE: An Analysis of JNI Global Reference Exhaustion Vulnerabilities in Android. In Proceedings of the IEEE/IFIP International Conference on Dependable Systems and Networks, Denver, CO, USA, 26–29 June 2017; pp. 427–438. 24. Sawada, Y.; Arai, Y.; Ootsu, K.; Yokota, T.; Ohkawa, T. Performance of Android Cluster System Allowing Dynamic Node Reconfiguration. Wirel. Pers. Commun. Int. J. 2017, 93, 1067–1087. [CrossRef] Information 2018, 9, 54 15 of 16

25. Demertzis, K.; Iliadis, L. Computational intelligence anti-malware framework for android OS. Vietnam J. Comput. Sci. 2017, 4, 1–15. [CrossRef] 26. Hofmann, M.; Rünger, G. Sustainability through flexibility: Building complex simulation programs for distributed computing systems. Simul. Model. Pract. Theory 2015, 58, 65–78. [CrossRef] 27. Fox, R.; Kasten, E.; Orji, K.; Bolen, C.; Maurice, C.; Venema, J. Real-time results without real-time systems. IEEE Trans.Nucl. Sci. 2015, 51, 571–575. [CrossRef] 28. Grace, M.; Zhou, Y.; Zhang, Q.; Zou, S.; Jiang, X. Riskranker: Scalable and accurate zero-day android malware detection. In Proceedings of the 10th International Conference on Mobile Systems, Applications and Services, Low Wood Bay, Lake District, UK, 25–29 June 2012; pp. 281–294. 29. Byun, S.S. TCP over scarce transmission opportunity in cognitive radio networks. Comput. Netw. 2016, 103, 101–114. [CrossRef] 30. Lin, C.C.; Li, H.; Zhou, X.; Wang, F. Screenmilker: How to Milk Your Android Screen for Secrets. In Proceedings of the Network and Distributed System Security Symposium, San Diego, CA, USA, 24 February 2014. 31. Rafnsson, W. Tracking Information Flows in Interactive and Object-Oriented Programs; Chalmers University of Technology: Göteborg, Sweden, 2012. 32. Gries, S.; Hesenius, M.; Gruhn, V. Tracking Information Flow in Cyber-Physical Systems. In Proceedings of the IEEE International Conference on Distributed Computing Systems, Atlanta, GA, USA, 5–8 June 2017; pp. 2589–2590. 33. Lantz, P.; Desnos, A.; Yang, K. DroidBox: An Android Application Sandbox for Dynamic Analysis. Available online: https://code.google.com/p/droidbox (accessed on 3 March 2018). 34. Hornyack, P.; Han, S.; Jung, J.; Schechter, S.; Wetherall, D. These aren’t the Droids you’re looking for: Retrofitting android to protect data from imperious applications. In Proceedings of the 18th ACM Conference on Computer and Communication Security, Chicago, IL, USA, 17–21 October 2011; pp. 639–652. 35. Egele, M.; Scholte, T.; Kirda, E.; Kruegel, C. A survey on automated dynamic malware-analysis techniques and tools. ACM Comput. Surv. 2012, 44.[CrossRef] 36. Fratantonio, Y.; Bianchi, A.; Robertson, W.; Kirda, E.; Kruegel, C.; Vigna, G. Triggerscope: Towards detecting logic bombs in android applications. In Proceedings of the Security and Privacy Symposium, San Jose, CA, USA, 22–26 May 2016; pp. 377–396. 37. Zhou, Y.; Jiang, X. Dissecting android malware: Characterization and evolution. In Proceedings of the 33rd IEEE Symposium on Security and Privacy, San Francisco, CA, USA, 20–23 May 2012; pp. 95–109. 38. Portokalidis, G.; Homburg, P.; Anagnostakis, K.; Bos, H. Paranoid android: Versatile protection for smartphones. In Proceedings of the 2010 Annual Computer Security Applications Conference, Austin, TX, USA, 6–10 December 2010; pp. 347–356. 39. Burguera, I.; Zurutuza, U.; Nadjm-Tehrani, S. Crowdroid: Behavior-Based malware detection system for android. In Proceedings of the 1st ACM Workshop on Security and Privacy in Smartphones and Mobile Devices, New York, NY, USA, 17 October 2011; pp. 15–26. 40. Spreitzenbarth, M.; Echtler, F.; Hoffmann, J. Mobilesandbox: Having a deeper look into android applications. In Proceedings of the 28th Annual ACM Symposium on Applied Computing, New York, NY, USA, 18–22 March 2013; pp. 1808–1815. 41. Tam, K.; Khan, S.J.; Fattori, A.; Cavallaro, L. CopperDroid: Automatic Reconstruction of Android Malware Behaviors. In Proceedings of the Network and Distributed System Security Symposium, San Diego, CA, USA, 8–11 February 2015. 42. Yan, L.K.; Yin, H. Droidscope: Seamlessly reconstructing OS and Dalvik semantic views for dynamic Android malware analysis. In Proceedings of the 21st USENIX Security Symposium, Bellevue, WA, USA, 8–10 August 2012; pp. 569–584. 43. Johnson, R.; Wang, Z.; Gagnon, C.; Stavrou, A. Analysis of Android Applications’ Permissions. In Proceedings of the IEEE Sixth International Conference on Software Security and Reliability Companion, Gaithersburg, MD, USA, 20–22 June 2012; pp. 45–46. Information 2018, 9, 54 16 of 16

44. Ghandour, W.J.; Akkary, H.; Masri, W. The potential of using dynamic information flow analysis in data value prediction. In Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, Vienna, Austria, 11–15 September 2010; pp. 431–442. 45. Sarwar, G.; Mehani, O.; Boreli, R.; Kaafar, M.A. On the effectiveness of dynamic taint analysis for protecting against private information leaks on android-based devices. In Proceedings of the 10th International Conference on Security and Cryptography, Reykjavík, Iceland, 29–31 July 2013; pp. 461–468. 46. Shao, Y.; Luo, X.; Qian, C. RootGuard: Protecting Rooted Android Phones. Computer 2014, 47, 32–40. [CrossRef]

© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).