Android App Malware Detection

Thesis submitted in partial fulfillment of the requirements for the degree of

MS by Research in Computer Science and Engineering

by

Vijayendra Grampurohit 201207682

International Institute of Information Technology
Hyderabad - 500 032, INDIA
July 2016

Copyright © Vijayendra Grampurohit, 2016
All Rights Reserved

CERTIFICATE

It is certified that the work contained in this thesis, titled “Android App Malware Detection” by Vijayendra Grampurohit, has been carried out under my supervision and has not been submitted elsewhere for a degree.

Date
Adviser: Prof. Shatrunjay Rawat

“Contemplate and reflect upon knowledge, and you will become a benefactor to others.”

To my parents

Acknowledgments

I would like to express my gratitude to my advisers, Prof. Shatrunjay Rawat and Dr. Sanjay Rawat. It was great to have this combination guiding my research, as I had the best of both. Shatrunjay sir helped me stay focused on problems and provided new directions to approach them, while Sanjay sir explained concepts at an abstract level, opening up my mind to apply the same techniques to various similar problems. Working with them, I developed my problem-solving skills and learnt about research and about life in general. I will be thankful to them all my life for guiding me on various matters and for always motivating me and boosting my confidence, which really helped me shape my life.

I want to take this opportunity to thank Dr. Ashok Kumar Das, not just for teaching me courses like Principles of Information Security and System and Network Security, but also for giving me the wonderful opportunity to be a TA for Principles of Information Security, which gave me even more insight into the fundamentals of security than I had gained as a student in the course. He also gave me some insight into ongoing research in wireless sensor networks, which stirred an immense interest in me.

I must also thank all my labmates who are working or have worked earlier in CSTAR: Bhargav, Dharma Teja, Jatin, Ishan, Lokesh and Shubhum. I really enjoyed working with them in the lab. I would like to thank all my friends in IIIT for making my stay on campus a memorable one. I cannot forget the fun-filled discussions we had in the Yuktahaar mess and at the juice corner. During my stay at IIIT, I also got a chance to look at Donald Knuth's famous book “The Art of Computer Programming”. Reading its preface removed all my fears of coding, as he clearly wrote: “These machines have no common sense; they do exactly what they are told, no more and no less”.
Finally, I would like to thank all my family members: my parents, for innumerable reasons, and my brother Vinay, for his unconditional love and affection.

Abstract

With shipments of 1 billion Android devices expected in 2017, cyber criminals have naturally extended their malicious activities to Google’s mobile operating system: threat researchers report an alarming increase in detected Android malware from 2014 to 2015. To keep up with the estimated 700 new Android applications released every day, some form of automated analysis is needed to quickly detect and isolate new malware instances. Android is an open-source, Linux-based mobile operating system distributed by Google. According to the latest statistics, Android powers hundreds of millions of mobile devices in over 190 countries [26]. Google Play [51] is the official centralized Android marketplace maintained by Google, where any independent developer can submit an Android app and make it available to users. The growing popularity of the Android ecosystem has also made it an attractive target for security and privacy violations. Highly sensitive and confidential information such as text messages, private and business contacts, and calendar data may be leaked through an application. Sensors such as GPS allow applications to provide a context-sensitive user experience, but they also raise additional privacy concerns, since an application can exploit the sensor data for tracking or monitoring. Apart from these issues, smartphones are also susceptible to malware threats such as viruses, Trojan horses and worms [50]. The Android security model relies heavily on a permission-based mechanism. There are about 130 permissions that govern access to different resources. Whenever a user tries to install a new application, he/she is prompted to approve or reject all the permissions requested by the application. The application is installed only after the user accepts all the permissions it requests.
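As a minimal sketch of how the permissions an app requests can be read programmatically, the Python snippet below parses an AndroidManifest.xml that is assumed to have already been decoded to plain-text XML (inside an APK the manifest is stored in a binary XML format, so a tool such as APK Tool or Androguard would be needed first). It collects the `android:name` value of every `<uses-permission>` element and encodes the result as a binary vector over a fixed permission vocabulary. The function names, the sample manifest and the three-entry vocabulary are hypothetical illustrations, not the thesis's actual pipeline; a real vocabulary would cover the roughly 130 permissions mentioned above.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# ElementTree stores namespaced attributes as "{namespace-uri}localname",
# so android:name becomes "{http://schemas.android.com/apk/res/android}name".
ANDROID_NAME = "{http://schemas.android.com/apk/res/android}name"


def requested_permissions(manifest_xml: str) -> set[str]:
    """Return the names declared by <uses-permission> elements in a
    plain-text (already decoded) AndroidManifest.xml."""
    root = ET.fromstring(manifest_xml)
    names = (elem.get(ANDROID_NAME) for elem in root.iter("uses-permission"))
    return {name for name in names if name}


def permission_vector(perms: set[str], vocabulary: list[str]) -> list[int]:
    """Encode a permission set as 1/0 presence flags over a fixed vocabulary."""
    return [1 if p in perms else 0 for p in vocabulary]


# Hypothetical example manifest and vocabulary, for illustration only.
SAMPLE_MANIFEST = """
<manifest xmlns:android="http://schemas.android.com/apk/res/android"
          package="com.example.app">
    <uses-permission android:name="android.permission.INTERNET"/>
    <uses-permission android:name="android.permission.READ_SMS"/>
    <application android:label="Example"/>
</manifest>
""".strip()

VOCABULARY = [
    "android.permission.INTERNET",
    "android.permission.READ_SMS",
    "android.permission.SEND_SMS",
]

if __name__ == "__main__":
    perms = requested_permissions(SAMPLE_MANIFEST)
    print(sorted(perms))  # ['android.permission.INTERNET', 'android.permission.READ_SMS']
    print(permission_vector(perms, VOCABULARY))  # [1, 1, 0]
```

The presence/absence encoding is a design choice made here for illustration; other encodings of the same information are equally possible.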
In this work, we use permission and API-level information from apps as features to detect malicious applications. We further observe that the Android store [51] assigns a category to every published application. Our extensive studies show that certain categories are far more prone to malicious behaviour than others. We explicitly incorporate this information in our model and learn a Naive Bayes classifier for each category, using features that encode information about permissions and API calls. Given a new test application with a known category, we apply the classifier for that category to detect whether the application is malicious. We created a large data set of Android applications and achieved an improvement of 3-4% by incorporating category-level information.

Second, we combine association rule mining and classification rule mining to build a classifier. The integration is done by mining a special subset of association rules, called class association rules (CARs). To select the features that best distinguish malware from benign apps, we rely on API-level information within the bytecode, since it conveys substantial semantics about an app's behaviour. More specifically, we focus on critical API calls and their package-level information. Rather than simply treating individual API calls as items, we represent an item as a combination of a caller and a callee API, which captures one level of control flow and context between them. Each item in our model is of the form A%B, where A is the caller and B is the callee. We use Androguard [8], a reverse-engineering tool, to perform API-level feature extraction and data flow analysis. In summary,

• combining association rule mining and classification rule mining for Android malware detection.
• We achieved a detection rate of 85% over the baseline classifier of 0.69%.

Contents

1 Introduction
1.1 Motivation
1.2 Problem Statement
1.3 Contributions
1.4 Outline

2 Background Information
2.1 Android System Architecture
2.1.1 Linux kernel
2.1.2 Libraries
2.1.3 Android runtime
2.1.4 Application framework
2.1.5 Applications
2.2 Dalvik Virtual Machine
2.2.1 Hardware constraints
2.2.2 Bytecode
2.3 Apps
2.3.1 Application components
2.3.2 Manifest
2.3.3 Native code
2.3.4 Distribution
2.4 Malware
2.4.1 Types of malware
2.4.2 Malware distribution
2.4.3 Malware data sets
2.5 Summary

3 Related Work

4 Android Malware Detection Using Permissions and API Calls
4.1 Android Application Structure
4.1.1 Android Security Approach
4.1.2 Android Permission Setting
4.2 Static analysis
4.2.1 Androguard
4.2.2 APK Tool
4.2.3 Monkey and Monkeyrunner
4.2.4 AndroViewClient
4.3 Dynamic Analysis
4.3.1 Droidbox
4.3.2 Taintdroid
4.4 Reverse engineering Android App ...