Date of publication xxxx 00, 0000, date of current version xxxx 00, 0000.
Digital Object Identifier 10.1109/ACCESS.2017.DOI

Large-scale Mobile App Identification Using Deep Learning

SHAHBAZ REZAEI1, (Member, IEEE), BRYCE KROENCKE2, AND XIN LIU3, (Fellow, IEEE)

1Computer Science Department, University of California, Davis, CA, USA (e-mail: [email protected])
2Computer Science Department, University of California, Davis, CA, USA (e-mail: [email protected])
3Computer Science Department, University of California, Davis, CA, USA (e-mail: [email protected])

Corresponding author: Shahbaz Rezaei (e-mail:
[email protected]).

ABSTRACT Many network services and tools (e.g., network monitors, malware-detection systems, and routing and billing policy enforcement modules in ISPs) depend on identifying the type of traffic that passes through the network. With the widespread use of mobile devices, the vast diversity of mobile apps, and the massive adoption of encryption protocols (such as TLS), large-scale encrypted traffic classification has become increasingly difficult. In this paper, we propose a deep learning model for mobile app identification that works even with encrypted traffic. The proposed model needs only the payload of the first few packets for classification and is therefore suitable even for applications that rely on early prediction, such as routing and QoS provisioning. The deep model achieves between 84% and 98% accuracy in identifying 80 popular apps. We also perform occlusion analysis to gain insight into what data leaked by the SSL/TLS protocol allows accurate app identification. Moreover, our traffic analysis shows that many apps generate not only app-specific traffic but also numerous ambiguous flows.
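
As a rough illustration of what "the payload of the first few packets" could look like as a model input, the sketch below turns a flow into a fixed-size matrix of normalized byte values. It is a minimal sketch under stated assumptions, not the preprocessing used in this work: the function name `flow_to_input` and the default values of `num_packets` and `bytes_per_packet` are hypothetical choices made only for this example.

```python
from typing import List, Sequence


def flow_to_input(
    payloads: Sequence[bytes],
    num_packets: int = 6,         # assumed value, not taken from the text
    bytes_per_packet: int = 256,  # assumed value, not taken from the text
) -> List[List[float]]:
    """Build a num_packets x bytes_per_packet matrix with values in [0, 1].

    Each row holds the leading payload bytes of one packet. Short payloads
    and missing packets are zero-padded so every flow has the same shape.
    """
    rows = []
    for i in range(num_packets):
        # Use an empty payload when the flow has fewer packets than requested.
        payload = payloads[i] if i < len(payloads) else b""
        clipped = payload[:bytes_per_packet]
        padded = clipped + b"\x00" * (bytes_per_packet - len(clipped))
        # Scale each byte (0..255) into the range [0, 1].
        rows.append([byte / 255.0 for byte in padded])
    return rows


# Example: a flow with two packets, shaped into a 3 x 4 input.
example = flow_to_input([b"\x16\x03\x01", b"\xff" * 300],
                        num_packets=3, bytes_per_packet=4)
```

Because the input has a fixed shape and depends only on the first packets, such a representation can be built as soon as those packets arrive, which is what makes early prediction possible in principle.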