Webcapsule: Towards a Lightweight Forensic Engine for Web Browsers
Total Page:16
File Type:pdf, Size:1020Kb
WebCapsule: Towards a Lightweight Forensic Engine for Web Browsers Christopher Neasbitty, Bo Liy, Roberto Perdisciyo, Long Luz, Kapil Singh◦, and Kang Liy yDepartment of Computer Science, University of Georgia oCollege of Computing, Georgia Tech zDepartment of Computer Science, Stony Brook University ◦IBM Research {cjneasbi,lubao515}@uga.edu, [email protected], [email protected] [email protected], [email protected] ABSTRACT Categories and Subject Descriptors Performing detailed forensic analysis of real-world web security C.2.0 [Computer-Communication Networks]: General—Secu- incidents targeting users, such as social engineering and phishing rity and protection attacks, is a notoriously challenging and time-consuming task. To reconstruct web-based attacks, forensic analysts typically rely on General Terms browser cache files and system logs. However, cache files and logs provide only sparse information often lacking adequate detail to Security; Forensics reconstruct a precise view of the incident. To address this problem, we need an always-on and lightweight Keywords (i.e., low overhead) forensic data collection system that can be eas- ily integrated with a variety of popular browsers, and that allows Forensic Engine; Web Security; Browsing Replay for recording enough detailed information to enable a full recon- struction of web security incidents, including phishing attacks. 1. INTRODUCTION To this end, we propose WebCapsule, a novel record and replay The ability to perform accurate forensic analysis of web-based forensic engine for web browsers. WebCapsule functions as an security incidents is critical, as it allows security researchers to bet- always-on system that aims to record all non-deterministic inputs ter understand past incidents and develop stronger defenses against to the core web rendering engine embedded in popular browsers, future attacks. Unfortunately, analyzing real-world web attacks that including all user interactions with the rendered web content, web directly target users, such as social engineering and phishing at- traffic, and non-deterministic signals and events received from the tacks, remains an extremely challenging and time-consuming task. runtime environment. At the same time, WebCapsule aims to be The state-of-the-art methods for reconstructing web-based inci- lightweight and introduce low overhead. In addition, given a previ- dents generally follow two approaches. The first approach relies ously recorded trace, WebCapsule allows a forensic analyst to fully on analyzing the web browser’s history, cache files, and system replay and analyze past web browsing sessions in a controlled iso- logs [16, 20]. However, cache files and logs provide only sparse lated environment. information often lacking adequate detail to reconstruct a precise We design WebCapsule to also be portable, so that it can be view of what happened during social engineering and phishing at- integrated with minimal or no changes into a variety of popular tacks that may have occurred days in the past. The second approach web-rendering applications and platforms. To achieve this goal, leverages access to full network packet traces, which may provide we build WebCapsule as a self-contained instrumented version of some indications of how an incident unfolded. However, the com- Google’s Blink rendering engine and its tightly coupled V8 Java- plexity of modern web pages results in a large semantic gap be- Script engine. tween the web traffic and the detailed events (e.g., page render- We evaluate WebCapsule on numerous real-world phishing at- ing, mouse movements, key presses, etc.) that occurred within the tack instances, and demonstrate that such attacks can be recorded browser [18]. Such semantic gaps make it very difficult to precisely and fully replayed. In addition, we show that WebCapsule can reconstruct what a victim actually saw and how she was tricked, record complex browsing sessions on popular websites and differ- and to identify what information was consequently leaked. ent platforms (e.g., Linux and Android) while imposing reasonable To address this problem, we need a forensic data collection sys- overhead, thus making always-on recording practical. tem that satisfies the following requirements: - be always-on, so that all (unexpected) incidents can be trans- Permission to make digital or hard copies of all or part of this work for personal or parently recorded, including new attacks that follow previ- classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full cita- ously unknown patterns; tion on the first page. Copyrights for components of this work owned by others than - be lightweight, to minimize performance overhead, thus mak- ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or re- ing always-on recording practical; publish, to post on servers or to redistribute to lists, requires prior specific permission - be portable, to operate in a variety of web-rendering applica- and/or a fee. Request permissions from [email protected]. tions and platforms; CCS’15, October 12–16, 2015, Denver, Colorado, USA. - provide critical information to greatly enhance and facilitate c 2015 ACM. ISBN 978-1-4503-3832-5/15/10 ...$15.00. DOI: http://dx.doi.org/10.1145/2810103.2813656. a forensic analyst’s investigation of web security incidents, with particular focus on attacks that directly target users, Runtime Environment (User's Device) Isolated Analysis Environment such as social engineering and phishing attacks. Browser Replay Browser In this paper we propose WebCapsule, a novel record and replay forensic engine for web browsers. WebCapsule lays the founda- User Input Platform tions for web-based attack reconstruction and analysis while meet- WebTraffic Web Rendering Web Rendering shim Engine shim Engine ing all of the above stated requirements. Our main goal is to enable System Calls an always-on, transparent, and fine-grained recording (and subse- quent replay) of potentially harmful web browsing sessions. As de- picted in Figure 1, WebCapsule aims to record all non-deterministic Record Storage Replay inputs to the core web rendering engine embedded in the browser, Remote Web Server including all user interactions with the rendered web content, web traffic, and non-deterministic signals and events received from the runtime environment. Figure 1: High-level overview of WebCapsule’s record and replay WebCapsule allows an analyst to later replay previously recorded capabilities. Non-deterministic inputs to the embedded web ren- browsing sessions in a separate controlled environment, where no dering engine are recorded, and can be fully replayed in an isolated new external user inputs or network transactions are needed. This forensic analysis environment where no new external user inputs or enables detailed analysis of security incidents that are (obviously) network transactions are received. unexpected, and allows for reconstructing detailed information about incidents that may follow new, never-before-seen attack pat- terns. In addition, by replaying all non-deterministic inputs, in- Web-rendering App cluding all content provided by the server, WebCapsule enables Web-rendering API a full forensic investigation of incidents involving ephemeral web Instrumentation Shim content, such as short-lived phishing or social engineering attack WebCapsule pages. V8 JavaScript While some previous work has studied record and replay to as- Blink Engine Rendering Engine sist the debugging of web applications [2,5,23], these studies do not focus on forensic analysis and, more importantly, do not satisfy the associated requirements listed above. For example, TimeLapse [5] Instrumentation Shim is a debugging tool based on Apple’s WebKit [30] that allows for OS / Platform API recording and replaying web content. However, TimeLapse does Platform not work as an “always on” system. Also, TimeLapse does not al- low for transparent recording because it deeply modifies the inter- nals of WebKit, for example to force a synchronous scheduling of Figure 2: Overview of WebCapsule’s instrumentation shims. threads such as the HTML parser thread [25], thus also impacting performance. In addition, TimeLapse currently only works on Ma- pled V8 JavaScript engine [28] (see Figure 2) in a way that allows cOS+Safari+WebKit [26], and is not easily portable to other oper- us to inherit Blink’s portability. ating systems and browsers. Conversely, WebCapsule can function WebCapsule’s portability has several advantages. Not only can as an always-on system (e.g., it can be configured to start record- it be readily deployed into existing Blink-based browsers and mul- ing at browser startup with no user intervention) to continuously tiple platforms, but it also allows us to fully replay the browsing and transparently record browsing sessions while introducing low traces on a device (or virtual machine) whose platform may differ overhead. Furthermore, WebCapsule is highly portable, can be em- from the platform where the traces were recorded. bedded in a variety of web-rendering applications, and can run on At the same time, our design choice of “living” strictly inside a variety of platforms. Furthermore, unlike [2, 23], WebCapsule Blink imposes a number of constraints that make the instrumenta- is not limited to only recording user interactions with web pages, tion process challenging, especially for enabling the replay of com- but instead aims to